Announcing Collate 1.9

Sep 8, 2025

8 min read

Building on the Model Context Protocol (MCP) integration introduced in version 1.8, we've continued to focus on the features modern data teams need. Today, we're announcing Collate 1.9, the latest release of our managed OpenMetadata service. This release introduces data contracts that improve collaboration between data producers and consumers, flexible data governance options for distributed ownership, and enhanced experiences across landing pages, data quality tests, service insights, and glossary management. We've also improved platform scalability, expanded connector support, and optimized search performance, all designed to help data organizations better manage their data.

https://www.youtube.com/watch?v=vAwEIOqqaXw

What's New in Collate 1.9

Data Contracts: Formalizing Data Agreements

Keeping data producers and consumers aligned has always been challenging. Poor handoffs often lead to breaking changes, unclear ownership, and late-night firefighting. Collaborative Data Contracts solve this by formalizing agreements between parties about what to expect from data assets, including:

Schema validations: Prevent breaking schema changes by ensuring specific columns are available between upstream and downstream assets
Semantic validations: Eliminate unclear ownership and asset ambiguity by defining custom metadata rules, such as requiring ownership, domain, or description information
Quality validations: Stop bad data from cascading downstream with no-code or SQL tests to check technical and business rules across tables and columns

All Data Contracts are automatically validated daily, with clear status visibility during each execution. Users can also run validations after making changes to ensure no agreements are breached. Contracts are built using an open and extensible JSON schema that integrates with modern DevOps workflows and ensures interoperability across different systems.

Data Contracts are currently available in the UI and via YAML for Tables, with support for other data assets coming in future releases, along with SLA and security information.
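
The three validation types can be sketched as follows. This is a minimal illustration in Python, not Collate's contract schema or API: the `Table` and `Contract` shapes, the `validate` helper, and the sample `orders` data are all hypothetical and exist only to show how schema, semantic, and quality checks differ.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical, simplified shapes; not Collate's actual contract schema.
@dataclass
class Table:
    name: str
    columns: list[str]
    owners: list[str] = field(default_factory=list)
    description: str = ""
    rows: list[dict] = field(default_factory=list)

@dataclass
class Contract:
    required_columns: list[str]                        # schema validation
    required_metadata: list[str]                       # semantic validation
    quality_checks: dict[str, Callable[[dict], bool]]  # quality validation, one rule per row

def validate(table: Table, contract: Contract) -> list[str]:
    """Return human-readable violations; an empty list means the contract holds."""
    violations = []
    # Schema: every column the consumer relies on must exist.
    for col in contract.required_columns:
        if col not in table.columns:
            violations.append(f"schema: missing column '{col}'")
    # Semantic: required metadata attributes must be non-empty.
    for attr in contract.required_metadata:
        if not getattr(table, attr):
            violations.append(f"semantic: '{attr}' is not set")
    # Quality: count the rows that fail each rule.
    for name, rule in contract.quality_checks.items():
        failed = sum(1 for row in table.rows if not rule(row))
        if failed:
            violations.append(f"quality: '{name}' failed on {failed} row(s)")
    return violations

orders = Table(
    name="orders",
    columns=["order_id", "amount"],
    owners=[],
    description="Customer orders",
    rows=[{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": -5.0}],
)
contract = Contract(
    required_columns=["order_id", "amount", "customer_id"],
    required_metadata=["owners", "description"],
    quality_checks={"amount_non_negative": lambda r: r["amount"] >= 0},
)

violations = validate(orders, contract)
for v in violations:
    print(v)
```

In this sample, the table breaks the contract in all three ways: a required column is missing, no owner is set, and one row fails the quality rule.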

Why this matters: Data contracts help prevent the cycle of broken pipelines and emergency fixes caused by poor collaboration between data teams. By establishing clear agreements around the schema, semantics, and quality of data, organizations can reduce downtime and improve data consumer productivity.

https://www.youtube.com/watch?v=thLYeMx2sxs

Multi-Domain Support & Data Asset Rules

Collate's Domains were built on Data Mesh foundations, but some teams need greater flexibility for managing distributed ownership. Collate 1.9 introduces two key changes:

Multi-domain entities: All entities now support multiple domains instead of just one, available through API, SDKs, and UI
Configurable rules: While single-domain remains the default, admins can enable multi-domain support through platform preferences, giving teams the option for one-to-one or one-to-many relations based on organizational needs.

Organizations now have the option of adopting stricter data mesh principles for data ownership, or more flexible models for aligning organization and access controls to their business structure.

This update is part of a larger set of new data asset rules that provide granular validations for users, teams, domains, data products, and glossary terms. Organizations can adopt industry best practice 1:1 relationships, or opt for more flexibility based on their specific needs. For example, admins can configure assets to be allowed in only one data product, or allow more flexibility to be in many data products. These settings ensure governance compliance, and assets that don’t follow the selected rules cannot be created or updated.
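
As a rough sketch of how such cardinality rules behave, the Python below rejects an asset that exceeds a configured limit. The rule format, the `check_rules` helper, and the `domains` / `dataProducts` keys on the sample asset are assumptions made for illustration; they are not Collate's rule engine or configuration format.

```python
from typing import Optional

def check_rules(asset: dict, limits: dict[str, Optional[int]]) -> list[str]:
    """Return one error per field that holds more values than its limit allows.

    A limit of None means "no cap" (the flexible, one-to-many model).
    """
    errors = []
    for field_name, limit in limits.items():
        count = len(asset.get(field_name, []))
        if limit is not None and count > limit:
            errors.append(f"{field_name}: {count} assigned, at most {limit} allowed")
    return errors

# Strict 1:1 model versus a flexible one-to-many model (both hypothetical configs).
strict = {"domains": 1, "dataProducts": 1}
flexible = {"domains": None, "dataProducts": None}

asset = {"name": "orders", "domains": ["Sales", "Finance"], "dataProducts": ["Revenue"]}

strict_errors = check_rules(asset, strict)      # the asset would be rejected
flexible_errors = check_rules(asset, flexible)  # the asset would be accepted
print(strict_errors, flexible_errors)
```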

Why this matters: Every organization has different data needs and may have data that flows through different groups and business units. The flexibility to adhere to strict single-domain data mesh principles or flexible multi-domain ownership gives organizations the option to scale their governance approach to their unique culture.

https://www.youtube.com/watch?v=az6UtZ1jx3Q

Landing Page Revamp

We've redesigned the landing page layout and enhanced it with persona-based customization to guide users to find their most relevant data assets more quickly, while also providing an easier overview of the data landscape’s overall scope and health.

Customize the landing page by persona to surface the most relevant content for each team member: data engineers can see pipelines and assets, while business and data analysts can access articles and metrics.
Explore a library of pre-built widgets, including asset types, KPIs, domains, tasks, activity feed, articles, pipeline status, owned/following assets, and more.
Build custom widgets that display specific assets based on defined criteria, such as “Show all dashboards, in the Marketing Domain, that are Tier 1”, to help you focus on the assets that are most important to you.
Pick up where you left off with recently visited assets to resume ongoing work more easily.
Use centralized search for faster discovery across databases, dashboards, pipelines, and more.
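
The custom-widget criterion quoted in the list above is a conjunction of three filters: asset type, domain, and tier. The Python sketch below shows that logic on made-up records. The field names, tier labels, and the `select_assets` helper are illustrative assumptions and do not reflect how widgets are configured in the product.

```python
# Hypothetical asset records; field names and values are illustrative only.
ASSETS = [
    {"name": "campaign_roi", "type": "dashboard", "domains": ["Marketing"], "tier": "Tier1"},
    {"name": "ad_spend", "type": "dashboard", "domains": ["Marketing"], "tier": "Tier2"},
    {"name": "pipeline_health", "type": "dashboard", "domains": ["Engineering"], "tier": "Tier1"},
    {"name": "leads", "type": "table", "domains": ["Marketing"], "tier": "Tier1"},
]

def select_assets(assets: list[dict], asset_type: str, domain: str, tier: str) -> list[str]:
    """Return the names of assets that match all three criteria."""
    return [
        a["name"]
        for a in assets
        if a["type"] == asset_type and domain in a["domains"] and a["tier"] == tier
    ]

# "Show all dashboards, in the Marketing Domain, that are Tier 1"
widget_assets = select_assets(ASSETS, "dashboard", "Marketing", "Tier1")
print(widget_assets)
```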

Beyond the landing page, this update is part of our comprehensive persona-based platform experience, which lets navigation bars, panels, and tabs be added, removed, and reorganized to suit the different needs of data team members. Organizations can set default personas to enable centralized control over customization.

Why this matters: The persona-driven landing page dramatically improves accessibility for non-technical users while ensuring data engineers, analysts, and business users can quickly access the tools and insights most relevant to their workflows. This reduces time-to-discovery and increases user engagement across all data team roles.

https://www.youtube.com/watch?v=uDFabrGBSf4

Data Quality UI Improvements

This release brings a complete Data Quality UI redesign focused on:

More intuitive test creation and visualization directly from your data assets and data quality dashboards
Streamlined test deployment that takes fewer clicks, making it easier for users to get started
Data quality tests and dashboards whose look and feel now match the platform UX introduced in release 1.8

Why this matters: Our goal is to make it easy for data users of all backgrounds to participate in the shared ownership of data quality. Simplifying the creation of data quality tests enhances data democratization and leads to improved quality coverage and greater business trust in data.

https://www.youtube.com/watch?v=RHACCOKB7bo

Service Insights Live Updates

The insights tab on service details pages now provides real-time updates when AutoPilot workflows are running, including:

Live coverage percentages for descriptions, PII tags, tier classifications, ownership, and data asset health
Real-time status updates for AutoPilot agents and Collate AI agents
Live monitoring of workflow instances showing progress of automated classification, tiering, and data quality processes

Why this matters: This real-time visibility helps users better understand the progress and work completed by agents and workflows. This eliminates guesswork and builds confidence in the automation, which leads to more reliable data operations.

https://www.youtube.com/watch?v=YvzTrdme3AY

Spark Integration for Profiler

For organizations with large datasets, Profiler and Data Quality workflows can experience long execution times due to limitations in the source system. The new Spark integration allows Collate customers to leverage existing Spark infrastructure for profiling and data quality at scale.

Version 1.9 supports workflow management via YAML configuration, with full UI support planned for the next release.

Why this matters: This integration enhances enterprise-scale data profiling capabilities, with organizations now able to profile large datasets faster without impacting operational performance.

Apache Ranger Sink for Reverse Metadata

Organizations centralizing policy management in Apache Ranger can now configure it as the destination for metadata changes, allowing governance and compliance teams to apply all policies directly in Collate. This integration eliminates policy management silos by ensuring metadata remains synced across systems. For example, auto-classify PII in Collate, sync PII tags to Apache Ranger, and automatically apply governance policies like masking and access controls.

This Ranger integration extends our support for centralized governance alongside other platforms like Snowflake, Redshift, BigQuery, and more.

Why this matters: Data governance silos between disconnected systems can cause compliance gaps and manual metadata processes. This integration ensures consistency in policy enforcement across systems while reducing administrative overhead.

https://www.youtube.com/watch?v=x4BvgSMitL0

Glossary Workflow History

The introduction of the Glossary Approval Workflow brought transparency and flexibility to users, letting them better understand what steps the platform had taken when a glossary term was created or updated. We wanted to provide even better visibility into the current status and history of how a glossary term had changed.

In this release, we are introducing a new component in the Glossary UI for workflow progress, failures, and task creation. This audit trail provides users with a comprehensive history of changes, approvals, and task assignments to ensure compliance and accountability.

Why this matters: This enhanced interface improves visibility around glossary workflow progress, as well as tracking of compliance and audit requirements.

https://www.youtube.com/watch?v=3H191jylyPo

Search Reindex Auto Tune

The reindexing process now includes automatic and intelligent configuration, making it easier for admins while ensuring optimal performance across different deployment sizes.

Why this matters: This removes the guesswork of tuning for admins and ensures users have a reliable search experience.

New Connectors

Collate 1.9 adds four new connectors to expand integration capabilities:

ThoughtSpot: Agentic analytics platform support
Google Sheets: New Drive Service UI (coming in an upcoming release)
Epic FHIR: Standard support for health record systems
Grafana: Monitoring and dashboard visualization

Why this matters: Comprehensive connector coverage across your data landscape ensures you can manage all your data assets, which is essential for making sure key data isn’t hidden from discovery, observability, and governance processes.

Breaking Changes

Multi-Domain Support: All entities now support multiple domains. The domain field has been renamed to domains and is now modeled as a list instead of a single domain.

If you're using the API or SDK, update your code to use the new domains field. The patch_domain implementation has a new signature to support this change.
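
For code that builds or stores entity payloads as plain dictionaries, the rename can be handled with a small shim like the Python sketch below. It only illustrates the single-value-to-list change described above: `migrate_domain_field` is a hypothetical helper, not part of the SDK, and the domain value is treated as opaque (whether it is a name or an entity reference depends on your payloads). The new `patch_domain` signature is not shown here; check the SDK for it.

```python
def migrate_domain_field(payload: dict) -> dict:
    """Return a copy of `payload` that uses the list-valued `domains` field.

    - A legacy `domain` value becomes a one-element `domains` list.
    - A legacy `domain` of None becomes an empty list.
    - If `domains` is already present, it wins and the legacy key is dropped.
    """
    migrated = dict(payload)  # shallow copy; the caller's dict is left untouched
    if "domain" in migrated:
        old = migrated.pop("domain")
        if "domains" not in migrated:
            migrated["domains"] = [] if old is None else [old]
    return migrated

legacy = {"name": "orders", "domain": "Sales"}
current = migrate_domain_field(legacy)
print(current)
```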

While schema and APIs are updated, multi-domain support is disabled by default. To allow assets to belong to multiple domains, go to Settings > Preferences > Data Asset Rules and disable the "Multiple Domains are not allowed" rule.

Ready to get started? Sign up for the Collate Free Tier of our managed OpenMetadata Service, or visit the Product Sandbox to try out the product with demo data.

James Nguyen
