Announcing Collate 1.10


We’re excited to announce Collate 1.10, the latest release of our managed OpenMetadata service. This release delivers new capabilities to help teams accelerate troubleshooting, improve team accountability, enhance security, and more. Impact Analysis gives you comprehensive upstream and downstream visibility before making changes, reducing production incidents and speeding up troubleshooting. Enhanced Data Contracts now include Service Level Agreements, Terms of Service, and Security specifications, establishing clear accountability between data producers and consumers while enabling machine-readable enforcement. Together with SDK 2.0, Metadata Exporter, streamlined SSO configuration, Drive support, and other platform enhancements, this release empowers your organization to scale governance without slowing down innovation.

Impact Analysis & Enhanced Lineage

The Impact Analysis interface provides a comprehensive visualization of upstream and downstream dependencies across your lineage. Quickly identify impacted assets with stakeholder and domain information to immediately know who to contact when making schema changes, addressing quality issues, or planning other table modifications.

  • Column-level granularity: Assess impact at the table level or the individual column level with visual node depth tracking

  • Customizable details: Track details across owners, domains, tiers, tags, glossary terms, descriptions, and more

  • Search and filter: Quickly find assets and columns, and isolate scope across a variety of filter criteria

  • Export and share: Download impact analysis as CSV or PNG for offline review, sharing, and presentations

Why this matters: Data teams need to understand downstream impacts before making changes, yet they often discover issues only after problems have occurred. Now, teams can proactively assess how a change to any source table affects downstream tables, ML models, and dashboards. This reduces production incidents, speeds up troubleshooting, and enables more confident operations.
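To make the idea of column-level impact with node depth tracking concrete, here is a minimal sketch in plain Python. It is not how Collate implements Impact Analysis; the lineage graph, the column names, and the `downstream_impact` helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns listed under it.
DOWNSTREAM = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "staging.orders.amount_usd": ["marts.revenue.total", "ml.churn_features.spend"],
    "marts.revenue.total": ["dashboards.finance.revenue_tile"],
}


def downstream_impact(start, edges):
    """Return {column: depth} for every column reachable downstream of start."""
    depths = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in edges.get(node, []):
            # Breadth-first order means the first visit is the shortest path.
            if child != start and child not in depths:
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


impact = downstream_impact("raw.orders.amount", DOWNSTREAM)
for column, depth in sorted(impact.items(), key=lambda item: (item[1], item[0])):
    print(depth, column)
```

The same traversal run over reversed edges would give the upstream view, and the depth values are what a filter such as "show only assets within two hops" would operate on.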

Data Contract Improvements

Our industry-leading data contract feature adds three new specifications: Service Level Agreements, Terms of Service, and Security. Combined with the existing Schema, Semantic, and Quality specifications, these contracts deliver better alignment and collaboration between data producers and consumers. The UI has been redesigned to match a legal contract, providing PDF and print-friendly export options for data teams. Additionally, these contracts have been built to be machine-readable. They can inform AI agents about contract compliance and enforcement through lineage analysis, such as whether customer data is being used in the training of downstream ML models.

  • Terms of Service (TOS): Define data usage policies, GDPR implications, customer consent requirements, and AI agent usage guidelines

  • Security: Specify security requirements, including data classification, access policies, and row filters

  • Service Level Agreements (SLA): Ensure data freshness, with automatic anomaly detection based on data refresh frequency

Why this matters: Data reliability issues aren't just technical problems; they're business problems between data teams that erode trust and cause costly downtime. Collate’s Data Contracts solve the fundamental challenge that has limited other implementations: the need for both cross-functional collaboration and automated enforcement. These enhancements strengthen the standards of accountability for data teams.
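As a rough sketch of what machine-readable enforcement could look like, the snippet below checks a freshness SLA and a Terms of Service rule against downstream usage. The contract layout, field names such as `max_staleness_hours` and `allow_ml_training`, and both helper functions are assumptions made for this example; they are not the Collate contract schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract structure; the field names are illustrative only.
contract = {
    "entity": "warehouse.sales.orders",
    "sla": {"max_staleness_hours": 24},
    "terms_of_service": {"allow_ml_training": False},
}


def sla_breached(contract, last_refresh, now):
    """True when the data is older than the freshness SLA allows."""
    limit = timedelta(hours=contract["sla"]["max_staleness_hours"])
    return now - last_refresh > limit


def tos_violations(contract, downstream_uses):
    """Return downstream ML models that the Terms of Service do not permit."""
    if contract["terms_of_service"]["allow_ml_training"]:
        return []
    return [use for use in downstream_uses if use["kind"] == "ml_model"]


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
stale = sla_breached(contract, datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc), now)
fresh = sla_breached(contract, datetime(2025, 1, 2, 0, 0, tzinfo=timezone.utc), now)

# Downstream assets as they might be discovered through lineage (made-up data).
uses = [
    {"name": "finance_dashboard", "kind": "dashboard"},
    {"name": "churn_model", "kind": "ml_model"},
]
violations = tos_violations(contract, uses)
```

Here `stale` is true because the last refresh is 30 hours old, and `violations` contains only the ML model, which is the kind of finding an agent could surface from lineage.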

SDK 2.0

The SDK has been redesigned for even greater usability, featuring a fluent API design that utilizes Java builder patterns for intuitive method chaining and simplified operations. Abstracted HTTP methods, selective field retrieval, and async support for large-scale operations make integration dramatically easier. The Python SDK will receive similar enhancements in future releases.

  • Simplified setup: Two-line client initialization with server URL and token, no complex configuration required

  • Builder patterns: Intuitive method chaining for creating database services, schemas, and tables

  • Bulk operations: CSV import/export for glossaries with dry run capabilities and async support with WebSocket notifications

  • Smart data retrieval: Selective field fetching (include/exclude options) reduces bandwidth and improves performance

  • Built-in safety: Delete confirmation patterns prevent accidental data loss

Why this matters: Developers can struggle with API operations, which require an understanding of HTTP methods and manual state management for large imports. The new SDK abstracts complexity away and adds safety features, enabling developers to integrate OpenMetadata into their workflows with minimal code while maintaining enterprise-grade capabilities.
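For readers unfamiliar with fluent builders and delete confirmation patterns, the following Python sketch shows the general shape of both ideas. It is explicitly not the Collate SDK: `TableBuilder`, `DeleteRequest`, and every method name here are invented for illustration, and the real SDK 2.0 is a Java API whose signatures may differ.

```python
class TableBuilder:
    """Hypothetical fluent builder: each setter returns self so calls can chain."""

    def __init__(self):
        self._name = None
        self._schema = None
        self._columns = []

    def name(self, value):
        self._name = value
        return self

    def in_schema(self, schema_fqn):
        self._schema = schema_fqn
        return self

    def column(self, name, data_type):
        self._columns.append({"name": name, "dataType": data_type})
        return self

    def build(self):
        # Validate once at the end rather than in every setter.
        if not self._name or not self._schema:
            raise ValueError("name and schema are required")
        return {"name": self._name, "schema": self._schema, "columns": self._columns}


class DeleteRequest:
    """Hypothetical delete that refuses to run until explicitly confirmed."""

    def __init__(self, fqn):
        self.fqn = fqn
        self._confirmed = False

    def confirm(self):
        self._confirmed = True
        return self

    def execute(self):
        if not self._confirmed:
            raise RuntimeError("call confirm() before execute()")
        return f"deleted {self.fqn}"


table = (
    TableBuilder()
    .name("orders")
    .in_schema("warehouse.sales")
    .column("order_id", "BIGINT")
    .column("amount", "DECIMAL")
    .build()
)
result = DeleteRequest("warehouse.sales.orders").confirm().execute()
```

The design point is that a forgotten `confirm()` fails loudly instead of deleting data, and the chained calls read top to bottom like the object they describe.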

Enhanced SSO Setup & Configuration

Easily configure SSO providers entirely within the Collate UI with embedded documentation and step-by-step guides. The inline configuration eliminates the need for external navigation and back-and-forth communication with DevOps teams. When combined with Collate’s SCIM support, teams can automate user provisioning and group mapping for integrated access management.

  • Multiple provider support: Includes Okta, Azure, Google, OneLogin, Auth0, Amazon Cognito, Keycloak, and Custom OIDC

  • Single source of truth: Corporate directory becomes the authority for user access

  • Unified user management: Automatic user provisioning and deprovisioning synced with corporate directory (Collate SCIM)

  • Group-based permissions: Map organizational groups (e.g., "Finance Team") to Collate roles (e.g., "Data Owners") automatically (Collate SCIM)

Why this matters: SSO is an industry best practice for centralized user access and authorization. Setting it up can require coordination between different teams and systems, creating delays and configuration errors. The streamlined SSO process enables teams to implement enterprise authentication in minutes, rather than days, while SCIM ensures that new employees receive immediate access and departing employees automatically lose access.
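The group-based permission mapping can be pictured as a simple lookup from directory groups to roles. The sketch below is a conceptual illustration only: apart from the "Finance Team" to "Data Owners" pairing mentioned above, the group names, role names, default role, and the `roles_for` helper are assumptions, and Collate's SCIM integration handles this mapping for you.

```python
# Hypothetical mapping from corporate directory groups to Collate roles.
GROUP_TO_ROLES = {
    "Finance Team": ["Data Owners"],
    "Analytics Guild": ["Data Stewards", "Data Consumers"],
}


def roles_for(user_groups, mapping, default_role="Data Consumers"):
    """Resolve a user's roles from group membership, with a fallback role."""
    roles = set()
    for group in user_groups:
        roles.update(mapping.get(group, []))
    # Users whose groups are all unmapped still get baseline access.
    return sorted(roles) if roles else [default_role]


finance_roles = roles_for(["Finance Team", "All Employees"], GROUP_TO_ROLES)
new_hire_roles = roles_for(["All Employees"], GROUP_TO_ROLES)
```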

Metadata Exporter

Send the metadata generated in Collate directly to your data warehouse of choice with the new Collate Metadata Exporter. This enables custom analytics on your metadata and dashboard creation with your preferred BI tool. This Collate add-on supports data quality and profiling metadata export, with full metadata access coming in a future release.

  • Drive accountability with governance KPIs: Run analytics on PII coverage, description completeness, active users, and other metrics to drive organizational change

  • Dashboard flexibility: Leverage the native Collate Data Insights dashboards, or directly access the data with BI tools like Tableau, Power BI, or Looker

  • Warehouse integration: Export to your existing data warehouse, including Snowflake, Databricks, or BigQuery

  • Works with Collate Hybrid Runner: Export metadata to your warehouse without your data ever leaving your environment, meeting strict security requirements

Why this matters: Collate provides native Data Insights dashboards; however, data teams may wish to combine Collate metadata with other data sources or utilize existing BI tooling. Aligned with our ethos of openness and interoperability, teams can now ship metadata to their chosen warehouse and build the BI reports and KPIs that matter to their organization.
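Once data quality metadata lands in a warehouse, a governance KPI is an ordinary SQL query. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse; the `dq_test_results` table, its columns, and the status values are assumptions for this example and do not reflect the actual schema the Metadata Exporter writes.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake, Databricks, or BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dq_test_results (table_fqn TEXT, test_name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO dq_test_results VALUES (?, ?, ?)",
    [
        ("sales.orders", "amount_not_null", "Success"),
        ("sales.orders", "order_id_unique", "Success"),
        ("sales.orders", "amount_positive", "Failed"),
        ("sales.orders", "row_count_between", "Success"),
        ("sales.customers", "email_not_null", "Success"),
        ("sales.customers", "email_format", "Failed"),
    ],
)

# KPI: percentage of passing data quality tests per table.
query = """
SELECT table_fqn,
       ROUND(100.0 * SUM(CASE WHEN status = 'Success' THEN 1 ELSE 0 END) / COUNT(*), 1)
           AS pass_rate
FROM dq_test_results
GROUP BY table_fqn
ORDER BY table_fqn
"""
pass_rates = conn.execute(query).fetchall()
for table_fqn, pass_rate in pass_rates:
    print(table_fqn, pass_rate)
```

The same query shape could feed a Tableau, Power BI, or Looker tile once it points at the real exported tables.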

Drive Service Support

Connect to Drive services to manage key business metadata from spreadsheets and other files. Collate automatically parses spreadsheet tabs, identifies table columns, and detects data types. Lineage integration shows how file data flows upstream and downstream through pipelines. Google Sheets is initially supported, with additional Drive services planned for the future.

  • Automatic schema inference: Detects data types and table structures from spreadsheet columns without manual configuration

  • Multi-file support: Ingest metadata from spreadsheets, presentations, documents, PDFs, and more

  • Directory structure support: Full folder hierarchy ingestion and organization

Why this matters: Critical business data often lives in spreadsheets and documents scattered across drive services, creating blind spots in data governance and lineage tracking. Drive support brings these assets into Collate, enabling complete visibility into how business users create and share data, even when it exists outside traditional data platforms.
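To give a feel for automatic schema inference, here is a small sketch that guesses a type for each spreadsheet column from its cell values. It is a simplified stand-in, not Collate's inference logic; the type names, the `infer_type` helper, and the sample rows are all made up for the example.

```python
from datetime import date


def infer_type(values):
    """Guess a column type by trying progressively looser parsers."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "STRING"
    candidates = (("INT", int), ("FLOAT", float), ("DATE", date.fromisoformat))
    for type_name, parser in candidates:
        try:
            for value in non_empty:
                parser(value)
            return type_name  # every non-empty cell parsed successfully
        except ValueError:
            continue
    return "STRING"


# Cells arrive as text, as they would from a spreadsheet tab (sample data).
header = ["order_id", "amount", "order_date", "customer"]
rows = [
    ["1", "19.99", "2024-01-05", "Ada"],
    ["2", "5", "2024-01-06", "Grace"],
    ["3", "", "2024-01-07", "Linus"],
]
schema = {
    name: infer_type([row[index] for row in rows])
    for index, name in enumerate(header)
}
```

Empty cells are ignored rather than forcing a column to `STRING`, which is why `amount` still resolves to a numeric type despite the blank in the third row.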

Additional Platform Enhancements

Spark Profiler UI: Users can choose to deploy data quality and profiling pipelines to Spark infrastructure through the updated pipeline creation interface. This enables enterprises to improve performance and scalability for large-scale data quality operations.

Improved Domains & Data Products User Experience: The redesigned interface offers both list and card views with customizable navigation hierarchy, allowing data products to be elevated as top-level navigation or nested under domains with dropdown access. Administrators can tailor the look and feel based on organizational preferences and user personas. This makes domains & data products more accessible to business stakeholders while maintaining flexibility for technical teams.

Landing Page Enhancements: The widget library has been expanded with a new data product widget, an enhanced data quality widget that allows users to follow tests beyond owned assets, and an improved curated assets widget that supports additional asset types, such as the Knowledge Center. Persona selection and management capabilities have also been refined. This provides even more flexibility for users to personalize the information most critical to their role and to improve their workflow efficiency.

Data Quality UI Improvements: The redesign of the data quality interface has been extended to the entity details page. This modernization of the UX improves usability and information architecture.

Lineage-Based Alert Propagation: Downstream asset owners are now notified automatically when upstream data quality tests fail. If Asset A is upstream of Assets B and C, the owners of B and C are notified about Asset A test case failures. Users can toggle this feature on or off and configure how many levels of lineage the propagation covers. This capability enables proactive communication when upstream data issues occur, reducing downstream incidents and failures.
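The propagation rule can be sketched as a depth-limited walk over lineage that collects owners along the way. This is an illustration of the behavior described above, not Collate's implementation; the asset names beyond A, B, and C, the owner names, and the `owners_to_notify` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage and ownership: A feeds B and C, and B feeds D.
DOWNSTREAM = {"A": ["B", "C"], "B": ["D"]}
OWNERS = {"A": "alice", "B": "bob", "C": "carol", "D": "dave"}


def owners_to_notify(failed_asset, downstream, owners, max_depth, enabled=True):
    """Collect owners of assets within max_depth hops downstream of a failure."""
    if not enabled:
        return set()
    notified = set()
    seen = {failed_asset}
    queue = deque([(failed_asset, 0)])
    while queue:
        asset, depth = queue.popleft()
        if depth == max_depth:
            continue  # configured propagation depth reached
        for child in downstream.get(asset, []):
            if child in seen:
                continue
            seen.add(child)
            if child in owners:
                notified.add(owners[child])
            queue.append((child, depth + 1))
    return notified


one_hop = owners_to_notify("A", DOWNSTREAM, OWNERS, max_depth=1)
two_hops = owners_to_notify("A", DOWNSTREAM, OWNERS, max_depth=2)
```

With a depth of one, only the owners of B and C are notified; raising the depth to two also reaches the owner of D.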

Profiler Cardinality: New cardinality metric support has been added for string-type fields so that users can understand value distribution. This can reveal patterns, such as identifying which enum values appear most frequently or uncovering unexpected values in constrained fields. This helps data engineers and analysts quickly spot data anomalies for debugging and quality validation.
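As a minimal sketch of what a cardinality metric on a string column reveals, the snippet below counts distinct values and lists the most frequent ones. The `cardinality_profile` helper, its output keys, and the sample column are assumptions for illustration and are not the profiler's actual metric definitions.

```python
from collections import Counter


def cardinality_profile(values, top_n=3):
    """Summarize the value distribution of a string column, ignoring nulls."""
    non_null = [value for value in values if value is not None]
    counts = Counter(non_null)
    return {
        "distinct_count": len(counts),
        "distinct_ratio": len(counts) / len(non_null) if non_null else 0.0,
        "top_values": counts.most_common(top_n),
    }


# A supposedly constrained field with one unexpected variant ("SHIPPED").
status_column = ["shipped", "shipped", "pending", "shipped", "SHIPPED", None, "pending"]
profile = cardinality_profile(status_column)
```

An enum expected to hold two values but reporting three distinct ones is exactly the kind of anomaly this metric makes easy to spot.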

Hybrid Runner Enhancements: The setup process has been streamlined for faster deployment and simplified permissions requirements. This re-architecture reduces time and effort, while maintaining enterprise-grade security and scalability.

New Connectors: Added support for ServiceNow, Snowplow, and TimescaleDB, expanding metadata collection capabilities across workflow management, event analytics, and time-series database platforms. Connectors are a key investment area for Collate, ensuring organizations have complete visibility into metadata across their entire data landscape.

Breaking Changes

  • Ingestion Framework: All workflows have an integrated workflow.print_status() inside the workflow.execute() call. This change was necessary to improve the handling of logger lifecycles. If you’re using the Ingestion Framework directly, manage workflows via the usual process:

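    import yaml
    from metadata.workflow.metadata import MetadataWorkflow

    # CONFIG is your workflow definition as a YAML string
    workflow_config = yaml.safe_load(CONFIG)
    workflow = MetadataWorkflow.create(workflow_config)
    workflow.execute()  # now prints the status summary itself
    workflow.raise_from_status()
    workflow.print_status()  # Not necessary anymore
    workflow.stop()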

You can now remove the print_status() call. If you leave it in for the time being, the only side effect is that the summary is logged twice.

  • Entity status field: The status field has been renamed to entityStatus for glossaryTerm and dataContract, because entityStatus has been introduced across different data assets. For Data Contracts, the value Active has also changed to Approved.
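If you have scripts that build or read glossaryTerm or dataContract payloads, the rename can be handled with a small adapter. The sketch below is one possible approach under the assumption that your payloads are plain dictionaries; `migrate_status` is a hypothetical helper and not part of any Collate or OpenMetadata library.

```python
RENAMED_FOR = {"glossaryTerm", "dataContract"}


def migrate_status(payload, entity_type):
    """Return a copy of payload using entityStatus instead of status."""
    if entity_type not in RENAMED_FOR or "status" not in payload:
        return dict(payload)
    migrated = {key: value for key, value in payload.items() if key != "status"}
    value = payload["status"]
    # Data Contracts additionally change the value Active to Approved.
    if entity_type == "dataContract" and value == "Active":
        value = "Approved"
    migrated["entityStatus"] = value
    return migrated


contract = migrate_status({"name": "orders_contract", "status": "Active"}, "dataContract")
term = migrate_status({"name": "Revenue", "status": "Draft"}, "glossaryTerm")
```

The function leaves other entity types untouched, so it can sit in front of existing code paths without a type check at every call site.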

Ready to get started? Sign up for the Collate Free Tier of our managed OpenMetadata service, or visit the Product Sandbox to try out the product with demo data.
