Managed OpenMetadata from Collate: 2024 Year in Review

Throughout 2024, Collate made incredible progress bringing new capabilities to our customers and to the OpenMetadata open source community. We’ve shipped new features and improvements to accelerate AI automation for data discovery, observability, and governance.

"Our vision at Collate is to transform the way data teams work together with AI. We’ve had incredible growth with data platform teams, governance teams, and data leaders looking to change how they work with their data, across cloud-native startups and global enterprises looking to modernize their data engineering practices and culture with metadata," noted Sriharsha Chintalapani, CTO of Collate.

In this post, we'll review some of the new capabilities and Collate-specific enhancements made this year, and how these improvements can help advance or initiate your organization's data strategy.

Collate AI

At Collate, we see an incredible opportunity to make life easier for every data stakeholder by tapping into the power of Generative AI. In 2024, we took a significant leap forward by rolling out a suite of AI-powered features that strengthen data governance, data discovery, and data observability—particularly by simplifying how data assets get documented.

Collate has quickly become indispensable to data scientists, engineers, and business users thanks to its rich knowledge graph that powers critical workflows like discovering relevant data assets, ensuring data quality, and maintaining trust in certified datasets. Building on this foundation, Collate AI taps into this deeper knowledge context to address key workflows like automatic documentation, natural language querying, query optimization, anomaly detection, and more.

Automatic Data Documentation

Comprehensive and up-to-date documentation is critical for every organization looking to understand, leverage, and govern its data assets effectively. Yet, manually creating and maintaining data documentation is often time-consuming, error-prone, and can quickly become a bottleneck. With Collate AI’s automated documentation generation, teams can easily produce descriptive overviews of their entire data landscape at the click of a button.

Natural Language Query

Collate AI includes a chatbot that lets non-technical users generate SQL queries by asking questions in natural language. This feature democratizes access to data insights and reduces the burden on technical teams.

Copilot Query Optimization

Collate AI assists SQL users with query building, refinement, table joins, relationships, and performance optimization. It provides guidance and recommendations to generate insights faster and optimize query efficiency.

Anomaly Detection for Data Quality

Collate provides a rich set of native capabilities for data quality, including no-code and SQL test cases, incident management, and alerting & notifications. However, understanding your data's behavior can require knowledge of its business context or technical specifications that data practitioners may not have, and this only becomes more complex as data evolves.

Release 1.5 introduced a new Anomaly Detection AI for data quality. The platform learns the patterns of your data and dynamically flags spikes or drops that fall outside the bounds of its normal behavior. Instead of updating your tests every time your business grows, you can rely on Collate to adapt to the data automatically, ensuring continuous and accurate monitoring of data quality.

These new capabilities help improve data reliability and trust in data, while reducing manual work for data teams. Instead of having to define all the different scenarios, anomaly detection can develop the pattern matching, and evolve it over time with changing data trends.
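
As a rough illustration of adaptive bounds (not Collate's actual model, which this post does not describe), the sketch below flags a value when it falls outside a band computed from the preceding window of observations. The function name, the window size, and the threshold `k` are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=7, k=3.0):
    """Return indices whose value lies outside mean +/- k * stdev of the preceding window.

    Illustrative only: a simple rolling band, not Collate's implementation.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        lower, upper = mu - k * sigma, mu + k * sigma
        if not lower <= values[i] <= upper:
            anomalies.append(i)
    return anomalies


# Hypothetical daily row counts with one spike at index 8.
daily_row_counts = [100, 102, 98, 101, 99, 103, 100, 101, 250, 102]
print(find_anomalies(daily_row_counts))  # prints [8]
```

Because the band is recomputed from recent history at every step, a gradual rise in volume moves the bounds with it, whereas a fixed threshold would need to be edited by hand.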

Collate Free Tier

Collate stands out as the first and only metadata platform offering a robust free managed service, making enterprise-grade data governance accessible to companies of all sizes. We've observed startups and companies with smaller data estates struggling with data management even at modest scales, since these problems appear for any organization trying to use data in any meaningful way. Early-stage companies have a unique opportunity to establish robust data practices from the ground up. Rather than waiting until problems compound and manual processes break, organizations can instill a strong data culture and build data management discipline from day one.

This proactive approach helps teams maintain control over their data assets by implementing essential practices that can scale as their data scales: proper documentation, clear lineage tracking, robust governance frameworks, regulatory PII compliance, and comprehensive data quality testing. Whether you're a growing startup or a small team, you can leverage the Collate platform to build a solid data foundation that scales with your business. You can start today for free by signing up for Collate Free Tier.

Data Governance

Automated Glossary Approval Workflows

Organizations often struggle with data governance due to rigid, pre-defined manual workflows. In Collate 1.6, Glossary Approval Workflows were migrated to a new automated framework.

Users can now create Custom Approval Processes with specific conditions and rules and easily visualize them through an intuitive workflow UI. You can also create smart approval processes for glossary terms with real-time state changes and task creation to save time and streamline work.

Future Custom Governance Workflows

Our vision is to build additional automated workflows on this framework and expand it to allow our users to create their own Custom Governance Workflows. We want to enable data teams to implement and automate data governance processes that fit their organization, promoting data quality and compliance.

Knowledge Center

In the 1.2 release, we introduced the Knowledge Center, designed to enhance your data documentation and centralize tribal knowledge. Instead of users jumping between different tools or wikis to learn about data architecture, getting started, or other processes, all the relevant information about data can be kept in a single place.

We added support for hierarchical pages, making it easier to organize your articles. We also simplified how to associate knowledge articles with data assets, which gains even more value since we display the related articles of a data asset on the asset pages. Finally, we improved the search support for Knowledge Center, allowing filtering by Owner or Tags and showcasing articles and Quicklinks previews.

Metadata Automations

Managing data at scale can be challenging and resource-intensive, requiring consistent documentation, glossary terms, tags, and ownership updates. Open source OpenMetadata simplifies this with its API-based structure, enabling teams to automate these tasks seamlessly. In Collate release 1.4, we extended this with the introduction of Metadata Automations—a no-code framework that lets users quickly build workflows directly from the UI. These workflows can add owners, tiers, domains, descriptions, glossary terms, and more, as well as propagate these attributes using column-level lineage.
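
To make the idea of propagating attributes along column-level lineage concrete, here is a minimal sketch. It is not the Metadata Automations implementation or the OpenMetadata API: the `propagate_tags` function, the dictionary-based lineage graph, and the column names are all hypothetical and exist only for illustration.

```python
from collections import deque


def propagate_tags(lineage, tags):
    """Copy each column's tags to every column downstream of it.

    lineage: maps an upstream column to the list of columns derived from it.
    tags: maps a column to the set of tags applied to it directly.
    Illustrative only; not how Collate implements propagation.
    """
    result = {column: set(column_tags) for column, column_tags in tags.items()}
    queue = deque(result)
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            child_tags = result.setdefault(child, set())
            new_tags = result[column] - child_tags
            if new_tags:
                # Revisit the child only when it gained something, so cycles terminate.
                child_tags |= new_tags
                queue.append(child)
    return result


# Hypothetical column-level lineage: raw -> staging -> mart.
lineage = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["mart.customers.contact_email"],
}
tags = {"raw.users.email": {"PII.Sensitive"}}

propagated = propagate_tags(lineage, tags)
print(propagated["mart.customers.contact_email"])  # prints {'PII.Sensitive'}
```

Tagging the source column once is enough for every derived column to inherit the tag, which is the kind of repetitive update the no-code workflows are meant to remove.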

Bulk Upload Data Assets

Collate already made life easier for data governance teams by supporting the import of glossary terms, which helps them manage glossaries within the platform more efficiently. Collate has now expanded this capability to include importing databases, schemas, and tables. This enhancement simplifies updating descriptions, ownership details, tags, and other metadata for many tables, schemas, and databases directly from the user interface.

Customizable Data Insights

Collate recognizes that understanding your data is crucial for improving your organization's data culture, and that KPIs help track critical metrics such as documentation and ownership coverage. As of version 1.5, users can create their own custom insights dashboards, giving data governance teams and data leadership more visibility to drive initiatives and hold data teams accountable.

Default data insights dashboards have long been available in Collate, covering a wide range of KPIs, such as data asset growth rate, data usage reports, user activity, description coverage, and more. By creating visibility into these health metrics, leaders can drive data culture changes for every user on the data platform. These reports improve overall data hygiene and stewardship, though different teams may have different needs with different metrics. With these new customizable data insights dashboards, teams can tailor these reports for their specific requirements.

Data Discovery

Automated Workflows for Bronze, Silver, & Gold Data Certification

Collate 1.6 also leverages the new framework for a new Data Certification Workflow, allowing you to define your organization's rules to certify your data as Bronze, Silver, or Gold. Certified assets are a great way to help users discover the right data and inform them which data has been properly curated.

Landing Page Widgets

The OpenMetadata Landing Page is customizable with widgets like Activity Feeds, My Data, and Data Insights, giving users the flexibility to view important information upon logging in. In the Collate 1.4 release, we added a new Data Quality Widget to the landing page, allowing users to see the performance of their test cases.

Entity Relationship (ER) Diagrams

Understanding complex database schemas can be challenging without clear visualizations. Collate 1.6 introduced ER diagrams as a new feature to enable you to:

  • Understand the big picture: Quickly see how tables are connected to other tables within your database based on constraints like primary and foreign keys.

  • Discover hidden connections: Navigate through your data assets, jumping across tables to uncover relationships and gain valuable insights.

  • Take control of your data map: Use the built-in UI editor to modify existing relationships or add new ones, ensuring your ER diagrams are always accurate and up-to-date.

Understand and manage your data better with ER diagrams to visualize how your database tables relate to each other.
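
The relationships an ER diagram draws come from key constraints. The sketch below shows one way such edges could be derived from foreign key metadata; the `tables` structure and the `relationships` function are hypothetical and do not reflect how Collate stores or renders constraints.

```python
# Hypothetical table metadata: each foreign key maps a local column
# to the "table.column" it references.
tables = {
    "orders": {
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": "customers.customer_id"},
    },
    "customers": {"primary_key": "customer_id", "foreign_keys": {}},
    "order_items": {
        "primary_key": "item_id",
        "foreign_keys": {"order_id": "orders.order_id"},
    },
}


def relationships(tables):
    """Return sorted (referencing column, referenced column) pairs, one per foreign key."""
    edges = []
    for table, meta in tables.items():
        for column, target in meta["foreign_keys"].items():
            edges.append((f"{table}.{column}", target))
    return sorted(edges)


for source, target in relationships(tables):
    print(f"{source} -> {target}")
# prints:
# order_items.order_id -> orders.order_id
# orders.customer_id -> customers.customer_id
```

Each pair is one line in the diagram; adding or editing a relationship in a UI editor would amount to adding or changing one of these pairs.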

Data Observability

Data Observability Dashboards

Data leaders across different teams need visibility into the quality and health of the data across the data estate. Collate has been a strong pillar for data quality implementations, with its ability to create no-code tests from the UI, native observability alerts, and Incident Manager.

In 1.6, we introduced organization-wide observability dashboards that allow you to track overall data quality coverage trends and analyze incident response performance across your entire data estate while still being able to drill into per-table-level data quality insights for troubleshooting. Additionally, enhanced asset and lineage views help identify upstream root causes and enable proactive data quality management.

Data Quality Dashboard

Losing trust in your data hurts credibility and productivity, and Collate tackles this problem by allowing both technical and non-technical users to collaborate on creating data quality tests directly from the UI. The new data quality dashboard organizes tests into different groups, making it easier to understand the data quality coverage of your tables and the potential impact of each test failure.

The dashboard and tests make it easier to ensure the quality of your data across different dimensions:

  • Integrity: Validate the data to ensure it remains correct throughout transformation processes, such as checking the number of rows or seeing if a critical column is still present.

  • Accuracy: Guarantee that data represents reality and is a trustworthy source of information—for example, data freshness or ensuring that the number of orders stays above 0.

  • Completeness: Check if essential data is missing.

  • Uniqueness: Validate records do not appear more than once.

  • Validity: Ensure data follows company business rules, such as by creating regular expressions to check emails or phone numbers.

  • Consistency: Ensure different representations of the same data match each other across different tables.

  • SQL: Run business rules and technical validations that are written using custom SQL queries.
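
To show what a few of these dimensions check in practice, here is a small sketch over in-memory rows. It is independent of Collate's test definitions: the sample rows, the function names, and the deliberately loose email pattern are assumptions made for the example.

```python
import re

# Hypothetical sample rows with a missing email, a duplicate id, and a malformed email.
rows = [
    {"order_id": 1, "email": "ana@example.com", "amount": 20.0},
    {"order_id": 2, "email": None, "amount": 15.5},
    {"order_id": 2, "email": "not-an-email", "amount": 0.0},
]

# Deliberately loose pattern: something@something.something with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def missing_values(rows, column):
    """Completeness: rows where the column has no value."""
    return [row for row in rows if row.get(column) is None]


def duplicate_values(rows, column):
    """Uniqueness: values that appear in more than one row."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[column]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


def invalid_values(rows, column, pattern):
    """Validity: non-null values that do not match the pattern."""
    return [
        row[column]
        for row in rows
        if row.get(column) is not None and not pattern.match(row[column])
    ]


print(len(missing_values(rows, "email")))       # prints 1
print(duplicate_values(rows, "order_id"))       # prints {2}
print(invalid_values(rows, "email", EMAIL_RE))  # prints ['not-an-email']
```

Each function returns the offending rows or values rather than a bare pass or fail, which mirrors the usefulness of seeing failing samples when debugging a test.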

By lowering the entry barrier to implementing data quality, more data practitioners can contribute their technical understanding and business knowledge to ensure the shape, structure, and reliability of important data. These shared responsibility and collaboration workflows help bring data teams together to reduce friction and increase productivity. In addition, Collate goes beyond creating tests to make metadata more actionable: observability alerts and the Incident Manager help bring your teams together to resolve any ongoing issues.

Data Freshness Test

Working with stale data can lead to bad decision making and business risk. With the new freshness data quality test, you can validate that the data comes from a defined time window. For example, if data arrives late due to an integration issue or scheduling problem, this test can catch these issues and prevent the old data from causing issues downstream. Additionally, by combining these tests with lineage information and the Incident Manager, your team can quickly detect issues related to missing data or stuck pipelines.
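
The core of a freshness check is a comparison between the newest record's timestamp and an allowed age. The sketch below assumes a hypothetical `is_fresh` helper and a 24-hour window; it is not Collate's test definition, only the underlying idea.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest_timestamp, max_age, now=None):
    """Return True when the newest record is no older than max_age.

    `now` can be injected so the check is deterministic in tests.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest_timestamp <= max_age


# Hypothetical timestamps: the check runs at noon on 31 December 2024 (UTC).
check_time = datetime(2024, 12, 31, 12, 0, tzinfo=timezone.utc)
recent_load = datetime(2024, 12, 31, 6, 0, tzinfo=timezone.utc)
late_load = datetime(2024, 12, 29, 6, 0, tzinfo=timezone.utc)

print(is_fresh(recent_load, timedelta(hours=24), now=check_time))  # prints True
print(is_fresh(late_load, timedelta(hours=24), now=check_time))    # prints False
```

In a real pipeline the latest timestamp would come from the table itself, for example the maximum of a load-time column, and a failing result would be the signal that feeds alerting or incident handling.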

Data Quality Sample Rows

Open source OpenMetadata’s Data Quality feature includes a time series view of test case success and failure, helping users track failures and their timing. Each test also shows the number of failing rows, offering insights into issues. In Collate 1.4, we made debugging even more accessible by further displaying captured samples of failed rows directly in the UI, allowing users to identify and address problems quickly.

Connectors

Our connector ecosystem grew by 19 connectors in 2024, bringing our total count to 80+ connectors. Releases include:

  1. API Services: OpenAPI

  2. Dashboard Services: Sigma, Power BI Report Server, Qlik Cloud

  3. Database Services: Exasol, Teradata, SAP ERP, Iceberg, SAS Viya, Apache Doris

  4. Pipeline Services: Matillion, Azure Data Factory, Stitch, Apache Flink, Spark

  5. ML Services: Vertex AI

  6. Storage: GCS Storage Connector

  7. Messaging: Kafka

  8. Metadata: Alation

OpenMetadata Year in Review

This recap only touches on some of the Collate-specific commercial features from this year; you can find even more of the capabilities in Collate’s managed OpenMetadata service in the open source OpenMetadata 2024 year in review blog post. Select highlights include enhanced data governance through domain & RBAC controls, glossary improvements, and data insights; improved discovery capabilities with lineage maps, data explore, and API & metric entities; and expanded data observability with alerting, Incident Manager, and data profiling.

The Year Ahead

The year 2024 marked significant strides in making metadata management more intelligent, automated, and accessible for every data team. We've deepened our capabilities for data discovery, observability, and governance, and brought them into a single platform for data practitioners to work together with shared context and accelerated productivity.

As we look ahead to 2025, we're doubling down on our commitment to transform how data teams collaborate through AI. Building on the foundation of MetaPilot and expanding our multi-agent AI strategy, we'll continue to push the boundaries of what's possible in metadata management. No matter what data team you're on (data platform/engineering, data governance, data science & analysts, business users, or data & executive leaders), we want to transform your data culture and deliver even more new features that change the way you work with data.
