Collate Blog

OpenMetadata 1.0 Release

Improved Schemas & APIs, Ingestion Improvements, Storage Services, Dashboard Data Models, Auto PII Classification, Localization, and much more.

Written By: Teddy Crépineau, Shailesh Parmar, Nahuel, Chirag M, Suresh Srinivas, Mayur Singal, Pere Miquel Brull, Mohit Yadav, Shilpa Vernekar, Sriharsha Chintalapani, Karan Hotchandani, Sachin Chaurasiya and OpenMetadata Community.

The much-awaited OpenMetadata 1.0 Release is here. It has been a long, exhilarating journey of 1.5 years, with 40 releases and hundreds of features built with the help of 160 community developers and validated in hundreds of installations across the world. With release 1.0, the majority of our vision is delivered: a best-in-class metadata platform that supports Discovery, Lineage, Collaboration, Governance, Data Quality, and Data Insights in a single platform.

The focus of this release has been to provide a stable and sturdy product, along with many improvements around metadata ingestion. A lot of work has gone into stabilizing and improving the Schemas and APIs, and into ensuring that the APIs are backward compatible. OpenMetadata’s ingestion framework has always supported a wide range of service connectors, and we are introducing a new one: Apache Impala. Moreover, based on community feedback, the UI/UX has been improved for both ingestion and the glossary.

The current release covers essential automation to tag PII and sensitive data to help companies comply with GDPR. Finally, we now support the concept of Data Models for Dashboard services to define and manage data within the dashboard tool itself.

Community Update

A big THANK YOU to everyone in the OpenMetadata community for using the product, continuously giving us feedback, and helping us improve. We could not be happier with the growth and care of this community since we open-sourced OpenMetadata. None of this would have been possible without you.

  • Crossed 2200+ GitHub stars

  • The Slack community reached 2700+ members

  • 161 Open source GitHub developers

  • 866 Commits were merged into the 1.0 Release

OpenMetadata 1.0 Release Highlights

APIs & Schema

In the 1.0 Release, we’ve stabilized and improved the Schemas and APIs. OpenMetadata has always believed in the power of metadata APIs to transform the world of data. Well-designed APIs let existing tools reuse central, standards-based metadata instead of each maintaining its own duplicated, fragmented copy. Many applications and automations can be built on our APIs. With this release, developers have stable, backward-compatible APIs to innovate on.
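As a rough illustration of building on the REST APIs, here is a minimal sketch that prepares an authenticated request to list tables. It assumes a server at `http://localhost:8585`, the `/api/v1/tables` endpoint, a bot JWT token, and a response with a `data` list whose items carry `fullyQualifiedName`. The helper names are hypothetical, so check the API documentation of your deployment before relying on any of this.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_list_tables_request(base_url: str, jwt_token: str, limit: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the tables listing."""
    query = urlencode({"limit": limit})
    url = f"{base_url.rstrip('/')}/api/v1/tables?{query}"  # assumed endpoint path
    headers = {
        "Authorization": f"Bearer {jwt_token}",  # JWT token, e.g. from a bot
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def list_table_names(request: urllib.request.Request) -> list:
    """Send the request and return table names (assumed response shape)."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [table["fullyQualifiedName"] for table in payload.get("data", [])]


if __name__ == "__main__":
    # "<your-jwt-token>" is a placeholder, not a real credential.
    req = build_list_tables_request("http://localhost:8585", "<your-jwt-token>")
    print(req.full_url)
```

Keeping request construction separate from sending makes the sketch easy to inspect without a running server.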

Ingestion

Connecting to your data sources has never been easier. You can find all the necessary permissions and connection details directly in the UI. The UI/UX for creating new connections to your data sources has been improved, and detailed contextual documentation is available within the app while you integrate your systems.

When testing the connection, every internal step of the metadata extraction process is tested to let you know which of the required permissions are missing. We now have a comprehensive list of validations to let you know which pieces of metadata can be extracted with the provided configuration. There’s a clear status displayed if all or only partial metadata will be ingested.

We have improved the performance when extracting metadata from sources such as Snowflake, Redshift, Postgres, and dbt. We are fetching as much bulk information as possible. We are providing more levers so you can tune how the ingestion behaves, allowing you to enable or disable the ingestion of tags or owners for any service.

We have improved the parsing process and the overall performance of dbt workflows. A dedicated Apache Impala connector is now available, and we will remove the Impala schemes from the Hive connector in the next release. OpenMetadata now also supports a secure LDAP connection without the mTLS (verification of server) requirement.

Storage Services

Based on your feedback, we created a new service to extract metadata from your Cloud storage. The earlier Data Lake connector ingested one table per file, which covered only some of the use cases. Storage Services help us represent containers (e.g., Buckets) and their data structure, if any. The first implementation has been done on Amazon S3, wherein you can specify your tables and partitions and see them reflected with the rest of your metadata. This has been a major contribution from Cristian Calugaru, Principal Engineer at Forter. We will keep adding support for other storage sources in the upcoming releases.

Dashboard Data Models

Dashboard Services now support the concept of Data Models: data that can be directly defined and managed in the Dashboard tooling itself, e.g., LookML models in Looker. Data Models will help us close the gap between Engineering and Business by furnishing crucial metadata from sources typically used and managed by business users or analysts. The first implementation of data models has been done for Tableau and Looker.

Query as an Entity and UI Overhaul

The 1.0 Release has an improved UI for SQL Queries, with faster loading times and the ability for users to vote for popular queries. Users can now create and share a Query directly from the UI, linking it to multiple tables if needed. Users can also discuss and react to the other queries in each table.

Localization

With more and more users from across the globe using OpenMetadata, we’ve added Localization Support in the UI. Now you can use OpenMetadata in English, French, Chinese, Japanese, Portuguese, and Spanish. We would greatly appreciate community contributions to the above-mentioned languages.

OpenMetadata uses ElasticSearch or OpenSearch to index all the metadata and provide the search options. Earlier, we only provided options to index and analyze documents in the English language. In this release, we’ve worked on a pluggable analyzer. The mapping is specific to each language and allows users to configure based on their language of choice.
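To make the idea of a per-language mapping concrete, here is a minimal sketch that picks one of Elasticsearch's built-in language analyzers and builds an index mapping around it. The language codes, the field names, and the `build_index_mapping` helper are assumptions for illustration, and the actual mappings shipped with OpenMetadata may look quite different.

```python
# Built-in Elasticsearch analyzers keyed by (assumed) language codes.
# "cjk" is the built-in analyzer covering Chinese, Japanese, and Korean text.
ANALYZERS = {
    "en": "english",
    "fr": "french",
    "pt": "portuguese",
    "es": "spanish",
    "zh": "cjk",
    "ja": "cjk",
}


def build_index_mapping(language: str) -> dict:
    """Return an index-creation body whose text fields use a language analyzer."""
    # Fall back to the language-neutral "standard" analyzer for unknown codes.
    analyzer = ANALYZERS.get(language.lower(), "standard")
    return {
        "mappings": {
            "properties": {
                # Hypothetical field names.
                "name": {"type": "text", "analyzer": analyzer},
                "description": {"type": "text", "analyzer": analyzer},
            }
        }
    }


if __name__ == "__main__":
    print(build_index_mapping("fr"))
```

The resulting dictionary could be sent as the body of an index-creation request. Language-specific analyzers apply stemming and stop words suited to that language, which is why a single English-only mapping falls short for other languages.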

Glossary

The 1.0 release has a new and improved UI for Glossary. Global search has been enabled for Glossary terms and Tags, making it easier to discover data. Keeping in mind further changes and improvements that can take place in an organization, the Glossary Terms are no longer restricted to the original glossary they were created in. Users can now drag and drop Glossary Terms within and across Glossaries, as required. Instead of searching and tagging their assets individually, users can add Glossary Terms to multiple assets from the Glossary UI.

Auto PII Classification

The European Union’s General Data Protection Regulation (GDPR) requires companies to follow strict rules for data protection, especially around personal data. Keeping the GDPR requirements in mind, OpenMetadata has implemented an automatic way to tag PII and sensitive data. The auto-classification is an optional step of the Profiler workflow. It analyzes the column names and, if sample data is being ingested, also runs NLP models on that data.
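The two-stage approach, column names first and sample data second, can be sketched as below. This is only an illustration under stated assumptions: the name patterns and tag labels are hypothetical, and a simple email regex stands in for the NLP models, which are not reproduced here.

```python
import re
from typing import Iterable, Optional

# Hypothetical column-name rules: (pattern, tag label).
COLUMN_NAME_RULES = [
    (re.compile(r"(^|_)(ssn|passw(or)?d|credit_?card|e?mail)(_|$)", re.I), "PII.Sensitive"),
    (re.compile(r"(^|_)(first_?name|last_?name|phone|birth_?date)(_|$)", re.I), "PII.NonSensitive"),
]

# Stand-in for the NLP step: a value-level check for email-like strings.
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_column(name: str, sample_values: Iterable = ()) -> Optional[str]:
    """Return a tag label for the column, or None if nothing matches."""
    # Stage 1: look at the column name.
    for pattern, tag in COLUMN_NAME_RULES:
        if pattern.search(name):
            return tag
    # Stage 2: only if sample data is available, inspect the values.
    values = [v for v in sample_values if isinstance(v, str) and v]
    if values:
        hits = sum(1 for v in values if EMAIL_VALUE.match(v))
        if hits / len(values) >= 0.8:  # assumed threshold
            return "PII.Sensitive"
    return None


if __name__ == "__main__":
    print(classify_column("customer_ssn"))
    print(classify_column("contact", ["a@example.com", "b@example.org"]))
    print(classify_column("order_total", ["12.5"]))
```

Checking names first is cheap and needs no data access, while the value-level stage catches columns whose names give nothing away.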

Several improvements have been made to data discovery. We’ve improved relevancy by adding support for partial matches. Ranking has been improved so that the most used or higher Tier assets appear at the top of the search results. Support has been added for Classifications and Glossaries in the global search.
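One way to picture how partial matches, usage, and Tier could combine into a ranking is the toy scoring function below. It is a sketch only: the boost values, the field names, and the formula are assumptions, and the real ranking is performed by the search engine rather than by code like this.

```python
import math

# Hypothetical multipliers: higher-Tier assets get a larger boost.
TIER_BOOST = {"Tier1": 2.0, "Tier2": 1.5}


def text_score(query: str, name: str) -> float:
    """Score in (0, 1] for a substring (partial) match, 0.0 otherwise."""
    q, n = query.lower(), name.lower()
    if not q or q not in n:
        return 0.0
    return len(q) / len(n)


def rank(query: str, assets: list) -> list:
    """Return matching asset names, best first."""
    scored = []
    for asset in assets:
        base = text_score(query, asset["name"])
        if base == 0.0:
            continue
        usage = 1.0 + math.log1p(asset.get("usage", 0))  # dampen heavy usage counts
        tier = TIER_BOOST.get(asset.get("tier"), 1.0)
        scored.append((base * usage * tier, asset["name"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]


example_assets = [
    {"name": "orders", "tier": "Tier3", "usage": 2},
    {"name": "orders_daily", "tier": "Tier1", "usage": 500},
    {"name": "customers", "tier": "Tier1", "usage": 900},
]

if __name__ == "__main__":
    print(rank("order", example_assets))
```

Here a heavily used Tier1 table outranks a closer textual match that is rarely used, which is the behavior the ranking change aims for.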

Security

In this release, SAML support has been added.

Apart from that, we have an important Deprecation Notice: The SSO Service accounts for Bots will be deprecated. Going forward, JWT authentication will be the preferred method for creating Bots.

Lineage

The Lineage UI has been enhanced to display a large number of nodes: users can now view graphs with 1000+ nodes. Navigation in the UI has also been improved, and the SQL parser used in the Lineage Workflows now extracts lineage more accurately.

Chrome Browser Extension

All the metadata is at your fingertips while browsing Looker, Superset, etc., with the OpenMetadata Chrome Browser Extension. The Chrome extension supports Google SSO, Azure SSO, Okta, and AWS Cognito authentication. You can install the Chrome extension from the Chrome Web Store.

Other Changes

  • We now support long entity names (e.g., S3 paths).

  • Support has been added to import as well as export all the entities, along with the ability to propagate tags during the process.

  • The Explore page cards will now display a maximum of ten tags. Users will still be able to view the other tags if needed.

  • Entity names support apostrophes.

  • Improvements have been made to the Summary panel to be consistent across the UI.

Planned for 1.1 Release

For the latest updates on our next release, please refer to the Roadmap.

Entities

  • Support for NoSQL, MongoDB, ElasticSearch connectors

  • Support for registering Services/Applications along with OpenAPI schema

Landing Page

  • Customizable landing page for different user personas

Bulk Functionality

  • Upload descriptions and tags for all the entities via import/export functionality

Data Quality

  • Quality Page UI improvements

  • Test failure propagation and view quality in the lineage UI

Thanks to our Contributors

A huge shoutout to our contributors for their code contributions and super helpful feedback from the start of this open-source project. Your active participation has made a major difference to the progress of OpenMetadata. Thank you for all that you have done.

Thank you Cristian Calugaru for building the much-awaited Storage Service and S3 connector. Thank you Cristian Osiac, Keith Sirmons, Hemal Mamtora, Bohan Hou, Hawkins, Boluwatife Victor, Deepa Rao, Noe Alejandro Perez Dominguez, Georg Heiler, Krotonet, Ragul Balaji Ravichandran, Joe Shajan, Yu Ishikawa, JINZHANG2017, Stéphane Sol, VolkovGeoPhy, Austin Witt, Sasha Chung, Kcd83, Kevin R, Charlie Menke, John McCormick, Megumi AIKAWA, Jan-Pieter van den Heuvel, Anuj359, and Luis Felipe Almeida Nogueira.

Thanks to Allen Haozi, Cristian Osiac, cvlendistry, Emin Uzun, Eric Hausig, Francisco J. Jurado Moreno, Georg Heiler, Haithem Souala, Klaus Noerregaard, Laila Patel, Martin Trillhaas, Mukhesh Narra, Najd Alqahtani, Rogério Ferreira Dos Santos, and Siva Arumalla for raising GitHub issues that made it to the 1.0 release. Thank you abcabhishek, Chethan B S, and Vitor Fortunato for your feedback on the GitHub issues included in this release.

Please reach out to us on Slack if you have any questions about code, installation, and docs. For feature requests, please file a GitHub issue or reach out to us on Slack. Interested in contributing code? Here are some good starting issues to get you going.

Please give us a GitHub star if you like what we are doing. That will help OpenMetadata to reach a wider audience and to build a community to collaboratively solve data problems.


OpenMetadata 1.0 Release was originally published in OpenMetadata on Medium, where people are continuing the conversation by highlighting and responding to this story.