Introducing Collate MetaPilot: Generative AI for Your Metadata



The Collate team is excited to announce the launch of MetaPilot, a new generative AI application built on the Collate platform (powered by the open-source OpenMetadata project). MetaPilot accelerates key workflows for data teams by leveraging standardized metadata across your data estate: it automates data documentation for data engineers, generates SQL queries from natural language for non-technical users, and helps SQL users troubleshoot and tune query performance. MetaPilot helps boost data team productivity, reduce query processing costs, and democratize access to data insights, while providing administrative controls for data privacy and usage. Try it out today in the Live Sandbox.

Automation to address data team pain points

MetaPilot leverages large language models (LLMs) and generative AI to address specific challenges in metadata management and data team workflows.

  1. Tedious labeling of data: Creating and maintaining high-quality metadata, such as column descriptions, can easily become an endless task.

  2. Difficulty discovering data: Users lose time trying to find the right datasets for their queries and then must figure out the structure of those datasets before they can query them.

  3. Challenges writing and tuning SQL queries: Composing queries can be complex for non-technical users. Even SQL analysts can need help with complex queries, performance tuning, or troubleshooting.

Building on a Unified Metadata Graph

Collate MetaPilot is built upon OpenMetadata's standard metadata language and APIs, which define the structure of what it means to be a table, a schema, or any other data asset.

These metadata structures and interfaces, coupled with additional metadata insights around data observability, governance, and user-generated knowledge, create a Unified Metadata Graph that is crucial for understanding your data and is necessary for generative AI to provide meaningful results. The metadata that MetaPilot generates in turn enriches the Unified Metadata Graph, and this cycle enables MetaPilot to deliver the best possible generative AI experience for data teams.

Key features of Collate MetaPilot

Natural language SQL query generation

MetaPilot includes a chatbot interface that allows non-technical users to generate SQL queries by asking questions in natural language. This feature democratizes access to data insights and reduces the burden on technical teams.

MetaPilot chat
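
To make the idea concrete, here is a minimal sketch of how table metadata could be turned into context for a natural language question before it is sent to an LLM. This is not MetaPilot's implementation: the `Column` and `Table` classes, the `render_schema_context` and `build_prompt` functions, the prompt wording, and the sample table are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    # Hypothetical, simplified column record (not the OpenMetadata schema).
    name: str
    data_type: str
    description: str = ""


@dataclass
class Table:
    # Hypothetical, simplified table record (not the OpenMetadata schema).
    fully_qualified_name: str
    columns: list[Column] = field(default_factory=list)
    description: str = ""


def render_schema_context(tables: list[Table]) -> str:
    """Render table metadata as plain text suitable for inclusion in a prompt."""
    lines = []
    for table in tables:
        header = f"TABLE {table.fully_qualified_name}"
        if table.description:
            header += f" -- {table.description}"
        lines.append(header)
        for col in table.columns:
            entry = f"  {col.name} {col.data_type}"
            if col.description:
                entry += f" -- {col.description}"
            lines.append(entry)
    return "\n".join(lines)


def build_prompt(question: str, tables: list[Table]) -> str:
    """Combine the schema context and the user's question into one prompt string."""
    return (
        "Write a SQL query that answers the question using only these tables.\n\n"
        f"{render_schema_context(tables)}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    orders = Table(
        "shop.public.orders",
        [Column("id", "BIGINT", "Order ID"), Column("total", "DECIMAL")],
        "Customer orders",
    )
    print(build_prompt("What is the total revenue?", [orders]))
```

The point of the sketch is that documented, standardized metadata (names, types, descriptions) gives a language model the grounding it needs to refer to real tables and columns rather than guessing at them.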

Automatic bulk data documentation

You can schedule a process that lets MetaPilot automatically generate suggested descriptions for data schemas, eliminating the need for manual documentation. MetaPilot helps keep data well-organized and easy to understand, and spares data engineers the time they would otherwise spend on manual labeling.

Copilot query optimization

MetaPilot assists SQL users with query building, refinement, table joins, relationships, and performance optimization. It provides guidance and recommendations to generate insights faster and optimize query efficiency.

Tailored Results, Data Controls, Commitment to Privacy

MetaPilot leverages natural language processing and machine learning techniques to analyze your metadata, such as database schemas and join information, to provide high-quality results tailored to your data. Given the sensitivity of data usage for generative services, MetaPilot was built around administrative controls and core design principles that govern how metadata is used to power these services.

MetaPilot is an add-on service that customers can request to install within their Collate instance. It will not function if it is not enabled. Administrators have control over which databases MetaPilot can access.
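
As a rough illustration of this kind of control, the sketch below models an opt-in flag plus an administrator-managed allowlist of databases. It is an assumption about how such a gate could work, not Collate's actual configuration: the `AssistantSettings` class, its fields, and the `can_access` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantSettings:
    # Hypothetical settings object; the add-on is off unless explicitly enabled.
    enabled: bool = False
    # Databases an administrator has allowed the assistant to read metadata from.
    allowed_databases: frozenset[str] = frozenset()


def can_access(settings: AssistantSettings, database: str) -> bool:
    """Allow access only when the add-on is enabled and the database is allowlisted."""
    return settings.enabled and database in settings.allowed_databases


if __name__ == "__main__":
    settings = AssistantSettings(enabled=True, allowed_databases=frozenset({"sales"}))
    print(can_access(settings, "sales"))
    print(can_access(settings, "hr"))
```

Defaulting to disabled with an empty allowlist means nothing is accessible until an administrator makes an explicit choice, which mirrors the opt-in behavior described above.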

Collate maintains a strong commitment to customer data privacy. From the beginning, Collate has maintained a policy that no employee has access to customer data in any of our deployment models, and this also applies to MetaPilot.

Extending the AI capabilities of the Collate Platform

MetaPilot is just the latest example of how Collate is leveraging AI and metadata to empower data teams. Other AI-powered features already in the Collate platform include:

  • PII Classification: Automatic tagging of PII powered by natural language processing to improve compliance and reduce risk.

  • Incident Severity Classification: Inferring incident severity using data quality and lineage information for faster issue triage.

Become a MetaPilot Design Partner

MetaPilot is a Collate commercial enhancement available today for all Collate customers. Contact your account manager for more information. We are seeking a limited number of Collate customers to serve as design partners for MetaPilot. Design partners will have the opportunity to provide feedback and shape the future development of MetaPilot. Interested parties can contact

The Future of MetaPilot

We have an exciting roadmap of additional MetaPilot features planned. This is just the first iteration of bringing generative AI and large language models to the OpenMetadata platform, with more semantic search, copilot chatbot, and anomaly detection experiences planned for the future. Collate's investments in automating common data team workflows aim not only to improve data team productivity but also to advance a larger vision of transforming data culture at every company.

Check out our demo video to see it in action, and try MetaPilot out today in the live sandbox.