Announcing Collate 1.8

Enterprise‑grade context for AI, contracts for data you can trust, and new Microsoft ecosystem connectors
Over the past month, our team has been heads down on the capabilities that matter most to modern data teams: making metadata securely consumable by LLMs, formalizing expectations between data producers and consumers, and meeting enterprise provisioning requirements while continuing to expand our connector catalog.
Today, we’re thrilled to introduce Collate 1.8, the latest release of our managed OpenMetadata service, delivering:
Model Context Protocol (MCP) server: The open, standard way for any AI agent to query your unified metadata graph.
First‑class Data Contracts: Machine‑readable schemas, SLAs, and quality guarantees that can be enforced automatically.
SCIM user provisioning: Friction‑free and secure identity lifecycle management for enterprises.
New Microsoft connectors (SSIS and SSAS): End‑to‑end lineage from SQL Server ETL packages and Analysis Services semantic models.
Below, we unpack each highlight and explain why an AI-powered, unified metadata platform is the best place to run them at scale.
Model Context Protocol: Bringing trusted context to every AI workflow 🚀
Collate delivers an AI-powered platform for unified data discovery, observability, and governance. It provides a single source of truth by centralizing metadata for all data assets – from databases and dashboards to pipelines, machine learning models, and even API services – in one place. At its core, Collate is built on a schema-first, API-driven architecture: a collection of JSON Schemas defines a standard metadata model, which is automatically translated into APIs and client code in multiple languages. This open, extensible design enables robust metadata management across use cases, empowering organizations to manage data assets at scale with rich context and relationships.
Today, Collate is taking this foundation a step further by introducing support for the Model Context Protocol (MCP), becoming the first enterprise-grade MCP server for metadata. MCP is a new open standard designed to connect AI models (like LLM-based assistants) with the systems and data where organizational knowledge lives.
By integrating MCP, Collate brings your users enterprise-ready context for data models, making existing metadata readily accessible to all data stakeholders – from engineers and analysts to data stewards and governance leads – with the following benefits:
Ease of use: the MCP server is embedded directly in the Collate server, so there is no additional infrastructure to host and maintain, unlike other MCP implementations.
A richer graph spanning discovery, observability, and governance metadata, not just lineage and docs. This allows even non-technical users to discover data, perform root cause analysis (RCA), and even update assets and glossaries through a conversational chat.
Enterprise‑grade security with an authentication layer and fine‑grained RBAC. The same attribute‑based policies you already manage in Collate now govern LLMs, so they only see and act on permitted metadata.
Collate 1.8 ships a fully spec-compliant MCP server. Any LLM agent can now:
Search entities, glossary terms, and usage stats with a single endpoint
Traverse column‑level lineage to reason about dependencies and ease RCA
Create glossaries and glossary terms
Update metadata about existing assets
Because MCP sits on top of our Unified Metadata Graph, AI tools see the same truth your analysts and governance teams rely on: no extra backends or syncs required. That’s the power of a unified platform.
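To make the agent operations above more concrete, here is a minimal sketch of how a client might frame a tool invocation. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `search_metadata` and its arguments are assumptions for illustration and may not match the names the Collate MCP server actually exposes.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("search_metadata", {"query": "customer orders", "limit": 5})
print(json.dumps(request, indent=2))
```

In practice an MCP client library or the LLM platform builds these messages for you; the sketch only shows what travels over the wire.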
Collate MCP Architecture
We built the MCP server to run natively inside the Collate server itself. This architectural choice brings two major benefits:
- Simplicity: Many MCP implementations require every user to deploy and maintain their own MCP server locally to connect to tools like Claude or Cursor. While MCP and LLMs are meant to help non-technical users, this introduces a technical barrier to running and configuring MCP servers.
With Collate, users can connect LLM platforms to Collate without having to maintain any other piece of infrastructure, as the MCP server is directly embedded within the Collate server.
- Security: The LLM only has access to the metadata and actions that the user is permitted to use, thanks to the powerful RBAC controls in Collate.
When creating the connection to the MCP server, users pass their Personal Access Token (PAT), which ensures that each agent can only see and act on the metadata its user is permitted to access.
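As a rough sketch of the connection step, the snippet below assembles the settings an MCP client would need: the server URL and a header carrying the PAT. The `/mcp` path, the Bearer header scheme, the host name, and the `COLLATE_PAT` environment variable are all assumptions for illustration; check your deployment for the real values.

```python
import os


def mcp_connection(base_url: str, pat: str) -> dict:
    """Return connection settings for an MCP client talking to Collate."""
    if not pat:
        raise ValueError("a Personal Access Token is required")
    return {
        # Assumed endpoint path; the real one may differ.
        "url": base_url.rstrip("/") + "/mcp",
        # Assumed header scheme for passing the PAT.
        "headers": {"Authorization": f"Bearer {pat}"},
    }


config = mcp_connection(
    "https://collate.example.com/",                  # placeholder host
    os.environ.get("COLLATE_PAT", "example-token"),  # never hard-code real tokens
)
print(config["url"])
```

Because the token belongs to a specific user, every request the agent makes is evaluated against that user's permissions rather than a shared service account.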
Collate AI Vision
We created Collate with a clear goal: to become the source of truth for everything related to data. We’ve come from a siloed and fragmented data industry, where every question was being answered with a different tool: Where can I find the data? How can I trust the data? What is the context of this data to our business?
LLMs face the same fragmentation we do, having to jump between multiple MCP servers to provide a comprehensive answer about your data. Collate’s Unified Knowledge Graph is the cornerstone of reliable answers and insights for your data, providing richer data context and high-quality assets at scale to both users and AI agents.
Data Contracts: Codifying trust, not just schemas 📜
Data-driven teams often struggle with informal agreements around data quality, schema changes, and SLAs. Collate 1.8 introduces formalized Data Contracts to define and enforce clear, actionable expectations between data producers and consumers.
Integrated Enforcement: Contracts are natively integrated with Collate’s Test Suite and Incident Manager.
Unified Visibility: Seamless integration with lineage views helps teams understand downstream impacts clearly.
Open Standards: Built using extensible JSON Schemas, aligning with modern DevOps and CI/CD workflows.
Collate, as a Unified Metadata Platform, brings together all data personas—data engineers, data scientists, governance teams, business users, and legal teams—to collaboratively manage data. This makes Collate uniquely positioned to implement Data Contracts, which provide a clear, structured way for all stakeholders to establish agreements around data expectations.
By centralizing Data Contracts within Collate, organizations ensure data is consistently defined, reliably governed, and transparently communicated. Data Contracts serve as a foundational mechanism to achieve trust, alignment, and clarity across the entire data lifecycle, significantly reducing data downtime and enhancing overall data quality.
Issue #21078 set the bar for a JSON Schema‑based DataContract entity that any table, topic, dashboard, or Data Product can reference. In 1.8, we ship:
Specification 1.0 – covering schema, semantics, security, quality assertions, and SLAs
REST APIs – create, version, and attach contracts programmatically
Think of it as OpenAPI for data: producers publish a contract, consumers integrate with confidence, and the platform automates compliance.
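To illustrate the idea of a machine-checkable contract, here is a deliberately simplified sketch. The `DataContract` class, its field names, and the `violations` helper are hypothetical and are not the Collate specification; they only show how schema, quality, and SLA expectations can be expressed as data and checked automatically.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical, simplified contract; not the Collate specification."""
    entity: str
    columns: dict                      # column name -> expected Python type
    not_null: list = field(default_factory=list)
    freshness_hours: int = 24          # SLA: maximum allowed data age


def violations(contract: DataContract, row: dict, age_hours: float) -> list:
    """Return human-readable violations for one row and the data's age."""
    found = []
    for name, expected in contract.columns.items():
        if name not in row:
            found.append(f"missing column: {name}")
        elif row[name] is None:
            if name in contract.not_null:
                found.append(f"null in required column: {name}")
        elif not isinstance(row[name], expected):
            found.append(f"wrong type for {name}")
    if age_hours > contract.freshness_hours:
        found.append("freshness SLA breached")
    return found


contract = DataContract(
    entity="warehouse.sales.orders",
    columns={"order_id": int, "amount": float},
    not_null=["order_id"],
)
print(violations(contract, {"order_id": None, "amount": "12.5"}, age_hours=30))
```

A producer would publish something like `contract`, and a platform check along the lines of `violations` would run on each load, turning informal agreements into failures that someone owns.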
In our upcoming releases, we will be shipping:
UI integration – build contracts right within Collate using the UI or YAML
Import Data Contracts – existing contracts can be imported into Collate and enforced
Native enforcement – violations flow into our Test Suite and Incident Manager, so owners are paged immediately
Visual diff & lineage impact – see exactly which downstream assets are at risk when a contract breaks
SCIM Support 🔐
Enterprises asked for hands‑off user lifecycle management. With 1.8, Collate includes a SCIM 2.0 endpoint so your Identity Provider can provision, update, and de‑provision Collate accounts automatically. Combine SCIM with our existing SAML/OIDC SSO, and you get:
Zero‑touch onboarding as teams grow
Guaranteed revocation when employees leave
Group‑to‑role mapping that keeps access aligned with org structure
Self‑hosted users can enable SCIM via an add‑on module later this year.
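For readers new to SCIM, the sketch below shows the kind of request bodies an Identity Provider sends when provisioning and de-provisioning a user. The schema URNs and attribute names come from the SCIM 2.0 standard (RFC 7643 and RFC 7644); the helper function names and the sample user are illustrative, and the exact attributes Collate's endpoint accepts are not confirmed here.

```python
import json

# Schema URNs defined by the SCIM 2.0 RFCs (7643 and 7644).
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def create_user_body(user_name: str, given: str, family: str, email: str) -> dict:
    """Body an IdP would POST to a SCIM /Users endpoint to provision a user."""
    return {
        "schemas": [USER_SCHEMA],
        "userName": user_name,
        "name": {"givenName": given, "familyName": family},
        "emails": [{"value": email, "primary": True}],
        "active": True,
    }


def deactivate_user_body() -> dict:
    """Body an IdP would PATCH to /Users/{id} to de-provision a user."""
    return {
        "schemas": [PATCH_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }


# Sample user for illustration only.
print(json.dumps(create_user_body("jdoe", "Jane", "Doe", "jdoe@example.com"), indent=2))
print(json.dumps(deactivate_user_body(), indent=2))
```

Your Identity Provider generates these requests itself once SCIM is configured; nothing here needs to be written by hand.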
New Connectors: SQL Server Integration & Analysis Services 🛠️
Collate 1.8 adds two highly‑requested connectors:
SSIS: Already available in Collate, it parses XML to extract task dependencies and build pipeline-to-table lineage.
SSAS: This connector will be released in Collate 1.8.1 and will introspect models, dimensions, and measures so BI users can search and tag cubes.
Both support authentication via Azure AD and feed directly into the lineage graph, joining our 90+ turnkey connectors.
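To give a feel for what extracting task dependencies from package XML involves, here is a minimal sketch over a heavily simplified package fragment. It is not the Collate connector's implementation: real `.dtsx` files carry far more detail, the sample package is invented, and the sketch covers only task-to-task ordering, not pipeline-to-table lineage.

```python
import xml.etree.ElementTree as ET

DTS = "www.microsoft.com/SqlServer/Dts"  # XML namespace used by .dtsx packages

# Invented, heavily simplified package for illustration.
PACKAGE = r"""
<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:ObjectName="Package">
  <DTS:Executables>
    <DTS:Executable DTS:ObjectName="Extract" DTS:refId="Package\Extract"/>
    <DTS:Executable DTS:ObjectName="Load" DTS:refId="Package\Load"/>
  </DTS:Executables>
  <DTS:PrecedenceConstraints>
    <DTS:PrecedenceConstraint DTS:From="Package\Extract" DTS:To="Package\Load"/>
  </DTS:PrecedenceConstraints>
</DTS:Executable>
""".strip()


def task_dependencies(dtsx: str) -> list:
    """Return (upstream, downstream) task pairs from precedence constraints."""
    root = ET.fromstring(dtsx)
    pairs = []
    for constraint in root.iter(f"{{{DTS}}}PrecedenceConstraint"):
        # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
        pairs.append((constraint.get(f"{{{DTS}}}From"), constraint.get(f"{{{DTS}}}To")))
    return pairs


print(task_dependencies(PACKAGE))
```

Each pair becomes an edge between two tasks; a real connector would additionally inspect each task's sources and destinations to link the pipeline to tables.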
Improvements:
Large column pagination: Improved performance in loading large columns through pagination
Improved PII detection: Better data filtering to reduce false positives
Glossary improvements: Schema is now more client-friendly for Bulk Asset APIs
Classification ownership: Added support for classification-level ownership assignment
Asset Certification: Users can now add certifications (e.g., Gold, Silver, Bronze) to assets
Lineage Performance
Tableau & PowerBI Refactor: Significant improvements to metadata ingestion performance
Platform Updates:
Java 21 – In 1.8.0, we upgraded to Java 21 LTS to maintain security support and to use the latest dependencies
Dropwizard 4.x and Jetty 11 – Dropwizard provides the framework for our APIs, and Jetty is our server framework. Both received a major upgrade in this release, improving the stability of our platform