22 07
2025
In data governance, one of the key elements is being able to track data as it moves across systems and gets transformed, so that everyone can be certain they’re using the right data. Here’s how technical lineage works in Snowflake and how to use it with your Collibra data catalog.
Let’s start with the basics first and clear up some definitions around Snowflake and technical lineage, as lineage tends to sometimes get confused.
(We talk about types of lineage in detail in our article “Data catalog vs. data lineage.”)
The terms data lineage and technical lineage are closely related, but they refer to slightly different scopes of metadata tracing within a data ecosystem.
Here’s the difference in a nutshell:
Data lineage
Data lineage refers to the life cycle of data – how it moves through the system from origin to destination, including transformations, processes, and business logic applied along the way.
The focus of data lineage includes:
Technical lineage is a subset of data lineage and dives into the underlying technical components. In other words, it shows how data flows across tables, columns, files, pipelines, scripts, and job executions at a more granular and system-specific level.
It focuses on:
What’s ETL in lineage?
The ETL acronym stands for Extract, Transform, Load. It’s a process used to integrate and move data from multiple sources into a single destination, convert it into a usable format, and store it in a central repository like a data warehouse.
Here’s a very simple breakdown of each stage:
Now, with definitions out of the way, let’s take a look at Snowflake and technical lineage.
Or, let’s clarify one more definition first.
Snowflake is a cloud-based data platform providing a data warehouse as a service (DWaaS). It allows businesses to store, manage, analyze, and share data, offering a scalable and efficient solution for different data workloads.
It’s known for its unique architecture that separates storage and compute, enabling independent scaling and high concurrency. And it uses standard SQL for data querying and manipulation, making it accessible to a wide range of users and tools.
Snowflake also offers lineage, allowing you to track data flow and transformations within your Snowflake environment and providing insights into how data is created, modified, and consumed. It helps with data governance, impact analysis, debugging, and documentation by visualizing the relationships between Snowflake objects.
Lineage is obviously crucial for data analytics and reporting, as it supports accurate insights and decision-making. With a clear view of data sources and transformations, analysts can trust the data they work with.
Collibra provides complete visibility and transparency of all the data stored in Snowflake Data Cloud. Technical lineage, including column-level lineage and transformations, gives you a complete picture of how data transforms, flows, and is used from system to system.
End-to-end lineage is, more often than not, a very complex process that involves connecting to BI and ETL tools, as well as SQL data sources. Collibra offers automatic stitching capabilities for most of its technical lineage sources, creating a connection between the data objects in your technical lineage and the relevant assets in the Collibra Data Catalog.
To see how Collibra and Snowflake both work in action, take a look at this presentation from The Data Intelligence Conference.
In one of our latest projects for a leading company in the pharmaceutical industry, we worked on applying Snowflake’s custom technical lineage in Collibra.
We needed to obtain information about data objects, attributes, and ETL processes from Snowflake in Collibra data catalog. We also wanted to establish and streamline the flow of information between the data objects.
To translate it into business goals, this is really important for:
In general, there are several data ingestion methods when it comes to technical lineage:
Snowflake lets you use a mix of ingestion method, and what we ended up doing was the SQL-API ingestion method combined with the shared storage method.
But it wasn’t smooth sailing at first.
We ran into issues with data parsing in Collibra when using the SQL-API method first. We reached out to Collibra, and they responded quickly to make the necessary updates in the system.
Collibra also shared Python scripts on their marketplace for the shared storage method. You collect data object information in a CSV file, and then the script parses it and adds it to the Edge connector.
What then happens in Collibra is stitching, connecting the data objects in your technical lineage to the relevant assets in the Collibra Data Catalog. As data attributes are mapped in the catalog, you gain insight into how data flows and a complete understanding of the data landscape and all essential metadata.
Using a combination of the two methods – SQL-API and shared storage – we managed to deal with 65k queries for the client in just 3 hours.
The Snowflake custom technical lineage in Collibra turned out to be a way to optimize business processes, with no need to ever log in to Snowflake, and all metadata flows easily visible in the data catalog, creating a logical schema.
And while Collibra experts are currently working on making the ingestion methods more robust and easier to apply, we’ve managed to devise a way to apply Snowflake’s technical lineage in Collibra more efficiently at scale.
Which, by the way, is a huge advantage of working with external Collibra experts, who know what to ask for and where to improve. We’ve also submitted several ideas on Collibra’s Ideation Platform that are meant to make this process less complex in the near future.
As data sources and business processes change, it’s key to regularly update your technical lineage information in your data catalog. This is exactly why revisiting the process is so important as you introduce new data sources and modify existing ones.
It’s how you can make sure that stakeholders have access to reliable and timely data lineage information, understand the impact of data changes across the organization, and can make informed decisions based on that data.
If you need help connecting your Snowflake lineage to your Collibra data catalog, reach out to our experts – we’ll find the best way to set up the process for your organization’s needs.
© 2025 Murdio - All Rights Reserved - made by Netwired