It’s never too late to build a data catalog. If you’re planning data catalog implementations for your enterprise, here are some key steps to follow to make the process efficient and make sure the catalog does its job, helping everyone in your company use the data to its full potential.
Should you build a data catalog in-house or hire external experts?
Let’s start with a question many companies ask before they get to building a data catalog. Obviously, we’re writing this from the perspective of specialized Collibra consultants, but this doesn’t mean we’re going to try to convince you to always hire an external team to create a data catalog.
What this means, though, is that we know the reality of implementing Collibra from dozens of international projects. We’ve worked with companies that outsourced data catalog creation and with companies that built their data catalogs internally. The latter usually ended up hiring external consultants anyway, either to expand and customize the catalog or to make it more usable, once they realized they didn’t have enough specialized Collibra expertise in-house.
And we’ll say this:
- Build a data catalog in-house if you have the right experts on board. This means people who know both the ins and outs of Collibra (or other data governance software you select) and where the data you want to catalog comes from, plus data experts who can contribute their data governance and management expertise.
- If you don’t have enough specialized team members, you can do several things:
– Hire individual experts proficient in the tool you’re using to build the data catalog to complement your team,
– Bring an external technical team on board to build the data catalog for you,
– Or outsource customized development to an experienced and knowledgeable team of experts.
Data intelligence platforms like Collibra are pretty complex pieces of software. And on top of that, they need to seamlessly integrate with your existing data landscape, plus be ready for further customizations and integrations. So, even though building a data catalog sounds pretty simple at first glance, a lot of behind-the-scenes processes go into it that you need to think of while building it.
This is not to discourage you, by any means, but to make sure your data catalog fits the needs of your business and serves as a reliable base for other data management processes and workflows, such as data lineage.
We’ve already described some data catalog best practices and data catalog benefits in other articles. In this one, let’s focus on the steps you need to take to build your data catalog.
Key steps to build a data catalog
A successful data catalog implementation calls for a structured plan that aligns business needs with its technical execution. Here are some things to consider.
Step 1: Prepare a data catalog implementation plan
- Clearly define what the data catalog should accomplish for your business. For instance, an ecommerce company could aim to improve product data consistency across global marketplaces, while a banking institution would need to consider compliance with very specific privacy regulations involving sensitive data. To put this into specifics, let’s take a look at the story of one of our clients. For a Swiss private bank, we were tasked with cataloging and classifying sensitive critical data elements (SCDEs) as required by the FINMA Circular 2023/01 regulation. What you’ll also see when you read the case study is that the bank already had an instance of Collibra deployed but left inactive, as the company had neither established processes around it nor experts to implement them using the platform. Data management was decentralized, and there was no structured way to govern, track, and manage sensitive data.
- Engage stakeholders – data owners, analysts, IT, and compliance teams – early in the process. Depending on your industry, you might need to adhere to specific guidelines and pursue business goals that you’ll need to define. Also, assign ownership to all data assets to make sure all the right people are involved and will later be included and alerted in the various automated workflows implemented in the data catalog.
- Identify how the catalog will connect with existing data sources, governance tools, and reporting platforms. The catalog might need seamless integration with data platforms, analytics dashboards, ETL tools, business intelligence tools, ERP and CRM systems, and more. In the project for the Swiss bank we mentioned above, we had to integrate over 100 applications that contained these SCDEs, so that no instance was missed. Depending on your industry, scale, and data setup, you might have to include multiple sources of data to feed the catalog.
- Plan implementation in stages, starting with a pilot phase (or proof of concept) to refine the process before scaling organization-wide. If you’re a multinational corporation, you could, for example, start with a single department, such as finance, before expanding to HR and marketing. Also, focus on data that’s most in demand first – you don’t need to include all the data at once from the very beginning of your data catalog implementation. This will make the implementation process significantly easier.
- Develop training programs and user adoption strategies to make sure everyone’s on board with the data catalog and knows how (and why) to use it. Documentation, tutorials, and workshops will all come in handy, covering topics such as metadata management and data stewardship. The data catalog is meant to serve everyone in your organization, including people who are not data or tech experts (which reduces the strain on data and tech teams to extract the data for other teams). Training them to use the new software will be key to achieving those goals.
- Establish business context for all the data, with data glossaries created in your data catalog. This is another way to foster adoption and make sure non-technical users can find what they are looking for. According to Bozhena Baranovskaya, one of our top Collibra experts, it’s key to add business definitions to physical datasets, making adoption and searching for data easy even when there are no data products yet to warrant complete data self-service.
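The advice above to start with one department and the most in-demand data can be sketched as a simple ranking. This is only an illustration under stated assumptions: the `Dataset` fields, the `monthly_requests` demand signal, and the sample inventory are hypothetical and are not taken from Collibra or from the client project.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    department: str
    monthly_requests: int  # hypothetical demand signal, e.g. number of access requests


def pick_pilot_scope(datasets, department, limit):
    """Return the most-requested datasets of one department, highest demand first."""
    in_scope = [d for d in datasets if d.department == department]
    return sorted(in_scope, key=lambda d: d.monthly_requests, reverse=True)[:limit]


# Made-up inventory used purely for illustration.
inventory = [
    Dataset("general_ledger", "finance", 120),
    Dataset("payroll", "hr", 45),
    Dataset("invoices", "finance", 310),
    Dataset("budget_forecast", "finance", 80),
]

pilot = pick_pilot_scope(inventory, department="finance", limit=2)
print([d.name for d in pilot])
```

In practice, the demand signal could come from query logs, access requests, or stakeholder interviews; the point is only that the pilot scope is chosen deliberately rather than by cataloging everything at once.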
Step 2: Choose a data catalog tool
All the points in the previous section will already help you define software requirements for your data catalog. Other things you should consider include:
- Scalability. Can the tool handle large, diverse datasets? If you’re a telecom company managing billions of call records, will the tool be able to handle that? Yes, we’ve just advised you to start small, but you’ll eventually want to expand your data catalog and its capabilities – and you should have a tool that will make that possible and relatively easy.
- Integration capabilities. Does it connect seamlessly with your existing data platforms, BI tools, and governance frameworks? If out-of-the-box integrations are not available, how easy is it to build customized integrations with your data ecosystem? For example, if you provide financial services, is deep integration with risk management systems possible?
- Automation features. Does the data catalog tool offer AI-powered metadata discovery and automated lineage tracking? Basic data catalogs are usually not enough for robust data management, so you probably need to look beyond the basic functionality.
- User experience and business context. Is the interface intuitive for both technical and business users? Plus, can you easily create a business context so that non-technical users can find and access what they’re looking for?
- Security and compliance. Does the platform support role-based access controls and regulatory compliance requirements? Again, depending on the industry, your company likely has to comply with multiple regulations, so check whether that’s possible with your data catalog tool.
- Ease of implementation. Let’s face it, building a data catalog will always pose its challenges. But when you’re selecting your tool, consider the skillsets and expertise you currently have in-house and possible skills you need to get elsewhere, from external service providers like Murdio.
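One way to compare candidate tools against the criteria above is a weighted scoring matrix. The sketch below is a minimal example under stated assumptions: the weights, the 1 to 5 rating scale, and the two anonymous tools are hypothetical placeholders that you would replace with your own priorities and evaluation results.

```python
# Hypothetical weights for the six criteria listed above; they sum to 1.0.
WEIGHTS = {
    "scalability": 0.20,
    "integration": 0.25,
    "automation": 0.15,
    "user_experience": 0.15,
    "security_compliance": 0.15,
    "ease_of_implementation": 0.10,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Combine per-criterion ratings (assumed scale: 1 to 5) into one weighted score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"Missing ratings for: {sorted(missing)}")
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


# Made-up ratings for two unnamed candidate tools.
tool_a = {name: 4 for name in WEIGHTS}
tool_b = {
    "scalability": 5,
    "integration": 3,
    "automation": 4,
    "user_experience": 4,
    "security_compliance": 5,
    "ease_of_implementation": 2,
}

for label, ratings in (("tool_a", tool_a), ("tool_b", tool_b)):
    print(label, round(weighted_score(ratings), 2))
```

The numbers matter less than the discussion they force: agreeing on the weights with stakeholders makes it explicit whether, say, integration depth outweighs ease of implementation for your company.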
If you want our advice on a specific tool, we’ll most likely say Collibra. That’s because Collibra stands out as a leading enterprise data catalog solution, letting you:
- Create a unified view of your data assets
- Automate time-consuming data stewardship tasks (including classification and generating descriptions using AI)
- Enrich data with helpful business context
- Collaboratively build reusable data products
- Simplify access to curated, trusted data
As certified Collibra consultants, we can customize your data catalog for a specific business case and extend it with the features your company needs for its data management.
Step 3: Find a data catalog implementation team
The better you’re prepared with strategic considerations, business goals, and data governance processes, the easier the implementation should be – at least theoretically.
But implementation is often when things don’t go as planned. That’s why this step is really important. It’s not so much about in-house vs. external Collibra consultants as it is about the expertise and experience of whoever takes care of your data catalog implementation.
In many cases, it will be your in-house team working with external consultants to form the perfect implementation team, which should include:
- Data governance leads, who define policies, standards, and compliance requirements. This might be a Chief Data Officer who oversees all of your governance efforts.
- Data stewards, who maintain data quality, curate metadata, and ensure proper classification. You might assign product managers as data stewards for SKU-level metadata.
- IT and data architects, who design integrations and oversee technical implementation, with the right technical expertise, for example, in multi-cloud environments.
- Business analysts, who keep track of the alignment with business use cases and user needs. For example, you could involve analysts to enhance audience segmentation through better metadata tagging.
- Collibra experts (or experts in another tool you’re using), who can provide best practices, tool configuration, and implementation support, and who can save you hours or even days of troubleshooting because they know what to look for and where.
For specific examples of why the right expertise is absolutely crucial, take a look at this case study of a leading DACH retailer that we partnered with in the past. Our client had already worked with an external Collibra team, but that team lacked expertise in advanced implementations and didn’t adhere to industry best practices, leaving the company unable to fully proceed with its data management initiatives and goals.
Of course, this project involved things that were much more advanced than the initial setup of a data catalog. But, as we already said before (and will probably repeat many times more), the data catalog is always your foundation for more complex data management workflows.
On its own, it’s not nearly enough for most (if not all) enterprises and their data governance initiatives and goals. But setting it up properly opens up opportunities for the more advanced features and workflows that will meet those goals.
Plus, whenever you hire a Murdio technical implementation team – like in the example above – you can adjust your team composition depending on your roadmap and specific project needs. You can then select team members with the exact skill sets required at any given time, which helps you use resources efficiently and get targeted expertise.
Creating a data catalog is the first step towards better data management
But it’s just the beginning.
That’s also why building a data catalog is a strategic initiative, and it requires careful planning, the right technology, and strong governance processes in place.
When you follow data catalog best practices and leverage tools like Collibra, your data catalog will enhance data visibility, compliance, and decision-making across the company. Whether you’re a financial institution seeking compliance, a healthcare provider improving data accessibility, or a tech company scaling analytics, it all starts with a data catalog.
And if you’re looking for top experts to support you in implementing a data catalog and enhancing it with more features, including data lineage and software integrations, we’d love to hear from you.
Frequently asked questions
How do you build a data catalog?
To build a data catalog, start by:
- defining business and data governance objectives
- assessing data assets and providing context for them
- setting data governance standards
- selecting a suitable tool like Collibra
- engaging stakeholders
- gathering an internal team and/or hiring external experts to help.
Start with a proof of concept and focus on critical, in-demand data first. Implement your data catalog in phases across the company and continuously optimize it.
What should be in a data catalog?
A data catalog should include:
- metadata
- data lineage
- classification structures
- data ownership information
- access controls
- business glossary terms to ensure clarity and usability
For instance, if you provide financial services, your data catalog may document transaction metadata, customer segmentation rules, and regulatory compliance classifications.
A data catalog should also include up-to-date, well-documented definitions of data assets to prevent ambiguity and improve cross-departmental collaboration.
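To make the list above concrete, here is a minimal sketch of what a single catalog entry could look like as a data structure, with a helper that flags the parts still missing. The `CatalogEntry` class, its field names, and the sample values are hypothetical and do not reflect Collibra’s actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    metadata: dict = field(default_factory=dict)          # e.g. schema, format, refresh rate
    upstream_sources: list = field(default_factory=list)  # simplified stand-in for lineage
    classification: str = ""                              # e.g. a sensitivity level
    owner: str = ""                                       # accountable data owner
    allowed_roles: list = field(default_factory=list)     # simplified access controls
    glossary_terms: list = field(default_factory=list)    # linked business glossary terms
    definition: str = ""                                  # business definition of the asset


def find_gaps(entry):
    """Return labels of governance fields that are still empty for this entry."""
    checks = {
        "classification": entry.classification,
        "owner": entry.owner,
        "access controls": entry.allowed_roles,
        "glossary terms": entry.glossary_terms,
        "definition": entry.definition,
    }
    return [label for label, value in checks.items() if not value]


# Made-up example entry for a financial services dataset.
transactions = CatalogEntry(
    name="card_transactions",
    metadata={"format": "table", "refresh": "daily"},
    upstream_sources=["payments_gateway"],
    classification="confidential",
    owner="head_of_payments",
)

print(find_gaps(transactions))
```

A check like this can run before an asset is published, so entries without an owner or a business definition are caught early instead of confusing users later.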
Why do you need a data catalog?
A data catalog is essential if you want to enhance data discovery, governance, and compliance while improving collaboration between business and technical teams.
It enables companies to manage data as a strategic asset and make the most of it, rather than sitting on tons of data that can’t be efficiently used.
A well-implemented data catalog also reduces data silos, minimizes redundancy, and lets business users quickly access trusted, high-quality data for decision-making.
With a data catalog in Collibra, you gain:
- A unified view of all your data assets across your entire ecosystem, connecting all data sources and increasing visibility and transparency.
- A productivity boost with automated curation and data stewardship tasks, including automatic classification and AI-driven asset-description generation.
- Valuable business context for your data, with glossary terms, policies, data contracts, quality, metrics, stakeholder ownership, and more, fostering enhanced data understanding.
- The ability to collaboratively build and manage data products with a flexible operating model and workflows.
- Simplified, self-serve access to curated, trusted data using the Collibra Data Marketplace.