Imagine this: a promising e-commerce company launches its biggest marketing campaign of the year. The team has spent weeks crafting the perfect message, designing ads, and segmenting their audience for a personalized push. The campaign goes live. The initial numbers look fantastic – but a week later, the disaster unfolds.
Thousands of high-value promotional packages are returned, marked “undeliverable.” A crucial product line is almost sold out in one region while warehouses overflow in another. The personalization engine, a key investment, sent recommendations for hiking gear to customers living in downtown Manhattan.
The culprit wasn’t a failed marketing strategy. It was a silent killer: poor data quality. A corrupted customer data import, inconsistent product codes between the inventory and sales systems, and incomplete customer profiles were all it took to turn a multi-million dollar campaign into a logistical nightmare and a customer service crisis.
This scenario isn’t just a scary story; it’s a daily reality for businesses worldwide. At its core, data quality is a simple measure of how fit your data is to serve its specific purpose.
It’s not an abstract technical concept – it’s the measure of your data’s health and its ability to fuel correct, profitable, and intelligent business outcomes.
Every decision you make, from strategic financial planning to a simple customer service email, is built upon a foundation of data. If that foundation is cracked, everything built upon it is at risk.
Ignoring data quality isn’t just risky; it’s incredibly expensive. The financial drain of bad data manifests in obvious and hidden ways – wasted resources, missed opportunities, and operational friction. But the true cost is staggering.
According to research from Gartner, poor data quality costs organizations an average of $12.9 million every year. Another study highlighted in the MIT Sloan Management Review estimates the cost can be as high as 15% to 25% of a company’s revenue.
Now that we understand the immense cost of bad data, how do we begin to diagnose and fix it?
The first step is to learn how to inspect it. Think of a master chef preparing for a dinner service. They don’t just grab ingredients from the pantry; they meticulously inspect each one. Is the fish fresh? Are the vegetables crisp? Are the spices correctly labeled?
In the world of data, this inspection process is guided by what are known as the 6 data quality dimensions.
These are the foundational lenses through which you can evaluate the health of your data. Mastering them is the first step toward building trust in your information and the decisions you make with it.
Accuracy: achieving high data accuracy is a primary goal because business decisions rely on facts, not just well-formatted information.
Completeness: missing fields are a failure of data completeness, limiting the data’s usefulness.
Consistency: the goal is to achieve and maintain data consistency across the enterprise, creating a single, reliable view.
Timeliness: data must be available when the business needs it; stale data undermines time-sensitive decisions.
Uniqueness: the principle of data uniqueness is focused on eliminating harmful data duplication from your systems.
Validity: this structural check ensures that data values conform to the required syntax and format.
To make these concepts easier to remember, here is a simple table that summarizes the core question each dimension asks.
| Data Quality Dimension | Core Question It Asks | Common Failure Example |
| --- | --- | --- |
| Accuracy | Is the data true? | A customer’s address is listed as their old home. |
| Completeness | Is all the necessary data present? | A phone number field, critical for sales, is empty. |
| Consistency | Does the data match across systems? | A customer is “Active” in the CRM but “Inactive” in the support tool. |
| Timeliness | Is the data available when needed? | Website traffic data from yesterday isn’t available until next week. |
| Uniqueness | Is this the only record for this entity? | Three records exist for the same customer under slight name variations. |
| Validity | Is the data in the correct format? | An email address field contains “no-reply” without an “@” symbol. |
Mastering the six core dimensions of data quality will put you far ahead of the competition. It provides a robust foundation for building trust and reliability in your data assets.
However, in today’s complex data landscape – with information flowing from cloud applications, IoT devices, and third-party systems – a truly mature data strategy requires a slightly wider lens.
While the original 6 data quality dimensions focus on the intrinsic characteristics of the data itself, modern data governance also demands that we consider how data is handled and who can use it.
To address this, we introduce two more dimensions that are critical for turning data into a true enterprise-wide asset.
Embracing all eight of these dimensions gives you a complete, 360-degree view of your data’s health. It allows you to move beyond simply cleaning data to building a robust governance framework that ensures your data is not only correct but also secure, trustworthy, and actively fueling business growth.
Understanding the dimensions of data quality is the first step. But to make a real business impact, you must move from abstract concepts to concrete numbers.
The old management adage, “You can’t manage what you can’t measure,” is the fundamental principle of any successful data quality initiative.
This section provides a practical guide to stop guessing about your data’s health and start scoring it.
We’ll give you the metrics to build a data quality scorecard and a step-by-step framework to put it into action.
A Data Quality Scorecard is a report that provides objective, numeric scores for your data. By assigning Key Performance Indicators (KPIs) to each dimension, you can establish a baseline, track improvements over time, and clearly communicate the state of your data to business stakeholders. Here’s how to calculate the core metrics:
Completeness: this KPI measures the percentage of data that is present against the potential of being 100% complete. You calculate it by dividing the number of fields that contain data by the total number of fields you are evaluating.
Validity: this metric checks how much of your data conforms to your predefined business rules and formatting. It’s calculated by comparing the number of records that pass your validation rules against the total number of records.
Uniqueness: this KPI identifies the level of duplication within a dataset. You find it by dividing the number of truly unique records by the total number of records.
Accuracy: this is the measure of truthfulness and is often the hardest to quantify automatically. The KPI is the percentage of records that are verified to be correct.
Timeliness: this metric doesn’t use a percentage; instead, it measures the gap or delay in your data’s availability – the difference between when data is needed for a business process and when it actually becomes available.
Consistency: this KPI measures the alignment of data about the same entity across different systems. You calculate it by counting the records where a critical attribute matches across all systems and dividing by the total number of records.
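To make these formulas concrete, here is a minimal sketch of how the percentage-based KPIs could be computed with pandas. The tiny in-memory CRM and billing extracts and their column names (customer_id, email, status) are illustrative assumptions rather than a prescribed schema, and accuracy is left out because it normally requires verification against a trusted external source rather than a formula.

```python
import pandas as pd

# Illustrative CRM extract; columns and values are assumptions for this sketch.
crm = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "email": ["a@example.com", None, None, "not-an-email", "d@example.com"],
    "status": ["Active", "Active", "Active", "Inactive", "Active"],
})

# Completeness: populated values / total values evaluated (here: the email field).
completeness = crm["email"].notna().mean() * 100

# Validity: records passing a formatting rule / total records.
valid_email = crm["email"].fillna("").str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
validity = valid_email.mean() * 100

# Uniqueness: distinct customer_id values / total records.
uniqueness = crm["customer_id"].nunique() / len(crm) * 100

# Consistency: same status for the same customer in a second (assumed) billing extract.
billing = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "status": ["Active", "Inactive", "Inactive", "Active"],
})
merged = crm.drop_duplicates("customer_id").merge(
    billing, on="customer_id", suffixes=("_crm", "_billing")
)
consistency = (merged["status_crm"] == merged["status_billing"]).mean() * 100

# Timeliness: the lag between when data was needed and when it became available.
needed_at = pd.Timestamp("2025-08-01 08:00")
available_at = pd.Timestamp("2025-08-01 14:30")
timeliness_lag = available_at - needed_at

print(f"Completeness {completeness:.0f}% | Validity {validity:.0f}% | "
      f"Uniqueness {uniqueness:.0f}% | Consistency {consistency:.0f}% | "
      f"Timeliness lag {timeliness_lag}")
```

Scores like these can then be recorded per dataset and per dimension on the scorecard and tracked over time.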
With these KPIs, you can now implement a simple, repeatable process to begin your data quality journey.
Step 1: Identify your critical data elements (CDEs). You can’t measure everything at once. Start with the data that has the biggest impact on your business. Is it customer contact information? Product pricing data? Financial transaction records? Choose one or two high-value datasets – your CDEs – to begin.
Step 2: Define your data quality rules. For your chosen CDEs, define what “good” looks like. Which fields are mandatory (Completeness)? What format must they be in (Validity)? What are the acceptable values for a field like “Order Status”? Document these rules clearly – one lightweight way to do this is sketched after this list.
Step 3: Calculate your baseline scores. Apply the formulas from the scorecard to your CDEs. This gives you your initial baseline data quality score – your starting point.
Step 4: Investigate the root causes. A score of “75% completeness” is a symptom. The real value comes from investigating the disease. Why are the records incomplete? Is a web form confusing? Is there a bug in an API integration? Is the sales team not trained on why that field is important?
Step 5: Remediate and repeat. Implement solutions to fix the root causes you identified. This could be a technical fix (updating a form) or a process change (new training). Then run your data quality assessment again and watch your scores improve. This creates a continuous cycle of improvement.
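As mentioned in step 2, one lightweight way to document your rules is as a small, version-controlled structure that your checks (or a data quality tool) can read. Everything below – the dataset name, fields, patterns, and target scores – is a hypothetical example of such a rule set, not a required convention.

```python
# Hypothetical rule definitions for one critical data element (CDE).
# Keeping standards in code (or YAML) makes them explicit, reviewable, and reusable.
CUSTOMER_CONTACT_RULES = {
    "dataset": "customer_contact",
    "mandatory_fields": ["customer_id", "email", "country"],     # Completeness
    "formats": {                                                  # Validity (syntax)
        "email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
        "phone": r"^\+?[0-9 \-]{7,15}$",
    },
    "allowed_values": {                                           # Validity (domain)
        "order_status": ["New", "Paid", "Shipped", "Returned"],
    },
    "unique_key": ["customer_id"],                                # Uniqueness
    "target_scores": {"completeness": 95, "validity": 98},        # Thresholds to aim for
}
```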
You can read more about building a data quality framework in another article here.
This framework is a powerful way to start, but as you can imagine, applying it manually across millions of data points, dozens of systems, and multiple teams becomes incredibly complex, if not impossible. Manual scoring is not sustainable, and tracking rules in spreadsheets doesn’t scale.
This is where a dedicated data governance platform becomes essential. It automates the process of discovering data, defining rules, monitoring quality, and managing remediation workflows at an enterprise level.
At Murdio, we specialize in implementing Collibra, the industry-leading data intelligence platform. We help organizations take this exact framework and embed it into a powerful, automated system that provides a single source of truth for your data’s quality. This ensures lasting trust and empowers your entire business to make decisions with confidence.
You understand the dimensions, and you have a framework for measurement. The next logical question is, “What do I use to actually do all of this?” Manually running checks in spreadsheets is a recipe for failure. To execute a data quality strategy effectively, you need dedicated tools.
The good news is that there is a thriving ecosystem of solutions available. Broadly, these tools fall into two main categories: powerful open-source frameworks designed for technical teams and comprehensive commercial platforms built for enterprise-wide governance.
For organizations with strong data engineering teams, open-source tools provide a flexible and powerful way to embed data quality directly into their code. They are excellent for automated validation and are highly customizable.
Two popular examples are Great Expectations, which allows teams to define data tests in a clear, declarative way within a pipeline, and Deequ, a library built by Amazon for profiling and monitoring quality in very large datasets within a Spark ecosystem.
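To give a flavor of that declarative style, here is a minimal Great Expectations sketch using its classic pandas-flavored API. The file name and columns are assumptions, and the exact entry points differ between Great Expectations versions (newer releases organize the same expectations behind a data context and validator), so treat this as an illustration rather than a reference implementation.

```python
import great_expectations as ge

# Load a CSV as a pandas-backed dataset with expectation methods attached
# (classic API; newer versions wrap the same idea in a DataContext/Validator).
customers = ge.read_csv("customers.csv")  # hypothetical file

# Declare expectations: each one is a data test that can run inside a pipeline.
customers.expect_column_values_to_not_be_null("customer_id")    # Completeness
customers.expect_column_values_to_be_unique("customer_id")      # Uniqueness
customers.expect_column_values_to_match_regex(                  # Validity
    "email", r"^[^@\s]+@[^@\s]+\.[^@\s]+$"
)

# Validate the whole suite and stop the load if any expectation fails.
results = customers.validate()
if not results["success"]:
    raise ValueError("Data quality checks failed; stopping the pipeline.")
```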
While open-source tools are excellent for technical tasks, they often lack the user-friendly interfaces and holistic governance features that businesses need. This is where commercial platforms excel.
A data catalog platform like Collibra, for example, is a comprehensive data intelligence solution. It embeds data quality within a complete governance framework, connecting technical rules to a business glossary, assigning ownership to data stewards, and visualizing end-to-end data lineage.
Choosing between these options depends entirely on the problem you are trying to solve. To make the choice clearer, the table below maps common data quality challenges to the type of solution best suited to address them.
| Your Goal or Challenge… | Open-Source Tools (e.g., Great Expectations) | Commercial Platform (e.g., Collibra) |
| --- | --- | --- |
| “I need to stop bad data from entering my data warehouse pipeline.” | ✓ Ideal Fit | ✓ Can do this, but it’s part of a much larger system. |
| “I need to create a business glossary and assign official owners for our data.” | ✗ Not Designed For This | ✓ Ideal Fit |
| “I need business users to see data quality scores without writing code.” | ✗ Not Designed For This | ✓ Ideal Fit |
| “I need to see the full lineage of my data, from source to my BI report.” | ✗ Limited / Manual | ✓ Ideal Fit |
| “I need a flexible, code-first way to define and version-control data tests.” | ✓ Ideal Fit | ✗ Less Flexible (Uses a UI-driven approach) |
As the table illustrates, the right tool is the one that best supports your organization’s specific needs and data culture.
For technical teams wanting to harden their pipelines with code-based tests, open-source is a fantastic and powerful start.
However, for organizations committed to building a true, enterprise-wide culture of data trust – one that involves business users, establishes clear ownership, and provides end-to-end visibility – a comprehensive platform is the strategic choice.
And an expert implementation is the key to unlocking its full value. That’s where Murdio comes in. We implement leading enterprise data catalog solutions like Collibra to help you achieve that vision.
We’ve traveled from the high-stakes cost of bad data, through the six (and even eight) dimensions that define its health, and into the practical, real-world application of measuring and managing it with frameworks and tools.
If there is one key takeaway, it is this: understanding the dimensions is just the first step.
True success, and the immense business value that follows, comes from systematically measuring, managing, and fostering a culture where data quality is a shared responsibility.
Data quality isn’t a one-time project to be checked off a list; it’s a continuous, business-critical discipline that protects your organization and powers its most ambitious goals.
You now have the blueprint for data quality. If you’re ready to move from theory to transformation and implement a world-class data governance program without the friction, our experts can help.
Contact Murdio today to learn how our Collibra implementation services can accelerate your journey to complete data trust.
To help clarify some common points, here are answers to a few frequently asked questions about data quality.
Data quality management is the overarching business process for acquiring, implementing, and overseeing a framework (like the 5-step process we discussed) and technology to ensure the health of an organization’s data. It’s the active, continuous discipline of applying rules and monitoring data to ensure it meets established standards.
You create data quality standards by doing the work outlined in the “How to Measure” section. It involves identifying your critical data elements, defining clear rules for each dimension (mandatory fields, required formats, acceptable values), agreeing on target thresholds with business stakeholders, and documenting the rules so they can be measured and enforced consistently.
A very common issue is when incorrect data values are entered into a system. For instance, a sales representative might enter “999-999-9999” as a customer’s phone number just to get past a required field. The value is properly formatted (passing a data validity check) but it is factually wrong (failing a data accuracy check). This is one of the most frequent quality issues organizations face.
Data duplication, which harms data uniqueness, means you have multiple records for the same single entity (e.g., three records for one customer). Data consistency, on the other hand, deals with contradictions. You could have just one unique record for a customer (no duplication), but if it lists their address as “123 Main St” in your CRM and “456 Oak Ave” in your billing system, you have a data consistency problem.