20.08.2025
In July 2024, U.S. regulators fined Citigroup $136 million for failing to fix long-standing data governance and reporting issues. This penalty came on top of an earlier $400 million fine in 2020, underscoring a persistent pattern of weak data controls and inaccurate reporting. Investigations revealed that Citi’s loan files contained basic errors—from wrong maturity dates to inconsistent collateral values—that undermined the integrity of its regulatory stress tests.
The consequences went beyond fines. Regulators restricted Citi’s ability to return capital to shareholders until it proved it could get its data under control, forcing the bank to restructure its leadership team and appoint a chief data officer to oversee governance reforms (Reuters, Financial Times).
Citigroup’s case is a stark reminder: poor data quality isn’t a technical inconvenience—it’s a strategic, financial, and reputational risk. And it’s not unique to banks. Across industries, organizations that fail to manage their data as a critical enterprise asset expose themselves to operational breakdowns, compliance failures, and lost trust.
That’s why a Data Quality Assessment (DQA) is so powerful. Far from being a one-off clean-up exercise, it is a structured, repeatable process to evaluate whether your data is fit for purpose—and to turn it from a liability into a foundation for confident decision-making.
To truly grasp the power of a DQA, you must first shift your perspective. It’s not about achieving perfect data; it’s about ensuring your data is trustworthy enough to drive the outcomes your business depends on. This means connecting every data metric back to a tangible business result.
To make this connection clear, we’ll follow ConnectiMart, a hypothetical e-commerce company we’ve created for this guide. Their leadership team is frustrated by conflicting reports, and they realize they cannot make sound decisions on inventory, staffing, or expansion while key departments are working with different versions of the truth.
They decide to perform their first data quality assessment. Their journey will illustrate the concepts we discuss below.
Without a formal assessment of data quality, key business decisions often rely on guesswork, gut feelings, or reports that are quietly known to be “a bit off.”
This creates a culture of uncertainty. A Data Quality Assessment replaces this uncertainty with objective facts. It’s the first step in establishing a robust system of data governance – a central framework that defines rules, responsibilities, and processes to manage an organization’s data as an enterprise asset.
Building this framework ensures that data quality is not a one-time fix but a continuous, organization-wide commitment.
Trustworthy data has a direct and measurable return on investment (ROI). For a company like ConnectiMart, it means fewer failed deliveries and re-shipments, marketing budgets that aren't wasted on duplicate records, and forecasts the leadership team can actually rely on.
In the age of analytics, Artificial Intelligence (AI), and Machine Learning (ML), the quality of your input data dictates the quality of your output.
Your sophisticated BI dashboards and predictive AI models are useless – or worse, dangerously misleading – if they are built on a foundation of flawed data.
A Data Quality Assessment is the essential preparatory step to ensure any future-looking technology investment has a chance to succeed.
Before you can assess the health of your data, you need to know what vital signs to check.
In data quality, these vital signs are formally known as data quality dimensions or pillars. While academics might debate the exact number, six core pillars provide a comprehensive framework for any business to understand its data’s fitness for purpose.
Think of these as the foundational pillars of a temple; if any one of them is weak, the entire structure of “Data Trust” becomes unstable. During a data quality assessment, you will measure your data against each of these pillars.
Accuracy measures whether your data correctly reflects the real-world object or event it describes. It’s the most fundamental dimension of data quality.
At ConnectiMart: A customer’s name is misspelled as “Jhon Smith,” or their shipping address is an old one they haven’t used in years. A product’s weight is listed as 5kg when it is actually 15kg.
The business impact: Inaccurate data leads directly to real-world errors. It results in wasted shipping costs from failed deliveries, incorrect invoices being sent to clients, and frustrated customers receiving the wrong product or communications.
Completeness checks for missing information. Data is complete if there are no empty fields or null values in data records where information is expected.
At ConnectiMart: A staggering 40% of their customer records are missing a phone number, making it impossible for the support team to make proactive follow-up calls. Many product listings are missing dimensions, preventing customers from knowing if a product will fit in their space.
The business impact: Incomplete data stalls business processes. It creates a fragmented view of the customer, prevents sales and support outreach, and leads to flawed analysis because entire segments of data have to be excluded.
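Completeness is also one of the easiest dimensions to measure programmatically. As a minimal sketch, using Python with pandas and assuming ConnectiMart's customer records can be pulled into a DataFrame (the column names and sample values here are our own illustrative assumptions), a null-rate check per field looks like this:

```python
import pandas as pd

# Toy stand-in for a customer extract; columns and values are illustrative.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "phone": ["555-0101", None, "555-0103", "555-0104", None],
    "postal_code": ["10001", "94105", None, "60601", "73301"],
})

# Share of missing values per column, expressed as a percentage.
completeness_gaps = customers.isna().mean().mul(100).round(1)
print(completeness_gaps)
# A 40% null rate on `phone` mirrors the missing-phone-number problem described above.
```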
Consistency ensures that data representing the same entity is identical and harmonious across different systems or data stores. The modern business runs on multiple applications, and consistency is the pillar that bridges these silos.
At ConnectiMart: A customer’s address is listed as “123 Main St.” in their CRM but “123 Main Street, Apt 2” in the billing system. This causes confusion and requires manual effort to reconcile for financial reporting.
The business impact: These kinds of data quality problems are a primary cause of the boardroom scenario we described earlier. It leads to conflicting reports, a lack of a single source of truth, operational confusion, and wasted hours trying to figure out which number is correct.
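Checking consistency usually means comparing the same entity across two systems. A rough sketch, assuming exports from the CRM and the billing system can be joined on a shared customer ID (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical exports; in practice these would come from the CRM and billing databases.
crm = pd.DataFrame({"customer_id": [1, 2], "address": ["123 Main St.", "42 Oak Ave"]})
billing = pd.DataFrame({"customer_id": [1, 2], "address": ["123 Main Street, Apt 2", "42 Oak Ave"]})

# Join on customer ID and flag records where the two systems disagree.
merged = crm.merge(billing, on="customer_id", suffixes=("_crm", "_billing"))
mismatches = merged[
    merged["address_crm"].str.strip().str.lower()
    != merged["address_billing"].str.strip().str.lower()
]
print(mismatches)  # every row here needs reconciliation or a single source of truth
```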
Timeliness measures how up-to-date the data is relative to the needs of the business. Data that was accurate yesterday might be useless today if it’s not delivered in time to inform a decision.
At ConnectiMart: Their inventory management system only syncs with the e-commerce website once every 12 hours. A customer purchases an item shown as “in stock,” but it had actually sold out five hours earlier, leading to an oversell.
The business impact: A lack of timely data leads to broken customer promises, poor inventory management, and missed opportunities. In fast-moving markets, stale data is functionally equivalent to wrong data.
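Timeliness can be quantified as the age of each record relative to when the business needs it. A small sketch, assuming inventory rows carry a last-sync timestamp (a hypothetical column) and the business wants data no more than one hour old:

```python
import pandas as pd

inventory = pd.DataFrame({
    "sku": ["A100", "B200"],
    "last_synced_at": pd.to_datetime(["2025-08-20 02:00", "2025-08-20 13:30"]),
})

# Fixed "now" so the example is reproducible; use pd.Timestamp.now() in practice.
now = pd.Timestamp("2025-08-20 14:00")
inventory["age_hours"] = (now - inventory["last_synced_at"]).dt.total_seconds() / 3600

# Flag records that violate a one-hour freshness requirement.
stale = inventory[inventory["age_hours"] > 1]
print(stale)  # SKU A100 is 12 hours old, echoing the 12-hour sync gap described above
```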
Uniqueness, or deduplication, ensures that there is only one record representing a single real-world entity. Duplicate records are a common issue that bloats databases and skews analytics.
At ConnectiMart: “John Smith” and “J. Smith,” both living at the same address and with similar purchase histories, exist as two separate customer records.
The business impact: Duplicates lead to wasted marketing spend by targeting the same person multiple times, a skewed understanding of the customer base (making it seem larger than it is), and a frustrating experience for customers who receive redundant communications.
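Duplicate detection usually starts with a normalized matching key; production deduplication adds fuzzy matching on top, but the idea is the same. A minimal sketch (the records and match rule are illustrative, not ConnectiMart's actual logic):

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "name": ["John Smith", "J. Smith", "Maria Lopez"],
    "address": ["45 Elm St", "45 Elm St", "9 Birch Rd"],
})

# Build a crude match key: last name plus a whitespace-normalized, lowercased address.
customers["match_key"] = (
    customers["name"].str.split().str[-1].str.lower()
    + "|"
    + customers["address"].str.lower().str.replace(r"\s+", " ", regex=True)
)

# Any key appearing more than once is a candidate duplicate for human review.
candidates = customers[customers.duplicated("match_key", keep=False)]
print(candidates)  # "John Smith" and "J. Smith" at the same address surface together
```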
Validity confirms that data is stored in a specific, standardized format and follows defined business rules. While accuracy is about being correct, validity is about being in the correct form.
At ConnectiMart: A sale_date field is supposed to be in the YYYY-MM-DD format, but hundreds of entries from a legacy system are formatted as MM/DD/YY. These records are ignored by their new analytics software, making sales reports inaccurate.
The business impact: Invalid data can cause entire systems and processes to break. It leads to application errors, an inability to process or analyze crucial data, and deeply flawed reporting that excludes any data that doesn’t fit the expected format.
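Validity checks are essentially format rules. A short sketch that tests the sale_date example against the expected YYYY-MM-DD pattern (the DataFrame below is our own toy data):

```python
import pandas as pd

sales = pd.DataFrame({"sale_date": ["2025-07-14", "07/14/25", "2025-08-01", "8/1/25"]})

# Rows that do not match the ISO YYYY-MM-DD format required by the analytics tool.
iso_pattern = r"^\d{4}-\d{2}-\d{2}$"
invalid = sales[~sales["sale_date"].str.match(iso_pattern)]

validity_score = 100 * (1 - len(invalid) / len(sales))
print(f"Validity: {validity_score:.0f}%")  # 50% in this toy example
print(invalid)  # the legacy MM/DD/YY entries that the new software silently drops
```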
Now that we understand the pillars of data quality, it’s time to put that knowledge into action.
A data quality assessment is not a vague exploration; it’s a structured investigation with a clear, repeatable process.
This 7-step guide provides a data quality framework that can be adapted for any organization, from a startup to a global enterprise, to move from data chaos to data clarity.
The most common mistake is to begin a DQA with the goal of “cleaning all our data.” This is a recipe for a project that never ends.
A successful assessment is not about boiling the ocean; it’s about solving a specific, painful business problem.
At ConnectiMart: The leadership team doesn’t ask for “better data.” They want to solve two key issues identified in the boardroom: “Reduce shipping errors and costs by 50%” and “Improve the accuracy of our quarterly sales forecast to over 90%.”
This immediately gives the DQA a clear purpose and scope. The project will focus primarily on customer contact/shipping data and recent sales transaction data, rather than getting lost in less critical datasets like website clickstream logs.
Your action step: Identify and write down the top 1-2 business problems your DQA will address. Define which specific datasets (for example, ‘Customer CRM Data,’ ‘Sales Ledger,’ or ‘Product Inventory’) are in scope for this investigation.
Within your scoped datasets, not all data fields are created equal.
You need to identify the Critical Data Elements (CDEs) that most directly impact your business objective.
Once identified, you must define what “good” looks like for each one by setting a clear quality standard.
At ConnectiMart: For their “Reduce shipping errors” objective, the team identifies the CDEs as Street Address, City, Postal Code, and Country. For the Postal Code field, they set a quality standard: “Must have a completeness score of 100% and a validity score of 100% against the national postal system format.”
Your action step: For your objective, list the 5-10 CDEs that matter most. For each one, define a simple, measurable quality standard (e.g., ‘Completeness > 98%,’ ‘Uniqueness = 100%,’ ‘Timeliness < 24 hours’).
This is the initial discovery phase. Data profiling is the process of using automated tools to scan your data and create a summary of its current state.
Think of it like a doctor taking a patient’s vital signs before making a diagnosis. It’s not the full assessment, but it gives you an essential, high-level overview of where the problems lie.
At ConnectiMart: They run a profiling tool on their customer table. It instantly generates a report revealing that the Postal Code field has 15% null (empty) values, and the Country field contains multiple variations like “US,” “USA,” and “United States,” highlighting a major consistency issue.
Your action step: Use a data profiling tool (many databases and BI platforms have built-in functions, or dedicated software can be used) to run an initial scan of your scoped data. Focus on discovering frequencies, null counts, and value distributions for your CDEs.
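Dedicated profiling tools do this at scale, but the essence of a profile can be sketched in a few lines of pandas. Assuming the scoped customer data is available as a DataFrame (the columns and values below are a toy stand-in, not real profiling output), a basic scan of null rates, distinct values, and top frequencies looks like this:

```python
import pandas as pd

# Toy stand-in for ConnectiMart's scoped customer extract; columns are assumptions.
customers = pd.DataFrame({
    "postal_code": ["10001", None, "94105", None, "60601", "73301", None],
    "country": ["US", "USA", "United States", "USA", "US", "USA", "US"],
})

for col in ["postal_code", "country"]:
    series = customers[col]
    print(f"--- {col} ---")
    print(f"null rate:       {series.isna().mean():.1%}")
    print(f"distinct values: {series.nunique()}")
    print(series.value_counts().head(5).to_string())
```

Even this crude profile surfaces the two findings from the ConnectiMart example: empty postal codes and a country field written three different ways.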
With your standards defined and your profiling results in hand, you can now systematically evaluate data quality.
This is a methodical process where you compare the actual state of your data (from Step 3) against your desired state (from Step 2) and score the results.
At ConnectiMart: The team creates a simple scorecard. For Postal Code, the standard was 100% completeness, but the actual score is 85% (a 15% gap). For Country, the standard required one consistent format, but profiling found three variations, resulting in a low consistency score. This process of formally scoring and documenting the gaps is how you assess data quality objectively.
Your action step: Create a simple spreadsheet or scorecard. For each CDE, list its quality standard and the actual measured score from your profiling. Highlight the biggest gaps between your target and your actuals.
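The scorecard itself can live in a spreadsheet, but if you prefer to keep it alongside the profiling code, here is a minimal sketch (the 85% completeness figure comes from the ConnectiMart example; the other numbers are purely illustrative):

```python
import pandas as pd

scorecard = pd.DataFrame([
    {"cde": "postal_code", "dimension": "completeness", "standard": 100.0, "actual": 85.0},
    {"cde": "postal_code", "dimension": "validity",     "standard": 100.0, "actual": 90.0},
    {"cde": "country",     "dimension": "consistency",  "standard": 100.0, "actual": 40.0},
])

# Gap between target and measured quality, sorted so the worst offenders surface first.
scorecard["gap"] = scorecard["standard"] - scorecard["actual"]
print(scorecard.sort_values("gap", ascending=False).to_string(index=False))
```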
Finding an error is useful. Understanding its root cause is transformative. A DQA that only identifies problems without investigating their origins is incomplete.
You need to dig deeper and ask “why” these errors are occurring in the first place.
At ConnectiMart: Their investigation reveals that the missing postal codes are primarily from customers who sign up via a mobile app checkout flow where the field isn’t mandatory. The inconsistent country codes are traced back to a faulty API integration with a third-party marketing tool.
Your action step: For your top 3 worst-performing data elements, play detective. Trace the data’s lineage through its entire data pipeline. Is the root cause a flawed system integration? A missing validation rule? A confusing user interface that encourages human error during data entry?
Remediation typically follows two paths.
The first is data cleansing – the short-term, reactive fix to correct existing errors.
The second is process improvement – the long-term, proactive solution to prevent new errors from happening.
This involves improving your overall data collection and management procedures.
At ConnectiMart: The short-term fix is to run a script to standardize all country variations to a single format (“USA”). The crucial long-term solution is to fix the faulty API and, more importantly, make the Postal Code field mandatory in the mobile app’s code. This stops the bleeding at the source.
Your action step: For each root cause you identified, define both a short-term cleansing action (e.g., ‘Manually correct invalid entries’) and a long-term process improvement (e.g., ‘Add validation rules to the data entry form’ or ‘Fix the faulty API’).
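The short-term cleansing step is often just a mapping applied in bulk. A minimal sketch of the country standardization described above (the mapping and column names are our assumptions); the long-term fix, making the field mandatory and validated, belongs upstream in the app and API rather than in a script:

```python
import pandas as pd

customers = pd.DataFrame({"country": ["US", "USA", "United States", "usa", "U.S."]})

# One-off cleansing: collapse known variations to a single canonical value.
country_map = {"us": "USA", "usa": "USA", "united states": "USA", "u.s.": "USA"}
normalized = customers["country"].str.strip().str.lower()
customers["country"] = normalized.map(country_map).fillna(customers["country"])

print(customers["country"].value_counts())
# The durable fix is upstream: a validation rule (e.g. an ISO country dropdown)
# so new records can no longer arrive in multiple formats.
```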
You cannot fix everything at once. The final step is to prioritize your list of solutions based on which actions will deliver the most business value for the least amount of effort.
At ConnectiMart: Fixing the mobile app to require a postal code is a medium-effort, high-impact task, as it directly addresses their shipping objective and prevents future errors. It becomes Priority #1. Manually cleansing thousands of old, misspelled names is a high-effort, low-impact task, so it’s placed much lower on the priority list.
Your action step: Create a simple 2×2 grid plotting Business Impact vs. Implementation Effort. Place your proposed solutions on this grid to visually determine your top priorities. Use this to create a simple roadmap outlining what will be fixed, by whom, and by when.
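If you want the prioritization to live in the same workbook as the scorecard, a tiny sketch that ranks each proposed fix by impact relative to effort can stand in for the 2×2 grid (the 1-5 ratings are judgment calls made in the prioritization workshop, not outputs of a formula):

```python
import pandas as pd

# Illustrative ratings assigned by the team; higher impact and lower effort are better.
fixes = pd.DataFrame([
    {"fix": "Require postal code in mobile app", "impact": 5, "effort": 3},
    {"fix": "Repair marketing API country field", "impact": 4, "effort": 2},
    {"fix": "Manually cleanse misspelled names",  "impact": 2, "effort": 5},
])

# Simple ranking heuristic: highest impact per unit of effort first.
fixes["priority_score"] = fixes["impact"] / fixes["effort"]
print(fixes.sort_values("priority_score", ascending=False).to_string(index=False))
```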
The most brilliant data analysis is useless if it sits unread in a forgotten folder. After completing your assessment, the final – and most crucial – step is to present your findings.
A common failure point for technical projects is delivering a dense, jargon-filled document that business leaders can’t understand or act upon.
A successful assessment report is not a single document; it’s a layered story tailored to different audiences. It must provide a high-level, bottom-line summary for the C-suite while also offering the detailed evidence required by the technical teams who will implement the fixes.
The key is to structure your report to serve both needs without overwhelming either party.
Audience: C-Suite, VPs, Business Directors.
Goal: Communicate the business impact and secure buy-in for action in under five minutes.
This should be the very first page of your report. It must be highly visual, concise, and completely free of technical jargon.
A good executive summary often looks more like a dashboard than a document. It translates data quality scores into the language of business: money, risk, and opportunity.
At ConnectiMart: Their executive summary leads with a large, color-coded grade: “Overall Customer Data Health: D+.” It immediately follows with the bottom-line impact: a chart linking their 15% postal code error rate to an “Estimated $250,000 Annual Loss from Shipping Errors & Failed Deliveries.” It concludes with the top three recommended actions in plain English, such as “Fix the customer checkout process on our mobile app.”
Audience: Data Architects, Engineers, Analysts, IT Managers.
Goal: Provide the detailed, undeniable evidence needed to understand the problem and design a solution.
This is the body of your report. It contains all the supporting evidence for the conclusions presented in the executive summary.
Here is where you include the detailed scorecards from your assessment, specific examples of bad data (screenshots are highly effective), lists of affected database tables and systems, and the thorough root cause analysis for each major issue.
At ConnectiMart: This section of the report includes the full spreadsheet showing the quality scores for every Critical Data Element. It has screenshots highlighting the inconsistent “Country” field values and includes the technical notes that trace the error back to a specific, faulty API endpoint in their marketing software.
Audience: All stakeholders, especially Project Managers and Department Heads responsible for implementation.
Goal: Translate findings into a clear, prioritized action plan.
This final section bridges the gap between diagnosis and cure. It should be a clear, concise presentation of the prioritized roadmap you developed in Step 7 of the walkthrough. It moves the conversation from “Here’s the problem” to “Here’s how we’re going to fix it.” It should clearly outline what will be done, who is the responsible owner for each action, and the expected timeline for completion.
At ConnectiMart: Their roadmap is presented as a simple table. The top item is “Fix Mobile App Validation,” listed as Priority 1, with the “Mobile Development Team” as the owner and a deadline of “End of Q3.”
A well-structured report like this builds consensus, aligns teams, and turns the insights from your data quality assessment into funded, supported, and successful projects.
A successful data quality assessment will provide a powerful snapshot of your data’s health at a single moment in time.
But data is not static.
New customers are added, transactions occur every second, and system migrations introduce new complexities. A one-time cleanup effort is like a crash diet; without changing your daily habits, the problems will inevitably return.
The true goal is to move from a reactive project to a proactive culture of quality. The findings from your DQA report are the business case for building this sustainable, long-term program.
The most effective way to maintain data quality is to establish a formal data governance program. This program takes the standards, rules, and priorities identified in your DQA and operationalizes them.
It’s a permanent, cross-functional commitment to strategic data management as a critical enterprise asset.
At ConnectiMart: Seeing the potential $250,000 annual loss, the leadership team doesn’t just approve the recommended fixes. They approve the creation of a permanent Data Governance Committee, chaired by the COO, to oversee the health of their most critical data assets.
A core principle of data governance is that data quality is a business responsibility, not just an IT problem.
This is achieved by appointing Data Stewards – business leaders or subject matter experts who are made formally accountable for the data in their specific domain.
At ConnectiMart: The Head of E-commerce is assigned as the official “Data Steward” for all Product Information data, while the Marketing Director becomes the steward for Customer data. Their responsibility is to ensure the ongoing quality, completeness, and data integrity – the structural and logical soundness – of their respective data domains.
You cannot maintain quality at scale through manual spot-checks. The data quality standards you defined in your assessment should be configured into automated monitoring tools.
These tools act as permanent watchdogs, continuously scanning data as it enters your systems, flagging anomalies, and automatically alerting the appropriate Data Steward when a rule is broken. This shifts the paradigm from finding old errors to preventing new ones.
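Commercial platforms handle this end to end, but the underlying idea is simple: re-run the same rules on every new batch of data and alert someone when one fails. A minimal sketch, assuming new records arrive as a DataFrame and that alerting is a stub to be wired to email or chat (everything here is illustrative and not tied to any specific product's API):

```python
import pandas as pd

def alert_steward(steward: str, message: str) -> None:
    # Stub: in production this would post to email, Slack, or a ticketing system.
    print(f"[ALERT to {steward}] {message}")

def check_batch(batch: pd.DataFrame) -> None:
    # Rule 1: postal_code must be 100% complete (the standard set during the DQA).
    null_rate = batch["postal_code"].isna().mean()
    if null_rate > 0:
        alert_steward("Marketing Director", f"postal_code null rate is {null_rate:.1%}")

    # Rule 2: country must use the single canonical value agreed during remediation.
    bad_countries = set(batch["country"].dropna().unique()) - {"USA"}
    if bad_countries:
        alert_steward("Marketing Director", f"non-standard country values: {bad_countries}")

# Example batch containing one violation of each rule.
new_records = pd.DataFrame({
    "postal_code": ["10001", None],
    "country": ["USA", "United States"],
})
check_batch(new_records)
```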
Building a successful data governance program and implementing the technology to support it is a significant undertaking.
This is why many industry leaders choose to partner with specialists to accelerate their journey and ensure success. A firm like Murdio excels at implementing enterprise-grade data governance platforms such as Collibra, which provides a central command center for defining policies, automating data quality standards, and managing the entire data lifecycle.
For organizations like ConnectiMart, taking the step from an initial assessment to a mature program managed in a platform like Collibra is what truly turns data into a lasting, trustworthy asset.
This evolution from reactive, manual checks to proactive, holistic monitoring is the core of modern data observability.
The journey from data chaos to data clarity begins not with a massive technological overhaul, but with a single, focused commitment: to ask, “Can we trust our data?” and to have a structured process to find the answer.
This guide has walked you through the entire lifecycle of that commitment. We’ve moved beyond the technical jargon to anchor data quality to tangible business ROI.
We’ve explored the six core pillars that define “good” data, provided a detailed 7-step walkthrough to perform an effective assessment, and shown how to communicate the findings in a report that drives action.
Ultimately, a data quality assessment is more than a diagnostic tool; it’s the catalyst for a cultural shift towards lasting data governance. It provides the undeniable business case and the operational roadmap for this transformation.
Whether your organization takes the first step with a targeted internal assessment or accelerates its journey by partnering with specialists like Murdio to implement a comprehensive platform like Collibra, the destination is the same: a future where data is no longer a source of boardroom uncertainty, but is instead your most reliable and powerful asset.
Build that foundation of trust, and every decision, every strategy, and every customer interaction that follows will be stronger for it.