
Case Study: Discovering, classifying and cataloging unstructured data for a European bank

Karolina Fox

Managing and protecting sensitive data hidden in unstructured sources is a major challenge for financial institutions. Our client, a leading European bank, stored critical information in thousands of PDF contracts, legal documents, and emails spread across multiple systems. This made it difficult to locate sensitive data, ensure compliance with data privacy regulations, and maintain an accurate data catalog.

By leveraging the Collibra-Ohalo Data X-Ray integration and Murdio’s team of technical experts, the bank automated data discovery, classification, and cataloging, transforming how unstructured data is governed at scale.

The Challenge

Key challenges included:

  1. Unstructured data across multiple sources
    Critical data was hidden in PDFs, emails, and file shares, making it nearly impossible to discover and govern manually.
  2. Limited visibility into sensitive and personal data
    The bank needed to identify PII (Personally Identifiable Information) and confidential information to reduce compliance and data security risks.
  3. Manual, time-consuming data discovery
    Without automation, reviewing documents was slow, inconsistent, and prone to human error.
  4. Compliance and retention pressure
    Banking regulations (e.g., GDPR, records retention) required accurate classification and control of unstructured data.
  5. Collibra not fully leveraged
    Collibra was used as a data catalog, but it could not natively scan and classify content inside PDF files.

The Solution

Murdio implemented the Collibra + Ohalo Data X-Ray integration, enabling automated discovery, classification, and governance of unstructured data.

Automated data discovery and classification

  • Ohalo Data X-Ray used OCR (Optical Character Recognition) and AI to scan PDFs, emails, images, and network drives.
  • Detected entities were mapped to data classes/terms (e.g., PII categories) and surfaced as findings; curators could then promote or refine the mappings in the catalog.
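
As a rough illustration of the mapping step, the sketch below tags text with entity types and looks up a data class for each hit. It is not how Data X-Ray works internally: the regular expressions stand in for the OCR and AI detection described above, and `PATTERNS`, `DATA_CLASSES`, `Finding`, and `classify_text` are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors. A real scanner relies on OCR and AI models;
# plain regular expressions are used here only to keep the sketch runnable.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# Hypothetical mapping from detected entity type to a catalog data class.
DATA_CLASSES = {
    "EMAIL": "PII / Contact details",
    "IBAN": "PII / Financial account",
}


@dataclass(frozen=True)
class Finding:
    entity_type: str
    data_class: str
    value: str


def classify_text(text: str) -> list[Finding]:
    """Return one Finding per detected entity, tagged with its data class."""
    findings = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(
                Finding(entity_type, DATA_CLASSES[entity_type], match.group())
            )
    return findings
```

Keeping the entity-to-class mapping in a separate table mirrors the curation step: a curator can change which data class an entity type maps to without touching the detection logic.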

Integration with Collibra Data Catalog

  • Detected data was synchronized into Collibra Data Catalog as findings and/or technical assets.
  • Findings were mapped to business categories and physical locations (source system + exact path/URI; region when available).
  • This provided end-to-end visibility across documents and systems (who/where/what), enabling policy enforcement and remediation.
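
To make the "where/what" mapping concrete, here is a minimal sketch of a finding record that carries its data class, business category, source system, exact URI, and an optional region, plus a helper that groups locations by business category. The field names, `LocatedFinding`, and `index_by_category` are assumptions made for illustration; they do not reflect the Collibra or Ohalo data models.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocatedFinding:
    data_class: str                # what was found
    business_category: str         # hypothetical business grouping
    source_system: str             # where: the system holding the file
    uri: str                       # where: exact path or URI
    region: Optional[str] = None   # only set when the region is known


def index_by_category(findings):
    """Group (source_system, uri) locations under each business category."""
    index = defaultdict(list)
    for finding in findings:
        index[finding.business_category].append(
            (finding.source_system, finding.uri)
        )
    return dict(index)
```

An index like this answers questions such as "where does customer PII live?" with a list of systems and paths, which is the kind of view that policy and remediation work needs.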

Automated file and directory governance

  • Asset hierarchies were built automatically from file and directory structures.
  • The integration triggered Collibra workflows for review and follow-up actions.
  • This enabled governance at scale without manual cataloging effort.
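
Building a hierarchy from file structures can be pictured as folding a flat list of paths into a nested tree. The sketch below assumes relative, POSIX-style paths and that no name is used as both a file and a directory; `build_hierarchy` is a hypothetical helper, not part of either product.

```python
from pathlib import PurePosixPath


def build_hierarchy(paths):
    """Nest file paths into a dict tree.

    Directories map to nested dicts and files map to None.
    """
    root = {}
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue  # skip empty paths
        node = root
        for directory in parts[:-1]:
            # Create the directory node on first sight, then descend into it.
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root
```

Each directory node in such a tree could then correspond to a catalog asset with the files as its children, which is one plausible way to attach review workflows at folder level.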

Dynamic metadata management

  • Data X-Ray continuously updated metadata in Collibra Data Catalog.
  • Ensured accuracy, freshness, and alignment with policies.
  • Supported ongoing compliance and audit readiness.
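
One common way to keep catalog metadata fresh is to compare the latest scan with what the catalog already holds and act only on the differences. The sketch below shows that comparison on plain dictionaries keyed by file URI. It is an assumption about how such a sync could be planned, not a description of the actual integration, and `plan_sync` is a hypothetical name.

```python
def plan_sync(catalog: dict, scan: dict) -> dict:
    """Compare catalog metadata with the latest scan, both keyed by URI.

    Returns the URIs to create, update, or retire in the catalog.
    """
    shared = scan.keys() & catalog.keys()
    return {
        "create": sorted(scan.keys() - catalog.keys()),
        "update": sorted(uri for uri in shared if scan[uri] != catalog[uri]),
        "retire": sorted(catalog.keys() - scan.keys()),
    }
```

Sending only the differences keeps each run small, and the resulting plan doubles as a record of what changed between scans.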

AI governance enablement

  • Leveraged Data X-Ray’s role-based access, monitoring, audit trails, and data lineage.
  • Established trustworthy data foundations for AI initiatives.

Ensuring Compliance and Risk Mitigation

Murdio’s implementation delivered full transparency and control over unstructured data:

  • Identification and classification of sensitive and personal data
  • Data lineage and audit trails for regulatory reporting
  • Support for file retention and disposition policies
  • Reduced legal and operational risk
  • Strong foundation for GDPR and banking compliance

Results

The Collibra-Ohalo integration delivered measurable benefits:

  • All PDF-based data automatically discoverable in Collibra Catalog
  • Centralized view of sensitive and high-risk information
  • Compliance risks significantly reduced
  • Massive time savings compared to manual document review
  • High accuracy and scalability validated by Murdio’s experts
  • Collibra evolved from a static catalog to an active governance platform for unstructured data

Conclusion

This project demonstrates how combining Collibra Catalog, Ohalo Data X-Ray, and Murdio’s technical expertise enables banks to transform unstructured documents into governed, searchable, and trusted data assets.

By synchronizing technical metadata, automating discovery and classification, and feeding the results back into Collibra, the bank gained full visibility into sensitive data, reduced compliance risk, and accelerated data governance maturity.

With a scalable integration, continuous metadata updates, and automated workflows, the bank is now equipped to govern unstructured data with the same rigor as structured data while preparing for AI, regulatory change, and future growth.
