1 Introduction
Data is essential to the Digital Business. It fuels business decisions at all levels, from the strategic to the operational. It is equally essential for marketing automation, for automating manufacturing processes, and for everything related to AI (Artificial Intelligence), given that AI and the related ML (Machine Learning) build on large amounts of data.
Over the past few years, the requirements for managing data have changed. Relational SQL databases and, before them, hierarchical databases have shifted away from the center of attention. Cloud databases of many kinds are common today, with large CSPs (Cloud Service Providers) such as AWS (Amazon Web Services) or Microsoft Azure each providing not just one but sometimes more than 10 kinds of databases. Big data approaches and analytics solutions have become the focus of attention.
However, with the immense growth of both data and data stores, and the multitude of technical solutions, another challenge has become apparent: data that is unknown to users or cannot be accessed is of little value. Furthermore, with regulations such as the EU GDPR (General Data Protection Regulation), knowing where certain types of data such as PII (Personally Identifiable Information) reside has become essential. From a business perspective, though, it is important to note that this is not just a regulatory mandate but a business mandate. Data can only be used when it is known, and data can only be protected when it is known.
Another trend over the past few years has been what some call "data democratization": more people than ever before need data and actually use it. Making data available helps people at all levels of the business use that data in their jobs.
These broadened and new use cases around data also require new categories of solutions: Data Catalogs, Metadata Management, and Data Governance. The three are closely related and are increasingly delivered as integrated solutions.
- Metadata Management refers to solutions that enable organizations to manage data across a range of systems by maintaining metadata about that data. This also includes capabilities such as data lineage, i.e., analyzing and documenting the flow of data between various systems.
- Data Catalogs are where this metadata is stored and managed. A data catalog is the central repository that provides a view of the enterprise data across all managed data stores. It enables the use of such data and delivers the capabilities for "data democratization".
- Data Governance builds on Metadata Management and enables control of the use and flow of data.
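How the three terms fit together can be illustrated with a minimal Python sketch. It is not modeled on any particular product: the class names, fields, dataset names, and the single governance rule (datasets that contain or derive from PII require a cleared role) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class DatasetEntry:
    """Metadata about one dataset (hypothetical fields, not a vendor schema)."""
    name: str
    store: str                      # where the data physically resides
    contains_pii: bool = False      # classification needed for GDPR-style questions
    upstream: List[str] = field(default_factory=list)  # direct lineage sources


class DataCatalog:
    """Central repository of metadata across all registered data stores."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def find_pii(self) -> List[str]:
        """Answer 'where does PII reside?' from metadata alone."""
        return sorted(e.name for e in self._entries.values() if e.contains_pii)

    def lineage(self, name: str) -> List[str]:
        """Return every dataset that feeds into `name`, directly or indirectly."""
        seen: List[str] = []
        stack = list(self._entries[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                stack.extend(self._entries[current].upstream)
        return sorted(seen)

    def may_access(
        self,
        name: str,
        role: str,
        pii_roles: FrozenSet[str] = frozenset({"privacy_officer"}),
    ) -> bool:
        """Illustrative governance rule: PII, or data derived from PII,
        is only available to roles listed in `pii_roles` (an assumption)."""
        related = [name] + self.lineage(name)
        touches_pii = any(
            self._entries[n].contains_pii for n in related if n in self._entries
        )
        return (not touches_pii) or role in pii_roles


# Example data: all dataset and store names are made up.
catalog = DataCatalog()
catalog.register(DatasetEntry("crm_customers", "postgres", contains_pii=True))
catalog.register(DatasetEntry("web_events", "s3"))
catalog.register(
    DatasetEntry("marketing_report", "warehouse",
                 upstream=["crm_customers", "web_events"])
)

print(catalog.find_pii())
print(catalog.lineage("marketing_report"))
print(catalog.may_access("marketing_report", "analyst"))
```

In this sketch, the metadata records correspond to Metadata Management, the registry that holds them to the Data Catalog, and the access check that builds on both to Data Governance. Because the check follows lineage, a report derived from a PII source is treated as sensitive even though the report itself is not flagged.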
For businesses, it is becoming increasingly important to have solutions for Data Governance, Data Catalogs, and Metadata Management in place. Only then will organizations succeed in utilizing the potential of the data they hold and collect. Only then will they succeed in having the data on hand that is needed for business applications, decision support, automation, and AI/ML. And only then will organizations be able to control the sprawl of data, with new cloud data stores, new analytical solutions, and AI.
Management of cloud data stores; governance of data and fulfilment of regulatory compliance requirements; governance and explainability of AI/ML models; a common understanding of data and its usage; and the efficient utilization of data: All that will only work with strong data management in place. Data Catalogs and Data Governance are the cornerstones of the modern, digital business.