1 Introduction
A significant portion of critical corporate data resides in what are commonly called “unstructured data stores”. These stand in contrast to structured data held in systems such as SAP, CRM systems, and databases. Unstructured data is data held in document formats such as text documents, slide decks, spreadsheets, or PDF files, and stored on file servers, collaboration platforms, or, even worse, file storage on the Internet. In fact, a significant portion of the data held in central IT systems such as ERP or CRM is transformed into such unstructured data, sometimes automatically and on a schedule. Having hundreds of regular exports from the various SAP systems in place is the norm, not the exception.
That data has always been critical:
- Information from the R&D department can contain valuable intellectual property of the organization. It is obviously of interest to external parties, but it is also relevant in the context of product liability.
- Financial data, e.g. figures prepared for annual statements, is sensitive and, if the company is listed, must not leak before a certain date.
- PII (Personally Identifiable Information) has always been sensitive. Nowadays, in the age of GDPR, the associated risk has grown massively. Not only must such data not leak, but organizations also need to understand well where it resides in order to comply with the GDPR data subject rights.
Unfortunately, the vast majority of organizations are still not in control of their unstructured data. IAG (Identity and Access Governance), as the core discipline of IAM (Identity & Access Management), reaches its limits when it comes to identifying, managing, and protecting unstructured data. It supports neither the identification of such data nor the fine-grained management of entitlements, e.g. on Windows file servers or in Microsoft SharePoint environments. Its focus is on cross-system management of identities and access, not on the level of granularity required for such environments.
Solutions that support the administrators of such environments, e.g. Windows administrators or SharePoint administrators, are commonly too technical and specialized for a single environment, while unstructured data sprawls across a variety of different systems.
Over the years, a specific market segment of tools has evolved, frequently called Data Governance, but also Data Access Governance or Entitlement & Access Governance. The goal of such solutions is to implement some form of governance and control for unstructured data, spanning the various systems that hold such data. Their focus is on identifying and classifying data and its owners, and on managing access entitlements for unstructured data.
Given that standard approaches such as IAG commonly fail to deliver these capabilities, while there is an urgent need for adequate tools, we consider Data Governance a mandatory element of a comprehensive IAM and Data Protection strategy, complementing IAG and other core components of IAM. Such solutions must support a broad range of repositories for unstructured data, in particular file storage both on premises and in the cloud.
MinerEye DataTracker enables organizations to automatically identify and classify data across a variety of data sources. It uses applied AI (Artificial Intelligence) and machine learning technologies to classify data based on binary, pictorial, and textual patterns. In contrast to other solutions, it is not limited to analyzing text but can work on any type of data. MinerEye calls this approach “AI Powered Information Governance”.