Today, organizations capture trillions of bytes of data every day on their employees, consumers, services and operations through multiple sources and data streams. As organizations explore new ways to collect more data, the growing use of consumer devices and embedded sensors continues to fuel this exponential data growth. The result of this massive aggregation, collection and storage is large pools of data, often referred to as data lakes; collecting and storing the data remains the easiest step in the entire Big Data and BI value chain.
What is concerning is how little attention data owners, data privacy officers and security leaders pay to defining the scope for collecting and using this data. Very frequently, not only is the scope for using the data poorly defined, but the legal implications of non-compliant use remain unknown or are openly ignored.
A recent example that made the news was Facebook's storage of millions of user passwords in plain text. No data breach was involved, nor were the passwords abused, but ignoring a fundamental of password security, storing only salted, hashed values rather than the passwords themselves, puts Facebook in clear defiance of cybersecurity basics. The absence of controls restricting access to sensitive customer data compounded the violation of data privacy and security norms: the passwords were reportedly freely accessible to some 20,000 Facebook employees, open to potential abuse.
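To make that fundamental concrete, the sketch below shows the standard alternative to plain-text storage: keeping only a salted, slow-to-compute hash of each password. It is a minimal illustration using only the Python standard library; the iteration count and function names are illustrative assumptions, not Facebook's implementation.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # work factor; assumed value, tune to your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
```

With this in place, even an employee with read access to the credential store sees only salts and digests, not usable passwords.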
It is important for data owners, privacy officers and security leaders to know what data they have in order to classify, analyze and protect it. Obviously, you can't protect what you don't know you have. Data leaders therefore need a continually updated catalogue of data assets, data sources and the data privacy and residency regulations that the data elements in their possession attract.
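At minimum, such a catalogue records where each data asset lives, how it is classified and which regulations apply to it. A minimal sketch follows (Python; the field names and example values are illustrative assumptions, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    """One entry in a continually updated data asset catalogue (illustrative)."""
    name: str                   # e.g. "customer_profiles"
    source: str                 # originating system or data stream
    classification: str         # e.g. "public", "internal", "restricted"
    residency: str              # region where the data must remain
    regulations: list[str] = field(default_factory=list)  # e.g. ["GDPR"]

catalogue = [
    DataAsset("customer_profiles", "crm_export", "restricted", "EU", ["GDPR"]),
    DataAsset("clickstream_events", "web_sdk", "internal", "US", ["CCPA"]),
]

# One question the catalogue should answer on demand: which assets attract GDPR?
gdpr_assets = [a.name for a in catalogue if "GDPR" in a.regulations]
print(gdpr_assets)  # ['customer_profiles']
```

The value is less in the data structure itself than in keeping it current as new sources and regulations appear.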
Most Big Data environments comprise massive sets of structured, unstructured and semi-structured data that can't be processed with traditional database and software techniques. Processing that data across distributed nodes puts it at risk when the interactions between the nodes are not secured. A lack of visibility into information flows, particularly for unstructured data, leads to inconsistent access policies.
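Securing those inter-node interactions typically means authenticating peers and encrypting traffic in transit. In Hadoop-style stacks this is largely a configuration exercise, but the underlying idea can be sketched with mutual TLS using the Python standard library; the certificate paths, hostname and port below are placeholder assumptions.

```python
import socket
import ssl

# Illustrative sketch: a worker node opening a mutually authenticated TLS
# channel to a peer node, so inter-node traffic is encrypted in transit and
# both endpoints are verified against the cluster's own CA.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile="cluster-ca.pem")
context.load_cert_chain(certfile="node.pem", keyfile="node.key")  # client cert for mutual auth

with socket.create_connection(("peer-node.internal", 9443)) as raw:
    with context.wrap_socket(raw, server_hostname="peer-node.internal") as channel:
        channel.sendall(b"partition-7:checksum-request")
```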
Business Intelligence platforms, on the other hand, increasingly offer capabilities such as self-service data modeling, data mining and dynamic sharing of data content, all of which exacerbate the problem of understanding data flows and complying with data privacy and residency regulations.
Most data security tools, including database security and IAM tools, address only part of the problem and have their own limitations. With massive volumes of data arriving through multiple sources, including third-party data streams, it becomes increasingly important for CIOs, CISOs and CDOs to implement effective data security and governance (DSG) for Big Data and BI platforms, gaining the required visibility and an appropriate level of control over the data flowing through enterprise systems, applications and databases.
Some security tools and technologies that are commonly in use and can be extended to certain components within a Big Data or BI platform are:
- Database Security
- Data Discovery & Classification
- Database & Data Encryption
- UBA (User Behaviour Analytics)
- Data Masking & Tokenization (see the sketch after this list)
- Data Virtualization
- IGA (Identity Governance & Administration)
- PAM (Privileged Access Management)
- Dynamic Authorization Management
- DLP (Data Leakage Prevention)
- API (Application Programming Interface) Security
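As one example from the list above, tokenization replaces a sensitive value with a surrogate that has no mathematical relationship to the original, and keeps the real value in a separately protected vault. The sketch below is deliberately simplified (Python; a production deployment would use a hardened, access-controlled token vault, not an in-memory dictionary):

```python
import secrets

class TokenVault:
    """Illustrative in-memory token vault; real systems protect this store."""
    def __init__(self):
        self._vault: dict[str, str] = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random surrogate, reveals nothing
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In practice, gated by strict authorization and audit logging.
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")  # card number never reaches analytics
print(token)  # e.g. "tok_3f9c2a17d4b85e01"
```

Analytics and BI workloads can then operate on tokens, while only a narrowly authorized service can map a token back to the original value.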
Each of these technologies has specific limitations in addressing the broader security requirements of a Big Data and BI platform. Used wisely and selectively for the right Big Data and BI component, however, they reduce the risks of data espionage and misuse arising from those components and thereby contribute to the overall security posture of the environment.
Data governance for Big Data and BI is fast becoming an urgent requirement, yet it has largely been absent from existing IGA tools. These tools provide basic access governance, mostly for structured data, but lack built-in capabilities to support the complex access governance requirements of massive unstructured data sets, and they do not support the multitude of data dimensions required to drive authorization and access control, including access requests and approvals at a granular level (see the sketch below).
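Driving authorization from multiple data dimensions is, in essence, attribute-based access control: a decision combines attributes of the user, the data element and the request context. A minimal sketch follows (Python; the specific attributes and the rule are illustrative assumptions, not a standard policy model):

```python
# Illustrative attribute-based authorization check across several data
# dimensions; real deployments externalize such rules into a policy engine.
def authorize(user: dict, asset: dict, context: dict) -> bool:
    return (
        user["clearance"] >= asset["sensitivity"]          # classification dimension
        and user["region"] == asset["residency"]           # residency dimension
        and asset["purpose"] in user["approved_purposes"]  # purpose-of-use dimension
        and context["channel"] == "corporate_network"      # contextual dimension
    )

allowed = authorize(
    {"clearance": 3, "region": "EU", "approved_purposes": ["fraud_analytics"]},
    {"sensitivity": 2, "residency": "EU", "purpose": "fraud_analytics"},
    {"channel": "corporate_network"},
)
print(allowed)  # True
```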
Security leaders are therefore advised to work with application and data owners to understand the data flows and authorization requirements of their Big Data and BI environments. Besides practicing standard data sanitization and encryption, they should evaluate the right set of existing data security technologies to meet urgent Big Data and BI security requirements, and build out additional security capabilities over the long term.
At KuppingerCole, we use our standardized Strategy Compass and Portfolio Compass methodology to help security leaders assess their Big Data and BI security requirements and identify priorities. The methodology also helps leaders rate available security technologies against those priorities, ultimately providing strong, justifiable recommendations for the right set of technologies. Please get in touch with our sales team for more information on relevant research and on how we can help you secure your Big Data and BI environment.