1 Executive Summary
An ever-increasing number of devices, sensors and people are connected to the global internet and generate, communicate, share and access data. The volume of this data has become so large in recent years that it is referred to as “Big Data”. However, in itself, Big Data has no meaning and little value; it is the analysis and interpretation of Big Data into Smart Information that creates value. This Smart Information has already proven valuable in a wide range of areas. However, there are also increasing concerns about the ethics of the way in which some data is collected and analysed. This analysis can affect individual privacy and safety as well as society itself. This report describes how better information stewardship for Big Data can address these challenges.
Smart Information is Big Data analysed to make it useful: for example, to improve effectiveness, to help make better decisions and to forecast likely outcomes more accurately. However, the value of this analysis can be compromised if the Big Data is not trustworthy, for example if its source is in doubt, its validity is uncertain or the right to use it is not known.
The normal information security challenges apply equally to Big Data. The infrastructure used to acquire, store and analyse Big Data needs to be secure. However, some of the technology that underlies the processing of Big Data was conceived to provide massive scalability rather than to enforce security controls. Good data management is essential to protect the value and veracity of Big Data; however, Big Data turns the classical information lifecycle on its head. Internally generated Big Data is often unstructured and may have no formal owner or classification. The provenance of externally acquired Big Data may be doubtful and its ownership may be subject to dispute. These factors also increase the ethical and compliance challenges around the use of Big Data.
To meet these challenges, organizations should implement information stewardship for Big Data. Information stewardship uses good governance techniques to ensure that Big Data is acquired, used and managed in ways that are ethical, compliant and secure.
The infrastructure for processing Big Data should be secured; it should be acquired, built, run and managed using the same disciplines as for other kinds of data processing.
The lifecycle of Big Data should be properly managed. Big Data should have an owner who is responsible for its classification and for controlling its use and lifecycle. Big Data that originates externally must be from verifiably trustworthy sources. Access to Big Data should be subject to the same rules of access governance as other data. There should be a clear policy for the retention of Big Data and processes to ensure that it is securely disposed of at the end of its life.
Good information stewardship helps to ensure that Big Data is used in ways that are ethical, compliant and secure. It requires that the potential impact of any use of Big Data, both on the organization and externally (for example on the data subjects), be assessed. The risks identified should be managed using controls that can be audited. KuppingerCole recommends that organizations implement good information stewardship for all data processing.