Session at the European Identity & Cloud Conference 2013
May 16, 2013 11:30
The next talk will again be given by Mike, who will now discuss the ownership or stewardship of data, sorry, of information, of course, in that context.
Thank you. Yes. So now I'm going to take all the fun out of big data. This is the security part of it. I've called this information stewardship, and if you follow KuppingerCole's analyst reports, you'll see that late last year I, together with my colleague Dave Kearns, produced a report on information stewardship called "From Data Leak Prevention to Information Stewardship".
This has come from the notion that we've had a lot of thoughts about technologies and processes and reasons why we should be able to manage information leaks, but they haven't really succeeded in preventing them. So what is it that's needed? This word, stewardship, in English comes from the notion of a servant, a servant who was a steward. The steward's job was to look after the property that his master owned. So being a good steward is looking after something that you don't own.
And since most of us have access to, and touch or manipulate, data that we don't own, we really ought to treat it in that way and be good stewards, and make sure that we don't lose it, we don't damage it, we don't destroy it. So how does that concept apply to big data? What I'm going to talk about is some of the threats, what information stewardship means and how things are changed by big data, some of the technology challenges, and then how, in order to secure big data, what you really need is information-centric security.
So let's look at the risks and threats. Basically, the things that can go wrong, the threats and the risks, are just the same as for ordinary data, for ordinary information. When you look at them, although there is always a technology component, it's like the Americans say: it isn't guns that kill people, it's bad people with guns. So there is always, it seems, a human factor in it. I've put these root causes in basic human terms. For example, there are the malicious reasons for security risks.
Those come from people who are wanting to steal your data, and they may be insiders or they may be outsiders. To give examples, we talked earlier on about these advanced persistent threats, coordinated threats, and so forth, but they also come from the insider who is in fact wanting to exploit your information for their own purposes, or to take the information and sell it to somebody else. Then there is misuse, which is not quite the same as malice. It may be that you didn't realize; you just thought it was okay.
It is a way of using things that you have been validly given, like, for example, your privilege as a database administrator, which you abuse through curiosity, for example to see what certain people are doing, or what they earn, or what disease they suffer from. And then a very wide area is to do with mistakes, which often come from people who are trying to do the right thing. There's a whole interesting set of stories about people making mistakes, and interestingly, it doesn't have to involve technology.
I follow this, and it looks like Downing Street, which is where the Prime Minister and the Chancellor of the Exchequer live in the UK, is a dangerous place to carry a document. If you walk down Downing Street carrying a document, there are all these photographers, the paparazzi, with very powerful cameras, who take photographs of it. Every couple of months, some government person goes wandering down Downing Street carrying a secret paper which is visible externally. And of course the photographers photograph it, and then they blow it up.
And then they say what an idiot the chap is. So that is a kind of accidental disclosure that doesn't involve the use of IT at all, but those things apply equally to big data now.
So what is it about big data? Big data actually makes it more difficult to keep a secret. If you own information, if you have trade secrets, then it may be possible for those trade secrets to be found by looking at and aggregating data about your organization. If you have a secret recipe for your product, can somebody figure it out by looking at all the information to do with the volumes of stuff that your business buys? So big data gives people the opportunity to draw useful information from what seemed like mere aggregations of multiple facts.
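The secret-recipe point can be made concrete with a small sketch. It assumes, purely for illustration, that an outsider has collected a company's ingredient purchase volumes over several orders; the ingredient names, the quantities, and the `estimate_recipe` helper are all hypothetical and do not come from the talk.

```python
def estimate_recipe(purchases):
    """Estimate ingredient proportions from aggregated purchase volumes.

    `purchases` is a list of dicts mapping ingredient name -> kilograms
    bought in one order. No single order is secret, but the totals
    reveal the ratio in which the ingredients are consumed.
    """
    totals = {}
    for order in purchases:
        for ingredient, kg in order.items():
            totals[ingredient] = totals.get(ingredient, 0.0) + kg

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing was bought, so nothing can be inferred
    return {name: kg / grand_total for name, kg in totals.items()}


# Hypothetical purchase records; the values are invented for illustration.
purchases = [
    {"flour": 600.0, "sugar": 300.0, "cocoa": 100.0},
    {"flour": 1200.0, "sugar": 600.0, "cocoa": 200.0},
    {"flour": 900.0, "sugar": 450.0, "cocoa": 150.0},
]
recipe = estimate_recipe(purchases)
for name, share in sorted(recipe.items()):
    print(f"{name}: {share:.0%}")
```

Each order on its own looks like an ordinary commercial fact; it is only the aggregation that exposes the proportions.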
And like it says here, big data can also be used by the bad guys. I talked about that in the earlier talk: they can use it to improve their techniques for stealing your identity, to refine their exploits, and to analyze how well they have been performing when they've been trying to steal data. So that's, if you will, the twist in the tail. I'm now going to talk about stewardship, which is what I introduced at the beginning, which is looking after property which is not your own, and which certainly applies to information and to big data. Information stewardship isn't a new idea.
It has been written about certainly since the 1980s, and I've given you an example there, which came out in 1999, of a man who was writing about information stewardship. But the focus in the past has really been on, if you will, the right-hand side of that circle. It's been to do with the architectures of data, it's been to do with information quality, and those are serious issues in traditional data. In fact, they become even more serious in the question of big data. One of the areas that perhaps was neglected in all of this was the left-hand side, which is to do with security. And so we in KuppingerCole are saying that there should be a great deal more attention paid to the information security end. But this all starts from the information lifecycle.
One of the interesting things about big data is that it actually turns the normal lifecycle on its head. The way that we've organized ourselves tends to have been around the notion that there is a data owner, an information owner, whom you can identify in the organization, who is responsible for defining the acceptable uses of that data and who should actually have access to it for what business purpose. In this classical data lifecycle, information lifecycle, you see that's what happens.
Now, with big data, it may not be like that, because a great deal of the stuff that's called big data comes from outside of the organization, so you don't own it. And so you don't necessarily have this owner whom you can immediately make responsible for things. The other problem is that you may not know what you are going to find in it until you've actually done the analysis. So you've got this acquisition process, you've got an analysis process, a discovery process, which then leads to what you're going to use it for.
So it kind of alters the whole perspective of how you have to look at information and security. I've put on this slide what I think are the top challenges. How do you secure the big data infrastructure, because a lot of it is outside of your control? It's like cloud, so to speak.
And what confidence do you have in the architecture and the implementation of this new generation of hardware and software which is involved in this? Basically, these things are as yet unproven pieces of technology. When you are analyzing the data, how do you know that the data is coming from where you think it is coming from, and how do you have some kind of access control? Now you may say, oh, that's not necessarily a big thing.
But when I talked about the smart meters in the previous talk, one of the key issues that the people who are using smart meter data have is being able to be sure that the data packets that come from one particular smart meter can be identified without any ambiguity and without any doubt. And that itself, interestingly, raises a whole series of issues around data provenance. For example, what technology do you use?
I saw a previous talk where we were talking about whether you can use a SHA-1 key to sign a document. Is that going to be good enough? How long does a smart meter last? Probably 30 years. Are you going to be sure that the encryption technology, the signing technology, the certification technology that you are using today is going to be good enough for that? So that is a kind of simple example around data provenance.
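A minimal sketch of the provenance check described here, under stated assumptions: the talk does not name a mechanism, so this uses a keyed hash (HMAC with SHA-256) from the Python standard library rather than a public-key signature, and the per-meter secret, the field names, and the helper functions are hypothetical. An HMAC only proves origin to a party that shares the key, and the choice of SHA-256 is subject to exactly the 30-year longevity question raised above.

```python
import hashlib
import hmac
import json


def sign_reading(meter_key: bytes, reading: dict) -> str:
    """Return an authentication tag binding the reading to the meter's key."""
    # Serialize deterministically so sender and receiver hash the same bytes.
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(meter_key, payload, hashlib.sha256).hexdigest()


def verify_reading(meter_key: bytes, reading: dict, tag: str) -> bool:
    """Check that the reading is unmodified and came from the key holder."""
    expected = sign_reading(meter_key, reading)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)


# Hypothetical key and data packet, invented for illustration.
key = b"hypothetical-per-meter-secret"
reading = {"meter_id": "M-0001", "timestamp": "2013-05-16T11:30:00Z", "kwh": 12.5}
tag = sign_reading(key, reading)

tampered = dict(reading, kwh=1.0)  # same packet with a spoofed consumption value
print(verify_reading(key, reading, tag))
print(verify_reading(key, tampered, tag))
```

A spoofed or altered packet fails verification because its tag no longer matches, which is the property the network operator needs before trusting the data.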
And once people realize that there is an ability to spoof data, then you could unsettle, if not bankrupt, your competition by feeding them data that they think is about what's happening with their customers, but which in fact is just rubbish that you've been sending them. So there is this whole area to do with analysis and compliance, from being sure that you know where your data is coming from, to having a scalable way of enforcing any privacy that surrounds it, through to being able to audit what you are doing.
And my colleague, Dr. [name inaudible], will be talking about the privacy issues in the subsequent speech. So, this provenance and ownership: if you don't actually control the collection process, how can you be sure of the source and the ownership of that data? This is specifically written down. These are quotes from the network operators' documents in the UK, the planning documents that they have surrounding the implementation of smart meters: that they need to be sure where that data is coming from. And the same is true of the aircraft engine.
The same is true of the mining equipment. The same is true of people who are doing wind turbine planning, or whatever, or if you are taking Twitter feeds for that. And just to give you an interesting aside, I didn't talk about this in the previous talk, but what do you think people do when they're watching television?
Well, there was a BBC study which said that when people are watching television, most of them are either Twittering on their iPads or playing games. Now that's an important insight, and it's actually got very big relevance to big data, because that information is being used by the UK TV channels. The UK TV channels are actually monitoring tweets and game downloads and online gaming, and correlating that information with the TV programs that they have.
And they're using the analysis of that data to better target adverts during those programs at what it is that people are going to be interested in. Now, as Martin said in the previous session, they haven't yet got round to the detail of being able to send you an individual advert, but what they are doing is aggregating that information. So they can say, well, people watching this program are more likely to be interested in buying this product, and they use that to preferentially sell their advertising time to organizations in that way.
So that then takes you round to the question of, well, who owned those tweets? Who had the right to use them in that particular way? And in the case of smart meters that I put there, there is the issue: can the network operator be sure that they have the permission of the people that own the meter to use that data in this particular way? That's another issue that will be brought up in the next talk. There we are.
So those are provenance and ownership of data. Then we have the technology challenges, and let's look at some of these particular things. First of all, I'm not sure if everybody understands the architecture of Hadoop, but what Hadoop is, is you have a rack of what are apparently commodity Wintel-type processors with disks associated with them. Your data is imported into this, and it is fragmented and spread around, and it's replicated as well.
So this isn't a long-term storage device, but it is a parallel processing device, so that you can basically divide up the processing into little chunks, each of which is done by one of those processors. There's enough resilience and redundancy in it so that if one of the things fails, the thing will carry on. When each of them has done its job, it will merge the results together, just like I said at the very beginning, a sort and merge. But this is just commodity hardware.
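The divide, process, and merge pattern described here can be sketched on a single machine. This is only an illustration of the idea and not how Hadoop itself is implemented: threads stand in for the rack of processors, there is no replication or failure handling, and the function names are assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Fragment the input into n_chunks roughly equal pieces."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def sort_chunk(chunk):
    """The small unit of work done independently by each worker."""
    return sorted(chunk)


def parallel_sort(records, n_chunks=4):
    """Sort by splitting the data, sorting each piece in parallel, then merging."""
    chunks = split(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        sorted_chunks = list(pool.map(sort_chunk, chunks))
    # heapq.merge combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_chunks))


print(parallel_sort([5, 3, 9, 1, 7, 2, 8]))
```

Note that the final answer depends on every worker having done its job correctly, which is why the questions that follow about certification, access control, and verification of results matter.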
And, well, you know, is it full of things coming from foreign parts? I don't know, but nobody has had time yet to understand what the risks might be. Can you be sure that if you give it a processing job, it actually does it? Has it been certified or verified in some way or another? Do you have a way of controlling access to it? When the data is taken away, if you use something like Elastic MapReduce, can you be sure that your data has properly gone? Do administrators have access to this thing? Is the data properly erased?
And how can you audit what's happened? If it gets a particular answer, how can you be sure that that answer is correct? So you've got that, and you have potentially the further problem that you could have been doing all of that using, like I said, Elastic MapReduce in the cloud, and everything that you had to worry about with the cloud now applies to this processing. Generally speaking, it is not intended for long-term storage, but the transition through it could in fact leave residue. And how do you actually verify the results that come out?
You could very obviously put yourself in a difficult position if the results are wrong. So having a way of verifying your answers through some kind of other technology is an advisable approach.
So what do we need to do to achieve information stewardship? One of the problems that I see in organizations is that there is a loss or a lack of understanding within the organization of the true value of the information that they own. Whereas in the 19th century capital itself was really sufficient to build an organization, now you can easily get capital. What makes the difference is the intellectual property, the knowledge, the information that you have. Basically, the four pillars that Martin talked about in the opening keynote are that you need good information governance, and that means understanding what the business need is for this, and then implementing some kind of best practice.
Well, that's an interesting challenge, because we don't really have the best practice, but we might understand the business need of why we're doing it. With big data, you also need to manage the information lifecycle, which is knowing what you have. And I've explained to you some of the challenges that we have around that with big data: it's coming from outside, you're not sure where it's coming from, and you may not know what you're going to do with it until you've actually processed it.
What you can do, though, is make sure that you create an information security culture. That's all about changing the perception of information security within the organization from being something that sucks all the fun out of the universe into something which is truly a business enabler, and making sure that you have organized yourselves in such a way as to best implement those things. So what we believe in KuppingerCole is that the key to this is information-centric security.
So much of what has been written, and continues to be written, about information security in IT is written in technology terms. The answer is all seen in terms of: buy another piece of this, have another piece of software, put in a bigger firewall, or whatever. But in fact, there are three key parts to this. One is what security is, and we all know that: confidentiality, integrity, and availability.
We then have to have processes that support that, and the processes with big data are to do with understanding the purpose, the process for acquiring it, the provenance, and so forth, and making sure that those processes include a capability to audit what is happening.
And then you have a series of elements which are to do with making sure you understand the provenance of the data, making sure you understand how you can manage privileged access to it, and making sure it is in fact properly stored in a way which is secure, where you have control over it and you can actually get back what it is you put in. And then, really, what is the data flow? Are you in fact encrypting the data flow? Is there a way of making sure that what arrives is what you expected it to be, and that it hasn't been tampered with along the way?
So all of those things are elements which need to be implemented. And in terms of what it means, you have to secure this infrastructure, you have to secure this analysis process and you have to have what I call assured compliance. And this slide is illustrating some of the components which go into that.
Now, in fact, if you're really interested, we have a paper which goes into this in more depth, which you can download from the KuppingerCole website. So that, in essence, is how I have taken all of the fun out of big data and put information stewardship and security back at the center of it. So thank you very much, everyone.
So one big issue, as we discussed over these days, is that we may not know in advance what we're going to look for in big data. Martin was also pointing this out. Everything is real time in an ideal big data scenario. So we don't plan in advance. We don't plan the storage, we don't plan the data availability. We may expose new APIs on the fly with queries we may not imagine right now. So the question is, how can information stewardship apply, or what would you do to prepare an organization to cope with that real-time issue?
Well, I think the point is that what I see is that many organizations are already ill prepared to deal with what they have at the moment, and actually making a better job of the way you understand and manage information security at the moment is the best foundation that you can lay for big data. If you don't already have good processes, if you don't have a culture that you've inculcated in your employees, associates, and partners that says that information matters, if you haven't got an organization that's concerned about it,
if the basic people that are running the organization don't understand the value of the information they have, then it's only going to get worse when it comes to big data. At the moment, all I can say in terms of reassurance is that the reality of big data today is a long way short of this vision that we've been putting forward of being able to get value instantly out of a real-time feed, for example. Take the example I gave of the television channel; they're doing that.
And what that actually means is they literally, once a day, reduce a 20 billion line table to a 5 million line table, which they then put through a standard business analytics tool. That's, at the very minimum, a 24-hour delay. So the best that I have seen in terms of the use of big data has been to do with the monitoring of online reputation, where virtually every organization now has people, it's the human intelligence component of big data, monitoring Twitter feeds for comments about their product.
One of my acquaintances bought a mobile phone. He was so pleased with the service that he received that he tweeted what a good time he'd had in this shop, buying from this particular company. And within two hours, he'd received an email and a phone call thanking him for his comments. So that is two hours from detecting and seeing what is happening on Twitter about online reputation to the thank-you.
Now, if they had got that wrong, then that could have been a big problem. And so that's the closest that I'm seeing organizations have actually managed to get towards that. But basically, you should already be prepared to have an information security culture.
Indeed, starting with an existing information security management program makes it necessary to define the information stewardship or ownership.
Yes.
But in advance.
But in advance, yes.
Right. So there are still things to understand and to learn about how to approach this real-time thing.
Well, that's right. But the point I'm making about information stewardship is that, although you may not know in advance what it is you are going to get from this data, if you have created that culture, then people will be sensitive to the fact that information is valuable and they will take care of it, even if they didn't realize what they were going to find in the first place.
And that's the key thing. It's having that view that, whatever the coinage, you know, if I have in my pocket euros, dollars, pounds, I was going to say Deutsche Marks, which are probably more valuable than any of those, if I have any of those currencies, I still know they're money and I take care of them. That's the point.
Okay. So I'm not totally convinced. I think there's still room for things you can discuss later on.
Yes. Yes.
Let's have a thank you, Mike. Thank you. Very good.