Hello, and good afternoon, good evening, or good morning, depending on where you're tuning in from. Welcome to this special webinar. My name is Paul Fisher. I'm a lead analyst with KuppingerCole. The title of this is "Data governance is more than good housekeeping". So hopefully we'll give you an idea of what I mean by that and why data governance is assuming new importance in the age of identity management as well. So just a few little housekeeping notes. You as a listener are muted. You don't need to do anything. You don't need to mute or unmute yourself.
There is a chance to ask questions, and answers will hopefully be given at the end of the webinar. You can enter those questions by using the panel in the GoToWebinar control panel. This webinar is also being recorded, so if any of your colleagues want to listen to it, or if you like it so much that you want to listen to it again, it will be available on the KuppingerCole website. So our time today is about half an hour, give or take, so not taking up too much of your time.
So that should give enough time for me to tell you what I think and also to get any questions in. So let's move on. So I'm going back in time here now. The advertisement you can see on the left is something that I've always remembered from when I was a teenager and was very heavily into audio, or hi-fi, as we used to call it back in the 1970s. And at that time, there was a company called Linn, which still exists actually, and they had a record player, a turntable, called the Linn Sondek.
And they took the phrase garbage in, garbage out, which had originally come from computing circles, and hit on an idea that was contrary to received hi-fi wisdom at the time. That wisdom held that the source of your music was not as important as the amplifier and especially not as important as the loudspeakers. They decided, which now seems rather obvious, that if the source isn't any good, then the amplifier is only going to amplify a bad source or a bad signal, and the loudspeakers will recreate that bad signal.
So they came up with this advert, which, as I said, has stuck in my mind, mostly because of the headline. And we seem to have gone full circle, because Linn took that phrase from computing, but now we're in a situation where garbage in, garbage out very much applies to the way that we handle data and the way we use data in our organizations. There are a couple of other interesting things about that advert. One is that that's what our bins, in the UK at least, actually used to look like. Everybody just piled everything into a bin.
There was no thought of separation or recycling, and everything was thrown in the wagon. And the other interesting thing about that advert is the amount of copy there is on it. You don't see adverts these days that have that much copy in them. So there's sort of a lost art of copywriting. But never mind all that. Let's get into what we call garbage data creators: the things that create bad data, so bad signals, in the first place. Obviously the first one is data entry errors, which will often be human errors where data is entered incorrectly in the first place.
But then you also have, and this is where governance starts to come in, different standards, different policies, and different processes for data validation or verification across multiple systems. And multiple systems is the crucial phrase there, because today in our businesses and computing we do indeed have multiple systems. We have multiple architectures. We have multiple cloud servers. We have all sorts of different environmental factors going on in the way that we compute. In other words, in the way we do business.
When companies get together, when companies merge, or even when companies demerge, data is thrust together, and then you have a doubling up of the first two problems, data entry errors and different rules of data validation, which makes the situation more complicated. And that lack of consistent interpretation across systems within a complex business process leads to more confusion, more bad data, and a general sense of being out of control. So ultimately, the absence of a data governance program, organization, or department is the crucial factor.
Because if you don't have data governance, if you don't have any oversight over what is happening with your data, then you're going to be in trouble. And many, many organizations will be in this exact position right now. They will be creating data and they will be creating new places to put that data. So no presentation about data is complete without some big number. You can go online and get various estimates of how much data will exist in the world in a few years' time.
Well, actually, next year, there will be, according to one source, 175 trillion gigabytes. Wow.
Well, the thing is, it's really hard to actually visualize what that is when you get beyond a certain point. All you need to know is that it is a massive number. How does that actually relate to you in your organization, to you trying to govern data, trying to govern those who access the data? You might as well be looking at a picture of a banana, for all the relevance that has to your day to day. So let's look at something more realistic, which gives you a fair illustration of how much data is being created on a daily basis. So this is actually worked out.
So take a company of around 1,000 employees, or 1,000 people, not necessarily employees, of course. They could be third-party workers, contract workers, et cetera, which is another thing you have to factor in when you're talking about data. It's estimated that those people would create about seven and a half megabytes of email data every day. And one thing that people often forget about data when they start to try and control it is that it's not just data in the traditional sense, held in databases.
It's also unstructured data, which is created all the time. Unstructured data exists in emails, in social media, in Messenger, in office documents. So that's 7.5 megabytes extra each day from email. And then your office documents, that's reckoned to be about five megabytes each day. Then your traditional database transactions account for probably another 500 megabytes a day. So that's still a big number, and that's still where a lot of the data is going.
Then collaboration tools, so things like the ones I just mentioned, Office, but also monday.com or the other collaboration tools that now exist. They are already outstripping email in terms of how much data is produced, because increasingly we are using things like Teams and Slack to communicate and also, unfortunately, to store more data. You now have a situation where you can have data or documents stored in email, with the same documents duplicated in Teams and perhaps duplicated somewhere else on another file sharing application.
Cloud applications: a bit of an underestimate, I think, but it says here two megabytes per employee per day. It's probably more than that. All of that means that a business is going to be producing 24 gigs of data in a single day, which sounds manageable, but it isn't. To keep control, you'd have to take that 24 or 25 gig of data each day and govern it, and that means audit it, decide where it should go, who should have it, whether it's useful, whether it should be deleted. But of course, that doesn't happen. That simply doesn't happen.
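As a rough check on the figures just quoted, here is a minimal sketch of the arithmetic. It assumes the email and office document figures are per person per day (the talk only states that explicitly for cloud applications), uses decimal units, and treats whatever is left over from the quoted 24 GB as the collaboration tool share, for which no number is given. The names are illustrative only.

```python
# Daily data volume for a 1,000-person company, using the figures quoted in the talk.
# ASSUMPTION: email and office document figures are per person per day; the talk
# only says "per employee" explicitly for cloud applications.
PEOPLE = 1_000
PER_PERSON_MB = {"email": 7.5, "office_documents": 5.0, "cloud_apps": 2.0}
DATABASE_MB_TOTAL = 500.0   # database transactions, whole company per day
QUOTED_TOTAL_GB = 24.0      # total quoted in the talk
MB_PER_GB = 1_000           # decimal units, an assumption


def known_daily_mb(people: int = PEOPLE) -> float:
    """Daily volume in MB from the sources that have a quoted figure."""
    return people * sum(PER_PERSON_MB.values()) + DATABASE_MB_TOTAL


def implied_collaboration_mb_per_person(people: int = PEOPLE) -> float:
    """MB per person per day left over for collaboration tools,
    if the quoted total is taken at face value."""
    residual_mb = QUOTED_TOTAL_GB * MB_PER_GB - known_daily_mb(people)
    return residual_mb / people


if __name__ == "__main__":
    print(f"Known sources: {known_daily_mb() / MB_PER_GB:.1f} GB per day")
    print(f"Implied collaboration share: "
          f"{implied_collaboration_mb_per_person():.1f} MB per person per day")
```

Under those assumptions the quoted sources add up to 15 GB a day, which leaves about 9 MB per person per day for collaboration tools. That is at least consistent with the remark that they already outstrip email.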
And in case you're wondering whether this is all about governance and just doing stuff to please auditors or compliance managers, here are a couple of real-life use cases from small businesses where the companies actually did think about data governance, and they did do something about it. They took the route of finding market data and ways that they could use their data to improve the business. So the Australian pool company, Narellan Pools, used data in relation to temperature to see when customers started thinking about buying pools.
So obviously, probably when the temperature started going up, then more people would think about buying a swimming pool or perhaps servicing the pool that they have. This is not a big problem in the UK, I think. I have to say that we don't really have many swimming pools in our back gardens for good reason, because it's always too cold. But in Australia, lovely, lovely weather there. But on certain days, they discovered that they had an 800% increase in conversion compared to the everyday. So it was really useful data.
And that was obtained through having a data governance program specifically for them. A completely different kind of company, the Canadian Opera Company, managed to increase the conversion rate on its sales calls from 15% to 50% by targeting customers when they were most likely to have opera on their mind. So they looked at various data that showed when exactly those people might be thinking about going to the opera. And they also used it to develop customer profiles based on web visits, purchasing history, et cetera. So, as I said, it's not just about putting data in its place.
It also has a very tangible business benefit. And that feeds into how you allow people to use data: who uses the data and who has access to it. So it's not just about data governance. It's also about identity and access. And as you'll see later, I've developed some theories around that. So here's a nice graph from the Harvard Business Review. And basically, before you can do anything, you need to think about your strategy. What do you actually want to do with this data?
Now, it could be that, like the pool company or the opera company, you want to find insights into customer behavior. Or it could be that you simply want to ensure that data is protected. Probably both of those things. But you need to have a strategy in the first place as to what you want to do with the data, how you're going to protect it, how you're going to look at it, who gets to look at it, et cetera. So you need, quite simply, a data governance program.
So you need to think about the data that goes in and the data that comes out, so that it's not garbage going in and it's not garbage coming out. So we get back to that garbage in, garbage out philosophy. And then you need to think about the value of data. How much of the data that is being generated is actually worth anything? Because if it's not doing anything, if it doesn't say anything, if the data is of no value, it's of no business value either. But it might have things in it, for example a phone number or an email address of a customer, that need protecting.
So it might not actually add any value to the business, but it would certainly take away value if that data was lost. So if you look at the grid here, they've divided it into defense and offense. Defense is what I've just said about keeping data secure, and offense is the positive side of that: improving competitive position and profitability. The core activities of defense are to optimize data extraction, standardization, storage, and access. For offense, to get value, you need to optimize your data analytics.
You couldn't find out when people are more likely to buy a swimming pool or go to the opera if you didn't have good analytics to do that. So you need modeling, visualization, transformation, enrichment. They're big words, but a lot of that stuff can be found in modern data governance platforms. And a lot of this can be done with the help of AI, particularly these days. Then data management orientation. For defense, you need control. For offense, you need flexibility. You need to be able to get at that data to get the business value that I've been talking about.
And finally, the enabling architecture. So defense would always have a single source of truth. But if you want to analyze that data, you're likely to get multiple versions of the truth. The single source of truth says: this data is valuable. This data contains personal information. It contains financial data. It cannot be left unprotected. And it cannot be accessed by people who don't have, or shouldn't have, access to it.
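To make the single source of truth versus multiple versions of the truth idea a little more concrete, here is a minimal sketch. It assumes a protected master record that holds personal fields, from which a reduced copy is derived for analysis. The field names and the `analysis_view` helper are hypothetical and do not come from the HBR grid or from any particular data governance product.

```python
# ASSUMPTION: which fields count as sensitive would come from your own
# data classification; these names are purely illustrative.
SENSITIVE_FIELDS = frozenset({"name", "email", "phone"})

# The single source of truth: one protected master record per customer.
master_record = {
    "customer_id": 17,
    "name": "A. Example",
    "email": "a.example@example.com",
    "phone": "000-0000",
    "purchase_month": "2024-01",
    "amount": 120.0,
}


def analysis_view(record: dict, sensitive: frozenset = SENSITIVE_FIELDS) -> dict:
    """Return a copy of the record without sensitive fields.

    The master record is left untouched; the copy is one 'version of
    the truth' that can be handed to analysts or fed into a model.
    """
    return {key: value for key, value in record.items() if key not in sensitive}


if __name__ == "__main__":
    print(analysis_view(master_record))
```

The design choice in this sketch is that the derived view is always a copy, so analysis never touches the protected record itself.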
Multiple versions of the truth is when you extract the bits you need from that data and give them to the right people, so that it stays secure and protected but can be analyzed, put into AI, or put into some kind of processing so you can get some answers out of it. Another way of looking at it is this chart here, also from HBR, which I won't go into in detail on this webinar.
But it's certainly worth your time afterwards taking a look at this and seeing how data sources, data storage, and then the analytics come out with business, customer, and partner outcomes, which should benefit the business. I think that's a very nice graph showing how good data works, how good data can be stored, et cetera, and then the analytics. So the analytics are descriptive, diagnostic, predictive, prescriptive.
So in this particular business focus model, descriptive captures a product's condition. Diagnostic can show why perhaps something isn't selling, or why people aren't buying at a particular time of day. Predictive detects patterns. That's probably the most crucial thing that you can do with data for business. And then prescriptive is when you say, okay, we should hit the phones when people are more likely to be thinking about going to the opera.
So, again, you can find the full article on the HBR website. So let's now bring this a bit more into focus on where KuppingerCole sees identity and data working together. This is a slide that I use in many, many presentations, and it shows basically how identity flows through a modern business to get to resources. Part of those resources is data, and you can see that in any typical business there are likely to be seven identity types that will want access to the data. So we have end users, obviously, third parties, increasingly, customers, increasingly, and then we have machines.
So AI, for example, will start to look at the data on your behalf. And then we have the classic access management platforms: PAM, CIEM (Cloud Infrastructure Entitlement Management), and Identity Access Management. And, a little bit off-center this, we are starting to see identity threat detection and response tools coming into what I call the identity zoo. ITDR is very much focused on defense, on defending identities from attack in the most visceral sense, whereas PAM, CIEM, and IAM are much more sophisticated tools in terms of managing access to data.
And then data is obviously found all over the place, as I've already said. So it'll be in platform as a service, infrastructure as a service, in clouds, on servers, containers, virtual machines, in every application that people want to use. So those resources, files, apps, workloads, et cetera, are all on the right.
And then the foundational elements of this. When we talked about identity flow and management in modern business previously, we probably wouldn't have talked about data governance so much. But I believe, and this is what I'm getting at in this webinar, that we can no longer see these in isolation. Identity and access management, identity flow, and the resources all need to be governed.
We need to know, or you need to know, what's in your resources before you can start thinking about, on the left side of this diagram, the access policies and access criteria that you give to identities to get access to that data. Until you know that, what data you have, until you know what it is, where it is, et cetera, until you know the value of that data, it's very difficult to start sorting out who gets access to it and who doesn't. And I think that's where we need a coming together of traditional identity access and data. And I've named this sort of paradigm identity access for data.
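The ordering described here, know what the data is before deciding who gets to it, can be sketched as follows. This is an illustration under stated assumptions, not a KuppingerCole model or any vendor's API: the `Sensitivity` levels, the `Resource` and `Identity` types, and the deny-by-default rule for unclassified data are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    """Hypothetical classification levels, lowest to highest."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Resource:
    name: str
    # None means the resource has not been discovered and classified yet.
    sensitivity: Optional[Sensitivity]


@dataclass(frozen=True)
class Identity:
    name: str
    kind: str               # e.g. "end_user", "third_party", "customer", "machine"
    clearance: Sensitivity  # highest level this identity may read


def may_access(identity: Identity, resource: Resource) -> bool:
    """Allow access only to classified data at or below the identity's clearance."""
    if resource.sensitivity is None:
        # Until the data is governed, no sensible access decision can be made.
        return False
    return identity.clearance.value >= resource.sensitivity.value


if __name__ == "__main__":
    contractor = Identity("contractor-1", "third_party", Sensitivity.INTERNAL)
    payroll = Resource("payroll.xlsx", Sensitivity.RESTRICTED)
    old_share = Resource("legacy-share/export.csv", None)
    print(may_access(contractor, payroll), may_access(contractor, old_share))
```

The point of the sketch is the first branch: an unclassified resource gives the policy nothing to decide on, so the only safe answer is to refuse.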
And I've taken data now to be a broad description of resources. So it is pure data, but it is also data that is used in applications. Basically, it's the resources that identities need access to. So IAD is kind of like a wrapper for identity access and for data governance. I've brought the two together. Now, this is quite a new idea. But even if you don't accept the identity access for data paradigm, there are some things that would work right now.
And the things you have to do, very simply: okay, I understand that when people like me do short presentations, we tend to simplify things, and within these nine areas there is obviously a lot more detail, a lot more work. But the first thing you need to think about is assessing your company culture, assessing how data flows through the business and where it's stored. The kind of business you're in will seriously affect your data governance blueprint or your data governance strategy, whatever you want to call it.
You need to think about things like whether a self-service data framework is going to be important to you. And then once you've kind of sorted out the framework and the policies, the culture, and what you want to do with your data, then it's on to discovery and ordering and analyzing. You need to find that dark data. You need to sift through unstructured data versus structured data. Unstructured data is growing more quickly than structured data. And then you need to realize that data governance is a continuous process. It's not something you can do once.
Given, like I said, the amount, the 25 gigabytes of data each day for a company, it's something that is going to need doing on a continuous basis. And that's where data governance platforms, which I'll come to in a minute, will help. So you need to think about a data culture in the business as well. Think about how data governance is integral to everything you're doing. It's not a separate job, just like identity and access is no longer separate. It's not a case of, okay, we've sorted out identity, because identity is integral, because identity is computing. Data is computing.
The two together are basically everything that we're doing in modern organizations. And then, you know, you need to think about things like synthetic data and data lakes, all of that. Not everyone's going to have to think about all this stuff, but it's important.
And again, like I said, I've simplified things here, but it's a start. So let's just talk quickly, while I have time, about the Leadership Compass that I wrote and that was published quite recently, which looked at the various types of vendors in data governance that can help you with a lot of this stuff. And you can see that we have 13 vendors. Is that right?
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13. So they're all there in the report. Now, I've put on a QR code, which, if you've got your phone with you, you should be able to scan, and it will bring up the report for you. I don't know if you can see that, but what I'm trying to do there, anyway, is show you the Leadership Compass.
As I said, if you scan the QR code, you can go directly to the Leadership Compass that I produced, and you can get a free 30-day trial. So plenty of time to look at the report, see what I have to say about these guys, see which ones we think are most suitable. As with all of our Leadership Compasses, don't just look at the leaders immediately, because all of these guys here have something of value to offer to organizations that want to make sense of data governance. And that is the end of my presentation. I think I'm just about on time. Let's just see if there are any questions here.
None have come in so far. So I'll leave the option open just for a few more minutes in case anybody wishes to ask a question. Here's some related research. The Leadership Compass, again, you can click here and read it. We also did a separate Leadership Compass on data security, which is also available. And then there's one of our videos or podcasts, which talked about the integration of IGA and data governance, which is a little bit of what I've been talking about, the new paradigm of identity access for data. So it doesn't look like we have any questions. That's okay.
I appreciate your time with me this afternoon. I hope that this was of benefit to you. The overall message is, and I think I've said it already, think about data governance and identity access as integrated elements, and think of them both as critical to creating a more secure, but also more valuable, business environment and data environment. So with that, I'll say thank you very much. And if any of you want to get in touch, then you can email me directly, pf at kuppingercole.com. And with that, I'll say goodbye. Thank you so much.