Welcome to our KuppingerCole webinar, "You Can Only Protect and Govern the Data You Know About." This webinar is supported by OneTrust. The speakers today are Sam Gilby, who is Data Governance Offering Manager at OneTrust (welcome, Sam), and me, Martin Kuppinger. I'm Principal Analyst at KuppingerCole Analysts. As usual for our webinars, as a first step I'll give you a little bit of background, we'll look at the agenda, and then we'll walk through the different parts of the webinar. And as is common, you are muted. You don't need to care about audio control. We control it centrally.
We do the Q&A at the end of the webinar. You can enter questions at any time, and we will collect these questions and pick them up at the end of the webinar if time allows. If we run out of time or have too many questions, we might just respond to you in a one-to-one communication. We are recording the webinar, and the slide deck will also be available for download, so you don't need to take exhaustive notes. And last but not least, we will run a few polls, precisely two polls, during the webinar and discuss the results during the Q&A session if time allows.
That is what I want to do first. So I wanna quickly raise our first poll. We will talk a lot about data catalogs today, about technology that allows you to understand which data you have and to keep the metadata in a central place, to provide an overview and to enable access to the data. So the question to you is: do you already have a data catalog implemented in your organization? The options we have are yes, no, or it's something which is work in progress but not yet completed.
So, looking forward to your answers. The more people answer, the better it is. So don't be shy, and tell us about the state of data catalogs in your organization. I think we'll leave it open for another 10 seconds or so.
So if you haven't voted yet, please do so.
Okay. Thank you. With that, let's have a look at the agenda for today. We have three, no, four parts on the agenda in this case. First, I'll talk rather briefly about the central role of data catalogs when it comes to governance, privacy, protecting data, meeting regulations, et cetera. In the second part, Sam Gilby of OneTrust will talk about the business benefits of trust in data, how to achieve this, and whom to involve.
Following that, we will have a more conversational part, where we look at some core aspects of making such projects a success and at what to keep in mind, the best practices for making data catalogs a success. And then, as I already said, we will close with the Q&A session at the end of this webinar. So where I wanna start is going back to this: we need something that helps us understand where data sits. This is where I start.
And what I first wanna do is look a little bit at the features we need, from my perspective, in a data catalog, with a specific emphasis on things that help us more from a governance perspective: understanding which data we own in our organizations and how to make this data available for different types of purposes. Protecting data and complying with regulations are some of these essential purposes.
Earlier this year, we published a Leadership Compass on data catalogs and metadata management, and this Leadership Compass was written by me. I looked at various products in the market, and I looked at the capabilities of such data catalogs. And in the end, I think it's very important to understand that such a data catalog is not just a catalog of which data you have, where, in which system, et cetera. It's significantly more.
These technologies are increasingly powerful and help you really to form (I'll touch on this later on) the foundation of what you do in all your work with data across a wide range of domains, be it AI, analytics, consuming the data, or be it data protection, data security aspects, et cetera. Amongst these capabilities, here are some of the key features. From my perspective, it's very important that there's good support for automation in cataloging and in connecting to data, because there are so many different types of sources.
There is so much complexity in data. We need automation. We need to translate technical terms into business terms, so a business glossary is important. We need to classify data, and we need support for classification to understand which data is what: what is a credit card number, what is salary information, whatever else, and which data is sensitive, which is subject to which regulation. I'm a big believer in data lineage, which is understanding how data flows.
And I think this is one of the biggest challenges of every organization: unfortunately, the same data does not just reside in one place. It is exported. It goes from a database into a data lake, is analyzed by analytics, by BI, exported to something else, used again in a different form. So keeping track of how data flows, understanding this, is essential to get a grip on the data, to be able to protect the data well, and to apply policies. We need data profiling. So how good is the data quality? How good are the data assets? We need to analyze this. We need to be able to export information.
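As a rough illustration of why lineage matters for protection, the sketch below models data flows as a small directed graph and walks it to list every place a dataset ends up. The dataset names, the `LINEAGE` mapping, and the `downstream` helper are hypothetical and are not taken from any catalog product mentioned in the webinar.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets in its list.
LINEAGE = {
    "crm_db.customers": ["data_lake.customers_raw"],
    "data_lake.customers_raw": ["bi.customer_report", "exports.customers_csv"],
    "exports.customers_csv": ["sharepoint.campaign_list_xlsx"],
}


def downstream(source, lineage=LINEAGE):
    """Return every dataset reachable from `source`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(source, []))
    while queue:
        node = queue.popleft()
        # Skip nodes already visited so cycles cannot loop forever.
        if node in seen or node == source:
            continue
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


if __name__ == "__main__":
    # Every copy listed here would need the same protection as the source.
    for name in downstream("crm_db.customers"):
        print(name)
```

A policy attached to the source table could then be checked against each downstream copy, which is the "keeping track of how data flows" idea in miniature.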
We need to look at integrated data quality functions, even while this is a somewhat separate discipline. Just these days, I'm in the process of writing another Leadership Compass, which is around data quality and data integration, technologies that build on a data catalog and are close to data catalogs, but again have a different specialization.
We need to have dashboards and views on the data.
So what else is in there? Data security and data privacy capabilities. Depending on the roots of the vendors and where they come from, these are sometimes more integrated capabilities, if vendors come more from the privacy and governance end, or sometimes an integration thing. But we need to integrate them, because that is one of the most important purposes.
Yes, clearly, creating value out of the data, making use of the data, is very important. But the other side, governance and controlling what happens with data and complying, is equally important, which also means that several data catalogs have integrated data governance features, usually at somewhat different levels, again depending on where the vendor has its roots. We need to monitor how the data is used, which is also very important, because virtually every organization surely has a ton of data that isn't used at all.
And we have data that is heavily used. We need to understand this to make better use of data or to reduce the cost of keeping data that no one wants or needs.
How can we access data? Self-service access, integration with business intelligence: all these are capabilities we see in that space. So when I look at this field as an analyst, what are the focus areas? A data catalog nowadays, from my perspective, is way more than just a catalog. It goes beyond the catalog part.
The part of connecting is important. And I think what we've seen as an evolution in the past few years is a move away from relatively few traditional databases to a huge range of different types of data sources in different deployment models: traditional relational databases, more databases in the cloud, ERP and other business applications, structured data versus unstructured data, and so on.
And so this connectivity to data sources is, I think, very important, so that you really can analyze data across the different types of sources. Even if you just look at PII, you will find it in your CRM, and you will definitely find it in quite a number of Excel files on your SharePoint servers and in other places. We need support by AI and ML in analysis and in automation like cataloging, but we need this to be really effective.
So really delivering on the purpose, not just being the marketing buzzword. But we see some really interesting things happening there.
I already talked about data lineage. And what is also very important is this shift to data governance, allowing us to control what happens with the data and how it's consumed. What, then, is the innovation we observe in this field?
It's AI and ML. Every vendor has it at the very top of the list because it's still one of the most important marketing buzzwords, but there's also a lot of good technology behind it. I think this is the point: you just need to look at what it really delivers. And there are some really great things in there which add value, because AI, in the end, is probably less artificial than augmenting intelligence. It augments the user.
There's huge value in doing so. In this field, I see the innovation very much in integration with governance, in controlling access to data based on policies, and in controlling who can see what, partially in dynamic data masking.
That's always the question: to which extent does it catalog data or just data structures? But in the end, it's important that, at the latest when it goes towards consumption, but also, for instance, for samples of data within a catalog, we can mask critical data, that we can discover data, and that we can analyze the use of data, even down to the older stuff.
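To make the idea of policy-based masking a little more concrete, here is a minimal sketch that masks sensitive columns at read time depending on the caller's role. The column classification, the role name, and the functions `mask_value` and `apply_masking` are assumptions made for illustration only; real products implement this inside the query or access layer.

```python
# Hypothetical output of a classification step: columns flagged as sensitive.
SENSITIVE_COLUMNS = {"email", "salary"}

# Hypothetical policy: only these roles may see sensitive values in clear text.
UNMASKED_ROLES = {"privacy_officer"}


def mask_value(value):
    """Keep the first character and replace the rest with asterisks."""
    text = str(value)
    if not text:
        return text
    return text[0] + "*" * (len(text) - 1)


def apply_masking(row, role):
    """Return a copy of `row` with sensitive columns masked for `role`."""
    if role in UNMASKED_ROLES:
        return dict(row)
    return {
        column: mask_value(value) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    sample = {"name": "Ada", "email": "ada@example.com", "salary": 5000}
    print(apply_masking(sample, "analyst"))
    print(apply_masking(sample, "privacy_officer"))
```

The stored data is never changed here. The mask is applied when the row is handed to a consumer, which is what makes it "dynamic".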
And so data catalogs, to me, and this is going back to the title of this webinar, are a core element in, so to speak, a data blueprint, the data plane, or whatever you'd like to call it in your organization. They're at the core of the data architecture.
So, reading this picture from the bottom, you have the sources, then the data catalog, which is about managing the metadata. On top of that we put things like data quality, data integration, and master data management, more as a business-driven or use-case-driven approach, then the analytics to use it for digital services, for decision support, and more. And on the right-hand side of this graphic, you'll find the data governance part, which is about privacy, about risk, and also about the more technical end of data security.
And by the way, for quite a number of these areas, you'll already find a good amount of research from us. So what you should do is look at data from a holistic perspective across the organization, bringing people together to create good value out of data and to deal with data the right way. And that requires that you really work towards a data architecture that is not just driven by consumption but by all aspects, and that you think about what you need to do.
That means going away from data silos towards a truly data-driven approach in your organization, because you can't utilize what you don't know, you can't secure what you don't know, and you can't govern what you don't know. With that, I'll just bring up a second poll.
So if you're implementing a data catalog or already have one in place: is this data catalog a decentralized initiative, so a lot of data catalog silos, so to speak, or a centralized initiative which spans multiple departments? So decentralized or centralized. Looking forward to your answers.
It's relatively simple: if you have more than one data catalog in place or more than one initiative running, then it's decentralized. So another 10 seconds, and please participate in this poll. The more answers we get, the better it is. Okay. Thank you. With that, I'd like to hand over to Sam. Sam will now talk about the business benefits of trust in data, how to achieve this, and whom to involve. Sam, your turn.
Thank you so much, Martin. I think that was a great introduction to really what we are also seeing as a technology provider when it comes to data catalogs.
And I think you make a really great point that what a data catalog is used for, and how it fits within broader business objectives when it comes to data, has really evolved. And that's a great thing, because as we develop new technologies and new uses of data, the way in which we use our tools also has to develop. And what we are seeing as a really key part of this is that a data catalog is key to enabling organizations to be trusted when it comes to their use and development of, and access to, data.
And in fact, being trusted in the use of our data and how we are utilizing that is gonna enable us to be more trusted as an organization.
And really that means that a data catalog is not just a foundation for being, you know, a good user of data as an organization, but also really the foundation of being a trusted organization. And what does that mean? What's driving this is three key things: societal pressures, regulatory changes, and the use of new technology.
So when it comes to societal drivers, consumers are aware that organizations use data to personalize their experience, to develop new products, to, you know, use automation. But with that awareness there is now also a pressure, and an expectation, that companies are responsible in the way that they use data.
And actually that can be used as a real promotion of you as an organization, because you'll see a lot of companies now are driving to promote how they respect individuals' data, that they are using it in a responsible way, that they know exactly, you know, where it is, how it's being used, and how it's protected, so that there is no risk of negative consequences for that individual.
Of course, there's the regulatory pressure.
Now, you know, we are quite a few years on from GDPR, and I'll go on to speak a little bit more about other regulatory pressures out there, but you'll see that it's just becoming more and more common that countries around the world are putting requirements on organizations. And this is actually going beyond privacy as well, which is something I'll talk about very briefly. But also, you'll see technology is being used in new, amazing ways. We are becoming more and more advanced in our use of big data platforms, of analytics, of AI.
But of course, as we, you know, use more advanced technologies, we have to trust that what we are doing with them, and the outcomes of that, are ethical, are correct, and uphold integrity. And again, having a data catalog as that foundation is gonna allow organizations to trust that the use of this new technology is gonna be correct.
You know, Martin mentioned some areas, and this is just to really reiterate: these are exciting things that are being done with data.
You know, it's one of an organization's most important assets. So we wanna be utilizing it to develop our company, to develop new products, to become more efficient through things like analytics and AI and ML. But this, of course, is also driving new privacy and ethical concerns. So you can't just run off and start using this new technology without having good data management in place, underpinned by a data catalog.
Now, the thing is that if you get this sweet spot right, of using new technology but also having that trust in your data and the way you're using it, that's gonna bring some excellent advantages to you as an organization.
Because the fact is that being a trusted organization, with trust in data as the foundation of that, is really gonna bring competitive advantages to you as an organization. You're gonna have individuals, and even if you're not a B2C business, other companies and other stakeholders, who, because they trust you, will be willing to pay you more as a premium, more willing to share their data, to be a customer of yours.
And they're also gonna be more loyal. Again, just to reiterate, this is not just talking about an individual customer.
This is also other organizations, if you're more of a B2B business. So I'm not gonna speak about the core foundations of the catalog and what the outcomes are, because I think Martin did a great job of giving an overview of this, but more about what other areas of focus we are seeing our customers incorporate into their catalog so that they can have a more trusted data foundation within their business and then start to evolve, so they are using data responsibly, but again in a way that's gonna give good business outcomes.
So I'm gonna speak a little bit about how we can actually embed privacy and privacy requirements into your data catalog. I'm gonna speak about sustainability, and in fact how we can utilize our catalog to source data in order to reduce our impact on the environment.
And I'll also speak a little bit about ethical use of data.
The other part you'll see here is a really interesting topic, but it's almost a whole topic in itself, and I didn't wanna veer too much into that area. But definitely check out some of the stuff we have around giving individuals, you know, the ability to control their data use, but in a way that's gonna continue to mean that we get business value from their data.
So, speaking about privacy: this is something that I think a lot of us have worked heavily on over the last few years, just because of GDPR, CCPA, and all the other privacy regulations that have come into place.
But as we see a continued evolution of privacy laws worldwide, the fact is that just having a checkbox approach to privacy is not gonna cut it under today's global privacy regulations. And this is even the case if you just operate in one country or market.
We're seeing privacy laws evolve, we're seeing expectations around privacy evolve. And so it's really important that you incorporate privacy requirements into your data catalog. At the same time, your data catalog is really going to benefit your privacy program and allow you to be much more efficient in the way you respond to privacy, because it can be complex. If you look at definitions of even what personal data is, they differ between regulations.
If you look at the requirements on how long we should be keeping data, again you'll see there are often subtle differences between regulations. How we can use data, what purposes we can define for it: again, that can be different.
So this is something you need to incorporate into your data catalog, because what's gonna be defined as personal data in some instances will be different from others. And this is where you utilize some of the features that Martin spoke about, including things like AI.
So using artificial intelligence and machine learning to look at different data sets and say: under this regulation, this is defined as personal data; under this regulation, it is not; and here are the requirements. Having that intelligence embedded into your catalog means that those ultimate data consumers or data citizens that are gonna be using that data, or managing it, or remediating risks in it, are gonna be much more informed about what they can and can't do and what steps they need to take with that data.
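The "personal data under one regulation but not under another" point can be sketched with a simple rule lookup. Everything in this example is a placeholder: `REG_A` and `REG_B` are invented regulation labels, the attribute sets and retention periods are made up, and none of it reflects the actual definitions in GDPR, CCPA, or any other law. A real catalog would rely on maintained regulatory content and on classifiers rather than a hand-written table.

```python
# Placeholder rule sets. These are NOT real legal definitions.
PERSONAL_DATA_RULES = {
    "REG_A": {"email", "ip_address", "device_id"},
    "REG_B": {"email"},
}

# Placeholder retention limits in days, also invented for illustration.
RETENTION_DAYS = {"REG_A": 365, "REG_B": 730}


def classify(column_tags, regulation):
    """Report which tagged columns count as personal data under `regulation`."""
    personal = sorted(set(column_tags) & PERSONAL_DATA_RULES[regulation])
    return {
        "regulation": regulation,
        "personal_data": personal,
        "is_personal": bool(personal),
        # A retention limit only applies when personal data was found.
        "retention_days": RETENTION_DAYS[regulation] if personal else None,
    }


if __name__ == "__main__":
    tags = ["ip_address", "order_total"]
    for name in ("REG_A", "REG_B"):
        print(classify(tags, name))
```

With the same data set, the two placeholder regulations give different answers, and that result is the context a data consumer would see in the catalog before using the data.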
And this is why you need to ensure that this integration of data governance and trust is really a foundation of your data catalog, because doing so will mean that, again, you evolve it to be a good part of your business, but one that's not gonna cause regulatory and consumer issues down the road.
And this is a really exciting part of what our customers are incorporating into their platforms. And this really needs some good foundations within your tools. And this is, again, just reiterating some of the points that Martin made earlier.
So, number one, you do need real-time information. Your data landscape is gonna be evolving. It's going to be changing. You're gonna be using new platforms. You're gonna be utilizing those platforms in new ways. So you wanna make sure that your catalog is integrated so that the information and the updates are real-time and enhanced. But of course, you know, lists and lists and lists of data, that old concept of metadata management, is not gonna be sufficient here, because you need context. And this is where you are gonna need the intelligence.
Again, going back to that privacy example: you know, this is defined as personal data, the data's located here, it's for these individuals.
So these are the requirements that we need to apply to that data in order to maintain our compliance with those particular regulations, but also so that we can continue to use it in a way that's allowed under these regulations. We don't want this to be a blocker. We don't want to say, just because it's personal information, that it's locked down and we can't use it.
So that does require some degree of program automation, so that we can understand this data, catalog it, know our requirements, remediate issues when they are found, but then unlock that data to be used by our business. And of course that means we do need to incorporate the catalog into the data lifecycle, to make sure that at the point of collection of that data, wherever it's sourced, it's classified, it's linked to our business terminology, and it's then properly cataloged and stamped as trusted data.
And when we do eventually need to dispose of that data, it's then flagged for whatever process we've got in place for the ultimate disposal or protection of it. This is gonna lead to some great things for you as an organization if you do manage to do this. But if you don't, the fact is that the organizations that are not taking this concept of trusted data seriously are getting left behind. So we see here some good stats that show that organizations that are trustworthy,
and again, like I said, having trusted data is a key foundation of that, are gonna be much more growth-oriented than those that are not. And then moving on to another key point that we're starting to see, that's become really important.
Again, going to that theme of what a trusted organization is: a trusted organization is one that not only, of course, respects individuals' privacy, but also respects the planet, respects society.
It's just doing good things for the world. And I think with increasing consumer awareness of this, this is a pretty hot topic at the moment. But of course, what's gonna be the foundation of your program is understanding your current state. You can't improve, you know, your environmental impact,
you can't improve your social governance, unless you understand, you know, our current benchmark of how we are doing in these particular areas. And this is often gonna require your teams who are producing these reports, who are doing this analysis, to be utilizing data sets that maybe they're not used to working with, because it can often be a new area, and also finding data sets that, you know, typically may not be included in a data catalog.
If you look at environmental stats, you know, what our emissions are, how much electricity we're using, how much our employees are traveling:
these may be found in things like, you know, expense systems, or in external office management tools, which typically might not be included in a data catalog that may have focused, say, on a data warehouse that you are currently utilizing.
So this is why it's important to look at your objectives beyond just, you know, the foundational analytics that we typically do as an organization, and make sure that the scope of our catalog is extended accordingly so that we can, for instance, calculate emissions analytics. If you look at the core steps of, for instance, reducing your environmental impact, number one is calculate. In order to do that, you need to have data around the different scopes of emissions that we're creating, and a data catalog can provide a good source for your teams to be able to do that.
But of course, we've gotta make sure that our catalog includes the areas where that data resides. Then the final part I wanted to speak about is something that is, again, very important for a lot of organizations, but they're struggling to do it in a way that's actually tangible. And a word of warning on this.
You know, nobody has, let's say, hit the sweet spot, nobody's achieved the ultimate way of having a comprehensive data ethics program, but it is something that organizations need to start thinking about. So beyond privacy, when we do have a data catalog, it's about making sure that when people are utilizing the data that they've located, it's being used for ethical reasons, that it's not gonna cause, you know, negative effects on individuals.
And likewise making sure that the data has been sourced in an ethical way, that if we are using third parties, there are no issues there.
And the reason that this is becoming more important is that if data is being used in an incorrect way, this can lead to really negative media consequences for an organization.
So particularly around the area of AI: AI, of course, like we said, is amazing, but if it has bias, if it has been developed in a way that it's, you know, biased towards certain communities or certain individuals, then that's gonna cause negative consequences for you as an organization. So this is again where the governance part becomes really important. You wanna make sure that you have that governance embedded within the catalog so that when individuals, sorry, when teams are using that data to, for instance, produce algorithms, it is correct data, it is of good quality,
so that ultimately the algorithms that come out are correct and don't include, you know, negative bias.
And likewise, we wanna make sure that when data citizens or consumers are utilizing data in the catalog, they're aware of these ethical concerns that can come out of the products that they may be developing. So to make sure that that's included, there's almost an ethics-by-design mentality. And this needs to really start by incorporating a framework of ethical data use within your catalog and within the steps that you have within that particular data catalog.
And this can be a bit of a challenge because there's no overriding ethical data use framework like you might see with privacy frameworks, but there are some out there that we see organizations start to use and start to consider when they're developing their overall data management program. So I've got a couple of examples there, commonly used ones from both the EU and the ICO.
And there's definitely increasing regulatory progress in this area.
In fact, in the US, we've actually seen a couple of regulations, in New York, on AI and the ethical use of data come into place. So it's definitely something that's gonna get more of the focus of regulators. So start thinking now about how we can incorporate this, because ultimately this is gonna bring benefits to you as an organization. Just a real quick recap of who OneTrust are before I hand back to Martin and we have this discussion: we call ourselves the trust intelligence platform.
We're here to help companies automate and scale those different areas that allow organizations to be trusted. So that includes privacy and data governance, GRC and security assurance, ethics and compliance, and also ESG and sustainability, all within one platform to help customers with these different areas, depending on what their focus is.
The great thing about our platform is that embedded within it is different intelligence, such as data discovery and classification and real-time regulatory intelligence.
We see customers primarily using our data catalog for that foundational trust program when it comes to their data. So it aligns with what Martin was saying you see in a good data catalog: we allow customers to scan and enrich data no matter where it's held.
This includes both structured and unstructured data, creating those metadata inventories that can be enriched with regulatory and business intelligence, linked to your business glossary, and then utilized by different areas of the business to ultimately start to better understand our data, but also start to use it in a more efficient way. Martin, thank you very much for allowing me to do that presentation. Back over to you.
Sam, thank you very much for all the insights.
As I already said, we will now have a bit of a conversation about our perspectives on certain aspects, questions where we feel that these are very relevant to making data catalog projects a success and to providing this foundation where we know where the right data is, where data overall resides, et cetera. And I think the first point we selected for that is: how do you get a data catalog implemented?
My perspective is that there's always this risk of ending up with a huge project, or doing something very technical, and in the end slightly failing. So what is your experience on that?
Yeah, that's a great question. And I think you make a good point. It could go either way. With customers and data catalogs, I've seen, one, that they become too focused, you know, maybe on their data warehouse, their data lake, whatever. There's a certain store of data that they just become hyper-focused on cataloging for whatever reason, and obviously there is often a business need to do that, but then you're ignoring all the other areas of the organization that potentially have very valuable data.
And then on the other side, we see some customers who are trying to catalog every bit of data that they hold in two months, which again is unreasonable. So in my experience, the best approach to take is almost a crawl, walk, run approach. So you've really gotta see what our objectives for the catalog are.
You know, like we discussed, Martin, a catalog could be used for all sorts of different, amazing things, but again, you can be off in different directions with this.
So what are the real business requirements that we have for a catalog? What are the immediate, narrow requirements versus the ones we maybe want to address in the future? And then we can align them to which data stores we are gonna start cataloging, where again, we maybe don't want to catalog everything, but we might wanna catalog certain areas of one while cataloging everything in another.
So you really wanna have a plan of starting small and expanding. That's gonna bring the most immediate benefits to you as a business.
You know, a lot of our customers are focus... Sorry, go ahead, Martin.
Yeah, I would fully agree with that. And I think you're spot on in saying don't go over the top. I always say I'm old enough to have experienced the days when organizations thought an enterprise data model was the right way to go. And like with all these model things, I think the problem is that if you go too big at the beginning, it takes too long, and the risk of not succeeding is very significant. And I think this is the art, and this leads us to the next topic, which is around success.
The art is still to have something which allows you to grow, but to start with a sufficiently small domain where you do things. I think it's also important to understand, and this goes back to what I said at the end, what your data architecture looks like, because I think there is still too much focus, in the end, on, for instance, BI, because we want to make something out of the data. We want to analyze it without understanding what the foundation for that is, what it is that makes it a success.
And finding this balance, from my perspective, is a very important thing when we think about data catalogs.
Yeah, exactly. And this is something I commonly see with organizations: the primary reason they create the catalog is, like you said, for BI, producing reports and analysis, because that's obviously extremely important. A lot of the time, when you see organizations that maybe don't have a centralized model, you'll have different teams end up using their own data catalog.
Maybe it'll be an open source one, maybe it'll be a catalog that's available with whichever data warehouse they're using, which is great. You know, they've got a catalog in place, but then that kind of defeats the object. 'Cause if you have five different data catalogs, then that's not gonna be of the most benefit to the business. The whole concept of a data catalog is that it should ultimately be that enterprise view of your data.
Exactly.
And I think this brings us to the point of what features are required to make it successful, because my strong belief is, and we'll touch on this in a minute, that while starting small, you need to have this expansion in mind, saying, okay, we start here, but we might serve more use cases over time. Which on the other hand means that you must think a little bit broader about the features when you start, because surely some of these features, like more of the data governance questions, are more relevant to other teams than to those saying, okay, I just want to have a view on whatever few databases, or some data I have in the cloud for my new digital services.
And even for them, I think it's important to understand that it may grow, because in the end you might observe, okay, aside from all the new stuff, we have a lot of old data, which also helps us in this entire process. So things are getting bigger. And I personally believe that it is important, even if you don't use, whatever, even 80%, maybe if you only use 30 or 40% of the capabilities, to look for something which gives you the option to grow.
This is for me a very important success factor, aside from what we already discussed: start small, or slice the elephant into pieces, whatever.
Yeah. I think you make a really good point there because, as we've kind of alluded to, the future of a data catalog is that it's being used for way more purposes than previously.
And again, maybe right now your focus is on those traditional data catalog uses of, you know, producing analysis, reports, BI, whatever. But in the future, you do want that catalog to be used by personas, users, that aren't traditionally using a catalog. So if the data catalog is very difficult to use, not user friendly, and can't be tailored to that particular user, you're not setting yourself up for success, and you're gonna be stuck in that cycle of the catalog being used by, you know, select individuals within your organization.
So again, you may not use all the features at the beginning, but you wanna have them in place because ultimately for this to realize your broader objectives, we wanna make sure that we can expand the use of the catalog.
Yeah. And I think this brings us nicely to a very important point, and I titled this "friends". And I think this is an interesting point, because I think we are at least aware that such a data catalog project takes its time. Depending on where you go, there are certain complexities.
So managing a business glossary, classification, all these things require some thinking. And, as we also touched upon, there are quite a number of different use cases. And so I think a very important aspect to think about from the very beginning of data catalog projects is: how can you make friends who say, okay, great initiative, I don't want to start fresh.
Can I work with you to expand it? And I think this is essential. I think there are two aspects to that. The one, which we discussed, is that it is essential to think that way. The other is: how do you convince others, maybe from your practice, to say, oh, I'd like to hop on that train?
Yeah. I think this is an interesting topic, and it's something that still a lot of, let's say, owners of a catalog struggle with. I think there are two aspects to this. Number one is that you will need people's help to build up a catalog, of course. So it is about getting that buy-in.
And this is where you need your skills as, you know, a people manager or people influencer, but it's very easy to show the benefits of a data catalog. And it's very easy to show how that will benefit their job, their objectives. So there are lots of resources available out there that you can utilize to influence those different stakeholders. But likewise, you wanna get them involved in a way that they are contributing to the catalog.
You know, they're helping to develop it, which is great because people like to see outcomes. People like to see, you know, results, and that's very tangible in a catalog. So you can say, this is where you see amazing things and outcomes from it.
Yeah. Very important aspects on it.
The one is, when you allow people to help on this, you can, to a certain extent, help them benefit from your experience. Because as with every technology, in the end, you know, data catalogs from my perspective are, yes, technology, but there's a lot of conceptual work. As I've said, glossaries, classification, profiling, all that stuff also has a lot to do with conceptual work, and benefiting from experience is something which is extremely valuable.
So expanding the initiative truly is supported when you are willing to share your experience: not doing the work for them, but sharing the experience. And the other, I think very important, aspect is that when you do such a data catalog, there will always be some overlap of data. So at the beginning, that might be a little bit more distinct, but the more teams work on it, the more data will already be in. So to speak, it will already be analyzed. There might be different angles to that, but it's not that everyone starts from scratch.
And so with everyone who contributes, it will be less work and faster success.
Yeah. I think that's a really important point. I think often where I've seen catalog projects not be successful is that they do wanna start from scratch because there's, and this is the opposite of what I'm recommending, a distrust of whatever data inventory is already there. And that's a mistake, 'cause people have worked hard to build up those data inventories.
Yeah, they may not be the most real-time insights into your data, but utilize that as a foundation. You know, you do bulk uploads, integration, whatever you can in order to get that into the catalog. That information can then be verified later on with the methods available in the catalog, but utilize what you've got. Show this to people, say, hey, the work you've already done is now here, and this is how we can expand on that.
That, like you said, Martin, is gonna really allow you to get some good buy-in.
But it also requires well-defined roles across the different people involved. So the users, the data stewards, all the different people involved, this needs to be really well defined. Okay. I think these were some important aspects that are hopefully very valuable to our audience. So let's move to the Q&A part of this webinar. Again, if you have additional questions, don't hesitate to enter them. We already have a few questions here.
So the first question I have here is clearly one for you. Typically, what data policies do you see customers focusing on using the data catalog to enforce? Is it practical to try and do all of them, or is it better to narrow it down from the beginning?
Yeah. And obviously "data policies" is more of a term in the data catalog sense of, you know, business rules to surface data that has issues that we need to remediate. And it can include things like retention or minimization, quality, access. It's a whole breadth of, you know, how we can essentially better govern the data.
And again, trying to do all of it at the same time is not going to be beneficial to an organization. So it is about focusing on what areas you think are the biggest issues in your data set.
Focus on that, and then it can evolve. And I'm seeing retention be something that a lot of organizations are focusing on, because there is that regulatory requirement, but also because of two things. You know, number one, a lot of organizations have been building up their data sets for 20 years, which means that there's a lot of stuff there that they don't know whether it's required.
And number two, a lot of organizations are obviously doing big migration projects.
You know, they're going to new platforms, they're going to the cloud, but obviously you don't wanna be moving junk from one area to the other. It's like when you move house, right? You wanna clear out before you move. The mistake is to move your junk from one house to the other. So in my experience, a lot of customers are focusing on retention right now in terms of a policy. But obviously, again, look at what you think the issues are.
And this is why it's good to at least build up a foundational catalog because you're gonna get a good sense of what potential remediations you need to be doing on that data. And you can then develop those policies into something actionable.
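As an illustration of the kind of retention rule described above, here is a minimal sketch in Python. It is not taken from any product discussed in the webinar: the `CatalogEntry` fields, the seven-year retention period, and the data set names are all hypothetical, chosen only to show how a catalog entry could be flagged for review before a migration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are assumptions for this sketch."""
    name: str
    last_used: date
    retention_days: int


def overdue_for_review(entries, today):
    """Return the names of entries whose last use is older than their retention period."""
    return [
        e.name
        for e in entries
        if today - e.last_used > timedelta(days=e.retention_days)
    ]


# Hypothetical example data: one long-unused data set, one current one.
entries = [
    CatalogEntry("crm_contacts_2004", date(2005, 3, 1), 365 * 7),
    CatalogEntry("web_orders_current", date(2024, 1, 10), 365 * 7),
]

flagged = overdue_for_review(entries, today=date(2024, 6, 1))
print(flagged)
```

A rule like this only surfaces candidates for remediation; whether the data is actually deleted, archived, or kept would still be a decision for the data owner.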
Okay. Great answer. Thank you. Another question I have here is: can you give a practical example of how object-based permissions work?
Yeah. So this is something I'm seeing quite a lot. A lot of our customers now are beginning to utilize, or are already utilizing, Snowflake.
I think they're becoming one of the leaders when it comes to a data platform. You know, it's very flexible in where it can be deployed, and they've got some really great features. But again, there is a risk: if we give everybody access to Snowflake, they can access some pretty sensitive information. So what we're seeing a lot of customers do is utilize, you know, our data catalog to classify the data contained within Snowflake.
And then any data that's found to be sensitive, maybe credit card information, or things like social security or national ID numbers, can actually be masked at a column level for certain users. So you can assign users to user groups so that certain users, when they search for or query those data sets, will not be able to see that data, while select, you know, admins, or users that we know are very careful when it comes to the way they use data, will have access to it.
So again, we're not locking down entire data sets. It's just this particular stuff that we don't want people using, and we can mask that using Snowflake's native features.
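The column-level masking idea described above can be sketched in plain Python. This is only an illustration of the logic (classify columns, then mask sensitive ones depending on the user's group); it does not use Snowflake's masking features or any OneTrust API, and the tag names, group names, and column names are hypothetical.

```python
# Classification tags that should be hidden from most users (assumed names).
SENSITIVE_TAGS = {"credit_card", "national_id"}

# Hypothetical result of classifying a table: column name -> tag (or None).
COLUMN_TAGS = {
    "customer_name": None,
    "card_number": "credit_card",
    "ssn": "national_id",
}

# Groups allowed to see sensitive columns unmasked (assumed name).
UNMASKED_GROUPS = {"data_admins"}


def mask_row(row, user_groups):
    """Return a copy of the row with sensitive columns masked for non-privileged users."""
    if UNMASKED_GROUPS & set(user_groups):
        return dict(row)
    return {
        col: ("****" if COLUMN_TAGS.get(col) in SENSITIVE_TAGS else value)
        for col, value in row.items()
    }


row = {"customer_name": "A. Example", "card_number": "4111111111111111", "ssn": "123-45-6789"}
print(mask_row(row, ["analysts"]))     # sensitive columns replaced by "****"
print(mask_row(row, ["data_admins"]))  # full row returned
```

The point of the design is that the rest of the data set stays usable: only the columns the classification marks as sensitive are hidden, and only for users outside the privileged groups.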
And beyond that, I think the usual advantage of a data catalog is that you don't say, I just protect what is in Snowflake, but, I know where the same data is and where it comes from. So I can keep track of this. I can ensure that I protect the data in the same way, because I think our big problem is that we have the same data in multiple places, or related data in multiple places. And as I always tend to say, once you start duplicating data, it gets out of control, and data catalogs are one means to get back control over this.
And I think this is truly one of the important points behind it. Okay. One more question I have here is: typically, what data sources are organizations focusing on for cataloging first? So where, from your experience, do they most commonly start?
Yeah, a good question. Again, it really depends on what the priorities of the customer are. If the priorities are, you know, producing analytics, then typically they're focusing on comprehensively cataloging their data warehouse in a way that it can be much more utilized for those business outcomes. If customers are focusing on compliance issues, they're much more likely to catalog something like their Amazon S3, because that's often used as the dumping ground for data, and there are some issues there with data. And also unstructured systems.
Actually, you're gonna catalog unstructured data differently, because, imagine a huge catalog of files: it's gonna be very clogged with data. It's more about doing it as a way to uncover data that we need to remediate in unstructured data, and then looking to better protect that.
So again, it really depends on the customer. You know, there's no one "this is what everybody's focusing on" when it comes to the implementation, and that goes back to what Martin said first about really thinking it through.
To be honest, it's basically what seems to be most prominent, where can I get a good grip, like on S3, et cetera. On the other hand, there is really this use case aspect, where you say, okay, I have this compliance issue, and I need to look at where this data could be. And then what we observe is that over time, other types of data sources come into play, like going down into legacy, going down into more complex areas. And I think it sounds like a good idea also from a success perspective. In the end, it's always about the quick win and the big win.
So big win: take compliance. You need to be compliant in the end. The quick win is that you solve, whatever, the first 80% fast and then go into the details, so that you can show, demonstrate, and prove. And I think that's the advantage of many data catalogs. You can prove a lot of the things you did, through all the dashboard stuff.
Okay, we've made progress here. Okay.
We are, I think, done with our questions. So that means, thank you very much to all of the attendees of this KuppingerCole webinar for listening in. Thank you to you, Sam, for all your contributions. Thank you to OneTrust for supporting this KuppingerCole webinar, and enjoy a hopefully warm and sunny day. At least where I reside, I have one of these. Thank you, and enjoy the remainder of the day.
Thanks Martin. Appreciate it.