Hello, and welcome to another KuppingerCole webinar. Our topic for today is Navigating Data Challenges, Unlocking the Power of Data Marketplaces. My name is Alexei Balaganski. I am a lead analyst here at KuppingerCole Analysts. And my guest today is Bart Koek, who is the field CTO at Immuta. Before we start, just a few housekeeping rules. You should not worry about your microphones or anything. The audio features are controlled centrally.
We are recording this session, and we will make the recording and the slides available on the KuppingerCole website, probably tomorrow, and everyone will get an email with all the links. Despite what the slide is saying, we are actually not running any polls today, but we do have a Q&A session at the end of the webinar. So at any time, you can ask your questions using the Q&A tool, which you would normally see on the right side of your browser when you are watching this live. The agenda for this webinar is, as usual, split into three parts.
First, I, as an analyst, will be doing some kind of a neutral, high-level introduction to the field of our discussion today. Then I will give the stage to Bart, who will go much more technical and deeper into all the details and the implementation intricacies of a data marketplace. And as I mentioned, in the end, we will have a Q&A session. So without further ado, let's just start with the webinar. And I guess I have to address the elephant in the room. I was really surprised when I was talking to Bart earlier how different our views of the entire data security market are.
And not because any of us is wrong or anything. It's just kind of the whole field of data security is extremely broad and complicated. And it's driven by multiple trends and challenges and risks. All of those, of course, were too difficult to cover in a single webinar. But broadly, I would say we are living, essentially, in an insecure world. And we are facing challenges of the clouds, of the mobile workforce working from home, especially after the COVID pandemic, of multiple tools hosted somewhere by a third party and where you, as a data owner, have limited control.
And of course, the entire landscape of privacy and compliance regulations is ever-increasing in sophistication. It all started over five years ago with GDPR, but now most countries and even individual federal states within the US, for example, have their own regulations, which are difficult to comprehend and to make work together in agreement.
So yes, the three major challenges for every digital business are the fear of a data breach, the fear of being non-compliant with all those regulations, and of course, business continuity. What if a ransomware attack hits you and you just cannot access your data anymore? Those are three things we usually discuss in almost every KuppingerCole webinar, but today we have to address a fourth one. It's efficiency.
It's, well, I mean, the main reason you are using digital data within your company is to earn money with that, right? And of course, you need to be able to do that efficiently to get your return on investment on your data and on all those tools you use to manage, to secure and protect your data. And those tools and measures have become extremely expensive recently. I've just listed a few numbers on this slide to show that managing your data even slightly incorrectly can lead to catastrophic consequences. You will be out of your business continuity for hours, if not weeks.
You will lose millions, if not billions, of data records to hackers, and you can very easily be hit with a billion dollar fine for data violations. Why?
Like, what's going on? I mean, the world is becoming increasingly digital. We supposedly have better and better technologies. We can scale. We can have as many security tools as we need, but why are they all failing? What are we doing wrong? I would argue that it all starts with a major misunderstanding, a myth, that having more data means having more value.
Well, this is profoundly incorrect. And again, I would argue data is not the new oil. It's not the crown jewels. Sometimes data can be a dangerous liability, like a barrel of toxic waste, depending on how well you are able to handle that data. But more importantly, the data has no intrinsic value of its own, and it would only become useful, it would only start generating money for you, if you will, when you are processing, refining, transforming, working on the data across the entirety of your business.
So multiple business units, multiple people, multiple personas have to be able to work with the data efficiently and securely at the same time. What are those challenges that every digital business has to address? As I mentioned, the biggest one is actually data to value gap. You are probably sitting on terabytes, if not petabytes of data, but you have no way to efficiently turn those heaps of data into dollars.
Of course, everyone attempts to do that with different results, but for a lot of companies it ends up badly, with other issues like, for example, data sprawl. Usually you will just have too many data sources, completely disparate, completely unconnected to each other, having different data models, different formats, different infrastructures, different access and security controls, and very little common visibility across them. And of course, all this leads to data friction, because data friction is exactly that challenge.
How do you make your data accessible fast enough and securely enough for every potential consumer? And of course, having all those business challenges in mind, you still have to maintain the security and compliance regulations. You have to ensure that your data is protected from all the malicious actors. You have to make sure that your data remains available all the time, that you are not hit by a ransomware attack, for example. And of course, that your data is handled in a way that makes auditors happy. The biggest challenge here is that data doesn't actually exist in a vacuum.
It's not like you have gold bullion or a nugget in a safe, and you can kind of derive value from that. Data is constantly moving across your entire infrastructure, across multiple clouds or on-prem environments. It has to go through tons of different transformation steps.
And it, of course, includes a lot of different business-oriented and security-oriented tools in the chain. And just some of those tools I tried to illustrate in this picture. This is by far not the entirety of data protection and data security, just some of the most relevant and important steps. And as you can see, not all of those are even directly related to data. For example, you have to think about API security and dynamic and policy-based access controls and monitoring user behaviors and stuff like that. So this is extremely complicated, and it is extremely disjointed.
How do you try to make this complexity go down? One of the approaches KuppingerCole has been pushing for quite a long time is to think about data-centric security. Instead of focusing on individual parts of your infrastructure, like a database or an API gateway or a network or a storage space, you treat your data as a kind of a semi-living, semi-sentient object or even a being that has a life cycle.
It appears from nothing, it's created from transformation of other data during the acquisition step, and it disappears or has to be kind of dealt with safely during the disposition, archival or deletion state. And in between, there is this long and active life where data moves, transforms, being accessed by multiple parties. You have to incorporate a lot of different protection capabilities during that life cycle. And of course, one of those is access control.
I find it somewhat ironic that some people would not even consider access control a quote-unquote security measure at all, because, well, it's identity management and access management, right? Well, it's the most important part of modern security. And as we at KuppingerCole are trying to preach during every webinar or any other event, identity and security are nowadays just two sides of the same coin. Neither is possible without the other.
But if access control and identity management are such a crucial step, and, I mean, IAM as a discipline has been around for decades, surely we have already figured it out down to the tiniest detail. Why exactly are we still having problems with managing access control to our data?
Well, again, kind of the answer is pretty straightforward, because you have multiple personas, multiple stakeholders within your enterprise, and everyone has very different understanding who should be accessing which data, why, when, and with what outcome. Essentially, all those stakeholders are just speaking different languages, because for them, even the notion of access or policy or security or even what data is might be very, very different.
And if you try to apply the traditional static, largely manual role-based, for example, access controls to data in a modern enterprise, they would inevitably fail. They just cannot keep up with the scale and speed of transformation and the ephemeral nature of modern data processing, especially in the cloud. And of course, we do still have a very disjointed ecosystem in a typical enterprise with legacy systems on-prem and modern multi-cloud services coexisting in parallel. The problem is that they not just have to coexist, they have to talk the same language again.
They have to support the same policies. They have to provide the same consistent visibility and avoid security gaps. And this doesn't happen, simply because there is no orchestration or enough automation across all those numerous security tools which every company has at their disposal nowadays. So this all doesn't work. This traditional, infrastructure-focused approach, and even the basic idea of traditional access management and even policy-based access controls, are not really working because, to a large extent, data is not treated as a product within the business.
Again, data exists in multiple formats, models, technology stacks, and so on. For most business-focused parties and stakeholders, all those intricacies are way above their heads, so to say. They just want to do their job as quickly as possible, and they are not ready to accept the idea that there must be some security controls standing between them and their business goals. And this is why they will do everything they can to avoid thinking in security terms and to try to work around security controls. They will leave their data buckets unprotected in the cloud.
They would leak data inadvertently because of a phishing attack and so on. Simply, security is not on the top of their minds. And ideally, they would love to see all this security delegated to a trusted third party, or at least to have it figured out for them and instead focus on their business productivity. They do not want to access a database. They want to access data as a product and turn it into a different product and kind of continue in this business or data supply chain, if you will, because every step in the chain generates tangible income.
And this is where we are coming to this notion, this emerging concept of a data marketplace. What is it?
Well, it can be implemented in multiple different ways on the technology level, but the whole point of a data marketplace is that it provides a unified platform for managing access to all data sources across the enterprise. And that it does not treat data sources as technical systems. It does not highlight the point, for example, that this one is a relational database and that one is an object store. And another one is a cloud-based service.
Most important is that all those previously disjointed sources are combined into some kind of a logical entity, a data product, which is enriched with metadata, context information, tools, maybe even services built around those products, all to simplify the consumption of the data. So ideally, a data product published in this marketplace should be easily found and accessible to a consumer through some kind of a self-service interface. You are basically browsing your grocery store. You find a nice basket of pumpkins or whatever you're looking for, and you are done.
You do not need to involve a DBA or a cloud administrator or any kind of IT service for that. Of course, everything is automated as much as possible. All the workflows are ensured to do exactly as expected without any manual intervention. And they're all kind of controlled by attribute-based and policy-based access. So you just define your business logic or security rules or compliance regulations once. For example, in our hospital, a doctor is only allowed to access the patient records if he's treating the patient and if he works in the same hospital or in the same federal state, for example.
This is it. One policy that should automatically apply across all those layers of security and compliance controls somewhere inside the data marketplace. And you would automatically get only the data you were shopping for. Your patient from your state, sanitized, secured, maybe even masked for especially sensitive data like the credit card numbers. And of course, everything has to be monitored to ensure that you would stay on top of all the activities, malicious or not, to ensure compliance and have tons of opportunities to improve the future operations even further.
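To make the idea of a single declarative policy more concrete, here is a minimal sketch in Python. The `User` and `Record` classes, their field names, and the masking rule are all hypothetical illustrations, not any product's actual API; the point is that one rule can combine row filtering (only your patients, in your state) with column masking (card numbers) in a single place.

```python
from dataclasses import dataclass

@dataclass
class User:
    role: str
    state: str
    treating: set  # IDs of patients this doctor currently treats

@dataclass
class Record:
    patient_id: str
    state: str
    card_number: str
    diagnosis: str

def mask(card: str) -> str:
    # Column-level control: keep only the last four digits.
    return "****-****-****-" + card[-4:]

def visible_records(user: User, records: list) -> list:
    """One declarative policy: doctors see only patients they treat,
    in their own state, with sensitive columns masked."""
    out = []
    for r in records:
        if (user.role == "doctor"
                and r.patient_id in user.treating
                and r.state == user.state):
            out.append(Record(r.patient_id, r.state,
                              mask(r.card_number), r.diagnosis))
    return out
```

In a real data marketplace this evaluation would happen inside the platform's policy layer rather than in application code, but the shape of the rule is the same.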
Sounds interesting as a concept. Have I seen such tools already available as turnkey products? I have to confess, no, I have not. Although I've heard a lot of interesting developments. And one of those developments we are going to be seeing ourselves in the second part of this webinar.
So Bart, it's now your turn. Let's see what Immuta has to offer in this area. Thank you very much. I'll talk to you about like how you can actually do it. How you can take those concepts of a marketplace, policies, granting that access, data products, and how you can actually manage that scale. And that's what we do at Immuta. And then specifically like around the power of that data marketplace. But before I go into that like real detailed solution, I would like to make a comparison to like maybe your data landscape or something that I see quite often talking to different customers.
So if you go back like thousands of years to the time of like cavemen, all the data was captured as like a cave painting. So if you saw a wolf, like you would like draw it on the wall. If you like found like three deer and you hunted down, like you would like draw that. So all the data was captured as pictures and images. But the reason I'm talking about this is like, it's located in one place. Like you have to go to your cave, see the painting and like to make that visible. And if you want to share that data with other like people around you, you have to bring them to your cave.
And this really compares to like the silos that we see in a lot of the organizations. Like there might be a silo for like finance data but you need to know it's there. And you actually have to go there to consume it.
Now, if we go thousands of years forward in time, we invented the clay tablets and we started to write. And that's when data became portable. You can actually take your clay tablet and bring it somewhere else. You can make a copy and give it to someone else. But you still have to know that the data is there. Both the producer, the person who writes it, and the consumer, the person who wants to get that information, need to know that it's there, and that's how it can be shared. But that's slowly where we're starting to decentralize.
And then it took another thousand years or so before we humans invented the library, and only then did data start to become centralized. Like you can actually find what data you have. There are data types, everything is sorted into different buckets. You might have sensitive data, like who owns this land, or some royal family history that shouldn't be disclosed for hundreds of years. But that's where you have that catalog.
Like you can find like what's in the library and there might even be like interlibrary systems where you know, okay, like we don't have this information but if you go to that other library, like that's where you can find it. And from there, like the data, like books or like written data became useful like across mankind.
And again, if we go back to like how people or organizations manage data, like that transition took a long time in like history. And we also see with customers that like that's a journey to go through. To go from like people having their own data and their own like laptops or their own silo to having data that can be shared across, being used, like shared, et cetera. But that journey is hard. And that's where sharing comes in.
And as Alexei said in the beginning, maybe he and I look at it from different perspectives, his being more from the outside, like people should not come in, protecting against data leaking externally. But I would say or argue that actually that internal data sharing is the hard problem to solve today. But it might be good to explain the difference. So if you share data internally, that's across the organization.
Like maybe marketing has some data that's finance needs or maybe the logistic department wants to know about the manufacturing data to know if a part is coming on their way. It's like different parts of the business need data from each other. But it's hard. There might be potential conflicts of interest within the departments. And people might be afraid to violate like compliancy regulations.
So like, for example, data sovereignty: data from Europe going to the US, that's a risk. Whereas if you look at external sharing, it's typically with third parties. You're sharing data from one company to another. You might have a vendor doing your tax audit, or you have data that's relevant for a lot of other companies and you want to sell it out there. And that's where you want to do data monetization, to gain some money from it. But then, while you're sharing it, you can already take into account geography-based regulations.
Like you know that you're not sharing specific information because that might be non-compliant with data sovereignty rules. But with internal sharing, that's a lot harder, because you want to give access to all the data; it's the company's data, and people need to know how they can use it. So it's that internal sharing that I would like to focus on today, and how you can approach it.
And like, well, doing quite some like research, talking to a lot of people in this space, like we see like five typical challenges with that like internal data sharing. Like the first one is the data management.
It's like, do you know what data you have across the organization? How can you store it, organize it, and make sure that you create the right copies or share it in the right way? It's that interlibrary system. It took people thousands of years to figure that out, and organizations are also still figuring out how they can manage that data at scale for the whole organization. But then if you go one level deeper, to the people owning the data, it's that they're scared to share the data. They have perceived regulatory prohibitions.
So it means that like, okay, how can I share this data with others? Maybe the data is sensitive or not. Do I need to comply with GDPR, or with all the different regulations that we have across jurisdictions? It's complex. And we're not all lawyers. We're probably data engineers, data analysts, security people. So how do we share it?
Like, well, if I'm scared, like, well, then just not share it. And that's also that like risk assessment.
Like, do you know it's secure? Do you know you comply?
And like, if you have something secured, it doesn't mean that you're compliant, or vice versa. So that's a scary thing to do. And that's where you can have tools, but then with different cloud providers there are different tools, different systems. Like as Alexei mentioned, different databases, data warehouses, data marts, each has its own different language to control that data. So how do you bridge that? And in reaction, people just become overly restrictive and close things down just because, and that's point five, they have this fear. People are risk averse.
They don't want to take the liability on their data. If organizations like centralize it, it's typically a data function. They're not like measured on like business outcome. They just want to collect that data. And that's where like, you need to change that mindset from, okay, like I'm not sharing data because I'm too scared to, okay, like you must share data unless you're not allowed to, you cannot. But that's a transition. Like you need to go to that path. But if you do, you can like be a lot more effective in using the data and use that data in a valuable manner.
And that's how you can turn that waste into valuable assets and actually get value out of it. Because it's not just one team using the data. It's not just data analysts or data scientists using the data. But nowadays, especially with gen AI, where people can run natural language against data, it's everybody. It's executive leadership that needs to look at KPIs, it's marketing professionals that want to understand the customers and campaigns.
Product managers want to improve the things that they are delivering, operations and logistics, they want to optimize their processes, et cetera. So like it's the whole board, the whole organization that's using and consuming the data. And everybody is using it in a different way. Like maybe the data scientist is writing Python and doing like training machine learning models and coding in code where like the business analysts are creating BI reports and the executive leaderships they're consuming BI reports. So everybody again, like has their own language, their own way to consume that data.
But in the end, the data needs to be provisioned. And at scale, you cannot just rely on like data team or an IT department to take care of that. So if you look at like top down, like the business wants to consume data. They put pressure on the data stewards to say like, hey, give me that data. But the data stewards themselves are not in control. Like it's data governance that's responsible for the rules regulations. So they push that pressure to the data governance, but it's in the end, the IT department that needs to operate that.
So it's all those different layers putting pressure on IT, and that's slowing things down, increasing pressure. And as Alexei mentioned in the beginning, it's that DBA that in the end needs to control that access, which doesn't scale. So in order to improve this, you need automation to make that scale. And in that case, you can measure time in minutes and not in months before people can get access to their new dashboard. And today it is so slow because there are so many steps in between.
So like here, you can see that there are some assets like warehouses, data sets, databases, storage, et cetera. But to get that to like the users, to do their analytics, their analysis, et cetera, there are so many steps in between. Like IT needs to be able to provision it. They need to set like the requests, set up the domains, et cetera. It's the governance that like has their rules and liability on it. And then it's the stewards that like approve and get the data out. And in every step, there's a delay.
And here I wrote that the delay is weeks, where actually getting access to your data can take weeks. But on the previous slide, it said it needs to be minutes, because people want to do their work. They want to get value out of that data. So how does Immuta facilitate that? And that's where I want to pivot slowly to how Immuta can help you with that automation. And we believe that access control management is a large barrier for internal data sharing.
Yes, it's easy to open up your whole database or open up your S3 bucket and give it to someone else, but then you lose control and people don't want to lose control. So how can I share the right data with the right people? How can I mask data that needs to be masked or only give some financial data to the people that should be able to see those numbers? And with Immuta, we see two big patterns with the customers that we work with. The first one is the workflows.
It's like, how do you automate access requests? People request access and then get that access. In the previous slide, I showed all the different steps that need to happen in between, and that's putting pressure on IT. How can Immuta help with that? That's the first bit. And then the other half is the automated access controls. So how can Immuta ensure that you just have the right data based on the right policy? So in the example Alexei gave, it's like doctors treating a patient should see all their patient data.
They shouldn't need to request access for a patient, they should just have it. So it's a difference between like who should have access, the automated provisioning, just automatic access, and the people who could have access, you should be able to give them the ability to allow them to go through. So first of all, I want to give a couple of words on like who should have access. And that's where the automated access provisioning comes in. The example with the doctor and the patient.
So in Immuta, you can define rules, you can define policies, and that's where you can use something that we call attribute-based access control. You can take facts on the organization, on users, and combine them with facts on the data.
Now, what does that mean? Like, what are some examples? So facts on the users are, okay, in what department are the different people? Are you in finance, marketing, logistics, R&D? But it's more than that.
Like, it's also like, what's your job title? Are you a data analyst or a data scientist? That might change what data you should have. Have you completed your GDPR training? Or are you based out of the Netherlands or in the US? That all might change the level of access. But it can even go more fine-grained than that. To come back to the doctor example, what's the list of patients that you're treating? That's all information on the user. And then on the other hand, we have information on the data.
Does this table or this asset belong to the finance department, or is it marketing data? Does this column contain a patient ID or a credit card number or a name? It's all of the information that we have around the data that we mentioned. And Immuta can combine those two sets of facts, facts on the users and facts on the data, into rules, into policies. I want to mask all columns that are labeled as PII. I want to filter all the patient data to the doctors that are treating those patients. I want to give all finance data to finance people.
And by creating those policies, those rules, you can automate your access control. Like you don't have to create your traditional roles and manage all your groups. That's where, as Alexei already mentioned, your RBAC model will fall down. But if you can automate all of that, you can automatically provision access to data for the people who should have access to that data. So that's where you can automate that. And that was the first example. But maybe not everything is automated.
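A toy sketch of what combining facts on users with facts on data might look like in code. The tag names, user attributes, and the `apply_policies` function are invented for illustration and are not Immuta's actual policy language; the principle shown, matching user attributes against data tags instead of managing roles, is the same.

```python
# Facts on the data: which tags each column carries.
COLUMN_TAGS = {
    "card_number": {"pii"},
    "department": {"org"},
    "amount": set(),
}

def apply_policies(user_attrs: dict, row: dict, column_tags=COLUMN_TAGS):
    """Filter rows to the user's department and mask PII columns
    unless the user holds the matching clearance attribute."""
    # Row-level rule: finance data only for finance people, etc.
    if row.get("department") not in user_attrs.get("departments", set()):
        return None
    # Column-level rule: mask anything tagged 'pii'.
    out = {}
    for col, value in row.items():
        tagged_pii = "pii" in column_tags.get(col, set())
        cleared = "pii_cleared" in user_attrs.get("clearances", set())
        out[col] = "REDACTED" if tagged_pii and not cleared else value
    return out
```

Note that adding a new department or a new sensitive column only means adding an attribute or a tag, not minting another role, which is why this scales where role explosion does not.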
Like you still want to allow some people to request access, or maybe you don't want to define those rules, but you just want to give this place where people can find and consume that data. And that's where I want to introduce another concept, and that's the concept of a data catalog and a data marketplace. So I think most of you have heard about data catalogs, and a data catalog is for the builders. It's typically for data engineers, ML engineers creating models. It's like the people that create the different assets. And a data catalog should contain all the objects.
It should contain all the tables, the views, source systems, BI reports, like have that full inventory in there. And it basically contains all the building blocks. But on the other hand, what might be new is the data marketplace. And that's for the consumer. It's for the business analysts, the BI developers. It's for people that want to consume that data. And they want to have ready-to-use products.
Now, what's the difference between a building block and a product? Now, think about like Lego. In your data catalog, you have all the different Lego bricks that exist. So the data engineer, they might collect those different Lego bricks and build something out of them.
Consumers, they want to consume the data. They want to use a BI report. Like they want to consume it. They don't necessarily have to build it. They might want to create the analytics on top, but they don't want to build that data.
So again, Legos: they want to buy a box of Legos where they have the full set of Lego bricks. They want to have a manual that shows how they should build it. They want to see that it's only for age nine plus. They want to see how much it costs. And they want to have a product that's ready to use. They can trust it, the quality is right, all the bricks they need are in there. And that's where they consume the data. So that's the difference between a data catalog and a data marketplace. And you might have thought they're the same thing, but the purpose is different.
That marketplace is where the people combine, like come together to share the data across the whole organization. So it's that data consumer that wants to have that data product, that Lego box. It's the steward that wants to approve it.
So yes, you can use it, or maybe you're not allowed to. It's the product owner that wants to publish that data product and say, okay, well, this is like what people can use. That's that risk averse person. But like if it's in the marketplace, it's governed around there. And then you still have that governor that needs to enforce policies, improve compliance across the board. Now this looks very simple and the idea is great. But if you look in practice and how people work like in organizations, it's actually very complex.
There are loads of different building blocks that make that whole journey of like requesting access to getting that access. Like you might have a ticketing system where you can request something that will go to your IAM system where you get your role. That's then being synchronized to your data platform.
Owners, they publish something in the catalog, but they build it in their own tools. Like this is a very complex system where you have the different building blocks that you all need to connect together. What if you could simplify the process and actually accelerate that path? Because like here, you still have that DBA that needs to go from their access management to, in the end, that access request. And that's something where Immuta can help you to power your marketplace. So you can use those policies, you can use those ABAC rules that I talked about earlier to drive your marketplace.
So if you have a data catalog and if you have a marketplace where you can publish the data assets and people can request them, once that approval goes through, that approval goes to Immuta and we will then directly go to the data platform to provision it. So we really simplified that whole step where you need to combine IAM, catalog, marketplace and those different solutions. And we can accelerate that process. You can still use that birthright or that ABAC policy.
So the governor can still have this holistic view of who can see this data and who should see this data, and combine that. And that's how the organizations we work with are creating their marketplace applications. You can have views where people can request access, see all the different data products, and publish their data products. But when somebody requests access and somebody has approved it, Immuta will take care of actually provisioning that data and making sure the user can consume it in their data platform.
Whether that is a data lake, a cloud data warehouse, or database storage, you name it, Immuta will make sure that you get it. Meaning you can automate that whole workflow. Today you can automate a lot, but in the end you typically still have that DBA who needs to do the grant and manage and set up that role. Immuta can automate it all the way to your data, making sure that consumers can just consume the data products and don't have to wait for weeks until all those different layers have been processed.
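The attribute-driven provisioning Bart describes can be sketched in a few lines. This is a hypothetical illustration with invented names, not Immuta's actual API: a policy maps user attributes to data-product requirements, so an approved request can be turned into a platform grant without a DBA in the loop.

```python
# Hypothetical sketch of ABAC-driven provisioning (names invented for illustration).
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. {"department": "oncology"}

@dataclass
class DataProduct:
    name: str
    required_attributes: dict = field(default_factory=dict)

def evaluate_abac(user: User, product: DataProduct) -> bool:
    """Allow access only if the user carries every attribute the product requires."""
    return all(user.attributes.get(k) == v
               for k, v in product.required_attributes.items())

def provision(user: User, product: DataProduct) -> str:
    """On approval, emit the grant statement the data platform would execute."""
    if not evaluate_abac(user, product):
        return f"DENY {user.name} on {product.name}"
    return f"GRANT SELECT ON {product.name} TO {user.name}"

doctor = DataProduct("oncology_patients", {"department": "oncology"})
alice = User("dr_jansen", {"role": "physician", "department": "oncology"})
print(provision(alice, doctor))  # GRANT SELECT ON oncology_patients TO dr_jansen
```

The point of the sketch is that the approval workflow only flips a decision; the grant itself is derived from attributes, which is what removes the manual DBA step.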
So what does this look like from an architecture perspective? You have your platforms: your data warehouses, your storage, your databases, maybe your BI tool.
And Immuta, this platform, is connected through those integrations. So Immuta can control those warehouses, storage, you name it. And within Immuta, we have different services that can help you with that access control. On top, you can build your apps. You can say, okay, who can access that marketplace, who can request access, provision, et cetera. And then Immuta makes sure that the automation down to your database happens. Or you can think about who should have access.
And that's where the doctor can see their patient data directly when they're assigned a new patient, all the way down to your analytical database. So you don't need to risk a patient's life waiting to get that data; you just have the data that you need. That is the conclusion of my part of the presentation. I thought quite a bit about marketplaces and how you can build them. I would love to invite you to a next webinar, where we'll be announcing some new product enhancements that can further accelerate your marketplace journey.
So feel free to scan the QR code; it will all be in the email afterwards. And I would like to invite you to that as well. Thank you very much, Bart. That was a great presentation, a really interesting deep dive into all these concepts.
But the biggest takeaway for me was this idea that a quote-unquote data security platform not only does well what a security platform usually does, securing the data, but actually improves your productivity. It not only automates the access requests, it anticipates what data you might use because of your persona, because of your line of work, and does that automatically for you. This is really something I don't believe I've seen before, and I would really love to explore it even after this webinar.
And I definitely recommend our viewers do the same, because this is really something new. Before we go to our Q&A session, let me quickly use one minute for a shameless plug of KuppingerCole's own research in this area. I've included a few links to relevant publications we had recently on the subject of data security and governance: the past, the present, and the future. And speaking of the Leadership Compass on data security platforms, I'm currently working on the new version, which will be published later this year. So watch this space.
There will definitely be a lot of interesting changes in that. And of course, you are all very welcome to attend our upcoming cybersecurity conference, cyberevolution, which will take place in Frankfurt, Germany this December. We cover a lot of security-related topics there, so hopefully we will meet you there as well. And let me remind you that you can submit your questions through the Q&A part of the Cvent platform. You can type in your questions, I will read them aloud, and Bart and I will answer them to the best of our knowledge.
Let me just stop sharing and let's start with our questions. The first one is already there: is the difficulty of sharing data a technical one, or is it a culture shift? If I may give my two cents before you, Bart: I would argue that it's very much a terminology discussion. Is sharing difficult or not? It depends on how you define sharing. It's very easy to overshare your data, right? This is what happens all the time; this is what we call data breaches.
Yes, it is sharing, and it is extremely easy, but it doesn't mean you should do anything like that. What is difficult is to maintain the right balance between productivity, security, and compliance, and to reach the right target audience with your sharing. And of course, making it all as seamless and frictionless as possible.
But still, what is your thought? Is it more a technically limited problem or a culturally, people-limited one? I mostly agree with you. It's technically very easy to share data, but to share it in a controlled way that you feel comfortable with, that's actually quite hard. And because that's the hard bit, people stop doing it, and that makes it a difficult cultural problem. Because it's technically not easy to do it in a controlled way, it becomes an organizational problem. That's why people start sitting on their data.
And if you lock data up and nobody has access, that's the most secure way of storing the data, but then you're not actually using it. So if you can have the right tools in place and facilitate users to share data in a compliant and secure manner, that will drive behavior. That's when people will start to share more and feel more comfortable sharing data.
You know, we are both based in Europe, right? Me in Germany and you in the Netherlands. So we are very familiar with this whole, I mean, it's really a phobia, a data sharing phobia. It used to be related to the cloud a decade ago, when people were really eager to adopt the cloud but didn't know how to do it safely, so they would just stop their cloud migration projects completely. The same is happening with AI now, for example, because it's a hot topic.
But again, people just aren't sure how to do it properly, how to avoid GDPR violations and stuff like that. So they're just not doing it.
And again, this is completely irrational. This is a phobia by definition. It's like fear of zombies.
And yes, to a large extent, it's my job as an analyst and your job as a security vendor to address those phobias. But it has to be done very carefully, because if you try to force this without the appropriate changes in culture and business processes, it will surely backfire. Yeah, and people like to engineer around it. So you might get IT departments that over-engineer a solution to keep that control, but that slows adoption because it's an over-engineered and complex solution.
So it's always that balance: making sure that it's secure and compliant, but it also needs to be usable and workable, and you should get value out of it. And it's that balance that's hard to strike. If you look at history, it's typically a pendulum: you completely lock things down and secure everything, then you open it up because the business is not happy, then you have a data leak, and then you go back to securing and closing things down.
Oh yes, it is both a technical and a cultural issue. And it is also, unfortunately to an underappreciated extent, a legal issue as well, an issue that requires more regulation.
And again, this is where Europe is probably even leading a little bit, because we are still good at bureaucracy. But on the other hand, it makes everything slower in the end.
Yeah, and it's different personas, right? You have the lawyers that read the law and understand GDPR, but it's the data engineer or the DBA that needs to enforce it. And those are two different personas.
So what if you could have that lawyer define some rules, some policies, that are applied automatically? Then everything the DBA needs to do is label the data and make sure those attributes are set, and those policies apply automatically. So you turn rules and laws into policy, ensuring compliance.
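That division of labor can be illustrated with a small sketch. This is purely hypothetical, not Immuta's policy syntax: governance writes rules once against labels, the data engineer only tags columns, and the effective controls fall out automatically.

```python
# Hypothetical policy-as-code sketch (labels, actions, and column names invented).

# Written once by governance/legal: label -> masking action.
policies = {
    "PII": "hash",
    "patient_id": "row_filter_to_own_patients",
}

# Maintained by the data engineer / DBA: column -> labels.
columns = {
    "name": ["PII"],
    "patient_id": ["patient_id"],
    "lab_result": [],
}

def effective_controls(columns: dict, policies: dict) -> dict:
    """Resolve which control applies to each column, based purely on labels."""
    out = {}
    for col, labels in columns.items():
        actions = [policies[label] for label in labels if label in policies]
        out[col] = actions[0] if actions else "allow"
    return out

print(effective_controls(columns, policies))
# {'name': 'hash', 'patient_id': 'row_filter_to_own_patients', 'lab_result': 'allow'}
```

Note that changing a regulation means editing one entry in `policies`; no per-table grants need to be revisited, which is the decoupling the two-persona problem calls for.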
Okay, next question. You've shown some screenshots of a marketplace. Is that a built-in capability, or is it something your customers have built on top of your platform? We have a bunch of customers that are using Immuta's capabilities to drive access to their data from their marketplace. But I'd love to invite you to the webinar that I announced to see more on this aspect.
Okay, so this is just to drive up excitement, with something new to announce soon. Okay, sounds cool. You can definitely watch this space.
The next one: you talked about data catalogs and data marketplaces and the difference between them, but can you actually build a data marketplace in a catalog, or on top of a data catalog, or are those two completely unrelated things and technologies? I think you can, but you also have to be careful, because your data catalog is where you have to have everything; all your data should be there. And a data catalog is only as good as the data you put in there.
So you rely on people to maintain good data quality in the catalog, so that everything is labeled correctly and kept up to date. Something a marketplace solves is that you have a list of nicely curated, up-to-date data products. And as you mentioned, Alexei, in the introduction, that's where you have your descriptions, your data quality. If I use the data product, I can trust it and it's the right thing.
But if I add that to a data catalog where I have a bunch of other objects, there's the risk that it's not being updated that much. So if you keep it separate, or make it a dedicated place within the catalog, they can live together. But it's a different concept, a data product compared to the other assets you could have. From my point of view, I would add, first of all, that it's a completely different layer of abstraction, right? It doesn't matter how exactly it is implemented technically; the point is that you are serving a different crowd.
And you are actually serving a different kind of resource. Instead of a data source, you actually want to deliver a data product.
And a data product does not need to be a connection URL to a specific database. It might be something, or in the future it will be something, even more highly abstracted. Maybe it will be an API endpoint. Maybe it will be, I don't know, a pre-configured session in some BI software where you just click a button and access a nicely formatted report. The point is that you have this lineage and supply chain, if you will: a guarantee that everything is clean and correct and validated and current and whatever.
Basically, this is like a high quality product, not just a truckload of iron ore for you to dig into. Right, okay, the next one. In light of evolving data privacy regulations, what strategies do experts recommend for ensuring compliance while optimizing data utility?
Wow, that's a really big question. I hope I understand it correctly.
So data masking is one of the techniques to make sure that you can use data without getting out of compliance. If you have a data set containing names, for example, and medical information, you don't want to share that data with everybody. The doctor treating the patient should be able to see it, but other doctors in the hospital should not see the full list of patients, for example, because then you're leaking data.
Now, if you can apply the right masking techniques to that data, you can still share it and utilize it across the organization, but you're not sharing everything. Maybe you're filtering to only the patients that you are treating as a doctor; that's a masking technique. Or maybe you remove the names from the data set, so that somebody doing research can still run analytics on it. So with different masking techniques, you can share as much data as possible while still complying with the rules.
Now, that might be even harder than giving access to tables via roles, where you maintain a role, because you need to understand the law; you're getting closer to that area. But that's also where you can use those policies and rules. If I have a column that's PII, I can mask it by hashing it, filtering it, or making it null. If I have a patient ID, I can make sure that I reduce it to only the patients that I have.
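The masking techniques mentioned here, hashing a PII column and filtering rows to one's own patients, can be sketched as follows. The column names and the doctor-patient assignment table are invented for illustration; real platforms apply these controls at query time rather than in application code.

```python
# Hypothetical masking sketch: hash direct identifiers and filter rows
# to the requesting doctor's own patients.
import hashlib

def mask_hash(value: str) -> str:
    """Deterministic hash: analysts can still join and count, but not read the name."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def apply_masking(rows: list, requesting_doctor: str, assignments: dict) -> list:
    """Return only the requester's patients, with the name column hashed."""
    out = []
    for row in rows:
        if assignments.get(row["patient_id"]) != requesting_doctor:
            continue  # row-level filter: not this doctor's patient
        masked = dict(row)
        masked["name"] = mask_hash(row["name"])  # column-level masking
        out.append(masked)
    return out

rows = [
    {"patient_id": 1, "name": "Alice", "diagnosis": "flu"},
    {"patient_id": 2, "name": "Bob", "diagnosis": "asthma"},
]
assignments = {1: "dr_jansen", 2: "dr_smith"}
visible = apply_masking(rows, "dr_jansen", assignments)
print(len(visible))             # 1
print(visible[0]["diagnosis"])  # flu
```

A deterministic hash is chosen here so the masked column still supports joins and distinct counts; replacing `mask_hash` with a function returning `None` would be the nulling variant mentioned above.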
So if you can apply those techniques correctly, you can actually share and reuse the data a lot more, but you have to be careful and apply them in the correct way. To build a little bit on top of your answer: remember one of the earlier slides in my presentation, where I called data a toxic liability. I would argue this should be your primary driver for any data security strategy. You should not think of data as something extremely valuable, like gold or printer ink or perfume or whatever, and at the same time extremely safe to handle. It's not.
You should really think of your data as a dangerous, volatile substance, which you have to secure by default. And of course, if you don't need to collect some data, just don't collect it. Do you really need as much PII about your customers as you currently have? Maybe you don't. And the less PII you have, the less effort you have to spend on securing it, for example.
If you do have some sensitive data, maybe you have to secure, mask, encrypt, or tokenize it by default, to make sure that even if there is some unexpected vulnerability in your infrastructure, or if your access control policy was not good enough and data was leaked, at least it was leaked in a form which cannot be maliciously exploited. And so on. But if you have the right controls, if you have the right safety goggles and gloves and shields and protective things, then, like nuclear waste, you can still get use out of it.
Again, my point is that it's not you who should wear the protective goggles. It's your data.
You know, the data-centric approach is that you treat your data as the biggest actor in your digital business. So you secure the data; you don't secure people from data. And you know that your data is always secure, always in a state where it cannot be compromised: encrypted, processed in a hardware-isolated enclave; if it's transported somewhere, it's always TLS-encrypted; and all access to it goes through a zero-trust-enabled access control proxy or something like that.
And yes, then you are safe. You should never assume that some operation or some environment is safe enough to handle your data unprotected. That would be my biggest strategic recommendation. If you presume everything is unsafe and you always wear your security goggles, so to speak, and always protect your data, then you will be safe and compliant all the time. Otherwise, you would end up sharing something too easily and without the means to take it back later.
Right, we have a few minutes left and probably a chance for one last question. And since we do not have anything else from our audience, can you maybe talk a little bit about your roadmap? What do we expect in the near future from Immuta? It's a good question.
One part is the webinar; we're announcing some new stuff there. But also, if you look at Immuta, we are really there to protect data in your cloud data platform, the big data platforms from the cloud vendors: Snowflake, Databricks, Redshift, BigQuery, Synapse, maybe S3, maybe Starburst, Trino. If you have your data in those platforms, we can take control, we can manage that data. That has been the focus so far. But you have data in loads of different places, so we're investigating whether we can expand that.
Because it's nice to protect all those places, but there's much more that can be done. That's one area we look into. And it's not only about protecting your data; it's also about whether we can better find the PII columns.
Can Immuta automatically find that PII column or that patient ID, et cetera, and help organizations find that sensitive information? And you also mentioned the monitoring and the reporting on top of that.
How can you make sure that you can show your governance team that you're compliant, and that people are using your data for the right purposes, and do better reporting on that? Those are all areas that we're looking at to improve our approach.
Okay, well, that sounds like a very solid plan. Thank you very much, Bart, for being with us today. Thank you to all the attendees, the current live ones and the viewers of our recording in the future. Another reminder to attend that upcoming Immuta webinar. And of course, let's hope that we will see some of our attendees at the cyberevolution conference in Frankfurt. Thank you very much again and have a nice day.