Good afternoon, ladies and gentlemen, welcome to our webinar, "The Foundation for GDPR Compliance and PI/PII Protection": personally identifiable information, private information, personal information, understand where data resides and who processes it. This webinar is supported by BigID. The speakers today are Nimrod Vax, who is Chief Product Officer at BigID, and me, Martin Kuppinger, one of the co-founders of KuppingerCole and Principal Analyst at KuppingerCole. Before we start, some very quick background information about KuppingerCole and some housekeeping information for the webinar. So KuppingerCole is an independent analyst company. We have offices around the globe. We are focusing on IAM, information cybersecurity, GRC, and a couple of related topics, and we deliver our services in three areas. One area is our research, where we cover major areas of IAM, security, and other topics.
We are strictly vendor-neutral, and we do things like market comparisons in the form of our Leadership Compass documents and other research. Then there are our events, which I will touch on in a minute, particularly the upcoming conferences. We also have our webinars and other types of events, and we do advisory for end-user organizations, supporting them on the strategic side, on the choice of tools, et cetera. So our advisory is focused on benchmarking, strategy support, architecture support, technology selection, and project guidance.
So from strategy to selecting the right tool and supporting the subsequent process; we don't implement anything, because we are really neutral. We have a series of upcoming events, as you can see. The most important next event is our European Identity Conference, which will run next time in mid-May in Munich. Then we have other events around blockchain, digital finance, consumer identity, and cybersecurity. Don't miss our onsite events. For the webinar itself, some housekeeping and guidelines: you are muted centrally, so you don't have to mute or unmute yourself; we are controlling this. We are recording the webinar, and we will make the podcast recording available very short-term, and we will have a Q&A session at the end. So if you have any questions, please enter them into the questions area in the GoToWebinar control panel at the right side of your screen; the more questions we have, the livelier the Q&A will be. Looking at our webinar, we have the usual three parts.
In the first part, I'll talk about the need for getting a grip on PI and PII beyond GDPR: even while GDPR is an important aspect, data governance and continuous compliance must become an integral element of the IT risk management and IT security architecture. The second part will then be done by Nimrod Vax of BigID, and he will talk about the capabilities to automate data processing reports, how they fit into a continuous compliance model of services and technology partners, and the concept of privacy assurance. In the third part, as I've said, we will go into the Q&A.
So let's directly dive into our presentation, which is split into five short parts, as you can see at the bottom: starting with the compliance challenge; then GDPR and data governance and how these are related; how to get a grip on structured data; the bigger context where this entire thing fits in; and finally, how this relates to IT risk and security. I want to start by looking at some terms we frequently use here, around compliance, audit, and security. These are different things; at the end, it's about taking the right action.
So compliance means that you meet laws and regulations; that is compliance. Audit generally means that you can prove, historically, that you are doing what you say you are doing. So the audit tracks whether you really do what you promised to do. The action is what you actually do, rather than what you tell the auditor you do; so it might be different.
It might be even more and even better.
It might be worse. Audit and compliance obviously are tightly related, but neither compliance nor audit really makes you secure; what makes you secure is taking the right actions. So even if you follow the regulations strictly and pass your audit checks, it doesn't mean that you're really secure; it still might be that you fail at different things. So we will put our emphasis on taking the right actions. We take the context of the regulations, but always be aware: just having your check mark might not be good enough. Take the right actions, and it might help you beyond just passing the audit. When we look at this entire theme, we are talking a lot about access risk: about someone having access to data he or she shouldn't have. And access risk is a business risk, and what you need to have is continuous control over that. Simply said, this entire thing is not a nice-to-have anymore. If you have an issue with access to the wrong data, you might end up in the headlines of the news. You might face severe fines, you might have reputation issues, and a lot of other consequences. And IT itself can enact good policies, but it doesn't exactly know who needs access to what. So there's a business angle on that as well: the approval, the certification, the determination of access are connected to the business, and the access risks are connected to business risks. GDPR put a new focus on that, in the sense of: yes, you need to do more around all of these things.
One of the pieces of research we published is a Leadership Brief around six key actions to prepare for GDPR.
So there are many things that you can do. But when we look at these key actions, the first thing we have in there, the first action (and this is something we published ahead of the date when GDPR became effective), is: discover the data. You need to understand where that data resides, for every action you take, regardless of the context, whether it's GDPR or CCPA or just protecting your assets. And a lot of data around persons is among your assets: your customer data, other data. Regardless of what you do, taking the right actions always requires knowing which data to act on. So discover the data. Then you can control the access. Then you can manage the consent. Then you can manage the cloud services where this data resides. Then you can prepare for a data breach. And then you can also implement privacy engineering, ensuring that everything you do is secure from day one.
So these are the data protection principles and a summary of the key provisions.
We can also look at maturity levels for GDPR readiness; that's also something we published a while ago. I firmly believe you need to go well beyond maturity level three, which sometimes is the target of many businesses, to really ensure that you get complete control and continuous insight into the information. Level one would be really ad hoc and reactive: only the basic things, with a high risk of non-compliance. Then you start some strategic approach, but with fragmented actions and fragmented PII repositories. At level three, you should move to somewhat more consistent repositories, but with a lot of manual work still there; you might be able to justify some level of compliance. It might not be enough, but at least you're underway. Then comes more automation at level four and level five.
Level five is, on one hand, about the continuous improvement of processes and technologies; that's what these maturity models always describe at level five, the continuous improvement thing. But it's particularly the aspect I highlighted in red, which is around complete and continuous insight, documentation, and controls of the collection, storage, and processing of PI and PII. That is where you should end up. And my perspective is that this red part is something where you need these insights at all levels: better do it right, better do it early. That should be one of your main targets, because, going back to the previous slide, all the actions you can take depend on knowing where the data resides; if you don't know what to act on, you will fail. So this is, from our perspective, really essential. A lot of what we are talking about here is put in the context of GDPR, but it goes beyond GDPR.
Be clear about it: in GDPR, it's Article 30 which brings in the obligation for data controllers and data processors to keep a record of processing activities. At the end, it's about documenting what you're processing, where it resides, which types of data you have, et cetera. There are a couple of other articles in the GDPR touching on that. Basically, when you look at recommendations (and this is from very early slides, when we started talking about what to do around GDPR), recommendation number one is: know where the data resides, and that's not only PII and PI, it's all sensitive data.
That should be a target of everything you do in your business: understanding where your sensitive data resides. PI and PII are part of it, but sensitive data goes beyond them. This is super essential. And then you need to control and monitor access to the data, because it's the responsibility of the data controller (the one who wants the data) and the data processor (the one who processes the data) to ensure that you're in compliance. And it is a good idea for every business, for every type of sensitive data, to know what happens with that data.
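To make the Article 30 record-keeping just mentioned concrete, here is a minimal sketch in Python of the kind of fields a record of processing activities typically captures. The class and field names are illustrative assumptions, not a legal template or any specific product's schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProcessingActivity:
    """One entry in an Article 30-style record of processing activities (illustrative)."""
    name: str                     # business process, e.g. "Recruiting"
    controller: str               # who determines why and how the data is processed
    processors: List[str]         # who processes the data on the controller's behalf
    data_categories: List[str]    # e.g. names, email addresses, CVs
    purposes: List[str]           # documented purposes of processing
    storage_locations: List[str]  # systems and repositories where the data resides
    retention: str                # storage-limitation rule

record = ProcessingActivity(
    name="Recruiting",
    controller="Example GmbH HR",
    processors=["External applicant-tracking service"],
    data_categories=["name", "email", "CV"],
    purposes=["candidate evaluation"],
    storage_locations=["HR shared drive", "ATS database"],
    retention="delete 6 months after the position is filled",
)
print(record.name, record.purposes)
```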
And then we have two areas of data: one is unstructured data, the other is structured data. When we look at structured data, this is organized discretely in databases, directory services, and structured file formats. It's relatively clear, even while, with big data and analytics, it gets fuzzier, because we put everything into big data lakes, then something different happens with it and we lose control. So that's the other big area; we will publish a new report on this topic in the next one or two weeks: how to get a grip on structured data in big data and analytics. For unstructured data, it's even more difficult. It's difficult to organize; you have unmanaged content repositories, you factually have sprawl, you have file share structures, data sent as attachments with email, et cetera. I'll touch on it again in a minute, but things are even more complex there, and a lot of the critical data you have in your business resides in that type of data: unstructured, sprawling.
And we need to get better at that for a variety of purposes, where compliance is one, and compliance driven by GDPR is a specific one, but it's not the only one. You have a lot of stakeholders and a lot of drivers. There's the management, which has certain requirements. The employees want to work; they must do their job. You have the customers, who sometimes want to receive certain types of data and who want to ensure, or at least feel, that their data is indeed safe. You have the works council, which sometimes has a very restrictive perspective on how to control access, how to protect data, and other things. And then, on the right side, you have the data protection laws, external auditors, internal auditors, corporate policies, all that stuff, which are drivers. But there might be even a driver beyond that, from my perspective: if you understand where data resides, and at how many places maybe the same data resides, it might also help you optimize a lot of things in your organization.
So why should we do data governance, governing that data, where it resides, who is entitled to access it, et cetera? Yes, we want to avoid illegal transactions. We want to get rid of fraud. We want to avoid information leakage, and changes to data which we don't want, which also might result in loss of data, because something was changed and some important earlier data got lost; external attacks, et cetera. So there is a mass of threats to the data. And when we look at the reality, we are facing this big challenge: unstructured data is used in a hard-to-control way, and most businesses don't have appropriate controls.
As a simple example: marketing creates an Excel-based target customer list. This list is accessed by a team member, who puts it as an attachment to a mail and sends it to a colleague, maybe not a colleague who should have access to it, who modifies the customer list and puts it on the sales SharePoint site. So we have PII in Excel. We have PII in Excel in the mail, in Microsoft Exchange or Outlook. We have a local copy of that. Again, it's in Excel, with a changed purpose, in SharePoint. And from there, it might be shared by a link, or by default, with other people. At some point it's sprawling and no one knows what happens. But how do we deal here with the consent to the data, with who is allowed to use it, and for which purpose? At some point we changed the purpose. Consent transparency, a requirement of the GDPR, becomes difficult to meet. How do we enforce the rights of the data subjects, such as the right to be forgotten, across this entire chain, which we don't exactly know? Data minimization: use as little as you can. Storage limitation: store it only as long as you need it; and when we don't need it anymore, delete all the Excel files. Purpose limitation, accountability, data breach: it's all really difficult unless you know what really happens with the data, and you need to get a grip on that. That's something Nimrod will talk about in detail, how you could do that.
But back to GDPR and the tasks you see here: again, it all begins with understanding the personal data that is processed. When I look at a flow we created a while ago, in the very early stages of the discussion around GDPR, we also started here with: assess your organization, understand the personal data processed and how it is processed; obviously do the data protection impact assessment in certain cases; are there risks to the rights and freedoms of data subjects? Then you need to embed it into your business processes, organizational measures, and the IT systems. So you need to understand where the data resides and what happens with it. You need to protect it; critically, that's where identity management in the broader sense comes in. You need consent management; that's also a topic here. You need to prevent the breach through data protection and security by design, and prepare for the breach: detect, notify, and understand it. So understand what happens with the data; if something wrong happens, you should know as soon as possible. Have a framework in place, controls implemented, measures that you adhere to. This is really what you should do, but at the forefront of everything is: understand the personal data processed. So GDPR and CCPA, depending on where you are and which regulations apply, are great starting points for every activity you have around this, but think beyond that: what you learn here helps in other areas beyond GDPR. Clearly, when you look at GDPR and CCPA, they don't exactly match.
There are things which differ; the CCPA covers a wide range of types of personal data and certain types of applications. And these are not the final word on regulations; others are coming up, in Singapore and elsewhere. So you should do it individually: determine your individual security, legal, and regulatory requirements; include GDPR and CCPA, but go beyond. Consider customer trust and privacy as drivers: what does that mean for your business model? What puts your business model in danger? What helps you create more trust with your customers, et cetera? That is where you need a data privacy strategy. You need to foster it with tooling; that is what Nimrod will talk about in the next 20 to 25 minutes. You need a milestone plan. And that's really the point: discover and categorize PII across your entire organization. That is where you need tools, where you need organization, where you need processes, and obviously training for the staff, the employees, et cetera. That is really the bigger thing you should do. Regulations such as CCPA and GDPR help you, but go beyond them.
From a technical perspective, there are masses of tools, but it all, again and again, starts with discovery: discover and document PII, detect and document data flows. And then there are other things: patching, secure configuration, data encryption, et cetera. There are many, many technologies, and from my perspective, the essential thing is really to start with everything which helps you understand where the data resides.
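As a toy illustration of that first discovery step, the sketch below walks a file share and flags files containing PII-like strings. The path and patterns are invented, and a pattern-only scan like this is deliberately naive; the identity-correlated approach discussed later in this webinar addresses its weaknesses:

```python
import os
import re

# Naive example patterns; real discovery tooling is far more robust.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card_like": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_share(root: str):
    """Walk a file share and report which files contain PII-like strings."""
    findings = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            for label, pattern in PATTERNS.items():
                if pattern.search(text):
                    findings.append((path, label))
    return findings

for path, label in scan_share("/mnt/shares/marketing"):
    print(f"{label:10} {path}")
```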
So what you should do, at the end, is go for continuous compliance: something which helps you permanently understand what happens with your data, where it resides, et cetera, across all these various technologies and the framework you set up; and you should set up a cybersecurity framework in your organization. Ask for advice; there are many organizations, including us, which can support you in doing so. At the end, it's about achieving continuous compliance and more, to be able to take the right actions, to ensure that nothing wrong happens with your data, but only good things. And how to do that is the part Nimrod will talk about now: capabilities to automate data processing reports, how they fit into a continuous compliance model of service and technology partners, and the concept of privacy assurance. With that, I hand over to Nimrod and unmute him. Then it's your turn.
Thank you.
Perfect. Go ahead.
Thank you very much. And thanks for the overview. Very interesting.
And just to recap some of the challenges when we talk about how we do this: when we started BigID, we realized that there are a lot of gaps in the current tooling and capabilities that are out there, because the new regulations, whether it's GDPR or CCPA and all the other global regulations that are coming into play, basically introduce new requirements that were not there before. For example, data subject rights: not that the requirement was not there before, but it wasn't that broadly required across all of those different industries. This new requirement requires you to have more insight about the data. It's not enough anymore to know only what type of information you have; suddenly you need to know whose data is found where, so that you can find that person's data and delete it.
You need to document the usage of the data, the data flow mapping, the record of processing activities, Article 30 of the GDPR. So you need to put that data in a business context.
That's not trivial, because the tools available today are either very business-facing, like the GRC tools or the tools that deal with the big compliance processes, or very data-centric, like the data discovery or data protection tools that are really looking at servicing the CISOs and their requirements around data protection. You need to assess the risk. You need to respond to a breach; if you have a breach, you need to notify the people that were impacted. Again, you need to know whose data is there so that you can notify those people. And consent: Martin mentioned consent. Automated data discovery tools or metadata tools don't look at this business metadata of the data: do I have permission to collect that data? What is the purpose of processing of that data? Who has access to that data? What type of data is it? This type of business metadata is new, and that's why you need a different approach to data discovery and data management.
This has really been our opinion and our direction: an approach that is identity-centric, an approach that looks at the data not just as data, but in the context of a person, and that is able to look across the different silos of data that you have. Most of you have a lot of different data stored, both in structured and unstructured resources: in file shares, in business applications, in the cloud, and in big data repositories. So look across all of those different data sources; that's one important thing. And then, when you find that data, be able to know whose data it is: build an inventory of that data, a full catalog, but be able to correlate those data elements back to an individual. That is what will allow you to satisfy a lot of the new requirements that the privacy regulations introduce, like the right to be forgotten, breach response investigation, the record of processing activities, and consent. The ability to know whose data it is also helps you broaden what you can find. A lot of the existing tools that were available until now
were really looking at regular expressions and at the structure of the data: looking at whether this looks like a passport number or a citizen ID; does it have the right format of a French citizen ID or a German citizen ID? They didn't really look at whose data it is. Fine, this is a credit card, but is it the credit card of a customer or an employee? Of a US resident or an EU resident? Is it a child's? An adult's? And is it Nimrod's or Martin's? Because if Nimrod wants to be deleted, you need to find that credit card number and delete it. So the context is important. But also, what about a date of birth, or gender, or religion? This information is not identifiable by itself; it's only identifiable in the context of an identity. That's why knowing whose data record it is allows you to know, for information that is not necessarily highly identifiable, who it belongs to and whether it's categorized as personal or not. And the scale is also important. We are dealing with big data repositories that cannot be streamed; a lot of the existing tools rely on streaming the data and processing it.
You cannot do that today.
You need to run natively within those data warehouse solutions and be able to deal with petabytes of data, at scale and with accuracy. So it requires you to run natively, rather than streaming the data and using the old techniques on those environments.
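To make the contrast concrete between pattern-only detection and the identity-correlated detection described above, here is a minimal sketch. The learning-set values and identities are invented, and this is a simplified illustration of the idea, not BigID's actual algorithm:

```python
import re

# Hypothetical learning set taken from the organization's own CRM/HR systems:
# literal data values mapped to the identity they belong to.
KNOWN_VALUES = {
    "4111111111111111": {"name": "Nimrod", "residency": "US"},
    "5500005555555559": {"name": "Martin", "residency": "DE"},
}

CARD_PATTERN = re.compile(r"\b\d{16}\b")

def classify(document: str):
    """Return card-like hits, identity-correlated where possible."""
    hits = []
    for value in CARD_PATTERN.findall(document):
        identity = KNOWN_VALUES.get(value)
        if identity:
            # Correlated hit: we know whose value this is, so subject-rights
            # requests and breach scoping become answerable.
            hits.append((value, identity["name"], identity["residency"]))
        else:
            # Pattern-only hit: looks like a card number, but may be noise.
            hits.append((value, None, None))
    return hits

print(classify("order ref 4111111111111111; tracking 9999000011112222"))
```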
And that's where we take a very different approach. The other thing is that you need to think about the architecture of your solution and look for a modern architecture that addresses those new challenges. I mentioned one of them being big data: take advantage of the new containerized technologies that allow you to scale up and down, running natively on platforms like Kubernetes and Docker. Try to be agentless: being agentless allows you to update your solution faster, reduces the friction with your lines of business, and provides a more accurate solution. And most importantly, expose everything through APIs, because the modern IT environment relies on APIs, lives on top of APIs. That allows your modern DevOps teams, your application teams, and your business partners and existing solutions to integrate into that intelligence layer. So what are the use cases that you want to achieve?
You want to be able to build an inventory, classify the data, discover it across all of those different data sources, and give your data managers or your chief data officer's team free access to that information, so that you can easily search and see: where do you have application data? Which systems contain the most personal information? What types of personal information do you record? But also: what is the residency of the data? Whose data are you storing? Are you storing data of European residents: German residents, French residents? US residents? That has an impact on your risk and on your ability to manage your obligations from a legal perspective. Also, being able to discover duplicate data and so forth has a lot of impact on that.
Once you know where the data is, you can map it to business processes and have those business processes described: not necessarily the data flow, but the business flow, because you're required to report on your business processes and how you justify collecting personal information for the fulfillment of those business functions. That's the whole idea: you're allowed to collect personal information, but with legitimate use, for a purpose that you've received consent for, that makes sense for your customers. So there has to be a business context to the data; that's very important. But how do you combine business context, data, and IT context together? That's a big challenge today. You have a lot of solutions in the business space that help you define the business flows, but those solutions typically rely on surveys and interviews,
because you cannot really discover the purpose of processing automatically; you have to ask a data owner for that. You cannot automatically discover how emails flow as part of a recruiting process. So that business context needs to come from collaboration tools, and that's fine, but the tools today that do that typically don't have any access to the data. On the other hand, the tools that do discover data, that discover where the data is located and so forth, do not have a business context: they can tell you where you have data, but they don't know why that data is being used, and for what purpose. Being able to combine those two together is a critical requirement for successfully automating that process, because data changes all the time and business processes change. How do you notify the business owner that new data has been discovered and new data is being used? Once you know where the data is, you can implement workflows for data subject rights: the right to be forgotten, or the right to access.
But in order to do that, you need to have that data in the context of an identity. You need to know not only where you store credit card numbers, but also whose credit card numbers you are storing, what the purpose of processing is, and so forth. That is really a foundational element for being able to do that. Consent: how do you track consent? Now that you know where the data is, and if you know whose data it is, you can cross-reference and see whether that person actually provided consent. Do you have an agreement that person agreed to? Do you have terms and conditions that person accepted? Do you have some kind of legally binding contract with that person, such as an employment relationship, that gives you legitimate use of this data for those purposes? Making that cross-reference also requires you to know whose data is found where, to map it to an agreement, and to be able to look at the data in the context of that individual and the agreement: in short, answering the questions, do I have permission to store this data, and did that person provide consent for me to use the data for the purposes of use that I have collected and know about? So with that, let me give you a quick overview of how we deal with this with BigID. At a high level, BigID takes a very unique approach to data discovery, data governance, and data management.
We connect to your data sources wherever they are. We run inside your data center. So we don't collect your data or send it outside of your data center.
We connect to your data sources across the board; we support a very broad set of systems, whether it's structured data in databases (all of the relational databases), unstructured data in file shares (Windows or Unix), emails, cloud storage (whether it's AWS S3, Azure cloud storage, or blob storage), business applications like SAP, Salesforce, Workday, and NetSuite, or big data repositories like Hadoop, running natively in those environments, finding data and correlating all of this data to a set of entities. Those entities don't have to be people, but typically they would be your consumers or your customers, referencing an HR system, a CRM system, or any table that contains profiles of your identities. BigID uses that as a learning set to find all of that data and correlate it back to that entity, to build that entity map. When we talk about how this is applicable to other realms in your business, I'll talk about how that entity doesn't have to be an identity: it could also be a product SKU, or any master data element that you have in your organization. But that's separate from the concept of privacy and relates more to how you can leverage this beyond just privacy. The core, like Martin mentioned, is in building an inventory.
All of this information flows into a central inventory that allows you to browse it and see not only where you store personal information, but also whose data it is. Because of the correlation, we can say: are you storing data of German residents, or of US residents, and in which state? That comes from the reference to an identity and that identity's profile. So we could say: where do I store data of German residents? Specifically, I'm interested in a personal data category of sensitive personal data, which includes these attributes. Looking at the map, I can see that I store some of this information outside of Germany, in the US. I can see which data source contains that information. Furthermore, because of the correlation, I can see exactly whose data is found there. So if I have a breach in this data source, I can go directly to it, export that list of identities, and report exactly on the people whose data was impacted, and notify just those people. That helps me reduce the scope of a breach, another requirement of the GDPR and the CCPA and similar regulations. We also calculate a risk score that you could use. And now, once you have all of this information in the inventory, you can start building data flows that describe your business processes.
This is exactly what we are talking about when we talk about the Article 30 record of processing activities. What you're seeing here is basically a very simple chart that describes the business process. This is an HR process. This is not a data flow; this is a business flow, right? So you have an external candidate who submits a paper-based application to an HR manager, who stores it in a shared drive. It's almost like a Visio diagram, right? And you could actually build it very easily. But everyone in this business knows that the person building it is not necessarily the person who knows the details of the business process, and that's where collaboration is very important, because a lot of this information needs to come from the business users. So I can collaborate and ask the HR person to provide more information, and I can specify exactly what information needs to be provided: the display name, the description, who the owner of that process is, and what personal information is used there. I'll come back to that, because in a minute you'll see that some of these tasks can be automated; but it's important to be able to collaborate with the business and facilitate that collaboration. As a business owner, I will receive a note saying: please provide some more details and more information.
I can update this and then submit it back. I can also reassign it to someone else, or ask other people for help; they will also receive a note and the information. At the end, I can resolve this. This is how you collaborate and enrich the automatically discovered information with information that needs to come from the business. So you have the business side here. The data side BigID is already covering, because in that HR shared drive, BigID has already discovered all of the different attributes that take part, that were stored on that shared drive, by scanning all the documents stored there and extracting the PI elements of interest that correlate back to an individual. Also, if they fall into categories, they can be provided as categories of personal information. You can provide the purpose of processing for those categories and for those attributes. And if BigID does not have the purpose of processing (because BigID cannot discover the purpose of processing automatically), and let's say I did not have a purpose of processing for the email, you can see that BigID generates a collaboration task directed to the owner of that data, the data source owner, asking him: please provide the purpose of processing. So basically, again, that business owner would receive an email saying: new attributes were discovered, please provide the purpose of processing. That person will then provide the purpose of processing from a list of approved purposes coming from a business glossary, and resolve it. That information will then be reflected back in this business process flow, and you can see that now this purpose is provided here. This is how we enrich the information that is discovered automatically using scanning capabilities and AI, and enrich it with information that has to come from the business.
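A minimal sketch of that mechanism, generating a collaboration task when a discovered attribute lacks a documented purpose. The owner address and glossary are invented; this illustrates the idea rather than the actual product workflow:

```python
APPROVED_PURPOSES = {"candidate evaluation", "payroll", "marketing"}  # business glossary
purposes = {"email": "candidate evaluation"}        # attribute -> documented purpose
owners = {"hr_share": "hr.manager@example.com"}     # data source -> owner

def on_attribute_discovered(attribute: str, data_source: str, tasks: list):
    """Open a task for the data-source owner when a newly discovered
    attribute has no documented purpose of processing."""
    if attribute not in purposes:
        tasks.append({
            "assignee": owners[data_source],
            "ask": f"Provide the purpose of processing for '{attribute}'",
            "choices": sorted(APPROVED_PURPOSES),  # pick from approved purposes only
        })

tasks = []
on_attribute_discovered("credit_card", "hr_share", tasks)
print(tasks)
```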
At any point, we can generate a report that provides the full record of processing activities: the type of report that you would provide to an auditor if needed, describing the business process, some general properties, and, for each attribute and business flow, the purpose of processing, as well as additional annotations you can add, such as gaps and mitigations, and any other comments, free text, or documents you want to upload to this process. And if, three months from now, BigID suddenly discovers a new attribute, because one of your candidates accidentally added his credit card information into his resume, and BigID discovers that credit card, it would automatically create a task for the application owner saying: we found a new attribute, please provide the purpose of processing. And you could say either: okay, we need to collect that information for various reasons; or: we do not collect that information, it's excluded, and provide another mitigation for that, resolving it. Once this is resolved, your data flow is updated with that information, and you can generate a new report. You have a full audit record of any exclusion or activity that someone has provided, and that gives you the continuous compliance that you need.
A lot of the organizations we speak with have really rushed to be compliant with GDPR, performed a privacy impact assessment, and provided some basic record of processing activities; but now things change, and they don't want to go back to all of their business owners and redo all of that process again and again, year after year. They want to move into a continuously compliant mode of operation, very similar to what organizations have done in the past with more mature regulations like SOX, where they are required to provide accountability for access rights and need to perform access recertification. In the past, access recertification was done as a waterfall process every quarter or every year, and more mature organizations, with more mature identity management systems, have moved into a continuous compliance mode where they only re-certify changes. We are taking the same approach: we are borrowing those concepts from identity management (Martin is very well aware of that space, as an expert in that field), and we are applying the same lessons to this less mature category of personal information and privacy compliance. So that is the continuous compliance aspect.
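The change-only recertification idea borrowed from identity management can be sketched in a few lines. The state dictionaries are hypothetical, and a real implementation would track much richer state:

```python
def needs_recertification(previous: dict, current: dict) -> set:
    """Continuous compliance: re-certify only what changed since the last
    certified state, instead of re-running a full waterfall review."""
    changed = {key for key in current if previous.get(key) != current[key]}
    removed = set(previous) - set(current)
    return changed | removed

last_certified = {"hr_share/email": "candidate evaluation"}
observed_now = {
    "hr_share/email": "candidate evaluation",
    "hr_share/credit_card": None,  # newly discovered, purpose not yet provided
}

print(needs_recertification(last_certified, observed_now))
# -> {'hr_share/credit_card'}
```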
One of the other aspects enabled by the inventory and by data discovery is fulfilling subject access requests. So if Benjamin Terry Young, for example, calls in and asks what information you store about him: the fact that we have that inventory, and that we can tie it to the identity sources of the organization, means we can say what information we store about that person. But we can also find all of the instances of that information, and similar information, across all of the different data sources, and go out and search for it across the board. And then, once that scan is completed, generate a report that highlights exactly what information we discovered about Benjamin Terry Young and where that information was discovered.
And also, because we know whose data it is, we map to the consent logs as well. Every application generally holds a log of the acceptance of the terms of use by its users. We can reference those logs and show proof of consent, because the first question Benjamin is going to ask once he sees his data is: who gives you the right to collect my information? We can say: you provided your consent on this date or that date. But BigID can also validate whether consent was given for the specific purposes of processing, because, as you've seen, the data flow mapping also lets you know the purpose of processing of each attribute. You can now say, at a much more granular level: yes, I found consent logs from Benjamin, but he did not provide consent for these specific attributes, nor for the purposes of processing specified here. That allows me to go and make sure that there are no violations.
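A minimal sketch of that consent cross-reference, flagging attributes whose documented purpose the subject never consented to. The log structure, names, and purposes are invented:

```python
consent_log = {  # subject -> purposes consented to, with the consent date
    "benjamin": {"purposes": {"order fulfillment"}, "date": "2018-05-25"},
}

stored_attributes = {  # attribute -> documented purpose (from the data flow map)
    "email": "order fulfillment",
    "date_of_birth": "marketing",
}

def consent_violations(subject: str) -> dict:
    """Attributes whose documented purpose was never consented to by the subject."""
    granted = consent_log.get(subject, {}).get("purposes", set())
    return {attr: purpose for attr, purpose in stored_attributes.items()
            if purpose not in granted}

print(consent_violations("benjamin"))  # -> {'date_of_birth': 'marketing'}
```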
I can download this report, or I can send a request for removal, which would basically trigger a task for the IT people to remove that person, with a link to a very detailed report that holds all of the different attributes and the specific records that contain personal information, so that IT can go to all of those different systems and remove those specific records if needed.
So again, all of this is enabled by having a data intelligence layer and an inventory that tracks all of that information.
And as I mentioned, this has benefits beyond just privacy. You can see how, for privacy purposes, it helps you handle personal data rights, the record of processing activities, and consent governance. But it is also relevant for security, because now that you know the data, and you have the sensitivity to know where you find European residents' data or children's information, you can enforce more granular policies on it. The existing DLP and security tools don't have that sensitivity to the type of identity, or to who owns the data; they cannot distinguish between the credit card number of an employee and that of a customer, or the date of birth of a child and that of an adult. But BigID can help with that, and by labeling the files that contain that information, it can enable third-party tools to act with more granularity: use the APIs to apply de-identification to the right cells and information where you need it, and provide the access governance information, so you know who has access to the data, as well as using it for data governance purposes, because essentially what you get is a full catalog of all the data.
Now you can use that catalog for governance purposes and enrich your existing data governance solutions with additional business metadata and operational metadata that they don't have access to. Data flows, lineage, and retention policies can be managed and processed based on that information as well. Just to illustrate our way of thinking and how we see this: at the core, what you need is that core discovery and indexing across the data, across all types of data. That's important, because you have a lot of silos today, with solutions that are focused on unstructured data and solutions that are focused on structured data, and nothing really ties all of this together. So you need to be able to tie into everything, with a set of APIs on top, so that your solutions can integrate with it. For BigID, we have implemented the specific applications that I demoed in this presentation, like the inventory, the access request management, and the data flows. We didn't get to see all of the other capabilities that are there, across data access intelligence, breach response investigations, data labeling, and so forth. There are many more things you could do, but the fact that you have an API allows you to integrate with your existing ecosystem, with existing solutions that you may already have in your environment, for example for privacy management, for encryption, or for labeling, and to build integrations into other tools that address those different realms: privacy governance and process tools can use the APIs to enrich their data rather than relying only on surveys and interviews; encryption with solutions that do encryption; labeling with Microsoft Information Protection, for example, or others; performing de-identification; and data governance based on the data that was discovered.
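As a sketch of what such an integration could look like, the snippet below pulls identity-correlated findings over a REST API so a downstream tool could act on them. The base URL, endpoint path, and response schema are entirely hypothetical and do not describe BigID's actual API:

```python
import json
from urllib import request

BASE = "https://discovery.example.internal/api/v1"  # hypothetical endpoint

def get_findings(subject: str) -> list:
    """Fetch identity-correlated findings for one data subject so that
    downstream tools (encryption, labeling, GRC) can act on them."""
    with request.urlopen(f"{BASE}/subjects/{subject}/findings") as resp:
        return json.load(resp)

# Illustrative downstream use: an encryption tool protecting each location.
# for finding in get_findings("benjamin"):
#     encrypt(finding["data_source"], finding["record_id"])
```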
So there are uses across the board for this data intelligence layer that spans multiple realms, not only data privacy. And look at the overall lifecycle management: from the definition of what you want to protect and manage, through the classification process across structured and unstructured data, but also metadata and applications, all centralizing into a central inventory. That then allows you to apply governance activities like privacy, consent, data subject access rights, and the record of processing activities, but also data access governance (who has access to the data) and data governance activities like data quality, deduplication, and lineage, which are associated more with data governance: still not touching the data, just applying governance on top of it. And then the next step is really applying controls: remediating by labeling files and data sources, so that third-party tools can apply policies based on those labels; or actually remediating and changing access rights on files, to remove "everyone" access or world-readable files; or applying data deletion and so forth, like I mentioned. So that is our philosophy around how important data discovery is and around having that intelligence layer across the board. We are handling those aspects of privacy, data governance, and security, and, through the API layer, making other tools and your existing investments much more accurate and productive. With that, maybe I'll pass it back to you, Martin, and open it up for questions from the audience.
Yes, thank you very much, Nimrod, for the insights you delivered. I think that was very helpful, very deep. It shows a lot of things around what you should do as a business to get better at dealing with data. We already have a number of questions here, and I ask the audience to enter additional questions if they have any. But given that we already have a couple of questions, we should directly start with the Q&A, also given that we only have a few minutes left. So let's start with a question which is, I think, very important for a lot of businesses: can you find data in different languages?
So, specifically for BigID: because of the way we discover data, which is based on a learning set, we basically learn based on your data. So if your data is in German or in Chinese or other languages, that's the data BigID will look for and search for. So for that type of discovery and correlation, yes, we can definitely search in any language. When you are using tools that are based on regular expressions (and BigID also has regular expressions), you will need to build a regular expression based on the language, and that would require some additional configuration.
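A toy contrast between the two approaches, with invented values; the learning-set lookup works regardless of language or script, while the regular expression is bound to one national format:

```python
import re

# Locale-bound approach: one hand-maintained pattern per national format.
GERMAN_TAX_ID = re.compile(r"\b\d{11}\b")

# Learning-set approach: match literal values taken from the organization's
# own systems, which is language-independent by construction.
learning_set = {"mueller@example.de", "王小明", "4111111111111111"}

def find_known_values(text: str) -> list:
    return [value for value in learning_set if value in text]

print(find_known_values("Bestellung von mueller@example.de, Kunde 王小明"))
```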
So why does someone need to know the identity behind the data? That's another question I see here.
Yeah.
So, there are obviously the privacy aspects that I mentioned, right? If you want to automate the subject rights, you need to find a person's data. If that person wants to be deleted, you need to find that person's data and delete it. So that requires you to know where that person's data is located. Also, if you discover that you had a breach in one of your data sources, you need to know whose data was on it, so you can notify just the people that were impacted. Those are the general requirements from a privacy perspective. But also, for pure discovery, not even for privacy purposes, knowing the identity gives you much more assurance regarding false positives: if you find someone's social security number, and that social security number is a real social security number of one of your customers, you know it's not a phone number, right? So it's much more accurate. And the ability to correlate to an identity also gives you the ability to find information that is not necessarily highly identifiable, like a date of birth. A date of birth is a date of birth only when it's found in the proximity or in the context of an identity. If you know that proximity or context, you can distinguish between a date of birth and a transaction date or any other type of date. So that helps you find not only PII, which is highly identifiable, like a social security number, but also PI, personal information, like a date of birth.
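A naive proximity heuristic illustrating that point; the window size and names are arbitrary assumptions, not the actual classification approach:

```python
import re

KNOWN_NAMES = {"Benjamin", "Nimrod"}        # from the identity learning set
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def dates_in_identity_context(text: str, window: int = 30):
    """Treat a date as a likely date of birth only if a known identity appears
    within `window` characters; otherwise it may be a transaction date."""
    results = []
    for match in DATE.finditer(text):
        nearby = text[max(0, match.start() - window): match.end() + window]
        is_personal = any(name in nearby for name in KNOWN_NAMES)
        results.append((match.group(), "probable DOB" if is_personal else "probably not PI"))
    return results

print(dates_in_identity_context("Benjamin, born 1985-04-12. Invoice date 2019-03-01."))
```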
Okay. So those are some examples. Yep.
Another question: a lot of organizations don't seem to get that many requests about which PII and PI they have stored. So why should they automate anyway?
Well, I think one of the major reasons is that sometimes, when you do get a request, in complex environments it's simply very difficult, if not impossible, to comply within the 30 or 40 day timeframe that you get under the GDPR. Being able to do it quickly is sometimes impossible to do manually; that's one case. But I think the most compelling use case for automating it is the assurance that you actually deleted that person. Because today that person wants to be deleted, you did it manually, and then he was deleted.
But then, how do you make sure that that person's data doesn't resurface because it was brought back by a backup, for example, and suddenly that information reappears in one of your data sources or BI tools? In order to routinely validate that a person has actually not resurfaced after they requested to be deleted, you have to do it automatically, because those requests add up: very soon, even if you're getting very few of them, you will have tens, if not hundreds, of people that you need to validate are not coming back into your system.
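A sketch of that routine validation, assuming a scheduled rescan produces a map of subjects to the locations where their data was found; all names are hypothetical:

```python
deleted_subjects = {"benjamin": "2019-06-01"}  # subject -> date of erasure

def resurfaced(scan_results: dict) -> dict:
    """After each scheduled scan, flag erased subjects whose data reappeared,
    for example restored from a backup into a data source or BI tool."""
    return {subject: locations
            for subject, locations in scan_results.items()
            if subject in deleted_subjects and locations}

latest_scan = {
    "benjamin": ["bi_warehouse/customers"],   # should have stayed deleted
    "martin": ["crm/contacts"],               # never requested erasure
}
print(resurfaced(latest_scan))  # -> {'benjamin': ['bi_warehouse/customers']}
```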
So that assurance, I think, is the most compelling reason why you want to automate it.
Yeah, and I would say, particularly, you always should be prepared, because when, some day, you start getting a lot of inquiries, that really would put you in trouble. And when we look at the overall audit and compliance requirements, we spend far too much time in businesses on the manual fulfillment of such requirements, and it really costs us time; we need to get better. So try to automate, try to do it right once; I think that's really the way to do it. There's one other question I think we can pick before we are at the end of our time: if you already did a PIA, a privacy impact assessment, isn't that enough? Or, in other words, what is the difference between a continuous approach and a one-time assessment, and why should you go for a continuous approach?
Yeah, we hear that a lot. A lot of organizations have done that, right? They've done a privacy impact assessment, and then maybe a basic record of processing activities, which is basically a spreadsheet where each row is a line of business. They've reached out to their lines of business and received a semi-accurate view of the data. But they've probably spent a lot of time and a lot of money doing that, either internally or through consultants. And then, what happens next year? How do you sustain that? How do you make it an ongoing process, so that you don't have to go through this effort over and over again? Because a lot of it can be automated. So I think that is the main driver that is now driving people to say: okay, I've done it, I'm compliant, I've done something basic; but now, how do I make it cost-effective and accurate? So that's one thing: they want to automate it. And then, during that process, they also realize the importance of this data flow and data mapping, of knowing where the data is, for their data governance people. Organizations spend a lot of money on data brokers to enrich their information for marketing purposes using external data, not knowing the richness of the data they already have internally. So this process can help them discover the data they own, far beyond just privacy, for their internal uses as well.
Okay, thank you very much. We are very close to the top of the hour. Thank you very much, Nimrod, for your presentation, the deep insights you provided, and the answers to the questions we had.
Thank you. And thanks for inviting me.
Thank you very much to the audience, and I hope to see you soon at one of our upcoming events, be it an onsite event or a webinar. Thank you very much. Bye.