Good morning. It's Alasdair Anderson. What I'm going to talk to you about today, well I'm going to tell you how to do two things, or demonstrate two things, or talk you through two things. The first thing I'll talk about is how you develop a secure data sharing service. The second thing I'll demonstrate is the one problem that ChatGPT can't solve, the Glasgow accent. It's impossible for artificial intelligence to understand. So if you guys do, that's great. If you don't, put up your hand and ask a question and I'll go into BBC English for you.
Okay, so what's a secure data sharing service? Well, first of all, why would you need one? And really, it's a challenge of the sort of world that we're living in. I think up until OpenAI released GPT-3.5, the demand for data was pretty high. In the last 12 months, the demand for data has been off the scale, absolutely off the scale. And when you realise that they train these models on the entire internet, you realise that getting them more information than the entire internet might be a bit of a challenge.
At the same time, you're seeing a lot of regulations about what is an acceptable use of data. And also what I would describe as a sort of data nationalism that's going on, where people, by which I mean governments, are starting to de-globalise almost, and insist that data be held within certain countries, that the data cannot travel. And therefore you have to be very careful about what information goes where, and what happens when it crosses borders.
And this is especially important when you look at global supply chains and how you run your business, especially if you're in the virtual e-commerce world. So you have this classic squeeze between supply and demand: the demand for information, for being able to use data, is way higher than it was even just 12 months ago, and it was pretty high then. And at the same time, people are cutting off the supply. So how do you go about answering these challenges?
And this is the point of the presentation when I realised it's someone else's slides and I didn't realise there was animation on it. Oh, click, click, click, click, click.
Right, there we go. So what you'll notice here is I've got two techniques for protecting information. Another thing you'll notice is that encryption is not there. Encryption is as useless as it is useful, because as soon as you encrypt data, in order to use it you have to decrypt it, which gives you the problem of information in the clear. The other thing I'll point out, apart from the fact we've used American spellings, "anonymization", which also shows that it's not my slide, is that these are legal terms.
Because when you go to use information, you're going to have to clear it with the lawyers, with compliance, with the regulators, with your authorities. So it's good for technical people like us to get in the mindset of using the language of the people that are going to give permission for what we do.
So really, to our mind, when you protect information, you have to keep it useful, because if you encrypt it, it's by definition quite useless. It's very protected, but it's quite useless. And therefore you have two techniques to protect data. One is pseudo-anonymisation, and one is full anonymisation. The difference is, well, it's up there on the board, but the way I explain it is: with pseudo-anonymisation, you can get back to the original data. So you can see the first name and last name of the information you've protected.
So it's somewhat like encryption, except you don't have to unprotect it in order to use it. Anonymisation is different: it's a one-way process. I think we've all used very, very simple versions of it when we're trying to generate test data sets for non-production environments. But nonetheless, when you're looking at how do we grow data sets, how do we share data, anonymisation is a very, very valid technique.
What I'm going to focus today on is the pseudo-anonymisation part, and just give a demonstration, not an actual demonstration, there's no computer here for me to use, thank God, but a demonstration of what our customers use this for. But before I go on to that, I'll explain what I mean by pseudo-anonymisation and how we approach this problem.
And again, here we have animation I didn't know about. Okay, so it's a really simple technique, actually, but it's really hard to do. The first thing is you look at all the sensitive data. So not all data is equal. So if you look at your average database or file store, wherever you have your information about your customers or your processes, there's loads of data bits that actually you don't care about, you know, database flag fields, you know, markers to say what their preferences are on a certain screen.
Those are very different pieces of information when you examine them against first name, last name, your social security number, your date of birth; even your gender, actually, is quite a useful piece of information for the bad guys. So we would allow you to tokenise that data. Now what does that mean? It means you take the information and you replace it with a token, which is another piece of information. But it's a two-way scrub, where the token retains the character of the original in terms of length and type, but also maintains referential integrity.
So that's a database term to say that you can still run your business intelligence reports, your analytical models, your AI processes on the data, without ever having given those processes the actual sensitive data itself.
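To make that concrete, here is a minimal sketch in Python of the general idea, not our actual product: a deterministic tokeniser that keeps length and character type, plus a toy vault so the process stays two-way. The key, the vault-as-a-dict, and the sample values are all illustrative.

```python
import hmac
import hashlib
import string

SECRET_KEY = b"demo-key"   # illustrative only; a real system manages keys properly
VAULT = {}                 # stands in for a secured token vault service

def tokenise(value: str) -> str:
    """Swap a sensitive value for a token of the same length and character
    types. Deterministic: the same input always yields the same token, so
    joins and group-bys still line up (referential integrity)."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            out.append(string.digits[b % 10])           # a digit stays a digit
        elif ch.isupper():
            out.append(string.ascii_uppercase[b % 26])  # case is preserved
        elif ch.islower():
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)                              # separators stay intact
    token = "".join(out)
    VAULT[token] = value    # retained so authorised callers can reverse it
    return token

def detokenise(token: str) -> str:
    """The two-way part: only the holder of the vault can get back."""
    return VAULT[token]

t = tokenise("1985-03-14")
print(t)              # same shape as a date, but not the real one
print(detokenise(t))  # 1985-03-14, for those with permission
```

Real products use hardened vaults or format-preserving encryption rather than a dictionary; the point is just the shape-preserving, reversible swap.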
Why is that hard to do? I said this was easy to explain but hard to do, because the first thing you have to do is work out: where is my sensitive data? That is a bit of a challenge when, you know, I'm sure some of us are looking at databases that were built 20 years ago, maybe even 30 years ago, and the people that built them are retired or dead or even worse. But nonetheless, you've got to find that data, then you protect it, and then it changes the way that you deal with data. Your data in your enterprise is by default always protected, and you only unprotect based on a permission, a rationale and a justification to use that data, and that takes me back to the legal definitions. What data are you going to access, for what purpose, and why should you have access to it? And it allows you to build up different views: people in different job roles, people with different permissions, can see different views of the data.
I'll show that in a minute. But really, viewing data in the clear becomes the exception. And if you think about that, that really radically reduces the risk that you're carrying within your systems, within your processes, within your interfaces.
Okay, so we'll put this slide up because it's easy for me to talk to big pictures. So when you've done that, when you have put your sensitive data protection in place, it then gives you the ability to map those unprotects.
Remember, we protect by default and then unprotect on exception to the need to see the data. So when you're looking at data in a database, a database administrator or a systems administrator, they never need to see that information, so it's always protected. When you're looking at someone who's doing maybe some data science, so some modeling, analytics, AI-type stuff, they have a need to see, in this case, you know, date of birth.
Sorry, just the gender and the location, but the date of birth and the names are still protected. When you're looking at someone, in this case we've taken medicine, when you're looking at someone who's dealing with patients in a hospital, they need to be able to see everybody. They need to be able to call your name. They need to know who you are, okay? But they only need to see that one record at a time.
Whereas a data scientist needs to be able to run their models on the entire dataset, all customers, and that's where we run into really big problems in utilising information if we don't protect, okay? So by default, everything is protected. By exception, people are able to see the data as they need it, okay?
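As a sketch of those differential views, with made-up field names, tokens and roles: the record is stored protected, and each role's policy names the only fields it may see in the clear.

```python
# Tokenised record as stored; the values would come from a tokeniser like
# the one sketched earlier. All names and tokens here are invented.
record = {"name": "Qkzpwmrh", "dob": "7241-88-52", "gender": "K", "location": "Xkvwpqz"}
vault  = {"Qkzpwmrh": "Alasdair", "7241-88-52": "1985-03-14",
          "K": "F", "Xkvwpqz": "Glasgow"}

# Per-role view policy: protected by default, clear only by exception.
VIEW_POLICY = {
    "dba":            set(),                          # never needs the data
    "data_scientist": {"gender", "location"},         # whole dataset, coarse fields
    "nurse":          {"name", "dob", "gender", "location"},  # one patient at a time
}

def view(rec: dict, role: str) -> dict:
    """Unprotect only the fields this role is permitted to see."""
    allowed = VIEW_POLICY.get(role, set())  # unknown role: everything stays protected
    return {f: vault.get(v, v) if f in allowed else v for f, v in rec.items()}

print(view(record, "dba"))             # everything still tokenised
print(view(record, "data_scientist"))  # gender and location in the clear
print(view(record, "nurse"))           # the full clear view
```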
Now what it allows you to do is build up a model of how you deal with your data on a use-case-by-use-case basis. I mean, this is at a very high level, and I'll talk you through some customer examples of how they've implemented it. But really, you always have central functions, whether it be your legal department or your compliance department, people who tell you what you can and can't do with information. Then you get down to: is this information used inside the company or outside the company? And when it's used inside or outside the company, where is it used and for what purpose?
So you start to go down the sort of decision tree of who can see what data, and that allows you to implement policies. Now the good news is you guys are already doing this, because this is the same approach we have when we look at authentication and identification type models, okay? What we are doing is taking a deeper view and then eliminating that exposure of data in the clear. So following zero trust, trust no one and ask all the time, we are making sure that people are asking the question: why do you need to see this data, for this purpose, okay?
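That decision tree can itself be written down as policy data. A toy sketch, with invented scopes, countries and purposes; the one rule that matters is default-deny, anything not explicitly allowed stays protected.

```python
# (scope, country) -> purpose -> protection level. All entries invented.
POLICY = {
    ("internal", "CH"): {"analytics": "pseudonymised", "operations": "clear"},
    ("external", "CH"): {"analytics": "anonymised"},
}

def decide(scope: str, country: str, purpose: str) -> str:
    """Walk the tree: inside or outside the company, then where, then why.
    Default-deny: no matching rule means the data stays protected."""
    return POLICY.get((scope, country), {}).get(purpose, "protected")

print(decide("internal", "CH", "analytics"))  # pseudonymised
print(decide("external", "CH", "marketing"))  # protected (no rule, so denied)
```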
Now, I think it's always relevant to find out how people actually use this software. I mean, look, I'm a software vendor, I'm going to stand up and tell you this is fabulous, the best thing you've ever seen in your life, but what's best is to explain how people actually use it, okay? And if you buy me a coffee, actually you don't need to buy me anything, just get me outside and I'll tell you how it really is.
Okay, so this is a global bank, a super big bank, you know, a two-trillion-dollar balance sheet, although you can see this slide was made in America, because Switzerland is somewhere beside Benidorm and Korea is in the Pacific Ocean. But nonetheless, this is an example of where our software really does its best, right? So we have a team of two guys, literally two guys in New York, who run the centralised control function, but they then allow that service to be consumed globally via REST API; they call it DPaaS, data privacy as a service.
Now the enforcement and the controls are then pushed down to the regions. The biggest success they had was dealing with Switzerland, which is always a problem for, well, it's a problem for most folk, despite the fact they're neutral; neutral seems to mean they don't do anything. But it allowed them to implement data security from the US while making the people in the Swiss private bank, in this case, the last line of enforcement, so none of the policies that were applied in Switzerland were accessible to anyone outside Switzerland. So it was quite a rigorous control, but also a dynamic implementation. And what it means is it allows you to move data protection back into your development environment. Because it's just a REST API, you can embed it into your DevOps processes, and as part of your continual build your data is protected: you flag the data, you find it, that problem I talked about before, and then you auto-protect as part of the development. And then you can see a global view of who is using the protected data, because you can scan your software repositories, we all do that, and see which build processes are using the protection and which are not. So for a bank with, I think, 200,000 to 300,000 people, it allows the executives at the top, who are responsible for that risk, to be fully attached to the risk that's getting carried at the ground level, and view that across the globe.
Now, they can do that at a very rapid pace, because it's not our software that we are implementing for each project; it's our software being made available for people to implement. They've got 40,000 developers who can all access the same REST API and build it into their code, so it allows them to completely parallelise the rollout of the security. We've all been through big projects that are centralised, where you become a bottleneck; that bottleneck is removed. But the central security administration and the running of the service is done by just two people, whereas the enforcement is done by all the local regions, which is consistent with their laws and their business processes, okay.
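To give a feel for what "data privacy as a service over REST" looks like from a developer's seat, here is a hypothetical call. The URL, payload shape and auth scheme are invented, not the bank's real API; the point is that protection is one plain HTTP call any developer or CI pipeline can make.

```python
import requests  # third-party HTTP client, used here for brevity

# Hypothetical endpoint run by the two-person central team.
DPAAS_URL = "https://dpaas.example-bank.internal/v1/protect"

def protect(fields: dict, purpose: str, token: str) -> dict:
    """Ask the central service to tokenise sensitive fields. The caller
    states a purpose, so the request can be checked against regional
    policy and audited: the 'why do you need this data?' question."""
    resp = requests.post(
        DPAAS_URL,
        json={"fields": fields, "purpose": purpose},
        headers={"Authorization": f"Bearer {token}"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["fields"]

# Any of the 40,000 developers can call this from app code or a build step,
# while enforcement stays with the region that owns the policy.
safe = protect({"last_name": "Anderson", "iban": "CH9300762011623852957"},
               purpose="test-data-for-build", token="dev-token")
```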
I'm going to change industry here. We do a lot of work in the medical sphere, more in the US than in Europe; that's just a fact of life for the medical industry. This is a company that does kidney dialysis. If you have kidney disease it's a pretty awful thing to have; you have to go and get dialysis three, four times a week. This company is at the front end of that, in that they have clinics where people come in and get their treatments.
So one of the products they have, believe it or not, is a kidney dialysis holiday service: you can go on holiday, but you're beside a clinic, so you can enjoy the sunshine and at the same time stay alive, which is also a good part of a holiday. So the challenge they have is that they're part of that whole medical world, and as we saw in the pandemic, you are expected to be able to share your information really, really quickly.
So customer success in their world means improving the quality of life of their patients, and working with the medical equipment suppliers and the pharmaceutical companies to get to better outcomes for people's health. In order to do that they want to share their information, but they're actually a Swedish-based company spread all over the world, and therefore they run into all those data movement challenges we talked about.
So they are able to protect the information once, and then look at what is applicable based on the level of trust they have with their partner. For internal-type processes they may actually have data in the clear, like we saw with the example of the nurse in the hospital, whereas when they're working with pharmaceutical companies it will be completely anonymised, so they can never get back to the patient.
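A sketch of that "protect once, release per partner" idea, with invented tier names: the stored record is already pseudonymised, and the release step either reverses it, passes it through, or hashes it one-way so there is no route back.

```python
import hashlib

def anonymise(value: str) -> str:
    """One-way: a truncated hash with no vault entry, so the pharma
    partner can correlate records but never get back to the patient."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def release(record: dict, tier: str, vault: dict) -> dict:
    """Same stored record, a different view per partner trust level."""
    if tier == "internal":   # e.g. the clinic nurse: data in the clear
        return {f: vault.get(v, v) for f, v in record.items()}
    if tier == "trusted":    # pseudonymised: reversible only via the vault
        return dict(record)
    if tier == "external":   # anonymised: one-way, never re-identifiable
        return {f: anonymise(v) for f, v in record.items()}
    raise ValueError(f"unknown partner tier: {tier}")

vault  = {"Qkzpwmrh": "Alasdair"}
record = {"name": "Qkzpwmrh"}
print(release(record, "internal", vault))  # {'name': 'Alasdair'}
print(release(record, "external", vault))  # {'name': '<12-char hash>'}
```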
So it gives them the flexibility to use the appropriate protection for the appropriate partner, and obviously they can talk about how they're significantly increasing people's quality of life. The last two examples really come out of the financial world, and that's because there are a lot of processes that banks and insurance companies all have to do: things like credit checks, anti-money-laundering checks, sanctions screening, stuff like that. There's obviously no competitive advantage in someone doing this well.
If they do it well, fantastic. The real danger is that someone does it really poorly and exposes the whole network to terrorist financing or the mafia cleaning their money. So this is actually a Belgian bank; AML stands for anti-money laundering, and this is a service they've launched. We're used as the data protection layer there, and what it allows people to do is send all the very sensitive financial transaction information to the central service, completely pseudo-anonymised.
This is a cloud-based system, and these guys can run their models on it, looking for what the banking folks call red flags, or SARs, suspicious activity reports, which is people who could be bad guys. And in some countries, as a processor of that information, even if you're the bank, you are not entitled to see who that potential red flag might be. So our protection allows all those processes to run.
I used to work for a financial services company in Denmark, and actually the bank could never see who the red flag was; we had to give it to the police. So you get that ability to have a differential view of the data that goes straight through the organisation, and in some cases it goes to an external party and only they can see who it is. The last example is similar. This is for a global analytics vendor; there's only three of them, so if you take a guess you've got a 30% chance of getting one right.
This is, again, a shared function that everybody has to do: what is the risk exposure on commercial loans. I think, as we've all seen, offices are a lot emptier than they used to be, and therefore people are working out what is the potential for a default, how much money would we lose if this loan goes bad, stuff like that. This is a service that was always delivered on premise and is now being delivered in the cloud, and again people can send their data to it, pseudo-anonymised; it goes through the whole processing, and it only becomes re-identified when it comes back to the client.
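That round trip can be sketched end to end. Everything here, the service, field names, tokens and threshold, is invented to show the shape: tokenise before sending, compute on tokens only, re-identify back at the client.

```python
def score_loans(rows):
    """Stands in for the vendor's cloud service: it flags risky exposures
    without ever seeing who the borrowers actually are."""
    return [r["borrower"] for r in rows if r["exposure"] > 1_000_000]

# Client side: borrowers already tokenised; the vault never leaves the client.
vault = {"tok_991": "Acme Offices Ltd", "tok_412": "Globex Property"}
rows = [
    {"borrower": "tok_991", "exposure": 2_500_000},
    {"borrower": "tok_412", "exposure": 300_000},
]

flagged = score_loans(rows)       # the processor sees tokens only
for tok in flagged:
    print("review:", vault[tok])  # re-identified only back at the client
```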
So the processor never actually sees the data. So really that sort of magician's trick that we play has many uses as you look at it across your systems, your processes and even the people that process that data. And that is my crescendo, so thank you for listening and if you've got any questions...