Thank you, and good morning everyone — thanks for making time right after coffee for this session. My name is Shaan. I'll take you through a use case where business users can consume explainable AI in a mixed identity landscape for internal fraud detection. Very briefly about myself: I'm a PhD researcher at the University of Mac book. My publications are mostly in the areas of ethical AI and explainable AI, and I run a small startup focusing on security and risk management, where I'm applying some of my research and other technologies and tools to real-life applications.
So the problem we had to solve, for a global business house based in Europe that is also regulated under SOX in the US, was this: you have multiple identities, and you already have an access and identity management tool. But typically what we have seen is that when you onboard applications like your SAP or Oracle transaction systems, these account counts often run, for some companies, into the thousands.
And in those cases some identities might look similar based on certain attributes — but is it really the same person, John Doe versus jdoe, or Mary Jane versus mj?
So that's the problem we had, and we tried building simple rules where, if the algorithm fails, it also tells the business user why it failed. We started getting data from different transaction systems, and — as you see, Erlich Bachman versus Erlich Bachmann, pardon my Deutsch — we now have Python scripts that tell you there is a certain degree of match between the two records. Because it's automated it runs efficiently, and it then gives you a set of false positives which can be manually removed, so that you have an overall reliable output.
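(A minimal sketch of this kind of fuzzy identity match — using Python's standard-library `difflib` rather than the tool's actual algorithm, which is not disclosed in the talk; the threshold of 0.6 is an invented example value:)

```python
from difflib import SequenceMatcher

def match_score(name_a: str, name_b: str) -> float:
    """Similarity between two identity strings: 0.0 (no overlap) to 1.0 (identical)."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

def explain_match(name_a: str, name_b: str, threshold: float = 0.6) -> str:
    """Return a human-readable verdict, so the business user sees *why* it matched
    (or why the algorithm failed) instead of a bare yes/no."""
    score = match_score(name_a, name_b)
    verdict = "possibly the same identity" if score >= threshold else "likely different identities"
    return f"'{name_a}' vs '{name_b}': {score:.0%} similar -> {verdict}"

print(explain_match("Erlich Bachman", "E. Bachman"))
print(explain_match("John Doe", "jdoe"))
```

Matches below the threshold surface as candidates for the manual false-positive review described above.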
So that was one part of the problem and how we solved it. The other, bigger part is in a business application.
Now, if you look at any organization using procurement practices: typically, depending on what you're procuring, it starts with a request or requisition, you order the items, you receive the items, then you create an invoice or a bill, and then you make the final payment.
And — "what you did last summer" — what happens in a very classic use case is that some transactions get executed very fast. That was one trait we found in the literature, and when we actually applied it we got results: certain transactions, depending on what you're buying, are done very swiftly. Maybe there was a phone call, maybe there was collusion. So how do we determine it? And this is unsupervised — you don't have labels for what is good and what is bad, which makes it difficult to train a deep learning algorithm and the like. So what did we do?
We started with simple rules.
For example, all transactions tend to follow a typical pattern, based on how long it takes to onboard a vendor in your organization, or how quickly you receive goods depending on the category of goods. So this is the standard bell curve, and we started giving the user options to tune it — say, to two standard deviations. Basically, all the items to the left of this blue line and to the right of this red line are the ones that can be potentially risky — even more so when some of them are done by jdoe and some of them by John Doe.
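(The two-standard-deviation rule just described can be sketched in a few lines — an illustrative version using Python's standard-library `statistics`, with made-up cycle times in days:)

```python
from statistics import mean, stdev

def flag_fast_transactions(durations_days: list[float], n_sigma: float = 2.0):
    """Return (duration, reason) pairs for cycle times below mean - n_sigma * stdev.
    Unusually fast flows are the classic trait described above; each flag carries
    the statistics it was derived from, so the result is explainable."""
    mu = mean(durations_days)
    sigma = stdev(durations_days)
    lower = mu - n_sigma * sigma
    return [
        (d, f"{d} days is below {lower:.1f} (mean {mu:.1f}, stdev {sigma:.1f})")
        for d in durations_days
        if d < lower
    ]

# Seven typical requisition-to-payment cycles and one suspiciously fast one:
print(flag_fast_transactions([10, 11, 9, 10, 12, 10, 11, 1]))
```

The same check with `d > upper` catches the unusually slow tail on the right of the curve.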
So this is explainable, because we can tell a business user: these items were done by jdoe and Jane, and they are faster compared to the other transactions. This is one of the interfaces of the tool, and I have a live demo I'll show you.
So you see this is the overall cycle, starting from a requisition on the left, through purchase order, goods receipt, and invoice, to payment, for a certain category of items. The line items you see at the bottom are those transactions which are risky. And gen AI is there — we are using it to generate explanations.
So we are fine-tuning large language models on our own data. This is telling a CISO or a head of procurement that these records are outliers for flows between purchase order and goods receipt — they happened relatively quickly compared to the other transactions, based on mean and standard deviation. And when you drill down, you actually have the transactions. In the example we did, the company actually ran an internal investigation — because you do get false positives.
This is completely unsupervised, and I'm happy to tell you that on a couple of occasions the output of the algorithm gave results which were investigated, and there was indeed something that was not the usual. So let's see "what we did last summer" — the demo.
Hello, this is a demo video of our AI platform in a mixed identity landscape. This is the landing page of our risk management platform, IRM. You see a hundred percent of the data has been synced, meaning the user-role data of one transaction system, SAP, is synced, and likewise for system two, Oracle Fusion: its user and role data has been synced on a certain date and time.
After the data has been synced into the platform, we run what is known as a cross-platform simulation, which uses fuzzy matching logic to match John Doe against jdoe, based on a risk library that looks at the access risk across the systems — in the sense that John Doe can create a payment in SAP, and jdoe, a matching name, is able to create a payment or create a vendor master in Oracle Fusion.
And hence that's a risk. So if I navigate to the dashboard, this is where you see the cross-application users.
This particular user on the first line has access to SAP, and the same user with a different username has access to Oracle, and over in the status column you see certain users who have violated certain exception rules. If I click on "show", it shows me that via a certain role in the SAP transaction system — something like a ZFI role — you have access to processing vendor invoices through a certain authorization object and privilege in SAP.
And thereby you could process vendor invoices — and the same user, with the matching name of John Doe and jdoe, has access in the Oracle system to an entitlement for creating payables invoices via a certain role. This way you have found the cross-platform access risk. However, this is what we call a "can do" scenario: someone *can* do a certain transaction. Now let me take you to what we call a "did do" scenario.
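(The "can do" check just described can be sketched as a segregation-of-duties test over the matched identity's pooled entitlements. The entitlement and rule names below are invented for illustration — they are not real SAP or Oracle Fusion objects:)

```python
# Hypothetical cross-application segregation-of-duties rules:
SOD_RULES = {
    "create vendor + pay vendor": {"create_vendor_master", "process_vendor_payment"},
    "create invoice + approve invoice": {"create_invoice", "approve_invoice"},
}

def can_do_violations(entitlements_by_system: dict[str, set[str]]) -> list[str]:
    """Pool the matched identity's entitlements from all systems, then check
    which SoD rules the combined set satisfies."""
    pooled = set().union(*entitlements_by_system.values())
    return [rule for rule, required in SOD_RULES.items() if required <= pooled]

# Neither system alone violates the rule; the matched identity does:
print(can_do_violations({
    "SAP / John Doe": {"process_vendor_payment"},
    "Oracle Fusion / jdoe": {"create_vendor_master"},
}))
```

The key point is the pooling step: the risk only becomes visible once John Doe and jdoe are treated as one identity.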
This is the second use case. After we have matched the identities, over here is a page where we configure the risk standards we were talking about.
For example, if you put in the number two — two standard deviations — you see these boxes spread out, and if I click one, you have one standard deviation away on the bell curve. Imagine you have all your transactions: purchase requisition to purchase order, purchase order to GRN (goods receipt note), GRN to invoice, and then invoice to payment. When you press two over here, it means you look at all the transactions which are two standard deviations away — to the left of this blue box and to the right of this red box.
Then you place a tolerance margin, say 10%, and you say: when I move to two standard deviations, can I also look at transactions which are within 10% plus or minus of that cutoff? And this is where we upload the actual transaction data from your SAP or Oracle system.
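(One plausible reading of that tolerance margin, sketched below — the talk doesn't spell out the exact semantics, so this assumes the margin pulls the cutoff inward so that near-boundary transactions are surfaced as "borderline" rather than missed:)

```python
def classify_deviation(duration: float, mu: float, sigma: float,
                       n_sigma: float = 2.0, margin: float = 0.10) -> str:
    """'risky' beyond n_sigma standard deviations; 'borderline' inside the
    tolerance band just under that cutoff; 'typical' otherwise."""
    strict = n_sigma * sigma
    relaxed = strict * (1.0 - margin)  # assumption: margin narrows the cutoff
    deviation = abs(duration - mu)
    if deviation > strict:
        return "risky"
    if deviation > relaxed:
        return "borderline"
    return "typical"

# Mean 10 days, stdev 2 days, so the strict cutoff is +/- 4 days:
print(classify_deviation(15.0, 10.0, 2.0))   # 5.0 days out -> risky
print(classify_deviation(13.8, 10.0, 2.0))   # 3.8 days out -> borderline
print(classify_deviation(11.0, 10.0, 2.0))   # 1.0 day out -> typical
```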
And once we run this algorithm, what you see over here is a flow diagram showing you the entire flow of data from requisition to purchase order, to GRN, to invoice, and to payment.
And what you can do, in principle, is look at certain transaction line items which, for a given category, have overshot the usual two standard deviations plus or minus the risk threshold. If you click on an element which is interesting for you — for example, if I click between invoice and payment — we generate explanations in the philosophy of gen AI. So you see these explanations being generated, saying that these transactions are outliers and are not typical for flows between invoice and payment.
These transactions happened relatively quickly compared to other transactions, based on mean and standard deviation values, and the values of the transactions — the actual PO and invoice values — are then over here. Usually, in our experience, the way it works is that you run an investigation and try to see, based on your business logic and your industry, which of the flows should be investigated for a certain category of items.
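(The production tool renders these explanations via a fine-tuned LLM; the deterministic sketch below only illustrates what a grounded explanation looks like — every number in the sentence comes straight from the statistics, so the reader can verify it against the drill-down. The numbers are invented:)

```python
def explain_outlier(stage_from: str, stage_to: str,
                    days: float, mu: float, sigma: float) -> str:
    """Build an outlier explanation entirely from the computed statistics,
    so the text can never claim something the numbers don't support."""
    z = (days - mu) / sigma
    speed = "quickly" if z < 0 else "slowly"
    return (f"This {stage_from}-to-{stage_to} flow took {days:.0f} days, "
            f"{abs(z):.1f} standard deviations from the mean of {mu:.0f} days; "
            f"it happened relatively {speed} compared to other transactions.")

print(explain_outlier("invoice", "payment", 2, 30, 9))
```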
So this was the demo of the transaction-level analytics use case — transactions created by John Doe and jdoe: maybe John Doe created one part of the transaction and jdoe created the other part. Thank you for watching.
Questions.
Any questions? No? Let me see if there are any online questions. Nope. Okay.
Okay, one question.
How do you match the accounts — on the first page, between the names and all the attributes?
Good question. Because I can't really show customer data up here — but as you can guess, I had certain other attributes, and we really leveraged the enterprise data lake, where we could look at which attributes, typically per country, you could use to make a match. And, as the saying goes, 80% of your time is spent engineering the data and 20% on analysis.
So we started with some of the items I've put up here, but it had to be a discussion with the business — though it's actually data-driven to a large degree.
But no biometrics, right?
There are certain aspects, in certain places, where we use that information, but I can't divulge it. Understood — but yes, the answer is yes.
In your data analysis, which approach gave you the best results? There are two aspects to that, right: you can match identities wrongly, or you can merge an identity the wrong way because it's not actually the same identity — it's based on the data you project, right? Which attributes gave the most proper matching?
We actually developed a proprietary algorithm where we took certain factors based on the last name, something like the mother's maiden name, and the last four digits of certain other attributes, and we created a composite key with certain weights. This was something we tried, and it worked fairly well — and I'm talking about a deployment where we onboarded about 20 different countries with 20 different systems, and IAM was covering maybe 50% of them.
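(The weighted composite key can be sketched like this — the attributes and weights below are invented for illustration, since the production algorithm is proprietary:)

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights; the real key and weights are not disclosed:
WEIGHTS = {"last_name": 0.5, "mothers_maiden_name": 0.3, "id_last4": 0.2}

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def composite_score(rec_a: dict[str, str], rec_b: dict[str, str]) -> float:
    """Weighted sum of per-attribute similarities: 1.0 is a perfect match."""
    return sum(w * similarity(rec_a[attr], rec_b[attr]) for attr, w in WEIGHTS.items())

sap_record = {"last_name": "Doe", "mothers_maiden_name": "Smith", "id_last4": "4821"}
oracle_record = {"last_name": "Doe", "mothers_maiden_name": "Smyth", "id_last4": "4821"}
print(f"{composite_score(sap_record, oracle_record):.2f}")  # -> 0.94
```

Weighting lets a typo in one attribute (Smith vs Smyth) lower the score without sinking an otherwise strong match.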
And in this case, where you do this kind of detection, it's really a segregation-of-duties issue, right? Someone has to profile that. Doesn't that mean there's no proper global IGA definition in place?
Very good question. Again, typically what we have seen is that the IGA tools in the market and the application GRC don't talk very well to each other, because the rule set sits inside SAP: I could name a role something like Z_ANALYST, and it could contain a sensitive t-code with a certain authorization object — IAM is only going to authorize that user.
So that deep-level analysis — some tools are starting to offer it, you are right. However, we don't only look at "can do"; we say, okay, you could have access, now let me look at what you have actually done. And that gives you a holistic bird's-eye picture of the provisioning part and of what those identities are actually doing in the system — and by "doing in the system" I mean actually making a payment transaction. I don't think an IGA tool typically goes beyond the identity part; I don't think it's looking at what transaction you're running in your SAP system at two o'clock in the night.
So we really covered a very bespoke use case, bringing the identity part and the application risk together. Typically these are different tools doing these things, and people spend a lot of time plumbing them together.
Thank you.
Yeah,
We have time for, for one more question. There's one online. So very quickly you can address this. The question is, how can an explainable AI approach help in detecting and explaining potential internal fraud in an end-to-end procurement process? Specifically when there are discrepancies in user identities like John Doe and Jane Doe across the different stages of the cycle?
Again, a good one. I won't say this is a problem that's solved a hundred percent, because not everybody has data like Google and Amazon, where the models are trained very well. So usually we keep what is known as a human in the loop. Explainable AI helps in the sense that wherever the algorithm is not sure, it transparently tells you on the screen: I am 30% sure, or 40% sure, and I need human intervention. That's one way we try to mitigate the risk — by not claiming that the AI agent knows it all.
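(That human-in-the-loop gating reduces to a simple confidence router — an illustrative sketch, with an invented 80% auto-accept threshold:)

```python
def route_match(confidence: float, auto_threshold: float = 0.80) -> str:
    """Below the threshold, the system asks for a human instead of deciding,
    and states its confidence transparently on the screen."""
    if confidence >= auto_threshold:
        return "auto-accept"
    return f"human review needed (only {confidence:.0%} sure)"

print(route_match(0.92))  # -> auto-accept
print(route_match(0.30))  # -> human review needed (only 30% sure)
```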
Thank you.
Okay, thanks a lot.