In late 2017, Tara Simmonds stood before the Washington State Supreme Court and asked them to allow her to become a lawyer. She was a former drug addict, the daughter of addicts herself, and she had been convicted and sentenced to a number of years in jail. After that, however, she went to law school, recovered her footing, if you will, and graduated at the top of her law school class. But the Washington state bar board denied her the right to become a lawyer.
They said, no, your past is too dangerous; you might still be the old Tara. There may be a new Tara, but they decided that her past determined her future, and the board didn't want her to become a lawyer. So she stood there and asked the Supreme Court to overrule that decision. Basically, her case rested on whether her past determines her future.
And that's nothing new for those of us in this room. Algorithms use our past to predict our future all the time, right?
Whether it's what we're going to watch next on our streaming device, whether we're going to be financially solvent in the near to long term, or even whether we are likely to commit a crime. But as a previous speaker alluded to, we've all seen the danger inherent in using some of these algorithms to determine our future based on our past. Sometimes they get it wrong, and sometimes they cause quite a bit of harm. So taking an ethical approach to AI, just like we just heard, is essential if we're going to use it for our benefit and avoid doing harm as much as possible.
Now, when we talk about AI, it's really important at the very beginning to draw the connection from "magic" AI and machine learning to identity, which is what we're here for this week, right?
We can define identity as roughly three areas: the attributes or characteristics of that identity, the access that identity has, and the activity or behavior of that identity, whether it's working from particular locations or exercising that access.
Well, that is all the data that AI and ML draw from in the identity arena, right? So if we're talking about identity, we're actually talking about algorithms. Any time you hear about someone making decisions about people based on algorithms, you should immediately be thinking: they're using AI and machine learning to make decisions based on identity attributes and identity data, which should definitely concern us.
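To make that concrete, here is a rough sketch of what those three areas might look like once they're flattened into the feature row a model actually consumes. The field names and the encoding here are entirely hypothetical, not from any particular product.

```python
# A hypothetical identity event covering the three areas: attributes,
# access, and activity. This row is what "the algorithm" actually sees.
from dataclasses import dataclass

@dataclass
class IdentityEvent:
    # attributes / characteristics
    department: str
    employment_type: str
    # access
    privileged_roles: int
    entitlement_count: int
    # activity / behavior
    login_country: str
    logins_last_24h: int
    off_hours_ratio: float

def to_features(e: IdentityEvent) -> list:
    """Flatten an identity event into numeric features for a risk model."""
    return [
        float(hash(e.department) % 100),       # crude categorical encoding (a real
        float(hash(e.employment_type) % 100),  # pipeline would use proper encoders)
        float(e.privileged_roles),
        float(e.entitlement_count),
        float(hash(e.login_country) % 100),
        float(e.logins_last_24h),
        e.off_hours_ratio,
    ]

event = IdentityEvent("finance", "contractor", 2, 57, "DE", 14, 0.35)
print(to_features(event))
```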
There is a much longer discussion to be had about building ethical frameworks, from defining harm all the way down to user data rights and using standards to protect privacy. Given our time today, I just want to focus on two specific areas of these frameworks: transparency and fairness.
Now, to level set, let's talk about transparency first. Transparency can have meaning on a couple of different levels. Part of it is that since these algorithms learn on their own, and we're not telling them how to learn, sometimes it's really unclear why they're making the decisions they're making.
And if we can't figure out why the algorithms are making the decisions they're making, we can't explain it to our end users, who also need that transparency. If I'm building an algorithm that makes choices about you, as in Tara's case, I should be able to explain why, so that you can push back if you think it's unfair, or the data is not correct, or the algorithm has gone horribly wrong in some way.
The way I like to explain this to normal people is with the story of a conversation I had with my youngest child when he was about two and a half. He was on the playground.
Lots of times on playgrounds there are bubbles floating around, which is super exciting, right? He runs up to me and says, Papa, why are bubbles round? And now I have a choice, right? I can explain this a couple of ways.
One, I can go into the math, right? Start wherever I want: higher-order calculus. That's going to take some time, and there are a lot of big words. He's gifted at math, but not that gifted, right? That's not going to work. I need a way to explain it in terms he can understand and use. So what I did, and what I do in general now with kids who ask this question, is say: look, water is made up of little bits of water, and water, just like you, has friends.
And it wants to be as close to its friends as possible.
And the shape they make when they get as close as possible is these little round bubbles.
Basically, I explained surface tension to him at a very macro level. I didn't use those words, but I gave him information that didn't lie, information he could understand and process. It makes sense; he likes most of his friends. He ran off, and he's set up for the future. We need to understand what these algorithms are doing so we can do the same thing for our users, right? I won't talk too much about that part today; I'm going to talk about figuring out what the algorithms are doing, but explaining it to users is also a key point once you've figured it out.
Now, transparency also has benefits for how we use these algorithms, right? Whether it's safety (think of a Tesla not being able to identify a pedestrian or a cyclist and running them over) or model adjustment.
As these things learn, you can say, oh wait, no, that's a terrible conclusion, because I, as a human, know this is not the right conclusion. You can adjust how the model is using the data and how it reaches those conclusions, if you know how it's trying to think. And then there's objective assessment, right?
Figuring out why it's reaching these conclusions can give us new insight into those decisions, but it also helps us say that this is, or is not, the essential part of the algorithm. Whatever the case, transparency can actually help improve the algorithm as well.
Now, as we seek to figure out how these algorithms are making decisions, there are a couple of different types of model here. Some of them are really easy to interpret, right? Think of AI and ML using something like a decision tree, basically a nested set of if-thens: if this, then A; if not, then B. Those are really clear, right?
Because it's just a series of if-thens, and yes, that can be an actual AI/ML algorithm. Other times it is nearly impossible, as this cartoon suggests. A very popular method is using linear regression.
It's kind of a first shot at interpreting a model, basically asking: is there some factor that's connected, that's correlating with or causing this relationship? A lot of times there isn't one. So if the model is easy to interpret, your job is halfway done already. If it's not, then you need some kind of tools to help you through that process.
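As a quick illustration of the easy-to-interpret end of that spectrum, here is a minimal sketch (my own toy example with scikit-learn, not from the talk's slides) of a shallow decision tree printed as its if-then rules, and a linear regression read off directly from its coefficients:

```python
# Interpretable-by-construction models: a shallow decision tree is literally
# a set of if-thens, and a linear model's coefficients are its explanation.
from sklearn.datasets import load_iris, make_regression
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.linear_model import LinearRegression

# Decision tree: print the nested if/then branches directly.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=list(iris.feature_names)))

# Linear regression: each coefficient gives a factor's direction and strength.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
linear = LinearRegression().fit(X, y)
for i, coef in enumerate(linear.coef_):
    print(f"feature_{i}: {coef:+.1f}")
```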
Before we get to those tools, I'm going to take you through one concept: global versus local interpretation. At a high level, what this means is that if you try to explain what a model is doing globally, a lot of times it won't make sense.
There are too many factors, too many features, in your AI/ML data set and in the resulting algorithm it generates, so you can't really do it. So one of the things these tools will do is reduce the scope to a very localized portion, a small part of the data, and say: for this set of features, or for this part of the result set, here is the reason for the decision, here is the decision line. If a point is over here, it's one outcome;
if it's over there, it's another. Basically, it's reducing the problem so it can be interpreted locally instead of globally. There are two tools here, and for both of them I'll show you examples using this readily available diabetes progression data set from Stanford. Basically, it takes in a number of attributes about patients and gives you a prediction target as a result.
Basically, it's predicting how your diabetes is likely to progress in the near future. That's just a level set of what I'm talking about.
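If you want to follow along, here is a minimal sketch assuming the copy of that diabetes progression data that ships with scikit-learn (the talk's exact source may differ slightly), plus a simple model for the next two snippets to explain:

```python
# Ten anonymized patient attributes in, one numeric disease-progression
# score out. The fitted model here is just something for LIME/SHAP to explain.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes(as_frame=True)
X, y = data.data, data.target              # features: age, sex, bmi, bp, s1..s6
model = RandomForestRegressor(random_state=0).fit(X, y)

patient = X.iloc[[0]]                      # one patient's attributes
print(model.predict(patient))              # predicted progression score
```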
The first of these tools is called LIME, and it uses local interpretation; that's right there in the acronym.
I'll give you the URL here in a minute and you'll have access to all of this, so don't stress out. What LIME does is reduce everything to a local interpretation and then use something like linear regression to get a dividing line and see how each of the features, each of the attributes if you will, impacts that final number. You see the predicted value at the top, right? 175. It makes sense if you understand the data set and what the algorithm is trying to do; just take it that it's giving a prediction of how the diabetes is likely to progress.
And then you get the impact of each characteristic, whether it's body mass index (BMI, how heavy-set you are), or age, or sex, or anything else: how much of an impact, positive or negative, those factors have on that final number. Now, this is super helpful because LIME doesn't need to know anything about how you're doing AI/ML, right? It's going to take the result set and the data set, the input attributes, and give you a chart like this.
This way, you know that body mass index has a relatively high significance for the prediction, whereas age has less.
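Here's roughly what that looks like in code, a sketch (not the speaker's actual notebook) using the lime package against the `model` and `X` from the snippet above:

```python
# LIME perturbs the data around one patient, fits a simple local model,
# and reports how each feature pushed this one prediction up or down.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X.values,
    feature_names=list(X.columns),
    mode="regression",
)

exp = explainer.explain_instance(X.values[0], model.predict, num_features=5)
for feature, weight in exp.as_list():
    print(f"{feature:>20}  {weight:+.1f}")   # e.g. bmi with a large positive weight
```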
So it's a relative factor. SHAP is a little bit different. Instead of using a linear regression model, it's going to use game theory. What it's saying is: let's take all of these factors, body mass index, age, gender, and imagine they all walk into a room one by one and start playing a game.
If you're ever at a party playing a game with three people and then four more people join, you know the game changes, right? The feel of it changes; different people are winning. That's what's happening here.
As each of these attributes is included in the model, it shifts that number left or right, positive or negative. So you get a relative impact, much like LIME, just with a different algorithm to do it. This shows you what the impact of each of these features is on that final number.
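Again as a rough sketch rather than the speaker's exact code, here is what the SHAP equivalent might look like against the same `model` and `X`, using the shap package model-agnostically:

```python
# SHAP assigns each feature a game-theoretic share of the shift away from
# the average prediction; positive values push the score up, negative down.
import shap

explainer = shap.Explainer(model.predict, X)    # model-agnostic: only needs predict()
shap_values = explainer(X.iloc[:100])           # explain the first 100 patients

for name, value in zip(X.columns, shap_values[0].values):
    print(f"{name:>6}  {value:+.1f}")           # contributions for the first patient

# shap.plots.beeswarm(shap_values) draws the familiar summary chart
```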
Again, it's model-agnostic; it doesn't have to know what you're doing, which is really helpful. Summarizing these two back to back (and again, you'll have these slides later on): LIME uses local model approximation and is relatively fast. Sometimes it gives results you need to think about, but it's quick and cheap. SHAP is newer and more popular in the community right now. Side note: I'm not a data scientist.
So take this with a grain of salt, but SHAP is the more recent tool for this. It's faster on tree structures and slower on things like k-nearest neighbors.
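That speed difference is visible in the library's API itself: tree-based models get a dedicated explainer. A tiny sketch, assuming the same random-forest `model` and `X` as before:

```python
import shap

# TreeExplainer walks the tree structure directly instead of probing the
# model with perturbed inputs, which is why it's so much faster on trees.
tree_explainer = shap.TreeExplainer(model)
tree_shap_values = tree_explainer.shap_values(X)   # shape: (n_samples, n_features)
print(tree_shap_values[0])                         # per-feature contributions, first patient
```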
You have the links down there at the bottom; that bottom link is also super helpful for working through all the different tools for transparency, understanding how they're used, and it has examples. Everything I'm showing you today is also on GitHub, with examples that are super helpful to go play with, because I usually don't understand things until I go play with them myself. So we've talked about transparency: trying to figure out what the algorithm is doing, which can sometimes be super difficult. Now let's talk about fairness, and the flip side of fairness is bias, right?
There's a whole lot to say about bias, and that's a completely different talk, but bias comes in lots and lots of different forms, right?
It might be that you're rushing to a conclusion, so you take the shortest path and get a result biased toward the easiest available data. Or you think you're a really good judge of things, or that the system you're judging with is really fair and you can be trusted, when you really can't, because you're human. Or it could be very straightforward:
you could be self-interested. If you're testing a pharmaceutical drug, for instance, one that cures some disease, or a potential COVID treatment, and you're the one producing it, you're going to be biased in favor of it working, and that tends to infect your data and how you do things. Whatever it is, learning about all the different avenues of bias is really helpful, because it can help you identify them and be on the lookout for them. Now, identifying bias is altogether difficult, on a lot of levels.
It creeps in in many areas.
It's easier to find when the bias is more directly measurable. This is the famous German credit data set; again, all of these are publicly available. What this was was credit ratings based off of different factors. The nice part about this data set is that it had gender as a primary attribute in the data. In other words, you knew for each individual whether they identified as male or female; it wasn't hidden at all.
What that meant was that when you measured the relative impact of each of these attributes, each of these features, on the resulting number, you could say: does gender have an impact or not? And you can see that the financial characteristics here are way more significant, but gender still does have an impact. Green here is positive, purple is negative (I'm not sure why that color scheme was chosen), but the idea is that gender still has an impact.
If you're male, you're more likely to have a higher credit score. That could be for a number of reasons; figuring out which is the interpretability piece. If you're female, you tend to have a lower score, but your past financial records and other data like that have far more impact.
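Here's a rough sketch of that kind of check. I'm using the OpenML "credit-g" copy of the German credit data, where sex is folded into a `personal_status` column; the speaker's slide may use a different encoding, so treat the column names as assumptions:

```python
# Fit a simple, interpretable model on the German credit data and compare
# the weight of the sex-related feature against the financial features.
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

credit = fetch_openml("credit-g", version=1, as_frame=True)
X = pd.get_dummies(credit.data, drop_first=True)        # one-hot the categoricals
y = (credit.target == "good").astype(int)               # 1 = good credit

Xs = StandardScaler().fit_transform(X)                   # make coefficients comparable
clf = LogisticRegression(max_iter=1000).fit(Xs, y)

coefs = pd.Series(clf.coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print(coefs.head(10))                                    # financial history dominates...
print(coefs.filter(like="personal_status"))              # ...but gender still shows up
```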
Other times, the bias is much more difficult to find. There was a study on renal failure in the last two or three years, published in a medical journal, and this graph is slightly confusing, to be honest. But basically, the AI/ML predicted how likely you were to have renal failure, for your kidneys to just give up, right? The algorithm predicted your risk.
Later on, they actually went and collected medical data from the subjects as well, so they had the true risk.
They knew who actually had renal failure, right? And what would happen is, based on your predicted risk, which is these curves here, if your predicted risk was above a certain threshold, you qualified for advanced care: maybe you got preventative treatments, maybe you were slated for some treatment that would help you, or some experimental drug, or just additional attention and monitoring. If you didn't hit the threshold, you didn't get that care.
The original data set did not include racial data; it did not record who was what race in the study. When they went back and did a revision of the data, using different attributes to backfill racial data, what they found was that the algorithm, even though no one planned it, was discriminating based on race. If you were white, you hit the treatment threshold way earlier than if you were Black or another minority. And that was based in part on historical data.
The United States in general has tended to spend more on the health of the majority, the health of white people, and less in Black communities, and that spilled over into the data in subtle ways. It's not like anyone was trying to do it; the algorithm was optimized for cost, because it was part of an industry effort to figure out who needed care. But only through digging and investigation could they find that different groups were being treated disparately and unjustly. There was hidden bias, if you will.
Now, there are a couple of tools here as well to help you with fairness, again all open source. My personal favorite is called Aequitas. What it does is go in and try to find bias between groups. It's a normal process; this is basically a screenshot from their website. You upload the data, you identify groupings that you suspect are being unfairly treated or biased against,
you choose what metrics you want, and you get a report. There are four different areas it covers: making sure that your results are either straight-up equal or equal in proportion to the representation within the data set or within society, and then checking things like false positive and false negative rates. In other words, if someone is flagged or characterized for something, you don't want false positives, obviously, and you don't want false negatives.
And so it checks for those rates among those disparate groups as well.
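In code, that flow looks roughly like this. It's a sketch of the classic Aequitas Python API (the web upload does the same thing behind the scenes); the toy dataframe and the "race" groupings are mine, and the `score` and `label_value` column names are what the library expects:

```python
# Aequitas crosstabs compute per-group counts and error rates, and the Bias
# step turns them into disparities relative to a reference group.
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias

df = pd.DataFrame({
    "score":       [1, 0, 1, 0, 1, 1, 1, 0],   # the model's decision
    "label_value": [1, 0, 0, 1, 0, 1, 0, 1],   # what actually happened
    "race":        ["white", "white", "white", "white",
                    "black", "black", "black", "black"],
})

xtab, _ = Group().get_crosstabs(df)             # per-group FPR, FNR, parity counts

disparities = Bias().get_disparity_predefined_groups(
    xtab, original_df=df, ref_groups_dict={"race": "white"},
    alpha=0.05, check_significance=False,
)
print(disparities[["attribute_value", "fpr_disparity", "fnr_disparity"]])
```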
Aequitas is particularly great because it not only has an interface for the people actually doing the data science, it also has an interface for policy administrators and decision makers. It's something you can point your board to, point your IRB to, point decision makers to, and say: here is a check on how biased our algorithm is.
Here is a dashboard they can look at, with reports in long-form text explaining what's going on, as opposed to just a chart like the one I had to explain on the previous slide. It's one of the only tools that does both the administrator view and the practitioner view, which is particularly helpful.
There are four or five tools that have recently been in vogue. AI Fairness 360 is from IBM, and they've done great work in terms of AI ethics, fairness, and bias; Aequitas is at the bottom. You can read through this later.
I won't read this slide to you. Some of these tools are black box and some are white box. In other words, as with the transparency tools, for some of them you need to actually inject bits of code into your existing algorithms for this to work. A lot of them don't require that, though, and obviously it's a lot nicer if the tool just works from the outside. Some of them are more API-based; Aequitas, for example, has that front-end GUI as well, which is really great for delivering reports and super helpful. There's a lot of new work out there in this area as well.
It's constantly developing and it's an entire field of academic research on its own.
So both of those are important. Having transparency and fairness in our algorithms is part of an ethical approach that keeps us from falling into simply relying on past patterns to predict the future, right? As for Tara Simmonds, the Washington State Supreme Court ruled in her favor. They said: yes, she has changed, she has gotten help, her life has changed, she is a different person. There's an old Tara and a new Tara,
and she is the new Tara; the past does not predict her future. And in fact, given the power these algorithms have, if we don't check them with open-source tools for transparency and fairness, and do what Bosch was describing earlier in terms of mapping out an ethical approach to using these things, we tend to fall into that pattern of just relying on the past to predict the future.
We would do well to remember what the Washington State Supreme Court actually wrote in their decision: "We affirm this court's long history of recognizing that one's past does not dictate one's future." If we're to use AI and ML and keep our ethics and our morality intact, then we need tools like these to help us through that process. Thank you.