Thank you so much. I'm glad to be here. I hope you are able to see my screen and hear me loud and clear.
We can hear you very well. Yeah.
Perfect. Thank you so much. Yes. So we will be talking about something called responsible AI, and I'll introduce myself first. The most interesting thing we'll be talking about is Responsible AI 2.0, and while I'm introducing myself, you can try to work out why 2.0 before we deep dive. Okay. So I'm Al. I'm a principal consultant at Fractal Analytics. I am also a member of the United Nations as an expert in AI.
I have also been a Microsoft Most Valuable Professional for three years, and I'm the author of a book on responsible AI. Now, enough about me; moving forward, we will be talking about Responsible AI 2.0. Until last year, whenever I gave a conference presentation, I would talk about responsible AI only. But this year we'll be talking about Responsible AI 2.0. Now, what does 2.0 entail?
What is the difference between responsible AI and Responsible AI 2.0?
We need to understand that until last year, AI was mostly about machine learning, deep learning, predictive modeling, and so on. This year, AI is also very much about generative AI. The advent of generative AI, of ChatGPT and other tools, has changed the entire landscape of responsible AI. For example, if you see a professional salesperson on LinkedIn, that person may never have existed on this earth: these are computer- or AI-generated profiles and images.
There have been cases where around 40,000 potentially lethal molecule combinations were generated by AI in a few hours; doing that manually would take years, and it might never be approved. A South Korean presidential candidate created his own AI avatar. In the Russia-Ukraine war, AI has been used heavily.
AI can predict pregnancy even before you go to a doctor; your smartphone or Apple Watch can do that. There have been cases where AI has misdiagnosed patients, and cases where AI has not been fair and inclusive. Now we'll go into each of these in detail.
So this is the responsible AI framework which I have built. It talks about principles, behaviors, and enablers: privacy and safety, fairness and equity, accountability and transparency. Privacy and safety covers your data privacy: your data is being used for what it was meant to be used for, and it is not getting leaked. Fairness and equity means your algorithm does not discriminate on the basis of color, ethnic group, nationality, gender, or marital status.
Accountability: who should be accountable for issues with AI? Transparency: do I have a way of understanding how the decision was taken? So those are the old principles. The new principles include social well-being and planet inclusiveness.
We need to understand that with a lot of generative AI, especially ChatGPT and generative tools that can create videos, presentations (by the way, this presentation was not created by ChatGPT), text, movies, and music, a lot of people are getting quite insecure about their careers and their jobs, and that is where social well-being comes into play.
Similarly, planet inclusiveness: there have been reports that these generative AI algorithms emit a lot of carbon. The carbon emissions are huge when you run these generative AI tools; one report puts it at more than the equivalent of going around the Earth seven to eight times. Next is robustness and stability.
This is about how stable your answers are. For example, you write a prompt to ChatGPT and it returns an answer; you write the same prompt a few hours later, and it returns a different answer. Now you take the first answer, put it back into ChatGPT, and ask, hey, is this answer correct for this particular scenario? It will say: I'm sorry, this answer is not correct, we gave you a wrong answer. And you have no knowledge of the accuracy of the model; that is where the issue is. So we need to understand that a lot of issues are coming up here, and even a simple consistency check, like the sketch below, can expose them.
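A minimal sketch of such a stability check; `ask_model` here is a hypothetical stand-in for whatever client or API call you use to query the model, not a real library function:

```python
# A naive stability check: send the same prompt several times and count
# distinct answers. `ask_model` is a hypothetical callable, not a real API.
def consistency_check(ask_model, prompt, n_runs=5):
    answers = [ask_model(prompt) for _ in range(n_runs)]
    distinct = set(answers)
    print(f"{len(distinct)} distinct answer(s) across {n_runs} runs")
    return answers

# Usage (with any function that maps a prompt string to a response string):
# consistency_check(my_client_call, "Is this loan application creditworthy?")
```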
The next issue, which is very important, is attribution. You never know where an image, text, music, or video returned by an AI system, especially a gen-AI system, was taken from: which articles, places, and artifacts were referenced. So there is a lot of copyright infringement.
If you use ChatGPT-generated music for your own purposes, whom do you pay for it? Do you pay the developers of ChatGPT? Which person gets what share? You never know where the music was compiled from when your new music was created. So copyright infringement and monetization are becoming very, very important. Next, this is a typical data science lifecycle; it shows how data science is conducted, and most of us know about it.
We first do business understanding and hypothesis setting, then take input data, then do EDA, pre-processing, and feature selection. Then we do model development, model evaluation, model selection, prediction, deployment, and monitoring. That is the typical data science lifecycle workflow. But in each area of this workflow, you will see that responsible AI can be embedded.
So when you create the business hypothesis, you need to create a responsible AI (RAI) definition, where you spell out which RAI pillars you will take care of, which KPIs you will track, and which algorithms you will use to check for RAI problems and solutions. When you take input data, you need to check: do you have PII data?
If yes, what kind of privacy algorithms will you put in place? When you do EDA, pre-processing, and feature engineering, you check whether the data has the right representation. Are you using proxies? Even if you are not using gender or marital status directly, do you have data which reflects gender or marital status? If you have proxies, how do you ensure that data bias is absent and data privacy is fully preserved? A simple proxy check is sketched below.
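A minimal sketch of such a proxy check, assuming a hypothetical modeling dataframe `df` with numeric features and a binary `gender` column: if the dropped attribute can be predicted from the remaining features, proxies are present.

```python
# Proxy check: try to predict the protected attribute from the other
# features. `df` is a hypothetical modeling dataframe.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = df.drop(columns=["gender"])
y = df["gender"]

auc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, scoring="roc_auc",
).mean()
print(f"Protected attribute predictable with AUC = {auc:.2f}")
# An AUC well above 0.5 means other columns leak gender, so simply
# dropping the column does not remove the bias risk.
```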
Then, when you develop a model, you talk about explainable AI and privacy: how the AI can be explained, how the data that goes into the model is explained, how the features are explained, how you explain what happens inside the model, how you explain the output of the model, the counterfactuals, the risk of the model, and the error of the model. That is explainability. And similarly comes model accountability, or drift.
If you are using macroeconomic or other external factors and some external situation has changed, is your model still fit for purpose? For example, if there is huge inflation, or a war, or people have changed their purchasing behavior, has your underlying data distribution changed? If yes, is your model still right to use, or do you need to replace it? A minimal drift check is sketched below.
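A minimal sketch of drift detection with a two-sample Kolmogorov-Smirnov test; the numbers here are synthetic, purely for illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_income = rng.normal(50_000, 10_000, 5_000)  # distribution at training time
live_income = rng.normal(58_000, 12_000, 5_000)   # distribution in production

stat, p_value = ks_2samp(train_income, live_income)
if p_value < 0.01:
    print(f"Drift detected (KS={stat:.3f}): consider retraining or replacing the model.")
```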
So that, in brief, is how the data science lifecycle needs to be combined with responsible AI. Now, talking about secure AI and privacy: this is a GDPR quote, which says that you need to be very particular about personal data and ensure that everything is secure and does not leak. You need to ensure that you cannot identify somebody from the data, and more; that is what GDPR talks about.
In brief, there was a case in 2006 where Netflix released a dataset with 100 million movie ratings given by 500,000 users. Someone reverse-engineered this data and was able to work out which person rated which movie, and thus understand your watching history.
You would not like somebody to know what you actually watch on Netflix. So there are a couple of ways you can protect your data and create privacy.
There are a few, such as sampling, data aggregation, de-identification, and query auditing.
However, the most important technique is differential privacy, where you add an adequate amount of noise to your actual data so that nobody can identify an individual from the noised data, but you still get the right amount of insight. For example, if you know that I get my salary from Fractal and that last night I spent 20 pounds in a pub, you can add both together, query me, and find out everything. However, if there is some noise in that 20 pounds I spent last night, you will not be able to find my information, because there are many people who get a salary from Fractal and you do not have the exact amount. So whenever you put that combination into a query, you will not get an answer. And the noise in that 20 pounds should be small enough that the insights you use for a machine learning model are not lost. The sketch below shows the idea.
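A minimal sketch of the Laplace mechanism behind differential privacy; the sensitivity and epsilon values here are illustrative, not a recommendation:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    # Noise scale is sensitivity/epsilon: smaller epsilon = stronger privacy,
    # larger epsilon = more accurate (less noisy) answers.
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Protect a single spending amount (the 20 pounds from the example).
noisy_spend = laplace_mechanism(true_value=20.0, sensitivity=20.0, epsilon=1.0)
print(noisy_spend)  # a query now returns this, not the exact amount
```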
That is what differential privacy is. Accountable AI talks about drift, which I mentioned: we need to understand whether the data distribution has changed and whether that change calls for retraining the model, retuning the model, replacing the model, changing the features, changing the data, and so on, and there are technical ways of doing this. Then comes fairness: your algorithm should not discriminate between people based on color, gender, race, age, marital status, nationality, disability, or many other combinations; these are just indicative names.
For example, there was a case where a bank in India assigned a higher weight to income, so people with lower incomes did not get loans; and since women often face a pay gap, they were not given loans. There were also many cases where Google displayed ads for higher-paying jobs only to men and not to women, as if women did not deserve a higher-paying job.
Similarly, there are ways to remove bias. I'll not get into the technicalities, but there are also ways to identify bias in your data.
There are metrics like the average odds ratio, which looks at false positives and true positives: how much the false positive and true positive rates differ between, say, Asians and Americans. They should be equal; if there is a huge difference, there is some bias in your data. Then there is equal opportunity.
That is, it is not only about giving the right decision but also about giving an equally fair opportunity. For example, if your model gives a positive outcome to men who do not deserve it 80% of the time, it should do the same for women, because it is giving men the chance of a favorable outcome even when they do not deserve it; the same should hold for married and single people, for Asians and Americans, for white people and people of color, and so on.
And these are a couple of metrics which you may use; a minimal sketch of two of them follows.
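A minimal sketch of the equal opportunity difference and average odds difference on toy arrays; all data here is made up purely for illustration:

```python
import numpy as np

# Toy data: true labels, model predictions, and a group attribute.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

def tpr_fpr(mask):
    tpr = np.mean(y_pred[mask & (y_true == 1)])  # rate of correct positives
    fpr = np.mean(y_pred[mask & (y_true == 0)])  # rate of undeserved positives
    return tpr, fpr

tpr_a, fpr_a = tpr_fpr(group == "A")
tpr_b, fpr_b = tpr_fpr(group == "B")

# Equal opportunity: TPRs should match across groups. Average odds: the
# TPR and FPR gaps should both be near zero on average.
print("equal opportunity diff:", tpr_a - tpr_b)
print("average odds diff:", ((tpr_a - tpr_b) + (fpr_a - fpr_b)) / 2)
```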
So if you look at this plot, I used this on bank loan data. These black bars show that originally there were a lot of gaps, a lot of bias, in the data; but when I applied a reweighing method, the bias reduced significantly. That is the key point: there are techniques. First you use these techniques to find out whether there is bias, and then you use another technique to remove that bias. This next one is another plot, using another algorithm, where I was able to remove a lot of the bias. A reweighing sketch follows.
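A minimal reweighing sketch using AIF360's pre-processing algorithm; it assumes the data has already been wrapped in an aif360 BinaryLabelDataset with `gender` as the protected attribute (a hypothetical setup):

```python
from aif360.algorithms.preprocessing import Reweighing

rw = Reweighing(unprivileged_groups=[{"gender": 0}],
                privileged_groups=[{"gender": 1}])
dataset_transf = rw.fit_transform(dataset)  # `dataset` is a BinaryLabelDataset

# The transformed dataset carries per-row instance weights that offset the
# bias; most sklearn estimators accept them via the sample_weight argument.
weights = dataset_transf.instance_weights
```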
That one was for a classification problem; this one is for a regression problem. In the classification case, for example, I want to know: did he default or not, will he pay or not? So there the output is between yes and no; here the question is how much somebody will pay, how large a loan I need to give. The outcome here is completely different, and that is what this plot shows. On the left-hand side is the plot before addressing the bias.
If you look at the blue and red lines, there are a lot of gaps, and the two distributions look quite different: the average outcome differed between male and female.
However, when I applied an additive counterfactual fairness algorithm, you can see that the distributions overlap: the red line is behind the blue line, and the average outcome difference between men and women is almost zero; it is 2.8 × 10⁻¹⁴, which is essentially zero. So we see a lot of improvement.
The next one I'll talk about is explainable AI. Explainable AI is about what happens inside a model. Most of the time we use black-box models.
So how can we use a white-box model on top of a black-box model to explain it? For example, LIME and SHAP algorithms are quite common, but you can also build your own surrogate model. The one I prefer is a GAM, a generalized additive model, which is not a replacement but a very good complementary model to a black-box model: it is again non-linear, yet it is a white-box model. A minimal surrogate sketch follows.
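A minimal surrogate sketch with the pygam package, assuming a fitted black-box model `black_box` and a numeric feature matrix `X` with three columns (hypothetical names):

```python
from pygam import LinearGAM, s

# Fit a GAM to the black box's own predictions: non-linear, but inspectable.
surrogate = LinearGAM(s(0) + s(1) + s(2)).fit(X, black_box.predict(X))

# Each smooth term shows how one feature drives the black-box output.
for i in range(3):
    XX = surrogate.generate_X_grid(term=i)
    pdep = surrogate.partial_dependence(term=i, X=XX)
    print(f"feature {i}: effect ranges {pdep.min():.2f} to {pdep.max():.2f}")
```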
As I said, we all use LIME and SHAP, and Microsoft has come out with multiple explanation packages as well; a basic SHAP example is sketched below.
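For reference, a basic SHAP call looks roughly like this, assuming a fitted tree-based model `model` and a feature dataframe `X` (hypothetical names; the shap package is assumed installed):

```python
import shap

explainer = shap.TreeExplainer(model)   # fast path for tree ensembles
shap_values = explainer.shap_values(X)  # one contribution per feature per row
shap.summary_plot(shap_values, X)       # global picture of feature influence
```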
IBM has AI Explainability 360. And then there are counterfactuals; this is very interesting. For example, if you have denied a loan to somebody and you say, hey, you do not deserve a loan because the model said so, how can you satisfy that customer? By telling him what he should have done to counter the outcome: what should he change in his portfolio so that he now deserves a loan? A counterfactual sketch follows.
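A minimal counterfactual sketch with the dice-ml package, assuming a fitted sklearn classifier `model` and a dataframe `df` with a binary outcome column `loan` (all names hypothetical):

```python
import dice_ml

data = dice_ml.Data(dataframe=df,
                    continuous_features=["income", "debt"],  # assumed columns
                    outcome_name="loan")
m = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data, m)

# For one rejected applicant: what minimal changes would flip the decision?
query = df.drop(columns=["loan"]).iloc[[0]]
cfs = explainer.generate_counterfactuals(query, total_CFs=3,
                                         desired_class="opposite")
cfs.visualize_as_dataframe()
```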
That is what counterfactuals talk about: not only explaining the negative outcome, but also telling the person what would need to change in order to get a positive outcome. Those are counterfactuals. Then comes generative AI. There are a couple of tools, as you can see here; these tools are about how to explain a generative AI outcome. If I give you an... do we have two minutes? That's right.
Two minutes left.
Yeah, yeah,
Sure, sure. If I give you an image or an audio clip, will you be able to tell me whether it was originally created or generated by a generative AI tool? Now, since I have two minutes left, there is nothing more; I thank you for your time and welcome any questions, please. Thank you.
Thank you very much, Sharon. We could take at least one question; do we still have time?
Yeah.
Yes, please.
If not, then, all right, I will ask you to stay online with us, because you're going to be part of the next panel,
Okay?
All right.
No questions. Thank you so much for your time, and thank you, sir. All the best for all the meetings today.