So now, again, going on with our theme of cyber security resilience, our next speaker has a doctorate in access control, as you do, and he's here to tell us more about integrating IDPs with modern business IT environments. So please welcome independent IAM expert Heiko Klarl. Thank you very much.
Hello, everyone. Great to have you here.
Today, I'm speaking about enhancing cyber resilience and how to integrate IAM and multi-cloud strategies and a bit more. So it was interesting back in time when I submitted the abstract, I guess it was early summer this year, and when I finally started working on putting things together in late summer, early autumn this year, there was also a shift in figuring out, oh, this might be interesting and I have to bring in this and bring in this. So probably I would have adapted the abstract a bit earlier on, but this journey to the content was quite interesting. Thanks for the introduction.
So basically a little bit more on my background: I have been working for more than 20 years in the identity and access management industry, did a PhD there a long, long time ago, worked for a large consultancy, and stepped out at the end of August. Currently, I'm enjoying my free time and doing stuff like this here, which I'm really enjoying, and I hope I can provide you with good content for the next 20 minutes. Does this work?
Oh, no. So I've structured my agenda like this: sharing a couple of insights on cyber resilience at the beginning to set the base for the topic. Then coming to KuppingerCole's identity fabrics; unfortunately, I don't have the most current one that was presented earlier today, to avoid any conflicts in presentation, but I think we'll manage anyway. And then I come to the resilience of IDPs and some takeaways at the end, and hopefully there will be a bit of time for discussion. 200 million US dollars.
200 million dollars is the amount that Global 2000 companies, so the large enterprises, lose per year due to unplanned outages and unplanned downtime of their IT systems. So not only IDPs, but IT in general. So that's a lot of money. To break it down into a more understandable number:
That's $9,000 per minute. I have been talking for roughly two to three minutes now. So basically it's something between $20,000 and $30,000 a company might have lost in the time slot I'm speaking here. So a huge amount of money. And how are those companies prepared to recover? Another interesting number: 50% need up to 24 hours to be finally back and have everything sorted out, to have all systems up and running again. 45% are quicker and are back within one hour. So they have done an excellent job already, but the majority, more than 50%, need a pretty, pretty long time.
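The per-minute figure above can be turned into a quick back-of-the-envelope estimate. The following is a minimal sketch, not part of the talk; it assumes a flat cost per minute and that both quoted figures describe the same average company. The function names are hypothetical.

```python
# Figures quoted in the talk; everything else here is an illustrative assumption.
COST_PER_MINUTE_USD = 9_000
ANNUAL_COST_USD = 200_000_000


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimated cost of an outage lasting `minutes`, assuming a flat rate."""
    return minutes * cost_per_minute


def implied_downtime_minutes(
    annual_cost: float = ANNUAL_COST_USD,
    cost_per_minute: float = COST_PER_MINUTE_USD,
) -> float:
    """Minutes of downtime per year that the two quoted figures imply together."""
    return annual_cost / cost_per_minute


if __name__ == "__main__":
    for minutes in (2, 3, 60, 24 * 60):
        print(f"{minutes:>5} min of downtime: ${downtime_cost(minutes):,.0f}")
    hours = implied_downtime_minutes() / 60
    print(f"Implied downtime per year: about {hours:.0f} hours")
```

Two to three minutes at this rate come to $18,000 to $27,000, which matches the rough range given above. Taken together, the two figures would imply about 370 hours of downtime per year, under the flat-rate assumption.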
You might tell yourself: I'm ready. I'm cool. I can't be affected. Like Kuma is probably saying. So the point is: really, really think about the stuff that can affect your business, the stuff that can affect your IT systems and your IDP in specific. Cyber attacks: you can for sure prevent stuff, but this morning, when the guy from FC Bayern Munich told us the amount of cyber attacks they are facing, it was clear that sooner or later you will probably have some hiccups. Or think of a scenario most of you have faced. I think it was early 2023, if I'm right, or 2022: hiccups of hyperscalers.
You probably remember the hiccups Microsoft had in its Azure cloud. This time of the year, I remember it very, very well. We have been serving a customer and they have been preparing for a huge go live migration of an IDP. Everything was well prepared and tested and aligned and top management awareness and so on. And everyone felt very confident to make it happen today. During the morning, we got the information from the team that the go live was canceled.
Basically, you think, oh my gosh, what happened? What was messed up? But it was not canceled because of bad preparation or technical issues on the project team side. The go-live was canceled because of the hiccups of Microsoft Azure, even though the deployment was targeting AWS. So basically not affected, but the team is a remote team communicating via MS Teams, Microsoft Outlook and this kind of stuff. And basically, when you risk losing your communication infrastructure, you can't fix things, communicate and discuss with colleagues during a deployment.
So it's a very, very high risk, especially when you are migrating an IDP with hundreds or over 1000 integrated applications; then things can go really, really wrong. And so it was postponed. So you can be affected by a third party having problems. And last but not least, everyone has probably experienced it in the past as well: messed-up updates. Think about the time recently when the crowd was striking, or messed-up deployments. For today's talk, I prepared with my buddy ChatGPT a definition of cyber resilience, and it did a good job, so I was happy with it.
Basically, cyber resilience is focusing on your ability to prepare for, respond to and recover from cyber incidents whilst, and this is the important part, maintaining operational continuity. So it's for sure cool and important to recover, backup and recovery strategies and so on. But if it takes a week, or if it takes longer than a week, or if it takes several days or even several hours, you will still have a problem. So maintaining the operational continuity is a very, very important part of that. The industry has done a lot of things to increase resilience of IT systems, of IAM architecture.
And a part of that, and this is the identity fabric slide, the older one, not the future one that will be released in January, is thinking about how to slice systems, which capabilities are important, providing a proper API ecosystem, providing those services to other services and building a certain base.
And I always recommend when I'm talking to people setting up their IAM or thinking about their IAM strategy, I'm a big fan and a believer in this kind of, so to say, correct me when I'm wrong, formal frameworks where you have a guide to figure out, okay, this capability has to be there, this capability is there, and you can decide which systems or which vendors or which products do fulfill the capabilities on your side. So it's not a one size fits all thing, but you can say, I don't have a need for this capability for whatever reason.
And when you are not working with end customers or when end customers are not in your responsibility, you don't need any CIAM-related stuff. So you can probably remove it, but it gives you the structure to build up reliable and resilient systems at the end of the day.
However, nothing happens until someone accesses something. So in the access management piece, the act of accessing a system is the crucial point. When something is not working, it could be related to the IDP. So let's now dive deeper into the resilience of IDPs, of identity providers. I prepared three scenarios to work through, and they are step by step getting a little bit more complex.
So the very, very traditional approach: you have an on-prem IDP somewhere in your data center, hopefully not on a server under your desk. The biggest risk is for sure the outage of the IDP. And how can it be caused? The data center has struggles, you have messed up the deployment, you have messed up an update, the product has a bug, something like that. But you have proven mechanisms from the past to cope with those scenarios. So you have hopefully set up high availability concepts. You have for sure a second data center or even a third, fourth, sixth one, whatsoever, around the globe.
You have proper backup and recovery in place, which is sometimes also not that easy. Think about backing up Active Directory, for example. And you are probably able to maintain your IDP in a local on-prem setup very, very well.
Now, let's look at another scenario, the private cloud IDP. And basically this slide refers to a series of conversations I had in the past with a customer of mine; Gerald is in the audience today as well. We spoke last time at EIC in May this year and the year before, and guided the audience through the journey we have been on with this client in the automotive space for several years now. What's the scenario? You have an IDP in a private cloud setup.
So basically you have an AWS cloud, you have GCP, Microsoft Azure, or whatever you name it, and you're hosting your IDP as a private service in this cloud. What's the risk? Basically everything, or nearly everything, that can happen on-prem can happen to your cloud setup as well. So apply the things that are important on-prem in the cloud as well. But additionally, you are faced with potential outages of your cloud provider. And back in time, before we started this project with this automotive client, I had a conversation with another client.
It was also about a central IDP for all employees. It was a company of roughly 100K employees, plus partners, plus extras. So it summed up to a lot of people. And they were following a 100% GCP, Google Cloud Platform, strategy. And he was asking: when GCP has hiccups and your IDP is in GCP and there is an outage, when do you recover and so on? And from a gut feeling, I said, no problem. When GCP is out, you have far bigger problems than the IDP because nothing is working. So let's focus on fixing everything and then the rest. From an early thought, a gut feeling, that was the answer.
But when you think a little bit, and I've done it, the answer is not correct. The answer is not correct and not correct by far, because what does it mean? So even if you have a strategic cloud provider and you have decided to go all in for Amazon, or Microsoft, or for Google, you are consuming most likely hundreds of SaaS applications. And a lot of those SaaS applications might not be hosted at the hyperscaler you have selected. So basically, when you're an AWS shop and AWS is down, probably 50%, 60% or 70% of your SaaS services are still up and running.
But you can't access them anymore when your IDP is hosted in an AWS environment, when you have a central IDP for sure. So basically, you limit your capabilities to react and basically keep the lights on, at least in certain areas of the business. And when you think about SaaS services, it's not just systems for the rest of the business.
There are also a lot of things out there for the IT folks. So basically, you might not be able to reach or log into important systems to recover your infrastructure completely. So what was our approach to work on this? Our approach was a multi-cloud setup. So you can say a kind of hot standby between two hyperscalers, a GCP and an AWS, or an Azure, or whatever combination you prefer, to provide a hot failover in case something is failing or there is an outage.
This is complex, this creates effort, and that's also not that cheap because someone has to pay for it, for doubling the infrastructure, for doing the project, for developing the concepts in a global setup to make the switch, for enabling all applications. You have hundreds or even thousands of integrated applications, and they have to be capable of supporting such a switch.
However, as you have learned, it's a $200 million loss a year due to unplanned downtime. So if your IDP is the crucial part and affects a lot of your business, and probably the output of a production line, so that you can't produce cars anymore and your retailers can't sell your things to end customers anymore, you have an immediate financial loss. Then you can do the math, and then you see the importance of an IDP that is running all the time. Another possibility might be identity orchestration, and I will come to identity orchestration in a bit more depth later.
My last scenario is you're using a public SaaS IDP, it can be an Okta, it can be a Microsoft Entra or anyone providing an IDP as a public service. When you read the paperwork, you have normally agreed to SLAs.
And very, very likely the SLA is not 100%. It's 99 dot, and then you have a couple of nines, but it's for sure not 100%, so you will be faced with an outage. The question for you is: basically you have an SLA and you probably also have some penalties, if you are lucky, and you get some money when the IDP is down, but does the money, does the penalty, really heal your pain when you have an outage, when you have big trouble? Most likely not. It's more a construct for making leaders happy, for raising the discipline, for making it a serious relationship between two business partners.
But at the end of the day, when you have a longer production outage of your IDP, all the penalties you get from your SaaS provider most likely won't heal the problems created within your business. So think about being resilient in that respect. So what can happen in a risk scenario? The outage of the SaaS or the cloud provider, because at the end of the day they are again hosted at the hyperscalers, so GCP, AWS, or Microsoft Azure. How can you solve it?
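To make the "couple of nines" concrete, an availability percentage can be translated into the downtime it still permits. This is a minimal sketch with a hypothetical helper name; it assumes a 365-day year and that the SLA is measured over the whole year, whereas real contracts often measure per month and exclude planned maintenance.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(sla_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime an availability SLA still permits within the given period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% availability allows about {minutes:,.1f} minutes of downtime per year")
```

Under these assumptions, even a 99.9% SLA leaves room for several hours of outage per year while the provider remains within the contract.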
At least in part, you can think of creating some emergency access, ideally with a kind of just-in-time provisioning, probably based on your integrated PAM system or your IDM solution. You can have local break-glass accounts, highly secured, that you can only access when you really have the need, when you can't access the systems with your IDP anymore. Or you can utilize identity orchestration with a kind of failover to another IDP. Let me quickly explain a bit of identity orchestration. Simplified, it's, so to say, a kind of workflow engine with a couple of capabilities.
An IDP broker can be a part of it, figuring out which identity store makes sense to use. Or, as a global installation, you say, okay, the US folks are going to a US IDP, the Europeans to the European one, and those folks residing in APAC are utilizing an APAC IDP. It has some ITDR and health check components to figure out: are there any hiccups? And it could be that things get more critical step by step. So the response times of your IDP get longer and longer and longer, and then you have a certain failure pattern, and you have to establish a threshold at which to switch over.
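The health check with a switch-over threshold described above can be sketched as follows. This is an illustrative assumption, not how any particular orchestration product works: the class names, the 800 ms threshold and the three-probe window are all hypothetical, and probe results are passed in as plain numbers instead of coming from real requests.

```python
from collections import deque
from typing import Deque, Optional


class IdpHealth:
    """Rolling window of recent probe results for one IDP (hypothetical helper)."""

    def __init__(self, name: str, window: int = 3) -> None:
        self.name = name
        self.window = window
        # A deque with maxlen drops the oldest sample once the window is full.
        self.samples: Deque[Optional[float]] = deque(maxlen=window)

    def record(self, latency_ms: Optional[float]) -> None:
        """Store one probe result; None means the probe failed or timed out."""
        self.samples.append(latency_ms)

    def is_degraded(self, threshold_ms: float) -> bool:
        """True once the window is full and every sample failed or was too slow."""
        if len(self.samples) < self.window:
            return False
        return all(s is None or s > threshold_ms for s in self.samples)


class FailoverRouter:
    """Sends logins to the primary IDP until it is degraded, then to the fallback."""

    def __init__(self, primary: IdpHealth, fallback_name: str, threshold_ms: float = 800.0) -> None:
        self.primary = primary
        self.fallback_name = fallback_name
        self.threshold_ms = threshold_ms
        self.active = primary.name

    def observe(self, latency_ms: Optional[float]) -> str:
        """Feed one probe result for the primary and return the IDP to use now."""
        self.primary.record(latency_ms)
        if self.active == self.primary.name and self.primary.is_degraded(self.threshold_ms):
            self.active = self.fallback_name  # no automatic failback in this sketch
        return self.active


if __name__ == "__main__":
    router = FailoverRouter(IdpHealth("primary-idp"), "fallback-idp")
    for latency in [120.0, 300.0, 900.0, 1200.0, 1500.0, None]:
        print(latency, "->", router.observe(latency))
```

Requiring a full window of bad probes means a single slow response does not trigger the switch. Switching back is left as a manual decision here, since flapping between two IDPs during an incident is usually worse than staying on the fallback.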
I know it's complex. And most of you will say, ooh, and this was my first thought as well, paying two subscriptions to identity providers, so paying Ping Identity and Okta, then gets pretty expensive. And one is just a fallback you don't really use, that's true. So think about who really needs access to your IT systems in a total crisis. Are these your 100,000 plus employees, or is it a small team of IT professionals in your crisis team, a couple of hundred or a couple of thousand?
Is Microsoft Entra ID a fallback you have established anyway when you're a Microsoft shop and using Microsoft Office 365? So figure out what can be a suitable plan for that, to keep subscription costs, risk, and effort at a reasonable level. Time is nearly coming to an end, so my wrap-up and takeaways, three minutes. So build a resilient architecture, modernize your IAM, get rid of old stuff, follow structures like the identity fabrics, and keep it up to date. Think about identity orchestration, whether it makes sense in your setup or not.
And think about multi-cloud strategies as well. Even though a hyperscaler feels like it is always up and running, that need not be the case, or it won't be the case every single day. Establish proper engineering, coding, and DevOps principles. So try to get to 100% automation. That means 100% automation for configuration and also infrastructure as code. So if something goes down, you're really able to set up new systems immediately in a completely automated way, without manual interactions that take time and are error-prone. Think about your backup and recovery concepts.
It might not always be as easy as it sounds to get configuration, infrastructure, and data back and have it all working together. Think about changing security identifiers in an on-prem Active Directory and this stuff. So the devil is in the details. For outages and attacks, think about reworking your playbooks and your recovery procedures. Are they really up to date, or are they more focused on ransomware stuff?
So in a session earlier this week, on Tuesday, we had a good conversation on IDP outages and recovery procedures and playbooks, and so many clients have not addressed the IDP case. They are very well prepared for other IT crises, but not for this one. Think about a digital jump bag. This is a term I found from a co-SD.
Basically think about when all your communication breaks down: who has all the telephone numbers for the most important IT folks, legal folks, managers, stored somewhere, so that you can call them when your directory doesn't work anymore and you can't grab the numbers in real time? Who has a secure copy of your playbooks for incident management when you can't access whatever system you are using, let's say Confluence, because you can't reach it via your IDP anymore? And for sure, think about identity threat detection and response in order to limit things. One more thing, and then I'm coming to an end.
It's probably not the time for discussing it right now, but later on in the break. Let's think about what central authorization means in the future. So are we creating another single point of failure when we have a central authorization engine there, on the one hand? And the other thing: let's think about decentralized identities and how decentralized identities can play a part in, or have an impact on, the resilience of your IT infrastructure. Thank you very much. Thank you. That brings us quite nicely to time. I think in the interest of time, we'll crack on, but you're around.
I mean, people can... I'm around. So if anyone has a question, I'm going out and then just join me for a good dialogue. Thank you very much. Thank you.
Well, certainly a lot for you to think about there. I love the scenarios and I like the concept of the jump bag.