Welcome to the KuppingerCole Analyst Chat. I'm your host. My name is Matthias Reinwarth, I'm Advisor and Analyst with KuppingerCole Analysts. Today we want to talk about a topic that everybody's talking about, and we've already touched on it in several episodes of this podcast. We want to talk about artificial intelligence, about machine learning, about the application of AI in cybersecurity. And first of all, we want to learn more about what AI actually is, not the definition, but the shapes and sizes and the forms in which we can observe it in reality, and then how it can be applied to cybersecurity. And for that, I have invited Alexei Balaganski. He's a Lead Analyst with KuppingerCole covering the cybersecurity area. Hi, Alexei. Good to have you back.
Hello, Matthias. Thanks for having me again.
Great to have you. As I mentioned, we want to start with the typical notion that an IT-savvy, tech-savvy person has when it comes to what AI is. So what is the definition of AI? We all think in the first place of ChatGPT, so generative AI. That is the AI that everybody thinks of, but there is a category of non-generative AI, right?
Well, let me start with a short look into the history, because this whole topic of AI applications in cybersecurity is definitely not new at all. It predates ChatGPT by quite a few years, and we at KuppingerCole started writing about the subject many years ago. Just before we started this recording, I looked into my own blog posts from around six years ago, and yes, we were already talking about that. Of course, we did not know what Gen AI was, and probably nobody knew what a GPT was even supposed to mean, because they hadn't been invented yet. But you're absolutely right. It's such a shame that for a lot of people, even in the IT field, AI basically means ChatGPT, a large language model that generates texts for your prompts, which of course isn't true at all. Yes, we all know AI as an academic research subject started decades ago, in the 50s or 60s. And some practical applications were already available before I was born, probably even before you were born, Matthias. Even back then, cybersecurity as a field probably did not exist yet, but some of those early applications were already at least usable in that particular area. And absolutely, we already have quite a large number of tools which are definitely AI-based, definitely used in cybersecurity, and have nothing to do with ChatGPT. But let's stop for a second and think: what's wrong with ChatGPT and LLMs in cybersecurity? They are great, they are awesome, everybody can use them. Yes, absolutely. And this is exactly why they are actually a risk, a liability for a lot of companies, because a typical large language model requires access to tons of sensitive data, which, being sensitive, is already a huge liability, even while it is an asset for cybersecurity tools. But also, all those technologies are extremely computationally inefficient. They require tons of computing power, which is why most of those LLMs actually run in huge cloud data centers, and they require tons of electricity and water and air conditioning, you name it. And when I read about such great innovations as, for example, using fully homomorphic encryption to secure your large language models, I cannot but think about how much electricity it will require and how much closer and faster it will bring us to a climate catastrophe. Because if you remember, a few years ago we were talking about Bitcoin mining basically bringing us to the heat death of the universe. Well, LLMs running in a fully homomorphically encrypted environment would probably require orders of magnitude more resources. So in a nutshell, this is a completely unsustainable approach in the long term. And unfortunately, instead of thinking about conservation of those resources, optimization and whatnot, people are still throwing more and more money and resources at LLMs, because, well, we are riding a hype train. And this is why today we actually wanted to approach this whole thing from a totally different perspective and try to identify sensible, lean, and practically possible solutions for all those AI use cases, right?
Right, maybe also use cases which do not demand such a large footprint when it comes to computing power, the size of the model, the energy consumed, and maybe even the location where the service is provided and where it is used from. It does not necessarily have to be something that is provided from the cloud with all these mechanisms that you described, so really large memory, large installations, a huge number of CPUs or GPUs running it. Maybe there are use cases, especially when we talk about cybersecurity, where there is only the need for a limited model providing limited functionality, and maybe not even generative AI, because that is just one part of the equation, by far not the full picture of the types of algorithms and models, the outcomes, and the ways of working that machine learning covers. And just to go back to that, I like to tell this story: when I went to university, I studied at the Deutsches Forschungszentrum für Künstliche Intelligenz, or at least with a professor who worked there, and that was 1987. And we did AI, but we did completely different AI. We did automated reasoning, we did lambda calculus, we programmed in Scheme and Lisp and these languages. And it was AI way back then, just a very different type of AI. So we could talk about this history of AI for hours. But as you said, it's really a topic that has accompanied me from my early days at university up until now. And now it's just exploding. But it's exploding in a very limited, visible sector, and that's what you said: everybody talks about ChatGPT. Is that machine learning? Yeah, it is, but not fully. Machine learning goes far beyond that.
Well, just recently I read an interesting article somewhere online, and I was impressed by a short quote I remember from it. It said: we were promised artificial intelligence decades ago; what we got instead is artificial mediocrity. And I know that you probably do not subscribe to my skepticism of LLMs in general, but we cannot but accept the fact that ChatGPT specifically was designed and adopted, and exploded in popularity, exactly because it's not really AI in the traditional academic sense. It's not a thing that emulates the human brain. It's a thing that emulates all those monkeys with typewriters, and it does it exceptionally well. And the problem is, yes, this kind of model can do a lot to automate the processing of texts and videos and audio, whatever, and it can draw interesting pictures, but it is fundamentally incapable of, on the one hand, creating anything truly original, and on the other hand, at least until now, understanding its own limitations. This is why we see so many examples like ChatGPT recommending you to put glue on your pizza, or recommending pregnant women to smoke a lot of cigarettes daily, because it just cannot tell things which are fundamentally true from those which are fundamentally satire or just plain lies. So LLMs are easily manipulated, and this is why it's extremely important, for any application in business or cybersecurity in particular, to feed any language model with correct data. Because otherwise you get garbage in, garbage out. But again, going back to our original list, I guess we can confess to our listeners that we actually cheated a little bit: we asked ChatGPT to give us a list of the most interesting potential use cases for AI in cybersecurity. And we will just go through that list and try to evaluate those suggestions from a human perspective, I guess. So why don't we start with number one?
Yeah, and it's really also important to understand which kind of AI, which kind of machine learning, is actually required to solve these tasks. And as you said, we created that list, and it really was a larger prompt. You and I are quite fans of proper prompts and proper context stuffing to shape the feedback, and this is exactly what we did not do here. We asked it just for what it already knew, not providing too much knowledge from our side, but simply asking: okay, what do people think of when they say AI use cases in cybersecurity? And of course, everybody has this notion: you are in a big SOC, a big security operations center. Everybody says, okay, AI must be good at threat detection, at analyzing lots of data, identifying anomalies and suspicious activities, and then at least signaling that there is something wrong. That should be a typical AI use case, am I right?
Well, first of all, we have to define our terms very specifically. A threat and an anomaly are two completely different things. Detecting anomalies, detecting outliers in a statistical sample of data, is a problem which was solved probably decades, if not centuries, before the invention of AI as a thing, because it's a purely statistical problem. I actually have a university degree in statistics, and this is what I studied for six years: how to analyze large data sets and find anomalies. You don't actually need a computer for that. But of course, if you are using the quote-unquote traditional machine learning methods, which have nothing to do with generative AI, by the way, you can apply those methods to identify those outliers. Which on its own is, again, a problem that has been solved for at least a decade and is widely utilized in many existing security tools. But detecting anomalies alone isn't very productive. I always like to remember one project we had a few years ago, where a company acquired a best-of-breed vulnerability scanner and ended up with a list of three million vulnerabilities. So you can run a machine-learning-based anomaly detection tool and you will get millions of those anomalies. What do you do? How do you even start approaching those millions? This is why a really high-quality, modern, next-gen AI-based security analytics solution has to do better than that. It should not just identify anomalies; it should be able to filter out all those noisy points, the false positives, and only focus on the real, meaningful things. Then it should be able to somehow correlate those findings with an existing threat framework, if you will, like the famous MITRE ATT&CK framework. Basically, it should not just tell you something odd happened. It should tell you: I am detecting an intrusion through a specific attack vector, it looks like this kind of attack, it looks like the hackers are operating from a specific dangerous domain, and here are the artifacts which will help you narrow down the problem even further. And again, this is what existing tools have been able to do for years; it has nothing to do with Gen AI or ChatGPT. Does ChatGPT have a use here? Absolutely it does. For example, no matter how deep and fine your funnel is, you would still end up with probably at least dozens of findings. How do you even start choosing the most dangerous one? Well, this is where your AI assistant can probably offer some kind of recommendation, based on the history of previous incidents, or based on a huge knowledge base shared across other customers of the tool, or just on external threat intelligence. It might give you an additional hint as to which incident is more important, where you have to act first to avoid a bigger breach, for example, or to contain the impact, and so on. Absolutely, AI can support you with that. The only problem is: should you trust the recommendation of that AI assistant? Can you even give it the opportunity to act automatically? Or do you want to have at least a kill switch in your hand?
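To make this funnel idea a bit more tangible, here is a minimal sketch in Python of what is described above: plain, robust statistical outlier detection first, then filtering and ranking so that analysts see dozens of findings instead of millions. The thresholds, field names, and severity heuristic are illustrative assumptions, not taken from any specific product.

```python
# A minimal sketch of the detection funnel: robust statistical outlier
# detection first, then filtering and ranking what survives.
import numpy as np

def find_outliers(values: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Robust outlier detection via median absolute deviation: pure statistics,
    no generative AI involved."""
    median = np.median(values)
    mad = max(np.median(np.abs(values - median)), 1e-9)  # guard against MAD = 0
    modified_z = 0.6745 * (values - median) / mad
    return np.where(np.abs(modified_z) > threshold)[0]

def triage(findings: list[dict]) -> list[dict]:
    """Filter out uncorroborated noise, then rank what remains."""
    # Keep only findings that map to a known attack technique (the false-positive filter).
    corroborated = [f for f in findings if f.get("matched_technique")]
    # Rank by a simple heuristic: technique weight times affected asset value.
    return sorted(corroborated,
                  key=lambda f: f["technique_weight"] * f["asset_value"],
                  reverse=True)

# Example: hourly login counts; hour 5 is a spike worth feeding into the funnel.
logins = np.array([12, 9, 14, 11, 10, 240, 13, 12])
print(find_outliers(logins))  # -> [5]
```

The point of the sketch is that nothing here requires generative AI: the detection is pure statistics, and the ranking is a transparent heuristic a human can audit.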
I think the important point is that the human factor is still added. So we have, on the one hand, the statistical models that you've mentioned, where you correlate data, you collect lots of data, which of course includes lots of noise. The signal-to-noise ratio is just much too bad to act upon. So the next thing is really to understand the thresholds, the patterns that I want to identify and apply, to boil this down to something I can start acting upon. That would be the next step, but it can only work on existing, relevant training data. That is what an AI then does; everything else would require somebody thinking very deeply to define manual rules, to put it that way. And that is the point where the AI really comes in. And I will fully agree: it requires, in addition, at least for spot checks, the human factor to look into things. Would you agree?
Well, to put it into a slightly different perspective: what you need in the end is to be able to quantify your findings, to quantify the risks. And there is a multitude of approaches, but those approaches cannot be defined by an AI. It has to be a scientifically proven and well-tested methodology, which would probably take years of development. And it would be tailored specifically to your business or your industry or your geography, because it would depend on a multitude of external factors. Again, perhaps generative AI can help you on that journey, but it will never be able to create such a methodology for you. Perhaps it would help you act upon that methodology later. But this is something which needs a lot of controls at every stage where a decision is made, right?
And I think what could be possible, and what is possible, is a kind of iterative improvement process: on the one hand continually injecting the knowledge of the scientific approach that you've mentioned, but also refining and evaluating results and then getting better at defining these processes, so that it can be a duo of the AI and the human factor improving these specific, bespoke, and very detailed mechanisms that work fine for that scenario and will fail in any other scenario. So really, we're talking about a bespoke environment and a bespoke ruleset.
And by the way, I can totally imagine that in the future there will be multiple AI agents working on problems like that, coordinated, or at least orchestrated, by a human decision maker. But still, this is not something which you can get from ChatGPT today.
Absolutely, I think there I fully agree. But we are also in an evolution that is happening really fast. We are looking at things happening right now which are really, really surprising and interesting. Apart from geopolitical events, we are living in interesting times when it comes to the evolution of machine learning in general. A few weeks ago, I took part in... now I'm missing the word... in a conference on AGI, the real AI, and universities in Seattle have been daring to run these conferences on AGI for quite a while already. So there is development there as well. It's interesting times to watch things happen. But to go back to cybersecurity and the machine learning that we have at hand right now: yes, there are limitations, and they need to be clearly understood. But maybe adding three AIs with different aspects can at least do some heavy lifting, also when it comes to solution design and mechanism specification, getting better at cybersecurity without human ingenuity.
Okay, what's next on our list?
Behavioral analytics. So quite close to that. It's just a different aspect of the data to look at, to identify, A, anomalies and, B, threats.
I would say it's also a different scale. Usually, when you are thinking about behavioral profiling, it's something which takes place over weeks and months of analysis. You have to understand how actors, people, systems, malware, whatever, operate normally, and how to detect large-scale deviations from those normal profiles. So absolutely, it's the same methodology, the same technology; it's statistics or machine learning. It has nothing to do with generative AI. And again, it's largely a solved problem for the cybersecurity industry. And again, it depends enormously on the quality and quantity of input data. This is why the leading vendors in this area are usually those large companies which operate their own huge security clouds, like Microsoft or CrowdStrike, to name a few. And this is where, ideally, it would depend a lot on the notion of the wisdom of the crowd: if you are a customer of such a solution, you should be able to opt in to sharing your data with other customers, because they would be sharing theirs with you, and you would basically be improving each other's quality of life and security.
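As an illustration of what such long-term profiling can look like at its statistical core, here is a minimal sketch of a rolling per-actor baseline, assuming daily activity counts are already being collected. The window length, warm-up period, and deviation threshold are illustrative assumptions.

```python
# A minimal sketch of behavioral baselining: learn an actor's normal
# daily activity, then flag strong deviations from that baseline.
from collections import deque

class BehaviorProfile:
    """Rolling baseline of one actor's daily activity, e.g. logins or bytes sent."""
    def __init__(self, window_days: int = 30, threshold: float = 3.0):
        self.history = deque(maxlen=window_days)
        self.threshold = threshold

    def observe(self, daily_value: float) -> bool:
        """Return True if today's value deviates strongly from the learned baseline."""
        if len(self.history) >= 7:  # need at least a week of data before judging
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            std = max(var ** 0.5, 1e-9)  # avoid division by zero on a flat history
            is_anomalous = abs(daily_value - mean) / std > self.threshold
        else:
            is_anomalous = False
        self.history.append(daily_value)
        return is_anomalous

profile = BehaviorProfile()
for day, logins in enumerate([10, 12, 9, 11, 10, 13, 12, 11, 95]):
    if profile.observe(logins):
        print(f"Day {day}: unusual activity ({logins} logins)")
```

Note that the quality of the verdict depends entirely on how much clean history has been observed, which is exactly the data-quality point made above.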
Right. And I remember, way back three or four years ago, when we did our first AI-related conference at KuppingerCole, I gave a talk about when AI is useful. And one important aspect was that there is enough training data available, which is historic, and enough data available at any given time to act upon. That's what you just said: when it comes to behavioral data, that might just not be sufficient to get to reasonable results that are actually beneficial. But there are other aspects where there is enough data. What would be such a thing? Network data. I think of looking at large data centers with tons of traffic passing through, finding anomalies and threats in there. That would be something where you have, A, training data and, B, a constant throughput of good data you can act upon. Would that be the case?
Absolutely, I guess. I even think ChatGPT suggested this later in the list. Yeah, absolutely.
Exactly, exactly. That's what I was aiming at, because it's a good point to say: okay, we actually have anomaly detection when it comes to behavioral analytics, but also anomaly detection in network traffic. And if I step away from that list for a moment, just compare those two and apply the metrics that I applied four years ago or so: yeah, there's the training data, and an automated mechanism leveraging AI and machine learning can provide benefit because there's enough volume to act upon.
You have, however, to consider one additional factor here. A lot of network traffic nowadays is encrypted by default, which of course makes it not particularly useful for AI training. It is still possible to a degree, and there are some interesting developments there: by combining deep packet inspection with some really fancy ML methods, you can probably infer something about that encrypted traffic. But if you really need to look deeply into every security event, you have to do this traffic inspection at a point where you can actually have the traffic unencrypted. So either you set up a huge gateway which does TLS termination, decryption, analysis, and then re-encryption again, which in a lot of industries and even countries is not a good idea at all. Or you have to be able to do it in a hugely distributed way: basically, you would have a network tap in each of your IoT devices, microservices, and endpoints, and somehow they should be able to send their findings to some centralized place for analysis, but in a privacy-preserving and secure way, for example by only collecting traffic metadata and not the actual data. There are a lot of unsolved, or at least very difficult, collateral problems in this area, and this is, I guess, where the majority of innovation is happening now. So yes, it is absolutely about network traffic, but there are a lot of unsolved problems here to address.
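To illustrate the metadata-only approach mentioned here, the following is a minimal sketch of extracting flow features without ever touching a payload: even with TLS keeping the content opaque, packet sizes, timing, and direction can feed an ML model. The field names and the feature set are illustrative assumptions.

```python
# A minimal sketch of privacy-preserving flow features: only metadata,
# never the (encrypted) payload itself.
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class PacketMeta:
    timestamp: float   # seconds
    size: int          # bytes on the wire
    outbound: bool     # direction relative to the monitored host

def flow_features(packets: list[PacketMeta]) -> dict:
    """Summarize one flow into features an anomaly model could consume."""
    sizes = [p.size for p in packets]
    gaps = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]
    out_bytes = sum(p.size for p in packets if p.outbound)
    total = sum(sizes)
    return {
        "mean_pkt_size": mean(sizes),
        "std_pkt_size": pstdev(sizes),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "outbound_ratio": out_bytes / total if total else 0.0,
        "pkt_count": len(packets),
    }
```

A distributed tap could compute such a dictionary locally and ship only the aggregates to a central analysis point, which is one way to sidestep the TLS-termination gateway problem.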
Right, fully agree. So let's quickly walk through a few more of the items on our list. The next thing that we have is cloud security monitoring. Is this something where, and you are the expert here, talking to the vendors as an analyst across different segments, AI actually plays a meaningful role already?
I would say, first of all, it's a really broad subject on its own. The problem with the cloud is that it's not the actual detection which is the hard problem, because again, most of those solutions would look for known vulnerabilities and known misconfigurations. The question is: how do you correlate and prioritize those findings? Because if you don't want to end up with millions of vulnerabilities to deal with, you have to know: is this particular vulnerability, in this particular environment, under these particular conditions, really a threat? Can it be exploited? Is this device actually connected to the public internet? Is it being targeted? Is there an exploit in the wild? You have to bring in a lot of separate technology. You have to have great threat intelligence. You have to have a lot of software security analytics, code analysis probably, and things like that. You have to maintain large databases of known vulnerabilities, and so on and so forth. I guess it's not the AI that would be the biggest differentiator; it's packaging all those capabilities into a cohesive platform, a useful tool. And again, we have a lot of interesting published research in that area, and we can discuss specific vendors in a one-to-one call with a customer, but that would probably take another hour of our podcast. So let's just move on.
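Before moving on, here is a minimal sketch of the kind of contextual prioritization just described: a raw severity score weighted by environmental signals such as internet exposure and known exploits. The weights and field names are illustrative assumptions, not any vendor's actual scoring model.

```python
# A minimal sketch of contextual vulnerability prioritization: the base
# severity score is scaled by how exploitable the finding really is.
def contextual_risk(finding: dict) -> float:
    """Weight a base severity score by environmental context."""
    score = finding["cvss_base"]               # 0.0 .. 10.0
    if finding.get("internet_exposed"):
        score *= 1.5                           # reachable from outside
    if finding.get("exploit_in_the_wild"):
        score *= 2.0                           # actively weaponized
    if not finding.get("asset_in_use", True):
        score *= 0.2                           # decommissioned system
    return score

findings = [
    {"id": "CVE-A", "cvss_base": 9.8, "internet_exposed": False},
    {"id": "CVE-B", "cvss_base": 6.5, "internet_exposed": True,
     "exploit_in_the_wild": True},
]
# CVE-B (19.5) outranks CVE-A (9.8) despite its lower base score.
for f in sorted(findings, key=contextual_risk, reverse=True):
    print(f["id"], round(contextual_risk(f), 1))
```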
Right, right. And time flies when you're having fun; we're already far into this episode. But you've mentioned something I would really like to focus on as the final item for today: do I trust the AI to do something on my behalf when it comes to, for example, executing measures that protect my network, my data, my environment, maybe even my employees when it comes to physical security? The fourth thing on that list is automated incident response. I'll read it out: AI helps automate responses to cyber incidents, reducing response time and mitigating damage. A, is this possible? B, what is required? C, do we trust it? And D, should we watch it?
Well, this is absolutely possible. There have been solutions offered on the market for years already. The biggest challenge for them was exactly what you mentioned: to overcome the mistrust of the customers. A lot of industries would just say no: absolutely no automated incident response, whether rules-based, script-based, or AI-based. We do not want any automated responses at all, because we put the safety and continuity of our processes above everything else. Why do we care if a hacker is stealing our data, if, when we attempt to block him, our entire power plant goes down? Absolutely impossible. But I have to say that the whole notion of ChatGPT, and the way public opinion on AI in general has evolved, has changed this attitude a lot. What was an absolute impossibility five years ago is now a chance. A lot of companies say: yeah, we actually might at least consider it. The question is: can your solution give us the opportunity to adopt this at our own pace? So first, we want to see what it would do in a dry-run mode. Then we would probably test it on some of our less important systems, not the less risky ones, doing something which won't disrupt anything important, and then expand and adopt it. And yeah, this is absolutely happening. But again, you never know what can happen, and the CrowdStrike incident comes to mind. This is what happens to companies who were a little bit too eager to adopt blanket security monitoring. And again, it had nothing, or little, to do with AI. But this is where you have to be extremely careful in understanding the criticality of your own systems. And nobody can actually understand it better than you, so you have to do it. You have to learn the methodologies, you have to talk to experts, but you have to decide.
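The staged adoption path outlined here, dry run first, critical systems always gated by a human, plus an emergency stop, can be made concrete with a minimal sketch like the one below. The action names, asset list, and policy shape are illustrative assumptions.

```python
# A minimal sketch of gated automated response: every action passes through
# a policy check with a dry-run mode and a human-operated kill switch.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("responder")

class ResponseExecutor:
    def __init__(self, dry_run: bool = True):
        self.dry_run = dry_run        # start in observe-only mode
        self.kill_switch = False      # human-operated emergency stop
        self.critical_assets = {"power-plant-scada", "billing-db"}

    def respond(self, action: str, target: str) -> None:
        if self.kill_switch:
            log.warning("Kill switch engaged, skipping %s on %s", action, target)
            return
        if target in self.critical_assets:
            log.info("Critical asset %s: queuing %s for human approval", target, action)
            return
        if self.dry_run:
            log.info("[DRY RUN] Would execute %s on %s", action, target)
            return
        log.info("Executing %s on %s", action, target)
        # ... actual isolation / blocking logic would go here ...

executor = ResponseExecutor(dry_run=True)
executor.respond("isolate_host", "workstation-042")    # logged, not executed
executor.respond("isolate_host", "power-plant-scada")  # always needs a human
```

The design choice is the point: automation is adopted incrementally by flipping policy, not by rewriting the response logic.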
Right, I think that is an important, actually almost final statement. I want to add one other aspect, and now we are leaving parts of cybersecurity, although it is still a part of cybersecurity. When we do access governance and identity and access management, there is this huge exercise of recertifying access across lots of people with lots of entitlements. And from the outside, this looks like something where AI can really provide real value to organizations: if you give me the rules, the training data, and previous results, I can approve, or at least give a hint whether an access can be approved or should be removed, et cetera. And there are solutions out there that promote exactly this, that help in large-scale recertification campaigns. The problem is that the auditors don't like it. When an AI does the recertification, we are missing the human factor, and it is required for this recertification, because you need the application owner, the data owner, the system owner, the governance department in that equation, and they would be eliminated unless you have a proper process, which in turn reduces the usability, the benefits that you want to achieve with it. And I think we are on that learning curve you just mentioned. People see that any generative AI, not only ChatGPT, can on the one hand provide impressive results that are really striking and really surprising on one day, and the next minute it provides utterly unusable results, to not use any other words. So it's really about the balance between both. And when you log into these machines, it still says: this can provide wrong results, it can provide unproven results, please check the results. And the same holds true for everything that we're doing. So it's really an interesting new technology. It's not that new, as we have already learned, but it's making its way into traditional solutions, providing new capabilities. And I think that's really an interesting journey to watch as a user, as a vendor, as an analyst. So we will continue that analysis. And the perception is changing quite dramatically right now. Any final thoughts from your side?
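To show how AI assistance and the human factor can coexist in such a campaign, here is a minimal sketch where the model only recommends, and every decision records an accountable human reviewer, which is the part auditors insist on. The heuristics, thresholds, and field names are illustrative assumptions standing in for a trained model.

```python
# A minimal sketch of AI-assisted recertification with a human in the loop:
# the AI only suggests; the recorded decision always belongs to a person.
from datetime import datetime, timezone

def recommend(entitlement: dict) -> str:
    """Heuristic stand-in for a trained model's keep/revoke suggestion."""
    if entitlement["days_since_last_use"] > 180:
        return "revoke"
    if entitlement["peer_usage_ratio"] < 0.1:  # almost no peers hold this right
        return "revoke"
    return "keep"

def certify(entitlement: dict, reviewer: str, decision: str) -> dict:
    """The human decision is authoritative; the AI hint is only recorded."""
    return {
        "user": entitlement["user"],
        "entitlement": entitlement["name"],
        "ai_recommendation": recommend(entitlement),
        "human_decision": decision,          # may well differ from the hint
        "reviewer": reviewer,                # the accountable data owner
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

item = {"user": "jdoe", "name": "finance-reports-read",
        "days_since_last_use": 400, "peer_usage_ratio": 0.05}
print(certify(item, reviewer="app-owner@example.com", decision="revoke"))
```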
Well, absolutely, the technology is almost there already. As you just mentioned, there are some amazing developments, and there are some amazing embarrassments as well. I guess it all boils down to finding the right balance between technology, risk, trust, and regulation. Again, risk should be the primary decision point for you, because your business continuity and your financial success depend on it. You should not blindly trust any technology claim, because we know that we are not there yet in terms of real AI of any kind. And then you also have to keep in mind that even if you manage to find a seemingly ideal solution for your problem, there will be someone unhappy with it, be it an auditor or somebody from the state or from an industry regulation board, whatever. So you have to find the right balance, and I guess to do that you have to talk to experts from different areas, including KuppingerCole, perhaps.
I think that is a very brilliant approach to choose. So talk to us. And with that, I leave it to the audience, as usual: if you have any questions around that topic, if you think we should cover it in more detail or from a different angle, if you think we made a mistake, or if you have questions that we should discuss together with you, maybe in a new episode on a slightly different angle of the topic, please leave your comments in the comment section on YouTube or reach out to Alexei or me. We would love to continue the discussion. This is our daily work, and we are really trying to learn from you and to provide useful information to our audience and our customers. So we are really looking forward to receiving your feedback. AI-powered cybersecurity is a topic that will continue to be on our radar for the coming years. Maybe there will be a complete disruption of it, I don't know, so let's wait and see how it evolves. For the time being, Alexei, it was a pleasure to have you as my guest today. It was an interesting discussion, talking about the use of ChatGPT while using a list provided by ChatGPT. That was a fun exercise, and I think it really worked out, because we have now identified at least a few aspects where maybe generative AI isn't the solution, but where machine learning can still provide proper solutions for different use cases. Thanks again, Alexei. Looking forward to talking to you soon.
Thank you. Bye.
Bye bye.