Welcome to the KuppingerCole Analyst Chat. I'm your host, my name is Matthias Reinwarth. I'm an Analyst and Advisor at KuppingerCole Analysts. My guest today is Alexei Balaganski. He's a Lead Analyst with KuppingerCole focusing on cybersecurity. Hi, Alexei.
Hello, Matthias. Thanks for having me again.
Great to have you again, and we are revisiting this topic. Just a few weeks ago I talked about a similar subject with our colleague Marina, and we discussed the impact of generative AI on cybersecurity. This is also something you have just written a blog post about, and it's a topic everybody is talking about. So let's start the discussion from a somewhat different angle: from the perspective of the cybersecurity analyst, the cybersecurity practitioner. There has been a change in the public notion of how generative AI plays a role in cybersecurity. Can you briefly explain this shift in public and business perception of generative AI? Everybody's talking about it, but what drove the change?
Right, right. Well, first of all, I did of course listen to the podcast episode with Marina, and I believe you were discussing the more general impact of AI on everything, including cybersecurity. Today, I really want to focus on practical implications. But we have to go back in time to understand the effect and the impact. As we have probably said multiple times already, AI as a thing, as a science, is not at all new. It started over 70 years ago as an academic field of research. The practical applications began to appear in the 90s. The actual business use cases only became possible and affordable enough with the rise of the cloud, especially the public cloud. And of course, AI tools in cybersecurity are something we have been covering for almost a decade already. Every time we said, look, there is now a new tool that will help you cut through the noise in your results, reduce the false positives, help you understand the nature of an incident, and support you in your decision making, it sounded great at the time, but everybody around was suspicious at best. They would say, how can you trust AI to make decisions? What if something breaks? Basically, it was extremely difficult for vendors to push their AI-powered tools to a bigger audience. And then suddenly, ChatGPT happened, and overnight we had this generative AI craze. It's not even blockchain all over again; it's a thousand times bigger. It's like the tulip mania in Holland in the 17th century, where everybody was suddenly growing tulips and you could basically buy a house for one bulb, one flower. Of course, we all know how it ended; it did not last. Some people made tons of money, some people lost tons of money, but nowadays we just know that while Holland is a lovely country with lots of tulips, so what? And I believe we have to approach this whole ChatGPT mania from a somewhat similar perspective. ChatGPT is a great tool, a really interesting technology with lots of potential applications. Let's see how it works for cybersecurity specifically, and let's ignore the hype.
Right, and with everybody potentially having access to that technology, just by using the free version of ChatGPT or paying a few bucks a month for it, that of course brings the benefits and the sheer usefulness of generative AI to everybody's desktop. So I think that, as you said, is really what is elevating the notion of generative AI being useful in any area, including cybersecurity. But there is of course also, and you've mentioned that already, the conversation about risks, about bias, about compliance issues. From your perspective, what are the most pressing challenges that businesses have to face and deal with when they integrate generative AI into their cybersecurity strategies, especially for defense and detection?
First of all, you are absolutely right. Everybody is using ChatGPT now simply because they can. It's extremely affordable; you basically only need a browser. And the very idea that you now have this little virtual assistant living in the cloud, which can do almost every kind of job for you, is appealing whether you are an analyst, a software developer, a journalist or even an aspiring artist. Generative AI can do a lot for you. Some people even fear that it can do everything they were doing before, that you could automate yourself out of your job completely, and those things have already happened, for journalists for example. So yes, there are risks and benefits on the personal, emotional level, but of course we have to think about other levels as well. For businesses, it's a huge risk in terms of data leaks, because if you just let everyone copy and paste your sensitive documents into what is essentially a third-party cloud application, without any guardrails, controls and DLP tools in place, you will be facing the consequences pretty quickly. And again, we already know that such things have happened. On the other hand, just trying to prohibit this usage completely, or at least in the office, is futile. We saw the same story with bring your own device a few years ago. Basically, the real working strategy is not to prohibit, but to limit, to guide, and essentially to establish an acceptable use policy for AI. This is what we've been talking about internally and with our customers for months already, and it is also something that just takes time. But again, this is a topic of its own. Today, we want to focus on cybersecurity specifically, right?
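To make the "limit and guide rather than prohibit" idea a bit more concrete, here is a minimal, purely illustrative sketch of what a pre-submission guardrail could look like: a check that scans an outgoing prompt against a list of sensitive patterns before it is allowed to leave for an external AI service. The pattern names, the internal project tag format, and the policy itself are assumptions for illustration, not any specific DLP product or vendor feature.

```python
import re

# Hypothetical patterns an organization might treat as sensitive;
# a real DLP policy would be far more extensive and context-aware.
SENSITIVE_PATTERNS = {
    "payment card number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "internal project tag": re.compile(r"\bPROJ-\d{4}\b"),  # made-up internal identifier
}

def check_prompt(prompt: str) -> list[str]:
    """Return the names of all policy violations found in an outgoing prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]

prompt = "Summarize this contract for customer jane.doe@example.com under PROJ-1234."
violations = check_prompt(prompt)
if violations:
    print("Blocked by acceptable use policy:", ", ".join(violations))
else:
    print("Prompt may be forwarded to the external assistant.")
```

In practice such a check would sit in a proxy or browser plugin in front of the AI service, but the principle is the same: the acceptable use policy is enforced by tooling, not just written down.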
Exactly. And you say that, with proper guardrails in place, we should be able to use generative AI as part of our cybersecurity strategy as well. What are the most pressing challenges? Where would you focus when it comes to using it? Of course, this AUP, this acceptable use policy, is a standard, a basis to build upon. But on top of that, what would be a good starting point?
Well, the most trivial use case for generative AI in cybersecurity is actually the same as everywhere else: it's just you and ChatGPT. It's an ideal assistant which can help you increase your productivity, reduce the time you waste on boring stuff, and, most crucially, help you with your continuous self-education. That is a must for every cybersecurity expert, regardless of their specific area or field. This field moves so fast that you basically have to learn something new daily, otherwise your expertise and your knowledge will be outdated pretty soon. And ChatGPT is a great tool for that. You used to have Google for research, then you had Wikipedia, now you have ChatGPT. You can ask lots of questions and get answers immediately. Of course, the most crucial question is: can you trust those answers? And basically, no, you have to take them with a huge pinch of salt every time. We know, for example, that ChatGPT itself, the actual namesake, has a knowledge cut-off, so it doesn't know about things after some date in 2021 or so.
Something like that. Yeah.
So basically, you cannot use ChatGPT to ask questions about recent events. We also know that all LLMs, large language models, are prone to hallucinating: they will make up answers that look good but are completely pulled out of thin air. So for this trivial use case, just using ChatGPT for research and learning, trust but verify. Our motto should be Zero Trust for generative AI, if you will. You can absolutely use it, but you have to validate all the results. And the same approach should be extended to every more sophisticated use case as well. For example, another very popular scenario is using it for training, especially in the field of business continuity and incident response, which is a huge topic for a lot of companies. Ransomware, natural disasters, internet cables cut by a bulldozer, whatever: something happens and suddenly your business just cannot work anymore. Now you have to act quickly, you have to learn to act quickly, and you have to train for that in advance. Some huge companies can afford to hire an external contractor to create a perfect testing environment with real journalists, cameras rolling everywhere, individually crafted scenarios and so on. But it's really expensive. We've seen such examples at some of our earlier EICs, presented by companies like IBM. They're great, but they are really expensive. With generative AI, you can basically do the same, but delegate a lot of that world building and incident crafting to an LLM. That not only reduces your costs significantly, it also makes the exercise more realistic, in the sense that actual hackers are probably using the same tools to plan their next real attack on you. So you will have a much better feeling of realism and groundedness, although, of course, the entire scenario is made up.
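As a rough illustration of how such scenario crafting could be delegated to an LLM, here is a minimal sketch using the OpenAI Python SDK. The model name, the prompt wording, and the company profile are assumptions chosen for illustration; any comparable LLM service would work, and the generated scenario still needs human review before being used in a real exercise.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Parameters for the tabletop exercise; purely illustrative values.
company_profile = "mid-sized European retail bank, 2,000 employees, hybrid cloud"
incident_type = "ransomware outbreak spreading from a compromised supplier account"

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": "You design realistic but fictional incident-response "
                       "tabletop exercises. Never include real company names.",
        },
        {
            "role": "user",
            "content": f"Create a 90-minute tabletop scenario for a {company_profile}. "
                       f"Incident: {incident_type}. Provide a timeline of injects, "
                       "the roles involved, and three escalation decisions.",
        },
    ],
)

# The output is only a draft: a human facilitator still has to verify and adapt it.
print(response.choices[0].message.content)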
Right, and this is also where we get to a topic that Marina mentioned in an earlier podcast episode: synthetic data. Data that looks as if it were real, feels real, and is structured like real data, but is synthetic; it simulates. It can be used for training, and of course this synthetic data can also be used by attackers for creating realistic-looking attack scenarios. Everybody is used to the more familiar use cases, writing texts and making ChatGPT act like a poet, act like a journalist, act like a politician. But there are so many more use cases where these generative AI systems, be it ChatGPT or others, are really good. One that is often mentioned in podcasts and in videos on YouTube is the generation of software, of code, of code fragments, supporting developers in creating more efficient and better code. Also more secure code: is secure-by-design software through generative AI a thing?
Right. Well, first of all, you mentioned the term synthetic data, and absolutely, one could even argue that any kind of data GenAI generates is basically synthetic data of a different kind. It can be text, it can be an image, audio, or even video. I mean, have you seen the latest examples from OpenAI's Sora, I believe it's called? It's amazingly realistic. But of course, businesses are also interested in more structured
Yes, yes.
kinds of data. Source code for applications, which you just mentioned. Data collected and processed from free-format sources but made palatable for automated analysis, for example by ChatGPT and similar solutions; that's great for business intelligence and analytics. Or testing. There are so many potential use cases, and again, they're all great, and some are directly related to cybersecurity. But you have to understand that this is even more of a work in progress than natural language, audio, or images. I've seen some recent examples of code generated by LLMs, and of course there are already tools, even free tools, available. If you are given a boring task, I don't know, like creating a sorting algorithm for a specific kind of data, then absolutely, ChatGPT will generate your function in Python or Java or whatever other supported language, and you can reuse it, and it will probably work even better than a similar function crafted by yourself. The problem, of course, is: can you scale it up? Can you, for example, say, okay, let's fire our entire development team and let ChatGPT create a business application for me? So far, I believe we are not there yet, though that might change very quickly; the progress is amazing. However, who told you that the code generated by ChatGPT is error-free? That it's inherently better than code crafted by humans? That is simply not true. Again, Zero Trust for AI: trust but verify. And we actually figured this whole process out decades ago. For every line of source code, you have to create unit tests, for example. In your development pipeline, you have to build in and integrate automatic tools searching for vulnerabilities, hard-coded secrets, and other kinds of mistakes. And sure, all of those tools can also be powered by large language models, and maybe in a few years' time we will end up somewhere where 99% of the entire software development process is automated. The question is: can we actually make the leap of faith and go to 100%? No, for one simple reason: this is still someone's liability, and at least for now you cannot assign liability to an AI. There still has to be a human with a last name, a signature, and a bank account who is responsible for any problems caused by that AI. And it had better not be your CEO, or even yourself, right?
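In the spirit of trust but verify, here is a minimal sketch of what that verification step could look like in practice: a few unit tests that check a hypothetical assistant-generated sorting helper against a trusted reference implementation and some edge cases. The function name and data shape are assumptions for illustration; the point is simply that generated code goes through the same test and review pipeline as human-written code.

```python
import random
import unittest

def sort_records(records, key="timestamp"):
    """Hypothetical stand-in for a function produced by a code assistant:
    returns the list of dicts sorted by the given key."""
    return sorted(records, key=lambda record: record[key])

class TestGeneratedSort(unittest.TestCase):
    def test_matches_reference_implementation(self):
        records = [{"timestamp": random.randint(0, 10_000)} for _ in range(500)]
        expected = sorted(records, key=lambda r: r["timestamp"])
        self.assertEqual(sort_records(records), expected)

    def test_empty_input(self):
        self.assertEqual(sort_records([]), [])

    def test_does_not_mutate_input(self):
        records = [{"timestamp": 3}, {"timestamp": 1}]
        snapshot = list(records)
        sort_records(records)
        self.assertEqual(records, snapshot)

if __name__ == "__main__":
    unittest.main()
```

The same pipeline would typically also run static analysis and secret scanning, whether the code was written by a human or generated by an LLM.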
Absolutely. So it's the same principle that you mentioned for training: when you use GenAI-generated or -presented information for your own education, verify that it is correct. And it's even more so the same principle when it comes to generating code: making sure that it does not only look good and work at first sight, but that it is properly designed, properly secured, and really up to the standards you want it to meet. I think that's the same Zero Trust for AI, just in the coding area. But another topic that is gaining more and more attention, also through products that are out there, actually turns things around a bit: the security analyst looking at an event log, at a SIEM, when it comes to analysis. You said earlier, cutting through the noise, reducing the noise, filtering the noise, trying to identify what is really important in all the signals you see. Now there are specialized, security-trained LLMs coming up that promise to support the analyst in finding the important stuff, something like Microsoft Security Copilot, but also efforts by IBM and other vendors, which promise they can do things much better and more efficiently than the analyst, so that the analyst can ideally focus on the final 10% where action is needed. Are there any limitations in these systems that you would want to make organizations aware of?
Well, speaking of those specialized security tools, I just want to point out that our colleague and fellow analyst, Warwick Ashford, recently published an entire Leadership Compass on those quote-unquote intelligent SIEM platforms, which are exactly that kind of tool: they not only collect everything that's happening within your organization with regard to IT threats, but also apply intelligence, if you will, including generative AI, to those findings, essentially helping you do your job as a security analyst. So there are a lot of those tools, and they didn't just appear overnight after ChatGPT. We know that machine learning, specialized NLP algorithms, and other AI applications have been used by those tools for years now, almost a decade. And of course, with ChatGPT and other LLMs, they will become even more convenient, even more automated, and even more approachable for less technically inclined people. That's great. But there is one fundamental thing you always have to keep in mind: it's not the AI or ChatGPT specifically that makes those tools great or mediocre. It's the rest of the technology stack, which you usually do not see directly. You know the principle, garbage in, garbage out. It was, if not invented, then at least acknowledged by Charles Babbage himself, the man who designed the first mechanical computer back in the early 19th century. No sophisticated machine can give you the right answer if you are feeding it the wrong inputs. And that especially applies to security tools, because if you do not collect enough security telemetry, if your machine learning model is trained insufficiently or on an inherently biased or poisoned data set, it will never be able to produce a sensible response, even if you are asking the right questions, and even if you are observing a real attack happening. You mentioned Microsoft Security Copilot. It's a popular tool, but again, it's popular not because it has the best AI; it's popular because Microsoft has the best threat intelligence and telemetry network, and even their own cloud to run all of that. So yes, absolutely, those tools are already there. They will become even better as more sophisticated GenAI technologies are embedded into them, and they will be more convenient. But again, never trust a label. Look behind the fancy bells and whistles. Ask the vendor serious questions: how are they collecting the data? What kind of data are they using for training? Is that data exclusively synthetic, or is it based on real-life events happening to their customers? How many customers do they have? Are those customers representative enough to cover your environment as well? When we talk about AI biases, it's not necessarily about race or color or something like that; it also very much relates to you as a potential peer among all the other users of the security tool. Whether you are a bank or a tiny analyst house like we are, your digital footprint is completely different. The question is: can the vendor, can this particular security product's model, actually cover your digital footprint efficiently and understand what's going on? That should be the highest-priority question to every vendor, not how many bells and whistles they have.
Absolutely, fully agreed. We're talking about this topic because the initial spark for this conversation was the blog post that you put online. I just want to highlight, first of all, that it is really worth the read. And I think you're the only analyst we have who can cover, in a piece called Generative AI in Cybersecurity - It's a Matter of Trust, both the tulip craze and Charles Babbage, and still stay focused on the topic and get all the important messages across. So I just want to point the audience there: please go to our website and read Alexei's blog post, it's really worth the read. A final thought that you mention in the blog post is the idea of leveraging collective wisdom: using the wisdom around GenAI, not just from GenAI, talking to your peers and to similar companies and learning from the community. Why did you end up on that note?
Well, I tried to hint at more than one thing there. One is that, obviously, any kind of generative AI by default relies on crowd wisdom, because it sources all its inputs from around the world. The second thing is that you always have to remember that you are a member of a crowd, whether you like it or not, and you have to make sure that the specific tool understands you in that regard. But finally, last but not least, we are talking about a very real and very interesting crowd: we are going to gather this June in Berlin at another European Identity and Cloud Conference, where we will be hosting really interesting discussions around all of these topics. There will definitely be more than one, even more than a dozen, really high-caliber experts in the field, giving you the opportunity not just to listen to interesting sessions and keynotes, but to actually talk to them directly and tap into the real wisdom of the real security crowd. So let's hope we all meet there and have a really interesting continuation of this discussion.
Absolutely, and just to highlight this even more: it will not cover just identity. It will go beyond, towards identity-related security and towards cybersecurity in general. As we've mentioned and learned today, cybersecurity and GenAI are strong partners, and they need to be well understood to make them partners and not implicit enemies. So thank you very much, Alexei, for being my guest today. I'm looking forward to seeing you in June in Berlin, but hopefully to talking to you much earlier in another podcast episode. In the run-up to EIC, we will have a lot of episodes covering topics that will be relevant at EIC and beyond, otherwise they wouldn't be relevant. And I'm really looking forward to talking to you again about that. Thank you, Alexei, for being my guest today.
Thank you, Matthias, and goodbye.
Bye bye.