I can imagine that some readers might look at the title and wonder: what do these three terms even have in common? Well, I must confess that the last one I made up (or should I say “invented”?) only recently – and I will explain it later. But the first two are, in fact, connected in more than one way.
Although Generative AI has been with us for quite some time in various forms (remember deepfakes, for example?), Generative Pre-trained Transformers (GPT), the new-generation large language models, have taken the world by storm. After people saw what ChatGPT can do, they immediately decided they wanted these capabilities in every application, for every industry, and on every occasion. Unfortunately, after a period of initial uncontrolled hype, it became clear that most organizations simply cannot afford to let their employees post sensitive information – like intellectual property, financial records, or customer data – into every online chatbot: you don’t need to be a compliance expert to understand the consequences.
So, while governments are working on future legislation to regulate AI usage in general, organizations are scrambling to define security and compliance policies to prevent uncontrolled leakage of their business data into generative AI systems. But how can these two completely opposite trends be reconciled to make GenAI usage safe and controlled? Well, there are multiple potential approaches…
The most obvious one is “building your own ChatGPT” – creating, training, and operating a new isolated instance of a large language model (LLM) with all its training and operational data strongly protected. However, only large enterprises with enough skills and budgets can afford such an endeavor. Also, training a general-purpose AI model requires years of hard work and enormous amounts of data from a multitude of sources. Some of those sources, having experienced ChatGPT’s effect on their future job security, are no longer interested in collaborating with AI researchers…
An alternative, more savvy approach is fine-tuning an existing AI model – instead of starting from scratch, you only need to adapt a pre-trained model to a specific domain with your own, much smaller set of data. This is a much more resource-efficient way to get your own LLM, but it still requires significant skills and effort, and it is much less flexible in terms of supported use cases. Of course, an even more efficient approach is to use a much leaner specialized model instead of a general-purpose one: for example, a model trained only to perform a specific subset of generative tasks.
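To make the idea more tangible, here is a minimal fine-tuning sketch using the Hugging Face transformers library. It is a hypothetical illustration, not a production recipe: the base model name, the two-sentence corpus, and the training settings are all placeholders, and real fine-tuning requires a curated domain corpus, evaluation, and serious compute.

```python
# Minimal causal-LM fine-tuning sketch (illustrative placeholders only).
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for any base model
tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

corpus = [  # your domain-specific data: tiny compared to pre-training data
    "Our claims process starts with a FNOL form submitted by the agent.",
    "Policy renewals are priced by the underwriting engine each quarter.",
]
encodings = tokenizer(corpus, truncation=True, padding=True)

class CorpusDataset(torch.utils.data.Dataset):
    """Wraps the tokenized corpus in the format Trainer expects."""
    def __len__(self):
        return len(encodings["input_ids"])
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in encodings.items()}
        item["labels"] = item["input_ids"].clone()  # standard causal-LM objective
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1),
    train_dataset=CorpusDataset(),
)
trainer.train()  # the adapted model now "speaks" your domain's language
```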
What does it all have to do with databases though, and why is everyone suddenly talking about vectors? Well, it turns out that there is an innovative technique called Retrieval-Augmented Generation (RAG) for adapting existing LLMs to your specific tasks – or even to your private, highly sensitive information. Instead of teaching (sorry, fine-tuning) a model with your custom data and essentially forcing it to memorize it entirely, you just augment each request with a set of additional information that can help its reasoning. It could be a set of documents containing fresh information about recent events or simply some of your organization’s sensitive data.
RAG not only provides responses much better tailored to the user’s prompts, and thus dramatically improves the efficiency of the conversation, but also ensures that the augmenting content remains private and is not shared with other parties. And here comes the twist: to implement RAG, you need to be able to quickly produce a set of documents relevant to a given prompt – and to accomplish that, you need a vector database!
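To illustrate, here is a minimal sketch of how a RAG request might be assembled. Everything in it is hypothetical: retrieve() is a stub standing in for the vector search described below, and the prompt format is just one possible convention.

```python
# A minimal RAG sketch. retrieve() is a placeholder for the vector
# similarity search explained below; the assembled prompt would then
# be sent to whatever LLM API you use.

def retrieve(question: str, top_k: int = 3) -> list[str]:
    # Stub: a real implementation searches your private document
    # store for the passages most similar to the question.
    return ["...relevant internal document 1...",
            "...relevant internal document 2..."]

def build_rag_prompt(question: str, documents: list[str]) -> str:
    context = "\n\n".join(documents)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}")

question = "What did our internal Q3 report conclude?"
prompt = build_rag_prompt(question, retrieve(question))
# The model sees the private context for this one request only:
# it is never trained on it and never memorizes it.
```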
Vectors themselves are simply sequences of numbers. While we’re more used to two- or three-dimensional vectors that encode points in space, vectors used for AI applications are usually much longer – with hundreds or thousands of dimensions. Various machine learning algorithms use vectors to store mathematical representations of their inputs. Documents, images, audio files, or any other objects processed by an AI model can be stored as vectors called embeddings, and by calculating the distance between two vectors, it is possible to determine the similarity of their original objects.
If your model is working with text documents, for example, its embeddings can be used to find the most semantically similar documents just by sorting them by distance. And this is precisely what we need to quickly determine which relevant documents to feed to an LLM when we are crafting a RAG request! And thus, vector databases – solutions that can store data in vector format and perform efficient indexing and searching on it – became a critical part of every AI strategy almost overnight…
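As a toy illustration of this mechanism, here is a sketch that ranks a handful of documents by cosine similarity to a query embedding. The four-dimensional vectors and document names are made up for readability; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import numpy as np

# Made-up 4-dimensional "embeddings"; real ones are produced by a
# model and are orders of magnitude longer.
documents = {
    "invoice policy":   np.array([0.9, 0.1, 0.0, 0.2]),
    "travel expenses":  np.array([0.8, 0.2, 0.1, 0.3]),
    "holiday schedule": np.array([0.1, 0.9, 0.7, 0.0]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means identical direction, values near 0 mean unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.85, 0.15, 0.05, 0.25])  # embedding of the user's prompt

# Rank documents by similarity to the query and keep the top two
ranked = sorted(documents,
                key=lambda name: cosine_similarity(query, documents[name]),
                reverse=True)
print(ranked[:2])  # the two most semantically similar documents
```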
Recently I attended Oracle CloudWorld, the company’s flagship yearly event where it makes its most important announcements and presents the latest innovations in its products. Do I need to tell you what the hottest topic was this year? From Larry Ellison proclaiming Generative AI the most important technology ever, to announcing a strategic partnership with Cohere, an enterprise AI platform vendor, to promising to integrate AI assistants into every business application – “GenAI” dominated the entire event.
More important, however, were the announcements that not just Oracle Database but also MySQL HeatWave are getting native support for vectors. Of course, it is possible to use a separate specialized database for performing vector operations, but keeping all kinds of data in the same database not only enables faster performance with fewer transformations needed but also ensures that sensitive data never leaves your secure perimeter.
With AI Vector Search, it is now possible to combine multiple data models, including vectors, in a single SQL query without any additional effort for developers. MySQL HeatWave’s Vector Store goes even further by providing a complete Generative AI and machine learning stack, with embeddings created, stored, and processed directly within the database. On top of this foundation, Oracle provides high-level developer features like generating SQL statements or even entire applications with the APEX platform using natural language prompts. But more importantly for our topic at hand, you can now have an entire stack for implementing RAG-powered business applications, while ensuring that your sensitive data is always protected with multiple layers of database security controls.
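To give a flavor of what this could look like for a developer, here is a hypothetical sketch of a single SQL query that combines an ordinary relational filter with a vector similarity search, issued through the python-oracledb driver. The table, columns, credentials, and the tiny query vector are all made up, and the exact vector syntax and bind formats may differ across database and driver versions.

```python
# Illustrative sketch only: relational filter + vector search in one query.
import array
import oracledb  # python-oracledb driver

connection = oracledb.connect(user="appuser", password="...",
                              dsn="dbhost/freepdb1")
cursor = connection.cursor()

# In reality this embedding comes from your model and is much longer
query_embedding = array.array("f", [0.12, 0.45, 0.33])

cursor.execute(
    """
    SELECT doc_id, title
    FROM documents
    WHERE department = :dept                   -- ordinary relational filter
    ORDER BY VECTOR_DISTANCE(embedding, :vec)  -- vector similarity, same query
    FETCH FIRST 5 ROWS ONLY
    """,
    dept="finance",
    vec=query_embedding,
)
for doc_id, title in cursor:
    print(doc_id, title)
```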
OK, and finally, we can talk a bit about AI agility. Again, I basically made this term up to illustrate a problem that developers of AI-powered applications are already facing, even if not many realize it yet. You might be familiar with the notion of cryptographic agility, which concerns an organization’s ability to quickly replace outdated cryptographic primitives throughout its entire IT infrastructure, should a particular encryption or hashing method be deemed not strong enough – for example, against quantum algorithms. This should be possible without significant downtime, application refactoring, or other business continuity issues.
Working with AI models and with vector data poses the same challenge. Embeddings produced by a certain ML model are completely proprietary and not compatible with any other model. When an organization wants, say, to upgrade its e-commerce recommendation engine with a more modern and sophisticated model that can produce better recommendations, it has to consider the effort of regenerating this vector data for its entire catalog, potentially across multiple data stores, cloud services, and third-party integrations. Exchanging data with suppliers will be challenging as well, since they may be using yet other, equally incompatible embedding solutions.
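A tiny hypothetical example makes the incompatibility tangible: embeddings from two different models are typically not even the same shape, let alone comparable.

```python
import numpy as np

# Hypothetical: two models represent the same document with vectors of
# different dimensions and unrelated geometry.
embedding_model_a = np.random.rand(384)    # e.g., a small sentence model
embedding_model_b = np.random.rand(1536)   # e.g., a large commercial model

try:
    np.dot(embedding_model_a, embedding_model_b)
except ValueError as err:
    print("Cannot compare embeddings across models:", err)

# Even with equal dimensions, the coordinate spaces differ: a distance
# computed between vectors from different models is meaningless. After
# a model upgrade, every stored embedding must be regenerated.
```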
Is there a simple way to address this problem? Perhaps by implementing industry-specific standards? Or by introducing additional layers of abstraction into data and application architectures? To be honest, I have no idea! But perhaps other experts have, and you have an opportunity to meet them at our upcoming cyberevolution conference, which takes place in less than a month in Frankfurt, Germany.