What is RAG, and Why is KIP decentralizing it?

HtJi...burz

19 Feb 2024

KIP Explainer Series: #2

TLDR: RAG is an innovative technique used in Generative AI, involving 3 key Value Creators in AI (app owners, model owners, data owners).

KIP's successful decentralisation of RAG essentially gives a framework for the decentralisation of all of AI, and is a necessary first step to fight encroaching AI Monopolies.

RAG IN A NUTSHELL

AI models are trained by feeding in data. They learn from the data, adjusting their internal weights to recognize patterns, enabling them to make predictions or decisions based on new data. The model can then answer user queries from its newly-gained "native" knowledge.

But this training process requires the entire dataset to be exposed to the model, and essentially results in the data being 'absorbed' into the model. If the data includes confidential or copyrighted information, there's a risk that the model will spit that information out verbatim at some point in the future.

So what if you don't wish to put your data at risk?

That's where Retrieval-Augmented Generation, or RAG, comes in.

RAG is a sophisticated technique that enables AI models to generate answers they don't natively know, by retrieving data & information from external knowledge bases & databases they are given access to.

It's like an intelligent assistant who does not know the answer to your question, but is able to expertly research to find the answer from external data sources.

1. User Query Input:
The process begins with a user posing a question or query to a chatbot running a RAG system.

For example, "What are the symptoms of COVID-19?"

2. Retrieval from External Databases:
The model initiates the retrieval phase by searching through linked external knowledge bases & databases, such as medical journals, health websites, and clinical databases, to retrieve only relevant chunks of data and info related to the user query.

3. Data Processing, Filtering and Generation:
Retrieved data undergoes processing and filtering to extract key information and eliminate irrelevant data points. The AI model synthesizes the retrieved data with contextual cues from the user query to generate a response.

In the case of the COVID-19 symptoms query, RAG might generate a response listing common symptoms such as fever, cough, and shortness of breath, but also potentially including information from the latest medical research papers that was not available when the model was trained, giving a higher quality response.

4. Response Delivery:
The generated response is presented to the user via the chatbot interface.
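
The four steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not KIP's implementation: the three-sentence knowledge base, the bag-of-words similarity, the 0.2 relevance threshold and the `generate` stub (standing in for a real model call) are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical external knowledge base (assumption for illustration only).
KNOWLEDGE_BASE = [
    "Common symptoms of COVID-19 include fever, cough, and shortness of breath.",
    "Seasonal influenza vaccines are updated every year.",
    "Regular exercise supports cardiovascular health.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2, min_score=0.2):
    """Step 2: return up to top_k chunks, dropping those below min_score."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(d))), d) for d in documents]
    relevant = [(s, d) for s, d in scored if s >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in relevant[:top_k]]


def build_prompt(query, chunks):
    """Step 3: combine the retrieved chunks with the user query."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Placeholder for a real model call (hypothetical stub)."""
    return f"[model response to a prompt of {len(prompt)} characters]"


def answer(query):
    """Steps 1 to 4: query in, retrieve, build prompt, generate, response out."""
    chunks = retrieve(query, KNOWLEDGE_BASE)
    return generate(build_prompt(query, chunks))
```

Calling `answer("What are the symptoms of COVID-19?")` retrieves only the COVID-19 sentence and passes it to the model alongside the question. The model itself is never trained on the knowledge base; it only sees the chunks selected for that one query.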

Thus, RAG allows external data to be used to answer AI queries without needing that data to be "absorbed" first by a model through the training process.

RAG techniques are getting more sophisticated all the time, and in our research paper here, we show that the quality of answers under RAG can exceed that of trained models. https://arxiv.org/pdf/2311.05903.pdf…

IMPORTANCE OF RAG

RAG is going to become increasingly important because:

1. Model training is a highly technical and specialised activity, and often very expensive to do - not everyone will have the necessary skillsets or resources to be able to train models.

2. There is a lot of data (confidential, proprietary etc.) that its owners may never feel comfortable exposing fully to models they don't fully own or control.

One important point you may also have noticed is:
Under a RAG framework, app owners, model owners and data owners work together and each contribute to the answering of user queries.

Thus, in an equitable state of affairs, each party should be fairly compensated for their contributions.

But there is currently no easy way to do this without compromising each party's independence or ownership rights. (Incidentally, this problem is exactly what prompted us to start building KIP, more than a year ago.)

This is the "money problem".

"THE MONEY PROBLEM" WITH RAG & CENTRALISED AI

Let's imagine a situation where one entity owns all three levers of AI value creation: there's no need to split payments collected from the users between the parties, as it's just internal accounting.

But the flipside of that is: if we are not ok with ONE ENTITY OWNING ALL 3 LEVERS OF AI VALUE CREATION (apps, models, data), we must solve the issue of how to split money between the different groups of AI Value Creators.

Without solving "The Money Problem", app owners, model owners and data owners cannot each maintain their independence and freedom to trade.

And a monopoly is already forming right now.

Here's our opinion on how the monopolistic battleplan of OpenAI will work:

- OpenAI obviously has some of the most powerful models - closed-source models like GPT-4, which were trained using our collective knowledge as published and scraped from the open internet over many years. These power their apps like ChatGPT, and the user-made GPTs.

- Via their Copyright Shield - that is, their commitment to defend customers and cover the costs if they face copyright infringement claims - they embolden and encourage their users to upload data to their closed platform without fearing legal consequences.

- Given that OpenAI is a centralised, closed-source web2 platform, we should ask ourselves: does the data uploaded by users - whether to ChatGPT or the GPTs apps - still belong to the uploaders?

- So with their existing models, unapologetic scraping of any and all data, Copyright Shield, and their huge war-chest, you have probably the most voracious data vacuum cleaner ever created, sucking in data and resources to feed their models.

Put all the above together (and their 7 tttttrillion dollar raise for hardware) and it's not difficult to see that total monopolisation of AI development by one or a few companies will be inevitable, unless something is done.

For reasons we've already shared, we passionately believe that AI monopolisation is bad for humanity, and are actively fighting against it.

THE SIGNIFICANCE OF DECENTRALISING RAG

RAG involves all 3 core levers of AI value creation (apps, models, data).

Thus, by building a framework for decentralising RAG, KIP essentially builds a framework for decentralising control over value creation in AI, giving all value creators a level playing field on which to fight AI monopolies.

We allow AI to function efficiently as a collaborative effort involving millions of small- and large-scale creators, without the need for one huge company to coordinate each of the core functions.

We will do that by first solving 3 base level problems that have been a barrier to the decentralisation of RAG:

1. Ownership: Ensuring that app owners, model owners and data owners can publish easily and securely to web3, creating their web3 "trading entity" in the form of ERC-3525 Semi-Fungible Tokens, thus enabling them to prove their digital property rights on chain.

2. Connectivity: Ensuring smooth off-chain and on-chain interactions, providing an open environment for app owners, model owners and data owners to connect to each other easily and freely.

3. Monetisation: Providing a common framework for recording & accounting for the contributions of each AI Value Creator, as well as automated revenue sharing and withdrawals.
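
To make the monetisation point more concrete, here is a minimal in-memory sketch of recording contributions per query and splitting the fee between the three parties. It is not KIP's on-chain mechanism: the `Ledger` class, the owner names and the 40/30/30 split are all hypothetical, chosen only to show the accounting idea.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical revenue split; the actual percentages are not given in this article.
SPLIT = {"app": Decimal("0.40"), "model": Decimal("0.30"), "data": Decimal("0.30")}


class Ledger:
    """Records each party's share of every paid query and handles withdrawals."""

    def __init__(self, split):
        if sum(split.values()) != Decimal("1"):
            raise ValueError("split must add up to 1")
        self.split = split
        self.balances = defaultdict(Decimal)  # owner -> accrued, unwithdrawn balance

    def record_query(self, fee, app_owner, model_owner, data_owner):
        """Credit each contributor with their share of one query's fee."""
        owners = {"app": app_owner, "model": model_owner, "data": data_owner}
        for role, owner in owners.items():
            self.balances[owner] += fee * self.split[role]

    def withdraw(self, owner):
        """Pay out and reset the owner's balance; returns the amount withdrawn."""
        return self.balances.pop(owner, Decimal("0"))


ledger = Ledger(SPLIT)
ledger.record_query(Decimal("1.00"), "alice_app", "bob_model", "carol_data")
ledger.record_query(Decimal("2.00"), "alice_app", "bob_model", "dave_data")
```

After these two queries, the app owner and model owner have each accrued a share of both fees, while each data owner is credited only for the query their data helped answer. No party needs to own or control the others for the accounting to work, which is the property the three points above are aiming for.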

By bringing about decentralised RAG (d/RAG), KIP is crafting the first crucial blueprint for fighting AI Monopolies.

Unlocking digital property rights for each AI value creator, and empowering each to transact while remaining independent, is the exact opposite of what Big Tech is trying to achieve.

KIP Protocol arms AI Value Creators with the weapons necessary to fight the monopolists in AI.