If you have been using Large Language Models (LLMs) since they were first made available, you probably remember the hallucinations they produced. They could switch language mid-sentence, invent numbers or other data, and so on. Using LLMs nowadays feels much safer, as we rarely see this kind of behaviour anymore. LLMs no longer make mistakes that are easy to spot; they have become very good at hiding their shortcomings. To use LLMs efficiently today, you have to make them reliable, so that you don't have to fact-check everything they do and can integrate them into an automated process.
Improving the quality of the predictions of large language models has been a subject of research since their first appearance. As extensive as training can be nowadays, it is impossible to produce a model that knows everything and makes no mistakes. There are several ways to improve the confidence one can have in the answers of a language model. The two best-known solutions are model fine-tuning and retrieval augmented generation (RAG). Both rely on additional data. While fine-tuning incorporates the data directly into the model, RAG provides the right data at inference time to improve the quality of the answer produced by the language model. As always when dealing with data and machine learning, the saying "garbage in, garbage out" holds, whichever solution you choose for your problem. Let's look at each solution in a bit more detail before answering which one you should choose.
Fine-tuning a large language model
Fine-tuning is the process of continuing the training of a large language model to improve its knowledge in specific areas while benefiting from its original training, knowledge and reasoning. You can fine-tune a model to give it knowledge you think it lacks, for example on medical or financial data. Fine-tuning can also help your model get better at specific tasks like classification or Q&A. With fine-tuning you can also shape the format of the model's answers, for example if you only want bullet points or some other specific formatting. It can even be used to modify the alignment of the language model if the one it has doesn't suit you.
Fine-tuning a large language model is not an easy task, even if it remains simpler than training a model from scratch. Fine-tuning still consists of training a model, even if you only train the last layers and do not modify all the weights. It therefore comes with all the risks you face when training a model from scratch, such as overfitting: if your dataset is too small, the language model may memorize the dataset instead of learning from the examples, and thus lose its ability to generalize to unseen examples. Fine-tuning also comes with risks of its own, like catastrophic forgetting, where the model loses some of its original capacities. This can happen when the fine-tuning is too aggressive and modifies the weights too much.
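The idea of training only part of the weights can be illustrated with a toy sketch. This is plain Python and nothing LLM-specific: the linear model and its parameters are stand-ins invented for illustration, with a "pre-trained" parameter kept frozen while only the "last layer" parameter is updated.

```python
# Toy illustration of partial fine-tuning (not a real LLM):
# in y = a * x + b, the "pre-trained" parameter `a` is frozen
# and only the "last layer" parameter `b` is trained.

def fine_tune_last_layer(data, a=2.0, b=0.0, lr=0.1, epochs=100):
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to `b` only.
        grad_b = sum(2 * ((a * x + b) - y) for x, y in data) / len(data)
        b -= lr * grad_b  # trainable parameter
        # `a` is never updated: it keeps its pre-trained value.
    return a, b

# Data generated by y = 2x + 3: the frozen `a` already matches,
# so adjusting `b` alone is enough to fit the new dataset.
data = [(x, 2 * x + 3) for x in range(5)]
a, b = fine_tune_last_layer(data)
print(a, round(b, 3))  # a stays 2.0, b converges to 3.0
```

Real fine-tuning works on billions of weights with techniques such as freezing layers or low-rank adapters, but the principle is the same: the fewer weights you touch, the lower the risk of catastrophic forgetting.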
Fine-tuning is truly a powerful process, as it can substantially modify the pre-trained model, but it is not a fast one. Unless you are using a very small model, the fine-tuning process can easily take days to complete.
While really powerful, fine-tuning is not an easy task and it comes with many risks. That is why, when the objective is only to bring extra knowledge to the generation phase, people tend to turn to retrieval augmented generation.
Retrieval Augmented Generation or RAG
RAG is a powerful technique to enhance the reliability of large language models. Providing accurate context right before the LLM answers drastically reduces its hallucinations and errors.
The idea behind RAG is to provide the language model with at least one database that it can, or must, query before answering. You can either provide one big database that covers everything the model should know, or separate the knowledge into smaller domain-specific databases to speed up information retrieval.
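Splitting knowledge into domain-specific stores implies routing each query to the right one before retrieval. A minimal sketch of such a router follows; the domains and keyword lists are invented for illustration, and a real system would more likely use an embedding classifier or the LLM itself to route.

```python
import re

# Hypothetical keyword router: the domains and keyword lists below are
# made up for illustration, not taken from any real system.
DOMAIN_KEYWORDS = {
    "medical": {"patient", "diagnosis", "treatment", "symptom"},
    "financial": {"revenue", "invoice", "interest", "portfolio"},
}

def route_query(query, default="general"):
    """Pick the domain-specific database a query should be sent to."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords:
            return domain
    return default

print(route_query("What treatment does the patient need?"))  # medical
print(route_query("Show me the quarterly revenue"))          # financial
```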
At first, RAG was done using vector databases: you store all the information you want the model to use and make it accessible through embeddings, so that a vector computed from the query retrieves relevant context for the answer. Usually, the information we want to store is contained in documents, and you have to split it into small chunks of text before computing the embeddings. Many tools can do that for you, so setting up a RAG pipeline is very easy.
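The whole chunk-embed-retrieve loop fits in a few lines. The sketch below substitutes a bag-of-words count for a learned embedding model so it stays self-contained; the sample document and query are made up, and a real pipeline would use a proper embedding model and a vector database.

```python
import math
import re

def chunk(text, size=8):
    """Split a document into chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy bag-of-words 'embedding'; a real setup would use a model."""
    counts = {}
    for w in re.findall(r"[a-z]+", text.lower()):
        counts[w] = counts.get(w, 0) + 1
    return counts

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks closest to the query, to use as context."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

doc = ("The aspirin dosage for adults is 500 mg. "
       "Our quarterly revenue grew by ten percent. "
       "Knowledge graphs store entities and relations.")
chunks = chunk(doc)
print(retrieve("What is the aspirin dosage?", chunks))
```

The retrieved chunk is then prepended to the prompt, so the model answers from the provided text rather than from memory.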
Classic RAG has since evolved into graph-RAG, where a knowledge graph is used as the database. Using a knowledge graph to retrieve information is a great help for large language models: the information is no longer available as chunks of text but centralised around entities and their relations, which helps the language model provide more complete answers. Querying the graph for an entity and getting its whole neighbourhood one or two steps away helps the language model tremendously in building an answer. There are several ways to query the graph: you can use embeddings on the graph entities to first find the relevant entities and then build a graph query, or query the graph directly. Modern LLMs know many graph query languages, such as Cypher or nGQL. Even if you use an exotic language, you can always teach the language model how to build queries.
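A one-or-two-hop neighbourhood query is easy to sketch on a tiny in-memory graph. The entities and relations below are made up for illustration; a real deployment would use a graph database and a query language like Cypher instead of a Python dictionary.

```python
# Minimal in-memory knowledge graph stored as subject -> [(relation, object)].
# Against a real graph database you would send a query instead, e.g. in Cypher:
#   MATCH (e {name: $name})-[r]->(n) RETURN type(r), n.name
GRAPH = {
    "Aspirin": [("treats", "Headache"), ("has_dosage", "500 mg")],
    "Headache": [("symptom_of", "Migraine")],
}

def neighbourhood(entity, hops=2):
    """Collect (subject, relation, object) triples up to `hops` steps away,
    following outgoing relations only."""
    triples, frontier = set(), {entity}
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for relation, target in GRAPH.get(node, []):
                triples.add((node, relation, target))
                next_frontier.add(target)
        frontier = next_frontier
    return triples

# Context an LLM could receive before answering a question about Aspirin:
print(sorted(neighbourhood("Aspirin")))
```

Serialized as triples, this neighbourhood gives the model related facts (here, that headaches can be a symptom of migraine) that a plain chunk lookup around "Aspirin" would likely miss.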
Using a knowledge graph is great, but building one is not so fun. Thankfully, language models can help with that: they are very good at extracting entities and their relations from a text. And, as said previously, they know how to build graph queries; therefore they can build and update the knowledge graph themselves.
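In practice the extraction step boils down to prompting the model for triples in a fixed format and parsing its reply. A sketch follows; the pipe-separated line format and the sample reply are assumptions chosen for this example, not a standard.

```python
# Parse an LLM reply into triples for the knowledge graph. The
# "subject | relation | object" line format is an assumption we would
# impose through the extraction prompt, not a standard.
def parse_triples(llm_reply):
    triples = []
    for line in llm_reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # skip malformed lines
            triples.append(tuple(parts))
    return triples

# A reply the model might produce for a short medical text:
reply = """Aspirin | treats | Headache
Aspirin | has_dosage | 500 mg
(no other relations found)"""
print(parse_triples(reply))
```

Validating the model's output like this, rather than letting it write to the graph directly, keeps malformed extractions out of the database.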
RAG, whether based on a knowledge graph or not, is truly useful for large language models. It reduces the hallucinations the model might suffer from and helps improve the trust the user has in the model. Maintaining and updating a database is much easier than fine-tuning a model: you can easily add new information or remove erroneous or obsolete information. It is a much easier and safer way to keep your language model up to date.
In the end, fine-tuning or RAG?
As we have discussed in this article, it is not that difficult a question to answer. Fine-tuning and RAG do not really improve large language models in the same way and do not bring the same benefits. The questions you should ask yourself are "Is fine-tuning right for me?" and "Is RAG right for me?" The answers could be any combination of yes and no: some problems might require fine-tuning, others RAG, some both and some neither. With a good understanding of these techniques and what they have to offer, you will be able to decide which is right for you given your problem, the data you have, your budget, etc.
Large language models are not to be trusted when used directly; it is always a good idea to bring in more data if you need them to be accurate and not just reformulate what you type in. Retrieval augmented generation is a good addition to any LLM. It can be done cheaply, or at least not that expensively compared to other techniques, and you can be sure that the right information will be in context before the model writes its answer.
Fine-tuning is a much more expensive and complicated technique, but it can do much more than simple RAG. Fine-tuning can of course be used to add knowledge to a language model, but it can also alter the way the model answers questions, for example by building sentiment analysis directly into the model or permanently changing its tone of voice. Also, using a fine-tuned model speeds up inference, since the model won't have to build a query and get an answer from a database.