Large language models appeared a couple of years ago and proved a real advance in artificial intelligence and in the way we interact with it. But one should keep in mind what large language models are: huge statistical models of language. They have no semantic knowledge or understanding of what they write; they predict a sequence of tokens given some context. They are now trained on humongous datasets, and their predictions are getting so good that one might be tempted to think they know what they are talking about. They do not hallucinate as much as they used to, but mostly they have just gotten better at hiding their mistakes.
A lot of attention has been given to making LLMs trustworthy. It is very important to provide large language models with the facts you want them to use, and this need gave birth to retrieval-augmented generation, or RAG. With RAG, when you ask an LLM something, it first tries to retrieve some context from a dedicated source created by the user. The idea is to give the LLM more precise and relevant context to use for its answer. The first RAG attempts used vector databases, and they helped large language models provide more trustworthy answers. Knowledge graphs were then added to the retrieval process and proved very useful at this task.
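The retrieval step can be sketched in a few lines. This is a toy illustration, not a production RAG pipeline: the `embed` function below is a crude bag-of-words stand-in for a real embedding model, and the document list is invented for the example.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a learned model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A user-curated source of facts (hypothetical example documents).
documents = [
    "The Going Merry is a ship crewed by pirates.",
    "Knowledge graphs store entities and relationships.",
]

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question):
    """Prepend the retrieved context to the question before calling the LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The prompt produced by `build_prompt` is what would actually be sent to the model, grounding its answer in the retrieved facts.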
Knowledge graphs are data structures that represent information in the form of graphs, where nodes symbolize entities or concepts, and edges represent the relationships between these entities. They enable the modeling and organization of knowledge in a structured manner, thereby facilitating access to and understanding of information. Since the database incorporates a lot of semantics directly into the data, it helps the language model give more precise and relevant answers.
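One minimal way to picture this structure is as a set of (subject, relation, object) triples, which is how many graph stores model their data. The example entities are illustrative only:

```python
# A knowledge graph stored as (subject, relation, object) triples:
# subjects/objects are the nodes, relations are the labeled edges.
triples = {
    ("Luffy", "captain_of", "Going Merry"),
    ("Zoro", "crew_member_of", "Going Merry"),
    ("Going Merry", "type", "Ship"),
}

def facts_about(graph, entity):
    """Every triple that mentions the entity, whatever the relation."""
    return {t for t in graph if entity in (t[0], t[2])}
```

Because the relation is stored alongside the data, `facts_about` returns edges whose meaning the language model can read directly, rather than opaque foreign keys.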
Modern LLMs know knowledge-graph query languages and can query graphs to get context. The advantage of knowledge graphs over traditional databases is that when you request an entity in the graph, you can also request its neighborhood without specifying the nature of the relationships, which gives you more complete and diverse information. Now that language models can handle huge contexts before answering, it is possible to request not only the immediate neighbors of an entity but an extensive neighborhood, which also brings into context relations between entities other than the one requested. This can help the language model fake a broader understanding of the question and give a better answer to the user.
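The idea of pulling in an "extensive neighborhood" can be sketched as a breadth-first expansion over triples, following edges of any relation type. This is a minimal illustration, assuming the triple representation described above:

```python
def k_hop_neighborhood(graph, start, k):
    """Collect all triples reachable within k hops of `start`,
    following edges regardless of their relation type."""
    frontier = {start}
    seen = {start}
    collected = set()
    for _ in range(k):
        next_frontier = set()
        for s, r, o in graph:
            if s in frontier or o in frontier:
                collected.add((s, r, o))
                next_frontier |= {s, o} - seen
        seen |= next_frontier
        frontier = next_frontier
    return collected
```

With k=1 you get the immediate neighbors; raising k pulls in relations between entities the user never asked about, which is exactly the extra context a large context window can now absorb.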
Just like public ones such as the Google Knowledge Graph, knowledge graphs can aim at general knowledge or specialize in one topic. They are databases after all, and it only depends on what we want to do with them. When building a knowledge graph to work with large language models, you should only put verified information into it, so the language model won't give erroneous answers. If you want your KG to be exhaustive, the task might prove too hard: information is not scarce anymore, and verifying everything is a Herculean task. You may instead want to focus your knowledge graph on areas where you have real expertise or access to truthful sources of data.
Once you have assembled a corpus of data, transforming it into a knowledge graph is not a straightforward process. You must extract the entities and the relationships from your corpus and insert them into the knowledge graph. Here we are in luck, since large language models are very good at extracting entities from documents. Just as large language models can query knowledge graphs to get information, they are also able to write the requests that build the graph. It is therefore possible to give the corpus of documents to the language model and have it extract the entities and build the knowledge graph. You can specify the schema you want the language model to use, or give it a very simple one with generic entity and relationship types that store the information in attributes. Either way, the language model will be able to extract the information from the knowledge graph when querying it.
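A sketch of this extraction step, under the assumption that the model is asked to reply with JSON triples: the prompt wording and the simulated reply below are invented for illustration, and a real pipeline would send the prompt to whichever LLM API you use.

```python
import json

# Hypothetical extraction prompt; the exact wording is up to you.
EXTRACTION_PROMPT = """Extract (subject, relation, object) triples from the text.
Return a JSON list of 3-element lists.

Text: {text}
"""

def parse_triples(llm_output):
    """Parse the model's JSON reply into graph triples.
    Malformed entries are skipped rather than crashing the pipeline."""
    try:
        raw = json.loads(llm_output)
    except json.JSONDecodeError:
        return set()
    return {tuple(t) for t in raw if isinstance(t, list) and len(t) == 3}

# Simulated model reply (real replies would come from the LLM call).
reply = '[["Luffy", "captain_of", "Going Merry"], ["bad entry"]]'
```

Defensive parsing matters here: model output is text, not guaranteed-valid data, so the loader tolerates malformed entries instead of trusting the reply blindly.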
The synergy here is interesting: large language models can help build the knowledge graphs that will in return help them improve the quality of their answers. By organizing the information themselves and storing it for later retrieval, large language models can drastically reduce their hallucinations and improve the truthfulness of their answers, which is of tremendous importance.
A knowledge graph is rarely static; it evolves with the knowledge we collect and incorporate into it. When you update your graph, you do not want to create duplicate entities or relationships: the goal is the smallest possible graph for a given amount of information. Duplicates can slow retrieval and cause repetition in the language model's answers, which should be avoided at all costs. Duplicates also scatter the information, since you have to fetch the neighborhood of every duplicate to assemble the complete picture.
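One simple way to avoid duplicates at insertion time is to map every entity name to a canonical form before writing the triple. The normalization below (lowercasing and whitespace collapsing) is deliberately minimal; real systems often add alias tables or embedding similarity on top:

```python
def normalize(name):
    """Canonical key used to detect duplicate entities.
    A deliberately simple rule: lowercase, collapse whitespace."""
    return " ".join(name.lower().split())

def insert_triple(graph, canon, triple):
    """Insert a triple, merging entities that normalize to the same key."""
    s, r, o = triple
    s = canon.setdefault(normalize(s), s)
    o = canon.setdefault(normalize(o), o)
    graph.add((s, r, o))

graph, canon = set(), {}
insert_triple(graph, canon, ("Going Merry", "type", "Ship"))
# Differently spelled mention of the same entity merges into one node.
insert_triple(graph, canon, ("going  merry", "crewed_by", "Luffy"))
```

After both inserts the graph holds a single "Going Merry" node with two edges, instead of two near-duplicate nodes each carrying half the information.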
Recently, LLMs have also become very good at finding missing relationships inside knowledge graphs and making those connections, thereby "fixing" the graph. If your corpus was not exhaustive, it might have missed some relationships between its entities. And identifying things you have missed because you don't know them is, it goes without saying, really hard.
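To make the task concrete, here is a toy heuristic for flagging candidate missing links: entities that share several neighbors but are not directly connected. This is not what an LLM does internally; in practice you would hand the local subgraph to the model and ask it to propose the missing relation, but a heuristic like this can shortlist the pairs worth asking about.

```python
from itertools import combinations

def adjacency(graph):
    """Undirected adjacency map built from (s, r, o) triples."""
    adj = {}
    for s, _, o in graph:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj

def suggest_missing_links(graph, min_shared=2):
    """Propose entity pairs that share >= min_shared neighbors
    but have no direct edge between them."""
    adj = adjacency(graph)
    return {
        (a, b)
        for a, b in combinations(sorted(adj), 2)
        if b not in adj[a] and len(adj[a] & adj[b]) >= min_shared
    }
```

Each suggested pair would then be passed to the language model, which decides whether a relation really exists and names it.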
Knowledge graphs are a powerful tool to help LLMs help themselves. The relationship can be viewed as symbiotic: the language model helps build and maintain the knowledge graph, and the graph helps the language model give better answers by making information more easily accessible to it. Knowledge graphs move the semantics from the application to the data itself. Having semantics directly embedded into the information is a formidable opportunity for large language models, as they can now answer with better precision and fewer mistakes. The relationship between knowledge graphs and large language models is mutually beneficial and has made both much more useful: building a knowledge graph from scratch would have been really hard, and no one wants to use a language model that lies constantly.
The interplay between knowledge graphs and large language models (LLMs) represents a significant leap forward in the pursuit of more accurate and informed AI systems. By leveraging the structured, semantic-rich data of knowledge graphs, LLMs can enhance their contextual understanding and reduce errors, leading to more trustworthy responses. Conversely, LLMs can aid in the construction and maintenance of knowledge graphs, ensuring they remain up-to-date and free of duplicates. This symbiotic relationship not only improves the performance of both technologies but also paves the way for more reliable and efficient AI applications. As we continue to explore this synergy, the potential for advancements in AI that benefit from both structured knowledge and advanced language processing is immense.

Knowledge graph of the crew members of the Going Merry