
A History of Semantic Search and LLMs

I recently Googled “why do I want to cry when I do couch stretch?”, and it gave me a detailed explanation about pain associated with tight hip flexor muscles. And then I looked at what I literally searched, and I asked myself “how did they know I was talking about hip flexors… I never even said the word ‘hip flexors’?”.

That magic is called semantic search. It is a way of storing and traversing data to understand the underlying meaning of searches, rather than just matching keywords in a literal sense. When you search for “Bulgarians hurt,” semantic search allows the app to infer that you’re talking about Bulgarian Split Squats, instead of a literal Bulgarian person.

Semantic search’s ability to infer meanings is deeply intertwined with how large language models work under the hood.


The Days of Keyword Search

Back in the day, we only had what was called lexical, or keyword, search. This is classic information retrieval, and it has been around since the 1960s. You could call it ‘dummy search’, but in reality it was a really important creation that laid the groundwork for the modern internet.

It performs exact word matching: searching for “red car” will only find documents containing both the word “red” AND the word “car”… literally. This creates serious problems, because it completely misses synonyms; “crimson automobile” means exactly the same thing, but shares no keywords, so it never matches. Keyword search demonstrates zero understanding of actual meaning beyond surface-level word matching.
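To make that concrete, here is a minimal sketch of lexical matching in Python. The documents and query are made up for illustration:

```python
# A minimal sketch of lexical search: split the query into words and
# keep only documents containing every word. "crimson automobile" never
# matches a "red car" query, because no word overlaps literally.
docs = [
    "a red car parked outside",
    "a crimson automobile parked outside",  # same meaning, zero keyword overlap
]

def keyword_search(query: str, documents: list[str]) -> list[str]:
    terms = query.lower().split()
    return [d for d in documents if all(t in d.lower().split() for t in terms)]

print(keyword_search("red car", docs))  # only the first doc matches
```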

The Beginnings of Semantic Search & Linear Algebra

In the late 1980s, researchers began to come up with a solution to keyword search’s blind spots. With techniques like latent semantic indexing, they began to analyze the meaning of words and concepts by mapping them into numerical data points using linear algebra.

Then by the mid-2010s, smart people extended this concept with models like word2vec (2013), which embed words as vectors whose geometry captures their similarities. Google came out with BERT in 2018, which allowed a search question to be answered contextually: words and phrases were translated into tokens that carried numerical meanings and relationships. Search was getting more and more sophisticated, and Google benefitted heavily from the advancement.
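Here is a toy illustration of the idea, using made-up three-dimensional vectors and cosine similarity to score how close two words are. Real embeddings have hundreds or thousands of dimensions:

```python
import math

# Toy 3-dimensional "embeddings". The numbers are hand-picked for
# illustration; trained models learn these values from data.
vec = {
    "car":        [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana":     [0.00, 0.20, 0.95],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(vec["car"], vec["automobile"]))  # high: near-synonyms sit close together
print(cosine(vec["car"], vec["banana"]))      # low: unrelated concepts sit far apart
```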

Onto Large Language Models

Large language models use tokens embedded as vectors as their fundamental representation of language. Text flows through a pipeline of Text → Tokenization → Embeddings → Transformer layers → Output, and critically, the embeddings that power semantic search are exactly the same type of mathematical representation that LLMs use internally to understand and process language.
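A toy sketch of the first two pipeline stages, with an invented vocabulary and embedding table, might look like this:

```python
# A toy walk-through of Text → Tokenization → Embeddings. The vocabulary
# and 2-d embedding table are made up for illustration; real LLMs learn
# tables with tens of thousands of tokens and thousands of dimensions,
# then push these vectors through transformer layers.
vocab = {"the": 0, "red": 1, "car": 2}
embedding_table = [
    [0.1, 0.3],  # "the"
    [0.8, 0.2],  # "red"
    [0.7, 0.9],  # "car"
]

text = "the red car"
token_ids = [vocab[w] for w in text.split()]          # Text → Tokenization
embeddings = [embedding_table[i] for i in token_ids]  # Tokenization → Embeddings

print(token_ids)   # [0, 1, 2]
print(embeddings)  # these vectors are what the transformer layers consume
```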

What modern large language models do differently is generative pre-training. The models are trained on immense amounts of data and learn linguistic patterns, then probabilistically generate a response to you based on those patterns. It is kind of like how a child learns to write well: they read well-written books, internalize the quality, and hopefully recreate it in their own unique way.
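A rough sketch of what “probabilistically generate” means, with invented probabilities standing in for a trained model’s output:

```python
import random

# After pre-training, a model assigns a probability to each candidate
# next token and samples one. These probabilities are invented for
# illustration; a real model computes them from its learned weights.
next_token_probs = {
    "flexors": 0.6,
    "muscles": 0.3,
    "bananas": 0.1,
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())
print(random.choices(tokens, weights=weights, k=1)[0])  # usually "flexors"
```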

The embeddings directly capture what LLMs learn during training. The vectors store syntactic understanding, like how “runs”, “running”, and “ran” relate to each other, and semantic relationships, like the famous “king” - “man” + “woman” ≈ “queen”. These are the kinds of things that LLMs internalize during their training process.
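Here is the analogy worked out as plain vector arithmetic, with two-dimensional vectors hand-picked so the math lands exactly; real embeddings only approximate this, in far more dimensions:

```python
# The famous analogy as vector arithmetic. These 2-d vectors are made up
# so that king - man + woman comes out exactly at queen.
king  = [0.9, 0.8]
man   = [0.5, 0.1]
woman = [0.5, 0.9]
queen = [0.9, 1.6]

result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)           # [0.9, 1.6]
print(result == queen)  # True — the arithmetic lands on "queen"
```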

And so, the contextual understanding of language became so great that models could actually generate new language on the fly, which is one of the beautiful aspects of large language models.

Vector Databases Made Search Uber Efficient

Since all of this linguistic data is stored as vectors, these new systems needed efficient ways to store it so they could quickly generate responses. As you can imagine, if you shove 114,000,000 King James Bibles’ worth of information into a traditional database and try to query it, it is going to be slow.

Vector databases like Pinecone were built to query these vectors at great speed. They use clever data structures to store the numbers and perform similarity search quickly. This allowed for the scalable, fast large language systems we know today.
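For intuition, here is a brute-force version of the similarity search a vector database accelerates, with made-up vectors and document names. Real systems replace this linear scan with approximate nearest-neighbor indexes such as HNSW:

```python
import math

# Brute-force similarity search: compare the query vector against every
# stored vector and keep the best match. Vector databases exist to avoid
# exactly this linear scan at scale, but it shows what they optimize.
store = {
    "doc_hip_flexors":  [0.90, 0.20, 0.10],
    "doc_split_squat":  [0.80, 0.30, 0.20],
    "doc_banana_bread": [0.10, 0.10, 0.90],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.88, 0.22, 0.12]  # pretend this embeds "couch stretch pain"
best = max(store, key=lambda name: cosine(query, store[name]))
print(best)  # doc_hip_flexors
```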

Semantic Search, We Love You

Semantic search is a part of a long lineage of core information retrieval tech. Without it, we wouldn’t have been able to create modern AI systems. So kudos to those that contributed. If we have seen further, it is by standing on the shoulders of giants.

