AI Large Language Models
Large Language Models (LLMs) are digital tools trained to understand and generate human language, widely used for tasks like answering questions, assisting with writing, and powering chatbot interactions.
Introduction
Large Language Models (LLMs) can be visualised as digital brains trained to understand and utilise human language. By exposing them to vast amounts of textual data from books, articles, websites, and more, they learn about words and their contexts. After rigorous training and optimisation, these models become adept at understanding and generating language, allowing users to interact with them for a variety of tasks, from answering questions to generating stories.
Training an LLM (Large Language Model)
One way to think of an LLM is as a digital brain that you're teaching to understand and use human language.
- First, you expose the LLM to a massive collection of books, articles, websites, and more. It's like feeding it all the knowledge from the internet. The more data it sees, the more it learns about words and their contexts.
- Once the LLM has gone through all the data, you test it. You pose questions, ask it to complete sentences, or even write short essays. Every time the LLM gets something wrong, you provide the right answer. This is its training/learning phase.
- Inside the LLM's structure (which consists of a lot of interconnected points, like a digital web), it makes tiny adjustments every time it makes an error. Over time, and with enough corrections, it gets better at predicting the right words to use. (Model Optimisation; a rough sketch of this adjustment step appears after this list.)
- After processing tons of data and undergoing numerous tests, the LLM becomes proficient at understanding and generating language. Now, when you interact with it, it can respond accurately and informatively.
- With the trained LLM, you can ask it questions about what it's learned, get writing suggestions, generate stories, or even engage in a chat!
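To make the "tiny adjustments" idea above concrete, here is a minimal sketch (not a real LLM) of a single training step in PyTorch: the model guesses the next word, a loss function measures how wrong the guess was, and an optimiser nudges the weights to reduce that error. The tiny vocabulary, model, and training pair are invented purely for illustration.

```python
# Toy illustration of the adjustment step; not a real LLM.
import torch
import torch.nn as nn

vocab = ["<pad>", "the", "cat", "climbed", "up", "tree"]  # invented toy vocabulary
stoi = {w: i for i, w in enumerate(vocab)}

class TinyNextWordModel(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # token -> vector
        self.out = nn.Linear(dim, vocab_size)        # vector -> a score per word

    def forward(self, token_ids):
        return self.out(self.embed(token_ids))       # raw scores ("logits")

model = TinyNextWordModel(len(vocab))
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Training pair: given "up", the correct next word is "tree".
x = torch.tensor([stoi["up"]])
y = torch.tensor([stoi["tree"]])

for step in range(100):
    logits = model(x)            # the model's current guesses
    loss = loss_fn(logits, y)    # how wrong was it?
    optimiser.zero_grad()
    loss.backward()              # work out which weights to adjust
    optimiser.step()             # make the tiny adjustments
```

Repeated over billions of such examples, these small corrections are what gradually turn the web of interconnected points into something that predicts language well.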
How an LLM Works with Probability
The LLM is like a super-powered predictive text tool. It uses the patterns it learned from vast amounts of text data to assign probabilities to words and then selects the most likely words to generate meaningful completions.
- At its core, an LLM tries to predict the next word in a sentence based on the words that came before it. For example, if you type "The cat climbed up the...", the LLM might think "tree" is a likely next word.
- The LLM assigns a probability score to every possible word it knows based on the context provided. In the example above, it might assign a high probability to "tree", a lower one to "roof", and an even lower one to "airplane".
- The LLM will typically choose the word with the highest probability as its next word in the completion. So, it's more likely to complete the sentence as "The cat climbed up the tree" than "The cat climbed up the airplane". (A small sketch of this scoring appears after this list.)
- This doesn't stop with just one word. Once it picks the next word, it continues the process to predict the word after that, and the one after that, and so on, until it forms a complete sentence or paragraph.
- The LLM knows these probabilities because of its training: it has seen billions of sentences during its training phase, so it has learned the typical patterns of how words follow one another in English (or any other language it's trained on).
- The LLM doesn't just rely on fixed patterns; it's very adaptable. If you give it a unique or specific context, it will adjust its predictions accordingly. For instance, if you mention "airplane" earlier in your text, the LLM might then think "The cat climbed up the airplane" is a more probable completion!
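Here is a minimal sketch of how raw scores might be turned into probabilities over candidate next words. The candidate words and scores are invented for illustration; a real model scores every token in a vocabulary of tens of thousands.

```python
import numpy as np

# Hypothetical raw scores for candidate next words after "The cat climbed up the..."
candidates = ["tree", "roof", "airplane"]
scores = np.array([4.0, 2.5, 0.5])

# Softmax turns raw scores into probabilities that sum to 1.
probs = np.exp(scores) / np.exp(scores).sum()
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.2f}")   # "tree" gets the highest probability

# The most likely completion:
print("The cat climbed up the", candidates[int(np.argmax(probs))])
```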
Delving Deeper: LLM's Inner Workings
An LLM doesn't directly understand words as we do. Instead, it breaks down input text into chunks called "tokens". These tokens, which can represent a word or part of a word, are then converted into numbers. These numbers form vectors, which are like a list of values that the model can process. It's essential to keep the order of these tokens because the sequence gives meaning to a sentence.
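A minimal sketch of that text-to-tokens-to-vectors idea, assuming a toy whitespace tokeniser and a random embedding table (real LLMs use learned subword tokenisers and learned embeddings):

```python
import numpy as np

# Toy tokeniser: split on spaces. Real LLMs split into subword pieces,
# but the idea of "text -> tokens -> numbers -> vectors" is the same.
sentence = "The cat climbed up the tree"
tokens = sentence.lower().split()                       # ["the", "cat", "climbed", ...]
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]              # tokens as numbers

# An embedding table: one vector (here 4 values) per token id.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))
vectors = embedding_table[token_ids]                    # order of tokens is preserved

print(token_ids)
print(vectors.shape)   # (6, 4): six tokens, each represented by 4 numbers
```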
Once tokenised, the LLM gauges the significance of each token in relation to others and applies a self-attention weighting to each token. Imagine reading a sentence and underlining important words that give the sentence its main meaning. Self-attention is the model's way of doing this, deciding which words (or tokens) should get more focus.
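Here is a small sketch of the core self-attention calculation (scaled dot-product attention) on toy vectors. A real model would derive separate query, key, and value projections from learned weights; here the raw inputs are reused to keep the sketch short.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return np.exp(x) / np.exp(x).sum(axis=axis, keepdims=True)

# Toy input: 6 tokens, each already turned into a 4-dimensional vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# In a real model Q, K and V come from learned projections of X.
Q, K, V = X, X, X
scores = Q @ K.T / np.sqrt(Q.shape[-1])   # how relevant each token is to every other token
weights = softmax(scores, axis=-1)        # the "underlining": each row sums to 1
attended = weights @ V                    # each token becomes a weighted mix of the others

print(weights.round(2))
```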
The LLM doesn't just pick the most probable word every time. While it can do that (known as "greedy decoding"), sometimes it selects words based on a mix of probabilities, called "random sampling". It's like sometimes following the most popular choice, and at other times trying something new.
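A small sketch contrasting the two strategies, with invented probabilities:

```python
import numpy as np

candidates = ["tree", "roof", "airplane"]
probs = np.array([0.80, 0.15, 0.05])   # invented probabilities for illustration

# Greedy decoding: always take the single most probable word.
greedy_choice = candidates[int(np.argmax(probs))]

# Random sampling: draw a word according to its probability,
# so "roof" is picked roughly 15% of the time.
rng = np.random.default_rng()
sampled_choice = rng.choice(candidates, p=probs)

print(greedy_choice, sampled_choice)
```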
The "temperature" is a setting the LLM uses. High temperatures make the LLM more creative but possibly less accurate, while lower temperatures make its responses more focused and predictable.
Some models, like BERT, are "bidirectional", meaning they consider words before and after a point to make predictions. Others only look in one direction, either forward or backward. This directionality affects how the model understands and generates content.
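As a rough illustration of bidirectionality, the sketch below assumes the Hugging Face transformers library is installed (and will download a BERT model on first run). BERT fills in a masked word using the context on both sides of the gap:

```python
# Requires: pip install transformers (plus a backend such as PyTorch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses the words on BOTH sides of [MASK] to rank its guesses.
for prediction in fill_mask("The cat climbed up the [MASK] to escape the dog."):
    print(prediction["token_str"], round(prediction["score"], 3))
```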
The LLM's knowledge comes from reading vast amounts of text. This extensive reading helps it learn the context, how words relate, and the nuances of language. When you give it a prompt, it recalls all its learning, breaks down the prompt, and crafts a response that aligns with the patterns it has observed in its training.
Examples of LLMs
Here are a few of our favourite LLMs at the moment. New ones are popping up all the time!
- Anthropic: Claude
- OpenAI: GPT-4
- Inflection AI: Pi
- Google: BERT (Bidirectional Encoder Representations from Transformers)
- Hugging Face: DistilBERT (a distilled version of BERT), CamemBERT (a model for the French language)
Conclusion
LLMs have revolutionised the way we interact with machines, offering a deeper understanding and generation of human language. Their intricate inner workings, based on tokenisation, probability, and attention mechanisms, enable them to produce coherent and contextually relevant outputs. As technology advances, we can expect even more sophisticated and versatile LLMs to emerge, further bridging the gap between human and machine communication.
Image generated using OpenAI's DALL·E.