Leveraging Existing Foundation Models for AI Applications
Pretrained LLMs can be leveraged for your AI application
If you want to build an application that can understand and generate natural language, thankfully, you don't have to start from scratch! There are many existing foundation models that you can leverage as a starting point.
Foundation Models
Foundation models are pretrained large language models (LLMs) that have already learned deep representations of language from vast amounts of text data. Many open-source models are available for members of the AI community to use. Frameworks like Hugging Face and PyTorch provide hubs to browse these models, along with model cards describing intended use cases, training data, and limitations.
The exact model you choose depends on the details of your task. Different transformer model architectures suit different language tasks, largely due to variations in pretraining (a short code sketch follows this list):
- Encoder-only models like BERT excel at sentence classification tasks.
- Decoder-only models like GPT are great at text generation.
- Encoder-decoder models like T5 work well for translation and summarization.
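As a concrete illustration, each of these three families maps to a different model class in the Hugging Face transformers library. The model IDs below are just common public checkpoints used as examples; a minimal sketch might look like this:

```python
from transformers import (
    AutoModelForSequenceClassification,  # heads for encoder-only models such as BERT
    AutoModelForCausalLM,                # decoder-only generators such as GPT-2
    AutoModelForSeq2SeqLM,               # encoder-decoder models such as T5
)

# Encoder-only: sentence-level classification (e.g. sentiment with two labels)
classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Decoder-only: open-ended text generation
generator = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder: translation and summarization
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```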
Here are some key factors to consider when selecting a foundation model:
- Model architecture: Is an encoder-only, decoder-only, or encoder-decoder model best suited for your task?
- Model size: Bigger models are generally more capable but more expensive to run. Start smaller and scale up if needed.
- Intended use case: Refer to model cards and choose a model suited for your needs.
- Compute requirements: Larger models require more compute power. Ensure your hardware can support it (a rough footprint estimate is sketched below).
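One quick way to sanity-check the size and compute factors is to load a candidate model once and estimate its memory footprint from the parameter count. This is a rough sketch; the model ID is only an example, and real requirements also include activations and, for training, optimizer state:

```python
from transformers import AutoModel

# Load a candidate model once to inspect its size (model ID is just an example)
model = AutoModel.from_pretrained("bert-base-uncased")

num_params = model.num_parameters()
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    weight_gb = num_params * nbytes / 1024**3
    print(f"{dtype}: ~{weight_gb:.2f} GB for the weights alone "
          f"({num_params / 1e6:.0f}M parameters)")
```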
By selecting the right existing foundation model as a starting point, you can build an effective natural language application without starting from scratch. Leveraging these pretrained models allows you to tap into vast learned knowledge.
Pre-training Foundation Models
Pre-training is the phase in which LLMs develop their linguistic capabilities. During pre-training the model learns from massive amounts of text and encodes a deep statistical representation of language.
The pre-training data can be hundreds of gigabytes to multiple petabytes of text. This data is pulled from diverse sources, including web scrapes and curated text corpora specifically assembled for training language models.
In this self-supervised learning phase, the model internalizes the linguistic patterns and structures present in the language data. The model then uses these learned patterns to complete its training objective, which depends on the model architecture.
The model weights are updated during pre-training to minimize the loss function for the training objective. Pre-training LLMs requires substantial compute resources, including leveraging GPUs for faster training.
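To make this concrete, here is a minimal sketch of a single self-supervised update step for a decoder-only model, where the causal language modeling objective is next-token prediction and the labels are simply the input token IDs. The model, optimizer, and learning rate are illustrative choices:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(
    ["The quick brown fox jumps over the lazy dog."],
    return_tensors="pt",
)

# For causal language modeling, transformers computes the next-token
# prediction loss internally when labels are set to the input IDs.
outputs = model(**batch, labels=batch["input_ids"])
loss = outputs.loss

loss.backward()        # gradients of the loss with respect to the weights
optimizer.step()       # update the weights to reduce the loss
optimizer.zero_grad()
```

In real pre-training this step runs billions of times over sharded data across many GPUs, but each weight update has the same shape.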
When scraping data from public online sources, the data must be carefully processed to increase quality, address bias, and remove harmful content. Typically only 1-3% of collected tokens end up being used for pre-training after this curation process.
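The curation itself is typically a stack of simple heuristics and de-duplication passes applied before tokenization. The thresholds below are illustrative placeholders, not values from any particular production pipeline:

```python
import hashlib

def keep_document(text: str, seen_hashes: set) -> bool:
    """Toy quality filter: drop very short, highly repetitive, or duplicate documents."""
    words = text.split()
    if len(words) < 50:                      # too short to be useful (illustrative threshold)
        return False
    if len(set(words)) / len(words) < 0.3:   # highly repetitive text (illustrative threshold)
        return False
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen_hashes:                # exact-duplicate removal
        return False
    seen_hashes.add(digest)
    return True

# Applied over a scraped corpus:
#   seen = set()
#   cleaned = [doc for doc in scraped_documents if keep_document(doc, seen)]
```

Real pipelines layer language identification, toxicity filtering, and fuzzy de-duplication on top of heuristics like these.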
Instruction Tuning Foundation Models
Instruction tuning is a method used in natural language processing (NLP) where a model is fine-tuned on examples phrased as natural-language instructions rather than on a separate, task-specific dataset for each task. The model is given input-output examples across various tasks, which lets it perform tasks it hasn't been directly trained on; all that's needed are prompts for those tasks. This method is especially useful when large datasets aren't available for certain tasks.
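In practice, this means converting each labeled example into an instruction-style prompt and a target text before fine-tuning. The template wording below is an illustrative example rather than a canonical format:

```python
def to_instruction_example(instruction: str, input_text: str, target_text: str) -> dict:
    """Turn a supervised example into a (prompt, target) pair for instruction tuning."""
    prompt = f"{instruction}\n\n{input_text}"
    return {"prompt": prompt, "target": target_text}

example = to_instruction_example(
    instruction="Classify the sentiment of the following review as positive or negative.",
    input_text="The battery life on this laptop is fantastic.",
    target_text="positive",
)
print(example["prompt"])
```

During instruction tuning, many tasks formatted this way are mixed together so the model learns to follow the instruction itself rather than any single task.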
Since 2020, there has been a lot of research on instruction tuning, leading to a variety of tasks, templates, and methods. One significant line of work is "Finetuned Language Net," or Flan for short. Flan combines much of this research into the "Flan Collection" of instruction-tuning datasets and templates, which is used to train models that can follow a broad range of instructions. Flan models are notable because they perform competitively with models fine-tuned for specific tasks, and they can also handle instructions they weren't specifically trained on.
Flan-T5 is a family of instruction-tuned models that can perform zero-shot and few-shot in-context learning tasks, such as text summarization, natural language inference, question answering, sentence and sentiment classification, translation, and common-sense reasoning.
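For example, a smaller member of the family such as Flan-T5 base can be prompted zero-shot through the transformers library; the prompt below is just an illustrative translation request:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-base"   # a small variant, convenient for experimentation
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Translate English to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```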
The individual Flan-T5 models include:
- Flan-T5 XXL – The full model, loaded in single-precision floating-point format (FP32)
- Flan-T5 XXL FP16 – The full model, loaded in half-precision floating-point format (FP16); uses less GPU memory and performs faster inference than Flan-T5 XXL
- Flan-T5 XXL BNB INT8 – The full model, loaded as an 8-bit quantized version onto the GPU using the accelerate and bitsandbytes libraries; useful for instances with less compute (see the loading sketch below)
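A rough sketch of how these loading options differ in code, using transformers with the accelerate and bitsandbytes libraries installed (in practice you would load only one variant, and exact arguments may vary by library version):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

model_id = "google/flan-t5-xxl"

# Flan-T5 XXL: full single-precision (FP32) weights
model_fp32 = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Flan-T5 XXL FP16: half-precision weights, roughly half the GPU memory
model_fp16 = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Flan-T5 XXL BNB INT8: 8-bit quantized weights via bitsandbytes
model_int8 = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```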