Few-Shot Learning in Generative AI
Introduction
The goal of this article is to offer an introduction to the concept of "few-shot learning" in the context of generative AI and large language models (LLMs). The article uses OpenAI's API for practical examples and showcases, and assumes the reader has surface-level knowledge of LLMs.
A Brief Overview of LLMs
To understand few-shot learning and why it is relevant to generative AI, one must first understand how LLMs work. For an LLM to form coherent and accurate responses, it must have something to base those responses on; it cannot simply materialize an accurate answer out of thin air. As a result, LLMs are trained on existing data, which forms the basis for how they respond to inputs and can be extremely large in size. ChatGPT, for example, was trained on hundreds of gigabytes of varied text, including articles and books. Because the model learns by predicting the next word of this text, with the text itself supplying the training labels, this kind of large-scale pretraining is usually described as self-supervised learning.
Naturally, LLMs are influenced by the data they ingest; an LLM trained purely on children's books will produce vastly different results than one trained on research papers. What happens, then, if there simply isn't enough data of a certain type to properly train an LLM? What if an LLM needs to recognize a certain pattern, but modifying its training data to emphasize that pattern is simply too time-consuming? In such situations, few-shot learning can be useful.
What is Few-Shot Learning and Why is it Useful?
Few-shot learning is a concept in machine learning where a model learns or emulates a pattern from relatively little training data that exhibits that pattern. On a surface level, it does this through meta-learning, or "learning to learn": the model uses information such as context clues much as a human would. As an example, GPT-4, like the models mentioned above, is normally trained on hundreds of gigabytes of text; yet, using OpenAI's API, providing GPT-4 with just three or four sentences as few-shot examples can massively change its outputs. Those outputs will be heavily influenced by the content of the few-shot examples provided, including their subject matter and sentence structure.
(Figure: visual representation of few-shot learning, by Rohit Kundu and James Skelton)
Few-shot learning can be extremely useful in situations where it is impractical or simply impossible to gather enough data for a machine learning model to undergo conventional training. Perhaps a technology is so new that there is hardly any data describing it at all, let alone how it works; few-shot learning could help form a rudimentary model around this new technology without waiting for, or having to create, relevant ingestible data.
How Does It Work?
We can explore a few strategies for building models that learn well from only a few examples.
Model-Agnostic Meta-Learning (MAML)
In Model-Agnostic Meta-Learning, the shared parameters theta are trained by repeatedly sampling tasks from a distribution of known tasks. For each sampled task, an inner update performs ordinary gradient descent to produce task-specific parameters theta'; an outer (meta) update then adjusts theta based on how well each theta' performs on held-out data from its task. This way, we train on relatively few tasks and arrive at a theta from which the model can easily be fine-tuned for a wide variety of new tasks.
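The sketch below illustrates this loop in Python using a first-order simplification of MAML on toy sine-wave regression tasks. The tiny network, the task distribution, and all hyperparameters are illustrative assumptions rather than details from any particular implementation; the full algorithm would additionally backpropagate through the inner gradient step.

# A minimal first-order MAML sketch on toy sine-wave regression tasks.
# All architecture and hyperparameter choices here are illustrative assumptions.
import torch
import torch.nn.functional as F

def sample_task():
    # Each "task" is a sine wave with a random amplitude and phase.
    amp = torch.rand(1) * 4 + 0.1
    phase = torch.rand(1) * 3.14
    def data(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return data

# A tiny two-layer network kept as an explicit parameter list so we can
# forward-pass with either the shared theta or a task-adapted theta'.
def init_params():
    return [torch.randn(40, 1) * 0.1, torch.zeros(40),
            torch.randn(1, 40) * 0.1, torch.zeros(1)]

def forward(params, x):
    w1, b1, w2, b2 = params
    h = torch.relu(F.linear(x, w1, b1))
    return F.linear(h, w2, b2)

theta = [p.requires_grad_() for p in init_params()]
meta_opt = torch.optim.Adam(theta, lr=1e-3)
inner_lr = 0.01

for step in range(1000):
    meta_opt.zero_grad()
    for _ in range(4):                        # sample a few tasks per meta-step
        data = sample_task()
        x_tr, y_tr = data()                   # support set for the inner update
        x_val, y_val = data()                 # held-out set for the outer update
        # Inner loop: one gradient step on this task yields theta'.
        loss_tr = F.mse_loss(forward(theta, x_tr), y_tr)
        grads = torch.autograd.grad(loss_tr, theta)
        theta_prime = [p - inner_lr * g for p, g in zip(theta, grads)]
        # Outer loop: evaluate theta' on held-out data from the same task
        # and accumulate gradients back into the shared theta.
        loss_val = F.mse_loss(forward(theta_prime, x_val), y_val)
        loss_val.backward()
    meta_opt.step()

Each meta-step samples a handful of tasks, adapts theta to each one to obtain a theta', and then nudges theta in whatever direction makes those adapted copies perform better on their held-out data.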
Transfer Learning
In Transfer Learning, a model pre-trained on a huge dataset across a variety of tasks provides the first layers of a final model. We can then use standard training techniques to build the final layers for our specific use case. This works well for NLP applications, and the transferred layers can focus on finding features or patterns useful to many tasks (e.g., identifying eyes for different face-detection models).
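As a rough sketch of this idea (assuming PyTorch and torchvision 0.13+ for the pretrained-weights API), the snippet below reuses an ImageNet-pretrained backbone as the transferred layers, freezes them, and trains only a new final classification head on a small dataset. The model choice, the five-class setup, and the few_shot_loader data loader are hypothetical.

# A minimal transfer-learning sketch: reuse a pretrained backbone and train
# only a new final head. Model choice and class count are illustrative.
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on ImageNet; its early layers already detect
# generic features (edges, textures, shapes) useful across many tasks.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the transferred layers so only the new head is trained.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one sized for our task,
# e.g. 5 classes with only a handful of labeled examples each.
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Standard training loop over the small task-specific dataset
# (few_shot_loader is a hypothetical DataLoader of images and labels).
def train_one_epoch(few_shot_loader):
    backbone.train()
    for images, labels in few_shot_loader:
        optimizer.zero_grad()
        loss = criterion(backbone(images), labels)
        loss.backward()
        optimizer.step()

Because only the small head is trained, a handful of labeled examples per class can be enough to reach usable accuracy.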
Metric Learning
In Metric Learning, a model is trained to compute a distance that represents the similarity between any two data points. For new data, the same distance calculation can then identify the most similar data points already seen in the training set. For example, a model that is good at judging similarity between dog breeds may need relatively few images to distinguish horse breeds, because it already knows what to look for.
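The minimal sketch below, loosely in the style of prototypical networks, shows how such a distance can be used for classification. The embed() function is a hypothetical stand-in for a trained embedding network (for example, one trained with a contrastive or triplet loss), and the two-dimensional "support" points exist only to keep the snippet runnable.

# A minimal metric-learning sketch: classify a new point by its distance to
# class prototypes built from a few labeled support examples.
import numpy as np

def embed(x):
    # Placeholder: in practice this would be a learned embedding network.
    return np.asarray(x, dtype=float)

def classify(query, support_examples, support_labels):
    """Assign `query` the label of the closest class prototype."""
    support_vecs = np.stack([embed(x) for x in support_examples])
    labels = np.asarray(support_labels)
    q = embed(query)
    # Average the support embeddings of each class into a prototype.
    prototypes = {lbl: support_vecs[labels == lbl].mean(axis=0)
                  for lbl in np.unique(labels)}
    # Distance in embedding space stands in for (dis)similarity.
    return min(prototypes, key=lambda lbl: np.linalg.norm(q - prototypes[lbl]))

# With only two labeled examples per class, a new point is matched to the
# nearest prototype.
print(classify([0.9, 1.1],
               support_examples=[[0, 0], [0.2, 0.1], [1, 1], [1.1, 0.9]],
               support_labels=["husky", "husky", "poodle", "poodle"]))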
Few-Shot Learning in LLMs
Few-shot learning has many applications in generative AI; some of the most common are formatting responses properly and ensuring that guidelines and scope are followed. Take, for example, a healthcare website chatbot that takes user queries as input and attempts to return relevant information: it should remain professional and patient while producing useful answers, no matter what the user might type. If the user poses an irrelevant command or asks a question that is out of scope, the chatbot should offer guidance or diplomatically decline to answer. Luckily, this is all behavior that can be taught through few-shot learning; the LLM can be fed an example input in which a user asks the chatbot to add 2+2, which is obviously out of scope for a healthcare chatbot, alongside an example output declaring that such a question is out of scope. This exchange is illustrated below using OpenAI API message formatting.
{"role": "system", "name": "example_user", "content": "Add 2+2."},
{"role": "system", "name": "example_assistant", "content": "I'm sorry, but I can not answer this question, as it is outside of my scope."},
The chatbot could be given many more of these examples, both to enforce the idea that the bot should not entertain certain questions or commands and to solidify the structure of its responses to out-of-scope queries. If the above example were repeated several times with different irrelevant commands, the chatbot would have essentially learned to respond to all irrelevant commands with "I'm sorry, but…outside of my scope."
Showcase of Few-Shot Learning with GPT-3.5 Turbo
To better showcase the effectiveness of few-shot learning, the examples below demonstrate how outputs differ when the same model is prompted with and without few-shot examples.
Generation function without few-shot learning:
import openai  # assumes the pre-1.0 openai SDK and that openai.api_key is already set

def generate():
    # --------------------------------------------------------------------------
    generation = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[
        # ----------------------------------------------------------------------
        # Tell the system what it is supposed to be
        {"role": "system", "content": "You are a chatbot."},
        # ----------------------------------------------------------------------
        # The line below models user input
        {"role": "user", "content": "Please give me a word that starts with A."},
        # ----------------------------------------------------------------------
        ],
        temperature=0.9)
    # Print the model's reply
    print(generation["choices"][0]["message"]["content"])
This function's output was: Sure! How about "apple"?
Generation function with few-shot learning:
import openai  # assumes the pre-1.0 openai SDK and that openai.api_key is already set

def generate():
    # --------------------------------------------------------------------------
    generation = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[
        # ----------------------------------------------------------------------
        # Tell the system what it is supposed to be
        {"role": "system", "content": "You are a chatbot."},
        # ----------------------------------------------------------------------
        # The section below provides few-shot learning examples.
        # "example_user" models potential input, while "example_assistant"
        # models potential output from the LLM.
        {"role": "system", "name": "example_user", "content": "Please give me a word that starts with B."},
        {"role": "system", "name": "example_assistant", "content": "[Bakery]"},
        {"role": "system", "name": "example_user", "content": "Please give me a word that starts with L."},
        {"role": "system", "name": "example_assistant", "content": "[Lanky]"},
        {"role": "system", "name": "example_user", "content": "Please give me a word that starts with J."},
        {"role": "system", "name": "example_assistant", "content": "[Jingle]"},
        # ----------------------------------------------------------------------
        # The line below models user input
        {"role": "user", "content": "Please give me a word that starts with A."},
        # ----------------------------------------------------------------------
        ],
        temperature=0.9)
    # Print the model's reply
    print(generation["choices"][0]["message"]["content"])
This function's output was: [Apple]
While the first generation provides an accurate answer, the output is not formatted in any particular way, since the LLM has no few-shot examples on which to base it. In contrast, the second generation follows the format provided by the few-shot examples, enclosing the output word in brackets and capitalizing the first letter. In cases like this, few-shot learning can ensure uniformity and structure in an LLM's outputs.
Research in New Applications
Multimodal Models and Few-Shot Learning
Multimodal models combine inputs of several different types (e.g., vision, speech, text) to generate an output, such as classifying images of dogs from the prompt "classify these dogs by breed." Researchers at Amazon found that an LLM adapted for few-shot learning, fed input from a computer vision model, allowed a multimodal model to effectively classify new images it had not seen before. The vision model generated data describing the images (semantic embeddings), so the LLM could make classifications using conventional few-shot learning methods. This ongoing research may apply to many computer vision tasks with limited available training data, such as identifying rare events encountered by autonomous vehicles.
Retrieval Models and Few-Shot Learning
Retrieval-augmented generation (RAG) models answer questions with the help of a large database of reference documents. They have a wide variety of applications, including informational chatbots, handling internal documents, and coding assistants. A team of researchers was able to train a RAG model to perform effectively on common question-answering tasks (given a database of articles) with as few as 64 examples, using up to 50x less compute than previous methods. Few-shot learning could allow RAG models to quickly adapt to new databases, contributing to a one-size-fits-all RAG solution that handles informational text of many different forms.
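As a rough sketch of the retrieve-then-generate loop, the snippet below performs simple TF-IDF retrieval over a tiny hard-coded document list and then answers with the same legacy OpenAI ChatCompletion interface used earlier in this article. The document list, prompt wording, and retrieval method are illustrative assumptions; production RAG systems typically use learned dense embeddings and a vector database.

# A minimal RAG sketch: retrieve the most relevant documents for a question,
# then let the chat model answer using only that retrieved context.
import openai  # assumes the pre-1.0 openai SDK and that openai.api_key is already set
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Few-shot learning lets models adapt from a handful of examples.",
    "Retrieval augmented generation grounds answers in a document database.",
    "Transfer learning reuses layers from a model trained on a large dataset.",
]

def retrieve(question, k=2):
    # Score every document against the question and return the top k.
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(documents)
    scores = cosine_similarity(vectorizer.transform([question]), doc_vecs)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def answer(question):
    context = "\n".join(retrieve(question))
    generation = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context:\n" + context},
            {"role": "user", "content": question},
        ],
        temperature=0.2)
    return generation["choices"][0]["message"]["content"]

print(answer("What is retrieval augmented generation used for?"))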
Conclusion
Especially with the rapid growth of generative AI and the increasing use of LLMs in commercial applications, few-shot learning is a very useful tool to understand; just as an employee may be trained to follow behavioral guidelines and abide by company policies, so too can generative AI. Accordingly, few-shot learning, likely in conjunction with other machine learning techniques, may even be the way forward for generative AI to sound more and more human. Until then, however, few-shot learning remains a good tool to rein in otherwise unwieldy LLMs for commercial use.
References
https://storage.prod.researchhub.com/uploads/papers/2022/08/12/2208.03299v1.pdf
https://www.sciencedirect.com/science/article/abs/pii/S0031320323000821
https://www.datacamp.com/blog/what-is-few-shot-learning
https://www.datacamp.com/tutorial/transfer-learning
https://www.digitalocean.com/community/tutorials/few-shot-learning