LangChain Design Analysis

The purpose of this doc is to study LangChain's design and to decide which of its modules should be used in acmd.

LangChain Overall View

[Diagram: high-level view of LangChain's architecture]

Here is LangChain's design goal in its own words:

LangChain is a framework for developing applications powered by language models. It enables applications that:

  • Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
  • Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

Concepts in LangChain

  1. Model I/O
  2. Retrieval
  3. Agents
  4. Chains
  5. Memory
  6. Callbacks

Model I/O

Prompt

Simple format:

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="chickens")
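# -> 'Tell me a funny joke about chickens.'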

When the prompt contains few-shot examples (in-context learning), it is useful to have an example selector that picks a subset of them. LangChain has several built-in selection strategies, including MMR and n-gram overlap.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector.ngram_overlap import (
    NGramOverlapExampleSelector,
)

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# Examples of a fictional translation task.
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)
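
Formatting the prompt for a new input shows the selector at work. With the default threshold of -1.0 all three examples are kept, sorted by n-gram overlap with the input, so "Spot can run." comes first:

print(dynamic_prompt.format(sentence="Spot can run fast."))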

Other modules in Model I/O provide features such as sync and async LLM APIs, caching, a token usage calculator, retries, and rate control.
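
The token usage calculator, for example, is exposed as a callback context manager. A minimal sketch, assuming an OpenAI model and API key are configured:

from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
    llm.predict("Tell me a joke")
    # cb accumulates prompt, completion, and total token counts
    print(cb.total_tokens)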

It is interesting to take a look at the caching feature:

# Remove any existing cache file first: rm .langchain.db

from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache
from langchain.llms import OpenAI

llm = OpenAI()

# Use a SQLite-backed cache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

# The first time, the result is not yet in the cache, so the call takes longer
llm.predict("Tell me a joke")

# The second time it is cached, so it returns faster
llm.predict("Tell me a joke")
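
LangChain also ships an InMemoryCache with the same interface, which is handy for tests:

from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache

set_llm_cache(InMemoryCache())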

LLM output parser

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)

model = OpenAI(temperature=0)

_input = prompt.format(subject="ice cream flavors")
output = model(_input)

output_parser.parse(output)

'''
return:
    ['Vanilla',
     'Chocolate',
     'Strawberry',
     'Mint Chocolate Chip',
     'Cookies and Cream']
'''

# Ref implementation
# https://github.com/langchain-ai/langchain/blob/5a920e14c06735441a9ea28c1313f8bd433dc721/libs/langchain/langchain/output_parsers/list.py#L22

from typing import List

# ListOutputParser is the abstract base defined in the same file
from langchain.output_parsers.list import ListOutputParser


class CommaSeparatedListOutputParser(ListOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""

    @classmethod
    def is_lc_serializable(cls) -> bool:
        return True

    def get_format_instructions(self) -> str:
        return (
            "Your response should be a list of comma separated values, "
            "eg: `foo, bar, baz`"
        )

    def parse(self, text: str) -> List[str]:
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

    @property
    def _type(self) -> str:
        return "comma-separated-list"