
LangChain

LangChain is a robust open-source framework designed to streamline the development of applications that integrate with large language models (LLMs) like OpenAI's GPT and similar AI models. It offers various tools and abstractions that empower developers to create sophisticated AI-driven applications, particularly for tasks involving natural language processing, information retrieval, and task automation.

Important Features:

  • Chain of Thought (CoT) Reasoning: LangChain facilitates complex reasoning and workflows by enabling the chaining of multiple prompts and models. This allows developers to build advanced pipelines where the output from one step can seamlessly feed into the next.

  • Document Interaction: LangChain provides built-in tools to interact with text-based documents, enabling efficient search, retrieval, and interaction with large datasets or knowledge bases using LLMs.

  • Integration with External APIs: LangChain can connect to external APIs and data sources, allowing LLMs to work in conjunction with other tools such as search engines, databases, or custom APIs to fetch real-time data and enhance functionality.

  • Memory and Persistence: LangChain includes a memory system that allows applications to retain context over multiple interactions, making it ideal for use cases like chatbots and assistants that require continuous context throughout a conversation.

  • Modularity: LangChain offers modular components, making it easy to swap models, tools, and data sources. For instance, you can change between different LLMs (like GPT-3, Cohere, etc.) without altering the core logic of your application.

  • Agents: A standout feature of LangChain is the ability to create “agents” — LLM-powered applications that can make decisions based on external information. These agents can run tasks autonomously, making the system more dynamic and interactive.

  • Customizability: Designed with flexibility in mind, LangChain can be tailored to suit a wide range of needs, whether you're building a customer support bot, a content generation tool, a knowledge assistant, or more complex AI workflows.

Use Cases of LangChain:

  • AI-powered Chatbots: Create chatbots that can manage complex conversations with memory and context.

  • Automated Research Assistants: Tools for searching and summarizing documents, and answering queries.

  • Task Automation: Systems that interact with APIs or databases to execute tasks based on user instructions.

  • Content Generation: Applications for generating human-like text for blogging, marketing, or educational purposes.

By providing high-level abstractions and interfaces, LangChain enables developers to leverage the power of LLMs without needing to manage the complex details of model interactions, memory handling, or API integrations. This makes it easier to build intelligent, context-aware applications with enhanced capabilities.
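As a minimal sketch of how these abstractions fit together, the example below chains two prompts so that the output of the first step feeds the second. It assumes an OPENAI_API_KEY is available; any LLM wrapper, including the local Hugging Face model shown later in this wiki, could be substituted.

   from langchain.llms import OpenAI
   from langchain.prompts import PromptTemplate
   from langchain.chains import LLMChain, SimpleSequentialChain

   llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

   # Step 1: draft a short outline for a topic
   outline_chain = LLMChain(
       llm=llm,
       prompt=PromptTemplate.from_template("Write a short outline about {topic}."),
   )

   # Step 2: expand the outline produced by step 1 into a summary
   summary_chain = LLMChain(
       llm=llm,
       prompt=PromptTemplate.from_template(
           "Expand this outline into a one-paragraph summary:\n{outline}"
       ),
   )

   # Chain the two steps: the output of the first feeds the input of the second
   pipeline_chain = SimpleSequentialChain(chains=[outline_chain, summary_chain])
   print(pipeline_chain.run("vector databases"))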

Table of Contents

  1. Installation
  2. Configuration
  3. Implementation
  4. Usage
  5. Troubleshooting

1. Installation

  1. Create the requirements.txt File

To begin, create a requirements.txt file that lists all the Python libraries required for your LangChain project. This file should include LangChain and its companion packages, along with other dependencies such as the Hugging Face libraries, Streamlit, and SQLAlchemy listed below.

Create a file called requirements.txt in your project folder with the following content:

requirements.txt:

   huggingface_hub
   ipykernel
   jupyter
   langchain
   langchain-community
   langchain-huggingface
   langchain-mistralai
   pypdf
   python-dotenv
   roman
   streamlit
   sentence-transformers
   sqlalchemy

  2. Copy requirements.txt into the Docker Container

To ensure that the requirements.txt file is available inside the Docker container, copy it into the container using the following Dockerfile command:

   COPY requirements.txt /app/requirements.txt

This command copies the local requirements.txt file from your project directory into the /app/ directory inside the Docker container.

  3. Install Python Packages from requirements.txt

Once the requirements.txt file is inside the container, use the following command to install the listed dependencies. We use Mamba instead of pip because of its faster dependency resolution:

   RUN mamba install --yes --file requirements.txt && mamba clean --all -f -y

Explanation:

  • mamba install: Installs the packages listed in requirements.txt using Mamba.

  • --yes: Automatically accepts all installation prompts.

  • --file requirements.txt: Tells Mamba to install the packages from the requirements.txt file.

  • mamba clean --all -f -y: Cleans up unnecessary files, reducing the image size.
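Putting these pieces together, the relevant part of a Dockerfile might look like the sketch below. The condaforge/mambaforge base image and the Streamlit entry point are assumptions for illustration, not the project's confirmed setup:

   # Assumed base image that ships with mamba preinstalled
   FROM condaforge/mambaforge:latest

   WORKDIR /app

   # Copy the dependency list into the container
   COPY requirements.txt /app/requirements.txt

   # Install dependencies with mamba and clean caches to keep the image small
   RUN mamba install --yes --file requirements.txt && mamba clean --all -f -y

   # Copy the rest of the project into the image
   COPY . /app

   # Assumed entry point: a Streamlit app (streamlit is listed in requirements.txt)
   CMD ["streamlit", "run", "app.py"]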

Why Use Mamba Instead of Pip?

Mamba is faster than pip for managing environments and dependencies, particularly when dealing with complex scientific libraries (like FAISS or NumPy). It is optimized for faster dependency resolution and is generally more efficient at managing Conda environments.

Benefits of Mamba:

Faster dependency resolution: Mamba can install packages faster than Conda or Pip because it uses a more efficient dependency solver.

Better for conda environments: Mamba is used for managing conda environments, which are often needed for scientific Python projects.

  • LangChain can be installed on Windows, macOS, and Linux. Below are detailed instructions for each operating system.

Prerequisites

  • Python 3.8 or later
  • pip (Python package manager)

Windows

  • Install Python: Ensure that Python 3.8 or higher is installed. You can download Python from the official Python website.

  • During installation, check the box for "Add Python to PATH".

  • Install pip: pip is bundled with Python, but if you need to reinstall or upgrade it:

   python -m ensurepip --upgrade
  • Set Up Virtual Environment:

   python -m venv chatbot-env
   chatbot-env\Scripts\activate

[Screenshot: environment setup of LangChain (creating and activating the virtual environment)]

  • Installing through command prompt:

 pip install langchain


  • Install LangChain and any additional dependencies using pip:

   pip install langchain
   pip install transformers
   pip install torch

[Screenshots: installing LangChain and its additional dependencies (transformers, torch) using pip]

macOS

  • Install Python: macOS comes with Python, but it might be outdated. It’s recommended to install a modern version via Homebrew:

   brew install python
  • Set Up Virtual Environment:

   python3 -m venv chatbot-env
   source chatbot-env/bin/activate
  • Install LangChain and any additional dependencies:

   pip install langchain
   pip install transformers
   pip install torch

Linux

  • Install Python: Most Linux distributions have Python pre-installed, but ensure it's the correct version:

   python3 --version
  • Install pip: Install pip if it’s not already installed:

   sudo apt-get update
   sudo apt-get install python3-pip
  • Set Up Virtual Environment:

   python3 -m venv chatbot-env
   source chatbot-env/bin/activate
  • Install LangChain and additional dependencies:

   pip install langchain
   pip install transformers
   pip install torch

2. Configuration

  • After installation, you need to set up your environment for LangChain.

Environment Variables

  • LangChain may require API keys or configuration parameters depending on the services you use. Create a .env file in your project directory and add your configurations:

   OPENAI_API_KEY=your_openai_api_key
   # Add other relevant environment variables here
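To make these variables available to your Python code, you can load the .env file with python-dotenv (already listed in requirements.txt). A minimal sketch:

   import os
   from dotenv import load_dotenv

   # Read key=value pairs from .env into environment variables
   load_dotenv()

   openai_api_key = os.getenv("OPENAI_API_KEY")
   print("OPENAI_API_KEY loaded:", openai_api_key is not None)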
  • If you don't have API keys, you can use a local Hugging Face model instead:

   from transformers import pipeline

   # Load a local GPT-2 model
   generator = pipeline('text-generation', model='gpt2')

   # Example usage
   prompt = "Explain the importance of machine learning."
   response = generator(prompt, max_length=100, num_return_sequences=1)

   print(response[0]['generated_text'])

3. Implementation

This section provides a detailed walkthrough of setting up and enhancing a chatbot using LangChain with a local Hugging Face model and memory integration.

Basic Setup

The following code demonstrates the basic setup of a chatbot using the Hugging Face pipeline and LangChain integration. This implementation creates a simple academic assistant that responds to user questions.

   from langchain import HuggingFacePipeline
   from transformers import pipeline
   from langchain.prompts import PromptTemplate
   from langchain.chains import LLMChain

   # Load a local model (GPT-2 in this case)
   generator = pipeline('text-generation', model='gpt2')

   # Wrap Hugging Face pipeline for LangChain compatibility
   llm = HuggingFacePipeline(pipeline=generator)

   # Create a prompt template
   template = "You are an academic assistant. Answer the following question: {question}"
   prompt = PromptTemplate(input_variables=["question"], template=template)

   # Create a LangChain instance
   llm_chain = LLMChain(llm=llm, prompt=prompt)

   # Example question
   question = "What is the difference between artificial intelligence and machine learning?"
   response = llm_chain.run(question)

   # Print the result
   print(response)
  • The implementation starts by loading a Hugging Face GPT-2 model using the transformers library pipeline, which serves as the chatbot's core. This pipeline is wrapped in HuggingFacePipeline for LangChain compatibility. A PromptTemplate is created to structure user inputs, defining the chatbot's role and expected variables. An LLMChain then links the language model with the prompt, forming a reusable chatbot.
  • The output of running the above implementation is shown below:

[Screenshot: sample output of the basic chatbot implementation]

Adding Memory

  • If you want your chatbot to remember context across conversations, you can implement memory:

   from langchain.memory import ConversationBufferMemory
   from langchain.prompts import PromptTemplate
   from langchain.chains import LLMChain

   # Initialize memory to store the conversation history
   memory = ConversationBufferMemory(memory_key="history")

   # The prompt must expose the memory key so past turns are injected into it
   memory_prompt = PromptTemplate(
       input_variables=["history", "question"],
       template="You are an academic assistant.\nConversation so far:\n{history}\nQuestion: {question}\nAnswer:",
   )

   # Integrate memory with your chatbot
   chatbot_with_memory = LLMChain(llm=llm, prompt=memory_prompt, memory=memory)

   # Use the chatbot with memory
   response = chatbot_with_memory.run("What is the difference between AI and machine learning?")
   print(response)


Explanation:

  • Conversation Memory: The ConversationBufferMemory is initialized to store a buffer of the conversation history.

  • Memory Integration: The memory is integrated into the LLMChain. This allows the chatbot to retain previous interactions when generating responses.

  • Multi-Turn Dialogue: When a user continues a conversation, the chatbot uses the stored history to contextualize its answers.
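As a rough illustration of multi-turn dialogue with the chatbot_with_memory chain defined above, the second question only makes sense because the stored history is replayed into the prompt (with GPT-2 the answers will be simplistic, but the mechanism is the same):

   # First turn: establish some context
   print(chatbot_with_memory.run("What is machine learning?"))

   # Second turn: "it" refers back to the first question via the stored history
   print(chatbot_with_memory.run("Can you give a simple example of it?"))

   # Inspect what the memory buffer currently contains
   print(memory.buffer)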

4. Usage

  • Activate the virtual environment:

 .\chatbot-env\Scripts\activate
  • Run the Python script (no API key needed):

 python chatbot.py
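The contents of chatbot.py are not reproduced in this wiki; as a sketch, it could wrap the local GPT-2 chain from the Implementation section in a simple interactive loop (the file name and loop structure are assumptions):

   # chatbot.py - hypothetical interactive wrapper around the LLMChain
   from transformers import pipeline
   from langchain import HuggingFacePipeline
   from langchain.prompts import PromptTemplate
   from langchain.chains import LLMChain

   llm = HuggingFacePipeline(pipeline=pipeline('text-generation', model='gpt2'))
   prompt = PromptTemplate(
       input_variables=["question"],
       template="You are an academic assistant. Answer the following question: {question}",
   )
   llm_chain = LLMChain(llm=llm, prompt=prompt)

   # Keep answering until the user types "exit"
   while True:
       question = input("You: ")
       if question.strip().lower() == "exit":
           break
       print("Bot:", llm_chain.run(question))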

You can now use your LangChain-powered chatbot in various ways:

  • Interactive Chatbot: Create a loop to continuously accept user input and provide responses.
  • Integrations: Integrate with web frameworks (e.g., Flask or FastAPI) for deploying your chatbot as a web app.
  • Logging: Implement logging to keep track of interactions for future analysis.
  • Here is the LangChain code used in the project:

   from langchain.chains import RetrievalQA
   from langchain.vectorstores import Milvus
   from langchain.embeddings.openai import OpenAIEmbeddings
   from langchain.prompts import PromptTemplate
   from langchain.chat_models import ChatOpenAI

   # Initialize Milvus vector store
   vector_store = Milvus(
       embedding_function=OpenAIEmbeddings(),
       connection_args={"host": "localhost", "port": "19530"}
   )

   # Define a custom prompt template
   prompt_template = PromptTemplate(
       input_variables=["context", "question"],
       template="""
       You are an academic advisor chatbot. Use the provided context to answer the question as accurately as possible.

       Context: {context}
       Question: {question}

       Answer:
       """
   )

   # Create a RetrievalQA chain (ChatOpenAI is used because gpt-4 is a chat model)
   qa_chain = RetrievalQA.from_chain_type(
       llm=ChatOpenAI(model="gpt-4"),
       retriever=vector_store.as_retriever(),
       chain_type_kwargs={"prompt": prompt_template}
   )

   # Process a user query
   def process_query(query):
       try:
           response = qa_chain.run(query)
           return response
       except Exception as e:
           return f"Error: {e}"

   # Example usage
   user_query = "What are the prerequisites for CS 6550?"
   response = process_query(user_query)
   print(response)
  • LangChain is used to implement retrieval-based conversational AI by:

  • Storing academic resources, policies, and guides in a vector store (Milvus) as embeddings.

  • Searching the vector store using cosine similarity to retrieve relevant content based on user queries.

  • Injecting the retrieved information into the language model's prompt, ensuring responses are grounded in reliable data.
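The ingestion step itself is not shown above; a rough sketch of how documents could be embedded and stored in Milvus is given below (the file path and chunking parameters are assumptions):

   from langchain.document_loaders import PyPDFLoader
   from langchain.text_splitter import RecursiveCharacterTextSplitter
   from langchain.embeddings.openai import OpenAIEmbeddings
   from langchain.vectorstores import Milvus

   # Load and split an academic policy document (path is illustrative)
   docs = PyPDFLoader("data/academic_policies.pdf").load()
   chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

   # Embed the chunks and store them in Milvus so the retriever above can search them
   vector_store = Milvus.from_documents(
       chunks,
       embedding=OpenAIEmbeddings(),
       connection_args={"host": "localhost", "port": "19530"},
   )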


5. Troubleshooting

  1. Environment Variables Not Loaded Correctly

One possible issue could be with loading the Mistral API key from the .env file.

Potential Problem:

MISTRAL_API_KEY not loaded: If the .env file is not properly loaded, the os.getenv("MISTRAL_API_KEY") call will return None, causing the function to raise a ValueError with the message: "MISTRAL_API_KEY not found in .env".
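The check described above might look roughly like this (a sketch; the project's actual loading function is not shown in this wiki):

   import os
   from dotenv import load_dotenv

   # Read variables from the .env file into the environment
   load_dotenv()

   api_key = os.getenv("MISTRAL_API_KEY")
   if api_key is None:
       raise ValueError("MISTRAL_API_KEY not found in .env")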

Troubleshooting Steps:

Ensure .env file exists: Make sure that the .env file exists and contains the correct API key, i.e., MISTRAL_API_KEY=<your_api_key>.

Path issue: Confirm that the .env file is in the correct directory and that the script is executing in the correct working directory.

Check permissions: Ensure the script has permission to read the .env file.

Recheck loading: Try printing the API key for debugging:

   print(f"Loaded API key: {os.getenv('MISTRAL_API_KEY')}")

  2. Document Loading Issues

The document loading and embedding step might fail if there are issues with file paths or formats.

Potential Problem:

Document path is invalid: The document_path = "data/textbook" might be incorrect or inaccessible.

Troubleshooting Steps:

Path check: Verify that the directory data/textbook exists and is accessible from the script's location.

File format: Ensure the files in the data/textbook directory are in a supported format for loading and embedding.

   import os
   print(os.listdir(document_path))
  3. Incorrect Retrieval Configuration

The get_hybrid_retriever function might be misconfigured, causing poor document retrieval.

Potential Problem:

Retriever misconfiguration: If the number of k results (here 15) is too large or too small, or the retriever is not configured correctly, the retrieval process might fail or return irrelevant results.

Troubleshooting Steps:

Test different values for k: Start with smaller or larger values for k to see if retrieval improves.

Check retrieval behavior: Print out the documents retrieved by the retriever:

   retrieved_docs = retriever.get_relevant_documents(question)
   print(f"Retrieved {len(retrieved_docs)} documents for the question.")
  4. Prompt and Model Issues

There could be issues with how the prompt is structured or how the model is being invoked.

  5. Error in Response Handling and Source Extraction

The function get_answer_with_source() may not handle the response correctly, particularly if the context is missing or misformatted.

  6. General Error Logging

To capture any unexpected errors that may occur in the workflow, you can add a try-except block around critical sections.
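For example, a sketch of wrapping the query-handling step in a try-except block with logging (the logger setup and fallback message are illustrative):

   import logging

   logging.basicConfig(level=logging.INFO)
   logger = logging.getLogger(__name__)

   try:
       response = qa_chain.run(user_query)
       logger.info("Query answered successfully")
   except Exception as e:
       # Log the full traceback for later analysis, then fail gracefully
       logger.exception("Unexpected error while answering query: %s", e)
       response = "Sorry, something went wrong. Please try again."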