LangChain - DrAlzahraniProjects/csusb_fall2024_cse6550

LangChain Documentation

Installation
Configuration
Implementation
Usage
Troubleshooting

Installation

1. Clone the Repository

git clone https://github.com/DrAlzahraniProjects/csusb_fall2024_cse6550_team4.git

2. Navigate to the Repository

Change directory to the cloned repository:

cd csusb_fall2024_cse6550_team4

In the repository, open the Dockerfile. The following steps are the Dockerfile instructions that install LangChain.

3. Install Mamba

We use Mamba to install all the necessary packages:

RUN conda install -c conda-forge mamba -y

4. Create the Mamba Environment

Create a Mamba environment called team4_env using Python 3.11:

RUN mamba create -n team4_env python=3.11 -y

5. Create requirements.txt

Add LangChain and its companion packages to the requirements file so they can be installed:

langchain
langchain-community
langchain-huggingface
langchain-text-splitters
langchain-mistralai
sentence-transformers
transformers

6. Copy requirements

Copy the requirements.txt file into the container so that Mamba can install the dependencies in the next step:

COPY requirements.txt /app/requirements.txt

7. Install

Mamba installs LangChain and its dependencies into team4_env, then cleans its package cache:

RUN mamba install --name team4_env --yes --file requirements.txt && mamba clean --all -f -y
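Once the image is built, a quick import check can confirm that the packages landed in team4_env. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the module names are the usual import names for the entries in requirements.txt (hyphens become underscores).

```python
from importlib.util import find_spec

# Import names corresponding to the entries in requirements.txt
# (PyPI names use hyphens, module names use underscores).
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_huggingface",
    "langchain_text_splitters",
    "langchain_mistralai",
    "sentence_transformers",
    "transformers",
]


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All LangChain packages are importable.")
```

Run it with the environment's interpreter, for example through `mamba run -n team4_env python`, so that the check inspects team4_env rather than the base environment.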

Configuration

1. Import the Necessary Libraries

import os

from dotenv import load_dotenv
from pymilvus import connections
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.schema import Document
from langchain_core.prompts import PromptTemplate
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_milvus import Milvus
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import create_retrieval_chain
from langchain_huggingface import HuggingFaceEmbeddings

2. Set and Load Environment Variables

load_dotenv()
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")
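`os.getenv` returns `None` when the variable is missing, and the failure then only surfaces later, when the model client is created. A minimal sketch of failing early instead; the helper name `require_env` is hypothetical and not part of the project:

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both "not set" and "set to an empty string".
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

After `load_dotenv()`, the key could then be read with `MISTRAL_API_KEY = require_env("MISTRAL_API_KEY")`.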

3. Setting Up the Embedding Function

def get_embedding_function():
    embedding_function = HuggingFaceEmbeddings(model_name=MODEL_NAME)
    return embedding_function

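4. Setting Up Milvus for Vector Storage

def create_vector_store(docs, embeddings, uri):
    # Create the parent directory of the database file if it does not exist.
    # Skip this when the URI has no directory part: os.makedirs("") raises an error.
    parent_dir = os.path.dirname(uri)
    if parent_dir:
        os.makedirs(parent_dir, exist_ok=True)

    # Connect to the Milvus database
    connections.connect("default", uri=uri)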

....

5. Set Up the Retriever and Chain

retriever = ScoreThresholdRetriever(
    vector_store=vector_store,
    score_threshold=0.2,  # Minimum score threshold
    k=3  # Retrieve top 3 documents
)
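The definition of `ScoreThresholdRetriever` is not shown on this page, so the following is only a sketch of the behavior its parameters suggest. It assumes that a higher score means a closer match, that results below `score_threshold` are dropped, and that at most `k` results are returned, best first. The function name `filter_by_score` and the sample data are hypothetical.

```python
def filter_by_score(scored_docs, score_threshold=0.2, k=3):
    """Keep (doc, score) pairs scoring at least score_threshold, best first, at most k."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Made-up similarity scores for five chunks of a paper.
results = [
    ("intro", 0.15),
    ("methods", 0.62),
    ("results", 0.48),
    ("references", 0.21),
    ("appendix", 0.05),
]
top = filter_by_score(results)
print(top)
```

With these sample scores, "intro" and "appendix" fall below the 0.2 threshold and the remaining three chunks are returned in descending score order.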

Implementation

Initialize Milvus

def initialize_milvus(uri, documents):
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vector_store = Milvus.from_documents(
        documents=documents,
        embedding=embeddings,
        collection_name="research_paper_chatbot",
        connection_args={"uri": uri},
    )
    return vector_store

Create a prompt for the model

PROMPT_TEMPLATE = """
    Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
    Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
    Only use the information provided in the <context> tags.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    <context>
    {context}
    </context>

    <question>
    {input}
    </question>

    The response should be specific and use statistics or numbers when possible.

    Assistant:"""
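The template has two placeholders: `{context}`, which receives the retrieved document text, and `{input}`, which receives the user's question. The sketch below shows that substitution with plain `str.format` on a shortened template, standing in for LangChain's `PromptTemplate`. The names `SHORT_TEMPLATE` and `render_prompt` and the sample text are hypothetical.

```python
# Shortened version of the template, keeping only the two placeholders.
SHORT_TEMPLATE = """
<context>
{context}
</context>

<question>
{input}
</question>
"""


def render_prompt(context, question):
    """Fill the {context} and {input} placeholders of the template."""
    return SHORT_TEMPLATE.format(context=context, input=question)


rendered = render_prompt(
    context="The study evaluated three models on the same dataset.",
    question="How many models were evaluated?",
)
print(rendered)
```

Printing the rendered prompt this way is a cheap check that the retrieved context actually reaches the model before any API call is made.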

Text Splitter

Split documents into smaller, manageable chunks for efficient indexing. Example:

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,  # Split the text into chunks of at most 2000 characters
    chunk_overlap=200,  # Overlap consecutive chunks by 200 characters
    is_separator_regex=False,  # Treat separators as literal strings, not regexes
)
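To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line boundaries, while this version cuts at exact character offsets. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - chunk_overlap  # how far each new window advances
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.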

Create the RAG chain

document_chain = create_stuff_documents_chain(model, prompt)
print("Document Chain Created")

retrieval_chain = create_retrieval_chain(retriever, document_chain)
print("Retrieval Chain Created")

# Get relevant documents
relevant_docs = retriever.get_relevant_documents(query)
print(f"Relevant Documents: {relevant_docs}")

# Generate response
response = retrieval_chain.invoke({"input": query})
response_text = response.get("answer", "No answer found.")
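The dictionary returned by a chain built with `create_retrieval_chain` carries the retrieved documents under "context" alongside the "answer". The sketch below pulls out both, using a stand-in `Doc` class so that it runs without LangChain. The helper name `summarize_response`, the "source" metadata key, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's Document, so the sketch runs without LangChain."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_response(response):
    """Extract the answer text and the distinct sources from a chain response."""
    answer = response.get("answer", "No answer found.")
    sources = []
    for doc in response.get("context", []):
        # "source" is an assumed metadata key; adjust it to whatever the loader sets.
        source = doc.metadata.get("source", "unknown")
        if source not in sources:
            sources.append(source)
    return answer, sources


fake_response = {
    "input": "What dataset was used?",
    "context": [
        Doc("Section 3 describes the dataset.", {"source": "paper_a.pdf"}),
        Doc("Section 4 reports the results.", {"source": "paper_a.pdf"}),
        Doc("A related survey.", {"source": "paper_b.pdf"}),
    ],
    "answer": "The dataset is described in Section 3.",
}
answer, sources = summarize_response(fake_response)
print(answer, sources)
```

Showing the sources next to the answer makes it easier to tell whether a poor answer comes from retrieval or from generation.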

Initialize the Mistral model

model = ChatMistralAI(model='open-mistral-7b', api_key=MISTRAL_API_KEY, temperature=0.2)
print("Model Loaded")
prompt = create_prompt()

Usage

1. Loading Environment Variables

import os
from dotenv import load_dotenv
load_dotenv()
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")

Usage: Loads variables from a .env file into the process environment and reads the Mistral API key that the model client needs.

2. Define Constants for Milvus and Model

MILVUS_URI = "./milvus/milvus_vector.db"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

3. Creating Embeddings

from langchain_huggingface import HuggingFaceEmbeddings
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

def get_embedding_function():
    """
    returns embedding function for the model
    """
    embedding_function = HuggingFaceEmbeddings(model_name=MODEL_NAME)
    return embedding_function

Usage: This code creates an embedding function using a pre-trained model from Hugging Face, transforming text into vector representations for later retrieval.

4. Retrieval-Augmented Generation (RAG)

def query_rag(query):
    
    # Define the model
    model = ChatMistralAI(model='open-mistral-7b', api_key=MISTRAL_API_KEY, temperature=0.2)
    print("Model Loaded")
    ....

Usage: This initializes a conversational AI model and creates a document chain that combines the model with the defined prompt for generating answers.

5. Create a Prompt Template

PROMPT_TEMPLATE = """
    Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
    Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
    Only use the information provided in the <context> tags.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    <context>
    {context}
    </context>

    <question>
    {input}
    </question>

    The response should be specific and use statistics or numbers when possible.

    Assistant:"""

6. Create Document Chain

document_chain = create_stuff_documents_chain(model, prompt)
print("Document Chain Created")

Usage: Combines the AI model and the prompt template into a document chain that generates responses from the supplied context.
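"Stuffing" means that the text of every retrieved document is concatenated and placed into the prompt's `{context}` placeholder in one go. The sketch below illustrates the idea in plain Python. The blank-line separator is an assumption about the chain's default, and the `max_chars` budget is a hypothetical addition that the project does not use.

```python
def stuff_documents(page_contents, separator="\n\n", max_chars=None):
    """Join document texts into one context string, optionally within a character budget."""
    selected = []
    used = 0
    for text in page_contents:
        # The separator is only needed between documents, not before the first one.
        extra = len(text) + (len(separator) if selected else 0)
        if max_chars is not None and used + extra > max_chars:
            break
        selected.append(text)
        used += extra
    return separator.join(selected)


context = stuff_documents(["First chunk.", "Second chunk."])
print(context)
```

Because everything goes into a single prompt, the chunk size and the number of retrieved documents together determine how much of the model's context window is consumed.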

7. Create Retrieval Chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)
print("Retrieval Chain Created")

Usage: Establishes a retrieval chain that connects the retriever with the document chain for effective document retrieval and answer generation.

Troubleshooting

Document Retrieval Failures

Solutions

Adjust Score Threshold: Lower the threshold in the retriever if relevant documents are being filtered out, or raise it if irrelevant documents are being returned

retriever = ScoreThresholdRetriever(score_threshold=0.1)
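Before changing the threshold, it helps to see how many documents would survive at each candidate value. A minimal sketch, assuming higher scores mean closer matches; the function name `count_passing` and the sample scores are made up for illustration:

```python
def count_passing(scores, thresholds):
    """Map each threshold to the number of scores that meet or exceed it."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Hypothetical similarity scores logged for one query.
scores = [0.62, 0.48, 0.21, 0.15, 0.05]
counts = count_passing(scores, [0.1, 0.2, 0.3])
print(counts)
# {0.1: 4, 0.2: 3, 0.3: 2}
```

If even the lowest threshold leaves zero documents, the problem is more likely in the embeddings or the indexed data than in the threshold itself.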

Check Embeddings: Ensure the embedding model used for queries is the same one that was used to index the documents

HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
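One symptom of a mismatched embedding model is a vector of the wrong length. The all-MiniLM-L6-v2 model produces 384-dimensional vectors, so a quick length check can catch the mismatch early. This is a minimal sketch; the helper name `check_embedding` is hypothetical.

```python
EXPECTED_DIM = 384  # output size of sentence-transformers/all-MiniLM-L6-v2


def check_embedding(vector, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the embedding does not have the expected dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, expected {expected_dim}; "
            "the query model may differ from the one used for indexing"
        )
    return True


# A dummy vector of the right size stands in for a real embedding here.
print(check_embedding([0.0] * 384))
```

In the application, the vector could come from `get_embedding_function().embed_query("test")`.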

Check Mistral API: Test API connectivity and key validity by listing the available models. A 401 response indicates an invalid or missing key.

curl -H "Authorization: Bearer <MISTRAL_API_KEY>" https://api.mistral.ai/v1/models
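The connectivity check can also be scripted from Python with the standard library. The sketch below assumes the model-listing endpoint `https://api.mistral.ai/v1/models`; the function names `build_models_request` and `check_api_key` are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed model-listing endpoint of the Mistral API.
MODELS_URL = "https://api.mistral.ai/v1/models"


def build_models_request(api_key):
    """Build (but do not send) an authenticated GET request for the model list."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def check_api_key(api_key, timeout=10):
    """Return the HTTP status code, or None if the server could not be reached."""
    try:
        with urllib.request.urlopen(build_models_request(api_key), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 for an invalid key
    except urllib.error.URLError:
        return None  # DNS failure, no network, and similar
```

A return value of 200 suggests the key is accepted, 401 points to the key, and `None` points to the network rather than the key.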

Problem: Similarity Search Fails

Symptom: Errors like "Error during similarity search"

Cause: Issues with embeddings or document structure

Solution:

Confirm the embedding function is initialized correctly

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

Common Errors and Solutions

API Key Issues: Ensure your API keys are correct and have necessary permissions.
Dependency Conflicts: Manage your libraries within virtual environments to avoid version conflicts.
Memory Issues: If the model fails to remember context, check your memory implementation.

Debugging Tips

Log Outputs: Print intermediate outputs to understand where issues are arising.
Check API Limits: Be aware of any rate limits imposed by the APIs you are using.
Use Assertions: Implement assertions in your code to catch unexpected behavior early.
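The three tips above can be combined in one small wrapper: log each failed attempt, back off between retries to respect rate limits, and assert that a result came back. This is a minimal sketch using only the standard library. The name `call_with_retry`, the logger name, and the retry settings are assumptions, and in practice the `except` clause should be narrowed to the rate-limit error that the client actually raises.

```python
import logging
import time

logger = logging.getLogger("rag_debug")


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff and logging each failure."""
    for attempt in range(1, attempts + 1):
        try:
            result = func()
        except Exception as exc:  # narrow this to rate-limit errors in real code
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            assert result is not None, "call returned None"
            return result
```

For example, the chain call could be wrapped as `call_with_retry(lambda: retrieval_chain.invoke({"input": query}))`.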