LangChain - DrAlzahraniProjects/csusb_fall2024_cse6550_team4 GitHub Wiki
Clone the repository:
git clone
Change directory to the cloned repository:
cd csusb_fall2024_cse6550_team4
In the repository, open the Dockerfile, which installs LangChain.
We use Mamba to install all the necessary packages, so first install Mamba from conda-forge:
RUN conda install -c conda-forge mamba -y
Create a Mamba environment called team4_env with Python 3.11:
RUN mamba create -n team4_env python=3.11 -y
Add LangChain and its related packages to the requirements.txt file.
The requirements.txt file is copied into the container, and all dependencies are installed using Mamba.
COPY requirements.txt /app/requirements.txt
Mamba installs LangChain and all of its dependencies into team4_env and then cleans its package caches:
RUN mamba install --name team4_env --yes --file requirements.txt && mamba clean --all -f -y
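After the image builds, one quick way to confirm the environment is complete is to check that every top-level module the application imports can be found. The helper below is a hypothetical sketch, not part of the repository; the module list is taken from the imports shown on this page.

```python
import importlib.util

# Top-level modules imported by the application code on this page
REQUIRED_MODULES = [
    "langchain",
    "langchain_core",
    "langchain_mistralai",
    "langchain_milvus",
    "langchain_text_splitters",
    "langchain_huggingface",
    "dotenv",
]


def missing_modules(names):
    """Return the names in `names` that cannot be found in the active environment."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

Run it with the team4_env environment active; an empty result means every listed package resolved.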
import os
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.schema import Document
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
def get_embedding_function():
    embedding_function = HuggingFaceEmbeddings(model_name=MODEL_NAME)
    return embedding_function
def create_vector_store(docs, embeddings, uri):
    # Create the parent directory of the database file if it does not exist
    head = os.path.split(uri)
    os.makedirs(head[0], exist_ok=True)
    # Connect to the Milvus database
retriever = ScoreThresholdRetriever(
    score_threshold=0.2,  # Minimum similarity score a document must reach
    k=3,  # Retrieve at most the top 3 documents
)
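The directory step in create_vector_store has one edge case: for a bare file name such as "milvus_vector.db", os.path.split returns an empty head, and os.makedirs("") raises FileNotFoundError. The MILVUS_URI used on this page, "./milvus/milvus_vector.db", has the head "./milvus", so it is not affected. A minimal guarded sketch, using the hypothetical helper name ensure_parent_dir:

```python
import os
import tempfile


def ensure_parent_dir(uri: str) -> str:
    """Create the directory that will hold a local Milvus database file.

    Returns the directory that was ensured; "" means the current directory.
    """
    head, _tail = os.path.split(uri)
    # os.makedirs("") raises FileNotFoundError, so only create a non-empty head
    if head:
        os.makedirs(head, exist_ok=True)
    return head


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        uri = os.path.join(tmp, "milvus", "milvus_vector.db")
        print(ensure_parent_dir(uri))  # the ".../milvus" directory now exists
    print(repr(ensure_parent_dir("milvus_vector.db")))  # '' (nothing to create)
```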
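def initialize_milvus(uri, documents):
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    # Embed the documents and store them in the Milvus database at `uri`
    vector_store = Milvus.from_documents(
        documents=documents,
        embedding=embeddings,
        connection_args={"uri": uri},
    )
    return vector_store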
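Milvus.from_documents embeds each document with the given embedding model and stores the vectors so they can later be searched by similarity. The sketch below mimics that idea in plain Python. TinyVectorStore, its search method, and the embed callable are hypothetical; it uses brute-force cosine similarity, whereas Milvus relies on its own index types and metric settings.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store of (text, vector) pairs searched by cosine similarity."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.items: List[Tuple[str, Vector]] = []

    @classmethod
    def from_texts(cls, texts: List[str], embed: Callable[[str], Vector]) -> "TinyVectorStore":
        store = cls(embed)
        for text in texts:
            store.items.append((text, embed(text)))  # embed once, at indexing time
        return store

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        query_vec = self.embed(query)
        scored = [(text, cosine(query_vec, vec)) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:k]
```

The same embedding function must be used for indexing and for queries; otherwise the vectors are not comparable.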
Human: You are an AI assistant that answers questions using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
Only use the information provided in the <context> tags.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
The response should be specific and use statistics or numbers when possible.
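The template text above refers to <context> and <question> tags, but the placeholders themselves are not shown on this page. create_stuff_documents_chain fills the retrieved documents into a {context} variable by default, and create_retrieval_chain passes the user's query as {input}. The sketch below uses a hypothetical, abbreviated template layout and Python's string.Formatter to check that both placeholders are present.

```python
from string import Formatter

# Hypothetical template layout; the instruction text is abbreviated here
PROMPT_TEMPLATE = (
    "Human: Only use the information provided in the <context> tags.\n"
    "<context>\n{context}\n</context>\n"
    "<question>\n{input}\n</question>\n"
    "Assistant:"
)

REQUIRED_VARIABLES = {"context", "input"}


def template_variables(template: str) -> set:
    """Return the names of the {placeholders} in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def missing_variables(template: str) -> set:
    """Return the required placeholders that the template does not use."""
    return REQUIRED_VARIABLES - template_variables(template)
```

A string like this could then be passed to PromptTemplate.from_template, which infers context and input as the template's input variables.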
Split documents into smaller, manageable chunks for efficient indexing. Example:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,  # Split the text into chunks of at most 2000 characters
    chunk_overlap=200,  # Consecutive chunks share up to 200 characters
    is_separator_regex=False,  # Treat separators as literal strings, not regex patterns
)
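To see what chunk_size and chunk_overlap mean, here is a simplified fixed-window character splitter. It is not how RecursiveCharacterTextSplitter works internally (that class first tries to split on separators such as paragraph and line breaks and then merges the pieces up to chunk_size); it only illustrates the size and overlap arithmetic. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each new window
    starts chunk_size - chunk_overlap characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With chunk_size=2000 and chunk_overlap=200, each chunk starts 1800 characters after the previous one, so a 5000-character text yields three chunks starting at offsets 0, 1800, and 3600.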
# Load the Mistral chat model (low temperature for focused, factual answers)
model = ChatMistralAI(model='open-mistral-7b', api_key=MISTRAL_API_KEY, temperature=0.2)
print("Model Loaded")
prompt = create_prompt()
# Combine the model and prompt into a chain that inserts all documents into the prompt
document_chain = create_stuff_documents_chain(model, prompt)
print("Document Chain Created")
# Connect the retriever to the document chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)
print("Retrieval Chain Created")
# Get relevant documents (invoke replaces the deprecated get_relevant_documents)
relevant_docs = retriever.invoke(query)
print(f"Relevant Documents: {relevant_docs}")
# Generate response
response = retrieval_chain.invoke({"input": query})
response_text = response.get("answer", "No answer found.")
from dotenv import load_dotenv
load_dotenv()  # Load variables from a .env file into the process environment
Usage: This section loads environment variables, which may include API keys or configuration settings necessary for the application.
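load_dotenv() copies the key/value pairs from a .env file into os.environ (by default without overwriting variables that are already set), after which the code can read values such as the MISTRAL_API_KEY passed to ChatMistralAI. The helper below is a hypothetical sketch that fails early with a clear message when a variable is missing; it assumes the key is stored under the name MISTRAL_API_KEY.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file")
    return value


# In the application this would run after load_dotenv(), for example:
# MISTRAL_API_KEY = require_env("MISTRAL_API_KEY")
```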
MILVUS_URI = "./milvus/milvus_vector.db"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
from langchain_huggingface import HuggingFaceEmbeddings
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
def get_embedding_function():
    """Return the embedding function for the model."""
    embedding_function = HuggingFaceEmbeddings(model_name=MODEL_NAME)
    return embedding_function
Usage: This code creates an embedding function using a pre-trained model from Hugging Face, transforming text into vector representations for later retrieval.
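HuggingFaceEmbeddings maps each text to a fixed-length vector (all-MiniLM-L6-v2 produces 384-dimensional vectors), and texts with similar meaning end up close together. The toy function below only mimics that interface, text in and fixed-length unit vector out, using hashed word counts; it captures shared words rather than meaning. The name toy_embed and the 16-dimension size are illustrative assumptions.

```python
import hashlib
import math

DIM = 16  # illustrative size; all-MiniLM-L6-v2 vectors have 384 dimensions


def toy_embed(text: str, dim: int = DIM) -> list:
    """Map text to a unit-length vector of hashed word counts (toy stand-in for a real model)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash (unlike built-in hash(), which is salted per process)
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize so that a dot product equals cosine similarity
    return [v / norm for v in vec] if norm else vec
```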
def query_rag(query):
    # Define the model
    model = ChatMistralAI(model='open-mistral-7b', api_key=MISTRAL_API_KEY, temperature=0.2)
    print("Model Loaded")
Usage: This initializes a conversational AI model and creates a document chain that combines the model with the defined prompt for generating answers.
Human: You are an AI assistant that answers questions using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
Only use the information provided in the <context> tags.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
The response should be specific and use statistics or numbers when possible.
document_chain = create_stuff_documents_chain(model, prompt)
print("Document Chain Created")
Usage: Combines the AI model and the prompt template to create a document chain for generating responses.
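"Stuff" is the simplest way of combining documents: every retrieved document is inserted into the single {context} variable of the prompt, so the model sees all of them in one call. A plain-Python sketch of that step, assuming documents are dicts with a page_content field (LangChain Document objects carry their text in page_content) and that they are joined with a blank line, which is the default document separator of create_stuff_documents_chain. The function name stuff_documents is hypothetical.

```python
DOCUMENT_SEPARATOR = "\n\n"  # mirrors the default separator described above


def stuff_documents(docs: list, template: str, question: str) -> str:
    """Fill a prompt template with all documents at once ("stuff" strategy)."""
    # Concatenate the text of every document into one context string
    context = DOCUMENT_SEPARATOR.join(doc["page_content"] for doc in docs)
    return template.format(context=context, input=question)
```

Because everything goes into one prompt, k and chunk_size together bound its length: with k=3 and 2000-character chunks, the context is at most about 6000 characters plus separators.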
retrieval_chain = create_retrieval_chain(retriever, document_chain)
print("Retrieval Chain Created")
Usage: Connects the retriever to the document chain: for each query, the retriever fetches the relevant documents and the document chain uses them as context to generate the answer.
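Conceptually, the retrieval chain runs two steps per query: the retriever fetches documents for the "input", and the document chain turns those documents plus the question into an answer. The result of retrieval_chain.invoke is a dict that includes "input", "context" (the retrieved documents), and "answer", which is why the code reads response.get("answer", ...). The sketch below reproduces only that data flow with plain functions; run_retrieval and the stub callables are hypothetical.

```python
from typing import Callable, Dict, List


def run_retrieval(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[List[str], str], str],
) -> Dict[str, object]:
    """Mimic the shape of a retrieval chain's output for a single query."""
    context = retrieve(query)          # step 1: fetch relevant documents
    answer = generate(context, query)  # step 2: answer using only that context
    return {"input": query, "context": context, "answer": answer}
```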
Adjust Score Threshold: Lower the threshold in the retriever to return more (but possibly less relevant) documents, or raise it to return fewer, closer matches:
retriever = ScoreThresholdRetriever(score_threshold=0.1)
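The effect of the threshold can be seen with a small filter that keeps only documents whose similarity score is at least the threshold and then takes the top k. This sketch assumes higher scores mean more similar (as with cosine similarity) and that scores are already computed; ScoreThresholdRetriever's internals are not shown on this page, so filter_by_score is a hypothetical stand-in.

```python
from typing import List, Tuple


def filter_by_score(scored_docs: List[Tuple[str, float]], score_threshold: float, k: int) -> List[str]:
    """Keep documents scoring at least score_threshold, best first, at most k of them."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]


scored = [("a", 0.35), ("b", 0.15), ("c", 0.25), ("d", 0.05)]
print(filter_by_score(scored, 0.2, 3))  # ['a', 'c']
print(filter_by_score(scored, 0.1, 3))  # ['a', 'c', 'b']
```

Lowering the threshold from 0.2 to 0.1 lets document "b" through, while k still caps the result at three documents.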
curl -H "Authorization: Bearer <API_KEY>"
print([doc.page_content for doc in relevant_docs])
curl -H "Authorization: Bearer <MISTRAL_API_KEY>"
Symptom: Errors such as "Error during similarity search".
Cause: Problems with the embeddings or the document structure.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
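One possible cause of similarity-search errors is a vector size mismatch, for example when the collection was built with one embedding model and queries are embedded with another. all-MiniLM-L6-v2 produces 384-dimensional vectors, so every stored vector and every query vector should have that length. A hypothetical pre-flight check in plain Python:

```python
from typing import List, Sequence

EXPECTED_DIM = 384  # output size of sentence-transformers/all-MiniLM-L6-v2


def check_dimensions(vectors: Sequence[Sequence[float]], expected_dim: int = EXPECTED_DIM) -> List[int]:
    """Return the indices of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]
```

In the application, the vectors to check could come from embeddings.embed_documents([...]) and embeddings.embed_query(query); an empty result means all lengths match.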
- API Key Issues: Ensure your API keys are correct and have necessary permissions.
- Dependency Conflicts: Manage your libraries within virtual environments to avoid version conflicts.
- Memory Issues: If the model fails to remember context, check your memory implementation.
- Log Outputs: Print intermediate outputs to understand where issues are arising.
- Check API Limits: Be aware of any rate limits imposed by the APIs you are using.
- Use Assertions: Implement assertions in your code to catch unexpected behavior early.
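Following the last tip, a few assertions on the retrieval chain's result catch empty retrievals or a missing answer early. This is a hypothetical sketch; it assumes the result is the dict returned by retrieval_chain.invoke, with "context" holding the retrieved documents and "answer" holding the model's reply.

```python
def check_rag_response(response: dict) -> str:
    """Validate a retrieval-chain result and return the answer text."""
    assert isinstance(response, dict), "expected a dict from retrieval_chain.invoke"
    assert "answer" in response, "response has no 'answer' key"
    assert response.get("context"), "retriever returned no documents; try lowering score_threshold"
    answer = response["answer"]
    assert isinstance(answer, str) and answer.strip(), "model returned an empty answer"
    return answer
```

Note that Python skips assert statements when run with the -O flag, so checks that must always run in production are better written as explicit if/raise statements.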