LangChain - DrAlzahraniProjects/csusb_fall2024_cse6550_team4 GitHub Wiki
Clone the repository:
git clone https://github.com/DrAlzahraniProjects/csusb_fall2024_cse6550_team4.git
Change directory to the cloned repository:
cd csusb_fall2024_cse6550_team4
In the repository, open the Dockerfile, which installs LangChain and its dependencies.
Mamba is used to install all the necessary packages. First, install Mamba from conda-forge:
RUN conda install -c conda-forge mamba -y
Create a Mamba environment named team4_env with Python 3.11:
RUN mamba create -n team4_env python=3.11 -y
Add LangChain and its related packages to requirements.txt:
langchain
langchain-community
langchain-huggingface
langchain-text-splitters
langchain-mistralai
sentence-transformers
transformers
The requirements.txt file is copied into the container, and all dependencies are installed using Mamba.
COPY requirements.txt /app/requirements.txt
Mamba then installs LangChain and its dependencies into team4_env and cleans the package cache:
RUN mamba install --name team4_env --yes --file requirements.txt && mamba clean --all -f -y
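A quick way to confirm that the Mamba install worked is to check, from inside team4_env, that every package in requirements.txt can be imported. The helper below is a hypothetical sketch (missing_modules and REQUIRED_MODULES are not part of the project); it relies only on the standard library's importlib.util.find_spec. Note that package names use hyphens while Python import names use underscores.

```python
import importlib.util

# Import names for the packages listed in requirements.txt
# (hypothetical list; hyphens in package names become underscores).
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_huggingface",
    "langchain_text_splitters",
    "langchain_mistralai",
    "sentence_transformers",
    "transformers",
]


def missing_modules(modules):
    """Return the top-level modules that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Inside the container, missing_modules(REQUIRED_MODULES) should return an empty list; any name it returns points to a package the Mamba step did not install.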
import os
from dotenv import load_dotenv
from pymilvus import connections
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load variables from a .env file (e.g. MISTRAL_API_KEY) into the environment
load_dotenv()
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
def get_embedding_function():
    embedding_function = HuggingFaceEmbeddings(model_name=MODEL_NAME)
    return embedding_function
def create_vector_store(docs, embeddings, uri):
    # Create the directory if it does not exist
    head = os.path.split(uri)
    os.makedirs(head[0], exist_ok=True)
    # Connect to the Milvus database
    connections.connect("default", uri=uri)
    ...
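The os.path.split(uri) call above keeps only the directory part (head[0]) of the URI before creating it. One edge case is worth guarding: if the URI is a bare file name such as "milvus_vector.db", the directory part is an empty string and os.makedirs("") raises FileNotFoundError. A minimal sketch of a guard, using a hypothetical helper (ensure_parent_dir is not in the project):

```python
import os


def ensure_parent_dir(uri):
    """Create the directory that will hold a local Milvus database file.

    Returns the directory path, or "" when the URI has no directory part.
    """
    directory = os.path.dirname(uri)
    # os.makedirs("") raises FileNotFoundError, so skip bare file names.
    if directory:
        os.makedirs(directory, exist_ok=True)
    return directory
```

With the project's MILVUS_URI of "./milvus/milvus_vector.db", this creates ./milvus and returns it; exist_ok=True makes repeated calls harmless.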
retriever = ScoreThresholdRetriever(
    vector_store=vector_store,
    score_threshold=0.2,  # Minimum score threshold
    k=3,  # Retrieve top 3 documents
)
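ScoreThresholdRetriever is not imported in any snippet on this page and does not appear to be a LangChain class, so it is presumably defined elsewhere in the project. Its arguments suggest the intended behavior: discard documents scoring below 0.2 and return at most the 3 best. LangChain vector stores offer a comparable built-in option, vector_store.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.2, "k": 3}). The pure-Python sketch below only illustrates that assumed filtering logic; filter_by_score is hypothetical and assumes a higher score means a more relevant document.

```python
def filter_by_score(scored_docs, score_threshold=0.2, k=3):
    """Keep documents whose relevance score meets the threshold, best first.

    scored_docs is a list of (document, score) pairs where a higher score
    means more relevant; at most k documents are returned.
    """
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]
```

Lowering score_threshold lets weaker matches through (useful when the retriever returns nothing); raising it trades recall for precision.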
def initialize_milvus(uri, documents):
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vector_store = Milvus.from_documents(
        documents=documents,
        embedding=embeddings,
        collection_name="research_paper_chatbot",
        connection_args={"uri": uri},
    )
    return vector_store
PROMPT_TEMPLATE = """
Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
Only use the information provided in the <context> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>
<question>
{input}
</question>
The response should be specific and use statistics or numbers when possible.
Assistant:"""
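The {context} and {input} placeholders are filled at query time: create_stuff_documents_chain "stuffs" the text of every retrieved document into {context} (joined by a blank line by default), and create_retrieval_chain passes the user's question in as {input}. The plain-Python function below approximates that substitution for illustration only; fill_prompt is hypothetical and skips LangChain's prompt and message handling.

```python
def fill_prompt(template, doc_texts, question, separator="\n\n"):
    """Approximate the prompt the stuff-documents chain sends to the model.

    All retrieved document texts are joined into one string for {context};
    the user's question fills {input}.
    """
    context = separator.join(doc_texts)
    return template.format(context=context, input=question)
```

Because the template uses format-style placeholders, any literal curly braces added to PROMPT_TEMPLATE would need to be doubled ({{ and }}).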
Split documents into smaller, manageable chunks for efficient indexing. Example:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,  # Maximum chunk length, in characters
    chunk_overlap=200,  # Characters shared between consecutive chunks
    is_separator_regex=False,  # Treat separators as literal strings, not regexes
)
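chunk_size caps the length of each chunk (in characters, since RecursiveCharacterTextSplitter measures length with len by default) and chunk_overlap repeats the tail of one chunk at the start of the next, so a sentence straddling a boundary still appears whole in at least one chunk. The sketch below shows only the size/overlap arithmetic with a plain sliding window; sliding_window_chunks is hypothetical and simpler than the real splitter, which also prefers to break on separators such as paragraphs, lines, and spaces.

```python
def sliding_window_chunks(text, chunk_size=2000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores
    separators and cuts purely by character count.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the settings above (2000/200), each chunk starts about 1,800 characters after the previous one, so a 5,000-character document yields three chunks in this simplified model.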
model = ChatMistralAI(model='open-mistral-7b', api_key=MISTRAL_API_KEY, temperature=0.2)
print("Model Loaded")
prompt = create_prompt()
document_chain = create_stuff_documents_chain(model, prompt)
print("Document Chain Created")
retrieval_chain = create_retrieval_chain(retriever, document_chain)
print("Retrieval Chain Created")
# Get relevant documents (for logging; the retrieval chain fetches them again itself)
relevant_docs = retriever.get_relevant_documents(query)
print(f"Relevant Documents: {relevant_docs}")
# Generate response
response = retrieval_chain.invoke({"input": query})
response_text = response.get("answer", "No answer found.")
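For each query, the retrieval chain runs three steps: the retriever fetches documents, the document chain stuffs them into the prompt, and the model generates the answer. The chain returns a dictionary (with "input", "context", and "answer" keys), which is why the code reads the answer with response.get("answer", ...). A dependency-free outline with a stubbed retriever and model follows; run_retrieval_chain is hypothetical, not a LangChain function.

```python
def run_retrieval_chain(query, retrieve, generate, template, separator="\n\n"):
    """Hypothetical outline of a retrieve-then-answer chain.

    retrieve: callable returning a list of document texts for the query.
    generate: callable taking the filled prompt and returning the answer text.
    The keys mirror create_retrieval_chain's output, but "context" here holds
    plain strings rather than LangChain Document objects.
    """
    docs = retrieve(query)  # step 1: retrieval
    prompt = template.format(context=separator.join(docs), input=query)  # step 2: stuffing
    return {"input": query, "context": docs, "answer": generate(prompt)}  # step 3: generation
```

Swapping the stubs for a real retriever and a ChatMistralAI call is exactly what create_retrieval_chain and create_stuff_documents_chain handle in the project code.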
import os
from dotenv import load_dotenv
load_dotenv()
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")
Usage: Loads environment variables from a .env file, if one is present, and reads MISTRAL_API_KEY, the key used to authenticate requests to the Mistral API.
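os.getenv returns None rather than raising when a variable is missing, so a missing key is not reported at this point. A small fail-fast check can surface the problem immediately; require_env is a hypothetical helper that could be used as MISTRAL_API_KEY = require_env("MISTRAL_API_KEY") after load_dotenv().

```python
import os


def require_env(name):
    """Return an environment variable's value, or fail with a clear message.

    Call after load_dotenv() so values from a .env file are visible.
    """
    value = os.getenv(name)
    if not value:  # treats both a missing and an empty variable as an error
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or the container environment."
        )
    return value
```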
MILVUS_URI = "./milvus/milvus_vector.db"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
from langchain_huggingface import HuggingFaceEmbeddings
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
def get_embedding_function():
    """
    Return the embedding function for the model.
    """
    embedding_function = HuggingFaceEmbeddings(model_name=MODEL_NAME)
    return embedding_function
Usage: This code creates an embedding function using a pre-trained model from Hugging Face, transforming text into vector representations for later retrieval.
def query_rag(query):
    # Define the model
    model = ChatMistralAI(model='open-mistral-7b', api_key=MISTRAL_API_KEY, temperature=0.2)
    print("Model Loaded")
    ...
Usage: This initializes a conversational AI model and creates a document chain that combines the model with the defined prompt for generating answers.
PROMPT_TEMPLATE = """
Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
Only use the information provided in the <context> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>
<question>
{input}
</question>
The response should be specific and use statistics or numbers when possible.
Assistant:"""
document_chain = create_stuff_documents_chain(model, prompt)
print("Document Chain Created")
Usage: Combines the AI model and the prompt template to create a document chain for generating responses.
retrieval_chain = create_retrieval_chain(retriever, document_chain)
print("Retrieval Chain Created")
Usage: Establishes a retrieval chain that connects the retriever with the document chain for effective document retrieval and answer generation.
Adjust Score Threshold: Lower the retriever's threshold to return more (but less similar) documents, or raise it to return fewer, more relevant ones:
retriever = ScoreThresholdRetriever(vector_store=vector_store, score_threshold=0.1, k=3)
HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
curl -H "Authorization: Bearer <API_KEY>" https://api.mistral.ai/v1/models
print([doc.page_content for doc in relevant_docs])
curl -H "Authorization: Bearer $MISTRAL_API_KEY" https://api.mistral.ai/v1/models
Symptom: Errors such as "Error during similarity search".
Cause: Issues with the embeddings or the document structure.
Fix: Use the same embedding model for indexing and for querying:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
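A frequent cause of similarity-search errors is querying a collection with vectors from a different embedding model than the one used to build it: a Milvus collection's vector field has a fixed dimension, and all-MiniLM-L6-v2 produces 384-dimensional vectors. A hedged sketch of a sanity check follows; check_embedding_dim and EXPECTED_DIM are hypothetical names, and in the app it could be called as check_embedding_dim(embeddings.embed_query("test")).

```python
EXPECTED_DIM = 384  # output size of sentence-transformers/all-MiniLM-L6-v2


def check_embedding_dim(vector, expected_dim=EXPECTED_DIM):
    """Raise if a vector's length differs from the collection's dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions but the collection expects "
            f"{expected_dim}; was the collection built with a different model?"
        )
    return True
```

If the check fails after changing models, dropping and rebuilding the collection with the new embeddings is usually simpler than trying to mix vector sizes.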
- API Key Issues: Ensure your API keys are correct and have necessary permissions.
- Dependency Conflicts: Manage your libraries within virtual environments to avoid version conflicts.
- Memory Issues: If the model fails to remember context, check your memory implementation.
- Log Outputs: Print intermediate outputs to understand where issues are arising.
- Check API Limits: Be aware of any rate limits imposed by the APIs you are using.
- Use Assertions: Implement assertions in your code to catch unexpected behavior early.
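As an example of the last tip, assertions can check retrieved documents before they reach the model, catching empty chunks or an unexpectedly large result set early. The sketch below uses a minimal hypothetical Doc dataclass standing in for a LangChain Document (which also exposes page_content and metadata); validate_retrieved is likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def validate_retrieved(docs, k_max=3):
    """Assert basic invariants on retrieved documents before prompting the model."""
    assert len(docs) <= k_max, f"expected at most {k_max} documents, got {len(docs)}"
    for i, doc in enumerate(docs):
        assert doc.page_content.strip(), f"document {i} has empty page_content"
    return docs
```

Python skips assert statements when run with the -O flag, so these checks suit development and debugging rather than input validation in production.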