Section 3: Image Analysis and RAG
Activity 1 — Images 20 Questions! (section-3/bulk_image_analysis.py)
Important: Before starting, run `git pull` in the `dpi-681` folder to ensure you have the latest version of the `section-3` folder. This is critical to avoid issues with missing or outdated files. You won't have a `section-3` folder unless you run `git pull`!
As a reminder, open a terminal in the dpi-681 folder like we did together at the start of class. This is where you should run `git pull` from.
Installing required packages for today
Before doing anything today, we have to update our packages and install new ones. Run:
On Windows
pip install --upgrade openai
pip install faiss-cpu
Mac/Linux
pip3 install --upgrade openai
pip3 install faiss-cpu
Activity Overview
What You'll Do
In this activity, we will work with the `section-3/bulk_image_analysis.py` script. Our goal is to learn how to query the OpenAI API with both text and image inputs. I've compiled a small dataset of famous photos from the past year or two. Your task is to try to uncover their common theme!
To begin, let's change into the section-3 directory; as a reminder, that is `cd section-3`. Your terminal should look like this:
Remember to set your OpenAI API Key at the top of the file!
Designing the Prompt
To design the prompt, change the code on line 29:
{"type": "input_text", "text": "what's in this image?"}
Experiment with different phrasing and details to see if you can receive more informative responses.
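For example, you might swap in a more targeted prompt like this one (just one possibility; any phrasing is worth trying):
{"type": "input_text", "text": "Describe this image in one sentence, then name any people, events, or locations you recognize."}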
Important: Every time you change the prompt, make sure you save the file before running the script again! Otherwise your new prompt won't be used.
After changing the prompt and saving the file, run
Mac/Linux
python3 bulk_image_analysis.py
Windows
python bulk_image_analysis.py
If it looks like this, you've got it!
Common Errors
Forgot API Key
To fix it, ensure you've added your API key at the top of the file.
File not found error
To fix it, ensure you've changed into the correct directory! As a reminder, that is
cd section-3
Script Explanation
Below is a step-by-step breakdown of what the script does:
1. Importing Required Libraries
The script begins by importing three key libraries:
- `pandas`: For handling CSV file operations and data manipulation.
- `OpenAI` (from the `openai` library): To interact with the API.
- `tqdm`: To visually track the progress of image processing.
Example:
import pandas as pd
from openai import OpenAI
from tqdm import tqdm
2. Loading the CSV File
A CSV file containing image details is loaded. This CSV must have at least two columns: image_id and url.
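For reference, the first few rows of such a CSV might look like this (the URLs below are placeholders, not the actual dataset):
image_id,url
1,https://example.com/photo_one.jpg
2,https://example.com/photo_two.jpg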
The CSV file path is set:
csv_file = "./images.csv"
The CSV file is then read into a DataFrame:
df = pd.read_csv(csv_file)
3. Initializing the OpenAI Client
Before making API requests, the OpenAI client is initialized with your API key:
client = OpenAI(api_key="YOUR API KEY HERE")
Ensure you replace "YOUR API KEY HERE" with your actual API key.
4. Iterating API Requests Over the Images
A list is created to store responses for each image. The script iterates over each row in the DataFrame using a loop that integrates a progress bar provided by tqdm:
for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Processing Images"):
image_id = row["image_id"]
image_url = row["url"]
# Make the API query that includes both text and image input
response = client.responses.create(
model="gpt-4o-mini",
input=[{
"role": "user",
"content": [
{"type": "input_text", "text": "what's in this image?"},
{"type": "input_image", "image_url": image_url},
],
}],
)
For each image:
- `image_id` and `image_url` are extracted.
- An API request is made using both text (our prompt!) and the image URL, passed as a variable.
5. Extracting and Storing the Response
After receiving the API response, the output text is extracted:
output_text = response.output_text
and then printed for you to analyze:
print("Image ID:", image_id)
print("Model Response: ", output_text)
The script then saves this along with image_id and image_url in a results list.
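In code, that step looks roughly like this (the exact key names are my guess, so check the script itself):
results.append({
    "image_id": image_id,       # which image this response belongs to
    "url": image_url,           # the image's URL
    "response": output_text,    # the model's answer
})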
6. Saving the Results
Finally, the list of results is converted into a DataFrame and written to a CSV file:
results_df = pd.DataFrame(results)
results_df.to_csv("./images_analysis_results.csv", index=False)
A message is printed to indicate that the analysis is complete:
print("Image analysis complete. Results saved to images_analysis_results.csv")
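If you'd rather review all of the responses in one place instead of scrolling back through your terminal, one option is to load the results CSV back into pandas:
import pandas as pd

saved = pd.read_csv("./images_analysis_results.csv")
print(saved.head())  # first few rows: image IDs, URLs, and model responses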
Activity 2 — Legal Assistant with RAG (rag_example.py)
Important: Ensure you've placed your API key in the script at the top, where `client = "YOUR API KEY HERE"` is defined. This script requires that you have already built the FAISS index and metadata file; if these files are missing, build the vector database first.
Installing Required Packages
If you haven't already, install the necessary packages using the commands below.
On Windows
pip install --upgrade openai
pip install faiss-cpu
On Mac/Linux
pip3 install --upgrade openai
pip3 install faiss-cpu
Activity Overview
What You'll Do
In this activity, you will work with a script that allows you to ask questions about Massachusetts real-estate law. The script uses Retrieval Augmented Generation (RAG) to retrieve context from a pre-built FAISS index and generate responses via the OpenAI API.
Make sure to:
- Set your OpenAI API key in the script.
- Run the script from the folder containing the FAISS index and metadata.
Script Explanation
Below is a step-by-step breakdown of what the script does:
1. Defining Key Variables and Constants
Several key variables are defined at the top of the script:
- `client`: A placeholder for your API key. The 'head' of our API request, as we've discussed.
- `BASE_SYSTEM_PROMPT`: The base prompt that instructs the model to act as a legal assistant for Massachusetts real-estate law queries.
- `FAISS_INDEX_FILE` and `METADATA_FILE`: The file paths to the FAISS index and JSON metadata. You should have gotten these files from `git pull`.
- Other configuration parameters such as `TOP_K` (how many different parts of the legal code to return), `EMBEDDING_MODEL`, and `MODEL`.
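For orientation, these definitions might look something like the sketch below. The specific values (the model names, the value of TOP_K, the file names) are illustrative guesses on my part; check rag_example.py for the real ones.
from openai import OpenAI

client = OpenAI(api_key="YOUR API KEY HERE")   # paste your key here

BASE_SYSTEM_PROMPT = (
    "You are a legal assistant answering questions about "
    "Massachusetts real-estate law..."         # abridged
)

FAISS_INDEX_FILE = "legal_index.faiss"         # hypothetical file name
METADATA_FILE = "legal_metadata.json"          # hypothetical file name

TOP_K = 5                                      # legal-code chunks to retrieve per query
EMBEDDING_MODEL = "text-embedding-3-small"     # assumed embedding model
MODEL = "gpt-4o-mini"                          # assumed chat model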
2. Loading FAISS Index and Metadata
Example code snippet:
faiss_index = faiss.read_index(FAISS_INDEX_FILE)
with open(METADATA_FILE, 'r', encoding='utf-8') as f:
    metadata = json.load(f)
3. Generating an Embedding for a Given Text
A function named get_embedding sends text to the API to generate an embedding using the specified embedding model. Note that the input is truncated to its first 8,150 characters, a rough safeguard against the embedding model's input limit. The resulting embedding is converted to a NumPy array for further processing.
Function definition
def get_embedding(text: str):
response = client.embeddings.create(
input=text[:8150],
model=EMBEDDING_MODEL
)
embedding = response.data[0].embedding
return np.array(embedding, dtype=np.float32)
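To get a feel for what this returns, you could call it on a short question. The shape comment below assumes an OpenAI embedding model like text-embedding-3-small, which produces 1536-dimensional vectors; a different EMBEDDING_MODEL would give a different length:
emb = get_embedding("What notice must a landlord give before raising the rent?")
print(emb.shape)  # e.g., (1536,) with text-embedding-3-small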
4. Retrieving Context from the FAISS Index
The retrieve_context function generates an embedding for the user query, searches the FAISS index for the closest matching documents, and then retrieves the full text of those documents. For each document, it constructs a citation in the format (Chapter [Chapter] Section [Section], [link]) and includes a snippet of the document text.
def retrieve_context(query: str, top_k: int = TOP_K):
query_emb = get_embedding(query)
if query_emb is None:
return ""
query_emb = np.expand_dims(query_emb, axis=0)
distances, indices = faiss_index.search(query_emb, top_k)
retrieved = []
for idx in indices[0]:
if idx < len(metadata):
doc = metadata[idx]
citation = f"(Chapter {doc.get('chapter', 'Unknown')} Section {doc.get('section', 'Unknown')}, {doc.get('link', 'No link')})"
snippet = doc.get('full_text', '').replace("\n", " ")
retrieved.append(f"{citation}: {snippet}")
if retrieved:
return "Retrieved context:\n" + "\n".join(retrieved) + "\n"
return ""
5. Formulating the API Request and Generating a Response
The make_query function appends the retrieved context to the base system prompt, sets up the conversation messages, and sends a chat completion request to the OpenAI API. Once a response is received, the content of the response is printed.
def make_query(user_input):
retrieved_context = retrieve_context(user_input)
full_system_prompt = BASE_SYSTEM_PROMPT + "\n" + retrieved_context
messages = [
{"role": "system", "content": full_system_prompt},
{"role": "user", "content": user_input}
]
response = client.chat.completions.create(
model=MODEL,
messages=messages
)
full_response = response.choices[0].message.content
print("\nResponse:\n", full_response)
6. Main Function and CLI Interaction
The main function welcomes the user, informs them about the purpose of the chatbot, and waits for the user to input a question regarding Massachusetts real-estate law. If no question is provided, the script exits. Otherwise, it calls make_query to process the query and display the response.
Example code snippet:
def main():
print("Welcome to the RAG-enabled ChatGPT CLI!")
print("You can ask questions about Massachusetts real-estate law.")
print("Note: This chatbot is NOT a lawyer and cannot provide legal advice.")
user_input = input("Enter your Massachusetts real-estate law question:\n> ").strip()
if not user_input:
print("No question entered. Exiting.")
return
make_query(user_input)
if __name__ == "__main__":
main()
This script integrates vector search using FAISS with the OpenAI API to help non-lawyers get contextualized legal information about Massachusetts real-estate law in plain language.