Section 3: Image Analysis and RAG

Activity 1 — Images 20 Questions! (section-3/bulk_image_analysis.py)

Important: Before starting, run git pull in the dpi-681 folder to ensure you have the latest version of the section-3 folder. This is critical to avoid issues with missing or outdated files. You won't have a "section-3" folder unless you run git pull!

As a reminder, open a terminal in the dpi-681 folder like we did together at the start of class. This is where you should run git pull from.

Installing Required Packages for Today

Before doing anything today, we have to update our packages and download new ones. Run the commands below for your operating system.

On Windows

pip install --upgrade openai
pip install faiss-cpu

On Mac/Linux

pip3 install --upgrade openai
pip3 install faiss-cpu

Activity Overview

What You'll Do

In this activity, we will work with section-3/bulk_image_analysis.py. Our goal is to learn how to query the OpenAI API with both text and image inputs. I've compiled a small dataset of famous photos from the past year or two. Your task is to uncover their common theme!

To begin, let's change into the section-3 directory; as a reminder, that is cd section-3. Your terminal should look like this:

Remember to set your OpenAI API Key at the top of the file!

Designing the Prompt:
To change the prompt, edit the code on line 29:

{"type": "input_text", "text": "what's in this image?"}

Experiment with different phrasing and details to see if you can receive more informative responses.
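
For example, a more pointed prompt (the wording here is just a suggestion, not the "right" answer) might be:

{"type": "input_text", "text": "Describe this image in one sentence, then guess the event it depicts and roughly when it happened."}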

Important: Every time you change the prompt, make sure you save the file before running the script again! Otherwise your new prompt won't be used.

After changing the prompt and saving the file, run:

Mac/Linux

python3 bulk_image_analysis.py

Windows

python bulk_image_analysis.py

If it looks like this, you've got it!


Common Errors

Forgot API Key

To fix it, ensure you've added your API key at the top of the file.

File not found error

To fix it, ensure you've cd'd into the correct directory! As a reminder, that is

cd section-3

Script Explanation

Below is a step-by-step breakdown of what the script does:

1. Importing Required Libraries

The script begins by importing three key libraries:

  • pandas: For handling CSV file operations and data manipulation.
  • OpenAI (from the OpenAI library): To interact with the API.
  • tqdm: To visually track the progress of image processing.

Example:

import pandas as pd
from openai import OpenAI
from tqdm import tqdm

2. Loading the CSV File

A CSV file containing image details is loaded. This CSV must have at least two columns: image_id and url.

The CSV file path is set:

csv_file = "./images.csv"

The CSV file is then read into a DataFrame:

df = pd.read_csv(csv_file)
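
For reference, a minimal images.csv with those two columns might look like this (the IDs and URLs below are placeholders, not the actual dataset):

image_id,url
1,https://example.com/photo_1.jpg
2,https://example.com/photo_2.jpg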

3. Initializing the OpenAI Client

Before making API requests, the OpenAI client is initialized with your API key:

client = OpenAI(api_key="YOUR API KEY HERE")

Ensure you replace "YOUR API KEY HERE" with your actual API key.

4. Iterating API Requests Over the Images

A list is created to store responses for each image. The script iterates over each row in the DataFrame using a loop that integrates a progress bar provided by tqdm:

results = []  # will hold one record per image

for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Processing Images"):
    image_id = row["image_id"]
    image_url = row["url"]

    # Make the API query that includes both text and image input
    response = client.responses.create(
        model="gpt-4o-mini",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {"type": "input_image", "image_url": image_url},
            ],
        }],
    )

For each image:

  • image_id and image_url are extracted.
  • An API request is made using both text (our prompt!) and the image URL, passed as a variable.
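
Note: if any single image URL is unreachable, the API call will raise an exception and stop the whole loop. An optional guard (not part of the shipped script) is to wrap the request and record the error instead:

try:
    response = client.responses.create(
        model="gpt-4o-mini",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {"type": "input_image", "image_url": image_url},
            ],
        }],
    )
    output_text = response.output_text
except Exception as e:  # bad URL, network hiccup, rate limit, etc.
    output_text = f"ERROR: {e}"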

5. Extracting and Storing the Response

After receiving the API response, the output text is extracted:

output_text = response.output_text

and then printed for you to analyze:

print("Image ID:", image_id)
print("Model Response: ", output_text)

The script then saves this along with image_id and image_url in a results list.
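
A minimal sketch of that bookkeeping (the exact key names in your copy of the script may differ):

results.append({
    "image_id": image_id,
    "url": image_url,
    "model_response": output_text,
})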

6. Saving the Results

Finally, the list of results is converted into a DataFrame and written to a CSV file:

results_df = pd.DataFrame(results)
results_df.to_csv("./images_analysis_results.csv", index=False)

A message is printed to indicate that the analysis is complete:

print("Image analysis complete. Results saved to images_analysis_results.csv")

Activity 2 — Legal Assistant with RAG (rag_example.py)

Important: Ensure you've placed your API key at the top of the script, where the client is initialized with "YOUR API KEY HERE". This script requires that you have already built the FAISS index and metadata file; if these files are missing, build the vector database first.

Installing Required Packages

If you haven't already, install the necessary packages using the commands below.

On Windows

pip install --upgrade openai
pip install faiss-cpu

On Mac/Linux

pip3 install --upgrade openai
pip3 install faiss-cpu

Activity Overview

What You'll Do

In this activity, you will work with a script that allows you to ask questions about Massachusetts real-estate law. The script uses Retrieval Augmented Generation (RAG) to retrieve context from a pre-built FAISS index and generate responses via the OpenAI API.

Make sure to:

  • Set your OpenAI API key in the script.
  • Run the script from the folder containing the FAISS index and metadata.

Script Explanation

Below is a step-by-step breakdown of what the script does:

1. Defining Key Variables and Constants

Several key variables are defined at the top of the script:

  • client: The OpenAI client, initialized with your API key; the 'head' of our API requests, as we've discussed.
  • BASE_SYSTEM_PROMPT: The base prompt that instructs the model to act as a legal assistant for Massachusetts real-estate law queries.
  • FAISS_INDEX_FILE and METADATA_FILE: The file paths to the FAISS index and JSON metadata. You should have gotten these files from git pull.
  • Other configuration parameters such as TOP_K (how many different parts of the legal code to return), EMBEDDING_MODEL, and MODEL.
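
Put together, the top of the script looks roughly like this (the file names and values below are illustrative assumptions; your copy of the script defines the real ones):

from openai import OpenAI

client = OpenAI(api_key="YOUR API KEY HERE")

BASE_SYSTEM_PROMPT = "You are a legal assistant for Massachusetts real-estate law questions."  # paraphrased
FAISS_INDEX_FILE = "./faiss_index.bin"      # assumed file name; use the one from git pull
METADATA_FILE = "./metadata.json"           # assumed file name; use the one from git pull
TOP_K = 5                                   # number of legal-code passages to retrieve
EMBEDDING_MODEL = "text-embedding-3-small"  # assumed embedding model
MODEL = "gpt-4o-mini"                       # assumed chat model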

2. Loading the FAISS Index and Metadata

The script then loads the pre-built FAISS index and the JSON metadata that maps each vector back to its source passage. Example code snippet:

import faiss
import json

faiss_index = faiss.read_index(FAISS_INDEX_FILE)
with open(METADATA_FILE, 'r', encoding='utf-8') as f:
    metadata = json.load(f)
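
An optional sanity check (not in the script): the number of vectors in the index should match the number of metadata records.

assert faiss_index.ntotal == len(metadata), "Index and metadata are out of sync"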

3. Generating an Embedding for a Given Text

A function named get_embedding sends a text to the API to generate an embedding using the specified embedding model. The resulting embedding is converted to a NumPy array for further processing.

Function definition

import numpy as np

def get_embedding(text: str):
    # Request an embedding for the text, truncated to stay under the model's input limit
    response = client.embeddings.create(
        input=text[:8150],
        model=EMBEDDING_MODEL
    )
    embedding = response.data[0].embedding
    # FAISS expects float32 vectors
    return np.array(embedding, dtype=np.float32)
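
A quick way to sanity-check the function (assuming your API key is set) is to embed a short string and inspect the vector's shape:

emb = get_embedding("What is a quitclaim deed?")
print(emb.shape)  # (1536,) if EMBEDDING_MODEL is text-embedding-3-small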

4. Retrieving Context from the FAISS Index

The retrieve_context function generates an embedding for the user query, searches the FAISS index for the closest matching documents, and then retrieves the full text of those documents. For each document, it constructs a citation in the format (Chapter [Chapter] Section [Section], [link]) and includes a snippet of the document text.

def retrieve_context(query: str, top_k: int = TOP_K):
    # Embed the user's query with the same model used to build the index
    query_emb = get_embedding(query)
    if query_emb is None:
        return ""
    # FAISS expects a 2D array: one row per query
    query_emb = np.expand_dims(query_emb, axis=0)
    distances, indices = faiss_index.search(query_emb, top_k)

    retrieved = []
    for idx in indices[0]:
        if idx < len(metadata):
            doc = metadata[idx]
            # Build a human-readable citation, then attach the passage text
            citation = f"(Chapter {doc.get('chapter', 'Unknown')} Section {doc.get('section', 'Unknown')}, {doc.get('link', 'No link')})"
            snippet = doc.get('full_text', '').replace("\n", " ")
            retrieved.append(f"{citation}: {snippet}")

    if retrieved:
        return "Retrieved context:\n" + "\n".join(retrieved) + "\n"
    return ""

5. Formulating the API Request and Generating a Response

The make_query function appends the retrieved context to the base system prompt, sets up the conversation messages, and sends a chat completion request to the OpenAI API. Once a response is received, the content of the response is printed.

def make_query(user_input):
    # Retrieve relevant passages and fold them into the system prompt
    retrieved_context = retrieve_context(user_input)
    full_system_prompt = BASE_SYSTEM_PROMPT + "\n" + retrieved_context
    messages = [
        {"role": "system", "content": full_system_prompt},
        {"role": "user", "content": user_input}
    ]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages
    )
    full_response = response.choices[0].message.content
    print("\nResponse:\n", full_response)

6. Main Function and CLI Interaction

The main function welcomes the user, informs them about the purpose of the chatbot, and waits for the user to input a question regarding Massachusetts real-estate law. If no question is provided, the script exits. Otherwise, it calls make_query to process the query and display the response.

Example code snippet:

def main():
    print("Welcome to the RAG-enabled ChatGPT CLI!")
    print("You can ask questions about Massachusetts real-estate law.")
    print("Note: This chatbot is NOT a lawyer and cannot provide legal advice.")
    user_input = input("Enter your Massachusetts real-estate law question:\n> ").strip()
    if not user_input:
        print("No question entered. Exiting.")
        return
    make_query(user_input)

if __name__ == "__main__":
    main()

This script combines FAISS vector search with the OpenAI API to give non-lawyers contextualized, plain-text information about Massachusetts real-estate law.