# Section 5: Data Processing Revisited, Advanced API Usage

## OCR PDF Extraction Setup Guide
This guide will walk you through all the steps necessary to extract text from a PDF using OCR with Python. It is designed to help both newcomers and experienced users set up the environment correctly. You can skip steps if you have already completed them.
### Step 1: Install Required Python Packages
For our OCR script, you need two key Python packages: `pdf2image` to convert PDF pages to images, and `pytesseract` for performing OCR on those images.
1. Open your terminal (or Command Prompt on Windows) and navigate to your project folder.

2. Run the installation commands:

   - Mac/Linux:

     ```bash
     pip3 install pdf2image pytesseract
     ```

   - Windows:

     ```bash
     pip install pdf2image pytesseract
     ```
### Step 2: Install Tesseract OCR Engine
Tesseract is the OCR engine used by `pytesseract`. Install it based on your operating system:
- Ubuntu:

  ```bash
  sudo apt-get update
  sudo apt-get install tesseract-ocr
  ```

- macOS:

  1. Install Homebrew (if not already installed):
     Visit the Homebrew website for installation instructions. After installing, you must add Homebrew to your PATH. For example, add the following lines to your profile (e.g., `/Users/your_username/.zprofile`):

     ```bash
     echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> /Users/your_username/.zprofile
     eval "$(/opt/homebrew/bin/brew shellenv)"
     ```

     The Homebrew installer will tell you exactly what to type in its "Next Steps" output.

  2. Install `tesseract` (for OCR) and `poppler` (for PDF-to-image conversion):

     ```bash
     brew install tesseract
     brew install poppler
     ```

- Windows:

  1. Download the installer:
     Visit the Tesseract at UB Mannheim page to download the Windows installer.
  2. Run the installer:
     Execute the downloaded installer and follow the on-screen instructions.
  3. Important:
     Ensure Tesseract's executable is added to your system's PATH so that `pytesseract` can locate it.
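If `pytesseract` still cannot find Tesseract on Windows, you can point it at the executable directly. This is a minimal sketch; the path below is Tesseract's default install location and may differ on your machine:

```python
import pytesseract

# Tell pytesseract where tesseract.exe lives (adjust the path if you
# installed Tesseract somewhere else).
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```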
### Step 3: Open Your Project and Run the Script
1. Navigate to the Project Directory:
   Instead of creating a new project directory, open the existing `section-5` folder that contains your project files.

2. Locate the Script:
   Inside the `section-5` folder, find the `read_pdf.py` script. This file contains the code to convert the PDF pages into images and perform OCR.

3. Run the Script:
   Open your terminal (or command prompt on Windows), navigate to the `section-5` directory, and run the script:

   - macOS/Linux:

     ```bash
     cd /section-5
     python3 read_pdf.py
     ```

   - Windows:

     ```bash
     cd \section-5
     python read_pdf.py
     ```

4. Review the Output:
   The script will process the `tealbook.pdf` file (which should be located in the `section-5` folder) and display the extracted text in your terminal. We will talk through ways to make this more useful.
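For reference, here is a minimal sketch of the kind of code `read_pdf.py` contains, assuming `tealbook.pdf` sits in the same folder; the actual course script may differ in its details:

```python
from pdf2image import convert_from_path  # requires poppler to be installed
import pytesseract                       # requires tesseract to be installed

# Convert each PDF page into a PIL image.
pages = convert_from_path("tealbook.pdf")

# Run OCR on each page image and print the extracted text.
for i, page in enumerate(pages, start=1):
    text = pytesseract.image_to_string(page)
    print(f"--- Page {i} ---")
    print(text)
```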
## Tool Calling with the Weather Bot
Now that you've practiced extracting structured information from documents, we can start building intelligent tools that interact with live data. This example shows how to use OpenAI's function calling feature to make an AI assistant that can look up the weather based on the user's request.
### What This Script Does (`section-5/weather_agent.py`)
This Python script demonstrates how to use OpenAI’s tool calling feature to create an intelligent assistant that:
1. Accepts a natural language query from the user — such as “What’s the temperature in Boston right now?”

2. Decides whether to call a tool — in this case, a tool named `get_weather` that can look up the current temperature using latitude and longitude.

3. Automatically extracts the arguments the tool needs — like `"latitude": "42.36", "longitude": "-71.06"` — using the LLM’s understanding of the user’s intent.

4. Calls your custom function `get_weather()` — which uses the Open-Meteo API to retrieve real-time weather data.

5. Returns the weather result to the model, so it can incorporate it into a complete and natural-sounding response to the user.
### Key Parts of the Script
- Tool Schema:
  This defines the function the model can call — what it’s called (`get_weather`), what arguments it expects (`latitude`, `longitude`), and how it’s structured.

- OpenAI `responses.create()`:
  This is the main API call that sends the user's message and the available tool schema to the model. The model will choose whether to call the tool.

- Tool Call Handling:
  If the model calls a tool, your script detects it, parses the arguments, executes your local Python function, and sends the result back to the model.

- Natural Language Output:
  The model then responds using the tool result, e.g., "The current temperature in Boston is 14°C." If no tool is needed (e.g., “Who is the president of France?”), the model will just respond normally.
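Putting those pieces together, here is a condensed sketch of the flow, assuming the OpenAI Responses API and the Open-Meteo forecast endpoint. The model name is a placeholder, and the actual `weather_agent.py` may differ in its details:

```python
import json
import requests
from openai import OpenAI

client = OpenAI()

def get_weather(latitude, longitude):
    """Fetch the current temperature (°C) from the Open-Meteo API."""
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": latitude, "longitude": longitude,
                "current": "temperature_2m"},
    )
    return resp.json()["current"]["temperature_2m"]

# Tool schema: tells the model what get_weather is and what it expects.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current temperature for given coordinates.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"},
        },
        "required": ["latitude", "longitude"],
    },
}]

input_messages = [{"role": "user",
                   "content": "What's the temperature in Boston right now?"}]

# First call: the model decides whether to use the tool.
response = client.responses.create(model="gpt-4o", input=input_messages,
                                   tools=tools)

for item in response.output:
    if item.type == "function_call" and item.name == "get_weather":
        args = json.loads(item.arguments)  # model-extracted arguments
        result = get_weather(**args)       # run your local function
        # Send the tool call and its result back so the model can
        # phrase a final, natural-language answer.
        input_messages.append(item)
        input_messages.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(result),
        })

final = client.responses.create(model="gpt-4o", input=input_messages,
                                tools=tools)
print(final.output_text)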
## Advanced Tool Calling: Sentiment Plotter Agent
In this version of our tool-calling system, we go beyond a single API call and create a multi-tool agent that can:
- Retrieve live news articles from the Guardian API
- Ask the model to analyze their sentiment on a 1–10 scale
- Save the structured sentiment results to a CSV
- Generate a plot showing sentiment over time
### What This Script Does
This more advanced assistant reasons through multiple steps. It chooses when to call tools, chains tool outputs into new inputs, and uses those results to answer complex questions. For example:
"Generate me a plot of the sentiment of the 50 most recent articles about Harvard."
Here’s what happens under the hood:
1. The model calls `get_news_articles()`, providing a search query like `"harvard"` and an output path like `"harvard_articles.csv"`.

2. You fetch the articles from the Guardian API and save them as a CSV. The article `date` and `bodyText` fields are preserved.

3. The agent is given the list of articles (with datetime + text) and instructed to call `analyze_sentiment()` on them.

4. The model rates each article’s sentiment on a scale of 1 (very negative) to 10 (very positive) and returns the results in a structured format.

5. The sentiment results are saved to a CSV, ready for visualization.

6. Finally, the model calls `graph_data()`, specifying which CSV to use, what the x/y axes should be (e.g., datetime vs. sentiment), and how to label the graph.

7. A line graph is generated, showing sentiment trends over time.
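To make steps 1–2 concrete, here is a hedged sketch of what a `get_news_articles()` tool could look like. It assumes the Guardian Content API with a key stored in a `GUARDIAN_API_KEY` environment variable; the field names follow the Guardian's documented response shape, but the course script may organize this differently:

```python
import csv
import os
import requests

def get_news_articles(query: str, output_path: str, n: int = 50) -> str:
    """Fetch the n most recent Guardian articles matching `query` and
    save their publication date and body text to a CSV."""
    resp = requests.get(
        "https://content.guardianapis.com/search",
        params={
            "q": query,
            "order-by": "newest",
            "page-size": n,
            "show-fields": "bodyText",
            "api-key": os.environ["GUARDIAN_API_KEY"],
        },
    )
    results = resp.json()["response"]["results"]

    # Write one row per article: datetime + full text.
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "bodyText"])
        for article in results:
            writer.writerow([
                article["webPublicationDate"],
                article["fields"]["bodyText"],
            ])
    return output_path
```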
### Key Features of This Bot
- 🧠 LLM-guided workflow: The model reasons about what tools to use, when to use them, and in what order.
- 📈 Structured sentiment output: Sentiment scores are numeric and tied to article timestamps.
- 🗃️ Reusable artifacts: Each step produces a file — CSVs and PNGs — that you can use in other tools or apps.
- 🧩 Tool chaining: The output of one function becomes the input to the next. You don’t need to manually decide what to do next — the agent does.
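The tool-chaining behavior boils down to a loop: call the model, execute any tool calls it makes, feed the results back, and repeat until the model answers in plain text. A minimal sketch, again assuming the Responses API; `analyze_sentiment` and `graph_data` here are hypothetical stand-ins for the script's actual helpers, whose signatures are assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()

# Map tool names (as declared in the tool schemas) to local functions.
# get_news_articles is sketched above; analyze_sentiment and graph_data
# stand in for the script's other helpers.
TOOL_FUNCTIONS = {
    "get_news_articles": get_news_articles,
    "analyze_sentiment": analyze_sentiment,
    "graph_data": graph_data,
}

def run_agent(user_query, tools, model="gpt-4o"):
    """Loop until the model stops calling tools and answers in text."""
    messages = [{"role": "user", "content": user_query}]
    while True:
        response = client.responses.create(model=model, input=messages,
                                           tools=tools)
        calls = [item for item in response.output
                 if item.type == "function_call"]
        if not calls:
            return response.output_text  # no tool call: final answer
        for call in calls:
            args = json.loads(call.arguments)
            result = TOOL_FUNCTIONS[call.name](**args)
            # Echo the call and its output back to the model.
            messages.append(call)
            messages.append({"type": "function_call_output",
                             "call_id": call.call_id,
                             "output": str(result)})
```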
### Why This Matters
This is a foundational pattern in modern AI development:
- You provide tools (functions),
- The model makes decisions about how to use them,
- And your backend takes care of execution and storage.
This approach is scalable, safe, and testable — you can add tools for anything (like `send_email`, `summarize_csv`, or `classify_tone`) and the model will know how to use them.
Next steps might include adding:
- Chunking for longer article sets
- Support for bar/line/scatter plots
- APIs for uploading the results to a dashboard or report
This is how assistants grow from single-turn bots into real agents.