# Section 5: Data Processing Revisited, Advanced API Usage

## OCR PDF Extraction Setup Guide
This guide will walk you through all the steps necessary to extract text from a PDF using OCR with Python. It is designed to help both newcomers and experienced users set up the environment correctly. You can skip steps if you have already completed them.
### Step 1: Install Required Python Packages
For our OCR script, you need two key Python packages: `pdf2image` to convert PDF pages to images, and `pytesseract` for performing OCR on those images.
1. Open your terminal (or Command Prompt on Windows) and navigate to your project folder.

2. Run the installation commands:

   - Mac/Linux:

     ```bash
     pip3 install pdf2image pytesseract
     ```

   - Windows:

     ```bash
     pip install pdf2image pytesseract
     ```
### Step 2: Install Tesseract OCR Engine
Tesseract is the OCR engine used by `pytesseract`. Install it based on your operating system:
- Ubuntu:

  ```bash
  sudo apt-get update
  sudo apt-get install tesseract-ocr
  ```

- macOS:

  1. Install Homebrew (if not already installed):
     Visit the Homebrew website for installation instructions. After installing, you must add Homebrew to your PATH. For example, add the following lines to your profile (e.g., `/Users/your_username/.zprofile`):

     ```bash
     echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> /Users/your_username/.zprofile
     eval "$(/opt/homebrew/bin/brew shellenv)"
     ```

     The Homebrew installer will tell you exactly what to type in its "Next Steps" output.

  2. Install `tesseract` (for OCR) and `poppler` (for PDF-to-image conversion):

     ```bash
     brew install tesseract
     brew install poppler
     ```

- Windows:

  1. Download the installer:
     Visit the Tesseract at UB Mannheim page to download the Windows installer.
  2. Run the installer:
     Execute the downloaded installer and follow the on-screen instructions.
  3. Important:
     Ensure Tesseract's executable is added to your system's PATH so that `pytesseract` can locate it.
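If `pytesseract` still cannot find Tesseract on Windows, you can point it at the executable directly. This is a minimal sketch; the path below is Tesseract's default install location and may differ on your machine:

```python
import pytesseract

# Tell pytesseract where tesseract.exe lives (adjust the path if you
# installed Tesseract somewhere else).
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```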
### Step 3: Open Your Project and Run the Script
1. Navigate to the Project Directory:
   Instead of creating a new project directory, open the existing `section-5` folder that contains your project files.

2. Locate the Script:
   Inside the `section-5` folder, find the `read_pdf.py` script. This file contains the code to convert the PDF pages into images and perform OCR.

3. Run the Script:
   Open your terminal (or command prompt on Windows), navigate to the `section-5` directory, and run the script:

   - macOS/Linux:

     ```bash
     cd /section-5
     python3 read_pdf.py
     ```

   - Windows:

     ```bash
     cd \section-5
     python read_pdf.py
     ```

4. Review the Output:
   The script will process the `tealbook.pdf` file (which should be located in the `section-5` folder) and display the extracted text in your terminal. We will talk through ways to make this more useful.
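For reference, here is a minimal sketch of the kind of code `read_pdf.py` contains, assuming `tealbook.pdf` sits in the same folder; the actual course script may differ in its details:

```python
from pdf2image import convert_from_path  # requires poppler to be installed
import pytesseract                       # requires tesseract to be installed

# Convert each PDF page into a PIL image.
pages = convert_from_path("tealbook.pdf")

# Run OCR on each page image and print the extracted text.
for i, page in enumerate(pages, start=1):
    text = pytesseract.image_to_string(page)
    print(f"--- Page {i} ---")
    print(text)
```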
## Tool Calling with the Weather Bot
Now that you've practiced extracting structured information from documents, we can start building intelligent tools that interact with live data. This example shows how to use OpenAI's function calling feature to make an AI assistant that can look up the weather based on the user's request.
### What This Script Does (`section-5/weather_agent.py`)
This Python script demonstrates how to use OpenAI’s tool calling feature to create an intelligent assistant that:
1. Accepts a natural language query from the user — such as “What’s the temperature in Boston right now?”

2. Decides whether to call a tool — in this case, a tool named `get_weather` that can look up the current temperature using latitude and longitude.

3. Automatically extracts the arguments the tool needs — like `"latitude": "42.36", "longitude": "-71.06"` — using the LLM’s understanding of the user’s intent.

4. Calls your custom function `get_weather()` — which uses the Open-Meteo API to retrieve real-time weather data.

5. Returns the weather result to the model, so it can incorporate it into a complete and natural-sounding response to the user.
### Key Parts of the Script
- Tool Schema:
  This defines the function the model can call — what it’s called (`get_weather`), what arguments it expects (`latitude`, `longitude`), and how it’s structured.

- OpenAI `responses.create()`:
  This is the main API call that sends the user's message and the available tool schema to the model. The model will choose whether to call the tool.

- Tool Call Handling:
  If the model calls a tool, your script detects it, parses the arguments, executes your local Python function, and sends the result back to the model.

- Natural Language Output:
  The model then responds using the tool result, e.g., "The current temperature in Boston is 14°C." If no tool is needed (e.g., “Who is the president of France?”), the model will just respond normally.
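Putting those pieces together, here is a condensed sketch of the flow, assuming the OpenAI Responses API and the Open-Meteo forecast endpoint. The model name is a placeholder, and the actual `weather_agent.py` may differ in its details:

```python
import json
import requests
from openai import OpenAI

client = OpenAI()

def get_weather(latitude, longitude):
    """Fetch the current temperature (°C) from the Open-Meteo API."""
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": latitude, "longitude": longitude,
                "current": "temperature_2m"},
    )
    return resp.json()["current"]["temperature_2m"]

# Tool schema: tells the model what get_weather is and what it expects.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current temperature for given coordinates.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"},
        },
        "required": ["latitude", "longitude"],
    },
}]

input_messages = [{"role": "user",
                   "content": "What's the temperature in Boston right now?"}]

# First call: the model decides whether to use the tool.
response = client.responses.create(model="gpt-4o", input=input_messages,
                                   tools=tools)

for item in response.output:
    if item.type == "function_call" and item.name == "get_weather":
        args = json.loads(item.arguments)  # model-extracted arguments
        result = get_weather(**args)       # run your local function
        # Send the tool call and its result back so the model can
        # phrase a final, natural-language answer.
        input_messages.append(item)
        input_messages.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(result),
        })

final = client.responses.create(model="gpt-4o", input=input_messages,
                                tools=tools)
print(final.output_text)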
## Advanced Tool Calling: Sentiment Plotter Agent
In this version of our tool-calling system, we go beyond a single API call and create a multi-tool agent that can:
- Retrieve live news articles from the Guardian API
- Ask the model to analyze their sentiment on a 1–10 scale
- Save the structured sentiment results to a CSV
- Generate a plot showing sentiment over time
### What This Script Does
This more advanced assistant reasons through multiple steps. It chooses when to call tools, chains tool outputs into new inputs, and uses those results to answer complex questions. For example:
"Generate me a plot of the sentiment of the 50 most recent articles about Harvard."
Here’s what happens under the hood:
1. The model calls `get_news_articles()`, providing a search query like `"harvard"` and an output path like `"harvard_articles.csv"`.

2. You fetch the articles from the Guardian API and save them as a CSV. The article `date` and `bodyText` fields are preserved.

3. The agent is given the list of articles (with datetime + text) and instructed to call `analyze_sentiment()` on them.

4. The model rates each article’s sentiment on a scale of 1 (very negative) to 10 (very positive) and returns the results in a structured format.

5. The sentiment results are saved to a CSV, ready for visualization.

6. Finally, the model calls `graph_data()`, specifying which CSV to use, what the x/y axes should be (e.g., datetime vs. sentiment), and how to label the graph.

7. A line graph is generated, showing sentiment trends over time.
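To make steps 1–2 concrete, here is a hedged sketch of what a `get_news_articles()` tool could look like. It assumes the Guardian Content API with a key stored in a `GUARDIAN_API_KEY` environment variable; the field names follow the Guardian's documented response shape, but the course script may organize this differently:

```python
import csv
import os
import requests

def get_news_articles(query: str, output_path: str, n: int = 50) -> str:
    """Fetch the n most recent Guardian articles matching `query` and
    save their publication date and body text to a CSV."""
    resp = requests.get(
        "https://content.guardianapis.com/search",
        params={
            "q": query,
            "order-by": "newest",
            "page-size": n,
            "show-fields": "bodyText",
            "api-key": os.environ["GUARDIAN_API_KEY"],
        },
    )
    results = resp.json()["response"]["results"]

    # Write one row per article: datetime + full text.
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "bodyText"])
        for article in results:
            writer.writerow([
                article["webPublicationDate"],
                article["fields"]["bodyText"],
            ])
    return output_path
```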
### Key Features of This Bot
- 🧠 LLM-guided workflow: The model reasons about what tools to use, when to use them, and in what order.
- 📈 Structured sentiment output: Sentiment scores are numeric and tied to article timestamps.
- 🗃️ Reusable artifacts: Each step produces a file — CSVs and PNGs — that you can use in other tools or apps.
- 🧩 Tool chaining: The output of one function becomes the input to the next. You don’t need to manually decide what to do next — the agent does.
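The tool-chaining behavior boils down to a loop: call the model, execute any tool calls it makes, feed the results back, and repeat until the model answers in plain text. A minimal sketch, again assuming the Responses API; `analyze_sentiment` and `graph_data` here are hypothetical stand-ins for the script's actual helpers, whose signatures are assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()

# Map tool names (as declared in the tool schemas) to local functions.
# get_news_articles is sketched above; analyze_sentiment and graph_data
# stand in for the script's other helpers.
TOOL_FUNCTIONS = {
    "get_news_articles": get_news_articles,
    "analyze_sentiment": analyze_sentiment,
    "graph_data": graph_data,
}

def run_agent(user_query, tools, model="gpt-4o"):
    """Loop until the model stops calling tools and answers in text."""
    messages = [{"role": "user", "content": user_query}]
    while True:
        response = client.responses.create(model=model, input=messages,
                                           tools=tools)
        calls = [item for item in response.output
                 if item.type == "function_call"]
        if not calls:
            return response.output_text  # no tool call: final answer
        for call in calls:
            args = json.loads(call.arguments)
            result = TOOL_FUNCTIONS[call.name](**args)
            # Echo the call and its output back to the model.
            messages.append(call)
            messages.append({"type": "function_call_output",
                             "call_id": call.call_id,
                             "output": str(result)})
```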
### Why This Matters
This is a foundational pattern in modern AI development:
- You provide tools (functions),
- The model makes decisions about how to use them,
- And your backend takes care of execution and storage.
This approach is scalable, safe, and testable — you can add tools for anything (like `send_email`, `summarize_csv`, or `classify_tone`) and the model will know how to use them.
Next steps might include adding:
- Chunking for longer article sets
- Support for bar/line/scatter plots
- APIs for uploading the results to a dashboard or report
This is how assistants grow from single-turn bots into real agents.