6 ‐ Raspberry Pi Integration

YOLO Computer Vision on Raspberry Pi 5

1. Introduction

This comprehensive technical guide presents implementation methodologies for computer vision utilizing YOLO (You Only Look Once) models on the Raspberry Pi 5 platform. This document addresses the integration of sophisticated object detection capabilities on cost-effective, portable computational hardware.

The following topics will be examined in detail:

  • Implementation of YOLO vision models with OpenCV
  • Integration with the COCO object detection dataset
  • Analysis of various YOLO model architectures
  • Performance optimization strategies for resource-constrained environments
  • Hardware control implementation based on detection outputs
  • Implementation of YOLO World for open-vocabulary detection capabilities
  • VNC server configuration for remote desktop access

Recent advancements have substantially enhanced the accessibility of computer vision technologies, enabling deployment of sophisticated detection models on modest computational platforms such as the Raspberry Pi. This progress is attributable to the development of efficient frameworks including OpenCV, standardized datasets such as COCO, and optimized algorithms like YOLO.

Understanding the Components

This section provides an overview of the primary technological components:

  • OpenCV: A comprehensive open-source computer vision library providing the foundational framework and algorithmic tools for image processing operations.
  • YOLO: An efficient object detection algorithm employing a single neural network to process the entire image, enabling real-time detection capabilities.
  • COCO: Common Objects in Context dataset, providing the taxonomic structure and training data that defines YOLO's object recognition capabilities.

2. Required Hardware

The following hardware components are required for implementation:

  • Raspberry Pi 5: 4GB or 8GB RAM configuration (The Pi 4 exhibits significantly reduced computational performance for this application)
  • Pi Camera: Camera Module V3 recommended for optimal image quality
  • Camera Adapter Cable: The Pi 5 utilizes a different connector specification than previous models
  • Thermal Management Solution: An active cooling system is recommended due to increased thermal output during computational processing
  • Power Supply: Official Raspberry Pi power supply recommended to ensure stable voltage delivery
  • Micro SD Card: Minimum 16GB capacity with Class 10 speed rating
  • Display: Monitor with Micro-HDMI to HDMI interface capability (If needed)
  • Input Peripherals: Standard USB keyboard and mouse (If needed)

3. Initial Raspberry Pi Setup

Basic Hardware Setup

The hardware assembly procedure is as follows:

  1. Connect the camera cable to the camera module (wider connector end)
  2. Connect the opposing end to the Raspberry Pi 5 (narrower connector end)
  3. Carefully elevate the connector retention mechanisms, insert the cable ensuring proper orientation, then secure by closing the retention mechanisms
  4. Connect the display via the micro-HDMI interface
  5. Connect peripheral input devices to available USB ports
  6. Insert the prepared microSD storage medium
  7. Connect the power supply as the final step in the sequence

Exercise caution regarding cable orientation and avoid excessive flexion of the ribbon cables to prevent connector damage.

Enabling SSH and Remote Access

Remote system access significantly enhances development efficiency. Configuration procedure:

  1. Upon initial system boot and configuration completion:

    • Access the Raspberry Pi menu (upper-left interface element)
    • Navigate to Preferences > Raspberry Pi Configuration
    • Select the Interfaces tab
    • Enable the SSH protocol
    • Apply configuration changes
  2. Determine the system's network address:

    • Launch a terminal session
    • Execute hostname -I
    • Document the displayed IPv4 address (formatted as 192.168.x.x)
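
With SSH enabled, you can open a remote shell from another machine on the same network. A minimal example (the address is a placeholder; substitute the value reported by hostname -I):

ssh pi@192.168.1.42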

Setting Up VNC Server for Remote Desktop Access

VNC (Virtual Network Computing) enables complete remote desktop access to your Raspberry Pi, allowing you to control the graphical interface from another computer. This is particularly valuable for computer vision projects where you need to interact with visual displays and camera feeds remotely.

Enabling VNC Server on Raspberry Pi

The Raspberry Pi OS includes RealVNC Server pre-installed but disabled by default. To enable it:

  1. Using the Desktop Interface:

    • Access the Raspberry Pi menu (upper-left corner)
    • Navigate to Preferences > Raspberry Pi Configuration
    • Select the Interfaces tab
    • Enable VNC
    • Click OK and reboot when prompted
  2. Using the Command Line:

    sudo raspi-config
    
    • Navigate to Interface Options
    • Select VNC
    • Choose Yes to enable VNC
    • Select Finish and reboot

Configuring VNC Server Settings

After enabling VNC, you may need to configure display settings for optimal remote access:

  1. Set VNC Resolution (for headless operation):

    sudo raspi-config
    
    • Navigate to Advanced Options > Resolution
    • Select an appropriate resolution (1920x1080 recommended)
    • Finish and reboot
  2. Alternative method using command line:

    # Edit the boot configuration
    sudo nano /boot/firmware/config.txt
    
    # Add or modify these lines:
    hdmi_force_hotplug=1
    hdmi_group=2
    hdmi_mode=82  # 1920x1080 60Hz
    

Installing VNC Viewer on Your Computer

To connect to your Raspberry Pi's VNC server, install VNC Viewer on your primary computer:

For Windows:

  1. Download VNC Viewer from realvnc.com/download/viewer
  2. Install the downloaded package
  3. Launch VNC Viewer

For macOS:

  1. Download VNC Viewer from the Mac App Store or RealVNC website
  2. Install and launch the application

For Linux:

# Ubuntu/Debian
sudo apt update
sudo apt install realvnc-vnc-viewer

# Or download from RealVNC website

Connecting to Your Raspberry Pi via VNC

  1. Launch VNC Viewer on your computer

  2. Create a new connection:

    • Click the "+" button or "New connection"
    • Enter your Raspberry Pi's IP address in the VNC Server field
    • Provide a friendly name (e.g., "RaspberryPi-YOLO")
    • Click OK
  3. Establish the connection:

    • Double-click your saved connection
    • Enter your Raspberry Pi username (default: "pi")
    • Enter your Raspberry Pi password
    • Click OK
  4. You should now see your Raspberry Pi desktop remotely

Optimizing VNC Performance for Computer Vision

When running YOLO and other computer vision applications through VNC, consider these optimization strategies:

  1. Adjust VNC picture quality:

    • Right-click on the VNC connection and select Properties
    • In the Options tab, set Picture quality to "Medium" or "Low" for better performance
    • Enable "Adaptive" if available
  2. Configure VNC server settings on the Pi:

    # Edit the VNC server configuration (service mode)
    sudo nano /root/.vnc/config.d/vncserver-x11
    
    # Restart the service so the changes take effect
    sudo systemctl restart vncserver-x11-serviced
    
  3. For bandwidth-limited connections:

    • Reduce color depth in VNC Viewer options
    • Consider using VNC over SSH tunnel for security and compression (see the example below)
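
As an example of the SSH-tunnel approach, the following forwards a local port to the Pi's VNC port (the address is a placeholder):

# On your computer: tunnel local port 5901 to the Pi's VNC port 5900
ssh -L 5901:localhost:5900 pi@192.168.1.42

# Then point VNC Viewer at localhost:5901 instead of the Pi's address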

Security Considerations

  1. Change default passwords:

    passwd  # Change user password
    sudo passwd root  # Change root password (if needed)
    
  2. Configure VNC authentication:

    • VNC uses the same credentials as your Pi user account
    • Consider setting up VNC-specific passwords for additional security
  3. Firewall configuration (if needed):

    # Allow VNC through firewall (port 5900)
    sudo ufw allow 5900
    

Running Computer Vision Applications Through VNC

When executing YOLO applications through VNC, you can:

  1. View camera feeds remotely: The OpenCV windows will display in your VNC session
  2. Monitor performance metrics: FPS counters and detection results are visible
  3. Interact with applications: Use keyboard shortcuts (like 'q' to quit) through VNC
  4. Debug and develop: Full access to IDE and terminal applications

Example VNC-optimized YOLO script:

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# Camera setup
picam2 = Picamera2()
picam2.preview_configuration.main.size = (640, 640)  # Smaller size for VNC
picam2.preview_configuration.main.format = "RGB888"
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# Model initialization
model = YOLO("yolov8n.pt")

while True:
    frame = picam2.capture_array()
    results = model(frame, imgsz=320)  # Lower resolution for VNC performance
    
    annotated_frame = results[0].plot()
    
    # Resize display window for VNC viewing
    display_frame = cv2.resize(annotated_frame, (640, 640))
    
    cv2.imshow("YOLO Detection - VNC", display_frame)
    
    if cv2.waitKey(1) == ord("q"):
        break

cv2.destroyAllWindows()

Troubleshooting VNC Issues

Cannot connect to VNC:

  • Verify VNC is enabled: sudo systemctl status vncserver-x11-serviced
  • Check IP address: hostname -I
  • Ensure both devices are on the same network
  • Restart VNC service: sudo systemctl restart vncserver-x11-serviced

Poor VNC performance:

  • Reduce VNC picture quality settings
  • Use lower camera resolutions in your applications
  • Close unnecessary applications on the Pi
  • Consider using SSH with X11 forwarding for specific applications

Authentication failures:

  • Verify Pi credentials are correct
  • Try resetting VNC to defaults: sudo vncserver-x11-serviced -reset

Transferring Files from Windows to Raspberry Pi

File transfer between Windows and Raspberry Pi systems is optimally facilitated using WinSCP, which provides a graphical interface for SSH-based file operations.

Installing WinSCP on Windows

  1. Navigate to ninite.com using a web browser
  2. Locate the "File Sharing" category
  3. Select "WinSCP"
  4. Initiate the download process via "Get Your Ninite"
  5. Execute the installation package
  6. Complete the installation process according to the prompts

Establishing Connection to Raspberry Pi with WinSCP

  1. Launch WinSCP from the Start menu

  2. In the connection configuration dialog:

    • Select "SFTP" as the File protocol
    • Enter the Raspberry Pi's IPv4 address in the Host name field
    • Enter the appropriate username (default: "pi")
    • Enter the system password
    • Initiate the connection
  3. Accept the host key verification when prompted

  4. The interface will present a dual-pane file management system:

    • Left panel: Windows file system
    • Right panel: Raspberry Pi file system
  5. File transfer procedure:

    • Navigate to the source directory in the appropriate panel
    • Navigate to the destination directory in the opposing panel
    • Transfer files via drag-and-drop operations between panels

This methodology enables efficient transfer of code files, model data, and auxiliary resources without requiring physical media. The bidirectional capability also facilitates transfer from the Raspberry Pi to the Windows system.
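
For occasional transfers without a graphical client, the standard scp command works over the same SSH connection (the paths and address below are examples):

# Copy a script from your computer to the Pi's home directory
scp detect.py pi@192.168.1.42:/home/pi/

# Copy a file back from the Pi to the current directory
scp pi@192.168.1.42:/home/pi/results.txt .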

4. Software Setup

Installing Raspberry Pi OS

  1. Utilize the Raspberry Pi Imager utility
  2. Select Raspberry Pi 5 as the target device
  3. Select Raspberry Pi OS (64-bit) as the operating system
  4. Designate the appropriate microSD storage device
  5. Complete the installation process and initiate system boot
  6. Execute the initial configuration procedure and establish network connectivity

For procedural consistency throughout this technical guide, it is recommended to utilize "pi" as the primary username.

Creating a Virtual Environment

The Bookworm OS iteration necessitates the implementation of virtual environments. These environments provide isolated execution contexts for applications without compromising system integrity.

python3 -m venv --system-site-packages yolo_object
source yolo_object/bin/activate

Upon successful environment creation, the environment identifier will appear as a prefix to the terminal prompt. To reactivate this environment in subsequent terminal sessions, execute the source command as previously demonstrated.
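
The environment can be left and re-entered at any time using the standard venv commands:

# Leave the environment when finished
deactivate

# Re-enter it in a later terminal session
source yolo_object/bin/activate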

Installing Required Packages

Update the system package repository and install the Python package manager:

sudo apt update
sudo apt install python3-pip -y
pip install -U pip

Subsequently, install the Ultralytics package with export capabilities:

pip install "ultralytics[export]"

This package installation encompasses OpenCV and essential dependencies required for object detection implementation. The installation process may occasionally encounter errors - in such instances, re-execution of the command is recommended.

Upon successful package installation, system reboot is advised:

sudo reboot

Configuring Thonny IDE

Thonny will serve as the integrated development environment for this implementation. Configuration procedure for virtual environment integration:

  1. Launch Thonny (select standard mode if prompted)
  2. Navigate to Run > Configure Interpreter
  3. Access the file selection dialog via the ellipsis button in the Python executable field
  4. Navigate to /home/pi/yolo_object/bin/python3
  5. Select this executable and confirm the selection

This configuration ensures that Thonny executes code within the previously established virtual environment.
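
If you prefer working from the terminal instead of Thonny, the same effect is achieved by activating the environment before invoking Python (the script name is a placeholder):

source /home/pi/yolo_object/bin/activate
python3 my_detector.py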

5. Running YOLOv8

Create a new Python script in the Thonny IDE and implement the following code:

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# Configure camera parameters
picam2 = Picamera2()
picam2.preview_configuration.main.size = (1280, 1280)
picam2.preview_configuration.main.format = "RGB888"
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# Initialize YOLOv8 model
model = YOLO("yolov8n.pt")

while True:
    # Acquire frame from camera
    frame = picam2.capture_array()
    
    # Execute model inference on the acquired frame
    results = model(frame)
    
    # Render detection visualization
    annotated_frame = results[0].plot()
    
    # Calculate performance metrics
    inference_time = results[0].speed['inference']
    fps = 1000 / inference_time  # Convert to frames per second
    text = f'FPS: {fps:.1f}'

    # Configure text rendering parameters
    font = cv2.FONT_HERSHEY_SIMPLEX
    text_size = cv2.getTextSize(text, font, 1, 2)[0]
    text_x = annotated_frame.shape[1] - text_size[0] - 10  # Position 10 pixels from right margin
    text_y = text_size[1] + 10  # Position 10 pixels from top margin

    # Render performance metrics on frame
    cv2.putText(annotated_frame, text, (text_x, text_y), font, 1, (255, 255, 255), 2, cv2.LINE_AA)

    # Display the processed frame
    cv2.imshow("Camera", annotated_frame)

    # Process termination condition
    if cv2.waitKey(1) == ord("q"):
        break

# Release system resources
cv2.destroyAllWindows()

Upon execution, this code will automatically download the YOLOv8 model and initialize a window displaying the camera feed. The model will perform object detection in real-time, rendering bounding boxes around identified objects with associated confidence metrics.

Initial performance metrics will indicate approximately 1.5 frames per second. Performance optimization strategies will be addressed in subsequent sections.

To terminate the application, press the 'q' key.

Code Analysis

The implementation consists of several distinct functional components:

  1. Library Integration: Incorporation of OpenCV, Raspberry Pi camera module, and YOLO framework
  2. Camera Configuration: Parameter configuration for the imaging subsystem
  3. Model Initialization: Loading the pre-trained YOLOv8 nano model
  4. Processing Loop: Continuous frame acquisition and analysis pipeline
  5. Object Detection: Neural network inference on each acquired frame
  6. Visualization: Rendering of detection results with bounding boxes and labels
  7. Performance Monitoring: Calculation and display of processing framerate
  8. Termination Control: User-initiated application termination
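
Beyond the rendered overlay, the detection results can also be inspected programmatically. The following sketch (the image path is a placeholder) shows how the Ultralytics results object exposes class IDs, confidence scores, and bounding box coordinates:

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
frame = cv2.imread("test.jpg")  # placeholder image path

results = model(frame)
for box in results[0].boxes:
    cls_id = int(box.cls[0])               # class index into results[0].names
    conf = float(box.conf[0])              # detection confidence (0 to 1)
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixel coordinates
    print(f"{results[0].names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f})")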

6. Exploring Different YOLO Models

The YOLOv8 nano model (yolov8n.pt) represents the most lightweight implementation, offering optimal computational performance at the expense of detection accuracy. Model selection can be modified by altering a single parameter in the initialization code:

# Model selection parameter
model = YOLO("yolov8n.pt")

Available model variants for YOLOv8 include:

  • yolov8n.pt - Nano (optimized for performance, reduced accuracy)
  • yolov8s.pt - Small (balanced performance characteristics)
  • yolov8m.pt - Medium (increased detection capability)
  • yolov8l.pt - Large (high detection accuracy)
  • yolov8x.pt - Extra Large (maximum accuracy, reduced performance)

Models with larger parameter counts provide enhanced detection accuracy but exhibit reduced processing speed. The extra-large model variant may demonstrate processing rates below 0.1 frames per second initially, but offers superior detection capabilities for distant or partially occluded objects.

Alternative YOLO architecture versions can also be implemented:

# YOLOv5 implementation
model = YOLO("yolov5n.pt")

# YOLOv10 implementation
model = YOLO("yolov10n.pt")

# YOLO11 implementation (latest architecture)
model = YOLO("yolo11n.pt")

Each architecture version exhibits distinct detection characteristics and performance profiles.

7. Optimizing Performance

NCNN Conversion

To achieve a significant performance enhancement, the model can be converted to NCNN format, a lightweight neural network inference framework specifically optimized for ARM processor architectures such as those found in the Raspberry Pi:

Create a model conversion script:

from ultralytics import YOLO

# Initialize YOLOv8n PyTorch model
model = YOLO("yolov8n.pt")

# Export to NCNN format with specified input dimensions
model.export(format="ncnn", imgsz=640)  # generates 'yolov8n_ncnn_model'

Execute this script to generate an NCNN-formatted model. Subsequently, modify the main implementation script to utilize this optimized model:

# Initialize the NCNN-optimized model
model = YOLO("yolov8n_ncnn_model")

This format conversion can yield performance improvements exceeding 400% in frame processing rate.

Resolution Reduction

An additional optimization strategy involves reducing the processing resolution:

  1. In the NCNN conversion script, modify the input dimensions:

    model.export(format="ncnn", imgsz=320)  # Must be a multiple of 32 due to architecture requirements
    
  2. In the main implementation script, specify the consistent resolution parameter:

    results = model.predict(frame, imgsz=320)
    

Input resolutions in the range of 160-320 pixels typically provide an optimal balance between processing performance and detection accuracy. Implementation considerations include:

  • Utilizing resolutions that are integer multiples of 32 to align with the neural network architecture
  • Ensuring parameter consistency between the exported model and implementation script
  • Acknowledging that reduced resolution diminishes detection range and classification accuracy
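
Combining the NCNN export and the reduced resolution, a minimal sketch of the resulting detection loop (assuming the export step above was run with imgsz=320):

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# Camera setup
picam2 = Picamera2()
picam2.preview_configuration.main.size = (640, 640)
picam2.preview_configuration.main.format = "RGB888"
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# NCNN model directory generated by model.export()
model = YOLO("yolov8n_ncnn_model")

while True:
    frame = picam2.capture_array()
    results = model.predict(frame, imgsz=320)  # must match the export resolution
    cv2.imshow("Optimized Detection", results[0].plot())
    if cv2.waitKey(1) == ord("q"):
        break

cv2.destroyAllWindows()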

8. Controlling Hardware with Detection Results

This section demonstrates the integration of computer vision detection with hardware control systems. The following implementation activates GPIO pin 14 when a person or a book is detected:

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO
from gpiozero import LED

# Camera subsystem initialization
picam2 = Picamera2()
picam2.preview_configuration.main.size = (1280, 1280)
picam2.preview_configuration.main.format = "RGB888"
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# GPIO output initialization
output = LED(14)

# Model initialization
model = YOLO("yolov8n.pt")

# Target class identifiers
objects_to_detect = [0, 73]  # Class ID 0: person, Class ID 73: book

while True:
    # Frame acquisition
    frame = picam2.capture_array()

    # Execute object detection with optimized resolution
    results = model(frame, imgsz=160)

    # Extract classification results
    detected_objects = results[0].boxes.cls.tolist()

    # Detection state evaluation
    object_found = False
    for obj_id in objects_to_detect:
        if obj_id in detected_objects:
            object_found = True
            print(f"Detected object with ID {obj_id}!")
    
    # Hardware control logic
    if object_found:
        output.on()  # Activate GPIO pin
        print("Pin activated!")
    else:
        output.off()  # Deactivate GPIO pin
        print("Pin deactivated!")
            
    # Visualization presentation
    annotated_frame = results[0].plot()
    cv2.imshow("Object Detection", annotated_frame)

    # Termination condition
    if cv2.waitKey(1) == ord("q"):
        break

# Resource release
cv2.destroyAllWindows()

The objects_to_detect list can be modified to include any class identifiers of interest. To determine the complete class mapping, the following diagnostic code can be executed:

print(results[0].names)
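
Alternatively, the same mapping is available on the model object itself, without running an inference first:

print(model.names)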

This hardware control methodology is adaptable for integration with servo mechanisms, motor controllers, relay modules, or any GPIO-compatible peripheral devices.
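
As a sketch of one such adaptation, the LED could be swapped for a gpiozero Buzzer with no other changes to the detection loop (the pin number is an example):

from gpiozero import Buzzer

# Buzzer on GPIO 14 instead of an LED; output.on() and output.off()
# in the detection loop then sound and silence the buzzer
output = Buzzer(14)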

9. YOLO World: Open-Vocabulary Detection

YOLO World represents a significant advancement in object detection technology, enabling text-prompted detection rather than relying solely on pre-defined classification categories. This capability transcends the 80 categories defined in the COCO dataset, allowing for object identification based on textual descriptions.

The following implementation demonstrates YOLO World functionality:

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# Set up the camera with Picam
picam2 = Picamera2()
picam2.preview_configuration.main.size = (1280, 1280)
picam2.preview_configuration.main.format = "RGB888"
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# Load YOLO World model
model = YOLO("yolov8s-world.pt")

# Define custom classes to detect
model.set_classes(["person", "glasses", "water bottle"])

while True:
    # Capture a frame from the camera
    frame = picam2.capture_array()
    
    # Run YOLO model on the captured frame
    results = model(frame, imgsz=640)
    
    # Output the visual detection data
    annotated_frame = results[0].plot()
    
    # Get inference time
    inference_time = results[0].speed['inference']
    fps = 1000 / inference_time
    text = f'FPS: {fps:.1f}'

    # Define font and position
    font = cv2.FONT_HERSHEY_SIMPLEX
    text_size = cv2.getTextSize(text, font, 1, 2)[0]
    text_x = annotated_frame.shape[1] - text_size[0] - 10
    text_y = text_size[1] + 10

    # Draw the text on the annotated frame
    cv2.putText(annotated_frame, text, (text_x, text_y), font, 1, (255, 255, 255), 2, cv2.LINE_AA)

    # Display the resulting frame
    cv2.imshow("Camera", annotated_frame)

    # Exit the program if q is pressed
    if cv2.waitKey(1) == ord("q"):
        break

# Close all windows
cv2.destroyAllWindows()

Notes on YOLO World:

  • The first run may take several minutes as it downloads ~400MB of additional files
  • You can specify any objects by changing the set_classes list
  • You can use descriptive phrases like "yellow toy block" instead of just single words
  • It will only detect objects you specify in the class list
  • If you remove the set_classes line, it will try to detect everything it can
  • Available model sizes are small (s), medium (m), and large (l)
  • YOLO World cannot be converted to NCNN format
  • It runs significantly slower than standard YOLOv8

10. Using a USB Webcam

If you prefer a USB webcam to the Pi Camera Module, Picamera2 offers limited support for USB devices, and most of the code should work with minimal changes. However, you might encounter swapped color channels. Try this modified code:

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# Initialize the Picamera2
picam2 = Picamera2()
picam2.preview_configuration.main.size = (1280, 1280)
picam2.preview_configuration.main.format = "BGR888"  # Change to BGR888
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# Load the YOLO model
model = YOLO("yolov8s-world.pt")
model.set_classes(["spray bottle"])

while True:
    # Capture frame-by-frame
    frame = picam2.capture_array()
    
    # Convert BGR to RGB
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    
    # Run YOLOv8 inference on the frame
    results = model(frame_rgb, imgsz=320)
    
    # Visualize the results on the frame
    annotated_frame = results[0].plot()
    
    # Get inference time and display FPS
    inference_time = results[0].speed['inference']
    fps = 1000 / inference_time
    text = f'FPS: {fps:.1f}'
    
    # Define font and position
    font = cv2.FONT_HERSHEY_SIMPLEX
    text_size = cv2.getTextSize(text, font, 1, 2)[0]
    text_x = annotated_frame.shape[1] - text_size[0] - 10
    text_y = text_size[1] + 10
    
    # Draw the text on the annotated frame
    cv2.putText(annotated_frame, text, (text_x, text_y), font, 1, (255, 255, 255), 2, cv2.LINE_AA)
    
    # Display the resulting frame
    cv2.imshow("Camera", annotated_frame)
    
    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) == ord("q"):
        break

# Close windows
cv2.destroyAllWindows()
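
If Picamera2 does not recognize your particular webcam, a plain OpenCV capture loop is a common fallback (a sketch; the device index 0 is an assumption and may need adjusting):

import cv2
from ultralytics import YOLO

cap = cv2.VideoCapture(0)  # first video device; try 1, 2, ... if this fails
model = YOLO("yolov8n.pt")

while True:
    ret, frame = cap.read()  # OpenCV delivers frames in BGR order
    if not ret:
        break
    results = model(frame, imgsz=320)
    cv2.imshow("USB Webcam Detection", results[0].plot())
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()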

11. Future Directions

Now that you have YOLO running on your Pi, here are some ideas for extending your project:

  1. Object Tracking: Extract the coordinates of detected objects and use servos to track them
  2. Storage Optimization: For long-running applications, consider using an NVMe SSD instead of the SD card
  3. Exploring Other Models: Check out the Ultralytics documentation for other models that can run with the infrastructure we've set up
  4. Custom Training: Train your own YOLOv8 model on a more powerful machine, then export it to run on your Pi
  5. Integrating with Home Automation: Connect your vision system to home automation platforms like Home Assistant
  6. Remote Monitoring: Set up a web server on your Pi to view detection results remotely
  7. Adding Notifications: Configure your system to send alerts when specific objects are detected

12. Troubleshooting Common Issues

Low FPS Performance

  • Try using smaller model variants (nano or small)
  • Reduce the input resolution further
  • Ensure you've converted the model to NCNN format
  • Close other applications running on the Pi
  • Make sure your Pi is adequately cooled

Camera Not Working

  • Check cable connections on both ends
  • Ensure the camera is enabled in raspi-config
  • Try running rpicam-hello (named libcamera-hello on older OS releases) to test the camera directly
  • For USB webcams, try different USB ports

Model Download Failures

  • Check your internet connection
  • Free up disk space if needed
  • Try downloading the model manually and placing it in your project directory

Memory Errors

  • Restart your Raspberry Pi
  • Use a model with lower memory requirements
  • Add a swap file to increase virtual memory (see the sketch below)
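
For the swap-file option, Raspberry Pi OS manages swap through dphys-swapfile; a typical resizing procedure (the 1024 MB value is an example):

sudo dphys-swapfile swapoff
sudo nano /etc/dphys-swapfile   # set CONF_SWAPSIZE=1024 (value in MB)
sudo dphys-swapfile setup
sudo dphys-swapfile swapon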

VNC Connection Issues

  • See "Troubleshooting VNC Issues" in Section 3, which covers connection, performance, and authentication fixes in detail

13. Conclusion

Computer vision on the Raspberry Pi has come a long way, making advanced object detection accessible to hobbyists and educators. The YOLO family of models provides an excellent balance of performance and accuracy, even on limited hardware.

Whether you're building a smart security system, an interactive robot, or just exploring the capabilities of computer vision, these tools provide a solid foundation for your projects. The addition of VNC server capabilities enables seamless remote development and monitoring, making your Raspberry Pi-based computer vision projects more accessible and manageable.

Remember that working with computer vision often requires experimentation to find the right balance between accuracy and performance for your specific application. Don't be afraid to adjust parameters, try different models, and optimize your setup to suit your needs.