
3.5 Clearance Space Detection

The Clearance Space Detection project is designed to monitor designated areas, such as safety zones or operational clearance spaces, to ensure they remain unobstructed. Using a Vision Language Model (VLM), the system continuously checks if any items are placed within these marked zones for a specified duration. If an obstruction is detected, an alarm is triggered to alert personnel. This project is particularly useful for maintaining safety standards and operational efficiency in environments where clear spaces are critical.

3.5.1 Prerequisites

To run this project, load the preset named:

FORBIDDEN_ZONE_ALERT_WEBRTC_ADVAN

Note:

Ensure the Edge Agent is running and accessible via your web browser.
The demonstration video file (Forbidden_zone_advan.mp4) must be located in the /ssd/jetson-containers/data/videos/ directory on your Jetson device.
The reference image for In-Context Learning (ICL) (e.g., forbidden_zone2.png) and its containing folder (forbidden_zone) must be located in the /ssd/jetson-containers/data/images/ directory. The AutoPrompt_ICL node will need the correct path to this image (e.g., /data/images/forbidden_zone/forbidden_zone2.png).
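The paths above suggest that the host directory /ssd/jetson-containers/data appears inside the container as /data. That mapping is an assumption inferred from the example paths, not a documented guarantee. Under it, a host path can be translated into the path a node expects with a small helper such as the following sketch (the function name to_container_path is hypothetical):

```python
from pathlib import PurePosixPath

# Assumed mount mapping, inferred from the example paths in this section.
HOST_DATA_ROOT = PurePosixPath("/ssd/jetson-containers/data")
CONTAINER_DATA_ROOT = PurePosixPath("/data")


def to_container_path(host_path: str) -> str:
    """Translate a host path under HOST_DATA_ROOT to its in-container path.

    Raises ValueError if the path is not located under HOST_DATA_ROOT.
    """
    relative = PurePosixPath(host_path).relative_to(HOST_DATA_ROOT)
    return str(CONTAINER_DATA_ROOT / relative)


print(to_container_path(
    "/ssd/jetson-containers/data/images/forbidden_zone/forbidden_zone2.png"
))
# prints /data/images/forbidden_zone/forbidden_zone2.png
```

A path outside the data directory raises ValueError, which is a quick way to catch a file that was copied to the wrong location.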

3.5.2 Pipeline Overview

Figure 3.4: Pipeline overview

This project utilizes the following key nodes connected in a pipeline:

VideoSource: Provides the video input (e.g., Forbidden_zone_advan.mp4).
RateLimit: Controls the frame processing rate (e.g., 10 fps) for the VLM.
AutoPrompt_ICL: Formats the input for the VLM. It uses In-Context Learning (ICL) with a reference image of the clear zone and focuses on a specific Region of Interest (ROI) for analysis. It then prompts the VLM about obstructions in the current frame's ROI.
VILA-1.5-13B (loaded via NanoLLM_ICL Node): The Vision Language Model that analyzes the ROI of the image based on the prompt and ICL reference.
VideoOverlay: Displays the VLM's response or alert status directly on the video feed.
VideoOutput: Shows the final video stream, often focused on the defined ROI, with overlays.
One_Step_Alert: Processes the VLM's output to trigger an alarm if the clearance space is obstructed.
PiperTTS module (Preset): A pre-configured set of nodes for text-to-speech voice alerts.

Data Flow: The VideoSource sends frames to RateLimit. The rate-limited frames and the ICL reference image path go to AutoPrompt_ICL, which applies the ROI and poses the question to the VILA-1.5-13B VLM. The VLM's partial text output is sent to VideoOverlay, which draws it on the stream shown by VideoOutput (VideoOutput also applies the ROI). The VLM's final text output is sent to One_Step_Alert. If One_Step_Alert triggers because of an obstruction, it sends a warning message to the PiperTTS module, which produces an audible alarm.
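To illustrate what the RateLimit stage does conceptually, here is a minimal sketch of a time-based frame limiter. It is not the RateLimit node's actual implementation; the class name FrameRateLimiter and its accept method are hypothetical and only show the idea of dropping frames that arrive faster than the configured rate.

```python
import time
from typing import Callable, Optional


class FrameRateLimiter:
    """Accept at most max_fps frames per second and reject the rest."""

    def __init__(self, max_fps: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self.min_interval = 1.0 / max_fps
        self.clock = clock  # injectable so the limiter can be tested offline
        self._last_accept: Optional[float] = None

    def accept(self) -> bool:
        """Return True if enough time has passed since the last accepted frame."""
        now = self.clock()
        if self._last_accept is None or now - self._last_accept >= self.min_interval:
            self._last_accept = now
            return True
        return False


limiter = FrameRateLimiter(max_fps=10)  # 10 fps, as in the example above
if limiter.accept():
    pass  # forward the frame to the next stage; otherwise drop it
```

Dropping frames rather than queueing them keeps the VLM working on recent frames, which matters when inference is slower than the video frame rate.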

3.5.3 Key Node Configurations

Customization primarily involves the AutoPrompt_ICL for defining the monitored zone (via ROI and ICL image) and the VLM prompt, the VILA-1.5-13B settings, and the One_Step_Alert for alarm conditions.

AutoPrompt_ICL Node Settings:
- Template: <reset>'"'/data/images/forbidden_zone/forbidden_zone2.png'"' In the above image, there is a red X-shaped area marked with tape on the ground. In the following image, check if any part of the red X shape is obstructed by an object, even partially. In below image: <image> Can you see the entire X shape pattern? (Ensure the path to your ICL image is correct.)
- seq_replace_mode: Set to true.
- Roi: Set to true.
- Roi Coordinates: Set to 0.75, 0.25, 1, 0.73 for the demo, or adjust to match your specific clearance zone within the camera's view. These are normalized coordinates [x_min, y_min, x_max, y_max].
VILA-1.5-13B (NanoLLM_ICL Node) Settings:
- Model Selection: Efficient-Large-Model/VILA-1.5-13B.
- API Selection: MLC.
- Quantization Setting: q4f16_ft (default).
- Chat Template: llava-v1.
- System Prompt: "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions."
- Drop inputs: Set to True.
VideoOutput Node Settings:
- ROI: Set to true.
- ROI Coordinates: Set to the same values as in AutoPrompt_ICL (e.g., 0.75, 0.25, 1, 0.73) to focus the output display on the monitored area.
One_Step_Alert Node Settings:
- Check Time: Default is 5 seconds (the time window over which VLM outputs are evaluated to determine the status).
- Alert: Set to true.
- Alert Keyword: Set to "no" (since the VLM is asked "Can you see the entire X shape pattern?", a "no" response indicates an obstruction, triggering the alert).
- Normal Keyword: Set to "yes".
- Warning Message Text: "Warning: Stacking things in forbidden zone." (Ensure the period "." is at the end).
- Drop inputs: Set to True.
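The normalized ROI coordinates used by AutoPrompt_ICL and VideoOutput can be checked by converting them to pixel positions. The sketch below assumes a 1920x1080 frame, which is an illustrative value because the resolution of the demo video is not stated here; the helper roi_to_pixels is hypothetical and not part of the Edge Agent.

```python
from typing import Sequence, Tuple


def roi_to_pixels(roi: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert normalized [x_min, y_min, x_max, y_max] to pixel coordinates."""
    x_min, y_min, x_max, y_max = roi
    if not (0.0 <= x_min < x_max <= 1.0 and 0.0 <= y_min < y_max <= 1.0):
        raise ValueError("ROI must satisfy 0 <= min < max <= 1 on both axes")
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# Demo ROI from the settings above, on an assumed 1920x1080 frame.
print(roi_to_pixels((0.75, 0.25, 1, 0.73), 1920, 1080))
# prints (1440, 270, 1920, 788)
```

For the demo values, the monitored zone is the right-hand quarter of the frame, from roughly a quarter to three quarters of its height. Running the same conversion on the values entered in both nodes is a simple way to confirm that they match.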

3.5.4 Step-by-Step Running Instructions

Launch the Edge Agent UI in your browser.
Load the FORBIDDEN_ZONE_ALERT_WEBRTC_ADVAN preset:
- Click the "Agent" menu in the top-right corner.
- Select "Load."
- Choose FORBIDDEN_ZONE_ALERT_WEBRTC_ADVAN.json from the list.
The pipeline will appear in the Node Editor.
Verify the VideoSource input path (/data/videos/Forbidden_zone_advan.mp4).
Verify the ICL image path in the AutoPrompt_ICL node's template (e.g., '"'/data/images/forbidden_zone/forbidden_zone2.png'"').
Confirm the Roi Coordinates in both AutoPrompt_ICL and VideoOutput nodes are correctly set for your target area.
The project should start running automatically.
Observe the VideoOutput panel:
- You will see the video playing, likely focused on the ROI.
- The VideoOverlay will display the VLM's response regarding the visibility of the clearance space pattern. If an obstruction is detected, the warning message from One_Step_Alert may also be displayed.
Listen for audio alerts: If the VLM indicates an obstruction (answers "no" to the prompt) consistently for the Check Time in One_Step_Alert, the PiperTTS module will announce the warning: "Warning: Stacking things in forbidden zone."

3.5.5 Expected Behavior & Output

Visual Output: The VideoOutput panel will show the video, focused on the defined ROI. Text overlays will indicate the VLM's assessment of the clearance space. For example, if the prompt asks "Can you see the entire X shape pattern?" and it's obstructed, the VLM might respond "No, the red X-shaped area is partially obstructed by a shelf," and the warning message will be displayed.
Audio Output: If the One_Step_Alert node confirms an obstruction based on the VLM's responses (e.g., consistently "no"), the PiperTTS module will voice the configured warning.
Alert Logic: The system triggers an alarm if items are placed in the monitored zone for a specified duration, based on the VLM's interpretation of the scene guided by the ICL image and prompt.
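The alert logic can be pictured as a keyword check combined with a timer. The following is a minimal sketch of one plausible interpretation, not the One_Step_Alert node's actual code: the class KeywordAlert and its update method are hypothetical, and the rule "alert once the alert keyword has been seen continuously for Check Time seconds" is an assumption based on the description above.

```python
import re
from typing import Optional


class KeywordAlert:
    """Raise an alert after the alert keyword persists for check_time seconds."""

    def __init__(self, alert_keyword: str = "no", normal_keyword: str = "yes",
                 check_time: float = 5.0) -> None:
        # Whole-word, case-insensitive matching so "No," matches but "not" does not.
        self.alert_re = re.compile(rf"\b{re.escape(alert_keyword)}\b", re.IGNORECASE)
        self.normal_re = re.compile(rf"\b{re.escape(normal_keyword)}\b", re.IGNORECASE)
        self.check_time = check_time
        self._alert_since: Optional[float] = None

    def update(self, response: str, timestamp: float) -> bool:
        """Feed one VLM response; return True when the alert should fire."""
        if self.normal_re.search(response):
            self._alert_since = None           # zone is clear, reset the timer
        elif self.alert_re.search(response):
            if self._alert_since is None:
                self._alert_since = timestamp  # start of an obstruction streak
        # Responses with neither keyword leave the state unchanged.
        return (self._alert_since is not None
                and timestamp - self._alert_since >= self.check_time)


alert = KeywordAlert()
for t in range(6):  # one "no" response per second for 0..5 s
    fired = alert.update("No, the red X-shaped area is partially obstructed.", t)
print(fired)
# prints True
```

In this sketch a single "yes" response resets the timer, so a brief obstruction shorter than Check Time never raises the alarm.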

3.5.6 Troubleshooting

No Detection / Incorrect VLM Response:
- ICL Image: Ensure the path to the reference image in the AutoPrompt_ICL template is correct and the image accurately represents the clear state of the zone.
- Prompt: Verify the prompt in AutoPrompt_ICL is precise and clearly asks about obstructions in the defined area.
- ROI Coordinates: Double-check that the Roi Coordinates in AutoPrompt_ICL and VideoOutput accurately define the clearance space you want to monitor. Misaligned ROIs can lead to incorrect analysis.
- VLM Settings: Ensure the VILA-1.5-13B model parameters are correctly set.
Alerts Not Triggering or False Alerts:
- Alert Keyword: Confirm the Alert Keyword in One_Step_Alert (e.g., "no") correctly corresponds to the VLM's response when an obstruction is present.
- Check Time: Adjust the Check Time in One_Step_Alert. A value that is too short may cause false alarms; one that is too long may delay necessary alerts.
- Lighting/View Changes: Significant changes in lighting or camera angle might affect VLM performance if not accounted for in the ICL image or prompt.
No Audio Alerts:
- Check the connection from One_Step_Alert to the PiperTTS module.
- Verify that audio output on the system is working.