03.04 preset projects - advantech-EdgeAI/edge_agent GitHub Wiki
The Smoke and Fire Detection project uses a Vision Language Model (VLM) to analyze video footage in real time and determine whether smoke or fire is present. When either hazard is detected, the system triggers an immediate alarm, including visual notifications and voice alerts. This project enhances safety in a variety of environments by providing early warnings of potential fire-related incidents.
To run this project, you can load one of the following presets:

- `SMOKE_ALERT_WEBRTC_ADVAN` (for smoke detection)
- `FIRE_ALERT_WEBRTC_ADVAN` (for fire detection)
Note:

- Ensure the Edge Agent is running and accessible via your web browser.
- The demonstration video files must be located in the `/ssd/jetson-containers/data/videos/` directory on your Jetson device:
  - For smoke detection: `Smoke_advan.mp4`
  - For fire detection: `Fire_advan.mp4`
*Figure 3.2 — Pipeline overview: smoke detection*

*Figure 3.3 — Pipeline overview: fire detection*
This project utilizes the following key nodes connected in a pipeline:

- `VideoSource`: Provides the video input (e.g., `Smoke_advan.mp4` or `Fire_advan.mp4`).
- `RateLimit`: Controls the frame processing rate (e.g., 15 fps) for the VLM.
- `AutoPrompt_ICL`: Formats the input for the VLM by combining the video frame with a specific question about smoke or fire.
- `Llama-3-VILA-1.5-8B` (loaded via the `NanoLLM_ICL` node): The Vision Language Model that analyzes the image and prompt to detect smoke or fire.
- `VideoOverlay`: Displays the VLM's response or alert status directly on the video feed.
- `VideoOutput`: Shows the final video stream with overlays.
- `One_Step_Alert`: Processes the VLM's output to trigger an alarm if smoke/fire is confirmed ("yes").
- `PiperTTS` module (preset): A pre-configured set of nodes for text-to-speech voice alerts.
- `TextStream`: Displays the raw text output from the VLM.
Data Flow: The `VideoSource` sends frames to `RateLimit`. The rate-limited frames go to `AutoPrompt_ICL`, which combines them with a question and sends the result to the `Llama-3-VILA-1.5-8B` VLM. The VLM's partial text output is sent to `VideoOverlay` to be shown on the `VideoOutput`. The VLM's final text output is sent to `One_Step_Alert` (to trigger alarms) and `TextStream` (for display). If `One_Step_Alert` triggers, it sends a warning message to the `PiperTTS` module for an audible alarm.
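The data flow above can be sketched in plain Python. All the functions below are hypothetical stand-ins for the Edge Agent nodes; the real node classes and their wiring API differ.

```python
# Sketch of the pipeline's data flow with hypothetical stand-ins for the
# Edge Agent nodes (the actual node classes and connection API differ).

def rate_limit(frames, fps=15, source_fps=30):
    """RateLimit stand-in: keep roughly `fps` of every `source_fps` frames."""
    keep_every = max(1, round(source_fps / fps))
    return frames[::keep_every]

def auto_prompt(frame, question="Does the image show any smoke?"):
    """AutoPrompt_ICL stand-in: pair a frame with the detection question."""
    return {"image": frame, "prompt": question}

def vlm(query):
    """Stand-in for the Llama-3-VILA-1.5-8B node: answers "Yes" or "No"."""
    return "Yes" if "smoke" in query["image"] else "No"

frames = ["clear-1", "smoke-2", "clear-3", "smoke-4"]
for frame in rate_limit(frames, fps=15, source_fps=30):
    answer = vlm(auto_prompt(frame))
    # VideoOverlay / TextStream would display `answer`; One_Step_Alert
    # consumes it to decide whether to raise an alarm.
    print(frame, "->", answer)
```

The point of the sketch is the ordering: rate limiting happens before prompting, and the same VLM answer fans out to both the overlay and the alert logic.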
Primary customization for this project involves the `AutoPrompt_ICL` node for phrasing the detection question, the `Llama-3-VILA-1.5-8B` (`NanoLLM_ICL`) node for the VLM settings, and the `One_Step_Alert` node for alarm conditions.
- `AutoPrompt_ICL` Node Settings:
  - Template (for Smoke Detection): `<reset><image><image><image>Does the image show any smoke ? Return "Yes" if you see any, or "No" if there is no visible smoke.` (The multiple `<image>` tags are used to provide context frames to the VLM.)
  - Template (for Fire Detection): `<reset><image><image><image>Does the image show any fire ? Return "Yes" if you see any, or "No" if there is no visible fire.`
  - `seq_replace_mode`: Set to `true`.
  - `Roi` & `Roi Coordinates`: Typically set to `false`, as the whole image is analyzed.
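One way to picture the template mechanism: each `<image>` placeholder is filled, left to right, with one context frame before the prompt reaches the VLM. The snippet below is an illustration of that idea only, not the `AutoPrompt_ICL` node's actual implementation.

```python
# Illustration of how <image> placeholders in the template could be filled
# with successive context frames (hypothetical; not the node's real code).
TEMPLATE = ('<reset><image><image><image>Does the image show any smoke ? '
            'Return "Yes" if you see any, or "No" if there is no visible smoke.')

def fill_template(template, frames):
    """Replace each <image> tag, left to right, with one context frame."""
    out = template
    for frame in frames:
        out = out.replace("<image>", f"[{frame}]", 1)  # one tag per frame
    return out

print(fill_template(TEMPLATE, ["frame_t-2", "frame_t-1", "frame_t"]))
```

With three `<image>` tags, the VLM sees the two previous frames alongside the current one, which gives it temporal context for judging whether smoke is actually present.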
- `Llama-3-VILA-1.5-8B` (`NanoLLM_ICL` Node) Settings:
  - Model Selection: `Efficient-Large-Model/Llama-3-VILA-1.5-8B`.
  - API Selection: `MLC` (for enhanced inference speed).
  - Quantization Setting: `q4f16_ft` (default).
  - Chat Template: `llama-3`.
  - System Prompt: "You are a helpful and friendly AI assistant."
  - `Drop inputs`: Set to `True`.
- `One_Step_Alert` Node Settings:
  - `Check Time`: Default is 5 seconds (the timeframe used to determine status based on the VLM outputs).
  - `Alert`: Set to `true` to enable alert functionality.
  - `Alert Keyword`: Set to `"yes"` (the VLM's expected affirmative response if smoke/fire is detected).
  - `Normal Keyword`: Set to `"no"`.
  - `Warning Message Text` (for smoke): `"Warning: The smoke is rising."` (Ensure the period "." is at the end.)
  - `Warning Message Text` (for fire): `"Warning: The fire is rising."` (Ensure the period "." is at the end.)
  - `Drop inputs`: Set to `True`.
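Putting those settings together, the alert decision can be approximated as: collect VLM answers over the `Check Time` window and fire the warning when the alert keyword outweighs the normal keyword. This is a sketch under those assumptions, not the `One_Step_Alert` node's source code.

```python
import time

class OneStepAlertSketch:
    """Approximation of the One_Step_Alert idea: watch VLM answers within a
    check-time window and emit the warning when the alert keyword dominates.
    Illustrative only; the real node's logic may differ."""

    def __init__(self, check_time=5.0, alert_keyword="yes",
                 normal_keyword="no",
                 warning="Warning: The smoke is rising."):
        self.check_time = check_time
        self.alert_keyword = alert_keyword
        self.normal_keyword = normal_keyword
        self.warning = warning
        self.window = []  # (timestamp, normalized answer) pairs

    def on_vlm_output(self, text, now=None):
        now = time.monotonic() if now is None else now
        self.window.append((now, text.strip().lower()))
        # Keep only answers that fall inside the check-time window.
        self.window = [(t, a) for t, a in self.window
                       if now - t <= self.check_time]
        hits = sum(a == self.alert_keyword for _, a in self.window)
        norms = sum(a == self.normal_keyword for _, a in self.window)
        if hits > norms and hits > 0:
            return self.warning  # would be forwarded to PiperTTS
        return None

alert = OneStepAlertSketch()
alert.on_vlm_output("No", now=0.0)           # no alarm
alert.on_vlm_output("Yes", now=1.0)          # still tied, no alarm
print(alert.on_vlm_output("Yes", now=2.0))   # "yes" now dominates
```

This also shows why the keyword settings matter: if the VLM answers anything other than the configured `Alert Keyword` / `Normal Keyword`, the window counts never move and no alarm fires.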
- Launch the Edge Agent UI in your browser.
- Load the appropriate preset (`SMOKE_ALERT_WEBRTC_ADVAN` or `FIRE_ALERT_WEBRTC_ADVAN`):
  - Click the "Agent" menu in the top-right corner.
  - Select "Load."
  - Choose the desired preset `.json` file from the list.
  - The pipeline will appear in the Node Editor.
- Ensure the `VideoSource` node's "Input" parameter is correctly set to the path of the smoke or fire video (e.g., `/data/videos/Smoke_advan.mp4`).
- The project should start running automatically.
- Observe the `VideoOutput` panel:
  - You will see the video playing.
  - The `VideoOverlay` will display the VLM's response (e.g., "Yes") or the warning message from `One_Step_Alert` if smoke/fire is detected.
- Listen for audio alerts: if smoke or fire is confirmed by the VLM and `One_Step_Alert`, the `PiperTTS` module will announce the warning message.
- Check the `TextStream` node's display (if you open its grid widget) to see the direct textual output from the VLM.
- Visual Output: The `VideoOutput` panel will show the video. If smoke or fire is detected, text such as "Yes" (from the VLM via `VideoOverlay`) or the full warning message (e.g., "Warning: The smoke is rising.") will be overlaid on the video.
- Audio Output: If the `One_Step_Alert` node triggers an alarm state (based on the VLM outputting "yes" consistently for the `Check Time`), the `PiperTTS` module will voice the configured `Warning Message Text`.
- Textual Output: The `TextStream` node will display the VLM's direct responses (e.g., "Yes" or "No"). The `One_Step_Alert` node also outputs a JSON message containing the state and check time.
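The exact schema of that JSON message is not documented on this page, so the field names below (`state`, `check_time`) are hypothetical. A downstream consumer might parse it roughly like this:

```python
import json

# Hypothetical example of the kind of JSON One_Step_Alert emits; the real
# field names are not documented here and may differ.
message = '{"state": "alert", "check_time": 5}'

data = json.loads(message)
if data["state"] == "alert":
    print(f"Alarm active (evaluated over {data['check_time']} s)")
```

Check the actual node output in the `TextStream` widget to confirm the real key names before wiring anything against them.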
- No Detection / Incorrect VLM Response:
  - Verify the `AutoPrompt_ICL` template exactly matches the required format and question. Typos can significantly affect VLM performance.
  - Ensure the correct `Llama-3-VILA-1.5-8B` model is selected and its parameters (API, Quantization, Chat Template, System Prompt) are configured as specified.
  - Check the `TextStream` output to see the raw VLM response. If it is not "yes" or "no," the `One_Step_Alert` node might not trigger correctly.
- Alerts Not Triggering:
  - Confirm the `Alert Keyword` in `One_Step_Alert` is set to `"yes"` (or whatever the VLM's affirmative response is).
  - Check the `Check Time` in `One_Step_Alert`. If it is too long, or the VLM responses are intermittent, an alert might not trigger.
  - Ensure the `Warning Message Text` in `One_Step_Alert` is correctly formatted (ending with a period).
- No Audio Alerts:
  - Check that the `PiperTTS` module is correctly connected to the output of the `One_Step_Alert` node.
  - Verify that system audio is working and unmuted.
- Performance Issues:
  - VLMs can be resource-intensive. Monitor system resources (CPU/GPU) in the Edge Agent UI.
  - Adjust the `RateLimit` node if necessary; the 15 fps setting is given only as an example.