Audio‐to‐Haptic Directional Alert Band - shalan/CSCE4301-WiKi GitHub Wiki

Project Title

Audio-to-Haptic Directional Alert Band

Name	GitHub
Ahmed El Dessouky Hafez	https://github.com/AhmedDessouky
Adham Mohamed Hassan	https://github.com/adham-khalil
Andrew Antoine	https://github.com/andrew6255

Presentation Slides: https://docs.google.com/presentation/d/1b3EKrpBNQJrBMw_ZSv5CK9rjxfpVGrY1/edit?usp=sharing_eil_se_dm&rtpof=true&sd=true&ts=6a11562f

GitHub Repo: https://github.com/adham-khalil/Audio-to-Haptic-Directional-Alert-Band

1. The Proposal

Abstract / Elevator Pitch:

Individuals with severe hearing impairment often face difficulty detecting and localizing critical environmental sounds such as approaching vehicles, emergency sirens, or car horns. This limitation poses significant safety risks in dynamic environments.

The Audio-to-Haptic Directional Alert Band is a wearable embedded system that enhances situational awareness by converting directional audio cues into intuitive haptic feedback. The system uses multiple microphones positioned around the user to capture sound from different directions. By comparing sound intensity and frequency behavior across these microphones, it estimates the direction of dominant sound sources and conveys this information through localized vibration motors.

The current implementation focuses specifically on detecting emergency siren-like sounds. The system samples audio from three analog microphones, performs frequency analysis using FFT, checks whether the detected dominant frequency falls within a siren frequency range, and then activates the vibration motor corresponding to the microphone receiving the strongest signal.

The system operates in real time using an RTOS-based architecture on the ESP32, enabling concurrent sensing, signal processing, and motor actuation. By integrating embedded sensing, real-time processing, and human-machine interaction, this system provides a practical assistive solution that improves user safety and environmental awareness.

Project Objectives & Scope:

Minimum Viable Product (MVP):

Capture ambient sound using multiple microphones
Estimate direction of sound based on relative intensity across sensors
Generate localized haptic feedback corresponding to detected direction
Vary vibration response based on sound amplitude and detection result
Implement RTOS-based multitasking for system operation
Perform real-time audio frequency analysis using FFT
Detect emergency siren-like sounds using frequency range and sweep behavior

Stretch Goals:

Improve detection of emergency sounds such as ambulance sirens, car horns, and alarms
Integrate Bluetooth Low Energy (BLE) for configuration and monitoring
Implement adaptive sensitivity thresholds
Add battery monitoring and low-power modes
Improve directional accuracy through calibration
Add more microphones for better 360-degree direction detection
Add enclosure and wearable band design for real-life testing

2. System Architecture

2.1 High-Level Block Diagram:

2.2 Detailed Design

The project is built around an ESP32 microcontroller running an ESP-IDF project. We use three analog microphone modules placed in different directions: left, front, and right. These microphones act as the input sensing layer of the system.

We continuously sample the microphone signals using the ESP32 ADC in continuous conversion mode. The sampled audio data is stored in buffers and then separated into three FFT input arrays, one for each microphone channel. After enough samples are collected, we perform FFT analysis on each channel using the ESP-DSP library.

We replaced the original single dominant-frequency approach with multi-peak frequency analysis. Instead of selecting only the loudest frequency bin from each microphone, we now extract the three strongest distinct frequency peaks from each microphone channel. This makes the system more robust in real environments, where constant background sounds, engine noise, or electrical hum may dominate one frequency bin.

The peak extraction starts from frequency bin 1 instead of bin 0. This avoids the DC offset problem caused by the analog microphone baseline voltage. Since the microphones naturally sit around a midpoint voltage, the raw ADC signal contains a large 0 Hz component. Ignoring bin 0 prevents this DC component from being mistaken as the dominant sound.

For each microphone, we store the top three peak magnitudes and their frequency bins using local scratchpad arrays inside the main loop. This keeps the left, front, and right channel data isolated and prevents channel data from interfering with each other during processing.

After extracting the top peaks, we check whether the sound has both a valid frequency range and enough harmonic richness. Real sirens usually contain a main frequency along with smaller related peaks or harmonics. A flat background tone may have one strong frequency but very weak secondary peaks. To handle this, we calculate a harmonic richness ratio:

Harmonic Ratio = Magnitude of 2nd Highest Peak / Magnitude of 1st Highest Peak

If the harmonic ratio is too low, we treat the sound as flat background noise and clear the frequency history. This prevents the system from locking onto stationary noises such as fans, engines, or electrical hums.

We also keep a short frequency history for each microphone channel. A siren is detected only when enough recent frames are inside the siren frequency range and the frequency changes enough across the history window. This allows us to look for siren-like sweeping behavior instead of reacting to a single constant tone.

After siren detection, we select the direction based on the microphone with the strongest valid magnitude. Whichever microphone (left, front, or right) has the strongest valid response determines which vibration motor is activated: the left, front, or right motor respectively.

The haptic feedback layer uses three vibration motors connected through a ULN2003 motor driver board. The ESP32 GPIO pins do not directly power the motors. Instead, they send control signals to the ULN2003 driver inputs, and the driver handles the motor switching. Each motor represents a physical direction.

The motor activation time was changed to 800 ms. This gives a short, noticeable haptic alert without keeping the motor on for too long or making repeated detections feel delayed.

The final wearable band integration was not completed. We tested the current implementation as a functional prototype using the ESP32, microphones, ULN2003 motor driver, and vibration motors connected externally.

Main Processing Flow:

We initialize the ESP32 GPIO pins connected to the ULN2003 motor driver.
We configure ADC continuous mode for three microphone channels.
We collect audio samples from the left, front, and right microphones.
We place samples into separate FFT buffers.
We perform FFT on each microphone signal.
We extract the top three strongest frequency peaks for each microphone.
We skip bin 0 to remove the DC offset effect.
We check whether the peaks match siren-like frequency behavior.
We check the harmonic richness ratio to reject flat background noise.
We update the frequency history only when the sound passes the required checks.
We select the loudest valid microphone channel as the sound direction.
We activate the corresponding vibration motor through the ULN2003 driver.
The selected motor vibrates for 800 ms.
The process repeats continuously in real time.

Current Software Constants:

Parameter	Value
Sample Rate	20,000 Hz
FFT Size	1024 samples
Number of Microphone Channels	3
Number of Extracted Peaks	3 per microphone
Siren Low Frequency	500 Hz
Siren High Frequency	4000 Hz
Siren Magnitude Threshold	70000
Frequency History Size	15 frames
Minimum Siren Sweep	200 Hz
Motor Vibration Duration	800 ms
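
The timing implied by these constants can be worked out directly. The sketch below is illustrative only: the macro and function names are hypothetical, and it assumes that the 20,000 Hz rate applies to each microphone channel and that FFT frames are collected back to back, neither of which is stated explicitly above.

```c
#include <stdio.h>

/* Values copied from the constants table; the macro names are hypothetical. */
#define SAMPLE_RATE_HZ  20000
#define FFT_SIZE        1024
#define HISTORY_FRAMES  15

/* Time needed to collect one FFT frame, in milliseconds: N / fs. */
double frame_duration_ms(void) {
    return 1000.0 * FFT_SIZE / SAMPLE_RATE_HZ;
}

/* Time spanned by a full frequency history, assuming back-to-back frames. */
double history_span_ms(void) {
    return HISTORY_FRAMES * frame_duration_ms();
}

/* Print the derived figures for quick inspection. */
void print_derived_timing(void) {
    printf("frame: %.1f ms, history: %.0f ms\n",
           frame_duration_ms(), history_span_ms());
}
```

Under those assumptions one frame covers 51.2 ms of audio and a full 15-frame history covers 768 ms, which gives a rough lower bound on how long a sweep must last before the history window can be full.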

2.3 Hardware/Software Partitioning

Component	Hardware Responsibility	Software Responsibility
Microphones	Capture surrounding sound as analog voltage signals	Sample signals through ADC continuous mode
ESP32	Main processing unit, ADC input, GPIO output, RTOS execution	Run sampling, FFT, siren detection, direction decision, and motor control
ADC	Convert analog microphone signals into digital samples	Configure channels, read raw sample buffer, split samples by channel
FFT Processing	N/A	Analyze frequency content of each microphone signal
Direction Detection	Physical microphone placement gives directional information	Compare magnitudes and select strongest direction
ULN2003 Motor Driver	Switches motor current using ESP32 control signals	Set the driver inputs high or low through ESP32 GPIO outputs
Vibration Motors	Convert electrical control signal into haptic vibration	Activate selected motor for 800 ms
FreeRTOS	N/A	Manage motor tasks without blocking audio processing
Power System	Supply ESP32, microphones, motor driver, and motors	Future work: monitor battery and optimize power usage
Wearable Band	Intended final physical form	Not completed in current prototype

3. Hardware Design

The hardware design uses the ESP32 as the central microcontroller. Three analog microphones are connected to ADC-capable pins on the ESP32. Each microphone is assigned to one direction around the user.

Microphone Connections:

Direction	ADC Channel	ESP32 GPIO
Left Microphone	ADC_CHANNEL_6	GPIO34
Front Microphone	ADC_CHANNEL_7	GPIO35
Right Microphone	ADC_CHANNEL_4	GPIO32

The microphone modules output analog signals that vary according to the detected sound. These analog signals are sampled by the ESP32 ADC. The microphones were tested as separate directional inputs for left, front, and right sound detection.

Motor Driver Used:

The project uses a ULN2003 motor driver board as the interface between the ESP32 and the vibration motors. This was important because the ESP32 GPIO pins should not directly drive motors that require more current than the microcontroller can safely provide.

The ESP32 sends low-current GPIO control signals to the ULN2003 input pins. The ULN2003 driver then switches the motor outputs. This makes the hardware safer and more reliable than connecting the motors directly to the ESP32 pins.

Vibration Motor Connections:

Direction	ESP32 GPIO	Driver Input	Output Function
Left Motor	GPIO25	ULN2003 input	Activates left vibration motor
Front Motor	GPIO26	ULN2003 input	Activates front vibration motor
Right Motor	GPIO27	ULN2003 input	Activates right vibration motor

Each vibration motor represents a direction. When a sound is detected from the left, the left motor vibrates. When a sound is detected from the front, the front motor vibrates. When a sound is detected from the right, the right motor vibrates.

The motor activation time was updated to 800 ms. This duration was chosen because it gives the user a noticeable haptic alert while keeping the response short enough for repeated detections.

Hardware Notes:

The motors are driven through the ULN2003 motor driver board.
The ESP32 GPIO pins are used only as control signals.
The motors should be powered through the motor driver supply path, not directly from GPIO.
All components must share a common ground.
Microphones should be physically separated to improve directional accuracy.
The current prototype supports three directions: left, front, and right.
The final wearable band integration was not completed.
The project was tested as an external working prototype rather than a fully enclosed wearable device.

Main Hardware Components:

ESP32 development board
3 analog microphone modules
3 vibration motors
ULN2003 motor driver board
Jumper wires
Breadboard or prototype wiring
USB power supply or external supply
Future work: wearable band/enclosure

4. Software Design

The software is implemented as an ESP-IDF C project. The repository includes the main application source file ES_project.c, a main component CMake file, and a top-level ESP-IDF CMake configuration.

The software uses:

ESP-IDF framework
FreeRTOS
ESP32 ADC continuous driver
ESP-DSP FFT functions
GPIO driver for ULN2003 motor driver control
CMake build system

Main Software Modules:

1. Motor Driver Initialization Module

We use motors_init() to configure the three ESP32 GPIO pins connected to the ULN2003 driver inputs as outputs. We also make sure that all driver inputs are set low when the system starts so that no motor is active at startup.

Motor control pins:

Left motor driver input: GPIO25
Front motor driver input: GPIO26
Right motor driver input: GPIO27

2. Motor Control Module

We activate the selected motor through the ULN2003 driver for 800 ms. Each motor activation is handled by a separate FreeRTOS task. This prevents the main FFT and audio detection loop from being blocked while a motor is vibrating.

We also use motor-running flags to prevent the same motor from being triggered repeatedly while it is already active.
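
The guard provided by the motor-running flags can be sketched in portable C as follows. This is not the project's code: `motor_try_start`, `motor_finished`, and `spawn_motor_task` are hypothetical names, the FreeRTOS task creation is replaced by a counting stub so the logic can be run off-target, and the use of C11 atomics for the flags is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_MOTORS 3

/* One flag per motor; true while that motor's vibration is in progress. */
static atomic_bool motor_running[NUM_MOTORS];

/* Hypothetical stand-in for creating the FreeRTOS motor task. */
static int tasks_started = 0;
static void spawn_motor_task(int motor) { (void)motor; tasks_started++; }

/* Try to start a vibration. Returns false if the motor is already active. */
bool motor_try_start(int motor) {
    if (motor < 0 || motor >= NUM_MOTORS) return false;
    bool expected = false;
    /* Atomically flip false -> true; fails if the flag was already set. */
    if (!atomic_compare_exchange_strong(&motor_running[motor], &expected, true)) {
        return false;
    }
    spawn_motor_task(motor);
    return true;
}

/* Called by the motor task once its 800 ms drive time has elapsed. */
void motor_finished(int motor) {
    if (motor < 0 || motor >= NUM_MOTORS) return;
    atomic_store(&motor_running[motor], false);
}
```

Setting the flag before the task is created, rather than inside the task, closes the window in which two consecutive detections could both see the motor as idle.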

3. ADC Sampling Module

We configure the ESP32 ADC in continuous mode. The ADC samples three microphone channels:

Left microphone
Front microphone
Right microphone

The ADC stores samples in a raw buffer. When a conversion frame is complete, a callback function sets a buffer_ready flag. The main loop then reads the ADC buffer and separates the samples into the correct FFT arrays according to their channel.
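
A minimal sketch of the channel separation step is shown below. The `raw_sample_t` struct is a hypothetical stand-in for the driver's raw output entry, and `channel_buf_t` and `demux_samples` are illustrative names; only the channel numbers come from the microphone connection table in Section 3.

```c
#include <stddef.h>
#include <stdint.h>

#define FFT_SIZE 1024

/* Hypothetical stand-in for one entry of the ADC driver's raw output. */
typedef struct {
    uint8_t  channel;  /* ADC channel number reported with the sample */
    uint16_t data;     /* raw conversion result */
} raw_sample_t;

/* Channel numbers from the microphone connection table. */
enum { CH_LEFT = 6, CH_FRONT = 7, CH_RIGHT = 4 };

typedef struct {
    float  samples[FFT_SIZE];
    size_t count;
} channel_buf_t;

/* Append one sample unless the buffer is already full. */
static void push_sample(channel_buf_t *buf, uint16_t value) {
    if (buf->count < FFT_SIZE) {
        buf->samples[buf->count++] = (float)value;
    }
}

/* Route each raw sample to the buffer that matches its channel tag. */
void demux_samples(const raw_sample_t *raw, size_t n,
                   channel_buf_t *left, channel_buf_t *front,
                   channel_buf_t *right) {
    for (size_t i = 0; i < n; i++) {
        switch (raw[i].channel) {
            case CH_LEFT:  push_sample(left,  raw[i].data); break;
            case CH_FRONT: push_sample(front, raw[i].data); break;
            case CH_RIGHT: push_sample(right, raw[i].data); break;
            default: break;  /* ignore samples from unexpected channels */
        }
    }
}
```

Routing by the channel tag rather than by position in the buffer keeps the three streams correct even if the conversion order within a frame is not what the code expects.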

4. FFT Processing Module

Each microphone channel has its own FFT buffer:

fft_left
fft_front
fft_right

We perform FFT on each buffer using the ESP-DSP library. After FFT processing, we extract the strongest frequency peaks from each microphone signal.

The older version relied on a single dominant frequency. The improved version extracts the top three strongest peaks per channel. This allows us to analyze the relationship between the strongest peak and secondary peaks instead of depending on only one frequency bin.

The frequency for each bin is calculated using:

frequency = bin_index × sample_rate / FFT_size

With the current settings:

frequency = bin_index × 20000 / 1024
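
As a worked example of this formula, the sketch below converts bin indices to frequencies and finds which bins fall inside the 500 Hz to 4000 Hz siren range. The function names are hypothetical, and only bins below FFT_size / 2 are considered because the upper half of a real-input FFT mirrors the lower half.

```c
#define SAMPLE_RATE_HZ 20000.0f
#define FFT_SIZE       1024

/* Centre frequency of an FFT bin: bin_index * sample_rate / FFT_size. */
float bin_to_freq(int bin) {
    return (float)bin * SAMPLE_RATE_HZ / FFT_SIZE;
}

/* Lowest bin whose centre frequency is at or above freq_hz. */
int first_bin_at_or_above(float freq_hz) {
    int bin = 0;
    while (bin < FFT_SIZE / 2 && bin_to_freq(bin) < freq_hz) bin++;
    return bin;
}

/* Highest bin whose centre frequency is at or below freq_hz. */
int last_bin_at_or_below(float freq_hz) {
    int bin = FFT_SIZE / 2 - 1;
    while (bin > 0 && bin_to_freq(bin) > freq_hz) bin--;
    return bin;
}
```

With these settings adjacent bins are about 19.53 Hz apart, and the siren range corresponds to bins 26 (507.8 Hz) through 204 (3984.4 Hz).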

5. DC Offset Rejection

We start the peak extraction loop from bin 1 instead of bin 0. This is important because analog microphones have a DC voltage offset. This offset creates a very large FFT magnitude at 0 Hz. If bin 0 is included, the system may incorrectly detect 0 Hz as the strongest frequency.

By skipping bin 0, we ignore the microphone baseline voltage and focus only on real sound frequency content.

6. Multi-Peak Extraction Module

We now find the top three strongest frequency peaks for each microphone. We do this using a small ranking system with three positions.

When a new peak is found:

If it is stronger than the current first peak, the old first peak moves to second, and the old second moves to third.
If it is not stronger than first but stronger than second, it becomes the new second peak.
If it is not stronger than second but stronger than third, it becomes the new third peak.

The peak arrays are local to each processing cycle and separated for each microphone channel. This improves reliability and prevents left, front, and right microphone data from corrupting each other.
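
The ranking described above can be sketched as follows. `peaks_t` and `find_top_peaks` are hypothetical names, and the sketch assumes that when a value lands in second place the old second peak moves down to third. It ranks individual bins, so how the project keeps the three peaks "distinct" (for example, by skipping bins adjacent to an existing peak) is not covered here.

```c
#include <stddef.h>

#define NUM_PEAKS 3

typedef struct {
    float mag[NUM_PEAKS];  /* magnitudes, strongest first */
    int   bin[NUM_PEAKS];  /* matching FFT bin indices, -1 if unused */
} peaks_t;

/* Rank the three largest magnitudes in mags[1 .. num_bins-1].
 * Bin 0 (DC) is skipped on purpose. */
peaks_t find_top_peaks(const float *mags, size_t num_bins) {
    peaks_t p = { {0.0f, 0.0f, 0.0f}, {-1, -1, -1} };
    for (size_t i = 1; i < num_bins; i++) {
        float m = mags[i];
        if (m > p.mag[0]) {
            /* New strongest: old first -> second, old second -> third. */
            p.mag[2] = p.mag[1]; p.bin[2] = p.bin[1];
            p.mag[1] = p.mag[0]; p.bin[1] = p.bin[0];
            p.mag[0] = m;        p.bin[0] = (int)i;
        } else if (m > p.mag[1]) {
            /* New second: old second -> third (assumed behaviour). */
            p.mag[2] = p.mag[1]; p.bin[2] = p.bin[1];
            p.mag[1] = m;        p.bin[1] = (int)i;
        } else if (m > p.mag[2]) {
            /* New third: nothing to shift. */
            p.mag[2] = m;        p.bin[2] = (int)i;
        }
    }
    return p;
}
```

Returning the result by value keeps the peak data local to each call, in the same spirit as the per-channel scratchpad arrays described above.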

7. Harmonic Richness Detection

We check the relationship between the first and second strongest peaks.

The harmonic richness ratio is calculated as:

Harmonic Ratio = second_peak_magnitude / first_peak_magnitude

If this ratio is too small, it means the sound is likely a flat single-frequency tone or stationary background noise. In that case, we clear the frequency history and do not treat the sound as a siren.

This makes the system more robust against:

Constant engine noise
Air conditioner hum
Electrical noise
Single-frequency test tones
Stationary background drones
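
A minimal sketch of the ratio check is given below. The threshold value 0.10 is purely a placeholder, since the tuned value used in the project is not stated here, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Placeholder threshold; the project's tuned value is not stated. */
#define HARMONIC_RATIO_MIN 0.10f

/* Ratio of second-strongest to strongest peak magnitude.
 * Returns 0 when there is no usable first peak. */
float harmonic_ratio(float first_mag, float second_mag) {
    if (first_mag <= 0.0f) return 0.0f;  /* avoid division by zero */
    return second_mag / first_mag;
}

/* True when the secondary peak is strong enough relative to the first. */
bool is_harmonically_rich(float first_mag, float second_mag) {
    return harmonic_ratio(first_mag, second_mag) >= HARMONIC_RATIO_MIN;
}
```

Because the check is a ratio, it is independent of overall loudness: a quiet siren and a loud one with the same spectral shape produce the same value.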

8. Siren Detection Module

Our siren detection logic checks multiple conditions:

The main detected frequency must be between 500 Hz and 4000 Hz.
The magnitude must be greater than the threshold value of 70000.
The second strongest peak must be strong enough compared to the first peak.
Enough recent frames must be inside the siren range.
The frequency must shift enough across the history window to look like a siren sweep.

If the signal fails the harmonic richness check or frequency range check, we flush the history window to zero. This prevents old valid frames from creating a false detection later.
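
The conditions above could be combined along the following lines. This is a sketch under stated assumptions, not the project's implementation: all names are hypothetical, "enough recent frames" is taken to mean 8 of 15, the harmonic ratio threshold is a placeholder, the sweep is measured as the spread between the lowest and highest frequency in the history, and a frame that is in range but below the magnitude threshold is assumed to leave the history unchanged.

```c
#include <stdbool.h>
#include <string.h>

#define SIREN_LOW_HZ       500.0f
#define SIREN_HIGH_HZ      4000.0f
#define SIREN_MAG_MIN      70000.0f
#define HISTORY_SIZE       15
#define MIN_SWEEP_HZ       200.0f
#define MIN_VALID_FRAMES   8      /* assumption: count is not stated */
#define HARMONIC_RATIO_MIN 0.10f  /* assumption: placeholder threshold */

typedef struct {
    float freq[HISTORY_SIZE];  /* 0.0f marks an empty slot */
    int   next;                /* slot to overwrite next */
} siren_history_t;

/* Flush the history window to zero. */
void history_clear(siren_history_t *h) { memset(h, 0, sizeof *h); }

/* Feed one frame's strongest peak for a channel.
 * Returns true when the history looks like a siren sweep. */
bool siren_update(siren_history_t *h, float freq_hz,
                  float first_mag, float second_mag) {
    bool in_range = freq_hz >= SIREN_LOW_HZ && freq_hz <= SIREN_HIGH_HZ;
    bool rich = first_mag > 0.0f &&
                (second_mag / first_mag) >= HARMONIC_RATIO_MIN;
    if (!in_range || !rich) {
        history_clear(h);  /* failed range or richness check */
        return false;
    }
    if (first_mag <= SIREN_MAG_MIN) return false;  /* too quiet */

    h->freq[h->next] = freq_hz;
    h->next = (h->next + 1) % HISTORY_SIZE;

    /* Count stored frames and track the lowest and highest frequency. */
    int valid = 0;
    float lo = 0.0f, hi = 0.0f;
    for (int i = 0; i < HISTORY_SIZE; i++) {
        float f = h->freq[i];
        if (f == 0.0f) continue;
        if (valid == 0 || f < lo) lo = f;
        if (valid == 0 || f > hi) hi = f;
        valid++;
    }
    return valid >= MIN_VALID_FRAMES && (hi - lo) >= MIN_SWEEP_HZ;
}
```

One history of this kind would be kept per microphone channel. A steady in-range tone fills the history but never reaches the 200 Hz spread, so it is rejected by the sweep condition even if it passes the other checks.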

9. Direction Detection Module

After a siren-like sound is detected, we compare the valid magnitudes of the left, front, and right microphone channels.

Direction logic:

If left magnitude is highest → activate left motor
If front magnitude is highest → activate front motor
If right magnitude is highest → activate right motor
If no clear maximum exists → print direction as unknown
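
The direction logic above can be sketched as a strict three-way comparison. The enum and function names are hypothetical, and treating any tie for the maximum as "no clear maximum" is an assumption.

```c
typedef enum { DIR_UNKNOWN = 0, DIR_LEFT, DIR_FRONT, DIR_RIGHT } direction_t;

/* Pick the channel with the strictly largest magnitude.
 * A tie for the maximum yields DIR_UNKNOWN. */
direction_t pick_direction(float left, float front, float right) {
    if (left > front && left > right)  return DIR_LEFT;
    if (front > left && front > right) return DIR_FRONT;
    if (right > left && right > front) return DIR_RIGHT;
    return DIR_UNKNOWN;
}

/* Label used in debug output. */
const char *direction_name(direction_t d) {
    switch (d) {
        case DIR_LEFT:  return "LEFT";
        case DIR_FRONT: return "FRONT";
        case DIR_RIGHT: return "RIGHT";
        default:        return "UNKNOWN";
    }
}
```

For magnitudes of 85000 (left), 60000 (front), and 50000 (right), `pick_direction` returns `DIR_LEFT`.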

10. Debug Output

We print detected frequency and magnitude information for each microphone channel. This helps us verify that the microphones, FFT pipeline, siren detection logic, and direction decision are working during testing.

Example output format:

Left: 1200.0 Hz (85000) | Front: 980.0 Hz (60000) | Right: 700.0 Hz (50000)

If a siren is detected:

SIREN DETECTED! | Direction: LEFT

5. Integration and Testing

Integration Plan

We integrated the system in stages to reduce debugging complexity.

Stage 1: Microphone ADC Reading

We first connected the microphone modules to the ESP32 ADC pins and verified that the ADC was reading changing values when sound was present.

Stage 2: Multi-Channel Sampling

After individual microphone readings were confirmed, we configured all three microphone channels together using ADC continuous mode. We then separated the raw ADC buffer into left, front, and right sample streams.

Stage 3: FFT Verification

We added FFT processing to identify the frequency content in each microphone channel. We used debug printing to verify that detected frequencies changed when different tones or siren-like sounds were played.

Stage 4: DC Offset Handling

We identified that analog microphones produce a baseline voltage, which appears in the FFT as a large 0 Hz component. This could cause the detection logic to focus on the DC component instead of actual sound frequencies.

To fix this, we changed the peak extraction loop to start from bin 1. This removes the DC offset from the frequency selection process.

Stage 5: Multi-Peak Detection Upgrade

The old system selected only the single strongest frequency. This was not reliable in noisy environments because a steady background sound could dominate the FFT.

We improved the system by extracting the top three strongest frequency peaks for each microphone. This allows the software to check whether the sound contains multiple meaningful peaks rather than one isolated tone.

Stage 6: Harmonic Richness Validation

We added a harmonic ratio check by comparing the second strongest peak to the strongest peak. If the second peak is too weak, we treat the sound as flat background noise and clear the history window.

This improved the robustness of siren detection and reduced false positives from constant background sounds.

Stage 7: ULN2003 Motor Driver Testing

We connected the ULN2003 motor driver board between the ESP32 and the vibration motors. We tested each motor independently by activating GPIO25, GPIO26, and GPIO27 and confirming that the corresponding motor vibrated.

Stage 8: Full System Test

We tested the full system by playing siren-like sounds from different directions and checking that the correct motor vibrated based on the strongest valid microphone response. The motor vibration duration was set to 800 ms.

Stage 9: Wearable Band Integration

The final band integration was planned but not completed. The working prototype remained as a wired hardware setup rather than a fully mounted wearable band.

Testing Methodology

Test	Expected Result
DC offset test	Bin 0 is ignored and does not dominate the detection
Clap or loud sound near one microphone	Corresponding channel magnitude increases
Single steady tone	We reject it if harmonic richness is too low
Constant background hum	We avoid locking onto it as a siren
Play tone below 500 Hz	No siren detection
Play tone above 4000 Hz	No siren detection
Play siren-like sound in valid range	Siren detection message appears
Play siren-like sweeping sound	Frequency history confirms shifting behavior
Play siren-like sound from left	Left motor vibrates for 800 ms
Play siren-like sound from front	Front motor vibrates for 800 ms
Play siren-like sound from right	Right motor vibrates for 800 ms
Keep sound active for several seconds	Motor does not create duplicate overlapping tasks
No sound / normal background noise	No motor activation
Test motor driver inputs	ULN2003 correctly switches the selected motor
Wearable band test	Not completed

Debugging Observations

The ADC continuous driver allowed us to sample repeatedly without manually triggering each conversion.
FFT analysis made the system more selective than simple amplitude-only detection.
Ignoring bin 0 fixed the DC offset problem caused by analog microphone baseline voltage.
Multi-peak extraction made the detection more robust than relying on one dominant frequency.
The harmonic richness ratio helped us reject flat, stationary, single-frequency sounds.
Comparing magnitudes between microphones gave us a simple but functional direction estimation method.
The ULN2003 driver made motor control more reliable than direct GPIO driving.
We placed motor control in separate FreeRTOS tasks because a blocking motor delay inside the main loop would reduce responsiveness.
Motor-running flags were added so the same motor is not triggered again while it is already vibrating.
The 800 ms motor drive time gave a shorter and cleaner haptic response.
Final physical integration into a wearable band was not completed.

6. Results and Evaluation

We successfully demonstrated the main concept of the Audio-to-Haptic Directional Alert Band. The ESP32 collects sound from three analog microphones, performs frequency-domain analysis using FFT, detects siren-like sounds, estimates the direction based on the strongest valid microphone signal, and activates the matching vibration motor through a ULN2003 motor driver board.

We improved the detection system by moving from single dominant-frequency detection to multi-peak analysis that starts at bin 1. This addressed two major issues: skipping bin 0 removed the DC offset problem caused by the analog microphone baseline voltage, and comparing multiple peaks reduced the risk of locking onto stationary background noise.

The project was completed as a working external prototype. However, the final wearable band integration was not completed.

Achieved Features

Feature	Status
ESP32 project setup using ESP-IDF	Completed
Three microphone input channels	Completed
Continuous ADC sampling	Completed
FFT-based frequency analysis	Completed
DC offset rejection by skipping bin 0	Completed
Top-three peak extraction per microphone	Completed
Siren frequency range detection	Completed
Harmonic richness ratio validation	Completed
Frequency sweep/history check	Completed
Direction decision using magnitude comparison	Completed
ULN2003 motor driver integration	Completed
Three vibration motor outputs	Completed
800 ms motor vibration feedback	Completed
FreeRTOS motor task implementation	Completed
Debug serial output	Completed
BLE configuration	Not implemented
Battery monitoring	Not implemented
Low-power optimization	Not implemented
Full wearable band integration	Not completed

System Strengths

Real-time embedded implementation using ESP32 and FreeRTOS.
Uses FFT instead of only raw amplitude, making detection more meaningful.
Rejects the 0 Hz DC offset peak by starting peak extraction from bin 1.
Uses top-three peak extraction instead of relying on one dominant frequency.
Harmonic richness validation helps reject flat background noise.
Frequency history helps detect siren-like sweeping behavior.
Separates sensing, processing, and actuation logically.
Directional haptic feedback is simple and intuitive for the user.
ULN2003 driver improves motor switching reliability.
Motor tasks prevent vibration timing from blocking the main detection loop.
The 800 ms vibration duration provides a clear but short alert.
The design can be expanded to more directions or more advanced sound classification.

System Limitations

Direction detection is based on relative loudness, so accuracy can still be affected by reflections, microphone placement, and background noise.
The system detects siren-like frequency behavior rather than fully classifying all emergency sounds.
Only three directions are supported: left, front, and right.
The current thresholds may need calibration for different environments.
The harmonic ratio threshold may require tuning after more real-world testing.
The prototype does not yet include battery monitoring or low-power operation.
The final wearable band/enclosure was not completed.
The current version is a functional hardware prototype, not a finished wearable product.

Evaluation Summary

The prototype meets the core MVP requirements at the embedded system level. We capture audio, process it in real time, estimate the strongest valid sound direction, and give the user localized haptic feedback through motors driven by a ULN2003 driver board.

The updated signal processing pipeline improves robustness by avoiding the DC offset trap, rejecting flat stationary noise, and using multiple frequency peaks to better identify siren-like acoustic behavior.

The main missing part is the physical wearable band integration. Therefore, the project should be evaluated as a working proof-of-concept prototype rather than a fully completed wearable device.

7. Project Management

7.1 Division of Labor:

Adham Hassan:
- System architecture design
- RTOS implementation and task scheduling
- Signal processing and direction detection
- FFT-based siren detection logic
Ahmed El Dessouky:
- Hardware integration: microphones, motors, and power connections
- Haptic feedback control
- Motor output testing and debugging
- Physical testing of directional behavior
Andrew Antoine:
- ADC continuous sampling configuration
- Prototype validation and wiki documentation
- Signal processing and direction detection
- Motor output testing and debugging

7.2 Timeline:

Date	Milestone	Deliverable
Tue, Apr 14	Team formation	Team submitted.
Wed, Apr 15	Proposal presentation	5 to 7 min in-class presentation of project scope and plan.
Mon, Apr 20	Wiki/page setup	Wiki page live with approved proposal content.
Wed, Apr 29	Progress demo	Microphones reading audio levels and vibration motors responding to detected direction. Presentation + live demo.
Wed, May 14	Integration update	Full system integration: audio sensing, FFT processing, siren detection, and direction detection. Wiki updated with testing results and remaining issues.
Wed, May 23	Final demo	Final presentation, full live demo of directional haptic feedback system, complete codebase, polished wiki.

7.3 Risks and Mitigation

Risk	Impact	Mitigation
Microphone readings are noisy	False direction detection or false alerts	Use magnitude thresholding, FFT filtering, and calibration constants
Analog microphone DC offset dominates FFT	System may detect 0 Hz instead of real sound	Skip FFT bin 0 during peak extraction
Background sounds trigger the system	Unwanted motor vibration	Detect only sounds within siren frequency range and require frequency sweep behavior
Stationary background frequency dominates detection	System may lock onto engine noise, fans, or electrical hum	Extract top three peaks and use harmonic richness validation
Single-frequency tone is mistaken for siren	False siren detection	Reject sounds with weak secondary peaks using harmonic ratio check
Motors draw more current than GPIO can safely provide	Possible ESP32 damage or unstable operation	Use the ULN2003 motor driver board instead of direct GPIO motor driving
Motor vibration blocks audio processing	Missed samples or slow response	Use separate FreeRTOS motor tasks instead of blocking the main loop
Same motor triggers repeatedly	Overlapping motor tasks and unstable feedback	Use motor-running flags to prevent duplicate motor tasks
Direction detection is inaccurate in real environments	Wrong motor feedback	Improve physical microphone spacing, add calibration, and test in multiple environments
ADC sampling rate is too low or unstable	Aliasing and inaccurate FFT frequency estimates	Use ADC continuous mode and maintain a 20 kHz sampling rate
Threshold value does not work in all locations	System may be too sensitive or not sensitive enough	Add adaptive thresholding or user configuration in future versions
Harmonic ratio threshold is not tuned	Real sirens may be missed or background sounds may pass	Test with more siren samples and tune the ratio threshold
Wearable power consumption is high	Short battery life	Add sleep modes, lower-power sampling, and battery monitoring in future work
Limited number of directions	User receives incomplete directional awareness	Add rear microphone and rear vibration motor in future versions
Band integration not completed	Prototype is less practical as a wearable device	Treat current system as proof of concept and complete enclosure/band mounting as future work