Audio‐to‐Haptic Directional Alert Band - shalan/CSCE4301-WiKi GitHub Wiki
Project Title
Audio-to-Haptic Directional Alert Band
| Name | GitHub |
|---|---|
| Ahmed El Dessouky Hafez | https://github.com/AhmedDessouky |
| Adham Mohamed Hassan | https://github.com/adham-khalil |
| Andrew Antoine | https://github.com/andrew6255 |
GitHub Repo: https://github.com/adham-khalil/Audio-to-Haptic-Directional-Alert-Band
1. The Proposal
Abstract / Elevator Pitch:
Individuals with severe hearing impairment often face difficulty detecting and localizing critical environmental sounds such as approaching vehicles, emergency sirens, or car horns. This limitation poses significant safety risks in dynamic environments.
The Audio-to-Haptic Directional Alert Band is a wearable embedded system that enhances situational awareness by converting directional audio cues into intuitive haptic feedback. The system uses multiple microphones positioned around the user to capture sound from different directions. By comparing sound intensity and frequency behavior across these microphones, it estimates the direction of dominant sound sources and conveys this information through localized vibration motors.
The current implementation focuses specifically on detecting emergency siren-like sounds. The system samples audio from three analog microphones, performs frequency analysis using FFT, checks whether the detected dominant frequency falls within a siren frequency range, and then activates the vibration motor corresponding to the microphone receiving the strongest signal.
The system operates in real time using an RTOS-based architecture on the ESP32, enabling concurrent sensing, signal processing, and motor actuation. By integrating embedded sensing, real-time processing, and human-machine interaction, this system provides a practical assistive solution that improves user safety and environmental awareness.
Project Objectives & Scope:
Minimum Viable Product (MVP):
- Capture ambient sound using multiple microphones
- Estimate direction of sound based on relative intensity across sensors
- Generate localized haptic feedback corresponding to detected direction
- Vary vibration response based on sound amplitude and detection result
- Implement RTOS-based multitasking for system operation
- Perform real-time audio frequency analysis using FFT
- Detect emergency siren-like sounds using frequency range and sweep behavior
Stretch Goals:
- Improve detection of emergency sounds such as ambulance sirens, car horns, and alarms
- Integrate Bluetooth Low Energy (BLE) for configuration and monitoring
- Implement adaptive sensitivity thresholds
- Add battery monitoring and low-power modes
- Improve directional accuracy through calibration
- Add more microphones for better 360-degree direction detection
- Add enclosure and wearable band design for real-life testing
2. System Architecture
2.1 High-Level Block Diagram:
2.2 Detailed Design
The project is built around an ESP32 microcontroller running an ESP-IDF project. We use three analog microphone modules placed in different directions: left, front, and right. These microphones act as the input sensing layer of the system.
We continuously sample the microphone signals using the ESP32 ADC in continuous conversion mode. The sampled audio data is stored in buffers and then separated into three FFT input arrays, one for each microphone channel. After enough samples are collected, we perform FFT analysis on each channel using the ESP-DSP library.
We upgraded the detection pipeline from a single dominant-frequency approach to multi-peak frequency analysis. Instead of selecting only the loudest frequency bin from each microphone, we now extract the three strongest distinct frequency peaks from each microphone channel. This makes the system more robust in real environments, where constant background sounds, engine noise, or electrical hum may dominate one frequency bin.
The peak extraction starts from frequency bin 1 instead of bin 0. This avoids the DC offset problem caused by the analog microphone baseline voltage. Since the microphones naturally sit around a midpoint voltage, the raw ADC signal contains a large 0 Hz component. Ignoring bin 0 prevents this DC component from being mistaken for the dominant sound.
For each microphone, we store the top three peak magnitudes and their frequency bins using local scratchpad arrays inside the main loop. This keeps the left, front, and right channel data isolated and prevents channel data from interfering with each other during processing.
After extracting the top peaks, we check whether the sound has both a valid frequency range and enough harmonic richness. Real sirens usually contain a main frequency along with smaller related peaks or harmonics. A flat background tone may have one strong frequency but very weak secondary peaks. To handle this, we calculate a harmonic richness ratio:
Harmonic Ratio = Magnitude of 2nd Highest Peak / Magnitude of 1st Highest Peak
If the harmonic ratio is too low, we treat the sound as flat background noise and clear the frequency history. This prevents the system from locking onto stationary noises such as fans, engines, or electrical hums.
We also keep a short frequency history for each microphone channel. A siren is detected only when enough recent frames are inside the siren frequency range and the frequency changes enough across the history window. This allows us to look for siren-like sweeping behavior instead of reacting to a single constant tone.
After siren detection, we select the direction based on the microphone with the strongest valid magnitude and activate the matching vibration motor: the left motor for the left microphone, the front motor for the front microphone, and the right motor for the right microphone.
The haptic feedback layer uses three vibration motors connected through a ULN2003 motor driver board. The ESP32 GPIO pins do not directly power the motors. Instead, they send control signals to the ULN2003 driver inputs, and the driver handles the motor switching. Each motor represents a physical direction.
The motor activation time was changed to 800 ms. This gives a short, noticeable haptic alert without keeping the motor on for too long or making repeated detections feel delayed.
The final wearable band integration was not completed. We tested the current implementation as a functional prototype using the ESP32, microphones, ULN2003 motor driver, and vibration motors connected externally.
Main Processing Flow:
- We initialize the ESP32 GPIO pins connected to the ULN2003 motor driver.
- We configure ADC continuous mode for three microphone channels.
- We collect audio samples from the left, front, and right microphones.
- We place samples into separate FFT buffers.
- We perform FFT on each microphone signal.
- We extract the top three strongest frequency peaks for each microphone.
- We skip bin 0 to remove the DC offset effect.
- We check whether the peaks match siren-like frequency behavior.
- We check the harmonic richness ratio to reject flat background noise.
- We update the frequency history only when the sound passes the required checks.
- We select the loudest valid microphone channel as the sound direction.
- We activate the corresponding vibration motor through the ULN2003 driver.
- The selected motor vibrates for 800 ms.
- The process repeats continuously in real time.
Current Software Constants:
| Parameter | Value |
|---|---|
| Sample Rate | 20,000 Hz |
| FFT Size | 1024 samples |
| Number of Microphone Channels | 3 |
| Number of Extracted Peaks | 3 per microphone |
| Siren Low Frequency | 500 Hz |
| Siren High Frequency | 4000 Hz |
| Siren Magnitude Threshold | 70000 |
| Frequency History Size | 15 frames |
| Minimum Siren Sweep | 200 Hz |
| Motor Vibration Duration | 800 ms |
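The constants above imply a few timing figures. The sketch below restates them as C macros and derives the frame duration and the length of the history window. The macro and function names are illustrative and are not taken from ES_project.c. It assumes that 20,000 Hz is the per-channel sample rate and that consecutive FFT frames do not overlap.

```c
/* Illustrative names; the values come from the constants table above. */
#define SAMPLE_RATE_HZ  20000   /* assumed to be the per-channel rate */
#define FFT_SIZE        1024
#define HISTORY_FRAMES  15

/* Time needed to collect one FFT frame, in milliseconds. */
float frame_duration_ms(void)
{
    return 1000.0f * (float)FFT_SIZE / (float)SAMPLE_RATE_HZ;
}

/* Time spanned by a full frequency-history window, in milliseconds,
 * assuming frames are collected back to back without overlap. */
float history_window_ms(void)
{
    return (float)HISTORY_FRAMES * frame_duration_ms();
}
```

Under these assumptions one frame takes 51.2 ms to collect and a full 15-frame history spans 768 ms.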
2.3 Hardware/Software Partitioning
| Component | Hardware Responsibility | Software Responsibility |
|---|---|---|
| Microphones | Capture surrounding sound as analog voltage signals | Sample signals through ADC continuous mode |
| ESP32 | Main processing unit, ADC input, GPIO output, RTOS execution | Run sampling, FFT, siren detection, direction decision, and motor control |
| ADC | Convert analog microphone signals into digital samples | Configure channels, read raw sample buffer, split samples by channel |
| FFT Processing | N/A | Analyze frequency content of each microphone signal |
| Direction Detection | Physical microphone placement gives directional information | Compare magnitudes and select strongest direction |
| ULN2003 Motor Driver | Switches motor current using ESP32 control signals | Set the driver inputs high or low through ESP32 GPIO outputs |
| Vibration Motors | Convert electrical control signal into haptic vibration | Activate selected motor for 800 ms |
| FreeRTOS | N/A | Manage motor tasks without blocking audio processing |
| Power System | Supply ESP32, microphones, motor driver, and motors | Future work: monitor battery and optimize power usage |
| Wearable Band | Intended final physical form | Not completed in current prototype |
3. Hardware Design
The hardware design uses the ESP32 as the central microcontroller. Three analog microphones are connected to ADC-capable pins on the ESP32. Each microphone is assigned to one direction around the user.
Microphone Connections:
| Direction | ADC Channel | ESP32 GPIO |
|---|---|---|
| Left Microphone | ADC_CHANNEL_6 | GPIO34 |
| Front Microphone | ADC_CHANNEL_7 | GPIO35 |
| Right Microphone | ADC_CHANNEL_4 | GPIO32 |
The microphone modules output analog signals that vary according to the detected sound. These analog signals are sampled by the ESP32 ADC. The microphones were tested as separate directional inputs for left, front, and right sound detection.
Motor Driver Used:
The project uses a ULN2003 motor driver board as the interface between the ESP32 and the vibration motors. This was important because the ESP32 GPIO pins should not directly drive motors that require more current than the microcontroller can safely provide.
The ESP32 sends low-current GPIO control signals to the ULN2003 input pins. The ULN2003 driver then switches the motor outputs. This makes the hardware safer and more reliable than connecting the motors directly to the ESP32 pins.
Vibration Motor Connections:
| Direction | ESP32 GPIO | Driver Input | Output Function |
|---|---|---|---|
| Left Motor | GPIO25 | ULN2003 input | Activates left vibration motor |
| Front Motor | GPIO26 | ULN2003 input | Activates front vibration motor |
| Right Motor | GPIO27 | ULN2003 input | Activates right vibration motor |
Each vibration motor represents a direction. When a sound is detected from the left, front, or right, the motor on that side vibrates.
The motor activation time was updated to 800 ms. This duration was chosen because it gives the user a noticeable haptic alert while keeping the response short enough for repeated detections.
Hardware Notes:
- The motors are driven through the ULN2003 motor driver board.
- The ESP32 GPIO pins are used only as control signals.
- The motors should be powered through the motor driver supply path, not directly from GPIO.
- All components must share a common ground.
- Microphones should be physically separated to improve directional accuracy.
- The current prototype supports three directions: left, front, and right.
- The final wearable band integration was not completed.
- The project was tested as an external working prototype rather than a fully enclosed wearable device.
Main Hardware Components:
- ESP32 development board
- 3 analog microphone modules
- 3 vibration motors
- ULN2003 motor driver board
- Jumper wires
- Breadboard or prototype wiring
- USB power supply or external supply
- Future work: wearable band/enclosure
4. Software Design
The software is implemented as an ESP-IDF C project. The repository includes the main application source file ES_project.c, a main component CMake file, and a top-level ESP-IDF CMake configuration.
The software uses:
- ESP-IDF framework
- FreeRTOS
- ESP32 ADC continuous driver
- ESP-DSP FFT functions
- GPIO driver for ULN2003 motor driver control
- CMake build system
Main Software Modules:
1. Motor Driver Initialization Module
We use motors_init() to configure the three ESP32 GPIO pins connected to the ULN2003 driver inputs as outputs. We also make sure that all driver inputs are set low when the system starts so that no motor is active at startup.
Motor control pins:
- Left motor driver input: GPIO25
- Front motor driver input: GPIO26
- Right motor driver input: GPIO27
2. Motor Control Module
We activate the selected motor through the ULN2003 driver for 800 ms. Each motor activation is handled by a separate FreeRTOS task. This prevents the main FFT and audio detection loop from being blocked while a motor is vibrating.
We also use motor-running flags to prevent the same motor from being triggered repeatedly while it is already active.
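The motor-running flags described above can be sketched in plain C as follows. This is a minimal illustration and not the code from ES_project.c: the function names and the use of C11 `atomic_flag` are assumptions. In the ESP-IDF project, a successful acquire would presumably be followed by `xTaskCreate()` for a task that drives the pin with `gpio_set_level()`, waits with `vTaskDelay(pdMS_TO_TICKS(800))`, and then releases the flag before deleting itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_MOTORS 3   /* left, front, right */

/* One flag per motor; a set flag means a vibration pulse is in progress. */
static atomic_flag motor_running[NUM_MOTORS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Returns true if the motor was idle and is now claimed by the caller.
 * Returns false if it is already running or the index is invalid. */
bool motor_try_acquire(int motor)
{
    if (motor < 0 || motor >= NUM_MOTORS) {
        return false;
    }
    /* test_and_set returns the previous state: false means it was idle. */
    return !atomic_flag_test_and_set(&motor_running[motor]);
}

/* Marks the motor as idle again once its pulse has finished. */
void motor_release(int motor)
{
    if (motor >= 0 && motor < NUM_MOTORS) {
        atomic_flag_clear(&motor_running[motor]);
    }
}
```

Because the test-and-set is atomic, a second detection that arrives while the motor is still vibrating is simply dropped instead of creating an overlapping task.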
3. ADC Sampling Module
We configure the ESP32 ADC in continuous mode. The ADC samples three microphone channels:
- Left microphone
- Front microphone
- Right microphone
The ADC stores samples in a raw buffer. When a conversion frame is complete, a callback function sets a buffer_ready flag. The main loop then reads the ADC buffer and separates the samples into the correct FFT arrays according to their channel.
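The channel separation step can be sketched as below. The `raw_sample_t` record is a hypothetical stand-in so that the example compiles without ESP-IDF headers. In the real driver each record is an `adc_digi_output_data_t`, which on the original ESP32 should carry a 12-bit sample and a 4-bit channel number. The channel numbers follow the microphone connection table (left = 6, front = 7, right = 4); all other names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define FFT_SIZE 1024

/* ADC channel numbers from the microphone connection table. */
enum { CH_LEFT = 6, CH_FRONT = 7, CH_RIGHT = 4 };

/* Hypothetical stand-in for one record in the raw ADC buffer. */
typedef struct {
    uint8_t  channel;  /* ADC channel that produced the sample */
    uint16_t value;    /* raw ADC reading */
} raw_sample_t;

/* One FFT input array per microphone, plus a fill count for each. */
typedef struct {
    float  left[FFT_SIZE], front[FFT_SIZE], right[FFT_SIZE];
    size_t n_left, n_front, n_right;
} channel_buffers_t;

/* Appends one sample unless the destination array is already full. */
static void append_sample(float *dst, size_t *count, uint16_t value)
{
    if (*count < FFT_SIZE) {
        dst[(*count)++] = (float)value;
    }
}

/* Routes each raw record to the buffer that matches its channel.
 * Records from unexpected channels are ignored. */
void demux_samples(const raw_sample_t *raw, size_t n, channel_buffers_t *out)
{
    for (size_t i = 0; i < n; i++) {
        switch (raw[i].channel) {
        case CH_LEFT:  append_sample(out->left,  &out->n_left,  raw[i].value); break;
        case CH_FRONT: append_sample(out->front, &out->n_front, raw[i].value); break;
        case CH_RIGHT: append_sample(out->right, &out->n_right, raw[i].value); break;
        default: break;
        }
    }
}
```

Once all three fill counts reach `FFT_SIZE`, the buffers are ready for the FFT stage and the counts can be reset to zero for the next frame.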
4. FFT Processing Module
Each microphone channel has its own FFT buffer:
- fft_left
- fft_front
- fft_right
We perform FFT on each buffer using the ESP-DSP library. After FFT processing, we extract the strongest frequency peaks from each microphone signal.
The older version relied on a single dominant frequency. The improved version extracts the top three strongest peaks per channel. This allows us to analyze the relationship between the strongest peak and secondary peaks instead of depending on only one frequency bin.
The frequency for each bin is calculated using:
frequency = bin_index × sample_rate / FFT_size
With the current settings:
frequency = bin_index × 20000 / 1024
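As a worked example of the formula above, the sketch below converts bin indices to frequencies and finds which bins cover the 500 Hz to 4000 Hz siren band. The function names are illustrative, and 20,000 Hz is assumed to be the effective per-channel sample rate.

```c
#include <stddef.h>

#define SAMPLE_RATE_HZ 20000.0f   /* assumed per-channel rate */
#define FFT_SIZE       1024

/* Center frequency of an FFT bin: bin_index * sample_rate / FFT_size. */
float bin_to_hz(size_t bin_index)
{
    return (float)bin_index * SAMPLE_RATE_HZ / (float)FFT_SIZE;
}

/* Lowest bin whose center frequency is at or above hz (for hz >= 0). */
size_t first_bin_at_or_above(float hz)
{
    /* Truncation gives a bin at or below hz; step up if it falls short. */
    size_t bin = (size_t)(hz * (float)FFT_SIZE / SAMPLE_RATE_HZ);
    while (bin_to_hz(bin) < hz) {
        bin++;
    }
    return bin;
}
```

With these settings each bin is about 19.53 Hz wide. The first bin at or above 500 Hz is bin 26 (507.8 Hz), and the first bin at or above 4000 Hz is bin 205, so the siren band covers bins 26 through 204. For real-valued input only bins up to FFT_SIZE / 2 (10 kHz) carry independent information.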
5. DC Offset Rejection
We start the peak extraction loop from bin 1 instead of bin 0. This is important because analog microphones have a DC voltage offset. This offset creates a very large FFT magnitude at 0 Hz. If bin 0 is included, the system may incorrectly detect 0 Hz as the strongest frequency.
By skipping bin 0, we ignore the microphone baseline voltage and focus only on real sound frequency content.
6. Multi-Peak Extraction Module
We now find the top three strongest frequency peaks for each microphone. We do this using a small ranking system with three positions.
When a new peak is found:
- If it is stronger than the current first peak, the old first peak moves to second, and the old second moves to third.
- If it is not stronger than first but stronger than second, it becomes the new second peak.
- If it is not stronger than second but stronger than third, it becomes the new third peak.
The peak arrays are local to each processing cycle and separated for each microphone channel. This improves reliability and prevents left, front, and right microphone data from corrupting each other.
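The three-position ranking described above can be sketched as follows. The type and function names are illustrative. The source does not say how "distinct" peaks are determined, so treating a bin as a peak only when it is a local maximum is an assumption of this sketch. The scan starts at bin 1, so the DC component in bin 0 is never a candidate.

```c
#include <stddef.h>

#define NUM_PEAKS 3

/* Ranked peaks for one microphone channel, strongest first. */
typedef struct {
    float  mag[NUM_PEAKS];
    size_t bin[NUM_PEAKS];
} peak_set_t;

/* Inserts a candidate at its rank and shifts weaker entries down,
 * dropping the old third-place entry if needed. */
static void peak_set_insert(peak_set_t *p, float mag, size_t bin)
{
    for (int i = 0; i < NUM_PEAKS; i++) {
        if (mag > p->mag[i]) {
            for (int j = NUM_PEAKS - 1; j > i; j--) {
                p->mag[j] = p->mag[j - 1];
                p->bin[j] = p->bin[j - 1];
            }
            p->mag[i] = mag;
            p->bin[i] = bin;
            return;
        }
    }
}

/* Scans a magnitude spectrum with n_bins entries and keeps the three
 * strongest local maxima. Bin 0 (DC) is skipped, and the last bin is
 * not considered because it has no right-hand neighbor. */
void extract_top_peaks(const float *mags, size_t n_bins, peak_set_t *out)
{
    for (int i = 0; i < NUM_PEAKS; i++) {
        out->mag[i] = 0.0f;
        out->bin[i] = 0;
    }
    for (size_t k = 1; k + 1 < n_bins; k++) {
        /* Assumption: a "distinct" peak is a local maximum. Bin 1 is not
         * compared against the DC bin on its left. */
        int rises = (k == 1) || (mags[k] > mags[k - 1]);
        int falls = mags[k] >= mags[k + 1];
        if (rises && falls) {
            peak_set_insert(out, mags[k], k);
        }
    }
}
```

Because each call writes into its own `peak_set_t`, the left, front, and right channels can use three separate local variables inside the processing loop, which keeps their data isolated.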
7. Harmonic Richness Detection
We check the relationship between the first and second strongest peaks.
The harmonic richness ratio is calculated as:
Harmonic Ratio = second_peak_magnitude / first_peak_magnitude
If this ratio is too small, it means the sound is likely a flat single-frequency tone or stationary background noise. In that case, we clear the frequency history and do not treat the sound as a siren.
This makes the system more robust against:
- Constant engine noise
- Air conditioner hum
- Electrical noise
- Single-frequency test tones
- Stationary background drones
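A minimal sketch of the harmonic richness check is shown below. The function names are illustrative, and the threshold of 0.2 is a placeholder: the tuned value used in the project is not stated here.

```c
#include <stdbool.h>

/* Placeholder threshold; the project's tuned value is not documented. */
#define HARMONIC_RATIO_MIN 0.2f

/* Ratio of the second-strongest peak to the strongest peak.
 * Returns 0 when there is no usable first peak, which also avoids
 * dividing by zero. */
float harmonic_ratio(float first_peak_mag, float second_peak_mag)
{
    if (first_peak_mag <= 0.0f) {
        return 0.0f;
    }
    return second_peak_mag / first_peak_mag;
}

/* True when the secondary peak is strong enough relative to the first. */
bool is_harmonically_rich(float first_peak_mag, float second_peak_mag)
{
    return harmonic_ratio(first_peak_mag, second_peak_mag) >= HARMONIC_RATIO_MIN;
}
```

With this placeholder threshold, a frame with peaks of 85000 and 30000 (ratio of about 0.35) passes, while 85000 and 5000 (ratio of about 0.06) is treated as a flat tone and would cause the frequency history to be cleared.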
8. Siren Detection Module
Our siren detection logic checks multiple conditions:
- The main detected frequency must be between 500 Hz and 4000 Hz.
- The magnitude must be greater than the threshold value of 70000.
- The second strongest peak must be strong enough compared to the first peak.
- Enough recent frames must be inside the siren range.
- The frequency must shift enough across the history window to look like a siren sweep.
If the signal fails the harmonic richness check or frequency range check, we flush the history window to zero. This prevents old valid frames from creating a false detection later.
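The history-based part of this logic can be sketched as a small ring buffer per microphone channel. The names are illustrative. The range, history size, and minimum sweep come from the constants table, but the number of in-range frames that counts as "enough" is not stated, so `MIN_IN_RANGE_FRAMES` is a placeholder. Measuring the sweep as the spread between the lowest and highest in-range frequency is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_SIZE        15
#define SIREN_LOW_HZ        500.0f
#define SIREN_HIGH_HZ       4000.0f
#define MIN_SWEEP_HZ        200.0f
#define MIN_IN_RANGE_FRAMES 10      /* placeholder, not a documented value */

/* Recent dominant frequencies for one channel; 0 marks an empty slot. */
typedef struct {
    float  hz[HISTORY_SIZE];
    size_t next;                    /* slot that the next push overwrites */
} freq_history_t;

/* Flushes the window to zero, as done when a frame fails the checks. */
void history_clear(freq_history_t *h)
{
    for (size_t i = 0; i < HISTORY_SIZE; i++) {
        h->hz[i] = 0.0f;
    }
    h->next = 0;
}

/* Stores the dominant frequency of a frame that passed the checks. */
void history_push(freq_history_t *h, float hz)
{
    h->hz[h->next] = hz;
    h->next = (h->next + 1) % HISTORY_SIZE;
}

/* True when enough frames are in the siren band and the in-band
 * frequencies span at least MIN_SWEEP_HZ. */
bool history_is_siren(const freq_history_t *h)
{
    size_t in_range = 0;
    float lo = SIREN_HIGH_HZ;
    float hi = SIREN_LOW_HZ;

    for (size_t i = 0; i < HISTORY_SIZE; i++) {
        float f = h->hz[i];
        if (f >= SIREN_LOW_HZ && f <= SIREN_HIGH_HZ) {
            in_range++;
            if (f < lo) lo = f;
            if (f > hi) hi = f;
        }
    }
    return in_range >= MIN_IN_RANGE_FRAMES && (hi - lo) >= MIN_SWEEP_HZ;
}
```

A steady 1000 Hz tone fills the window with identical values, so the spread is 0 Hz and the check fails. A tone that rises from 700 Hz to 1400 Hz over 15 frames passes.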
9. Direction Detection Module
After a siren-like sound is detected, we compare the valid magnitudes of the left, front, and right microphone channels.
Direction logic:
- If left magnitude is highest → activate left motor
- If front magnitude is highest → activate front motor
- If right magnitude is highest → activate right motor
- If no clear maximum exists → print direction as unknown
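The direction logic above can be sketched as a single comparison function. The enum and function names are illustrative. Treating any tie for the highest magnitude as "no clear maximum" is an assumption of this sketch.

```c
typedef enum {
    DIR_UNKNOWN = 0,
    DIR_LEFT,
    DIR_FRONT,
    DIR_RIGHT
} direction_t;

/* Picks the channel whose valid magnitude is strictly greater than both
 * others. Ties, including all-zero input, are reported as unknown. */
direction_t select_direction(float left, float front, float right)
{
    if (left > front && left > right) {
        return DIR_LEFT;
    }
    if (front > left && front > right) {
        return DIR_FRONT;
    }
    if (right > left && right > front) {
        return DIR_RIGHT;
    }
    return DIR_UNKNOWN;
}
```

For example, magnitudes of 85000 (left), 60000 (front), and 50000 (right) select the left motor.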
10. Debug Output
We print detected frequency and magnitude information for each microphone channel. This helps us verify that the microphones, FFT pipeline, siren detection logic, and direction decision are working during testing.
Example output format:
Left: 1200.0 Hz (85000) | Front: 980.0 Hz (60000) | Right: 700.0 Hz (50000)
If a siren is detected:
SIREN DETECTED! | Direction: LEFT
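A line in the format shown above could be produced with `snprintf`, as in the sketch below. The function name and parameter list are illustrative, and the format string is inferred from the example output rather than copied from ES_project.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes one debug line with the dominant frequency and magnitude of each
 * channel into buf. Returns the value reported by snprintf. */
int format_debug_line(char *buf, size_t len,
                      float left_hz, float left_mag,
                      float front_hz, float front_mag,
                      float right_hz, float right_mag)
{
    return snprintf(buf, len,
                    "Left: %.1f Hz (%.0f) | Front: %.1f Hz (%.0f) | Right: %.1f Hz (%.0f)",
                    (double)left_hz, (double)left_mag,
                    (double)front_hz, (double)front_mag,
                    (double)right_hz, (double)right_mag);
}
```

Formatting into a buffer first keeps each report on a single line, and the buffer can then be passed to `printf` or to a logging call.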
5. Integration and Testing
Integration Plan
We integrated the system in stages to reduce debugging complexity.
Stage 1: Microphone ADC Reading
We first connected the microphone modules to the ESP32 ADC pins and verified that the ADC was reading changing values when sound was present.
Stage 2: Multi-Channel Sampling
After individual microphone readings were confirmed, we configured all three microphone channels together using ADC continuous mode. We then separated the raw ADC buffer into left, front, and right sample streams.
Stage 3: FFT Verification
We added FFT processing to identify the frequency content in each microphone channel. We used debug printing to verify that detected frequencies changed when different tones or siren-like sounds were played.
Stage 4: DC Offset Handling
We identified that analog microphones produce a baseline voltage, which appears in the FFT as a large 0 Hz component. This could cause the detection logic to focus on the DC component instead of actual sound frequencies.
To fix this, we changed the peak extraction loop to start from bin 1. This removes the DC offset from the frequency selection process.
Stage 5: Multi-Peak Detection Upgrade
The old system selected only the single strongest frequency. This was not reliable in noisy environments because a steady background sound could dominate the FFT.
We improved the system by extracting the top three strongest frequency peaks for each microphone. This allows the software to check whether the sound contains multiple meaningful peaks rather than one isolated tone.
Stage 6: Harmonic Richness Validation
We added a harmonic ratio check by comparing the second strongest peak to the strongest peak. If the second peak is too weak, we treat the sound as flat background noise and clear the history window.
This improved the robustness of siren detection and reduced false positives from constant background sounds.
Stage 7: ULN2003 Motor Driver Testing
We connected the ULN2003 motor driver board between the ESP32 and the vibration motors. We tested each motor independently by activating GPIO25, GPIO26, and GPIO27 and confirming that the corresponding motor vibrated.
Stage 8: Full System Test
We tested the full system by playing siren-like sounds from different directions and checking that the correct motor vibrated based on the strongest valid microphone response. The motor vibration duration was set to 800 ms.
Stage 9: Wearable Band Integration
The final band integration was planned but not completed. The working prototype remained as a wired hardware setup rather than a fully mounted wearable band.
Testing Methodology
| Test | Expected Result |
|---|---|
| DC offset test | Bin 0 is ignored and does not dominate the detection |
| Clap or loud sound near one microphone | Corresponding channel magnitude increases |
| Single steady tone | Rejected when harmonic richness is too low |
| Constant background hum | Not detected as a siren |
| Play tone below 500 Hz | No siren detection |
| Play tone above 4000 Hz | No siren detection |
| Play siren-like sound in valid range | Siren detection message appears |
| Play siren-like sweeping sound | Frequency history confirms shifting behavior |
| Play siren-like sound from left | Left motor vibrates for 800 ms |
| Play siren-like sound from front | Front motor vibrates for 800 ms |
| Play siren-like sound from right | Right motor vibrates for 800 ms |
| Keep sound active for several seconds | Motor does not create duplicate overlapping tasks |
| No sound / normal background noise | No motor activation |
| Test motor driver inputs | ULN2003 correctly switches the selected motor |
| Wearable band test | Not completed |
Debugging Observations
- The ADC continuous driver allowed us to sample repeatedly without manually triggering each conversion.
- FFT analysis made the system more selective than simple amplitude-only detection.
- Ignoring bin 0 fixed the DC offset problem caused by analog microphone baseline voltage.
- Multi-peak extraction made the detection more robust than relying on one dominant frequency.
- The harmonic richness ratio helped us reject flat, stationary, single-frequency sounds.
- Comparing magnitudes between microphones gave us a simple but functional direction estimation method.
- The ULN2003 driver made motor control more reliable than direct GPIO driving.
- We placed motor control in separate FreeRTOS tasks because a blocking motor delay inside the main loop would reduce responsiveness.
- Motor-running flags were added so the same motor is not triggered again while it is already vibrating.
- The 800 ms motor drive time gave a shorter and cleaner haptic response.
- Final physical integration into a wearable band was not completed.
6. Results and Evaluation
We successfully demonstrated the main concept of the Audio-to-Haptic Directional Alert Band. The ESP32 collects sound from three analog microphones, performs frequency-domain analysis using FFT, detects siren-like sounds, estimates the direction based on the strongest valid microphone signal, and activates the matching vibration motor through a ULN2003 motor driver board.
We improved the detection system by skipping FFT bin 0 and moving from single dominant-frequency detection to multi-peak analysis. These changes addressed two major issues: the DC offset caused by the analog microphone baseline voltage and the risk of locking onto stationary background noise.
The project was completed as a working external prototype. However, the final wearable band integration was not completed.
Achieved Features
| Feature | Status |
|---|---|
| ESP32 project setup using ESP-IDF | Completed |
| Three microphone input channels | Completed |
| Continuous ADC sampling | Completed |
| FFT-based frequency analysis | Completed |
| DC offset rejection by skipping bin 0 | Completed |
| Top-three peak extraction per microphone | Completed |
| Siren frequency range detection | Completed |
| Harmonic richness ratio validation | Completed |
| Frequency sweep/history check | Completed |
| Direction decision using magnitude comparison | Completed |
| ULN2003 motor driver integration | Completed |
| Three vibration motor outputs | Completed |
| 800 ms motor vibration feedback | Completed |
| FreeRTOS motor task implementation | Completed |
| Debug serial output | Completed |
| BLE configuration | Not implemented |
| Battery monitoring | Not implemented |
| Low-power optimization | Not implemented |
| Full wearable band integration | Not completed |
System Strengths
- Real-time embedded implementation using ESP32 and FreeRTOS.
- Uses FFT instead of only raw amplitude, making detection more selective.
- Rejects the 0 Hz DC offset peak by starting peak extraction from bin 1.
- Uses top-three peak extraction instead of relying on one dominant frequency.
- Harmonic richness validation helps reject flat background noise.
- Frequency history helps detect siren-like sweeping behavior.
- Separates sensing, processing, and actuation logically.
- Directional haptic feedback is simple and intuitive for the user.
- ULN2003 driver improves motor switching reliability.
- Motor tasks prevent vibration timing from blocking the main detection loop.
- The 800 ms vibration duration provides a clear but short alert.
- The design can be expanded to more directions or more advanced sound classification.
System Limitations
- Direction detection is based on relative loudness, so accuracy can still be affected by reflections, microphone placement, and background noise.
- The system detects siren-like frequency behavior rather than fully classifying all emergency sounds.
- Only three directions are supported: left, front, and right.
- The current thresholds may need calibration for different environments.
- The harmonic ratio threshold may require tuning after more real-world testing.
- The prototype does not yet include battery monitoring or low-power operation.
- The final wearable band/enclosure was not completed.
- The current version is a functional hardware prototype, not a finished wearable product.
Evaluation Summary
The prototype meets the core MVP requirements at the embedded system level. We capture audio, process it in real time, estimate the strongest valid sound direction, and give the user localized haptic feedback through motors driven by a ULN2003 driver board.
The updated signal processing pipeline improves robustness by avoiding the DC offset trap, rejecting flat stationary noise, and using multiple frequency peaks to better identify siren-like acoustic behavior.
The main missing part is the physical wearable band integration. Therefore, the project should be evaluated as a working proof-of-concept prototype rather than a fully completed wearable device.
7. Project Management
7.1 Division of Labor:
Adham Hassan:
- System architecture design
- RTOS implementation and task scheduling
- Signal processing and direction detection
- FFT-based siren detection logic
Ahmed El Dessouky:
- Hardware integration: microphones, motors, and power connections
- Haptic feedback control
- Motor output testing and debugging
- Physical testing of directional behavior
Andrew Antoine:
- ADC continuous sampling configuration
- Prototype validation and wiki documentation
- Signal processing and direction detection
- Motor output testing and debugging
7.2 Timeline:
| Date | Milestone | Deliverable |
|---|---|---|
| Tue, Apr 14 | Team formation | Team submitted. |
| Wed, Apr 15 | Proposal presentation | 5 to 7 min in-class presentation of project scope and plan. |
| Mon, Apr 20 | Wiki/page setup | Wiki page live with approved proposal content. |
| Wed, Apr 29 | Progress demo | Microphones reading audio levels and vibration motors responding to detected direction. Presentation + live demo. |
| Wed, May 14 | Integration update | Full system integration: audio sensing, FFT processing, siren detection, and direction detection. Wiki updated with testing results and remaining issues. |
| Wed, May 23 | Final demo | Final presentation, full live demo of directional haptic feedback system, complete codebase, polished wiki. |
7.3 Risks and Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
| Microphone readings are noisy | False direction detection or false alerts | Use magnitude thresholding, FFT filtering, and calibration constants |
| Analog microphone DC offset dominates FFT | System may detect 0 Hz instead of real sound | Skip FFT bin 0 during peak extraction |
| Background sounds trigger the system | Unwanted motor vibration | Detect only sounds within siren frequency range and require frequency sweep behavior |
| Stationary background frequency dominates detection | System may lock onto engine noise, fans, or electrical hum | Extract top three peaks and use harmonic richness validation |
| Single-frequency tone is mistaken for siren | False siren detection | Reject sounds with weak secondary peaks using harmonic ratio check |
| Motors draw more current than GPIO can safely provide | Possible ESP32 damage or unstable operation | Use the ULN2003 motor driver board instead of direct GPIO motor driving |
| Motor vibration blocks audio processing | Missed samples or slow response | Use separate FreeRTOS motor tasks instead of blocking the main loop |
| Same motor triggers repeatedly | Overlapping motor tasks and unstable feedback | Use motor-running flags to prevent duplicate motor tasks |
| Direction detection is inaccurate in real environments | Wrong motor feedback | Improve physical microphone spacing, add calibration, and test in multiple environments |
| ADC sampling rate is too low or unstable | Aliasing of higher frequencies or inaccurate FFT frequency estimates | Use ADC continuous mode and maintain a 20 kHz sampling rate |
| Threshold value does not work in all locations | System may be too sensitive or not sensitive enough | Add adaptive thresholding or user configuration in future versions |
| Harmonic ratio threshold is not tuned | Real sirens may be missed or background sounds may pass | Test with more siren samples and tune the ratio threshold |
| Wearable power consumption is high | Short battery life | Add sleep modes, lower-power sampling, and battery monitoring in future work |
| Limited number of directions | User receives incomplete directional awareness | Add rear microphone and rear vibration motor in future versions |
| Band integration not completed | Prototype is less practical as a wearable device | Treat current system as proof of concept and complete enclosure/band mounting as future work |