G5: Baby Monitoring System

| Name | GitHub |
| --- | --- |
| Aly Elaswad | alyelaswad |
| Mazin Bersy | mazinbersy |
| Omar Ganna | omarganna |

Github Repo: https://github.com/mazinbersy/Baby-Monitoring-System

1. The Proposal

Elevator Pitch

Caregivers cannot stay beside a baby at all times, and existing monitors are either too simple or too expensive. At best they signal that something is wrong, but they cannot tell you why or what triggered the concern.

The Smart Baby Monitoring System is a self-contained embedded device that monitors a baby across three dimensions simultaneously: sound, motion, and environment. It detects infant crying using FFT-based audio analysis, monitors ambient temperature, and detects prolonged inactivity. When distress is detected, the system first attempts to soothe the baby by automatically playing a lullaby. If crying persists, it escalates to a caregiver push notification with a live video stream.

All of this runs on dedicated embedded hardware, streaming wirelessly over WiFi using HTTP POST requests, with no physical connection required from the caregiver.

Project Objectives & Scope

Minimum Viable Product (MVP)

  • Detect infant crying using FFT-based audio analysis on the MAX9814 microphone
  • Play lullaby automatically via DFPlayer Mini within 30 seconds of cry detection
  • Monitor ambient temperature via LM35 and alert if outside safe range (< 18 °C or > 30 °C)
  • Detect prolonged inactivity via HC-SR501 PIR and alert if no motion for > 5 minutes
  • Deliver mobile push notifications via WiFi using HTTP POST on any alert
  • Activate event-triggered live video stream on any alert

Stretch Goals

  • Remote camera toggle: turn the camera on/off from the app
  • [❌] Two-way audio: speak through the app; the baby hears through the speaker

View Proposal Slides

View Final Slides


2. System Architecture

2.1 High-Level Block Diagram

System Block Diagram

Subsystem Breakdown

The system uses two connected boards working together:

  • The STM32L432KC (Nucleo) handles sensing, sound analysis, and playing lullabies.
  • The ESP32-CAM handles Wi-Fi communication and live video streaming.

The STM32 Nucleo runs several FreeRTOS tasks. It continuously records audio from the microphone and analyzes it using an FFT to detect baby crying. If crying is detected for a long enough period, it plays a lullaby using the DFPlayer Mini and sends a CRY alert to the ESP32-CAM.

The Nucleo also:

  • Monitors temperature using the LM35DZ sensor and sends alerts if it becomes too high or too low.
  • Detects movement using the PIR sensor. If no movement is detected for 5 minutes, it sends a NOMOV alert.
  • Includes a sleep-watch mode that tracks motion over time to detect when a sleeping baby wakes up and sends an AWAKE alert.

The ESP32-CAM receives these alerts from the Nucleo and sends them to a Railway-hosted server, which then pushes notifications to the mobile app. It also receives commands from the app, such as enabling or disabling sleep-watch mode. In addition, the ESP32-CAM captures and uploads camera frames to support live video streaming.


3. Hardware Design

Component Selection

| Component | Role | Interface |
| --- | --- | --- |
| STM32L432KC (Nucleo-32) | Central MCU, FFT processing, sensor fusion, DFPlayer control | ADC, UART, GPIO, FreeRTOS |
| ESP32-CAM (AI-Thinker) | WiFi communication, live video streaming | UART, WiFi, Camera |
| Microphone (MAX9814) | Audio capture for cry detection | Analog out to ADC |
| LM35 | Temperature sensing | Analog out to ADC |
| HC-SR501 | PIR motion detection | Digital GPIO |
| DFPlayer Mini (MP3-TF-16P) | MP3 lullaby playback | UART |
| 3W Speaker | Audio output for lullabies | Direct to DFPlayer Mini |

Schematics & Wiring

STM32 Nucleo (Main Controller)

Microphone: MAX9814 (Analog Audio, ADC1_IN11)

| STM32 Pin | Function | Connect To |
| --- | --- | --- |
| PA6 | ADC1_IN11 | Mic analog OUT |
| 3.3V | Power | Mic VCC |
| GND | Ground | Mic GND |

Temperature Sensor: LM35DZ (ADC1_IN5, injected channel)

| STM32 Pin | Function | Connect To |
| --- | --- | --- |
| PA0 | ADC1_IN5 | LM35DZ Vout |
| 3.3V | Power | LM35DZ +Vs |
| GND | Ground | LM35DZ GND |

Audio Playback: DFPlayer Mini (USART1 @ 9600 baud)

| STM32 Pin | Function | Connect To |
| --- | --- | --- |
| PA9 | USART1_TX | DFPlayer RX |
| PA10 | USART1_RX | DFPlayer TX |
| 5V | Power | DFPlayer VCC |
| GND | Ground | DFPlayer GND |
| - | - | DFPlayer SPK1/SPK2 → Speaker |

Motion Sensor: HC-SR501 PIR (GPIO, PA4)

| STM32 Pin | Function | Connect To |
| --- | --- | --- |
| PA4 | GPIO_INPUT (PULLDOWN) | Sensor OUT |
| 5V | Power | HC-SR501 VCC |
| GND | Ground | Sensor GND |

ESP32-CAM Link: USART2 (@ 115200 baud)

| STM32 Pin | Function | Connect To |
| --- | --- | --- |
| PA2 | USART2_TX | ESP32-CAM GPIO 13 (RX) |
| PA3 | USART2_RX | ESP32-CAM GPIO 14 (TX) |
| GND | Common ground | ESP32-CAM GND |

ESP32-CAM

UART to STM32 Nucleo (Serial2 @ 115200 baud)

| ESP32-CAM Pin | Function | Connect To |
| --- | --- | --- |
| GPIO 13 | Serial2 RX | Nucleo PA2 (USART2_TX) |
| GPIO 14 | Serial2 TX | Nucleo PA3 (USART2_RX) |
| GND | Common ground | Nucleo GND |

WiFi

No pins required. The ESP32-CAM connects to a WiFi access point and communicates with the HTTPS server hosted on Railway.


Summary Diagram

STM32 Nucleo
  PA0   ────   LM35DZ Vout           (temperature)
  PA4   ────   HC-SR501 OUT          (PIR motion)
  PA6   ────   Microphone OUT        (audio)
  PA9   ──→    DFPlayer RX           (USART1 TX)
  PA10  ──←    DFPlayer TX           (USART1 RX)
  PA2   ──→    ESP32-CAM GPIO 13     (USART2 TX)
  PA3   ──←    ESP32-CAM GPIO 14     (USART2 RX)

ESP32-CAM
  GPIO 13  ──←  Nucleo PA2           (receives CRY / NOMOV / AWAKE / TEMP_HIGH / TEMP_LOW)
  GPIO 14  ──→  Nucleo PA3           (sends SLEEP_ON / SLEEP_OFF)
  GND      ────  Nucleo GND          (common ground)
  [Camera] ──→  JPEG frames → HTTPS server (Railway)
  [WiFi]   ──→  alerts / mode / status → HTTPS server

Bill of Materials (BOM)

| Component | Model | Cost (EGP) |
| --- | --- | --- |
| Microcontroller Board | STM32L432KC (Nucleo-32) | 750 |
| WiFi + Camera Module | ESP32-CAM (AI-Thinker) | 350 |
| Microphone Amplifier | MAX9814 | 185 |
| Audio Player | DFPlayer Mini | 150 |
| Speaker | 3W Speaker | 160 |
| Temperature Sensor | LM35DZ | 60 |
| PIR Motion Sensor | HC-SR501 | 70 |
| Total | | 1725 |

Power Budget

Each entry is computed as P = V × I from the component's typical current draw.

3.3 V Rail β€” Nucleo

| Component | Voltage | Typical Current | Power |
| --- | --- | --- | --- |
| STM32L432KC (80 MHz, ADC DMA + 2× UART) | 3.3 V | 15 mA | 49.5 mW |
| MAX9814 microphone amplifier | 3.3 V | 3.5 mA | 11.6 mW |
| LM35DZ temperature sensor | 3.3 V | 0.1 mA | 0.3 mW |
| Onboard LED LD3 (cry-alert blink, average) | 3.3 V | 1 mA | 3.3 mW |
| 3.3 V Rail Total | | 19.6 mA | 64.7 mW |

5 V Rail

| Component | Voltage | Typical Current | Power |
| --- | --- | --- | --- |
| Nucleo board (ST-LINK + LDO overhead) | 5 V | 50 mA | 250 mW |
| HC-SR501 PIR motion sensor | 5 V | 0.1 mA | 0.5 mW |
| DFPlayer Mini + speaker (moderate volume) | 5 V | 120 mA | 600 mW |
| 5 V Rail Total | | 170 mA | 850 mW |

ESP32-CAM 3.3 V Rail (external supply)

| Component | Voltage | Typical Current | Power |
| --- | --- | --- | --- |
| ESP32-CAM (WiFi active + camera streaming) | 3.3 V | 200 mA | 660 mW |
| ESP32-CAM Rail Total | | 200 mA | 660 mW |
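
The rail totals in this section follow directly from P = V × I. The arithmetic can be sketched in C as below, using the typical currents listed for the 3.3 V Nucleo rail. The type and function names (`load_t`, `rail_power_mw`) are illustrative and are not taken from the project firmware.

```c
#include <stddef.h>

/* One load on a supply rail; values are the "typical" figures from the tables. */
typedef struct {
    double volts;
    double milliamps;
} load_t;

/* P = V * I; with current in mA the result is in mW. */
double load_power_mw(load_t l) {
    return l.volts * l.milliamps;
}

/* Sum the power of every load on one rail. */
double rail_power_mw(const load_t *loads, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += load_power_mw(loads[i]);
    }
    return total;
}

/* 3.3 V Nucleo rail: STM32L432KC, MAX9814, LM35DZ, LED LD3. */
static const load_t RAIL_3V3[] = {
    {3.3, 15.0}, {3.3, 3.5}, {3.3, 0.1}, {3.3, 1.0},
};
```

Calling `rail_power_mw(RAIL_3V3, 4)` gives 64.68 mW, which rounds to the 64.7 mW shown for the 3.3 V rail.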

4. Software Implementation

4.1 Functional Requirements

  • Cry detection via FFT sampling at 8 kHz; alert triggered after a 30-second voting window (at least 60 of 234 frames classified as CRY) confirms sustained crying
  • Lullaby playback initiated automatically via DFPlayer Mini within 30 seconds of cry onset
  • Temperature sampled every 5 seconds via LM35 ADC; alert triggered if temp > 30 °C or < 18 °C
  • PIR motion sampled continuously; alert triggered if no motion detected for > 5 minutes
  • Event-triggered video stream activated within 5 seconds of any alert
  • Mobile push notification delivered via WiFi using HTTP POST within 5 seconds of any alert
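
The temperature requirement can be illustrated with a short C sketch. The LM35 outputs 10 mV per °C; the 12-bit ADC resolution and 3.3 V reference used here are assumptions, and the function and enum names are hypothetical rather than the firmware's own.

```c
#include <stdint.h>

#define ADC_FULL_SCALE   4095.0f  /* 12-bit ADC (assumed resolution) */
#define ADC_VREF_MV      3300.0f  /* assumed reference voltage, in mV */
#define LM35_MV_PER_DEGC 10.0f    /* LM35 scale factor: 10 mV per degree C */
#define TEMP_LOW_C       18.0f    /* lower safe limit from the requirements */
#define TEMP_HIGH_C      30.0f    /* upper safe limit from the requirements */

typedef enum { TEMP_OK, TEMP_LOW, TEMP_HIGH } temp_state_t;

/* Convert a raw ADC reading to degrees Celsius. */
float lm35_counts_to_celsius(uint16_t counts) {
    float millivolts = (float)counts * ADC_VREF_MV / ADC_FULL_SCALE;
    return millivolts / LM35_MV_PER_DEGC;
}

/* Map a temperature to the alert that would be sent, if any. */
temp_state_t classify_temperature(float celsius) {
    if (celsius > TEMP_HIGH_C) return TEMP_HIGH;
    if (celsius < TEMP_LOW_C)  return TEMP_LOW;
    return TEMP_OK;
}
```

Under these assumptions a reading of 310 counts is about 25 °C and raises no alert, while 400 counts (about 32 °C) maps to TEMP_HIGH.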

4.2 Software Architecture

The firmware runs on two microcontrollers that communicate over USART2/Serial2 at 115200 baud.

The STM32L432KC (Nucleo) runs six FreeRTOS tasks under CMSIS-RTOS V2:

| Task | Priority | Stack | Responsibility |
| --- | --- | --- | --- |
| defaultTask | Idle | 512 B | Idle placeholder |
| AudioCapture | Realtime | 1 KB | Starts ADC DMA + TIM1 trigger at 8 kHz |
| FFT | Normal | 2 KB | 1024-point Hann-windowed FFT, 30-second majority vote → CRY/QUIET |
| Alert | Above Normal | 1 KB | Plays DFPlayer lullaby and blinks LD3 on CRY; sends CRY\n over USART2 |
| PIR | Low | 512 B | HC-SR501 polling; NOMOV after 5 min no motion; baby wakeup detection in sleep mode |
| Temp | Low | 512 B | LM35DZ injected ADC read every 5 s; sends TEMP_HIGH / TEMP_LOW over USART2 |
AudioQueue passes 1024-sample ADC buffer pointers from AudioCapture to FFT. ResultQueue passes CRY/QUIET results from FFT to Alert. USART2 RX is interrupt-driven, assembling incoming bytes into a command string for the PIR task to consume.
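
The interrupt-driven command assembly described above could look roughly like the following C sketch, where the receive ISR would call `line_rx_feed` once per byte. The buffer size, type, and function names are assumptions made for illustration; the actual ISR is in the project repository.

```c
#include <stddef.h>
#include <string.h>

#define CMD_MAX_LEN 16  /* assumed size; longest command here is "SLEEP_OFF" */

typedef struct {
    char   buf[CMD_MAX_LEN];
    size_t len;
    int    overflow;   /* set when a line is too long to store */
} line_rx_t;

/* Feed one received byte. Returns 1 and copies the finished command into
 * `out` (at least CMD_MAX_LEN bytes) when '\n' completes a non-empty line;
 * returns 0 otherwise. The buffer index resets on every '\n', and a line
 * that overflowed the buffer is discarded instead of delivered truncated. */
int line_rx_feed(line_rx_t *rx, char byte, char *out) {
    if (byte == '\n') {
        int ready = (rx->len > 0 && !rx->overflow);
        if (ready) {
            memcpy(out, rx->buf, rx->len);
            out[rx->len] = '\0';
        }
        rx->len = 0;
        rx->overflow = 0;
        return ready;
    }
    if (byte == '\r') {
        return 0;                      /* tolerate CRLF senders */
    }
    if (rx->len < CMD_MAX_LEN - 1) {
        rx->buf[rx->len++] = byte;
    } else {
        rx->overflow = 1;
    }
    return 0;
}
```

Feeding the bytes of `"SLEEP_ON\n"` one at a time returns 1 on the final byte, with `out` holding `"SLEEP_ON"`.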

The ESP32-CAM runs a single Arduino loop that reads alert strings from Serial2, dispatches HTTP POST alerts to the Railway server, polls /api/mode every 3 s to forward parent commands to Nucleo, polls /api/status every 2 s to toggle streaming, and uploads JPEG frames when streaming is active.
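
One common way to run several polling periods from a single loop is a non-blocking interval timer driven by a millisecond counter (`millis()` on Arduino). The sketch below, in plain C, shows that pattern for the 3 s and 2 s polls. It is an assumption about how such a loop can be structured, not a description of the project's actual sketch, and `interval_t` and `interval_due` are hypothetical names.

```c
#include <stdint.h>

/* Periodic timer driven by a free-running millisecond counter.
 * Unsigned subtraction keeps the comparison correct across wrap-around. */
typedef struct {
    uint32_t period_ms;
    uint32_t last_ms;
} interval_t;

/* Returns 1 at most once per period and re-arms the timer; otherwise 0. */
int interval_due(interval_t *t, uint32_t now_ms) {
    if ((uint32_t)(now_ms - t->last_ms) >= t->period_ms) {
        t->last_ms = now_ms;
        return 1;
    }
    return 0;
}

/* Example timers for the two polls described above. */
static interval_t mode_poll   = {3000u, 0u};  /* /api/mode every 3 s */
static interval_t status_poll = {2000u, 0u};  /* /api/status every 2 s */
```

Each pass through the loop would then check `interval_due(&mode_poll, now)` and `interval_due(&status_poll, now)` and issue the matching request only when the call returns 1, which leaves the rest of the loop free to read Serial2 and upload frames.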

4.3 Flowcharts & Hardware Diagram

Software Diagram hw

Sensor fusion logic:

| Condition | Action |
| --- | --- |
| Cry detected (30-s window, ≥ 60/234 frames score as CRY) | Play lullaby via DFPlayer Mini; send CRY\n to ESP32-CAM |
| No PIR motion > 5 min | Send NOMOV\n to ESP32-CAM; enter sleep-watch mode |
| Motion detected in sleep-watch window (> 8/30 samples over 75 s) | Send AWAKE\n to ESP32-CAM; return to awake mode |
| Temperature out of range | Send TEMP_HIGH\n or TEMP_LOW\n to ESP32-CAM |
| SLEEP_ON / SLEEP_OFF received from ESP32-CAM | Switch PIR task between sleep-watch and awake-watch mode |
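
The two PIR rows of the fusion logic can be sketched as a small state machine in C. The thresholds come from the table; the 2.5 s sample period is derived from 30 samples over 75 s and is assumed here for both modes. The non-overlapping window and all names are assumptions, since the firmware may use a sliding window instead.

```c
#include <stdint.h>

#define SAMPLE_PERIOD_MS 2500u    /* 75 s / 30 samples */
#define NOMOV_TIMEOUT_MS 300000u  /* 5 minutes */
#define WAKE_WINDOW      30u      /* samples per sleep-watch window */
#define WAKE_THRESHOLD   8u       /* AWAKE when motion samples exceed this */

typedef enum { EVT_NONE, EVT_NOMOV, EVT_AWAKE } pir_event_t;

typedef struct {
    int      sleeping;      /* 1 = sleep-watch, 0 = awake-watch */
    uint32_t still_ms;      /* time since last motion (awake-watch) */
    uint32_t window_count;  /* samples seen in the current window */
    uint32_t motion_count;  /* motion samples in the current window */
} pir_state_t;

/* Call once per SAMPLE_PERIOD_MS with the PIR pin level (1 = motion). */
pir_event_t pir_step(pir_state_t *s, int motion) {
    if (!s->sleeping) {
        s->still_ms = motion ? 0u : s->still_ms + SAMPLE_PERIOD_MS;
        if (s->still_ms > NOMOV_TIMEOUT_MS) {
            s->sleeping = 1;
            s->window_count = 0u;
            s->motion_count = 0u;
            return EVT_NOMOV;
        }
        return EVT_NONE;
    }
    s->window_count++;
    if (motion) s->motion_count++;
    if (s->window_count < WAKE_WINDOW) return EVT_NONE;

    /* Window complete: decide, then start a fresh window. */
    int awake = (s->motion_count > WAKE_THRESHOLD);
    s->window_count = 0u;
    s->motion_count = 0u;
    if (awake) {
        s->sleeping = 0;
        s->still_ms = 0u;
        return EVT_AWAKE;
    }
    return EVT_NONE;
}
```

In this sketch a SLEEP_ON or SLEEP_OFF command would simply set or clear `sleeping` and reset the counters before the next sample.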

4.4 Key Algorithms

FFT Cry Detection

The MAX9814 analog output is sampled at 8 kHz via ADC DMA triggered by TIM1. Every 1024 samples (~128 ms), a Hann window is applied and an FFT is computed. Five frequency-band energy percentages, spectral centroid, peak frequency, and spectral prominence are extracted and combined into a 100-point score. Frames scoring ≥ 75 vote as CRY. Over a 30-second window (234 frames), if ≥ 60 frames vote CRY and the audio was not continuously silent, a cry event is raised.
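
The voting stage can be sketched in C as follows. The frame length, window length, and both thresholds come from the description above. How a frame is judged silent is not specified here, so `silent` is passed in by the caller, and all type and function names are hypothetical.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ    8000u
#define FFT_SIZE          1024u
#define FRAME_MS          (FFT_SIZE * 1000u / SAMPLE_RATE_HZ)  /* 128 ms */
#define WINDOW_MS         30000u
#define FRAMES_PER_WINDOW (WINDOW_MS / FRAME_MS)               /* 234 */
#define FRAME_SCORE_MIN   75u  /* a frame votes CRY at or above this score */
#define CRY_VOTES_MIN     60u  /* a window is CRY at or above this many votes */

typedef enum { VOTE_PENDING, VOTE_QUIET, VOTE_CRY } vote_t;

typedef struct {
    uint32_t frames;
    uint32_t cry_votes;
    uint32_t silent_frames;
} vote_window_t;

/* Add one frame's score (0..100). `silent` marks a frame judged silent by
 * the caller. Returns VOTE_PENDING until the window is full, then the
 * window's verdict, and resets for the next window. */
vote_t vote_window_add(vote_window_t *w, uint32_t score, int silent) {
    w->frames++;
    if (score >= FRAME_SCORE_MIN) w->cry_votes++;
    if (silent) w->silent_frames++;
    if (w->frames < FRAMES_PER_WINDOW) return VOTE_PENDING;

    int all_silent = (w->silent_frames == w->frames);
    vote_t result = (w->cry_votes >= CRY_VOTES_MIN && !all_silent)
                        ? VOTE_CRY : VOTE_QUIET;
    w->frames = 0u;
    w->cry_votes = 0u;
    w->silent_frames = 0u;
    return result;
}
```

The constants reproduce the figures in the text: 1024 samples at 8 kHz is 128 ms per frame, and 30 000 ms / 128 ms gives 234 whole frames per window.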

4.5 Development Environment

  • STM32CubeIDE was used for Nucleo peripheral configuration (ADC with DMA, TIM1 at 8 kHz, USART1/2, GPIO) and firmware development using HAL drivers and CMSIS-RTOS V2 (FreeRTOS).
  • Railway was used to host the server.
  • GitHub was used for version control and to host the wiki.

5. Testing, Validation & Debugging

5.1 Unit Testing

Cry detection pipeline

  • ADC DMA sampling verified at 8 kHz by checking buffer fill rate in the FFT task.
  • FFT output verified against known audio inputs to confirm frequency-bin mapping.
  • Score threshold and vote threshold tuned using infant cry recordings to eliminate false triggers from speech and music.
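
A frequency-bin check of the kind listed above relies on the relation f = k × fs / N. With fs = 8 kHz and N = 1024, each bin is 7.8125 Hz wide, so a 1 kHz test tone should peak at bin 128. The C sketch below shows that mapping and a simple peak search; the function names are illustrative, and `power` stands for any array of per-bin magnitudes produced by the FFT stage.

```c
#include <stddef.h>

#define SAMPLE_RATE_HZ 8000.0
#define FFT_SIZE       1024

/* Centre frequency of FFT bin k: f = k * fs / N. */
double bin_to_hz(size_t k) {
    return (double)k * SAMPLE_RATE_HZ / FFT_SIZE;
}

/* Nearest bin for a frequency below Nyquist (fs / 2 = 4 kHz). */
size_t hz_to_bin(double hz) {
    return (size_t)(hz * FFT_SIZE / SAMPLE_RATE_HZ + 0.5);
}

/* Index of the largest value among the first n bins, skipping DC (bin 0). */
size_t peak_bin(const float *power, size_t n) {
    size_t best = 1;
    for (size_t k = 2; k < n; k++) {
        if (power[k] > power[best]) best = k;
    }
    return best;
}
```

A test would feed a known tone, take `peak_bin` over the first 512 bins, and compare `bin_to_hz` of the result with the tone frequency to within one bin width.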

Motion subsystem

  • HC-SR501 output on PA4 verified to go HIGH on movement and LOW after holdoff.
  • No-motion alert confirmed to fire after the configured timeout.
  • Sleep-mode wakeup window confirmed to send AWAKE\n when motion count exceeds threshold.

DFPlayer Mini

  • Binary command frame with checksum verified to start playback on track 1 within 1 s of a cry event.
  • Stop command confirmed to silence output when the FFT window returns QUIET.
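
For reference, the DFPlayer Mini serial frame is 10 bytes long: start byte 0x7E, version 0xFF, length 0x06, command, feedback flag, two parameter bytes, a two-byte checksum, and end byte 0xEF. The C sketch below builds such a frame. The function name is hypothetical and the project's own driver may be organised differently.

```c
#include <stdint.h>

#define DF_FRAME_LEN      10
#define DF_CMD_PLAY_TRACK 0x03  /* play the track number given in param */
#define DF_CMD_STOP       0x16  /* stop playback */

/* Build a DFPlayer Mini command frame into `out`.
 * The checksum is the 16-bit two's complement of the sum of the version,
 * length, command, feedback, and parameter bytes (out[1]..out[6]). */
void dfplayer_build_frame(uint8_t cmd, uint16_t param, uint8_t out[DF_FRAME_LEN]) {
    out[0] = 0x7E;                     /* start byte */
    out[1] = 0xFF;                     /* version */
    out[2] = 0x06;                     /* payload length */
    out[3] = cmd;
    out[4] = 0x00;                     /* no acknowledgement requested */
    out[5] = (uint8_t)(param >> 8);
    out[6] = (uint8_t)(param & 0xFF);

    uint16_t sum = 0;
    for (int i = 1; i <= 6; i++) {
        sum = (uint16_t)(sum + out[i]);
    }
    uint16_t checksum = (uint16_t)(0u - sum);

    out[7] = (uint8_t)(checksum >> 8);
    out[8] = (uint8_t)(checksum & 0xFF);
    out[9] = 0xEF;                     /* end byte */
}
```

For `dfplayer_build_frame(DF_CMD_PLAY_TRACK, 1, frame)` the result is `7E FF 06 03 00 00 01 FE F7 EF`, which would then be written to USART1 at 9600 baud.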

USART2 / Serial2 link

  • Interrupt-driven ISR on Nucleo verified to correctly assemble CRY, NOMOV, TEMP_HIGH, and TEMP_LOW strings.
  • ESP32-CAM Serial2 confirmed to receive and dispatch each alert string to the server.
  • SLEEP_ON and SLEEP_OFF commands confirmed to reach Nucleo and flip the baby_sleeping flag.

WiFi and server

  • HTTP POST verified to reach the Railway server and trigger a mobile push notification.
  • Mode poll verified to pick up app changes within one 3-second poll cycle.

5.2 Integration Testing

The complete system was run with all sensing pipelines active simultaneously:

  • Played infant cry audio near the microphone; lullaby playback started when the 30-second window closed, and the CRY alert appeared on the mobile app.
  • Applied heat near the LM35DZ; a TEMP_HIGH alert arrived within one 5-second sampling cycle.
  • Blocked the PIR sensor for 5 minutes; the NOMOV alert was sent and the system transitioned to sleep-watch mode.
  • Set sleep mode ON from the app; SLEEP_ON propagated through the server, ESP32-CAM, and Nucleo within one poll cycle.
  • Simulated movement in sleep mode; the 75-second wakeup window sent AWAKE and returned the system to awake mode.
  • Toggled the camera stream from the app; JPEG frames appeared in the web viewer.

5.3 Challenges & Solutions

| Challenge | Detail | Solution |
| --- | --- | --- |
| Architecture pivot | ESP32-CAM ADC is unusable while WiFi is active, making concurrent audio sampling and streaming impossible on a single chip | Split responsibilities: Nucleo handles all sensing, FFT, and DFPlayer; ESP32-CAM handles camera and WiFi only |
| Microphone ADC pin conflict | PA3 was originally planned for the microphone ADC but is shared with USART2 RX needed for the ESP32-CAM link | Moved microphone input to PA6 (ADC1_IN11), freeing PA3 for USART2 |
| FFT threshold calibration | Initial score thresholds produced false cry triggers on background speech and TV audio | Tuned score weights and vote threshold against infant cry recordings at multiple distances |
| PIR warm-up false triggers | HC-SR501 generates spurious detections for ~60 s after power-on | Added a 60-second startup delay in the PIR task before monitoring begins |
| UART message loss | Missing newline terminators caused the ESP32-CAM to buffer incomplete alert strings | Added explicit \n to every Nucleo UART transmission; ISR resets the buffer index on each \n |
| Two-way audio | DFPlayer Mini has no audio input path; routing a server audio stream to the speaker had no viable hardware path | Dropped from scope |

6. Results & Demonstration

6.1 Final Prototype

6.2 Video Demonstration

Demo Video

6.3 Performance Metrics

| Metric | Target | Achieved |
| --- | --- | --- |
| Cry detection window | 30 s | 30 s (234 frames × 128 ms) |
| Lullaby start after cry event | < 30 s | < 30 s |
| Push notification delivery | < 5 s from event | ~3–4 s over home WiFi |
| Camera stream activation | < 5 s from alert | ~4–5 s |
| Temperature measurement accuracy | ±1 °C | ±1 °C vs. reference thermometer |
| No-motion alert trigger | 5 min no movement | Confirmed |
| SLEEP_ON propagation (app → Nucleo) | < 3 s | ~3 s (one poll cycle) |

7. Project Management

7.1 Division of Labor

  • Aly: worked on the FFT baby cry detection and the Railway/mobile app hosting
  • Omar: worked on the camera integration and the temperature and motion sensors
  • Mazin: worked on the sound player and the alert signaling

7.2 Timeline

| Date | Milestone | Status | Date of Completion |
| --- | --- | --- | --- |
| Apr 14, 2026 | Team formation finalized and submitted | ✅ Completed | Apr 14, 2026 |
| Apr 15, 2026 | Proposal presentation | ✅ Completed | Apr 15, 2026 |
| Apr 20, 2026 | Wiki deployment with proposal and architecture | ✅ Completed | Apr 20, 2026 |
| Apr 22–25, 2026 | Phase 1: Sensor validation (MAX9814 ADC, LM35 ADC, PIR GPIO) | ✅ Completed | Apr 25, 2026 |
| Apr 26–29, 2026 | Phase 2: Core processing (FFT pipeline, DFPlayer playback, ESP32-CAM stream) | ✅ Completed | Apr 29, 2026 |
| Apr 29, 2026 | Milestone 3: Progress demo (at least one working subsystem) | ✅ Completed | Apr 29, 2026 |
| May 1–5, 2026 | Phase 3: Full integration (sensor fusion, WiFi alerts, Nucleo–ESP32 link) | ✅ Completed | May 5, 2026 |
| May 6, 2026 | Checkpoint B: Integration update on wiki | ✅ Completed | May 6, 2026 |
| May 8–12, 2026 | Phase 4: Stretch goals (remote camera toggle ✅, two-way audio ❌) | ✅ Completed | May 12, 2026 |
| May 13, 2026 | Final demo and presentation | ✅ Completed | May 13, 2026 |

8. Appendices & References

8.1 Source Code Repository

GitHub Repo: https://github.com/mazinbersy/Baby-Monitoring-System

8.2 References

  • ESP32-CAM AI-Thinker datasheet
  • MAX9814 datasheet, Maxim Integrated
  • DFRobotDFPlayerMini Arduino library
  • HC-SR501 PIR sensor datasheet
  • LM35 datasheet, Texas Instruments
  • STM32L432KC datasheet, STMicroelectronics
  • https://github.com/Wendy-Nam/IoT-BabyCryDetection