G8: Smart Doorbell - shalan/CSCE4301-WiKi GitHub Wiki
| Name | GitHub |
|---|---|
| Hoda Magdy | Hoda-Magdy |
| Laila Sayed | LAILA102 |
| Yasmina Mahdy | Yasmina-Mahdy |
GitHub repo: https://github.com/Yasmina-Mahdy/ESP32-Smart-Doorbell.git
Most commercial smart doorbells are expensive, cloud-dependent, and bloated with features most people never use. A typical Ring or Nest doorbell costs upwards of $100, requires a subscription for any meaningful functionality, and sends your footage to third-party servers. For a family that just wants to know who is at the door, this is overkill. There is a clear gap for a cheap, self-contained, privacy-first alternative.
This project builds a low-cost smart doorbell around the FireBeetle2 ESP32-S3 that runs face detection entirely on-device, no cloud required. When motion is detected or the button is pressed, the system wakes from deep sleep, captures a frame, runs ESP-WHO's face detection pipeline, and uploads the result to a local web dashboard. A passive buzzer rings on button press. All events (motion, doorbell, and images) appear in real time on a self-hosted dashboard. Full face recognition was evaluated but scoped down in favor of a stable, end-to-end detection and upload pipeline within the project timeline.
In scope:
- Use an ESP32-S3 with OV2640 camera as the main embedded device.
- Wake the ESP32-S3 from deep sleep using PIR motion detection.
- Wake the ESP32-S3 from deep sleep using a doorbell push button.
- Ring a passive buzzer when the doorbell button is pressed.
- Connect to WiFi after wakeup.
- Run the ESP-WHO face detection pipeline.
- Capture a frame when a face is detected.
- Upload the detected image to the local server using HTTP POST.
- Publish PIR and doorbell events using MQTT.
- Display uploaded images and event information on the server dashboard.
- Return to deep sleep after the upload/MQTT work is completed.

Out of scope:
- Full face recognition with known/unknown identity matching.
- Mobile push notifications.
- Audio recording / push-to-talk.
- Electric door lock relay integration.

A visual representation of the entire system (inputs, processing, outputs, and power supply).

Trigger subsystem: the push button and PIR sensor are both configured as EXT1 wakeup sources. Either source wakes the chip from deep sleep and produces identical downstream behavior.
Camera & capture: on wakeup, the OV2640 captures frames into a PSRAM framebuffer via the DVP interface. ESP-WHO manages frame capture internally and passes frames to the detection pipeline.
Face detection: ESP-WHO scans frames continuously for a face. If none is found within 30 seconds, the system times out and returns to sleep. If a face is found, the frame is grabbed and forwarded to the upload task.
Audio: a passive buzzer provides the audible doorbell tone on button press, driven by LEDC PWM.
Web server & dashboard: a web server on the ESP32-S3 serves the latest detected JPEG over HTTP GET. A separate Flask server on the host machine receives image uploads via HTTP POST, subscribes to MQTT events, stores everything in SQLite, and pushes real-time updates to the browser via SocketIO.
Final presentation Section
| Component | GPIO | Purpose |
|---|---|---|
| FireBeetle2 ESP32-S3 | Main board | Main microcontroller |
| OV2640 Camera | Internal DVP camera pins | Captures visitor images |
| PIR Sensor | GPIO 10 | Motion wakeup source |
| Doorbell Push Button | GPIO 12 | Doorbell wakeup and trigger |
| Passive Buzzer | GPIO 3 | Doorbell sound using LEDC PWM |
| Component | ESP32-S3 Pin | GPIO |
|---|---|---|
| PIR Sensor (signal) | D3 | GPIO 10 |
| Doorbell Button | D5 | GPIO 12 |
| Passive Buzzer | D2 | GPIO 3 |
| Component | Quantity | Cost |
|---|---|---|
| FireBeetle2 ESP32-S3 with OV2640 Camera | 1 | 2000 EGP |
| PIR Motion Sensor | 1 | From workshop |
| Push Button | 1 | From workshop |
| Passive Buzzer | 1 | From workshop |
The system runs on a single ESP32-S3 that handles both PIR motion detection and face detection, with firmware written in C++ using ESP-IDF v5.4 and esp-who. On the backend, a Python/Flask + SocketIO server subscribes to MQTT, stores events in SQLite, and serves a real-time dashboard. The full server stack (Mosquitto MQTT broker + Flask) runs via Docker Compose.
The data flow looks like this:

    PIR triggered -> MQTT publish -> Flask logs motion event -> Dashboard updates live
    Face detected -> HTTP POST JPEG -> Flask saves image + logs event -> Dashboard updates live

OR

    Push button triggered -> Buzzer rings -> MQTT publish -> Flask logs doorbell event -> Dashboard updates live
    Face detected -> HTTP POST JPEG -> Flask saves image + logs event -> Dashboard updates live

The sleep task runs continuously on Core 0 and is the sole decision-maker for when the chip enters deep sleep. It monitors a FreeRTOS event group where other tasks set bits on completion. The logic covers three cases:
Case 1 (face detected, all work done):
If both BIT_IMAGE_UPLOAD_DONE and BIT_MQTT_DONE are set, all pipeline work is complete and the chip sleeps immediately.
Case 2 (timer wakeup):
If the chip woke from the 120s backup timer rather than EXT1, no MQTT event was queued. Sleep is triggered on upload completion alone, without waiting for BIT_MQTT_DONE.
Case 3 (detection timeout):
If no face is found within 30 seconds, BIT_DETECTION_TIMEOUT is set by the detection task. The sleep task then waits up to 5 seconds for any late upload or MQTT publish that may have just completed, then sleeps regardless.
Before sleeping in all cases, the firmware polls the PIR pin until it goes LOW to prevent an immediate re-wakeup, reconfigures wakeup GPIOs with gpio_sleep_sel_dis() to survive sleep isolation, enables EXT1 wakeup on both the PIR and button pins, and starts a 120-second backup timer as a failsafe.
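The three cases can be modeled as a pure function over the event bits, which makes the decision logic easy to check on a host machine. The following is a minimal sketch, not the project's actual sleep task: the bit positions, the `sleep_decision_t` type, and the `decide_sleep()` name are assumptions made for illustration; only the bit names come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumed for illustration; only the names come from the text. */
#define BIT_IMAGE_UPLOAD_DONE  (1u << 0)
#define BIT_MQTT_DONE          (1u << 1)
#define BIT_DETECTION_TIMEOUT  (1u << 2)

typedef enum {
    SLEEP_WAIT,         /* keep waiting on the event group */
    SLEEP_NOW,          /* cases 1 and 2: work finished, sleep immediately */
    SLEEP_AFTER_GRACE   /* case 3: allow a short grace period, then sleep */
} sleep_decision_t;

/* woke_from_timer is true when the backup timer, not EXT1, caused the wakeup. */
static sleep_decision_t decide_sleep(uint32_t bits, bool woke_from_timer)
{
    const uint32_t all_done = BIT_IMAGE_UPLOAD_DONE | BIT_MQTT_DONE;

    if ((bits & all_done) == all_done) {
        return SLEEP_NOW;                   /* case 1: upload and MQTT both done */
    }
    if (woke_from_timer && (bits & BIT_IMAGE_UPLOAD_DONE)) {
        return SLEEP_NOW;                   /* case 2: no MQTT event was queued */
    }
    if (bits & BIT_DETECTION_TIMEOUT) {
        return SLEEP_AFTER_GRACE;           /* case 3: no face within the window */
    }
    return SLEEP_WAIT;
}
```

In firmware, the `bits` argument would presumably come from waiting on the FreeRTOS event group, and the PIR polling and wakeup configuration described above would still run before the chip actually sleeps.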
The detection task tracks elapsed time since boot using xTaskGetTickCount(). Every 100ms (the queue receive timeout), it checks whether 30 seconds have passed. If so, it sets BIT_DETECTION_TIMEOUT in the event group once and stops checking. This approach piggybacks on the queue polling loop, so no dedicated timer task is needed, and the 100ms granularity is precise enough for a 30-second window.
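The elapsed-time check itself is a small piece of tick arithmetic. The sketch below is a host-side stand-in, assuming the two tick arguments are readings of xTaskGetTickCount() and that the tick rate is passed in explicitly; the function name and the `DETECTION_TIMEOUT_MS` constant are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DETECTION_TIMEOUT_MS 30000u   /* 30-second window from the text */

/* Stand-in for the check performed on each 100 ms queue timeout.
 * start_tick and now_tick play the role of xTaskGetTickCount() readings;
 * tick_hz is the tick rate (an assumed parameter, e.g. 100 or 1000). */
static bool detection_window_elapsed(uint32_t start_tick, uint32_t now_tick,
                                     uint32_t tick_hz)
{
    /* Unsigned subtraction stays correct across a tick counter wraparound. */
    uint32_t elapsed_ticks = now_tick - start_tick;
    uint64_t elapsed_ms = ((uint64_t)elapsed_ticks * 1000u) / tick_hz;
    return elapsed_ms >= DETECTION_TIMEOUT_MS;
}
```

A caller would latch the result in a flag after the first `true` so that BIT_DETECTION_TIMEOUT is set only once, as described above.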
Both the PIR sensor and doorbell button share a single EXT1 wakeup source configured with ESP_EXT1_WAKEUP_ANY_HIGH. On every boot, the sleep task calls esp_sleep_get_ext1_wakeup_status(), which returns a 64-bit bitmask where each bit position corresponds to a GPIO number. The task checks whether bit 12 (doorbell button) is set:

    uint64_t pin_mask = esp_sleep_get_ext1_wakeup_status();
    if (pin_mask & (1ULL << GPIO_BTN_DOORBELL)) {
        // doorbell wakeup: publish MQTT_EVENT_DOORBELL
    } else {
        // PIR wakeup: publish MQTT_EVENT_PIR
    }

This determines which MQTT event to publish at boot: MQTT_EVENT_DOORBELL or MQTT_EVENT_PIR.
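For testing the bitmask handling off-device, the same check can be written as a pure function. This is a sketch under stated assumptions: `GPIO_PIR`, `wake_source_t`, and `classify_wakeup()` are illustrative names not taken from the firmware, and the extra `WAKE_OTHER` branch (for a mask with neither bit set, such as after a timer wakeup) is an addition for illustration rather than something the project is known to do.

```c
#include <stdint.h>

#define GPIO_PIR           10   /* PIR signal pin from the pinout table */
#define GPIO_BTN_DOORBELL  12   /* doorbell button pin from the pinout table */

typedef enum { WAKE_DOORBELL, WAKE_PIR, WAKE_OTHER } wake_source_t;

/* pin_mask uses the layout of esp_sleep_get_ext1_wakeup_status():
 * bit N corresponds to GPIO N. */
static wake_source_t classify_wakeup(uint64_t pin_mask)
{
    if (pin_mask & (1ULL << GPIO_BTN_DOORBELL)) {
        return WAKE_DOORBELL;   /* button takes priority if both bits are set */
    }
    if (pin_mask & (1ULL << GPIO_PIR)) {
        return WAKE_PIR;
    }
    return WAKE_OTHER;          /* neither wakeup pin is flagged in the mask */
}
```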
The doorbell button is handled by a GPIO interrupt service routine (ISR) that fires on every rising edge. Because ISRs run in interrupt context and cannot call blocking FreeRTOS functions, the ISR does only one thing: it packages an MQTT_EVENT_DOORBELL message and sends it to xQueueMQTT using xQueueSendFromISR(). The MQTT task then dequeues it in normal task context, rings the buzzer, and publishes to the broker. This decouples the time-critical interrupt response from the slower network operations.
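The hand-off pattern can be illustrated with a small host-side model of a bounded queue. This is not the FreeRTOS queue and is not interrupt-safe; it only mirrors the two properties that matter here: the producer never blocks (it reports failure when the queue is full), and the consumer does the slow work later. The queue depth and all type and function names below are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4   /* assumed depth; the real queue length is not given */

typedef enum { MQTT_EVENT_PIR, MQTT_EVENT_DOORBELL } mqtt_event_t;

typedef struct {
    mqtt_event_t items[QUEUE_DEPTH];
    size_t head;    /* index of the oldest queued event */
    size_t count;   /* number of queued events */
} event_queue_t;

/* "ISR side": never blocks; reports failure instead when the queue is full. */
static bool queue_send_nonblocking(event_queue_t *q, mqtt_event_t ev)
{
    if (q->count == QUEUE_DEPTH) {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_DEPTH] = ev;
    q->count++;
    return true;
}

/* "Task side": takes the oldest event, if any, so slow work runs outside the ISR. */
static bool queue_receive(event_queue_t *q, mqtt_event_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

On the device, xQueueSendFromISR() and a blocking receive in the MQTT task fill these two roles, with the kernel providing the synchronization this model leaves out.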
A passive buzzer has no internal oscillator and requires a PWM square wave to produce sound. The firmware uses ESP-IDF's LEDC driver to generate two tones for a ding-dong sequence:
- Ding: C6 at 1047 Hz for 300ms
- Silence: 100ms
- Dong: G5 at 784 Hz for 500ms

The LEDC channel is configured once at boot in gpio_init_all(). buzzer_ring() updates the frequency and duty cycle at runtime without reinitializing the peripheral.
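One way to keep the tone sequence readable is to describe it as data and let the ring routine walk the table. The sketch below only models the table and two helper calculations; it does not touch the LEDC peripheral. The struct, the helper names, and the 10-bit PWM resolution used in the usage note are assumptions, while the frequencies and durations are the ones listed above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t freq_hz;      /* 0 means silence (duty set to 0) */
    uint32_t duration_ms;
} tone_step_t;

/* Ding-dong sequence with the frequencies and durations listed above. */
static const tone_step_t DING_DONG[] = {
    { 1047, 300 },   /* ding: C6 */
    {    0, 100 },   /* silence */
    {  784, 500 },   /* dong: G5 */
};
#define DING_DONG_STEPS (sizeof(DING_DONG) / sizeof(DING_DONG[0]))

/* Total time the sequence occupies the calling task, in milliseconds. */
static uint32_t sequence_duration_ms(const tone_step_t *steps, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += steps[i].duration_ms;
    }
    return total;
}

/* Duty value for a 50% square wave at the given PWM resolution in bits. */
static uint32_t half_duty(uint32_t resolution_bits)
{
    return (1u << resolution_bits) / 2u;
}
```

With these values the full ring lasts 900 ms, and a 10-bit timer (an assumed resolution) would use a duty of 512 for a 50% square wave. In firmware, each step would likely map to ledc_set_freq(), ledc_set_duty(), and ledc_update_duty(), followed by a delay of `duration_ms`.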
All firmware runs on the ESP32-S3 using ESP-IDF v5.4 as the toolchain and build system. The firmware lives inside the esp-who example directory. Docker Desktop is required to run the backend stack.
Project structure:

    ESP32-Smart-Doorbell/
    ├── docker-compose.yml
    ├── mosquitto/
    │   └── config/
    │       ├── mosquitto.conf
    │       └── pwfile
    ├── webserver/
    │   ├── app.py
    │   ├── database.py
    │   ├── mqtt_client.py
    │   ├── Dockerfile
    │   ├── requirements.txt
    │   ├── static/style.css
    │   └── templates/dashboard.html
    └── esp-who/
        └── examples/
            └── human_face_detection/
                └── terminal/
                    └── main/
                        ├── app_main.cpp
                        ├── config.h
                        ├── shared/
                        │   ├── event_groups.h/c
                        │   ├── queues.h/c
                        │   └── latest_frame.h/c
                        ├── wifi/
                        │   └── wifi_manager.h/c
                        ├── camera/
                        │   └── camera_task.h/cpp
                        ├── detection/
                        │   └── detection_task.h/cpp
                        ├── upload/
                        │   └── image_upload_task.h/c
                        ├── sleep/
                        │   └── sleep_task.h/c
                        ├── mqtt/
                        │   └── mqtt_task.h/c
                        └── gpio/
                            └── gpio_init.h/c

Running the backend:

    docker compose up --build

Open the dashboard at http://localhost:5000, or from any device on the same network at http://<your-pc-ip>:5000.
Flashing the firmware:
All credentials and configuration live in config.h. Update the following before flashing:

    #define WIFI_SSID "your-network"
    #define WIFI_PASS "your-password"
    #define SERVER_BASE_URL "http://<your-pc-local-ip>:5000"
    #define MQTT_BROKER_URI "mqtt://<your-pc-local-ip>:1883"

Then flash from the terminal directory:

    cd esp-who/examples/human_face_detection/terminal
    idf.py set-target esp32s3
    idf.py build flash monitor

When a face is detected, the ESP32-S3 captures a JPEG and POSTs it to the server automatically. Images are saved under uploads/images/ and logged in the dashboard.
The ESP32-S3 uses deep sleep to reduce power consumption when no work is required. According to the ESP32-S3 datasheet, deep sleep can reach around 10 µA depending on configuration, while wakeup can take around 200 to 500 ms.
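To see why the sleep current matters, the average draw over one wake/sleep cycle can be estimated as a time-weighted mean. The helpers below are a rough sketch with hypothetical names; every input is supplied by the caller, and none of the numbers used in the usage note are measurements from this project apart from the roughly 10 µA deep sleep figure quoted above.

```c
/* Average current over one wake/sleep cycle, in milliamps.
 * All inputs are caller-supplied estimates, not project measurements. */
static double average_current_ma(double active_ma, double active_s,
                                 double sleep_ma, double sleep_s)
{
    double total_s = active_s + sleep_s;
    if (total_s <= 0.0) {
        return 0.0;   /* avoid dividing by zero for an empty cycle */
    }
    return (active_ma * active_s + sleep_ma * sleep_s) / total_s;
}

/* Runtime in hours for a battery of the given capacity at that average draw. */
static double battery_life_hours(double capacity_mah, double avg_ma)
{
    return (avg_ma > 0.0) ? capacity_mah / avg_ma : 0.0;
}
```

As an illustration only: assuming 100 mA while awake for 30 s and 0.01 mA asleep for 570 s, `average_current_ma(100.0, 30.0, 0.01, 570.0)` comes to roughly 5 mA, so the awake window dominates the budget and shortening it has far more effect than lowering the sleep current further.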
The sleep task returns the system to deep sleep in the following cases:
- If a face is detected: wait until both image upload and MQTT publish are complete.
- If no face is detected: wait until the detection timeout occurs, then wait a short grace period.
- If a timer wakeup occurs: image upload completion alone may be enough depending on the wakeup cause.
Before entering deep sleep, the firmware waits for the PIR signal to go LOW to avoid immediate re-wakeup. It then enables EXT1 wakeup on the PIR and button pins and starts a backup timer wakeup.
Each major subsystem was tested on its own before integration, so that any failure could be traced to a single component and correct behavior verified independently.
Goal: Confirm that the ESP-WHO human face detection pipeline correctly identifies faces and produces no false positives under normal indoor lighting.
Method:
- Monitored serial output (idf.py monitor) while standing in front of the camera at close range (under 1m) and at an angle, confirming detection fired correctly in both cases.
- Also left the camera running with no one in frame for several minutes to check for false positives.
AXP313A power-on verification: Before testing the model itself, confirmed the camera rails came up correctly by reading back AXP313A register 0x10 over I2C after axp313a_power_on() and checking the enable bits matched what was written. Failure here produces esp_camera_init() FAILED. Once rails were verified stable, this error disappeared.
Results: Detection fired reliably for frontal faces at close range. At steep angles or distances beyond ~1m, detection became inconsistent, which is expected given the model is optimized for frontal 240×240 input. No false positives were observed with an empty frame.
Frame buffer timeout fix: During early testing, cam_hal: Failed to get frame: timeout appeared frequently. Isolated this to a single frame buffer (fb_count = 1) being held by the detection pipeline while the upload task also tried to acquire it. Fix: set fb_count = 2 in camera_config_t. After this change, the timeout error did not appear in 50+ subsequent detection cycles.
JPEG corruption fix: Initial upload images showed horizontal scan-line artifacts. Isolated to fmt2jpg() being called with a hardcoded PIXFORMAT_RGB565 argument that didn't match the actual buffer format in some build configurations. Replaced with frame2jpg(fb, ...), which reads fb->format directly. Corruption disappeared entirely after the change.
- PIR sensor was tested by checking whether motion could wake the ESP32-S3.
- Doorbell button was tested as both an interrupt trigger and EXT1 wakeup source.
- Passive buzzer was tested using LEDC PWM, because a passive buzzer requires a frequency signal and cannot ring properly with only gpio_set_level().
- Camera capture and ESP-WHO detection were tested using serial monitor output.
- HTTP image upload was tested by checking whether images arrived at the server.
- MQTT publishing was tested by checking broker/server logs.
- Deep sleep was tested by confirming that the device returned to sleep after detection/upload work completed.
Symptom: esp_camera_init() returned ESP_FAIL every time, regardless of pin configuration.
Root cause: The FireBeetle 2 ESP32-S3 uses an AXP313A PMIC to control the camera's power rails (AVDD, DVDD, DOVDD). The esp-who library assumes the camera is always powered; on this board it is not.
Solution: Wrote a custom axp313a_power_on() function that initializes the I2C master bus on GPIO 1/2, writes the enable and voltage registers (0x10, 0x16, 0x17), and waits 500ms for rails to stabilize before calling esp_camera_init(). Added a second I2C bus scan after power-on to confirm the OV2640 became visible at address 0x30.
Symptom: Intermittent frame grab timeouts appearing in serial logs during face detection + upload testing.
Root cause: fb_count = 1 meant the single frame buffer was held by the detection pipeline at the same time the upload task called esp_camera_fb_get().
Solution: Increased fb_count to 2 and set fb_location = CAMERA_FB_IN_PSRAM. With two buffers in the 8MB PSRAM, the pipeline and upload task can each hold a buffer simultaneously. Timeouts did not recur after this change.
Symptom: Images arriving at Flask had visible horizontal bands of noise across the frame.
Root cause: fmt2jpg() was called with a hardcoded PIXFORMAT_RGB565 argument. In some build configurations the actual frame buffer format differed, causing the JPEG encoder to misinterpret the pixel data.
Solution: Replaced fmt2jpg(fb->buf, fb->len, fb->width, fb->height, PIXFORMAT_RGB565, ...) with frame2jpg(fb, 80, &jpeg_buf, &jpeg_len). The latter reads fb->format directly from the frame buffer struct, eliminating any format mismatch. Corruption was completely resolved.
Symptom: Serial monitor was flooded with Stage 1 candidates: 0 lines at every frame, making it impossible to read meaningful debug output.
Root cause: The human_face_detection component logs at ESP_LOG_INFO level by default, printing candidate counts every frame.
Solution: Added esp_log_level_set("human_face_detection", ESP_LOG_NONE) at startup to suppress all output from that component. Meaningful application logs (detection events, upload status, WiFi events) remained visible.
Symptom: PIR sensor and doorbell button read as permanently LOW regardless of physical state. GPIO status logs showed InputEn: 0 on configured pins after camera initialization.
Root cause: The FireBeetle2-S3 camera DVP interface internally uses GPIOs 1, 2, 4, 5, 6, 7, 8, 39, 40, 41, 42, 45, 46, and 48. This is not documented clearly on the board's pinout diagram. After esp_camera_init() runs, the camera driver silently reclaims any of these pins that were previously configured as general purpose GPIOs, overriding their direction and input enable settings.
Resolution: Found the actual camera pin list from the ESP-WHO GitHub source. Moved all peripherals to confirmed-free GPIOs: PIR → GPIO 10, doorbell button → GPIO 12, buzzer → GPIO 3. Confirmed none of these appear in the camera DVP pin list.
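A check like this can be automated so that a future pin change cannot silently collide with the camera. The sketch below hard-codes the camera pin list quoted above and looks for overlaps; the function names are hypothetical, and the list should be re-verified against the board support source before relying on it.

```c
#include <stdbool.h>
#include <stddef.h>

/* GPIOs the text lists as claimed by the camera DVP interface on this board. */
static const int CAMERA_PINS[] = { 1, 2, 4, 5, 6, 7, 8, 39, 40, 41, 42, 45, 46, 48 };

static bool is_camera_pin(int gpio)
{
    for (size_t i = 0; i < sizeof(CAMERA_PINS) / sizeof(CAMERA_PINS[0]); i++) {
        if (CAMERA_PINS[i] == gpio) {
            return true;
        }
    }
    return false;
}

/* Returns the first conflicting GPIO in pins[0..n), or -1 if all are free. */
static int first_camera_conflict(const int *pins, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (is_camera_pin(pins[i])) {
            return pins[i];
        }
    }
    return -1;
}
```

Calling `first_camera_conflict()` with the chosen peripheral pins (10, 12, 3) at startup and logging a non-negative result would flag a collision before esp_camera_init() reclaims the pin.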
Symptom: EXT1 wakeup worked on first boot but failed on every subsequent wakeup. Serial logs showed sleep_gpio: Configure to isolate all GPIO pins in sleep state immediately before deep sleep, and GPIO status showed InputEn: 0 on the PIR pin after re-entry.
Root cause: ESP-IDF's automatic GPIO sleep isolation feature reconfigures all GPIO pins to a safe isolated state right before esp_deep_sleep_start(), overriding any custom configuration. This set InputEn: 0 on the PIR wakeup pin, preventing EXT1 hardware from detecting the HIGH signal.
Resolution: Called gpio_sleep_sel_dis() on both wakeup pins (GPIO 10 and GPIO 12) after every GPIO reconfiguration, including inside enter_deep_sleep() after the pin reset there. This exempts the specified pins from the automatic sleep isolation pass.
Symptom: Intermittent camera stalls and cam_hal: Failed to get frame: timeout errors when attempting to grab frames from a separate task after detection fired.
Root cause: register_camera() spawns its own internal task that exclusively manages esp_camera_fb_get() and esp_camera_fb_return(). Calling esp_camera_fb_get() from our own task in parallel with ESP-WHO's internal task exhausted the frame buffer pool, causing both tasks to deadlock waiting for a free buffer.
Resolution: Removed the manual frame grab loop from the camera task entirely. ESP-WHO now manages all frame capture internally. Our code only calls esp_camera_fb_get() after a detection result arrives via xQueueDetectionResult, at which point ESP-WHO's pipeline has already released the buffer.
Symptom: Linker errors when building: functions defined in .c files were unresolvable from .cpp files and vice versa, with mangled symbol names in the error output.
Root cause: ESP-WHO headers use C++ (.hpp) and require C++ compilation. Utility modules are plain C. Any .h header included from both .c and .cpp translation units needs extern "C" guards, otherwise the C++ compiler mangles the function names and the linker cannot match them to the C-compiled object files.
Resolution: Added #ifdef __cplusplus extern "C" { #endif guards to every .h file in the project as a consistent rule, regardless of whether a C/C++ boundary was immediately apparent.
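A guarded header typically looks like the sketch below. The file name and the `tone_period_us()` function are hypothetical and exist only to show where the guards go; the header and its definition are shown together so the example compiles as a single C file.

```c
/* buzzer.h (hypothetical header name, for illustration only) */
#ifndef BUZZER_H
#define BUZZER_H

#ifdef __cplusplus
extern "C" {          /* C++ callers see unmangled C symbol names */
#endif

/* Returns the tone period in microseconds for a frequency in Hz (0 if freq is 0). */
unsigned int tone_period_us(unsigned int freq_hz);

#ifdef __cplusplus
}
#endif

#endif /* BUZZER_H */

/* buzzer.c: the matching definition, compiled as plain C */
unsigned int tone_period_us(unsigned int freq_hz)
{
    return (freq_hz == 0u) ? 0u : 1000000u / freq_hz;
}
```

When a .cpp file includes this header, `__cplusplus` is defined and the declaration is wrapped in `extern "C"`, so the linker looks for the plain symbol `tone_period_us` that the C compiler emitted. When a .c file includes it, the guards vanish and nothing changes.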
Symptom: After adding mqtt to the REQUIRES list in CMakeLists.txt, previously working headers such as esp_camera.h, esp_wifi.h, and who_camera.h became unresolvable at compile time.
Root cause: ESP-IDF's build system uses an implicit dependency resolution chain. Adding an explicit REQUIRES entry for one component disrupted the chain, causing previously implicit transitive dependencies to no longer be pulled in automatically.
Resolution: Switched to listing every required component explicitly in REQUIRES: mqtt, espressif__esp32-camera, esp_wifi, esp_http_server, esp_http_client, esp_driver_gpio, esp_driver_ledc, esp_hw_support, nvs_flash, esp_netif, esp_event, and modules. Once all dependencies were explicit, the build system resolved correctly.
▶ Watch Demo Video
- Hoda: Camera operations and face detection
- Laila: Deep sleep and wake up triggers
- Yasmina: Web server and FreeRTOS

Note: project scope was revised mid-development. Full face recognition (MobileFaceNet embeddings, cosine similarity matching), audio recording, and door lock relay were dropped in favor of a stable end-to-end pipeline covering detection, image upload, MQTT publishing, and deep sleep power management.
- Finalized pipeline design and block diagram for presentation
- Wiki page live with approved proposal and block diagram
- Set up ESP-IDF toolchain and verified it builds for ESP32-S3 target
- Researched ESP-WHO native ESP-IDF integration and OV2640 camera component
- Wrote camera initialization and GPIO config code ready to flash on board arrival
- Flashed and verified OV2640 streams via ESP-IDF camera driver
- Integrated ESP-WHO natively on ESP-IDF
- Configured FreeRTOS task architecture: Core 1 for camera/detection, Core 0 for network/sleep
- Wired and tested doorbell button as GPIO ISR and EXT1 wakeup source
- Wired and tested PIR sensor as EXT1 wakeup source
- Refactored monolithic app_main into modular FreeRTOS multi-task architecture
- Implemented deep sleep lifecycle with EXT1 wakeup on PIR and button
- MQTT publishing for PIR and doorbell events
- HTTP image upload on face detection
- Flask dashboard with real-time SocketIO updates
- Full pipeline stable across extended testing
- Documentation completed
- Final code cleaned and submitted
[Final presentation: Blue and Mauve Gray Simple Elegant Presentation (2).pdf](https://www.canva.com/design/DAHG3S0GCkg/jsNfB_iy6L5ZAVUvuSqJMQ/edit)