G8: Smart Doorbell - shalan/CSCE4301-WiKi GitHub Wiki
# Smart Doorbell
| Name | GitHub |
|---|---|
| Hoda Magdy | Hoda-Magdy |
| Laila Sayed | LAILA102 |
| Yasmina Mahdy | Yasmina-Mahdy |
GitHub Repo: https://github.com/.....
## 1. The Proposal
**Abstract / Elevator Pitch:**
Most commercial smart doorbells are expensive, cloud-dependent, and bloated with features most people never use. A typical Ring or Nest doorbell costs upwards of $100, requires a subscription for any meaningful functionality, and sends your footage to third-party servers. For a family that just wants to know who is at the door and maybe let them in, this is overkill. There is a clear gap for a cheap, self-contained, privacy-first alternative.
This project builds a low-cost smart doorbell around the ESP32-S3 (with a camera) that runs face recognition entirely on-device, with no cloud required. When someone presses the button, the system captures an image, detects and crops the face, runs a quantized, lightweight face recognition model to generate an embedding, and matches it against a local database of known individuals. The result, a labeled JPEG tagged with a name or [Unknown], is pushed to a local web dashboard along with an optional audio clip captured from the I2S microphone.
The core bet here is that TinyML has matured enough to make this feasible on a microcontroller. INT8 quantization, ESP-WHO's built-in detection pipeline, and careful memory management should let us squeeze a usable face recognition system into the ESP32-S3's constraints. The stretch goals push further: automatic triggering via a PIR sensor, and a relay-controlled door lock that opens automatically for recognized family members.
**Project Objectives & Scope:**
- Doorbell button press triggers image capture from an external OV2640 camera
- ESP-WHO face detection crops and isolates the face region from the frame
- Quantized face recognition generates a 128-d embedding on-device via TFLite
- Cosine similarity matching against a flash-stored database returns a name or [Unknown]
- Labeled JPEG result pushed to a local web dashboard
- PIR motion sensor to enable detection even without ringing the bell
- Known faces can be added to the database by capturing them on-device and re-flashing
- [Stretch Goal] I2S microphone captures a short audio clip and makes it playable on the dashboard
- [Stretch Goal] Demo on an actual electric door lock
- [Stretch Goal] Send mobile notifications when a new entry is added to the webserver
## 2. System Architecture
### 2.1 High-Level Block Diagram
A visual representation of the entire system (inputs, processing, outputs, and power supply).
**Subsystem Breakdown:**
Trigger subsystem — push button and PIR sensor both fire GPIO interrupts that wake the pipeline. Either source produces identical downstream behavior.
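Both trigger lines bounce, so the ISR should do no more than record a timestamp and the pipeline should debounce in software. A minimal host-testable sketch of that check; the 200 ms window is an assumption to tune on hardware, and `now_ms` stands in for a millisecond tick source (e.g. esp_timer on the target):

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_WINDOW_MS 200  /* assumed window, tune on real hardware */

static uint32_t last_trigger_ms = 0;

/* Accept a trigger only if at least DEBOUNCE_WINDOW_MS has elapsed since the
 * last accepted one; bounces and rapid repeat presses are ignored. */
bool accept_trigger(uint32_t now_ms) {
    if (last_trigger_ms != 0 && (now_ms - last_trigger_ms) < DEBOUNCE_WINDOW_MS)
        return false;   /* bounce or repeat: ignore */
    last_trigger_ms = now_ms;
    return true;        /* genuine trigger: wake the pipeline */
}
```

Keeping this logic out of the ISR itself keeps the interrupt handler short, which matters once the camera and inference tasks are competing for CPU time.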
Camera & capture — on trigger, the OV2640 captures a JPEG frame into a PSRAM framebuffer via the DVP interface and passes it to detection.
Face detection — ESP-WHO scans the frame for a face. If no face is found, the pipeline halts; if one is found, the cropped face region is passed forward.
Face recognition — MobileFaceNet generates a 128-dimensional embedding from the cropped face. Cosine similarity matching against the flash-stored database produces a name or [Unknown] label.
Audio — push-to-talk records a voice clip via I2S microphone into PSRAM (stretch goal). A buzzer provides the audible doorbell tone on button press.
Web server & dashboard — runs on Core 0 continuously, independent of the inference pipeline on Core 1. Serves the labeled visitor photo, timestamp, and audio clip to any browser on the local network. Also exposes a face enrollment endpoint for adding new known faces.
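Each visit the dashboard shows is just a small record; one way to hand it to the browser is a JSON blob per entry. A minimal sketch of that serialization — the field names and buffer handling are illustrative assumptions, not the actual dashboard API:

```c
#include <stddef.h>
#include <stdio.h>

/* Format one visitor record as JSON for the dashboard. Field names
 * ("name", "time", "image") are hypothetical. Returns the number of
 * characters written, or -1 if the buffer was too small. */
int format_visit_json(char *buf, size_t buf_len,
                      const char *name, const char *timestamp,
                      const char *image_path) {
    int n = snprintf(buf, buf_len,
                     "{\"name\":\"%s\",\"time\":\"%s\",\"image\":\"%s\"}",
                     name, timestamp, image_path);
    return (n >= 0 && (size_t)n < buf_len) ? n : -1;
}
```

Checking the `snprintf` return value against the buffer length matters on a microcontroller, where a silently truncated record is easy to miss.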
## 3. Hardware Design
**Component Selection:**
**Schematics & Wiring:**
Circuit diagrams, pinout tables, and breadboard layouts.
**Bill of Materials (BOM):**
A table listing component names, part numbers, quantities, costs, and links to datasheets.
**Power Budget:**
Calculations ensuring your power supply can handle the peak current draw of all components combined.
## 4. Software Implementation
### 4.1 Software Architecture
Description of the firmware design (e.g., Bare-metal Superloop, Interrupt-driven, or RTOS).
### 4.2 Flowcharts & State Machines
Visual diagrams mapping out the core logic, state transitions, and interrupt service routines (ISRs).
### 4.3 Key Algorithms
Explanations of any complex logic used (e.g., PID control loops, digital filtering, sensor fusion).
### 4.4 Development Environment
Compilers, IDEs, and toolchains used (e.g., Keil, PlatformIO, STM32CubeIDE).
## 5. Testing, Validation & Debugging
### 5.1 Unit Testing
How individual hardware components and software functions were tested in isolation.
### 5.2 Integration Testing
How the system was tested as a whole.
### 5.3 Challenges & Solutions
A log of major bugs, hardware failures, or design flaws you encountered, and the engineering steps you took to solve them.
## 6. Results & Demonstration
### 6.1 Final Prototype
High-quality photos of the completed build.
### 6.2 Video Demonstration
A link to a short video showing the system working in real-time under various conditions.
### 6.3 Performance Metrics
Data showing how well the project met its initial objectives (e.g., "Response time was measured at 12ms, well within our 50ms goal").
## 7. Project Management
### 7.1 Division of Labor
A clear breakdown of who worked on what (professors usually require this to grade individual contributions).
### 7.2 Timeline
**Now → Apr 15 (Proposal Presentation)**
- Finalize pipeline design and block diagram for presentation
- Wiki page live with approved proposal and block diagram
**Apr 15 → Apr 20 (Checkpoint A: Wiki Setup)**
- Board ordered and arriving this window
- Set up ESP-IDF toolchain and verify it builds for ESP32-S3 target
- Research ESP-WHO native ESP-IDF integration and OV2640 camera component
- Begin MobileFaceNet TFLite model conversion offline
- Write camera initialization and GPIO configuration code in ESP-IDF, ready to flash on arrival
**Apr 20 → Apr 29 (Milestone 3: Progress Demo)**
- Board arrives: immediately flash and verify OV2640 streams via ESP-IDF camera driver
- Continue offline model quantization work
- Integrate ESP-WHO natively on ESP-IDF
- Configure FreeRTOS tasks: Core 0 for web server, Core 1 for camera/inference pipeline
- Wire doorbell button with GPIO interrupt handler
- Wire PIR sensor GPIO interrupt for passive detection

**Demo target:** PIR or button triggers capture → ESP-WHO detects face → cropped JPEG served to dashboard with [Unknown] label
**Apr 29 → May 6 (Checkpoint B: Integration)**
- TFLite Micro inference arena allocated explicitly in PSRAM
- MobileFaceNet embedding generation running on Core 1
- Cosine similarity matching against flash-stored embedding database
- Name or [Unknown] label appearing on dashboard
- FreeRTOS queues handling data passing between tasks cleanly
- Wiki updated with memory map, task architecture, testing evidence
**May 6 → May 13 (Final Demo)**
- Full pipeline stable across extended testing in demo environment
- Stretch goals: relay door lock, I2S microphone audio capture and playback on dashboard
- Final code cleaned, commented and submitted
- Wiki completed with full system documentation
## 8. Appendices & References
### 8.1 Source Code Repository
Link to your GitHub/GitLab repo.
### 8.2 References
Links to datasheets, tutorials, academic papers, and course materials used during development.