G8: Smart Doorbell - shalan/CSCE4301-WiKi GitHub Wiki

# Smart Doorbell

| Name | GitHub |
| --- | --- |
| Hoda Magdy | Hoda-Magdy |
| Laila Sayed | LAILA102 |
| Yasmina Mahdy | Yasmina-Mahdy |

Github Repo: https://github.com/.....

## 1. The Proposal

### Abstract / Elevator Pitch

Most commercial smart doorbells are expensive, cloud-dependent, and bloated with features most people never use. A typical Ring or Nest doorbell costs upwards of $100, requires a subscription for any meaningful functionality, and sends your footage to third-party servers. For a family that just wants to know who is at the door and maybe let them in, this is overkill. There is a clear gap for a cheap, self-contained, privacy-first alternative.

This project builds a low-cost smart doorbell around the ESP32-S3 (with a camera) that runs face recognition entirely on-device, with no cloud required. When someone presses the button, the system captures an image, detects and crops the face, runs a quantized, lightweight face recognition model to generate an embedding, and matches it against a local database of known individuals. The result, a labeled JPEG with a name or [Unknown] tag, is pushed to a local web dashboard along with an optional audio clip captured from the I2S microphone.
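As a rough illustration, the capture → detect → recognize → publish flow can be modeled as a small state machine. The state names and the `next_state` helper below are illustrative sketches, not taken from the actual firmware:

```c
#include <stdio.h>

/* Illustrative pipeline states; real firmware naming may differ. */
typedef enum {
    ST_IDLE,       /* waiting for button press or PIR motion */
    ST_CAPTURE,    /* grab a JPEG frame from the camera */
    ST_DETECT,     /* ESP-WHO face detection on the frame */
    ST_RECOGNIZE,  /* embedding generation + database match */
    ST_PUBLISH     /* push labeled JPEG to the dashboard */
} pipeline_state_t;

/* Advance one step; face_found is only consulted in ST_DETECT,
 * where a miss sends the pipeline straight back to idle. */
pipeline_state_t next_state(pipeline_state_t s, int face_found) {
    switch (s) {
    case ST_IDLE:      return ST_CAPTURE;
    case ST_CAPTURE:   return ST_DETECT;
    case ST_DETECT:    return face_found ? ST_RECOGNIZE : ST_IDLE;
    case ST_RECOGNIZE: return ST_PUBLISH;
    case ST_PUBLISH:   return ST_IDLE;
    }
    return ST_IDLE;
}
```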

The core bet here is that TinyML has matured enough to make this feasible on a microcontroller. INT8 quantization, ESP-WHO's built-in detection pipeline, and careful memory management should let us squeeze a usable face recognition system into the ESP32-S3's constraints. The stretch goals push further: automatic triggering via a PIR sensor, and a relay-controlled door lock that opens automatically for recognized family members.

### Project Objectives & Scope

  • Doorbell button press triggers image capture from an external OV2640 camera
  • ESP-WHO face detection crops and isolates the face region from the frame
  • Quantized face recognition generates a 128-d embedding on-device via TFLite
  • Cosine similarity matching against a flash-stored database returns a name or [Unknown]
  • Labeled JPEG result pushed to a local web dashboard
  • PIR motion sensor to enable detection even without ringing the bell
  • Known faces can be added to the database by capturing them on-device and re-flashing
  • [Stretch Goal] I2S microphone captures a short audio clip and makes it playable on the dashboard
  • [Stretch Goal] Demo on an actual electric door lock
  • [Stretch Goal] Send mobile notifications when a new entry is added to the webserver

## 2. System Architecture

### 2.1 High-Level Block Diagram

A visual representation of the entire system (inputs, processing, outputs, and power supply).

### Subsystem Breakdown

Trigger subsystem — push button and PIR sensor both fire GPIO interrupts that wake the pipeline. Either source produces identical downstream behavior.
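A minimal ESP-IDF sketch of this shared-trigger idea, assuming placeholder pin numbers and a falling-edge button with a rising-edge PIR output (the real wiring and pin assignments may differ):

```c
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "driver/gpio.h"
#include "esp_attr.h"

/* Pin choices are placeholders, not from the actual schematic. */
#define PIN_BUTTON GPIO_NUM_4
#define PIN_PIR    GPIO_NUM_5

static QueueHandle_t trigger_queue;

/* Both sources enqueue the same event type, so the downstream
 * pipeline behaves identically regardless of which one fired. */
static void IRAM_ATTR trigger_isr(void *arg) {
    uint32_t pin = (uint32_t)arg;
    xQueueSendFromISR(trigger_queue, &pin, NULL);
}

void trigger_init(void) {
    trigger_queue = xQueueCreate(4, sizeof(uint32_t));

    /* Button: idle-high with pull-up, interrupt on press (falling edge). */
    gpio_config_t btn = {
        .pin_bit_mask = 1ULL << PIN_BUTTON,
        .mode = GPIO_MODE_INPUT,
        .pull_up_en = GPIO_PULLUP_ENABLE,
        .intr_type = GPIO_INTR_NEGEDGE,
    };
    gpio_config(&btn);

    /* PIR: module drives the line high on motion (rising edge). */
    gpio_config_t pir = {
        .pin_bit_mask = 1ULL << PIN_PIR,
        .mode = GPIO_MODE_INPUT,
        .intr_type = GPIO_INTR_POSEDGE,
    };
    gpio_config(&pir);

    gpio_install_isr_service(0);
    gpio_isr_handler_add(PIN_BUTTON, trigger_isr, (void *)PIN_BUTTON);
    gpio_isr_handler_add(PIN_PIR, trigger_isr, (void *)PIN_PIR);
}
```

The pipeline task would block on `xQueueReceive(trigger_queue, ...)` and wake on either source.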

Camera & capture — on trigger, the OV2640 captures a JPEG frame into a PSRAM framebuffer via the DVP interface and passes it to detection.
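A minimal capture sketch against the esp32-camera driver API, assuming `esp_camera_init()` has already been called elsewhere with a JPEG pixel format and `fb_location = CAMERA_FB_IN_PSRAM`:

```c
#include "esp_camera.h"

/* Grab one JPEG frame from the driver's PSRAM framebuffer pool.
 * The caller owns the returned buffer and must release it with
 * esp_camera_fb_return() once detection has consumed it. */
camera_fb_t *capture_frame(void) {
    camera_fb_t *fb = esp_camera_fb_get();  /* blocks until a frame is ready */
    if (fb == NULL) {
        return NULL;  /* capture failed; caller should log and bail out */
    }
    /* fb->buf and fb->len hold the JPEG bytes to hand to ESP-WHO. */
    return fb;
}
```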

Face detection — ESP-WHO scans the frame for a face. If no face is found, the pipeline halts; otherwise the cropped face region is passed forward.

Face recognition — MobileFaceNet generates a 128-dimensional embedding from the cropped face. Cosine similarity matching against the flash-stored database produces a name or [Unknown] label.

Audio — push-to-talk records a voice clip via I2S microphone into PSRAM (stretch goal). A buzzer provides the audible doorbell tone on button press.

Web server & dashboard — runs on Core 0 continuously, independent of the inference pipeline on Core 1. Serves the labeled visitor photo, timestamp, and audio clip to any browser on the local network. Also exposes a face enrollment endpoint for adding new known faces.
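A sketch of this core split using FreeRTOS task pinning; the task bodies, priorities, and stack sizes below are placeholders to be tuned (e.g. with `uxTaskGetStackHighWaterMark()`), not measured values:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

/* Task bodies are stubs; each would contain the real service loop. */
static void web_server_task(void *arg) {
    for (;;) { vTaskDelay(pdMS_TO_TICKS(10)); }
}
static void inference_task(void *arg) {
    for (;;) { vTaskDelay(pdMS_TO_TICKS(10)); }
}

void start_tasks(void) {
    /* Pin the web server to Core 0 and the camera/inference pipeline to
     * Core 1, so a long-running inference never blocks HTTP requests. */
    xTaskCreatePinnedToCore(web_server_task, "web",   8192,  NULL, 5, NULL, 0);
    xTaskCreatePinnedToCore(inference_task,  "infer", 16384, NULL, 5, NULL, 1);
}
```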

## 3. Hardware Design

### Component Selection

### Schematics & Wiring

Circuit diagrams, pinout tables, and breadboard layouts.

### Bill of Materials (BOM)

A table listing component names, part numbers, quantities, costs, and links to datasheets.

### Power Budget

Calculations ensuring your power supply can handle the peak current draw of all components combined.

## 4. Software Implementation

### 4.1 Software Architecture

Description of the firmware design (e.g., bare-metal superloop, interrupt-driven, or RTOS).

### 4.2 Flowcharts & State Machines

Visual diagrams mapping out the core logic, state transitions, and interrupt service routines (ISRs).

### 4.3 Key Algorithms

Explanations of any complex logic used (e.g., PID control loops, digital filtering, sensor fusion).

### 4.4 Development Environment

Compilers, IDEs, and toolchains used (e.g., Keil, PlatformIO, STM32CubeIDE).

## 5. Testing, Validation & Debugging

### 5.1 Unit Testing

How individual hardware components and software functions were tested in isolation.

### 5.2 Integration Testing

How the system was tested as a whole.

### 5.3 Challenges & Solutions

A log of major bugs, hardware failures, or design flaws you encountered, and the engineering steps you took to solve them.

## 6. Results & Demonstration

### 6.1 Final Prototype

High-quality photos of the completed build.

### 6.2 Video Demonstration

A link to a short video showing the system working in real-time under various conditions.

### 6.3 Performance Metrics

Data showing how well the project met its initial objectives (e.g., "Response time was measured at 12ms, well within our 50ms goal").

## 7. Project Management

### 7.1 Division of Labor

A clear breakdown of who worked on what (professors usually require this to grade individual contributions).

### 7.2 Timeline

Now → Apr 15 (Proposal Presentation)

  • Finalize pipeline design and block diagram for presentation
  • Wiki page live with approved proposal and block diagram

Apr 15 → Apr 20 (Checkpoint A: Wiki Setup)

  • Board ordered and arriving this window
  • Set up the ESP-IDF toolchain and verify it builds for the ESP32-S3 target
  • Research ESP-WHO's native ESP-IDF integration and the OV2640 camera component
  • Begin MobileFaceNet TFLite model conversion offline
  • Write camera initialization and GPIO configuration code in ESP-IDF, ready to flash on arrival

Apr 20 → Apr 29 (Milestone 3: Progress Demo)

  • Board arrives: immediately flash and verify OV2640 streams via ESP-IDF camera driver
  • Continue offline model quantization work
  • Integrate ESP-WHO natively on ESP-IDF
  • Configure FreeRTOS tasks: Core 0 for web server, Core 1 for camera/inference pipeline
  • Wire doorbell button with GPIO interrupt handler
  • Wire PIR sensor GPIO interrupt for passive detection

Demo target: PIR or button triggers capture → ESP-WHO detects face → cropped JPEG served to dashboard with [Unknown] label

Apr 29 → May 6 (Checkpoint B: Integration)

  • TFLite Micro inference arena allocated explicitly in PSRAM
  • MobileFaceNet embedding generation running on Core 1
  • Cosine similarity matching against flash-stored embedding database
  • Name or [Unknown] label appearing on dashboard
  • FreeRTOS queues handling data passing between tasks cleanly
  • Wiki updated with memory map, task architecture, testing evidence

May 6 → May 13 (Final Demo)

  • Full pipeline stable across extended testing in demo environment
  • Stretch goals: relay door lock, I2S microphone audio capture and playback on dashboard
  • Final code cleaned, commented and submitted
  • Wiki completed with full system documentation

## 8. Appendices & References

### 8.1 Source Code Repository

Link to your GitHub/GitLab repo.

### 8.2 References

Links to datasheets, tutorials, academic papers, and course materials used during development.