Backend Development Guide - supriyak2003/eyecontrol GitHub Wiki

Set Up the Environment: Install Required Libraries: Ensure OpenCV, Mediapipe, and PyAutoGUI are installed: pip install opencv-python mediapipe pyautogui Camera Access: Make sure the environment has permissions to access the webcam.
Face Detection & Landmark Processing: Initialize Face Mesh: Use Mediapipe’s FaceMesh with refined landmarks to capture precise facial points. Video Capture: Use OpenCV to capture frames in real-time. Pre-process Frames: Flip frames horizontally for a mirror effect and convert them to RGB for Mediapipe compatibility.
Map Landmarks to Screen Coordinates: Retrieve Screen Size: Use pyautogui.size() to get screen dimensions for accurate mapping. Translate Facial Points: Calculate screen coordinates from facial landmarks by scaling based on screen width and height.
Implement Eye Movement Tracking: Identify Landmark Points: Select specific landmarks around the eye for tracking. Calculate Coordinates: For each frame, get the coordinates of eye landmarks and move the mouse to mapped points on the screen using pyautogui.moveTo().
Add Click Detection (Blink-Based): Set a Blink Threshold: Define a small threshold for vertical distance between eyelid landmarks to detect a blink. Simulate Click: When a blink is detected (eyes close based on threshold), trigger pyautogui.click().
Visualize for Testing: Draw on Frame: Use cv2.circle() to draw circles on detected landmarks for visual feedback and debugging. Display Output: Show each processed frame with overlays using cv2.imshow().
Handle Real-time Execution: Loop Control: Run in a loop, refreshing each frame, and allow for a smooth experience by keeping a minimal delay (cv2.waitKey(1)). Exit Strategy: Ensure the loop can be exited gracefully to release the camera and close windows.
Testing and Optimization: Performance Testing: Check on different lighting conditions and adjust the blink threshold or landmark sensitivity as needed. Error Handling: Handle cases where the face isn’t detected, or landmarks aren’t retrieved, to prevent crashes.