Backend Development Guide - supriyak2003/eyecontrol GitHub Wiki
Set Up the Environment:
Install Required Libraries: Ensure OpenCV, Mediapipe, and PyAutoGUI are installed:
pip install opencv-python mediapipe pyautogui
Camera Access: Make sure the environment has permissions to access the webcam.
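One way to confirm webcam access before building anything else is to open the device and read a single frame. The sketch below assumes the default camera is at index 0; `camera_is_available` is a hypothetical helper name, not an OpenCV function.

```python
def camera_is_available(index=0):
    """Return True if the webcam at `index` opens and delivers one frame."""
    # Imported inside the function so a missing opencv-python install
    # surfaces here, at the point where the camera is actually checked.
    import cv2

    cam = cv2.VideoCapture(index)
    try:
        if not cam.isOpened():
            return False
        ok, _frame = cam.read()
        return bool(ok)
    finally:
        # Always hand the device back, even if reading failed.
        cam.release()
```

If `camera_is_available()` returns `False`, the usual suspects are OS camera permissions for the terminal or IDE, or another application holding the device.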
Face Detection & Landmark Processing:
Initialize Face Mesh: Use Mediapipe’s FaceMesh with refine_landmarks=True, which adds iris landmarks to the mesh, to capture precise facial points around the eyes.
Video Capture: Use OpenCV to capture frames in real-time.
Pre-process Frames: Flip each frame horizontally for a mirror effect, then convert it from OpenCV’s BGR channel order to RGB, which Mediapipe expects.
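The three steps above could be combined into a single generator, sketched below. `iter_face_landmarks` is a hypothetical name, and `max_num_faces=1` is an assumption (one user in front of the camera); `refine_landmarks=True` is the FaceMesh option that adds iris landmarks.

```python
def iter_face_landmarks(camera_index=0):
    """Yield (frame, landmarks) per webcam frame; landmarks is None if no face is found."""
    import cv2
    import mediapipe as mp

    cam = cv2.VideoCapture(camera_index)
    face_mesh = mp.solutions.face_mesh.FaceMesh(
        max_num_faces=1,        # assumption: a single user
        refine_landmarks=True,  # adds iris landmarks to the mesh
    )
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break  # camera unplugged or stream ended
            frame = cv2.flip(frame, 1)                    # 1 = horizontal (mirror) flip
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV delivers BGR frames
            results = face_mesh.process(rgb)
            faces = results.multi_face_landmarks
            yield frame, (faces[0].landmark if faces else None)
    finally:
        face_mesh.close()
        cam.release()
```

A caller would iterate with `for frame, landmarks in iter_face_landmarks():` and skip frames where `landmarks` is `None`.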
Map Landmarks to Screen Coordinates:
Retrieve Screen Size: Use pyautogui.size() to get screen dimensions for accurate mapping.
Translate Facial Points: Mediapipe reports landmark positions normalized to the range 0 to 1, so multiply each landmark’s x by the screen width and its y by the screen height to get screen coordinates.
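The scaling step can be sketched as a small pure function. `landmark_to_screen` is a hypothetical helper; the clamping is an addition not described above, included because a landmark near the frame edge can fall slightly outside the 0 to 1 range.

```python
def landmark_to_screen(x_norm, y_norm, screen_w, screen_h):
    """Map a normalized landmark position (0..1) to integer screen pixels."""
    x = int(x_norm * screen_w)
    y = int(y_norm * screen_h)
    # Clamp so a landmark slightly outside the frame still maps onto the screen.
    x = min(max(x, 0), screen_w - 1)
    y = min(max(y, 0), screen_h - 1)
    return x, y
```

In the tracker, `screen_w, screen_h = pyautogui.size()` would supply the last two arguments.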
Implement Eye Movement Tracking:
Identify Landmark Points: Select specific landmarks around the eye for tracking.
Calculate Coordinates: For each frame, get the coordinates of eye landmarks and move the mouse to mapped points on the screen using pyautogui.moveTo().
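A minimal sketch of the tracking step follows. The landmark index is an assumption: 473 is taken here to be an iris landmark in the refined 478-point mesh, and any stable point around the eye could be substituted. `cursor_target` and `track_eye` are hypothetical names, and the mouse call is passed in as `move_to` so the logic can be tried without moving the real pointer.

```python
# Assumption: index 473 is an iris landmark in the refined 478-point mesh.
CURSOR_LANDMARK = 473


def cursor_target(landmarks, screen_w, screen_h, index=CURSOR_LANDMARK):
    """Return the screen position that corresponds to one eye landmark."""
    point = landmarks[index]
    return int(point.x * screen_w), int(point.y * screen_h)


def track_eye(landmarks, screen_w, screen_h, move_to):
    """Move the pointer to the tracked landmark and return the target position."""
    x, y = cursor_target(landmarks, screen_w, screen_h)
    move_to(x, y)  # in the real tracker this would be pyautogui.moveTo
    return x, y
```

In the main loop this might be called as `track_eye(landmarks, *pyautogui.size(), move_to=pyautogui.moveTo)` for every frame in which a face was found.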
Add Click Detection (Blink-Based):
Set a Blink Threshold: Define a small threshold for vertical distance between eyelid landmarks to detect a blink.
Simulate Click: When a blink is detected (the eyelid distance falls below the threshold), trigger pyautogui.click().
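The blink check can be sketched as below. Several values are assumptions rather than settings from this guide: landmarks 159 and 145 are taken to lie on the upper and lower lid of one eye, and 0.004 (in normalized image units) is only a starting threshold to tune. The cooldown is also an addition: a closed eye usually spans several frames, so without it one blink could fire several clicks. `is_blink` and `BlinkClicker` are hypothetical names.

```python
import time

# Assumptions: eyelid landmark indices and a starting threshold to tune.
UPPER_LID, LOWER_LID = 159, 145
BLINK_THRESHOLD = 0.004  # normalized vertical distance


def is_blink(landmarks, threshold=BLINK_THRESHOLD):
    """Return True when the eyelid landmarks are closer than the threshold."""
    gap = abs(landmarks[LOWER_LID].y - landmarks[UPPER_LID].y)
    return gap < threshold


class BlinkClicker:
    """Call `click` on a blink, at most once per `cooldown_s` seconds."""

    def __init__(self, click, cooldown_s=1.0, clock=time.monotonic):
        self.click = click            # e.g. pyautogui.click
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last_click = float("-inf")

    def update(self, landmarks):
        """Check one frame; return True if a click was triggered."""
        now = self.clock()
        if is_blink(landmarks) and now - self._last_click >= self.cooldown_s:
            self.click()
            self._last_click = now
            return True
        return False
```

A tracker would create `BlinkClicker(click=pyautogui.click)` once and call `update(landmarks)` on every frame. Because the distance is normalized to the image, the threshold likely needs adjusting when the user sits closer to or farther from the camera.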
Visualize for Testing:
Draw on Frame: Use cv2.circle() to draw circles on detected landmarks for visual feedback and debugging.
Display Output: Show each processed frame with overlays using cv2.imshow().
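For the overlay, normalized landmarks have to be converted to pixel positions in the frame (not the screen) before drawing. A minimal sketch, with hypothetical helper names and an assumed window title, colour, and radius:

```python
def landmark_to_pixel(landmark, frame_w, frame_h):
    """Convert a normalized landmark to integer pixel coordinates in the frame."""
    return int(landmark.x * frame_w), int(landmark.y * frame_h)


def draw_landmarks(frame, landmarks, indices, color=(0, 255, 0), radius=3):
    """Draw a filled circle on `frame` for each landmark index given."""
    import cv2

    frame_h, frame_w = frame.shape[:2]  # OpenCV frames are (rows, cols, channels)
    for i in indices:
        center = landmark_to_pixel(landmarks[i], frame_w, frame_h)
        cv2.circle(frame, center, radius, color, thickness=-1)  # -1 fills the circle
    return frame


def show(frame, window="Eye Control"):
    """Display the annotated frame in a named window."""
    import cv2

    cv2.imshow(window, frame)
```

Note that `color` is in BGR order, so `(0, 255, 0)` is green. The window only repaints when `cv2.waitKey()` is called afterwards.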
Handle Real-time Execution:
Loop Control: Process frames in a loop and call cv2.waitKey(1) on every iteration; the 1 ms wait keeps the delay minimal while giving OpenCV time to refresh the window and read key presses.
Exit Strategy: Provide a way to break out of the loop (for example, a key press), then release the camera and close all windows.
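One possible shape for the loop and its exit path is sketched below. The quit keys (`q` and Esc) and the names `should_quit` and `run` are assumptions; `process_frame` stands for whatever per-frame work the tracker does and is expected to return the frame to display.

```python
QUIT_KEYS = (ord("q"), 27)  # assumption: 'q' or Esc ends the session


def should_quit(key_code):
    """Interpret the return value of cv2.waitKey (-1 means no key was pressed)."""
    return (key_code & 0xFF) in QUIT_KEYS


def run(process_frame, camera_index=0, window="Eye Control"):
    """Read frames until a quit key is pressed or the camera stops delivering."""
    import cv2

    cam = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            cv2.imshow(window, process_frame(frame))
            if should_quit(cv2.waitKey(1)):  # 1 ms wait, also pumps GUI events
                break
    finally:
        # Runs on normal exit and on exceptions, so the camera is never left open.
        cam.release()
        cv2.destroyAllWindows()
```

Putting the cleanup in `finally` means the camera is released even if `process_frame` raises.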
Testing and Optimization:
Performance Testing: Test under different lighting conditions and adjust the blink threshold or landmark sensitivity as needed.
Error Handling: Handle cases where the face isn’t detected, or landmarks aren’t retrieved, to prevent crashes.
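The two failure cases named above (no frame, no face) can be guarded explicitly instead of letting an indexing error end the session. A minimal sketch with hypothetical names; it relies on FaceMesh results exposing `multi_face_landmarks`, which is `None` when no face is found.

```python
def first_face(results):
    """Return the landmark list of the first detected face, or None."""
    faces = getattr(results, "multi_face_landmarks", None)
    if not faces:  # None or an empty list: nothing detected in this frame
        return None
    return faces[0].landmark


def process_step(read_frame, detect, on_face):
    """Run one iteration; report what happened instead of raising."""
    ok, frame = read_frame()
    if not ok or frame is None:
        return "no_frame"  # camera hiccup: skip this iteration
    landmarks = first_face(detect(frame))
    if landmarks is None:
        return "no_face"   # user looked away or lighting is too poor
    on_face(frame, landmarks)
    return "ok"
```

Here `read_frame` would be something like `cam.read`, `detect` would wrap the pre-processing and `face_mesh.process` call, and `on_face` would do the cursor and blink handling. The returned status can be logged while testing under different lighting.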