Single Face: How Face Tracking Works - iVideoGameBoss/iRoopDeepFaceCam GitHub Wiki
# Deep Dive: How Face Tracking Works (Single Face) - Advanced
This page provides an in-depth explanation of how the `_process_face_tracking_single` function works. We will delve into the algorithms and logic behind this process, including how it handles face embeddings, position tracking, and how it determines whether a face in the current frame is the same face it was tracking in previous frames.
## Core Goal: Continuous Face Swapping
The core purpose of `_process_face_tracking_single` is to ensure that, as each video frame is processed, the face swap is applied to the same face throughout the video. This requires accurately tracking the desired face and using that identified face for the swap. The process is not straightforward, since faces are constantly moving, rotating, and scaling, and can even disappear from the camera for a few frames.
## Detailed Breakdown of Variables and Their Roles
Let's review the variables in more detail:
- `first_face_embedding` (`numpy.ndarray`, optional): Holds the face embedding of the face we are tracking. A face embedding is a high-dimensional vector (a long list of numbers) that uniquely represents a face's features. The key idea is that similar faces have similar embeddings. We get this embedding from the face analyzer. It is initialized to `None` because when the tracker starts we don't yet have a face to track, and it is updated each frame by a weighted average.
- `first_face_position` (`Tuple[float, float]`, optional): A tuple containing the (x, y) coordinates of the face's center in the frame. It is used to measure how consistent the face's movement is with the previously tracked positions. It is also initialized to `None` and updated each frame by a weighted average.
- `first_face_id` (`int`, optional): A unique identification number for the face in each frame. If a face has the same identification number across two frames, our confidence that it is the same face increases. It is also initialized to `None` and updated each frame.
- `face_lost_count` (`int`): A counter tracking how many consecutive frames the tracking algorithm has failed to find the face it was tracking. If this count exceeds a certain threshold (not defined directly in this function; it is implicit in whether a face was found or not), the face is no longer considered for tracking. Initialized to `0` and incremented or reset each frame.
- `face_position_history` (`deque`): A double-ended queue storing the last 30 face positions. It is used to calculate an average position, which helps predict where the face should be if it was not seen in the current frame. Limited to a size of 30, it acts as a short memory of where the face has been.
- `target_face` (`Face`): An object containing information about a face detected in the current frame, including properties such as the bounding box, landmarks, and embedding. It is what the algorithm uses to decide whether this face is the one we were previously tracking.
- `source_face` (`List[Face]`): A list containing the `Face` object for our source face. This is the face we use to replace the tracked target face.
- `active_source_index` (`int`): An integer determining which index of `source_face` to use. Usually just `0`, meaning we only use one source face.
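To make the variable roles above concrete, here is a minimal sketch of the tracker state as a single container. The class name, method, and default values are hypothetical; the actual code in iRoopDeepFaceCam keeps these as separate module-level variables.

```python
# Hypothetical sketch of the per-face tracking state described above.
from collections import deque
from typing import Optional, Tuple

import numpy as np


class SingleFaceTrackerState:
    """Holds the state the single-face tracker carries between frames."""

    def __init__(self) -> None:
        self.first_face_embedding: Optional[np.ndarray] = None  # reference "fingerprint"
        self.first_face_position: Optional[Tuple[float, float]] = None  # face center
        self.first_face_id: Optional[int] = None  # id() of the last matched Face object
        self.face_lost_count: int = 0  # consecutive frames without a match
        # Short memory of recent centers; maxlen=30 mirrors the description above.
        self.face_position_history: deque = deque(maxlen=30)

    def average_position(self) -> Optional[Tuple[float, float]]:
        """Mean of the stored centers, used to predict where the face should be."""
        if not self.face_position_history:
            return self.first_face_position
        xs, ys = zip(*self.face_position_history)
        return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Because the deque has a fixed `maxlen`, appending a 31st position silently drops the oldest one, which is exactly the "short memory" behavior the wiki describes.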
## Algorithmic Walkthrough
The `_process_face_tracking_single` function can be broken down into distinct stages:
1. **Initialization Check:**
   - The function first checks whether `first_face_embedding` is `None`. If it is, this is the first frame in which we've encountered a face to track. In this case:
     - The `target_face`'s embedding is extracted with `extract_face_embedding` and assigned to `first_face_embedding`. This essentially becomes the "reference fingerprint" of the face we want to track.
     - The `target_face`'s center position is extracted with `get_face_center` and assigned to `first_face_position`.
     - The `target_face`'s unique id is obtained with `id(target_face)` and assigned to `first_face_id`.
     - `face_lost_count` is set to `0`, because we found a face.
     - `face_position_history` is cleared so that only this face is tracked, and `first_face_position` is appended to it.
     - The `_process_face_swap` function is called, and the current timestamp is saved to the `last_swap_time` global variable, marking the first frame's successful swap.
     - The function then returns, having processed the initial face.
2. **Face Detection and Best Match Finding:**
   - If `first_face_embedding` is not `None`, the algorithm is already tracking a face.
   - The function calls `_detect_faces(frame)` to get a new list of faces that may include the face we are tracking.
   - If no face is detected:
     - `face_lost_count` is incremented, as the face was not found in the current frame.
     - If the pseudo face is enabled via `modules.globals.use_pseudo_face` and our best score is below `modules.globals.pseudo_face_threshold`, a pseudo face is created and used for the swap.
     - The function then returns.
   - Otherwise, the algorithm loops through all detected faces and scores each one based on how well it matches the face we are tracking. For each detected face:
     - `target_embedding`: A new embedding of the detected `target_face` is obtained with `extract_face_embedding`.
     - `target_position`: The position of the detected `target_face` is obtained with `get_face_center`.
     - Embedding Similarity: The cosine similarity between `first_face_embedding` and the `target_embedding` is calculated with the `cosine_similarity` function. This score represents how similar the new face looks to the previously tracked face based on their "fingerprints"; the closer to 1.0, the more similar the faces are in appearance.
     - Position Consistency: This is derived from the inverse of the distance between the `target_position` and the average of the previous positions stored in `face_position_history`. It is not the raw distance: the closer the new face's position is to the average of the last 30 positions, the higher the score. If `face_position_history` is empty, the previous `first_face_position` is used instead.
     - Total Match Score: A weighted score combining the two metrics (`embedding_similarity` and `position_consistency`) is calculated from the weights in `modules.globals.embedding_weight_size` and `modules.globals.position_size` using the following formula:

       ```
       TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT
       match_score = (EMBEDDING_WEIGHT * embedding_similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT
       ```

     - Stickiness: If the unique `id` of the detected face is the same as the current `first_face_id`, the score is multiplied by `(1 + STICKINESS_FACTOR)`. This makes the currently tracked face more likely to be picked as the best match, which helps avoid sudden flickering or swapping between multiple faces.
   - The face with the highest total `match_score` is chosen as the `best_match_face`.
3. **Face Tracking Update:**
   - After going through all detected faces, we check whether a `best_match_face` was found with a score higher than `modules.globals.sticky_face_value`.
   - If such a face is found:
     - `face_lost_count` is reset to `0`.
     - The tracked face's embedding is updated toward the new `best_match_face` by taking a weighted average of the old and new embeddings, using `OLD_WEIGHT` from `modules.globals.old_embedding_weight` and `NEW_WEIGHT` from `modules.globals.new_embedding_weight`.
     - The tracked face's position is updated to the position of the `best_match_face`, obtained with `get_face_center`.
     - The unique `id` is also updated with `id(best_match_face)`.
     - The new position is added to the breadcrumbs (`face_position_history`).
     - The `_process_face_swap` function is called, which performs the actual face swap using the `best_match_face`.
   - If no good face is found:
     - `face_lost_count` is incremented, as the face was not found in the current frame.
     - If the pseudo face is enabled via `modules.globals.use_pseudo_face` and our best score is below `modules.globals.pseudo_face_threshold`, a pseudo face is created and used for the swap.
4. **Do Nothing:**
   - If no face was found and no pseudo face was created, nothing is done to the current frame.
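The scoring logic from stages 2 and 3 can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the weight constants and the inverse-distance formula for position consistency are assumptions, while the real values live in `modules.globals`.

```python
# Illustrative sketch of the match-scoring described in the walkthrough.
# All constant values below are assumed; the real ones come from modules.globals.
import numpy as np

EMBEDDING_WEIGHT = 0.7      # stands in for modules.globals.embedding_weight_size
POSITION_WEIGHT = 0.3       # stands in for modules.globals.position_size
WEIGHT_DISTRIBUTION = 1.0   # stands in for modules.globals.weight_distribution_size
STICKINESS_FACTOR = 0.2     # assumed bonus multiplier for an unchanged face id


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means identical direction (very similar faces)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def position_consistency(avg_pos, new_pos) -> float:
    """Inverse-distance score: closer to the historical average -> higher score."""
    dist = ((avg_pos[0] - new_pos[0]) ** 2 + (avg_pos[1] - new_pos[1]) ** 2) ** 0.5
    return 1.0 / (1.0 + dist)


def match_score(ref_emb, new_emb, avg_pos, new_pos, same_id: bool) -> float:
    """Weighted blend of appearance and position, with a stickiness bonus."""
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION + POSITION_WEIGHT
    score = (EMBEDDING_WEIGHT * cosine_similarity(ref_emb, new_emb)
             + POSITION_WEIGHT * position_consistency(avg_pos, new_pos)) / total_weight
    if same_id:  # "stickiness": favour the face we were already tracking
        score *= 1 + STICKINESS_FACTOR
    return score
```

A candidate that looks identical and sits exactly on the historical average scores 1.0; the same candidate with an unchanged id scores 1.2, which is why the tracker resists jumping to a new face of similar appearance.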
## Key Concepts Explained
- Face Embeddings: Face embeddings are a technique for capturing the essence of a face in a high-dimensional space. A mathematical algorithm (the face analyzer) translates a face into a long list of numbers. Similar faces will have similar numbers.
- Cosine Similarity: This is a way to measure how similar two vectors (like embeddings) are. A cosine similarity of 1.0 means the vectors point in exactly the same direction, while 0.0 means they are orthogonal, sharing no similarity at all.
- Position Consistency: By tracking the position of the face and only selecting a face that is near our previous location, we minimize the possibility of tracking the wrong face.
- Weighted Averages: By taking weighted averages we get a more consistent update to the face position and embedding that helps to avoid jitteriness and abrupt changes when the face moves.
- Stickiness: The "stickiness" logic is used to ensure that the tracker sticks to the same face between frames instead of quickly swapping to a new face when a new face is detected.
- Pseudo Face: When we can't track a face, we can create a pseudo face by using our breadcrumb history.
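The weighted-average idea above can be shown in a few lines. This is illustrative only: the `OLD_WEIGHT`/`NEW_WEIGHT` values are assumptions standing in for `modules.globals.old_embedding_weight` and `modules.globals.new_embedding_weight`, and the re-normalization step is an assumption added so cosine scores stay comparable.

```python
# Illustrative weighted-average update for the tracked embedding.
# OLD_WEIGHT / NEW_WEIGHT values are assumed, not taken from the project.
import numpy as np

OLD_WEIGHT = 0.9  # how much of the old embedding to keep
NEW_WEIGHT = 0.1  # how much of the newly matched embedding to mix in


def update_embedding(old: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Blend the stored embedding slightly toward the new match."""
    blended = OLD_WEIGHT * old + NEW_WEIGHT * new
    # Re-normalize (an assumption here) so cosine similarity stays well scaled.
    return blended / np.linalg.norm(blended)


old = np.array([1.0, 0.0])
new = np.array([0.0, 1.0])
updated = update_embedding(old, new)
# The result moves only slightly toward the new embedding, damping jitter.
```

Because the old embedding dominates the blend, a single noisy detection barely shifts the reference fingerprint, while a genuine gradual change (lighting, head angle) accumulates over many frames.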
## Practical Considerations
- `modules.globals`: This module contains all settings and global configuration; it is where all the weights, distances, and tunable variables are kept. This means you can change many settings on the fly from the UI, without changing the code.
- Thresholds: Thresholds such as the stickiness value and the pseudo face threshold greatly affect how the face tracker behaves. These values may need to be tweaked to get the desired effect.
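For orientation, here is a snapshot of the settings this page references, written as plain assignments. The names are the ones mentioned in the text; the values are purely illustrative guesses, not the project's defaults.

```python
# Illustrative values for the modules.globals settings referenced on this page.
# Only the names come from the wiki text; every value here is an assumption.
use_pseudo_face = True        # fall back to a pseudo face when tracking fails
pseudo_face_threshold = 0.20  # below this best score, synthesize a pseudo face
sticky_face_value = 0.65      # minimum match_score to accept a best match
embedding_weight_size = 0.70  # weight of embedding similarity in the score
position_size = 0.30          # weight of position consistency in the score
old_embedding_weight = 0.90   # smoothing: portion of the old embedding kept
new_embedding_weight = 0.10   # smoothing: portion of the new embedding mixed in
```

Raising `sticky_face_value` makes the tracker more conservative (it rejects weak matches and loses the face more often), while lowering it makes the tracker more willing to latch onto a borderline candidate.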
## Conclusion
The `_process_face_tracking_single` function is a powerful algorithm for tracking a single face in a video. It combines face embeddings for recognition, position tracking for consistency, and smoothing techniques to produce a seamless face-swapping experience. While there are several constants and variables in the function, these values are carefully tuned to achieve a good trade-off between accuracy and responsiveness.
Understanding these details provides insight into the complexities of the face-swapping process and the methods used to keep the desired face consistent throughout a video.