Many Faces: How Face Tracking Works - iVideoGameBoss/iRoopDeepFaceCam GitHub Wiki

Deep Dive: How Face Tracking Works (Many Faces) - Advanced

This page provides a comprehensive and in-depth analysis of the _process_face_tracking_many function. It details the methods used for tracking multiple faces, up to 10 in this implementation, focusing on dynamic tracking, avoiding face-id mixing, and maintaining a smooth and consistent face-swapping experience. This function is the most complex, as it deals with a dynamic number of tracked objects.

Core Function: Dynamic Multi-Face Tracking

The primary objective of _process_face_tracking_many is to track up to 10 distinct faces in a video, allowing for face swapping on each of them with different source faces. This involves tracking a dynamic number of faces, assigning a source face to each target face correctly, and handling situations when faces enter, exit, or are temporarily lost in the video frame.

Detailed Variable Breakdown

Let's detail the crucial variables within this function:

tracked_faces_many (dict):
- A dictionary that stores the data of each tracked face. The dictionary keys are integers, and the values are dictionaries containing the following:
- embedding (numpy.ndarray): The face embedding of the tracked face.
- position (Tuple[float, float]): The (x, y) coordinates of the center of the tracked face.
- id (int): The unique identification number for the face.
- The keys are the track ids for each face.
- This dictionary is created using the globals() function if it does not exist.
position_histories_many (dict):
- A dictionary that stores each face's position history. The keys are the track ids, and each value is a deque holding the last 30 positions for that face.
- Used to calculate an average position, useful for predicting where the face should be if we did not see the face on a current frame.
- Limited to a size of 30, so it acts as a short memory of where the face was.
- This dictionary is created using the globals() function if it does not exist.
target_embedding (numpy.ndarray):
- The face embedding of the current target_face.
target_position (Tuple[float, float]):
- The (x, y) coordinates of the center of the current target_face.
face_id (int):
- Unique identification number of the current target_face.
use_pseudo_face (bool):
- A variable to check if we should create a pseudo face or not.
target_face (Face):
- An object containing the information about each of the detected faces in the current frame.
- This includes properties like the bounding box, landmarks, and embedding.
- It's what the algorithm uses to find if this face is one of the tracked faces.
source_face (List[Face]):
- A list containing the Face object for our source face.
- This is the face we use to replace the tracked target face.
source_index (int):
- An integer determining which index from the source_face to use for this target_face.
- Used for identifying which of our source faces are going to be swapped.
source_face_order (List[int]):
- A list containing [0, 1] which is the order of which source faces to use. If flipping faces is enabled this list will change to [1, 0]. This parameter is actually not used in _process_face_tracking_many and is left for compatibility.
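
As a rough illustration of the two dictionaries described above, here is a minimal sketch. The toy embedding, the sample coordinates, and the name MAX_HISTORY are assumptions made for the example; only the key names (embedding, position, id) and the history length of 30 come from this page.

```python
from collections import deque

import numpy as np

MAX_HISTORY = 30  # hypothetical name for the history length of 30 described above

# Track id -> data for that tracked face.
tracked_faces_many = {
    0: {
        "embedding": np.array([0.1, 0.9, 0.2]),  # toy 3-d vector; real embeddings are longer
        "position": (320.0, 240.0),              # (x, y) center of the face
        "id": 7,                                 # id reported for the detected face
    },
}

# Track id -> bounded history of recent center positions.
position_histories_many = {0: deque(maxlen=MAX_HISTORY)}

# Appending beyond maxlen silently drops the oldest entry.
for step in range(40):
    position_histories_many[0].append((300.0 + step, 240.0))

history = position_histories_many[0]
# Average of the remembered positions, usable as a predicted location.
average_position = tuple(float(v) for v in np.mean(np.array(list(history)), axis=0))
```

Because the deque is created with maxlen, the history never needs manual trimming: only the 30 most recent positions remain.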

Algorithmic Breakdown: Dynamic Multi-Face Tracking

The _process_face_tracking_many function operates through the following detailed process:

Initialization of Global Dictionaries:
- We use globals() to check whether the face tracking dictionaries already exist. If they do not, we create them and add them to globals(), so the same dictionaries persist and are shared between function calls.
Extraction of Target Face Data:
- The function extracts the embedding (target_embedding), position (target_position), and unique id (face_id) of the input target_face.
- use_pseudo_face is a variable that we set to False initially.
Iterating Through Tracked Faces and Finding the Best Match
- We initialize best_match_score to -1 and best_match_key to None.
- The code then iterates through each tracked face in tracked_faces_many to determine a best match by looping through the track ids and their data in the tracked_faces_many dictionary using the items() function.
  - For each tracked face, we get:
    - The track_embedding, track_position, and track_history (from position_histories_many)
    - Similarity Score: The cosine similarity between track_embedding and target_embedding is calculated. This indicates how similar the current target_face looks compared to the tracked face, based on their "fingerprints." The closer this value is to 1.0 the more similar it is.
    - Position Consistency Score: We calculate a position consistency score. This score is the inverse of the distance between target_position and the average of track_history (or track_position if track_history is empty).
    - Total Match Score: A weighted score is calculated, combining the "fingerprint score" and "position consistency score" using the following formula, with global variables from modules.globals:
      - TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT
      - score = ((EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT)
    - Stickiness: If the face_id is the same as the tracked id of the face we increase the score further by using the following code:
      - if track_data.get("id") == face_id: score *= (1 + STICKINESS_FACTOR)
    - If the calculated score is higher than best_match_score, we update best_match_score and best_match_key.
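
The matching loop above can be sketched as follows. This is a sketch under stated assumptions, not the project's code: the constant values are placeholders for settings that this page says live in modules.globals, and the 1 / (1 + distance) form of the position score is one assumed reading of "inverse of the distance".

```python
import math

import numpy as np

# Placeholder values; the page says the real ones come from modules.globals.
EMBEDDING_WEIGHT = 0.6
POSITION_WEIGHT = 0.4
WEIGHT_DISTRIBUTION_SIZE = 1.0
STICKINESS_FACTOR = 0.2


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def position_consistency(reference, target):
    # Assumed form of "inverse of the distance": zero distance gives 1.0.
    return 1.0 / (1.0 + math.dist(reference, target))


def match_score(track_data, track_history, target_embedding, target_position, face_id):
    similarity = cosine_similarity(track_data["embedding"], target_embedding)
    if track_history:
        reference = tuple(np.mean(np.array(list(track_history)), axis=0))
    else:
        reference = track_data["position"]
    consistency = position_consistency(reference, target_position)
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION_SIZE + POSITION_WEIGHT
    score = (EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * consistency) / total_weight
    if track_data.get("id") == face_id:
        score *= 1 + STICKINESS_FACTOR  # stickiness bonus for the same face id
    return score


def find_best_match(tracked_faces, histories, target_embedding, target_position, face_id):
    best_match_score, best_match_key = -1, None
    for key, track_data in tracked_faces.items():
        score = match_score(
            track_data, histories.get(key), target_embedding, target_position, face_id
        )
        if score > best_match_score:
            best_match_score, best_match_key = score, key
    return best_match_key, best_match_score
```

With these placeholder weights, a track whose embedding, position, and id all match the target scores 1.0 before the stickiness bonus is applied, while a track with an orthogonal embedding far away scores close to 0.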
Updating Tracked Face Based on Best Match
- After looping through each tracked face, we check whether a best_match_key was found and whether its score is higher than the value in modules.globals.sticky_face_value.
- If we find a best match:
  - We get the tracked face data from tracked_faces_many using the best_match_key.
  - We get the track_history from the position_histories_many dictionary using the same best_match_key.
  - The embedding of this face is updated using a weighted average of the track_embedding and the new target_embedding using the variables modules.globals.old_embedding_weight and modules.globals.new_embedding_weight.
  - The position of this face is updated using a weighted average of the track_position and the new target_position by using a static weight of 0.8 for the old position and 0.2 for the new position.
  - The unique id of the face is also updated.
  - The new target_position is appended to the track_history.
  - The score is also stored in a global variable to update the UI. If the best_match_key is below 10, we use the following: setattr(modules.globals, f"target_face{best_match_key + 1}_score", best_match_score)
- The source_index is determined using the following code:
  - source_index = best_match_key % len(source_face)
- _process_face_swap is called and its result is returned.
- Otherwise, if no good match was found:
  - We decide whether to create a pseudo face by checking modules.globals.use_pseudo_face and whether our best score is lower than modules.globals.pseudo_face_threshold.
  - If we should create a pseudo face:
    - We get the track_history from position_histories_many using the best_match_key. If it is empty, we fall back to the position of the tracked face, and if that is None we use the current target_position.
    - A pseudo face is created with this position by calling create_pseudo_face().
    - The source_index is set using source_index = 0 if not source_face else 0 % len(source_face), which always evaluates to 0.
    - _process_face_swap is called using the pseudo_face, source_face and source_index.
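
The two branches of this step can be sketched as below. The embedding weights are placeholder values for modules.globals.old_embedding_weight and modules.globals.new_embedding_weight, and both function names are hypothetical. Using the average of the history as the pseudo face position is an assumption; the page only states the fallback order. The call to create_pseudo_face() is left out because its signature is not given here.

```python
from collections import deque

import numpy as np

# Placeholder values for modules.globals.old_embedding_weight / new_embedding_weight.
OLD_EMBEDDING_WEIGHT = 0.9
NEW_EMBEDDING_WEIGHT = 0.1


def update_track(track_data, track_history, target_embedding, target_position, face_id):
    """Blend the new observation into an existing track (mutates its arguments)."""
    track_data["embedding"] = (
        OLD_EMBEDDING_WEIGHT * track_data["embedding"]
        + NEW_EMBEDDING_WEIGHT * target_embedding
    )
    old_x, old_y = track_data["position"]
    new_x, new_y = target_position
    # Fixed 0.8 / 0.2 split between old and new position, as described above.
    track_data["position"] = (0.8 * old_x + 0.2 * new_x, 0.8 * old_y + 0.2 * new_y)
    track_data["id"] = face_id
    track_history.append(target_position)


def pseudo_face_position(track_data, track_history, target_position):
    """Fallback order: history (averaged, by assumption), track position, target position."""
    if track_history:
        return tuple(float(v) for v in np.mean(np.array(list(track_history)), axis=0))
    if track_data is not None and track_data.get("position") is not None:
        return track_data["position"]
    return target_position
```

The heavier weight on the old values means a single noisy detection moves the stored embedding and position only slightly, which is what keeps the track stable between frames.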
Handling New Faces:
- If no best_match_key was found, or a pseudo face should not be created, the target face is treated as a potential new face to track.
  - The code checks whether tracked_faces_many holds fewer than 10 entries, meaning we can track more faces.
  - If we can track more faces:
    - A new track id (new_key) is created by counting the number of current keys in the tracked_faces_many dictionary.
    - A new dictionary is created and added to tracked_faces_many using the new_key with the embedding, position and id from our target_face.
    - A new deque is created and stored in position_histories_many using the same new_key.
    - The new position is added to the position history.
    - A source_index is assigned using the following code:
      - source_index = new_key % len(source_face)
    - The score is also initialized in the global variable, for display purposes in the UI. If the new_key is less than 10: setattr(modules.globals, f"target_face{new_key + 1}_score", 0.00)
    - The _process_face_swap is called with the target_face, source_face and the source_index and the result is returned.
  - If the maximum number of tracked faces has been reached:
    - We do not add any new faces and we simply return the current frame.
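
The new-face handling above could be sketched like this. The function name register_new_face, the constant MAX_TRACKED_FACES, and the ui_state object (a SimpleNamespace standing in for modules.globals) are all assumptions for the example. The sketch also assumes at least one source face, since the modulo would fail on an empty list.

```python
from collections import deque
from types import SimpleNamespace

MAX_TRACKED_FACES = 10  # hypothetical name for the hard-coded cap described above


def register_new_face(tracked_faces, histories, ui_state, embedding, position, face_id, num_sources):
    """Add a new track if there is room; return (new_key, source_index) or None."""
    if len(tracked_faces) >= MAX_TRACKED_FACES:
        return None  # the caller would return the frame unchanged
    new_key = len(tracked_faces)  # next track id = current number of tracks
    tracked_faces[new_key] = {"embedding": embedding, "position": position, "id": face_id}
    histories[new_key] = deque([position], maxlen=30)
    if new_key < 10:
        # Initialize the score shown in the UI for this track.
        setattr(ui_state, f"target_face{new_key + 1}_score", 0.00)
    return new_key, new_key % num_sources


# Usage: 12 detections with 3 source faces; only the first 10 become tracks.
tracked, histories, ui_state = {}, {}, SimpleNamespace()
results = [
    register_new_face(tracked, histories, ui_state, None, (float(i), 0.0), i, 3)
    for i in range(12)
]
```

The modulo cycles through the available source faces, so with three sources the tracks 0, 1, 2, 3 map to sources 0, 1, 2, 0, and the eleventh and twelfth detections are ignored.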

Key Concepts and Techniques

Dynamic Tracking: The function is designed to dynamically track an arbitrary number of faces (up to 10) as they enter and exit the frame.
Modular Design: The function builds upon previously discussed functions like extract_face_embedding, get_face_center, _process_face_swap, and cosine_similarity.
Weighted Averaging: Weighted averaging of the embeddings and positions ensures a smoother update, reducing abrupt changes and flickering and improving the stability of the tracking.
Position History Tracking: By using a short history of previous positions, we can predict the next likely location of the face, making the tracker more robust when a face is briefly missed.
Stickiness Factor: This helps maintain continuity in face tracking. It is more likely to track the same face between frames than to switch to a different face.
Global Dictionaries: By using global dictionaries with keys as track ids, we can uniquely identify, access and modify tracked faces more efficiently.
Pseudo Faces: We can use the face tracking history to make fake faces to make our tracking more robust.
Global Module: The use of global values in modules.globals allows for configuration of many of these variables such as thresholds, weights and distances on the fly from the UI without modifying the code.

Practical Considerations

Performance: Tracking multiple faces simultaneously can be computationally intensive, and may impact the overall performance of the application depending on device limitations.
Robustness: Although this function is robust, it can have problems with rapid movements, overlapping faces, or faces that disappear for extended periods of time.
Max Tracked Faces: The max of 10 tracked faces is hard coded in this function and can only be increased by changing the code.
Settings: The specific values used for constants like STICKINESS_FACTOR, embedding_weight_size, position_size and the pseudo_face_threshold all have an impact on the overall performance of the tracker and can be changed on the fly from the UI.
Memory Management: Since we are storing data in dictionaries, care should be taken to manage the memory usage so we don't have issues with performance as more faces are tracked.

Conclusion

The _process_face_tracking_many function represents an advanced approach to face tracking, enabling multi-face processing through careful management of face embeddings, positions, history, and ID assignment. This in-depth exploration should provide a thorough understanding of this core component and the complexity involved in achieving robust multi-face swapping in real-time video. The ability to dynamically track up to 10 faces with good stability and accuracy is a significant achievement.