Two Faces: How Face Tracking Works - iVideoGameBoss/iRoopDeepFaceCam GitHub Wiki

Deep Dive: How Face Tracking Works (Two Faces) - Advanced

This page provides an in-depth explanation of how the _process_face_tracking_both function operates. It details the process for tracking two faces concurrently, handling the complexity of two moving targets and avoiding common issues like flickering or swapping of face assignments. We'll delve into the logic, algorithms, and nuances that make this function work.

Core Function: Simultaneous Two-Face Tracking

The _process_face_tracking_both function aims to track two distinct faces throughout the video, allowing for a smooth face swap between the two. This requires keeping track of the embeddings, positions, and unique ids for two faces, making decisions on how to handle new faces in each frame, and preventing the tracker from mixing up the identities.

Detailed Variable Breakdown

Let's break down the key variables and their roles in this process:

  • first_face_embedding (numpy.ndarray, Optional):
    • The face embedding of the first face being tracked. This is a high-dimensional vector that uniquely represents facial features.
    • Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
  • second_face_embedding (numpy.ndarray, Optional):
    • The face embedding of the second face being tracked.
    • Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
  • first_face_position (Tuple[float, float], Optional):
    • The (x, y) coordinates of the center of the first face being tracked.
    • Used to calculate position consistency with previous frames.
    • Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
  • second_face_position (Tuple[float, float], Optional):
    • The (x, y) coordinates of the center of the second face being tracked.
    • Used to calculate position consistency with previous frames.
    • Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
  • first_face_id (int, Optional):
    • Unique identification number for the first face on each frame.
    • Initialized to None, assigned to the first matching face.
  • second_face_id (int, Optional):
    • Unique identification number for the second face on each frame.
    • Initialized to None, assigned to the first matching face.
  • first_face_position_history (deque):
    • A double-ended queue storing the last 30 positions of the first face.
    • Used to calculate an average position, which helps predict where the face should be if it is not detected in the current frame.
    • Limited to a size of 30, so it acts as a short memory of where the face was.
  • second_face_position_history (deque):
    • A double-ended queue storing the last 30 positions of the second face.
    • Used to calculate an average position, which helps predict where the face should be if it is not detected in the current frame.
    • Limited to a size of 30, so it acts as a short memory of where the face was.
  • target_face (Face):
    • An object containing the information about each of the detected faces in the current frame.
    • This includes properties like the bounding box, landmarks, and embedding.
    • The algorithm uses it to determine whether this face matches the first or second face we were previously tracking.
  • source_face (List[Face]):
    • A list containing the Face object for our source face.
    • This is the face we use to replace the tracked target face.
  • source_index (int):
    • The index into source_face to use for this target_face.
    • Identifies which of our two source faces will be swapped in.
  • source_face_order (List[int]):
    • A list, normally [0, 1], giving the order in which the source faces are used. If face flipping is enabled, this list becomes [1, 0].
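To make the variable roles concrete, here is a minimal sketch of this tracking state. The names mirror the list above, but the actual module-level declarations in the repository may differ.

```python
from collections import deque
from typing import Optional, Tuple
import numpy as np

# Illustrative sketch of the per-face tracking state described above.
first_face_embedding: Optional[np.ndarray] = None
second_face_embedding: Optional[np.ndarray] = None
first_face_position: Optional[Tuple[float, float]] = None
second_face_position: Optional[Tuple[float, float]] = None
first_face_id: Optional[int] = None
second_face_id: Optional[int] = None

# Each history keeps only the last 30 positions, acting as a short memory
# of where the face was; older entries fall off the left end automatically.
first_face_position_history: deque = deque(maxlen=30)
second_face_position_history: deque = deque(maxlen=30)

def average_position(history: deque) -> Optional[Tuple[float, float]]:
    """Average of the stored (x, y) positions, or None if the history is empty."""
    if not history:
        return None
    xs, ys = zip(*history)
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

The `maxlen=30` argument is what gives the deque its "short memory" behavior: appending a 31st position silently discards the oldest one.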

Detailed Algorithmic Steps

The _process_face_tracking_both function handles two-face tracking through the following steps:

  1. Initialization of Variables:

    • We use globals() to check whether the face history queues exist; if they do not, we create them and register them in globals(). Because these variables persist between function calls, the face position histories remain available and shared from frame to frame.
    • The function extracts the embedding (target_embedding), position (target_position), and unique id (face_id) of the target_face.
    • use_pseudo_face is a boolean flag indicating whether a pseudo (stand-in) face is needed for this frame.
  2. Data Structure for Tracked Faces:

    • A dictionary, tracked_faces, is created to store the data for both tracked faces in one structure, so the code can loop over them uniformly.
      tracked_faces = {
         0: {
             "embedding": first_face_embedding,
             "position": first_face_position,
             "id": first_face_id,
             "history": first_face_position_history
          },
        1: {
            "embedding": second_face_embedding,
            "position": second_face_position,
            "id": second_face_id,
            "history": second_face_position_history
          },
      }
      
  3. Check If All Faces Are Initialized:

    • The code checks if both first_face_embedding and second_face_embedding are not None, implying that we're already tracking two faces.
    • If both faces are already being tracked:
      • best_match_score and best_match_index are initialized to -1; they track the best candidate match found so far.
      • We loop through each tracked face by looping through the tracked_faces dictionary.
      • For each tracked face, we do the following:
        • The embedding, position, and history of the current face are extracted.
        • Similarity: The cosine similarity between the tracked face's embedding and the target_embedding is calculated. The closer this value is to 1.0, the more similar the faces are.
        • Position Consistency: A score based on the inverse of the distance between the target_position and the average of the position history.
        • Total Match Score: A weighted score combining the similarity and position_consistency, using weights from modules.globals.embedding_weight_size and modules.globals.position_size:
          • TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT
          • score = ((EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT)
        • Stickiness: If the unique id of the detected face matches the current tracked face's id, we multiply the score by (1 + STICKINESS_FACTOR). This helps avoid flickering between multiple faces.
        • If the calculated score is greater than the best_match_score, we remember the index of the best match and its score.
      • If the best score is higher than modules.globals.sticky_face_value:
        • We get the data of the tracked face from the tracked_faces dictionary using the best_match_index.
        • We update the embedding of the tracked face with a weighted average of the current embedding and the new target_embedding, using weights from modules.globals.old_embedding_weight and modules.globals.new_embedding_weight.
        • We update the position of the tracked face with a weighted average of the current position and new target_position, using a static weight 0.8 for the old position and 0.2 for the new position.
        • The tracked face's id is also updated.
        • The new position is appended to the face's history.
        • We also update a global score variable for the tracked face, which is displayed in the UI.
        • The corresponding index from the source face order from source_face_order is determined and stored in source_index.
    • Else if the current best score is below modules.globals.pseudo_face_threshold and modules.globals.use_pseudo_face is true:
      • We set use_pseudo_face to true.
      • If the best_match_index is 0 then we create a pseudo face using the average of the first_face_position_history or if it is empty we use first_face_position.
      • Otherwise if the best_match_index is 1 then we create a pseudo face using the average of the second_face_position_history or if it is empty we use second_face_position.
      • If the best match was not 0 or 1 we use the current target_position.
      • We call _process_face_swap with the pseudo face, source_face and source_index.
    • Otherwise, if no good match was found or modules.globals.use_pseudo_face is false, we do nothing and return.
  4. Initialization of One or Both Faces:

    • If the above check fails (meaning not both faces are being tracked), then at least one of the faces needs to be initialized.
    • The code extracts the correct source_index from source_face_order.
    • If the extracted source_index is 0:
      • The target_embedding is assigned to first_face_embedding.
      • The target_position is assigned to first_face_position.
      • The face_id is assigned to first_face_id.
      • The target_position is also appended to the first_face_position_history.
    • Else (the source_index is 1):
      • The target_embedding is assigned to second_face_embedding.
      • The target_position is assigned to second_face_position.
      • The face_id is assigned to second_face_id.
      • The target_position is also appended to the second_face_position_history.
  5. Pseudo Face Call

    • If use_pseudo_face was set to true in step 3, we need to call _process_face_swap with a pseudo face:
      • If the current source_index is 0 then a pseudo face will be created using the first_face_position_history, or if that is empty the current first_face_position.
      • Otherwise a pseudo face will be created using the second_face_position_history, or if that is empty the current second_face_position.
      • We then return the result of _process_face_swap with the pseudo face.
  6. Face Swap Call

    • Otherwise, if no pseudo face is used we call _process_face_swap with the target_face, source_face, and the source_index and return.
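The scoring and update logic of step 3 can be sketched as follows. The constants stand in for the modules.globals settings named in the steps (their real defaults may differ, and they are tunable from the UI), and for brevity this sketch compares the target position against the smoothed tracked position rather than the full history average.

```python
import numpy as np

# Assumed stand-ins for the tunable modules.globals settings:
EMBEDDING_WEIGHT = 0.7        # embedding_weight_size
POSITION_WEIGHT = 0.3         # position_size
WEIGHT_DISTRIBUTION = 1.0     # weight_distribution_size
STICKINESS_FACTOR = 0.2       # stickiness bonus for a repeated face id
OLD_EMBEDDING_WEIGHT = 0.9    # old_embedding_weight
NEW_EMBEDDING_WEIGHT = 0.1    # new_embedding_weight

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 for identical directions, smaller for dissimilar embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def position_consistency(target_pos, tracked_pos) -> float:
    # Inverse-distance score: 1.0 when the detection sits exactly on the
    # tracked position, decaying toward 0 as it drifts away.
    distance = float(np.hypot(target_pos[0] - tracked_pos[0],
                              target_pos[1] - tracked_pos[1]))
    return 1.0 / (1.0 + distance)

def match_score(tracked, target_embedding, target_pos, target_id) -> float:
    """Weighted blend of embedding similarity and position consistency,
    boosted by the stickiness factor when the face id is unchanged."""
    similarity = cosine_similarity(tracked["embedding"], target_embedding)
    consistency = position_consistency(target_pos, tracked["position"])
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION + POSITION_WEIGHT
    score = (EMBEDDING_WEIGHT * similarity
             + POSITION_WEIGHT * consistency) / total_weight
    if tracked["id"] == target_id:
        score *= 1 + STICKINESS_FACTOR   # favour the face already being tracked
    return score

def update_tracked(tracked, target_embedding, target_pos, target_id):
    """Weighted-average update for the winning face: smooth the embedding,
    smooth the position with a static 0.8/0.2 blend, and adopt the new id."""
    tracked["embedding"] = (OLD_EMBEDDING_WEIGHT * tracked["embedding"]
                            + NEW_EMBEDDING_WEIGHT * target_embedding)
    ox, oy = tracked["position"]
    tracked["position"] = (0.8 * ox + 0.2 * target_pos[0],
                           0.8 * oy + 0.2 * target_pos[1])
    tracked["id"] = target_id
```

Note how the stickiness multiplier only changes the ranking between the two tracked candidates; an unchanged id makes the same detection score 20% higher, which is what suppresses frame-to-frame identity flicker.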

Key Concepts and Techniques

  • Simultaneous Tracking: This function is optimized for tracking two faces at once without getting them mixed up.
  • Weighted Averaging: Weighted averaging of the embeddings and positions ensures a smoother update, reducing abrupt changes and flickering and improving the stability of the tracking.
  • History-Based Position Tracking: Using a history of previous positions makes it less likely for the algorithm to lose track of the face, even if the face moves quickly or briefly disappears.
  • Stickiness Factor: This helps maintain continuity in face tracking. It is more likely to track the same face between frames than to switch to a different face.
  • Pseudo Faces: Pseudo faces can be used if the face is occluded or if the face turns away from the camera.
  • Modular Design: The function builds upon other modules such as extract_face_embedding, get_face_center, _process_face_swap, and cosine_similarity. This promotes code reuse and readability.
  • Global Module: The use of global values in modules.globals allows for configuration of many of these variables such as thresholds, weights and distances on the fly from the UI without modifying the code.
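The pseudo-face fallback can be illustrated with the following sketch. Here create_pseudo_face and the dict-based face record are hypothetical stand-ins for the real Face construction in the code; the point is the fallback order described in steps 3 and 5: use the history average when available, otherwise the last known position.

```python
from collections import deque
from typing import Optional, Tuple

def estimate_position(history: deque,
                      last_known: Optional[Tuple[float, float]]
                      ) -> Optional[Tuple[float, float]]:
    """Where we believe the face is: the average of the position history,
    falling back to the last known position when the history is empty."""
    if history:
        xs, ys = zip(*history)
        return (sum(xs) / len(xs), sum(ys) / len(ys))
    return last_known

def create_pseudo_face(position: Tuple[float, float]) -> dict:
    # Hypothetical stand-in: a minimal face record carrying only a centre
    # position, enough for a swap routine to place the replacement face.
    return {"center": position, "is_pseudo": True}
```

This is why occlusions do not immediately break the swap: even with no detection in the current frame, the 30-frame history still yields a plausible position to render against.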

Practical Considerations

  • Performance: This function may require more resources than single-face tracking because of the extra computations for each tracked face.
  • Robustness: While this algorithm is robust, it can still have some issues in cases of rapid movements, extreme angles, or significant occlusions of the face.
  • Settings: The specific values used for constants like STICKINESS_FACTOR, embedding_weight_size, position_size and the pseudo_face_threshold all have an impact on the overall performance of the tracker and can be changed on the fly from the UI.

Conclusion

The _process_face_tracking_both function is a sophisticated algorithm for tracking two faces in video. It combines embeddings, positions, weighted averages, position history, and a stickiness factor to create stable, robust tracking. Understanding these intricacies provides insight into the challenges and techniques involved in achieving seamless face swapping with multiple targets.