# Deep Dive: How Face Tracking Works (Two Faces) - Advanced
This page provides an in-depth explanation of how the `_process_face_tracking_both` function operates. It details the process of tracking two faces concurrently, handling the complexity of two moving targets, and avoiding common issues like flickering or swapped face assignments. We'll delve into the logic, algorithms, and nuances that make this function work.
## Core Function: Simultaneous Two-Face Tracking
The `_process_face_tracking_both` function aims to track two distinct faces throughout the video, allowing for a smooth face swap on both of them. This requires keeping track of the embeddings, positions, and unique IDs of the two faces, deciding how to handle new faces in each frame, and preventing the tracker from mixing up the two identities.
## Detailed Variable Breakdown
Let's break down the key variables and their roles in this process:
- `first_face_embedding` (numpy.ndarray, Optional): The face embedding of the first face being tracked. This is a high-dimensional vector that uniquely represents facial features. Initialized to `None`, assigned to the first matching face, and then updated each frame using a weighted average.
- `second_face_embedding` (numpy.ndarray, Optional): The face embedding of the second face being tracked. Initialized to `None`, assigned to the first matching face, and then updated each frame using a weighted average.
- `first_face_position` (Tuple[float, float], Optional): The (x, y) coordinates of the center of the first face being tracked. Used to calculate position consistency with previous frames. Initialized to `None`, assigned to the first matching face, and then updated each frame using a weighted average.
- `second_face_position` (Tuple[float, float], Optional): The (x, y) coordinates of the center of the second face being tracked. Used to calculate position consistency with previous frames. Initialized to `None`, assigned to the first matching face, and then updated each frame using a weighted average.
- `first_face_id` (int, Optional): Unique identification number for the first face in each frame. Initialized to `None` and assigned to the first matching face.
- `second_face_id` (int, Optional): Unique identification number for the second face in each frame. Initialized to `None` and assigned to the first matching face.
- `first_face_position_history` (deque): A double-ended queue storing the last 30 positions of the first face. Used to calculate an average position, which helps predict where the face should be if it is not detected in the current frame. Limited to a size of 30, so it acts as a short memory of where the face was.
- `second_face_position_history` (deque): A double-ended queue storing the last 30 positions of the second face. Used in the same way as the first face's history.
- `target_face` (Face): An object containing the information about one of the detected faces in the current frame, including properties like the bounding box, landmarks, and embedding. The algorithm uses it to decide whether this face is the first or second face that we were previously tracking.
- `source_face` (List[Face]): A list containing the `Face` objects for our source faces. These are the faces used to replace the tracked target faces.
- `source_index` (int): An integer determining which index of `source_face` to use for this `target_face`. Used to identify which of our two faces is going to be swapped.
- `source_face_order` (List[int]): A list containing `[0, 1]`, the order in which to use the source faces. If flipping faces is enabled, this list changes to `[1, 0]`.
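To make these roles concrete, here is a minimal Python sketch of how this per-face state could be declared. The names mirror the list above, but the type hints, grouping, and comments are illustrative assumptions rather than a copy of the project source:

```python
from collections import deque
from typing import Optional, Tuple

import numpy as np

# Illustrative declarations of the per-face tracking state described above.
# How the real module stores this state may differ; these hints are assumptions.

first_face_embedding: Optional[np.ndarray] = None    # updated each frame by weighted average
second_face_embedding: Optional[np.ndarray] = None

first_face_position: Optional[Tuple[float, float]] = None   # (x, y) center of the face
second_face_position: Optional[Tuple[float, float]] = None

first_face_id: Optional[int] = None     # per-frame detection id of the matched face
second_face_id: Optional[int] = None

# Short position memory: only the last 30 centers are kept for each face.
first_face_position_history: deque = deque(maxlen=30)
second_face_position_history: deque = deque(maxlen=30)
```

Because each deque is created with `maxlen=30`, appending a new position automatically evicts the oldest one, which is exactly the "short memory" behaviour described above.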
## Detailed Algorithmic Steps
The `_process_face_tracking_both` function handles two-face tracking through the following steps:
1. **Initialization of Variables:**
   - The function uses `globals()` to check whether the face position history queues already exist. If they do not, it creates them and registers them in `globals()`, so the position histories persist and are shared between function calls.
   - The function extracts the embedding (`target_embedding`), position (`target_position`), and unique id (`face_id`) of the `target_face`.
   - `use_pseudo_face` is a boolean flag indicating whether a pseudo (fake) face needs to be used.
2. **Data Structure for Tracked Faces:**
   - A dictionary, `tracked_faces`, is created to store the data for both tracked faces in one structure, which makes it easy to loop over each tracked face:

     ```python
     tracked_faces = {
         0: {
             "embedding": first_face_embedding,
             "position": first_face_position,
             "id": first_face_id,
             "history": first_face_position_history,
         },
         1: {
             "embedding": second_face_embedding,
             "position": second_face_position,
             "id": second_face_id,
             "history": second_face_position_history,
         },
     }
     ```
3. **Check If Both Faces Are Initialized:**
   - The code checks whether both `first_face_embedding` and `second_face_embedding` are not `None`, meaning we are already tracking two faces.
   - If both faces are already being tracked:
     - We initialize `best_match_score` and `best_match_index` to `-1`; these track the best face match found so far.
     - We loop over each tracked face in the `tracked_faces` dictionary. For each tracked face:
       - The embedding, position, and history of the current face are extracted.
       - Similarity: The cosine similarity between the tracked face's embedding and the `target_embedding` is calculated. The closer this value is to 1.0, the more similar the faces are.
       - Position Consistency: A score is derived from the inverse of the distance between the `target_position` and the average of the position history.
       - Total Match Score: A weighted combination of `similarity` and `position_consistency` is calculated using weights from `modules.globals.embedding_weight_size` and `modules.globals.position_size`: `score = (EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT`, where `TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT`.
       - Stickiness: If the unique `id` of the detected face matches the current tracked face's id, the score is multiplied by `(1 + STICKINESS_FACTOR)`. This helps avoid flickering between multiple faces.
       - If the calculated score is greater than `best_match_score`, we remember the index of the best match and its score.
     - If the best score is higher than `modules.globals.sticky_face_value`:
       - We fetch the data of the matched face from the `tracked_faces` dictionary using `best_match_index`.
       - We update the embedding of the tracked face with a weighted average of the current embedding and the new `target_embedding`, using weights from `modules.globals.old_embedding_weight` and `modules.globals.new_embedding_weight`.
       - We update the position of the tracked face with a weighted average of the current position and the new `target_position`, using a static weight of `0.8` for the old position and `0.2` for the new position.
       - The tracked face's id is also updated.
       - The new position is appended to the face's position history.
       - We also update the score in a global variable for the tracked face so it can be displayed in the UI.
       - The corresponding index from `source_face_order` is determined and stored in `source_index`.
     - Else, if the best score is below `modules.globals.pseudo_face_threshold` and `modules.globals.use_pseudo_face` is true:
       - We set `use_pseudo_face` to true.
       - If `best_match_index` is `0`, we create a pseudo face from the average of `first_face_position_history`, or from `first_face_position` if the history is empty.
       - If `best_match_index` is `1`, we create a pseudo face from the average of `second_face_position_history`, or from `second_face_position` if the history is empty.
       - If the best match was neither `0` nor `1`, we use the current `target_position`.
       - We call `_process_face_swap` with the pseudo face, `source_face`, and `source_index`.
     - Otherwise, if no good match was found or `modules.globals.use_pseudo_face` is false, we do nothing and return. (A sketch of this matching-and-update logic appears after the list of steps.)
4. **Initialization of One or Both Faces:**
   - If the check above fails (meaning we are not yet tracking both faces), at least one of the faces needs to be initialized.
   - The code extracts the correct `source_index` from `source_face_order`.
   - If the extracted `source_index` is `0`:
     - `target_embedding` is assigned to `first_face_embedding`.
     - `target_position` is assigned to `first_face_position`.
     - `face_id` is assigned to `first_face_id`.
     - `target_position` is also appended to `first_face_position_history`.
   - Else (the `source_index` is `1`):
     - `target_embedding` is assigned to `second_face_embedding`.
     - `target_position` is assigned to `second_face_position`.
     - `face_id` is assigned to `second_face_id`.
     - `target_position` is also appended to `second_face_position_history`.
5. **Pseudo Face Call:**
   - If `use_pseudo_face` was set to true in step 3, we call `_process_face_swap` with a pseudo face:
     - If the current `source_index` is `0`, the pseudo face is created from `first_face_position_history`, or from the current `first_face_position` if the history is empty.
     - Otherwise, the pseudo face is created from `second_face_position_history`, or from the current `second_face_position` if the history is empty.
   - We then return the result of `_process_face_swap` called with the pseudo face.
6. **Face Swap Call:**
   - Otherwise, if no pseudo face is used, we call `_process_face_swap` with the `target_face`, `source_face`, and `source_index`, and return.
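To ground step 3, here is a simplified Python sketch of the matching loop and the weighted-average update. It assumes the `tracked_faces` dictionary from step 2; the constants, the `average_position` helper, and the blend weights are placeholders standing in for the values in `modules.globals`, so treat this as an illustration of the technique rather than the project's actual implementation.

```python
import numpy as np

# Placeholder constants; in the project these come from modules.globals
# (embedding_weight_size, position_size, weight_distribution_size) and a
# stickiness constant.
EMBEDDING_WEIGHT = 0.6
POSITION_WEIGHT = 0.4
WEIGHT_DISTRIBUTION = 1.0
STICKINESS_FACTOR = 0.2


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the embeddings point in the same direction, i.e. very similar faces.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def average_position(history):
    # Mean of the recent (x, y) centers stored in the deque.
    xs, ys = zip(*history)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def best_tracked_match(tracked_faces, target_embedding, target_position, face_id):
    """Return (best_match_index, best_match_score) for a newly detected face."""
    best_match_score, best_match_index = -1.0, -1
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION + POSITION_WEIGHT

    for index, data in tracked_faces.items():
        if data["embedding"] is None:
            continue

        # Embedding similarity: how much the two faces look alike.
        similarity = cosine_similarity(data["embedding"], target_embedding)

        # Position consistency: inverse of the distance between the detected
        # center and the average of the recent position history.
        anchor = average_position(data["history"]) if data["history"] else data["position"]
        distance = float(np.hypot(target_position[0] - anchor[0],
                                  target_position[1] - anchor[1]))
        position_consistency = 1.0 / (1.0 + distance)

        # Weighted total score.
        score = (EMBEDDING_WEIGHT * similarity
                 + POSITION_WEIGHT * position_consistency) / total_weight

        # Stickiness: prefer the face that carried this id in the previous frame.
        if data["id"] is not None and data["id"] == face_id:
            score *= 1 + STICKINESS_FACTOR

        if score > best_match_score:
            best_match_score, best_match_index = score, index

    return best_match_index, best_match_score


def update_tracked_face(data, target_embedding, target_position, face_id,
                        old_weight=0.9, new_weight=0.1):
    # Blend old and new observations so the track changes smoothly. The real
    # embedding weights come from modules.globals.old_embedding_weight /
    # new_embedding_weight; 0.9/0.1 here are placeholders.
    data["embedding"] = old_weight * data["embedding"] + new_weight * target_embedding
    data["position"] = (0.8 * data["position"][0] + 0.2 * target_position[0],
                        0.8 * data["position"][1] + 0.2 * target_position[1])
    data["id"] = face_id
    data["history"].append(data["position"])
```

In the real flow, the update only happens when the best score clears `modules.globals.sticky_face_value`; scores below `modules.globals.pseudo_face_threshold` instead route into the pseudo-face path described in steps 3 and 5.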
## Key Concepts and Techniques

- **Simultaneous Tracking:** This function is optimized for tracking two faces at once without mixing them up.
- **Weighted Averaging:** Blending the old and new embeddings and positions produces smoother updates, reducing abrupt changes and flickering and improving the stability of the tracking.
- **History-Based Position Tracking:** Keeping a history of previous positions makes it less likely that the algorithm loses the face, even if it moves quickly or briefly disappears.
- **Stickiness Factor:** This helps maintain continuity: the tracker is more likely to keep following the same face between frames than to switch to a different one.
- **Pseudo Faces:** Pseudo faces can stand in when a face is occluded or turns away from the camera (a small sketch follows this list).
- **Modular Design:** The function builds upon other helpers such as `extract_face_embedding`, `get_face_center`, `_process_face_swap`, and `cosine_similarity`, which promotes code reuse and readability.
- **Global Module:** The values in `modules.globals` allow thresholds, weights, and distances to be configured on the fly from the UI without modifying the code.
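As a sketch of the history-based prediction behind pseudo faces, the helper below averages the stored positions (or falls back to the last known position) and wraps the result in a minimal stand-in object. The `PseudoFace` class, the helper name, and the fixed box size are assumptions for illustration; the project's own pseudo-face construction may differ.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PseudoFace:
    # Minimal stand-in for a detected Face: just a bounding box around the
    # predicted center. Field names here are illustrative.
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_pseudo_face(history: deque,
                     fallback: Optional[Tuple[float, float]],
                     box_size: float = 100.0) -> Optional[PseudoFace]:
    """Predict where the missing face should be and wrap it in a pseudo face."""
    if history:
        xs, ys = zip(*history)
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)   # average of recent centers
    elif fallback is not None:
        cx, cy = fallback                               # last known position
    else:
        return None                                     # nothing to go on

    half = box_size / 2
    return PseudoFace(bbox=(cx - half, cy - half, cx + half, cy + half))
```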
## Practical Considerations

- **Performance:** This function may require more resources than single-face tracking because of the extra computation for each tracked face.
- **Robustness:** While the algorithm is robust, it can still struggle with rapid movements, extreme angles, or significant occlusion of a face.
- **Settings:** The specific values of constants like `STICKINESS_FACTOR`, `embedding_weight_size`, `position_size`, and `pseudo_face_threshold` all affect the overall behavior of the tracker and can be changed on the fly from the UI (see the sketch below).
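For reference, the block below gathers the settings mentioned on this page as they might be tuned. The names come from the text above, but every value shown is a placeholder rather than the project's default.

```python
# Hypothetical tuning block mirroring the modules.globals settings named above.
# Values are placeholders, not the project's defaults.
embedding_weight_size = 0.6      # weight of embedding similarity in the match score
position_size = 0.4              # weight of position consistency in the match score
weight_distribution_size = 1.0   # scales the embedding weight in TOTAL_WEIGHT
old_embedding_weight = 0.9       # blend factor for the stored embedding
new_embedding_weight = 0.1       # blend factor for the new observation
sticky_face_value = 0.65         # minimum score required to accept a match
pseudo_face_threshold = 0.20     # below this score a pseudo face may be used
use_pseudo_face = True           # allow pseudo faces when tracking is lost
```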
## Conclusion
The `_process_face_tracking_both` function is a sophisticated algorithm for tracking two faces in video. It combines embeddings, positions, weighted averages, position history, and a stickiness factor to create stable, robust tracking. Understanding these intricacies provides insight into the challenges and techniques involved in achieving robust, seamless face swapping with multiple targets, and should give a comprehensive picture of this advanced face tracking component.