Single Face: How Face Tracking Works - iVideoGameBoss/iRoopDeepFaceCam GitHub Wiki
# Deep Dive: How Face Tracking Works (Single Face) - Advanced
This page provides an in-depth explanation of how the `_process_face_tracking_single` function works. We will delve into the algorithms and logic behind this process, including how it handles face embeddings, position tracking, and how it determines whether a face in the current frame is the same face it was tracking in previous frames.
## Core Goal: Continuous Face Swapping
The core purpose of `_process_face_tracking_single` is to ensure that, as each video frame is processed, the face swap is applied to the same face throughout the video. This requires accurately tracking the desired face and using that identified face for the swap. The process is not straightforward, since faces are constantly moving, rotating, and scaling, and can even disappear from the camera for a few frames.
## Detailed Breakdown of Variables and Their Roles
Let's review the variables in more detail:
- `first_face_embedding` (`numpy.ndarray`, optional): Holds the face embedding of the face we are tracking. A face embedding is a high-dimensional vector (a long list of numbers) that uniquely represents a face's features. The key idea is that similar faces have similar embeddings. We get this embedding from the face analyzer. It is initialized to `None` because when the tracker starts we don't yet have a face to track, and it is updated each frame by a weighted average.
- `first_face_position` (`Tuple[float, float]`, optional): A tuple containing the (x, y) coordinates of the face's center in the frame. It is used to measure how consistent the face's movement is with the previously tracked positions. It is also initialized to `None` and updated each frame by a weighted average.
- `first_face_id` (`int`, optional): A unique identification number for the face in each frame. If a face has the same identification number across two frames, our confidence that it is the same face increases. It is also initialized to `None` and updated each frame.
- `face_lost_count` (`int`): A counter tracking how many consecutive frames the tracking algorithm has failed to find the face it was tracking. If this count exceeds a certain threshold (not defined directly in this function; it is implicit in whether a face was found or not), the face is no longer considered for tracking. Initialized to `0` and incremented or reset each frame.
- `face_position_history` (`deque`): A double-ended queue storing the last 30 face positions. It is used to calculate an average position, which helps predict where the face should be if it was not seen in the current frame. Limited to a size of 30, it acts as a short memory of where the face has been.
- `target_face` (`Face`): An object containing information about a face detected in the current frame, including properties such as the bounding box, landmarks, and embedding. It is what the algorithm uses to decide whether this face is the one we were previously tracking.
- `source_face` (`List[Face]`): A list containing the `Face` object for our source face. This is the face we use to replace the tracked target face.
- `active_source_index` (`int`): An integer determining which index of `source_face` to use. Usually just `0`, meaning we only use one source face.
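To make the variable roles above concrete, here is a minimal sketch of the tracker state as a single container. The class name, method, and default values are hypothetical; the actual code in iRoopDeepFaceCam keeps these as separate module-level variables.

```python
# Hypothetical sketch of the per-face tracking state described above.
from collections import deque
from typing import Optional, Tuple

import numpy as np


class SingleFaceTrackerState:
    """Holds the state the single-face tracker carries between frames."""

    def __init__(self) -> None:
        self.first_face_embedding: Optional[np.ndarray] = None  # reference "fingerprint"
        self.first_face_position: Optional[Tuple[float, float]] = None  # face center
        self.first_face_id: Optional[int] = None  # id() of the last matched Face object
        self.face_lost_count: int = 0  # consecutive frames without a match
        # Short memory of recent centers; maxlen=30 mirrors the description above.
        self.face_position_history: deque = deque(maxlen=30)

    def average_position(self) -> Optional[Tuple[float, float]]:
        """Mean of the stored centers, used to predict where the face should be."""
        if not self.face_position_history:
            return self.first_face_position
        xs, ys = zip(*self.face_position_history)
        return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Because the deque has a fixed `maxlen`, appending a 31st position silently drops the oldest one, which is exactly the "short memory" behavior the wiki describes.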
## Algorithmic Walkthrough
The `_process_face_tracking_single` function can be broken down into distinct stages:
1. **Initialization Check:**
   - The function first checks whether `first_face_embedding` is `None`. If it is, this is the first frame in which we've encountered a face to track. In this case:
     - The `target_face`'s embedding is extracted with `extract_face_embedding` and assigned to `first_face_embedding`. This essentially becomes the "reference fingerprint" of the face we want to track.
     - The `target_face`'s center position is extracted with `get_face_center` and assigned to `first_face_position`.
     - The `target_face`'s unique id is obtained with `id(target_face)` and assigned to `first_face_id`.
     - `face_lost_count` is set to `0`, because we found a face.
     - `face_position_history` is cleared so that only this face is tracked, and `first_face_position` is appended to it.
     - The `_process_face_swap` function is called, and the current timestamp is saved to the `last_swap_time` global variable, marking the first frame's successful swap.
     - The function then returns, having processed the initial face.
2. **Face Detection and Best Match Finding:**
   - If `first_face_embedding` is not `None`, the algorithm is already tracking a face.
   - The function calls `_detect_faces(frame)` to get a new list of faces that may include the face we are tracking.
   - If no face is detected:
     - `face_lost_count` is incremented, as the face was not found in the current frame.
     - If the pseudo face is enabled via `modules.globals.use_pseudo_face` and our best score is below `modules.globals.pseudo_face_threshold`, a pseudo face is created and used for the swap.
     - The function then returns.
   - Otherwise, the algorithm loops through all detected faces and scores each one based on how well it matches the face we are tracking. For each detected face:
     - `target_embedding`: A new embedding of the detected `target_face` is obtained with `extract_face_embedding`.
     - `target_position`: The position of the detected `target_face` is obtained with `get_face_center`.
     - Embedding Similarity: The cosine similarity between `first_face_embedding` and the `target_embedding` is calculated with the `cosine_similarity` function. This score represents how similar the new face looks to the previously tracked face based on their "fingerprints"; the closer to 1.0, the more similar the faces are in appearance.
     - Position Consistency: This is derived from the inverse of the distance between the `target_position` and the average of the previous positions stored in `face_position_history`. It is not the raw distance: the closer the new face's position is to the average of the last 30 positions, the higher the score. If `face_position_history` is empty, the previous `first_face_position` is used instead.
     - Total Match Score: A weighted score combining the two metrics (`embedding_similarity` and `position_consistency`) is calculated from the weights in `modules.globals.embedding_weight_size` and `modules.globals.position_size` using the following formula:

       ```
       TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT
       match_score = (EMBEDDING_WEIGHT * embedding_similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT
       ```

     - Stickiness: If the unique `id` of the detected face is the same as the current `first_face_id`, the score is multiplied by `(1 + STICKINESS_FACTOR)`. This makes the currently tracked face more likely to be picked as the best match, which helps avoid sudden flickering or swapping between multiple faces.
   - The face with the highest total `match_score` is chosen as the `best_match_face`.
3. **Face Tracking Update:**
   - After going through all detected faces, we check whether a `best_match_face` was found with a score higher than `modules.globals.sticky_face_value`.
   - If such a face is found:
     - `face_lost_count` is reset to `0`.
     - The tracked face's embedding is updated toward the new `best_match_face` by taking a weighted average of the old and new embeddings, using `OLD_WEIGHT` from `modules.globals.old_embedding_weight` and `NEW_WEIGHT` from `modules.globals.new_embedding_weight`.
     - The tracked face's position is updated to the position of the `best_match_face`, obtained with `get_face_center`.
     - The unique `id` is also updated with `id(best_match_face)`.
     - The new position is added to the breadcrumbs (`face_position_history`).
     - The `_process_face_swap` function is called, which performs the actual face swap using the `best_match_face`.
   - If no good face is found:
     - `face_lost_count` is incremented, as the face was not found in the current frame.
     - If the pseudo face is enabled via `modules.globals.use_pseudo_face` and our best score is below `modules.globals.pseudo_face_threshold`, a pseudo face is created and used for the swap.
4. **Do Nothing:**
   - If no face was found and no pseudo face was created, nothing is done to the current frame.
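The scoring logic from stages 2 and 3 can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the weight constants and the inverse-distance formula for position consistency are assumptions, while the real values live in `modules.globals`.

```python
# Illustrative sketch of the match-scoring described in the walkthrough.
# All constant values below are assumed; the real ones come from modules.globals.
import numpy as np

EMBEDDING_WEIGHT = 0.7      # stands in for modules.globals.embedding_weight_size
POSITION_WEIGHT = 0.3       # stands in for modules.globals.position_size
WEIGHT_DISTRIBUTION = 1.0   # stands in for modules.globals.weight_distribution_size
STICKINESS_FACTOR = 0.2     # assumed bonus multiplier for an unchanged face id


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means identical direction (very similar faces)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def position_consistency(avg_pos, new_pos) -> float:
    """Inverse-distance score: closer to the historical average -> higher score."""
    dist = ((avg_pos[0] - new_pos[0]) ** 2 + (avg_pos[1] - new_pos[1]) ** 2) ** 0.5
    return 1.0 / (1.0 + dist)


def match_score(ref_emb, new_emb, avg_pos, new_pos, same_id: bool) -> float:
    """Weighted blend of appearance and position, with a stickiness bonus."""
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION + POSITION_WEIGHT
    score = (EMBEDDING_WEIGHT * cosine_similarity(ref_emb, new_emb)
             + POSITION_WEIGHT * position_consistency(avg_pos, new_pos)) / total_weight
    if same_id:  # "stickiness": favour the face we were already tracking
        score *= 1 + STICKINESS_FACTOR
    return score
```

A candidate that looks identical and sits exactly on the historical average scores 1.0; the same candidate with an unchanged id scores 1.2, which is why the tracker resists jumping to a new face of similar appearance.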
## Key Concepts Explained
- Face Embeddings: Face embeddings are a technique for capturing the essence of a face in a high-dimensional space. A mathematical algorithm (the face analyzer) translates a face into a long list of numbers. Similar faces will have similar numbers.
- Cosine Similarity: This is a way to measure how similar two vectors (like embeddings) are. A cosine similarity of 1.0 means the vectors point in exactly the same direction, while 0.0 means they are orthogonal, sharing no similarity at all.
- Position Consistency: By tracking the position of the face and only selecting a face that is near our previous location, we minimize the possibility of tracking the wrong face.
- Weighted Averages: By taking weighted averages we get a more consistent update to the face position and embedding that helps to avoid jitteriness and abrupt changes when the face moves.
- Stickiness: The "stickiness" logic is used to ensure that the tracker sticks to the same face between frames instead of quickly swapping to a new face when a new face is detected.
- Pseudo Face: When we can't track a face, we can create a pseudo face by using our breadcrumb history.
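The weighted-average idea above can be shown in a few lines. This is illustrative only: the `OLD_WEIGHT`/`NEW_WEIGHT` values are assumptions standing in for `modules.globals.old_embedding_weight` and `modules.globals.new_embedding_weight`, and the re-normalization step is an assumption added so cosine scores stay comparable.

```python
# Illustrative weighted-average update for the tracked embedding.
# OLD_WEIGHT / NEW_WEIGHT values are assumed, not taken from the project.
import numpy as np

OLD_WEIGHT = 0.9  # how much of the old embedding to keep
NEW_WEIGHT = 0.1  # how much of the newly matched embedding to mix in


def update_embedding(old: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Blend the stored embedding slightly toward the new match."""
    blended = OLD_WEIGHT * old + NEW_WEIGHT * new
    # Re-normalize (an assumption here) so cosine similarity stays well scaled.
    return blended / np.linalg.norm(blended)


old = np.array([1.0, 0.0])
new = np.array([0.0, 1.0])
updated = update_embedding(old, new)
# The result moves only slightly toward the new embedding, damping jitter.
```

Because the old embedding dominates the blend, a single noisy detection barely shifts the reference fingerprint, while a genuine gradual change (lighting, head angle) accumulates over many frames.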
## Practical Considerations
- `modules.globals`: This module contains all settings and global configuration; it is where all the weights, distances, and tunable variables are kept. This means you can change many settings on the fly from the UI, without changing the code.
- Thresholds: Thresholds such as the stickiness value and the pseudo face threshold greatly affect how the face tracker behaves. These values may need to be tweaked to get the desired effect.
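For orientation, here is a snapshot of the settings this page references, written as plain assignments. The names are the ones mentioned in the text; the values are purely illustrative guesses, not the project's defaults.

```python
# Illustrative values for the modules.globals settings referenced on this page.
# Only the names come from the wiki text; every value here is an assumption.
use_pseudo_face = True        # fall back to a pseudo face when tracking fails
pseudo_face_threshold = 0.20  # below this best score, synthesize a pseudo face
sticky_face_value = 0.65      # minimum match_score to accept a best match
embedding_weight_size = 0.70  # weight of embedding similarity in the score
position_size = 0.30          # weight of position consistency in the score
old_embedding_weight = 0.90   # smoothing: portion of the old embedding kept
new_embedding_weight = 0.10   # smoothing: portion of the new embedding mixed in
```

Raising `sticky_face_value` makes the tracker more conservative (it rejects weak matches and loses the face more often), while lowering it makes the tracker more willing to latch onto a borderline candidate.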
## Conclusion
The `_process_face_tracking_single` function is a powerful algorithm for tracking a single face in a video. It combines face embeddings for recognition, position tracking for consistency, and smoothing techniques to produce a seamless face-swapping experience. While there are several constants and variables in the function, these values are carefully tuned to achieve a good trade-off between accuracy and responsiveness.
Understanding these details provides insight into the complexities of the face-swapping process and the methods used to keep the desired face consistent throughout a video.