Many Faces: How Face Tracking Works - iVideoGameBoss/iRoopDeepFaceCam GitHub Wiki
Deep Dive: How Face Tracking Works (Many Faces) - Advanced
This page provides an in-depth analysis of the `_process_face_tracking_many` function. It details the methods used to track multiple faces (up to 10 in this implementation), focusing on dynamic tracking, avoiding face-id mixing, and maintaining a smooth and consistent face-swapping experience. This function is the most complex, as it deals with a dynamic number of tracked objects.
Core Function: Dynamic Multi-Face Tracking
The primary objective of `_process_face_tracking_many` is to track up to 10 distinct faces in a video, so that each of them can be swapped with a different source face. This involves tracking a dynamic number of faces, assigning the correct source face to each target face, and handling faces that enter, exit, or are temporarily lost in the video frame.
Detailed Variable Breakdown
Let's detail the crucial variables within this function:
- `tracked_faces_many` (dict): A dictionary that stores the data of each tracked face. The keys are integers (the track ids for each face), and the values are dictionaries containing the following:
  - `embedding` (numpy.ndarray): The face embedding of the tracked face.
  - `position` (Tuple[float, float]): The (x, y) coordinates of the center of the tracked face.
  - `id` (int): The unique identification number for the face.

  This dictionary is created using the `globals()` function if it does not exist.
- `position_histories_many` (dict): A dictionary that stores each face's position history. The keys are the track ids for each face, and each value is a `deque` storing the last 30 positions of that particular face.
  - Used to calculate an average position, which helps predict where the face should be when it is not seen in the current frame.
  - Limited to a size of 30, so it acts as a short memory of where the face was.
  - This dictionary is created using the `globals()` function if it does not exist.
- `target_embedding` (numpy.ndarray): The face embedding of the current `target_face`.
- `target_position` (Tuple[float, float]): The (x, y) coordinates of the center of the current `target_face`.
- `face_id` (int): Unique identification number of the current `target_face`.
- `use_pseudo_face` (bool): A flag that records whether we should create a pseudo face or not.
- `target_face` (Face): An object containing the information about a face detected in the current frame.
  - This includes properties like the bounding box, landmarks, and embedding.
  - It is what the algorithm compares against the tracked faces to decide whether this face is one of them.
- `source_face` (List[Face]): A list containing the `Face` objects for our source faces. These are the faces we use to replace the tracked target faces.
- `source_index` (int): An integer determining which index of `source_face` to use for this `target_face`. It identifies which of our source faces is going to be swapped in.
- `source_face_order` (List[int]): A list containing `[0, 1]`, which is the order in which source faces are used. If flipping faces is enabled, this list changes to `[1, 0]`. This parameter is not actually used in `_process_face_tracking_many` and is left for compatibility.
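To make the shape of the two dictionaries concrete, here is a minimal sketch. It is not taken from the project's code: the `TrackedFace` type, the `HISTORY_LENGTH` constant, and the sample values are illustrative assumptions, and a plain list stands in for the `numpy.ndarray` embedding.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple, TypedDict

Position = Tuple[float, float]


class TrackedFace(TypedDict):
    """Hypothetical type describing one value of tracked_faces_many."""
    embedding: List[float]  # numpy.ndarray in the wiki; a list keeps the sketch stdlib-only
    position: Position
    id: int


HISTORY_LENGTH = 30  # the 30-position limit described above

tracked_faces_many: Dict[int, TrackedFace] = {}
position_histories_many: Dict[int, Deque[Position]] = {}

# Register one face under track id 0 (sample values are made up).
tracked_faces_many[0] = {"embedding": [0.1, 0.9], "position": (100.0, 50.0), "id": 7}
position_histories_many[0] = deque(maxlen=HISTORY_LENGTH)

# Append 40 positions; a bounded deque silently drops the 10 oldest ones.
for step in range(40):
    position_histories_many[0].append((100.0 + step, 50.0))

history = position_histories_many[0]
avg_x = sum(x for x, _ in history) / len(history)
avg_y = sum(y for _, y in history) / len(history)
```

Because the `deque` is created with `maxlen`, no explicit trimming is needed: the history always holds at most the 30 most recent positions, and the average is computed over whatever it currently contains.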
Algorithmic Breakdown: Dynamic Multi-Face Tracking
The `_process_face_tracking_many` function operates through the following detailed process:
- Initialization of Global Dictionaries:
  - We use `globals()` to check whether the face tracking dictionaries exist. If they do not, we create them and add them to `globals()`, so that they persist and are shared between function calls.
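The lazy-initialization pattern described in this step can be sketched as follows. The helper names `ensure_tracking_state` and `remember_position` are hypothetical and exist only for this illustration; the wiki does not show how the real function organizes this check.

```python
from collections import deque


def ensure_tracking_state() -> None:
    """Create the module-level tracking dictionaries the first time they are needed."""
    module_vars = globals()
    if "tracked_faces_many" not in module_vars:
        module_vars["tracked_faces_many"] = {}
    if "position_histories_many" not in module_vars:
        module_vars["position_histories_many"] = {}


def remember_position(track_id, position):
    """Append a position to a track's history and return the history length."""
    ensure_tracking_state()
    histories = globals()["position_histories_many"]
    history = histories.setdefault(track_id, deque(maxlen=30))
    history.append(position)
    return len(history)


# Two separate calls share the same dictionaries, so the history grows.
first = remember_position(0, (10.0, 20.0))
second = remember_position(0, (12.0, 21.0))
stored = list(globals()["position_histories_many"][0])
```

The second call sees the entry written by the first one, which is the persistence between calls that the step relies on.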
- Extraction of Target Face Data:
  - The function extracts the embedding (`target_embedding`), position (`target_position`), and unique id (`face_id`) of the input `target_face`.
  - `use_pseudo_face` is a variable that we set to `False` initially.
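As a rough illustration of this extraction step, the sketch below uses a hypothetical `FakeFace` class in place of the real `Face` object. The `(x1, y1, x2, y2)` bounding-box layout, the body of `get_face_center`, and the use of Python's `id()` for `face_id` are all assumptions; the wiki does not state how these values are derived.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeFace:
    """Hypothetical stand-in for the detector's Face object."""
    bbox: Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)
    embedding: List[float]


def get_face_center(face: FakeFace) -> Tuple[float, float]:
    """Center of the bounding box (an assumed definition of the face center)."""
    x1, y1, x2, y2 = face.bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


target_face = FakeFace(bbox=(100.0, 40.0, 200.0, 160.0), embedding=[0.2, 0.4, 0.4])

target_embedding = target_face.embedding
target_position = get_face_center(target_face)
face_id = id(target_face)  # assumption: any unique integer would serve here
use_pseudo_face = False    # set to False initially, as described above
```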
- Iterating Through Tracked Faces and Finding the Best Match:
  - We initialize `best_match_score` to `-1` and `best_match_key` to `None`.
  - The code then iterates over the track ids and their data in `tracked_faces_many` using the `items()` function to determine the best match. For each tracked face:
    - We get the `track_embedding`, `track_position`, and `track_history` (from `position_histories_many`).
    - Similarity Score: The cosine similarity between `track_embedding` and `target_embedding` is calculated. This indicates how similar the current `target_face` looks compared to the tracked face, based on their "fingerprints." The closer this value is to 1.0, the more similar the faces are.
    - Position Consistency Score: This score is the inverse of the distance between `target_position` and the average of `track_history`, or `track_position` if `track_history` is empty.
    - Total Match Score: A weighted score is calculated, combining the "fingerprint score" and the "position consistency score" with the following formula, which uses global variables from `modules.globals`:
      - `TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT`
      - `score = ((EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT)`
    - Stickiness: If the `face_id` is the same as the tracked `id` of the face, we increase the score further by using the following code: `if track_data.get("id") == face_id: score *= (1 + STICKINESS_FACTOR)`
    - If the calculated `score` is higher than `best_match_score`, we update `best_match_score` and `best_match_key`.
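The matching loop described in this step can be sketched in plain Python as follows. The weight values are illustrative placeholders rather than the project's defaults, lists stand in for numpy arrays, and the `1 / (1 + distance)` form of the position score is an assumption, since the text only says "inverse of the distance".

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Position = Tuple[float, float]

# Illustrative values only; the wiki says the real ones come from modules.globals.
EMBEDDING_WEIGHT = 0.75
POSITION_WEIGHT = 0.25
WEIGHT_DISTRIBUTION_SIZE = 1.0  # stands in for modules.globals.weight_distribution_size
STICKINESS_FACTOR = 0.2


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def position_consistency(target: Position, track_position: Position,
                         track_history: Sequence[Position]) -> float:
    """Assumed form: 1 / (1 + distance to the average history position)."""
    if track_history:
        reference = (sum(p[0] for p in track_history) / len(track_history),
                     sum(p[1] for p in track_history) / len(track_history))
    else:
        reference = track_position  # fall back when the history is empty
    return 1.0 / (1.0 + math.dist(target, reference))


def find_best_match(tracked: Dict[int, dict],
                    histories: Dict[int, Sequence[Position]],
                    target_embedding: Sequence[float],
                    target_position: Position,
                    face_id: int) -> Tuple[Optional[int], float]:
    best_match_score, best_match_key = -1.0, None
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION_SIZE + POSITION_WEIGHT
    for key, track_data in tracked.items():
        similarity = cosine_similarity(track_data["embedding"], target_embedding)
        consistency = position_consistency(
            target_position, track_data["position"], histories.get(key, ()))
        score = (EMBEDDING_WEIGHT * similarity
                 + POSITION_WEIGHT * consistency) / total_weight
        if track_data.get("id") == face_id:
            score *= (1 + STICKINESS_FACTOR)  # favor the track seen with this id
        if score > best_match_score:
            best_match_score, best_match_key = score, key
    return best_match_key, best_match_score


tracked = {
    0: {"embedding": [1.0, 0.0], "position": (0.0, 0.0), "id": 1},
    1: {"embedding": [0.0, 1.0], "position": (100.0, 100.0), "id": 2},
}
histories = {0: [], 1: []}
best_key, best_score = find_best_match(
    tracked, histories, [1.0, 0.0], (0.0, 0.0), face_id=1)
```

In this example track `0` matches on embedding, position, and id, so its score exceeds 1.0 once the stickiness bonus is applied, while track `1` scores close to zero.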
- Updating Tracked Face Based on Best Match:
  - After looping through each tracked face, we check whether a `best_match_key` was found and whether `best_match_score` is higher than the value in the variable `modules.globals.sticky_face_value`.
  - If we find a best match:
    - We get the tracked face data from `tracked_faces_many` using the `best_match_key`.
    - We get the `track_history` from the `position_histories_many` dictionary using the same `best_match_key`.
    - The embedding of this face is updated using a weighted average of the `track_embedding` and the new `target_embedding`, using the variables `modules.globals.old_embedding_weight` and `modules.globals.new_embedding_weight`.
    - The position of this face is updated using a weighted average of the `track_position` and the new `target_position`, with a static weight of 0.8 for the old position and 0.2 for the new position.
    - The unique `id` of the face is also updated.
    - The new `target_position` is appended to the `track_history`.
    - The score is also stored in a global variable to update the UI. If the `best_match_key` is below `10`, we use the following: `setattr(modules.globals, f"target_face{best_match_key + 1}_score", best_match_score)`
    - The `source_index` is determined using the following code: `source_index = best_match_key % len(source_face)`
    - `_process_face_swap` is called and we return the result.
  - Else, if no good match was found:
    - We check whether we should create a pseudo face by checking `modules.globals.use_pseudo_face` and whether our best score is lower than `modules.globals.pseudo_face_threshold`.
    - If we should create a pseudo face:
      - We get the `track_history` from `position_histories_many` using the `best_match_key`; if that is empty we use the `position` of the tracked face, and if that is `None` we use the current `target_position`.
      - A pseudo face is created at this position by calling `create_pseudo_face()`.
      - The `source_index` is set using `source_index = 0 if not source_face else 0 % len(source_face)`, which always evaluates to `0`.
      - `_process_face_swap` is called using the `pseudo_face`, `source_face` and `source_index`.
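The update and fallback logic of this step could look roughly like the sketch below. The embedding weights are placeholder values standing in for the `modules.globals` settings, lists stand in for numpy arrays, and averaging the history inside `fallback_position` is an assumption; the text only gives the order in which the three sources are tried.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Position = Tuple[float, float]

OLD_EMBEDDING_WEIGHT = 0.9  # stands in for modules.globals.old_embedding_weight
NEW_EMBEDDING_WEIGHT = 0.1  # stands in for modules.globals.new_embedding_weight
OLD_POSITION_WEIGHT = 0.8   # static weights given in the text
NEW_POSITION_WEIGHT = 0.2


def blend(old: Sequence[float], new: Sequence[float],
          old_weight: float, new_weight: float) -> List[float]:
    """Element-wise weighted average of two equally sized vectors."""
    return [old_weight * o + new_weight * n for o, n in zip(old, new)]


def update_track(track: dict, history: Deque[Position],
                 target_embedding: Sequence[float],
                 target_position: Position, face_id: int) -> None:
    """Fold the new observation into the matched track."""
    track["embedding"] = blend(track["embedding"], target_embedding,
                               OLD_EMBEDDING_WEIGHT, NEW_EMBEDDING_WEIGHT)
    x, y = blend(track["position"], target_position,
                 OLD_POSITION_WEIGHT, NEW_POSITION_WEIGHT)
    track["position"] = (x, y)
    track["id"] = face_id
    history.append(target_position)


def fallback_position(history: Sequence[Position], track: Optional[dict],
                      target_position: Position) -> Position:
    """Pseudo-face position: history first, then stored position, then target."""
    if history:
        return (sum(p[0] for p in history) / len(history),
                sum(p[1] for p in history) / len(history))
    if track is not None and track.get("position") is not None:
        return track["position"]
    return target_position


track = {"embedding": [1.0, 0.0], "position": (100.0, 100.0), "id": 1}
history: Deque[Position] = deque(maxlen=30)
update_track(track, history, [0.0, 1.0], (110.0, 90.0), face_id=5)

from_history = fallback_position(history, track, (0.0, 0.0))
from_track = fallback_position(deque(), track, (0.0, 0.0))
from_target = fallback_position(deque(), None, (5.0, 6.0))
```

With a 0.8/0.2 split the stored position moves only a fifth of the way toward each new observation, which is what damps jitter between frames.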
- Handling New Faces:
  - If no `best_match_key` was found, or we should not create a pseudo face, we need to decide whether this is a new face that we should track.
    - The code checks whether the `tracked_faces_many` dictionary holds fewer than `10` entries, meaning we can track more faces.
    - If we can track more faces:
      - A new track id (`new_key`) is created by counting the number of current keys in the `tracked_faces_many` dictionary.
      - A new dictionary is created and added to `tracked_faces_many` under the `new_key`, with the `embedding`, `position` and `id` from our `target_face`.
      - A new deque is created and stored in `position_histories_many` using the same `new_key`.
      - The new position is added to the position history.
      - A `source_index` is assigned using the following code: `source_index = new_key % len(source_face)`
      - The score is also initialized in the global variable, for display purposes in the UI. If the `new_key` is less than `10`: `setattr(modules.globals, f"target_face{new_key + 1}_score", 0.00)`
      - `_process_face_swap` is called with the `target_face`, `source_face` and the `source_index`, and the result is returned.
    - If we have reached the maximum number of tracked faces:
      - We do not add any new faces and we simply return the current frame.
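A minimal sketch of the registration step is shown below. The function name `register_new_face`, the `ui_state` namespace standing in for `modules.globals`, and the sample inputs are hypothetical; the sketch also assumes at least one source face, since `new_key % len(source_face)` would fail on an empty list.

```python
from collections import deque
from types import SimpleNamespace
from typing import Dict, Optional, Tuple

MAX_TRACKED_FACES = 10  # the hard-coded limit described above

ui_state = SimpleNamespace()  # hypothetical stand-in for modules.globals


def register_new_face(tracked: Dict[int, dict], histories: Dict[int, deque],
                      embedding, position: Tuple[float, float], face_id: int,
                      source_count: int) -> Optional[Tuple[int, int]]:
    """Start tracking a new face; return (new_key, source_index) or None when full."""
    if len(tracked) >= MAX_TRACKED_FACES:
        return None  # the caller simply returns the current frame
    new_key = len(tracked)  # next key is the current number of tracks
    tracked[new_key] = {"embedding": embedding, "position": position, "id": face_id}
    histories[new_key] = deque([position], maxlen=30)
    source_index = new_key % source_count  # cycle through the source faces
    if new_key < 10:
        setattr(ui_state, f"target_face{new_key + 1}_score", 0.00)
    return new_key, source_index


tracked: Dict[int, dict] = {}
histories: Dict[int, deque] = {}

# Try to register 12 faces while only 3 source faces are available.
results = [
    register_new_face(tracked, histories, [float(i)], (float(i), 0.0),
                      face_id=100 + i, source_count=3)
    for i in range(12)
]
```

Because `new_key` is derived from the dictionary's length, this scheme relies on keys never being removed; deleting an entry would make the next `new_key` collide with an existing track.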
Key Concepts and Techniques
- Dynamic Tracking: The function is designed to dynamically track an arbitrary number of faces (up to 10) as they enter and exit the frame.
- Modular Design: The function builds upon previously discussed functions like `extract_face_embedding`, `get_face_center`, `_process_face_swap`, and `cosine_similarity`.
- Weighted Averaging: Weighted averaging of the embeddings and positions ensures a smoother update, reducing abrupt changes and flickering and improving the stability of the tracking.
- Position History Tracking: By using a short history of previous positions, we can predict the next likely location of the face, making the tracker more robust.
- Stickiness Factor: This helps maintain continuity in face tracking. It is more likely to track the same face between frames than to switch to a different face.
- Global Dictionaries: By using global dictionaries with keys as track ids, we can uniquely identify, access and modify tracked faces more efficiently.
- Pseudo Faces: We can use the face tracking history to create pseudo faces at the expected position of a lost face, which makes the tracking more robust.
- Global Module: The use of global values in `modules.globals` allows many of these variables, such as thresholds, weights and distances, to be configured on the fly from the UI without modifying the code.
Practical Considerations
- Performance: Tracking multiple faces simultaneously can be computationally intensive, and may impact the overall performance of the application depending on device limitations.
- Robustness: Although this function is robust, it can have problems with rapid movements, overlapping faces, or faces that disappear for extended periods of time.
- Max Tracked Faces: The max of `10` tracked faces is hard coded in this function and can only be increased by changing the code.
- Settings: The specific values used for constants like `STICKINESS_FACTOR`, `embedding_weight_size`, `position_size` and the `pseudo_face_threshold` all have an impact on the overall performance of the tracker and can be changed on the fly from the UI.
- Memory Management: Since we are storing data in dictionaries, care should be taken to manage the memory usage so we don't have issues with performance as more faces are tracked.
Conclusion
The `_process_face_tracking_many` function represents an advanced approach to face tracking, enabling multi-face processing through careful management of face embeddings, positions, history, and ID assignment. This in-depth exploration should provide a thorough understanding of this core component and the complexity involved in achieving robust multi-face swapping in real-time video. The ability to dynamically track up to 10 faces with good stability and accuracy is a significant achievement.