Home - iVideoGameBoss/iRoopDeepFaceCam GitHub Wiki

iRoopDeepFaceCam: A Deep Dive into Face Swapping

Introduction

Imagine you have a magical mirror that can make you look like anyone you want, or even swap between one and ten faces at once, track them as they move, and more! That's pretty much what iRoopDeepFaceCam does. It's a tool that lets you take your face (or the faces from a source image) and put it on another face in real time using a webcam, a video, or an image file.

  • What is iRoopDeepFaceCam? It’s like a super-powered digital mask. It uses the power of artificial intelligence (AI) to change faces in videos and images. This tool was inspired by projects like roop, taking the best parts of them to make something even more exciting.

  • What can it do? iRoopDeepFaceCam can do a bunch of cool stuff:

    • Face Swap: Takes your face (or other faces in an image) and swaps it with a face in real-time using a webcam.
    • Video/Image Face Swap: Replace faces in videos and images from files on your computer.
    • Mouth Masking: It seamlessly blends the original mouth with a swapped face, making the deep fakes look incredibly real, especially when you talk or eat.
    • Face Tracking: It can track one, two, or up to ten faces when multiple people are in a scene or moving around, ensuring the face swap stays on the right person.
    • Live Transformation: Use it with OBS (Open Broadcaster Software) virtual camera to deep fake your face live on streams!

Core Concepts

Let's understand the magic behind the scenes.

  • Face Detection:

    • Think of this like a very clever detective. The program needs to first find where faces are in an image or video frame. To do this, it uses the InsightFace AI, which is really good at finding faces, even if they are turned sideways or partially hidden.
    • Imagine you have a picture with many objects in it. The face detector is the part of the program that finds where the face objects are, and ignores all other objects.
    • Once it finds a face, it puts a box around it.
  • Face Embeddings:

    • Once the face is found, the AI makes a "fingerprint" for it, called an embedding. This fingerprint is like a unique code that describes how your face looks.
    • It doesn't just remember the shape; it remembers how all the different parts of your face, like the eyes, nose, and mouth are spaced apart from each other.
    • The face embedding is a bunch of numbers.
  • Face Swapping:

    • Now the fun part! The program takes the "fingerprint" (embedding) of the source face and puts it on the target face.
    • Using the face bounding box and the AI embedding, a new face is created for every selected face in the target frame or video. This is done using a special algorithm so that the face looks real.
    • It’s like putting on a digital mask, but doing it so smoothly that it looks like your real face.
  • Mouth Masking:

    • The mouth mask is a really cool feature: it blends the original mouth back into the swapped face to make the result more realistic. It works by cutting out the area of the target's mouth and blending it onto the swapped face.
    • Imagine you put a digital mask on your face. The mouth mask will help blend the area of your mouth into the digital mask. So you can eat, drink, talk, and move your mouth, just like your real face does. The mask makes it look super real!
    • The mouth mask is a really unique feature of iRoopDeepFaceCam.
  • Face Tracking:

    • This is like giving the program a memory for faces. It uses the face "fingerprint" (embeddings) and positions of the faces to keep track of one or two faces as they move around. It remembers what faces looked like and where they were.
    • Using a weighted average for position and embeddings, it can predict where a face will move and then continue swapping that particular face, even if the person moves or changes position.
      • It also has multi-face tracking, which can track up to 10 faces in a frame.
    • This is super useful for keeping the face swap on the right person in videos or if there are two people in a frame or more.
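As a rough illustration of the embedding comparison described above, the "fingerprint" match between two faces can be computed as a cosine similarity. The function name here is illustrative; the real project wraps a similar computation inside face_swapper.py:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Return the cosine similarity between two face embeddings.

    A value near 1.0 means the embeddings point in almost the same
    direction (very likely the same face); values near 0 mean the
    faces are unrelated.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In practice the embeddings are long vectors (hundreds of numbers), but the comparison works exactly the same way as for these short examples.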

Project Structure

iRoopDeepFaceCam is organized into several files, each with a specific purpose. Think of it like building with LEGOs; each piece (file) has a unique function, and they all fit together to create the whole project.

  • File Overview:

    • capturer.py: Handles capturing frames from a video or camera.
    • metadata.py: Stores information about the project, like the name, version, and edition.
    • core.py: This is the main file. It contains the core logic of the application: it processes command-line arguments and starts and stops the application.
    • face_analyser.py: This file contains the code for finding faces in a frame, and generating embeddings. It’s like the brains of the face detection process.
    • typing.py: Defines custom data types used in the project, which makes the code easier to read and understand.
    • predicter.py: Contains the logic to determine if a video or image is not safe for work (NSFW).
    • globals.py: Stores all the settings and variables that can be accessed in every file in the project.
    • ui.py: Contains code for the user interface of the application, such as the buttons, the preview window etc. It is what you see when you start the application.
    • README.md: The documentation for the project that you are reading.
    • ui.json: The colour style sheet for the user interface.
    • modules/processors/frame
      • face_swapper.py: The heart of the face-swapping process. It contains the code to use the InsightFace AI to swap the faces.
      • face_enhancer.py: Contains the code to enhance the faces, making the face look higher in quality.
      • core.py (in modules/processors/frame): Manages the different modules that do the frame processing. It sets up each module to work on each frame, including face swap and enhance.
  • Module Descriptions:

    • Core: The core.py file is like the project manager: it sets everything up, reads the settings the user chooses, and puts all the other modules to work. It also manages the computer's resources, makes sure everything is cleaned up, and contains the program's entry point.
    • Face Swapper: The face_swapper.py file contains the core algorithms for the face swapping. It takes in the source and target face and creates a new image with the faces swapped.
    • User Interface: The ui.py file sets up all the buttons and options for the user to use. It is what displays the window that you see, and takes in your inputs.
    • Face Enhancer: The face_enhancer.py file enhances faces to make them look smoother and higher quality, using the GFP-GAN AI.
    • Utilities: The utilities.py file includes functions that make life easier, such as file handling, downloading models, checking file types, and making temporary files. It's like the toolbox for the project.
    • Face Analyser: The face_analyser.py is a bridge to the powerful insightface library, which is used to find and analyse faces in an image. It can be used to find and select faces.
    • NSFW Filter: The predicter.py uses the opennsfw2 to detect NSFW (Not Safe for Work) images or videos. This helps make sure that the software isn't used for bad content.
    • Global Settings: The globals.py file stores all the variables and settings that are used throughout the project. Think of it as shared memory for all the project files. When you change settings in the window, they are saved to global.txt.
    • Type Definitions: The typing.py file creates custom data types, such as Face and Frame. This makes the code easier to read and tells the computer what to expect from each data type.
    • Frame Processing Core: The core.py in the modules/processors/frame directory sets up all the modules that process each frame, such as face_swapper and face_enhancer.

Code Breakdown

Now, let's go deeper into the code itself. We'll highlight key parts, focusing on how the face swap works. Images from the README.md are included in the explanations and referenced where relevant.

  • Face Swapping (face_swapper.py):

    1. Pre-checks: The code first checks that all required files are downloaded and that the source path contains a face. The pre_check() function checks whether the necessary face-swap models have been downloaded by looking in the ../models folder, as shown in the picture in the README.md under the section "How do I install it?". It also uses conditional_download() from utilities.py to make sure the inswapper_128_fp16.onnx file is downloaded.

    2. Getting the Face Swapper:

      • The get_face_swapper() function loads the inswapper_128_fp16.onnx face-swapping model. This file is located in the models folder. The model is from the insightface library.
      • It uses a thread lock to make sure that only one part of the program is using the model at once, preventing errors.
    3. Swapping the Faces:

      • The swap_face() function is where the magic happens. It takes the source face, the target face, and the image frame, then swaps the faces using the FACE_SWAPPER model loaded in step 2.
      • A mask is made that matches the target face. A Gaussian Blur is added to the mask to blend the edges for a natural effect.
    4. Face Detection and Selection:

      • The function _detect_faces() will detect all faces in the frame.
      • The _select_target_faces() selects which face or faces to swap in a frame based on different settings.
        • If many_faces is on, it selects and swaps all the faces.
        • If both_faces is on, it selects and swaps the two left-most faces in the frame, unless detect_face_right is enabled, in which case it swaps the two right-most faces.
        • If face tracking is off, only one face is swapped: the left-most face, unless detect_face_right is enabled, in which case the right-most face is picked.
        • If face tracking is on, it picks the face that looks most like the face the system is tracking.
    5. Face Masking

      • The _compute_mouth_masks() function will create a mask for the mouth that is used in the _apply_mouth_masks() function. The mouth mask uses the InsightFace's landmark points for accurate masking as shown below:

        MouthMaskDemo

      • The create_mouth_mask() function is used to create a rectangular mask using InsightFace's landmark points for an accurate mouth mask.

        MouthMask

      • The create_lower_mouth_mask() function creates a more accurate mask for the mouth; this is the mask shown in the picture above.

    6. Main Frame Processing:

      • The process_frame() function does all the different processing depending on which options you chose.
      • It starts by checking whether face tracking is enabled or has been reset by clicking a button.
      • It detects and selects the target faces.
      • It then creates the mouth mask if that option is enabled.
      • The face tracking system is complex and is covered in the next section.
      • It then applies the face swapping, with or without tracking depending on the setting.
      • If the show face bounding box option is enabled, the landmarks for each face are drawn onto the frame.

    7. Image and Video Processing:

      • The process_image() function applies the face swap to one image.
      • The process_video() function applies the face swap to a video using the multi-threading functions from core.py. It also sets up face tracking to be reset if it is used.
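The feathered blending described in step 3 can be sketched in NumPy. For simplicity this uses a precomputed soft alpha mask instead of a Gaussian blur, and the function name is illustrative rather than the project's actual API:

```python
import numpy as np

def blend_with_mask(frame: np.ndarray, swapped: np.ndarray,
                    alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend the swapped face into the original frame.

    alpha is a per-pixel float mask in [0, 1]. Soft (blurred) edges
    in the mask make the transition between swapped and original
    pixels look natural instead of showing a hard cut-out line.
    """
    alpha = alpha[..., np.newaxis]  # broadcast the mask over colour channels
    out = alpha * swapped.astype(np.float64) + (1.0 - alpha) * frame.astype(np.float64)
    return out.astype(np.uint8)
```

In the real pipeline the mask follows the detected face outline and its edges are softened with cv2.GaussianBlur before this kind of blend is applied.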
  • Face Enhancement (face_enhancer.py):

    • Loading the Enhancer: The get_face_enhancer() function loads the GFPGANv1.4.pth model using the GFP-GAN library. It uses a thread lock to only allow one part of the code access at one time.
    • The enhance_face() function does the work of enhancing the face to make it look smoother and better.
    • The process_frame() function finds all the target faces using the face_analyser and enhances each one using enhance_face().
    • It uses the same methods of cropping the face and adding a mask as face_swapper.py.
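Both get_face_swapper() and get_face_enhancer() follow the same thread-locked, load-once pattern. A minimal sketch of that pattern (the model object here is a placeholder, not the real ONNX or GFP-GAN loader):

```python
import threading

_MODEL = None
_LOCK = threading.Lock()

def get_model():
    """Load the model once; later calls return the cached instance.

    The lock ensures two threads cannot both trigger a load at the
    same time, which would waste memory and could corrupt state.
    """
    global _MODEL
    with _LOCK:
        if _MODEL is None:
            _MODEL = object()  # placeholder for the real model load
    return _MODEL
```

Because the expensive load happens at most once, every frame-processing thread afterwards shares the same model instance.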
  • User Interface (ui.py):

    1. Creating the Main Window:

      • The init() function sets up the main application window. It calls the create_root() function to create the main window. It also calls the functions to create both the image and live preview windows.
    2. Setting up the GUI elements: The create_root() function sets up all the different parts of the window:

      • Labels: Labels are used to display text and images (such as the selected image or video).

      • Buttons: Buttons, like "Select a face/s," "Select a target," and "Start," trigger specific actions when clicked.

      • Switches: Switches are used to toggle options like "Use First Two Source Faces," "Keep fps," and "Mouth Mask." You can see these switches in the picture below under "One/Two face controls for webcam mode, video or image" from the README.md.

        FaceControls

      • Dropdown menus: Dropdown menus are used to select settings like "Stickiness Factor," "Pseudo Threshold," or "Frame Rotation."

      • Status Labels: Status labels are used to show information to the user, such as the current stage of processing.

      • Frames: Frames are used to group sections together and give a visual divide between different areas.

    3. Preview Windows:

      • The create_preview() function creates the live camera preview window. It also sets the window to stay on top if the option is enabled.
      • The create_preview_image() sets up the image preview window that is used for the single image face swap.
    4. Updating the Display:

      • The update_status() function is used to display messages to the user, updating the text shown at the bottom of the window.
      • The update_preview() function shows a preview of what the face swap will look like. It can load from an image or a video and displays it in the preview window.
      • The webcam_preview() function shows a preview of the live camera in a new window and applies the face swap to it. It uses OpenCV to capture video from the webcam, then loads the processed image into the preview window using fit_image_to_preview().
    5. File Selection:

      • The select_source_path(), select_target_path(), and select_output_path() functions display dialog windows to select files from your computer. These are shown in the image below.

        instruction

      • The image on the left is what appears after clicking "select a face/s", the center image is what appears after clicking on "select a target", and finally the window on the right is what appears after clicking the "Start" button.

    6. NSFW Check: The check_and_ignore_nsfw() function prevents processing of Not Safe For Work images, videos, or frames using the opennsfw2 library.

    7. UI Logic:

      • The update_tumbler() function is called when the face enhancer option is toggled. It toggles whether the face enhancer is in the frame-processing queue.
      • The toggle_preview() and toggle_preview_cam() functions are used to display or remove the preview windows.
      • The update_preview_size() function is used to change the camera resolution if using a webcam.
      • The update_camera_resolution() function is used to set the camera resolution.
      • The both_faces(), many_faces(), face_tracking(), mask_size(), mask_down_size(), mask_feather_ratio_size(), stickyface_size(), flip_faces(), detect_faces_right(), stickiness_factor_size(), and pseudo_threshold_size() functions set the variables in globals.py when specific options are selected on the UI window.
      • The clear_face_tracking_data() function is used to reset all tracked faces.
      • The embedding_weight_size(), weight_wistribution_size(), position_size(), old_embedding_size(), and new_embedding_size() functions set the face tracking variables.
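Most of the small UI callbacks listed above follow the same pattern: read the widget's value and store it in the shared settings module. A minimal sketch of that pattern (the settings store and callback name are illustrative, not the project's actual identifiers):

```python
# Stand-in for the shared settings module (modules/globals.py in the project).
settings = {"mouth_mask": False}

def on_mouth_mask_toggled(value: bool) -> None:
    """Callback wired to a UI switch: copy the widget's new value
    into the shared settings so the frame processors can read it."""
    settings["mouth_mask"] = value
```

Keeping all state in one shared module is what lets the UI thread and the frame-processing code agree on the current options without passing them around explicitly.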
  • Face Tracking (Partially in face_swapper.py):

    1. How Face Tracking Works (From the Wiki article):

      • "iRoopDeepFaceCam's auto face tracking feature leverages advanced Artificial Intelligence (A.I.) to create a dynamic and accurate face swapping experience. Imagine teaching a sophisticated robot to recognize and follow faces in real-time—that's essentially what's happening. The system identifies and "memorizes" faces in your target video or webcam, then tracks them as they move, ensuring a consistent and engaging face-swap.

      • The core idea is to use AI to analyze faces and their positions within a frame (video, webcam) and then accurately map them to faces in your source image(s). This mapping enables the face swapping to occur on the correct target faces, even if those faces change position over time."

      • This feature is automatically enabled when you turn on the 'Auto Face Track' switch.

        FaceTracking

    2. The Code Behind Face Tracking

      • The functions _process_face_tracking_single(), _process_face_tracking_both(), and _process_face_tracking_many() in face_swapper.py use the face embeddings to track the faces. The main principle is that if a face looks similar and has not moved too far from its previous location, it is most likely the same face.
        • _process_face_tracking_single() is for tracking only one face. This uses a weighted average for both position and embedding for smooth face tracking.
          • The system detects the faces using _detect_faces() and loops each face to find the best match.
          • The system compares each face using the extract_face_embedding() and cosine_similarity() to check the embeddings, also a position check is done using the get_face_center() to determine the location of the faces, compared to the previous position.
          • If the combined embedding-and-position score meets the threshold, that face is considered the same face and the swap is applied; otherwise the old face is used.
          • If the tracked face is lost for too long and use_pseudo_face is enabled, then a fake face is used.
        • _process_face_tracking_both() is used when the user wants to track only 2 faces. This code uses a similar system as _process_face_tracking_single(). A weighted average is used for position and embeddings, and if the tracked faces are lost, the fake faces are used.
        • _process_face_tracking_many() is used when the user wants to track up to 10 faces. This function uses the same core principles as the above tracking functions, but instead tracks up to 10 faces.
    3. Resetting Face Tracking: The reset_face_tracking() function clears all variables for face tracking, so a new face is detected for tracking when a new frame appears, or new webcam is used.
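The "weighted average for position and embedding" mentioned above can be sketched as a running blend of the stored embedding and the newly observed one. The weights correspond to the OLD WEIGHT / NEW WEIGHT settings described later; the exact formula here is an assumption for illustration:

```python
import numpy as np

def update_embedding(old_emb, new_emb, old_weight=0.8, new_weight=0.2):
    """Blend the remembered embedding with the freshly detected one.

    A high old_weight makes tracking stable (slow to forget); a high
    new_weight makes it adapt faster to changes in lighting, angle,
    or expression. The result is re-normalized to unit length so
    cosine comparisons stay well behaved.
    """
    blended = old_weight * np.asarray(old_emb, dtype=np.float64) \
            + new_weight * np.asarray(new_emb, dtype=np.float64)
    return blended / np.linalg.norm(blended)
```

The same idea applies to the tracked position: a weighted average of the old and new face centers smooths out jitter between frames.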

  • Core Logic (core.py):

    1. Command Line Arguments: The parse_args() function reads settings such as the source file, target file, and output file when you specify them as command-line arguments while starting the program in the terminal.

    2. Setting up the Application:

      • The run() function starts the program by parsing arguments and calls the pre_check() function. It will then initialize all the resources including the face_analyser and call the UI.
      • The start() function is the beginning of the frame processing. It will start by making temp files and then extract the frames or images.
      • The limit_resources() is used to limit the computer memory used by the program.
      • The release_resources() function releases the GPU memory after it is used.
      • The update_status() function is used to update the status at the bottom of the screen.
    3. Execution Providers: It also sets up the execution providers for ONNX, which can be CPU, GPU (CUDA), or other options based on your hardware.
    4. Cleaning Up: The destroy() function stops the application and cleans up all temporary files and resources.
    5. Frame Processing Modules:
      • The get_frame_processors_modules() function finds and loads all the frame-processing modules chosen by the user in the UI. The set_frame_processors_modules_from_ui() function allows loading modules based on the UI settings.
      • The multi_process_frame() function starts multi-threaded processing, and process_video() sets up the multi-threaded processing for a video.
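The multi-threaded video path can be sketched with Python's standard ThreadPoolExecutor. This is a simplified stand-in for multi_process_frame(), not the project's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def process_frames_parallel(frames, process_one, threads=4):
    """Apply a per-frame processor across a pool of worker threads.

    executor.map yields results in input order, which matters when
    the processed frames are later reassembled into a video.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(process_one, frames))
```

Threading (rather than multiprocessing) works here because the heavy lifting happens inside native ONNX Runtime code, which releases Python's GIL while it runs.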
  • Utilities (utilities.py):

    • FFmpeg Operations: The code uses run_ffmpeg() to do actions such as: extracting frames, creating videos, and restoring audio.
    • File Handling: Functions for creating temporary files/folders (create_temp()), moving files (move_temp()), deleting them (clean_temp()), checking file types (is_image(), is_video(), has_image_extension()), and building folder paths (get_temp_directory_path()).
    • Downloads: The conditional_download() function will download required models if it cannot find them.
    • Path Handling: The resolve_relative_path() is used to get the correct file path to models etc.
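Helpers like run_ffmpeg() usually just assemble an argument list and hand it to a subprocess. A hedged sketch of extracting frames from a video (the FFmpeg flags are standard options, but the helper functions themselves are illustrative, not the project's exact API):

```python
import subprocess

def build_extract_frames_cmd(video_path: str, out_pattern: str, fps: int = 30):
    """Build (but do not run) an FFmpeg command that dumps video
    frames to numbered image files, e.g. frame_0001.png, frame_0002.png.
    """
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", video_path,      # input video
        "-vf", f"fps={fps}",   # sample at the requested frame rate
        out_pattern,           # e.g. "temp/frame_%04d.png"
    ]

def run_ffmpeg(args) -> bool:
    """Execute the assembled command; returns True on success."""
    return subprocess.run(args).returncode == 0
```

Separating command construction from execution makes the FFmpeg calls easy to log and test without actually launching the binary.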

Advanced Topics

  • Execution Providers:

    • What are they? Execution providers are like different "engines" that run the AI models. iRoopDeepFaceCam uses ONNX Runtime, which can use CPU or GPU "engines".
      • CPU: Uses your computer's central processing unit. Good for basic usage but slow for face swapping, especially videos.
      • CUDA: Uses NVIDIA graphics cards. Gives the best performance.
      • CoreML: Uses Apple silicon GPUs. Good if you have a Mac.
      • DirectML: Used for GPUs on Windows, but may not work well with NVIDIA cards.
      • OpenVINO: Used for Intel CPUs with integrated GPUs, but may not work with iRoopDeepFaceCam.
    • Setting Execution Providers: You can select them when you run the program from the command line, as described in the installation section of the README.md. For example:
      • To run with CPU use: python run.py --execution-provider cpu
      • To run with NVIDIA GPU: python run.py --execution-provider cuda --execution-threads 5
    • Why do they matter? Choosing the right provider makes a huge difference in how fast the program runs. GPU providers are much faster than the CPU, especially with videos. Make sure you have all the correct software requirements installed.
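ONNX Runtime identifies these engines by provider-name strings. A sketch of how the --execution-provider flag might be mapped to them (the provider names below are ONNX Runtime's documented identifiers; the helper function itself is hypothetical):

```python
PROVIDER_NAMES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "directml": "DmlExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
}

def resolve_providers(flag: str):
    """Translate a CLI flag into an ONNX Runtime provider list.

    CPU is always appended as a fallback so inference still works
    if the preferred accelerator turns out to be unavailable.
    """
    providers = [PROVIDER_NAMES[flag.lower()]]
    if providers[0] != "CPUExecutionProvider":
        providers.append("CPUExecutionProvider")
    return providers
```

Such a list is what gets passed as the `providers` argument when an ONNX Runtime inference session is created.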
  • Face Tracking Settings

    • What do they do? The face tracking settings let the user choose how face tracking is managed.
    • How do they work?
      • These settings help to control how the AI keeps track of the faces. These are located in the lower section of the 'Auto Face Track' frame. As shown below in the README.md.
        FaceSettings

      • EMBEDDING WEIGHT: How important the face's unique features (fingerprint) are for tracking a face. Higher means more importance on the features for tracking the faces.

        • Imagine each face has a special "fingerprint". This number tells us how much we care about matching these "fingerprints" when we're trying to track a face from one frame to the next.
      • WEIGHT DISTRIBUTION: A way to adjust how much we care about the face's features compared to its position. If it's bigger, we care more about the features. If it's smaller, we care more about the position.

      • POSITION WEIGHT: How important the position of a face is for tracking. Higher means more importance to position.

        • If a face doesn't move much between frames, it's probably the same face, right? This number tells us how important we think that is.
      • OLD WEIGHT and NEW WEIGHT: How much to remember the face from before and how much to pay attention to how it looks now.

        • When we're tracking a face over time, we don't want to forget what it looked like before, but we also want to update our idea of what it looks like now. OLD_WEIGHT is how much we remember about the face from before, and NEW_WEIGHT is how much we pay attention to what it looks like right now.
      • Where to find them

        • These settings are located in the lower section of the 'Auto Face Track' frame. The controls are shown in the image above.
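Put together, these settings can be read as weights in a single matching score. The exact formula used in face_swapper.py may differ; this is an assumed illustration of how the knobs interact:

```python
def tracking_score(embedding_sim, position_sim,
                   embedding_weight=1.8, position_weight=0.6,
                   weight_distribution=1.0):
    """Combine appearance and position into one match score in [0, 1].

    embedding_sim: how similar the face "fingerprint" looks (0..1).
    position_sim:  how close the face is to its last position (0..1).
    weight_distribution scales how much the embedding term counts
    relative to the position term before the two are averaged.
    """
    emb_term = embedding_weight * weight_distribution * embedding_sim
    pos_term = position_weight * position_sim
    total = embedding_weight * weight_distribution + position_weight
    return (emb_term + pos_term) / total
```

With the defaults above, a perfect appearance match with a bad position still scores higher than a perfect position with a bad appearance match, reflecting a higher EMBEDDING WEIGHT.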
  • Face Rotation

    • How to use

      • The 'Face Rotation Range' setting allows you to choose from 0, 90, 180, or -90 degrees.
      • As shown in the README.md

      facerotation

      • This is done by rotating the source image around its central axis. This allows for an accurate face swap if the target's face is not looking straight at the camera.
    • Why is it used? As explained in the README.md, InsightFace has limits on how much rotation it can accurately process; this setting works around that. The code for rotating is found in face_swapper.py, in the _rotate_frame() function.
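The four supported angles map to simple quarter-turn rotations. A self-contained sketch using NumPy (the real _rotate_frame() likely uses OpenCV's rotation routines; np.rot90 is used here so the example runs without OpenCV, and the direction convention is an assumption):

```python
import numpy as np

def rotate_frame(frame: np.ndarray, angle: int) -> np.ndarray:
    """Rotate a frame by one of the supported angles: 0, 90, 180, -90.

    np.rot90 rotates counter-clockwise, so a clockwise 90-degree
    turn is expressed as three counter-clockwise quarter turns.
    """
    turns = {0: 0, 90: 3, 180: 2, -90: 1}[angle]
    return np.rot90(frame, k=turns)
```

After detection and swapping on the rotated frame, the result would be rotated back by the opposite angle before display.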

  • Pseudo Face

    • How to use
      • You can enable 'Pseudo Face' option by clicking the check box in the UI.
      • If a face is briefly blocked, or its match score is lower than the "Pseudo Threshold" value while 'Pseudo Face' is on, the system generates a fake face at the last known location and performs a face swap with that fake face.
      • If the face is missing for too long (MAX_LOST_COUNT), it is better to reset the face tracking.
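The logic described above can be sketched as a small per-frame decision routine. MAX_LOST_COUNT and the threshold handling are simplified here, and the class is illustrative rather than the project's actual code:

```python
MAX_LOST_COUNT = 30  # frames a face may stay missing before tracking resets

class PseudoFaceTracker:
    """Decide, per frame, whether to swap a real match, a pseudo face,
    or reset tracking entirely."""

    def __init__(self, pseudo_threshold=0.2, use_pseudo_face=True):
        self.pseudo_threshold = pseudo_threshold
        self.use_pseudo_face = use_pseudo_face
        self.lost_count = 0

    def decide(self, match_score):
        if match_score >= self.pseudo_threshold:
            self.lost_count = 0
            return "swap_real_face"
        self.lost_count += 1
        if self.lost_count > MAX_LOST_COUNT:
            self.lost_count = 0
            return "reset_tracking"
        if self.use_pseudo_face:
            return "swap_pseudo_face"  # fake face at last known location
        return "skip_frame"
```

The pseudo face keeps the swap visually continuous during brief occlusions, while the lost counter prevents the system from chasing a face that has genuinely left the frame.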

Performance

  • Factors Affecting Speed:

    • Execution Provider: The single most important factor for performance is your hardware. If you do not have an NVIDIA GPU, processing will more than likely be slow.
    • Video Resolution: Higher resolution videos take more time to process.
    • Frame Processors: If you have the face enhancer enabled, processing takes longer, since each frame has to be processed twice.
    • Number of Faces: If the "many_faces" setting is enabled, it takes more processing than swapping only one face.
    • Face Tracking: The "face_tracking" option adds time because every face must be tracked from frame to frame to maintain accuracy.
  • Tips for Improving Performance:

    • Use a GPU: If you have an NVIDIA GPU, CUDA execution provider is a MUST. This will increase the speed of processing drastically.
    • Lower Video Resolution: Use a lower resolution if the processing is too slow.
    • Choose fewer frame processors: Only choose "face_swapper" if "face_enhancer" is not required.
    • Reduce threads: If you have low RAM, you can decrease the number of threads to 4, 2, or 1, but this will decrease performance.
    • Use correct NVIDIA Drivers: Ensure that you have the correct NVIDIA drivers that are compatible with your NVIDIA card and CUDA toolkit.

Conclusion

iRoopDeepFaceCam is a powerful tool that brings deepfake technology to life. By understanding its core concepts, code structure, and advanced options, you can harness its full potential for creative and fun projects. We hope this wiki article provides a solid foundation for both users and developers alike.

Remember to always use this technology responsibly and ethically!

InsightFace landmark_2d_106

Face Outline Mask

    face_outline_indices = [1, 43, 48, 49, 104, 105, 17, 25, 26, 27, 28, 29, 30, 31, 32, 18, 19, 20, 21, 22, 23, 24, 0, 8,
                            7, 6, 5, 4, 3, 2, 16, 15, 14, 13, 12, 11, 10, 9, 1]

Mouth Mask

    lower_lip_order = [65, 66, 62, 70, 69, 18, 19, 20, 21, 22, 23, 24, 0, 8, 7, 6, 5, 4, 3, 2, 65]
    toplip_indices = [20, 0, 1, 2, 3, 4, 5]
    chin_indices = [11, 12, 13, 14, 15, 16] 
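These index lists select points out of InsightFace's 106-point landmark array. A sketch of how the lower-lip polygon might be gathered (fake landmarks are used so the example is self-contained; in the real code the resulting points are fed to an OpenCV fill routine to rasterize the mask):

```python
import numpy as np

lower_lip_order = [65, 66, 62, 70, 69, 18, 19, 20, 21, 22, 23, 24, 0, 8,
                   7, 6, 5, 4, 3, 2, 65]

def lower_lip_polygon(landmarks_106: np.ndarray) -> np.ndarray:
    """Pick the lower-lip outline points, in drawing order, from the
    106-point landmark set. The first index is repeated at the end
    so the polygon is closed."""
    return landmarks_106[lower_lip_order]
```

The face_outline_indices list works the same way: it walks the jaw and brow landmarks in order so the whole face region can be filled as one closed polygon.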

2d106markup-jpg