Face Swap Tab
In this tab the actual face swapping work is done. The screen is divided into an input area, where you specify source and destination faces and media; a preview area, where you can preview single frames of a fake; and a post-processing and masking area, where you can fine-tune many settings and enable extra enhancement or masking.
Input Area
- Here you can drag and drop one or more images or faceset files containing the faces you want to use as the deepfake sources (you can also click the box to open a file dialog). The detected faces will be added to the gallery of selected input faces. 💡 You can always add additional faces from other images by clearing the box (click the x in its corner) and dropping other image files onto it.
- The gallery of faces selected so far. With the buttons you can either remove the currently selected face from the list or remove them all. With the sliders (2a) for 'Offset Face Top' you can offset the faked face so that e.g. hairlines can be restored from the original. For example, faking this face with a nice haircut onto the woman from the example screenshot leads to this abomination:
Using an offset of 168 creates a much better result, as this restores the original clear forehead:
Offset Bottom is basically the same but for the lower half of the face (for example restoring the original's mouth area).
The values can be adjusted for each face individually; most of the time the default of 0 will be fine anyway.
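To illustrate what the offset does, here is a minimal conceptual sketch (not the actual roop-unleashed code): the top rows of the swapped face region are taken from the original frame again, with a small feathered transition so the seam stays invisible. The function and parameter names are made up for the example.

```python
# Conceptual sketch of 'Offset Face Top' - not the actual roop-unleashed code.
import numpy as np

def apply_top_offset(original_crop: np.ndarray,
                     swapped_crop: np.ndarray,
                     offset_top: int,
                     feather: int = 20) -> np.ndarray:
    """Restore the top `offset_top` pixel rows from the original face crop."""
    result = swapped_crop.copy()
    if offset_top <= 0:
        return result
    # Hard copy of everything above the offset line (e.g. forehead/hairline)
    result[:offset_top] = original_crop[:offset_top]
    # Feather a few rows below the offset so the transition is not visible
    end = min(offset_top + feather, result.shape[0])
    for i, row in enumerate(range(offset_top, end)):
        alpha = (i + 1) / (end - offset_top + 1)  # 0 = original, 1 = swapped
        result[row] = (alpha * swapped_crop[row] +
                       (1.0 - alpha) * original_crop[row]).astype(result.dtype)
    return result
```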
- Here you can drop your target files (or click the box to open a file dialog). roop-unleashed supports batch processing, meaning you can drop and select multiple files. These can be image files and video files. When selecting a file, the image (or the first frame if it's a video file) will be shown in the preview area. Using the video fps slider you can force a video to a fixed fps, overriding the detected one.
- The gallery of selected target faces. It offers the same functionality as the input faces gallery; target faces can be chosen via the preview area. There is a 1:1 swapping relationship between input and target faces: the 1st input face will be copied onto the 1st target face, the 2nd onto the 2nd, and so on.
💡 Target faces will only be used when swapping mode 'Selected' is chosen in the swapping type dropdown.
Preview Area
- Display of the original image, or of the faked image if the 'Face swap Frames' checkbox (2) is turned on. Also used for displaying the masking preview.
- Toggles between displaying the original and the deepfaked image.
- Use this button to manually refresh the display.
- This button starts face detection for the current image/video frame. If only a single face is detected, it will be automatically added to the target face gallery (5). If there are multiple faces, the same selection dialog as for input faces will pop up.
- Allows you to step through single frames of a video, either for previewing or for choosing a good frame to select a target face from. Using the 'Set as Start/End' buttons you can restrict processing to part of a video; the final video will then be clipped between start and end.
Face Selection Dialog
For every face you want to use, select it and click 'Use selected Face'. Click 'Done' when you're finished.
Face swapping parameters, enhancement options and masking
This specifies how the source (input) faces should be applied to the target images or video frames. Options are:
- First found: swaps the first face found, from left to right
- All faces: pastes the input face onto every detected target face
- Selected face: maps the input faces to the specified target faces
- All female: swaps only faces detected as female
- All male: swaps only faces detected as male
When using 'Selected Face', your different input faces will be swapped onto your selected target faces in the same order they appear. If you selected input faces A1, A2, A3, A4 and you have target faces B1, B2, B3, B4, roop-unleashed will try to detect face B1 in every image and, if found, swap it with A1. A2 will swap with B2, and so on.
💡 Only 'Selected Face' uses multiple input faces, all others use the first input face only.
Only needed when face swapping mode 'Selected' with target faces is used. In that mode the target faces to swap need to be detected first. This is done by a similarity comparison with the face you've chosen previously. A value of 0.0 would mean absolutely identical, while a value of roughly 0.9 and above is very different and could be another face. A range from 0.65 to 0.75 should cover most cases; if you have extreme angles you could try increasing it to about 0.85.
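In practice this value behaves like a distance between face embeddings: lower means more similar. Here is a minimal sketch of such a check, assuming normalized embedding vectors and a cosine-style distance (the exact metric roop-unleashed uses may differ):

```python
# Illustrative distance check between two face embeddings - the exact metric
# used by roop-unleashed may differ.
import numpy as np

def face_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine distance: 0.0 = identical, ~1.0 and above = very different."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(1.0 - np.dot(a, b))

def is_selected_target(emb_candidate, emb_selected, max_distance=0.65) -> bool:
    # max_distance corresponds to the slider value described above
    return face_distance(emb_candidate, emb_selected) <= max_distance
```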
Selects how to process the files. Choices:
- Extract Frames to media: all video frames will be extracted first and written to files in numeric order. After that they will be processed and overwritten with the results. At the end a video will be created from the batch of images (a rough sketch of this workflow follows below).
- In-Memory processing: everything is done in memory; nothing gets written except the final image/video.
💡 'Extract Frames to media' allows you to inspect the frames before they are converted to video. You can also keep the converted images for your own processing.
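The extract-then-rebuild workflow can be approximated with ffmpeg. The commands below are only a rough equivalent of what happens internally; file names, fps and codec settings are assumptions for the example:

```python
# Rough ffmpeg-based equivalent of 'Extract Frames to media' - illustrative
# only, roop-unleashed handles this internally.
import subprocess

def extract_frames(video_path: str, out_dir: str, fps: int = 25) -> None:
    # Write all frames as numbered PNG files (frame_000001.png, ...)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}",
         f"{out_dir}/frame_%06d.png"],
        check=True)

def frames_to_video(frame_dir: str, out_path: str, fps: int = 25) -> None:
    # Re-encode the (now processed) frames back into a video
    subprocess.run(
        ["ffmpeg", "-framerate", str(fps),
         "-i", f"{frame_dir}/frame_%06d.png",
         "-c:v", "libx264", "-pix_fmt", "yuv420p", out_path],
        check=True)
```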
How to treat images/frames when no face could be detected. Options are:
- Use untouched original frame: the original frame will be used in the video.
- Retry rotated: the image will be rotated by 180 degrees and face detection/swapping will run again (see the sketch below). If there still is no face detected, the original frame will be used.
- Skip Frame: this frame is ignored and will not be used for the resulting video (only works in In-Memory processing mode).
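A minimal sketch of the rotate-and-retry fallback, assuming a hypothetical detect_faces() helper that returns a list of detected faces (this is not the actual roop-unleashed code):

```python
# Sketch of the 'Retry rotated' fallback - detect_faces() is a hypothetical
# helper standing in for the real face detector.
import cv2

def detect_with_rotation_retry(frame, detect_faces):
    faces = detect_faces(frame)
    if faces:
        return frame, faces, False
    # No face found: rotate the frame by 180 degrees and try again
    rotated = cv2.rotate(frame, cv2.ROTATE_180)
    faces = detect_faces(rotated)
    if faces:
        return rotated, faces, True   # caller rotates the result back afterwards
    return frame, [], False           # fall back to the untouched original frame
```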
The current face swapping model is limited to a resolution of 128x128 px. To work around this for high quality results, you can turn on different face restoration/upscaler models. Options:
- None: no enhancement
- Codeformer: a very advanced but slow model, very good at skin and identity preservation. https://github.com/sczhou/CodeFormer
- DMDNet: a fairly new enhancer with 2 modes which are picked automatically. https://github.com/csxmli2016/DMDNet
- GFPGAN: well known and fairly robust and fast. https://github.com/TencentARC/GFPGAN
- GPEN: seems to handle eyes and occlusions quite well. https://github.com/yangxy/GPEN
The blend ratio slider controls how much of the enhanced image is blended into the final result. Mixing in some of the original can actually improve the result, because otherwise the enhancement might be too strong and clearly visible. A value of 1.0 means only the enhanced image will be used, while 0.0 would show only the original. The default value of 0.65 is therefore a blend of 65% enhanced and 35% original.
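As a simple illustration, the blend boils down to a per-pixel weighted average (sketch only, not the exact roop-unleashed code):

```python
# Minimal sketch of blending the enhanced face with the original one.
import numpy as np

def blend_faces(original: np.ndarray, enhanced: np.ndarray,
                blend_ratio: float = 0.65) -> np.ndarray:
    """blend_ratio = 1.0 -> enhanced only, 0.0 -> original only."""
    blended = (blend_ratio * enhanced.astype(np.float32)
               + (1.0 - blend_ratio) * original.astype(np.float32))
    return np.clip(blended, 0, 255).astype(np.uint8)
```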
💡 DMDNet will automatically choose its specific mode when facesets are used as input. Be aware that this needs a lot of VRAM and slows down processing considerably.
Options:
- Skip audio: don't restore the audio from the original video into the deepfake video.
- Keep frames: leaves all the processed images in the temp folder. Useful if you want to do further work with them or create a video yourself.
- Wait for user key...: before creating the final video, roop-unleashed will wait for a key press in the terminal. That way you can make some last-minute changes to the processed frames, such as deleting unwanted images.
This area is for masking occlusions on faces, which will then be restored from the original, unfaked image into the final image. That way, for example, the coffee cup of a person drinking won't have a half-transparent fake mouth overlaid on it in the final image.
Options:
- Use Text Masking: needs to be checked to enable text masking.
- List of objects...: here you can enter a comma-separated list of words naming the objects you want to restore. There is no official keyword list; it's often trial and error.
- Engine: this is a placeholder for things to come. There are more and better masking methods out there waiting to be implemented.
Clicking 'Show Mask Preview' will process the target image and try to find the object(s) you entered. Once processed, the masks will be shown in the preview area. If nothing is found, the preview area will stay black and empty.
💡 Be creative with your object words. For example sometimes 'mouth' doesn't work but 'chin' does. 'Shades' doesn't work but 'glasses' might.
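For reference, text-prompted masking of this kind can be done with a model such as CLIPSeg from the Hugging Face transformers library. The sketch below is just one example of the general technique; roop-unleashed's actual masking engine may work differently:

```python
# Example of text-prompted segmentation with CLIPSeg - illustrative only,
# not necessarily the engine used by roop-unleashed.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

def text_masks(image: Image.Image, prompts: list[str], threshold: float = 0.3):
    """Return one boolean mask per prompt word (e.g. 'cup', 'glasses')."""
    inputs = processor(text=prompts, images=[image] * len(prompts),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits        # (num_prompts, H, W) or (H, W)
    probs = torch.sigmoid(logits)
    if probs.dim() == 2:                       # single prompt
        probs = probs.unsqueeze(0)
    return probs > threshold

# Example: masks = text_masks(Image.open("frame.png"), ["cup", "hand"])
```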