# How to use a custom image with StyleCLIP
You can't directly open a photo in StyleCLIP: you first need to encode it "into" the space of images "known" to StyleGAN2. To do this, run `encoder4editing` ("e4e").
My tip: you can encode a batch of multiple selfies in one go and then pick the best. It helps if you number them as `01.jpg`, `02.jpg`, ... so that you can tell which output came from which input in the output folder.
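For instance, a minimal shell sketch for numbering a folder of selfies before encoding (the `~/Pictures/selfies` source path is just a placeholder):

```bash
# copy selfies into sequentially numbered files: 01.jpg, 02.jpg, ...
i=1
for f in ~/Pictures/selfies/*.jpg; do
  cp "$f" "$(printf '%02d.jpg' "$i")"
  i=$((i + 1))
done
```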
I tried a few different selfies until I got one that was not too far from my likeness (it's still not that close; I'm curious how the authors got such a good one of Or Patashnik! Probably lots of trial and error).
Once you've run e4e, you'll need the `latents.pt` it produces (you can discard the other files it generates).
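For reference, e4e inversion looks something like the sketch below. The script path, flags, and the `e4e_ffhq_encode.pt` checkpoint name are my recollection of the `encoder4editing` README rather than anything from this walkthrough, so double-check them there:

```bash
cd encoder4editing
# --images_dir: folder of numbered input photos
# --save_dir:   output folder, where latents.pt will be written
# final positional argument: the pretrained FFHQ encoder checkpoint
python scripts/inference.py \
  --images_dir ~/Pictures/selfies \
  --save_dir results_selfies \
  pretrained_models/e4e_ffhq_encode.pt
```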
Note: if your GPU requires CUDA 11, use `nvidia-tensorflow` instead of TensorFlow. My suggested conda setup is below:
```bash
conda create -y -n styleclip12
conda activate styleclip12
conda install -y "python<3.7" -c conda-forge  # Python 3.6.13 (restricted by the TensorFlow 1.x dependency)
pip install nvidia-pyindex
pip install nvidia-tensorflow==1.15.4  # only available for Python 3.6; replaces tensorflow-gpu==1.15.2
conda install -y "cudatoolkit>=11,<11.2" -c pytorch  # installs 11.0.221
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install git+https://github.com/openai/CLIP.git  # forces the pytorch 1.7.1 install
pip install pandas requests opencv-python matplotlib scikit-learn
git clone https://github.com/omertov/encoder4editing.git
cd global  # the global/ directory of the StyleCLIP repo
python GetCode.py --code_type "w"
```
If all went well, you will be able to run `GetCode.py` here successfully; if not, you will get a CUDA compilation failure within TensorFlow (`Setting up TensorFlow plugin "fused_bias_act.cu": Failed!`). More details are in the issue I have now closed.
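When debugging such failures, it helps to first confirm that both frameworks can actually see the GPU (these are standard TF 1.x and PyTorch API calls, nothing StyleCLIP-specific):

```bash
# TensorFlow 1.x: should print True on a working GPU build
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
# PyTorch: should print True plus the CUDA version it was built against
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```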
Move the stock `global/data/ffhq` folder aside so you can reuse it later, and create an empty one in its place:

```bash
cd global/data
mv ffhq ffhq_backup
mkdir ffhq
```
Now regenerate the folder's `.jpg` images and `w_plus.npy` from the `latents.pt` you generated with `encoder4editing`. To prepare, run `GetCode.py`:
```bash
cd global
python GetCode.py --code_type 'w'           # generate W latent codes
python GetCode.py --code_type 's'           # generate style space (S) codes
python GetCode.py --code_type 's_mean_std'  # compute S-space mean/std statistics
```
This will download `stylegan2-ffhq-config-f.pkl` into `global/model/` from NVIDIA's CDN; you can also download it yourself from Google Drive... I had it already but didn't put it in the right place, whoops...
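A quick check that the checkpoint ended up where the scripts expect it (run from the StyleCLIP repo root; this is just a directory listing of the path named above):

```bash
ls -lh global/model/stylegan2-ffhq-config-f.pkl
```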
If you encoded multiple images, `latents.pt` will contain the latent codes for all of those inversion images, much like the repo default, whose `ffhq` folder holds 68 celebrity face photos. For example, I tried a few selfies and then chose the best as my "Sim" and encoded it on its own, so that the `latents.pt` code was just the single latent code for that good image, in `results_louis_good/`.
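You can check how many codes a `latents.pt` file holds before copying it over. The `[N, 18, 512]` W+ layout is an assumption based on e4e's usual FFHQ output, so expect `[1, 18, 512]` for a single image:

```bash
# adjust the path to wherever your e4e results folder lives
python -c "import torch; l = torch.load('results_louis_good/latents.pt'); print(getattr(l, 'shape', type(l)))"
```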
Then copy it into place and regenerate the GUI data:

```bash
cd global
PATH_TO_LATENTS="/home/louis/Pictures/2021/curios/style_test/results_louis_good/latents.pt"
cp "$PATH_TO_LATENTS" data/ffhq/
python GetGUIData.py --real
```
Your inversion image (which was generated alongside `latents.pt` by `encoder4editing`) will now be reproduced at `global/data/ffhq/0.jpg`, and there'll be a `w_plus.npy` there too, so you can proceed to open the GUI viewer and interact with the image via CLIP using text prompts.
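A quick sanity check that the regenerated folder has what the viewer needs (expected filenames as described above, from the single-image case):

```bash
ls data/ffhq  # expect 0.jpg and w_plus.npy (run from the global/ directory)
```

Then launch the viewer: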
```bash
python PlayInteractively.py
```
- The default `--dataset-name` is "ffhq" (see the repo README for other options). FFHQ has been precomputed for use here (I expect that changing the dataset may require you to run the preprocessing script for a few hours).