How to use a custom image with StyleCLIP

Preparing a latent code in e4e

You can't open a photo directly in StyleCLIP; you first need to encode it "into" the space of images "known" to StyleGAN2. To do this, run encoder4editing ("e4e").

My tip: you can encode a batch of multiple selfies in one go and then pick the best. It helps to number them 01.jpg, 02.jpg, ... so you can tell which output in the results folder came from which input.
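For instance, a minimal renaming sketch (it assumes the selfies are the only .jpg files in the current folder and aren't already numbered):

i=1
for f in *.jpg; do
  mv "$f" "$(printf '%02d' "$i").jpg" # 01.jpg, 02.jpg, ...
  i=$((i+1))
done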

I tried a few different selfies until I got one that was not too far from my likeness (it's still not that close; I'm curious how the authors got such a good one of Or Patashnik! Probably lots of trial and error).

Once you've run e4e, you'll need the latents.pt (but can discard the other files it generates).
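You can sanity-check the file before moving on; assuming (as in my run) that latents.pt holds a single tensor of W+ codes, its first dimension is the number of images you encoded:

python -c "import torch; print(torch.load('latents.pt').shape)" # e.g. torch.Size([5, 18, 512]) for 5 selfies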

Setting up StyleCLIP

Note: if your GPU requires CUDA 11, use nvidia-tensorflow instead of TensorFlow. My suggested conda setup is below.

Suggested conda setup for CUDA 11:

conda create -y -n styleclip12
conda activate styleclip12
conda install -y "python<3.7" -c conda-forge # Python 3.6.13 (restricted by TensorFlow 1.x dependency)
pip install nvidia-pyindex
pip install nvidia-tensorflow==1.15.4 # only available for Python 3.6, replaces tensorflow-gpu==1.15.2
conda install -y "cudatoolkit>=11,<11.2" -c pytorch # 11.0.221
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install git+https://github.com/openai/CLIP.git # forces pytorch 1.7.1 install
pip install pandas requests opencv-python matplotlib scikit-learn
git clone https://github.com/omertov/encoder4editing.git
git clone https://github.com/orpatashnik/StyleCLIP.git # the global/ directory used below lives in this repo

cd StyleCLIP/global
python GetCode.py --code_type "w"

If all went well, GetCode.py will run successfully here; if not, you will get a CUDA compilation failure within TensorFlow ("Setting up TensorFlow plugin "fused_bias_act.cu": Failed!"). More details are in the issue I have since closed.
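Before debugging that, it's worth a quick check that both frameworks can actually see the GPU (TF 1.x API shown):

python -c "import torch; print(torch.cuda.is_available())"
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"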


Move the global/data/ffhq folder aside so you can reuse it later, then make an empty one in its place:

cd global/data
mv ffhq ffhq_backup
mkdir ffhq
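(If you later want the stock celebrity images back, just reverse the move:)

rm -rf ffhq
mv ffhq_backup ffhq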

Now regenerate the folder's .jpg images and w_plus.npy from the latents.pt you generated with encoder4editing.

To prepare, run GetCode.py:

cd global
python GetCode.py --code_type 'w'
python GetCode.py --code_type 's'
python GetCode.py --code_type 's_mean_std'

This will download stylegan2-ffhq-config-f.pkl into global/model/ from NVIDIA's CDN. You can also download it yourself from Google Drive; I had it already but hadn't put it in the right place, whoops...
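If you do download it manually, put it where GetCode.py expects it (run from the repo root; the Downloads path here is just an example):

mkdir -p global/model
cp ~/Downloads/stylegan2-ffhq-config-f.pkl global/model/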

If you encoded multiple images, latents.pt will contain the latent codes for all of them; this is how the repo ships by default, with codes for 68 celebrity face photos.

For example, I tried a few selfies, chose the best as my "Sim", and encoded it on its own, so that latents.pt held just the single latent code for that good image, in results_louis_good/.
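Alternatively, rather than re-encoding the winner on its own, you could slice its code out of the batch latents.pt (a sketch; the index 3 is hypothetical, pick whichever of yours was best):

python - <<'EOF'
import torch
latents = torch.load("latents.pt")            # shape (N, 18, 512): one W+ code per image
torch.save(latents[3:4], "latents_single.pt") # keep image 3 only; slicing keeps the batch dim
EOF

Then copy latents_single.pt to data/ffhq/latents.pt in place of the batch file.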

cd global
PATH_TO_LATENTS="/home/louis/Pictures/2021/curios/style_test/results_louis_good/latents.pt"
cp "$PATH_TO_LATENTS" data/ffhq/
python GetGUIData.py --real

Your inversion image (which encoder4editing generated alongside latents.pt) will now be reproduced at global/data/ffhq/0.jpg, alongside a w_plus.npy, so you can open the GUI viewer and interact with the image via CLIP using text prompts.
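At this point the folder should contain at least the following (based on my run; GetGUIData.py may produce more):

ls data/ffhq
# 0.jpg  latents.pt  w_plus.npy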

Using your custom image in StyleCLIP

python PlayInteractively.py
  • The default --dataset-name is "ffhq" (see the repo README for other options). FFHQ has been precomputed for use here; I expect that changing the dataset would require running the preprocessing script for a few hours.