Setting up the stylegan2 environment - jimenalozano/face-generator GitHub Wiki

Overview of the requirements of stylegan2

Linux is recommended for performance and compatibility reasons.
64-bit Python 3.6 installation
Numpy 1.14.3 or newer.
The StyleGAN2 authors recommend TensorFlow 1.14, which they used for all experiments in the paper, but TensorFlow 1.15 is also supported on Linux. TensorFlow 2.x is not supported.
One or more high-end NVIDIA GPUs
NVIDIA drivers
CUDA 10.0 toolkit
cuDNN 7.5
StyleGAN2 relies on custom TensorFlow ops that are compiled on the fly using NVCC.
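
A quick way to see how a machine compares against this list is a small shell check. The sketch below is only illustrative: `version_ge` and `check_tool` are hypothetical helper names, `nvidia-smi` is assumed to be the tool shipped with the NVIDIA drivers, and the version comparison assumes GNU `sort` (for its `-V` option). It reports whether the tools are on the PATH; it does not validate a full installation.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: report whether NAME can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Tools implied by the requirements list above.
for tool in python3.6 nvcc nvidia-smi; do
    check_tool "$tool"
done

# Example comparison against the minimum Numpy version from the list.
if version_ge "1.16.0" "1.14.3"; then
    echo "numpy 1.16.0 satisfies >= 1.14.3"
fi
```

A "missing" line only means the tool is not on the current PATH; it may still be installed elsewhere, for example under `/usr/local/cuda-10.0/bin`.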

Hardware availability

Because this is a university project, ITBA provided us with an NVIDIA Titan Xp GPU on which to set up the Face Generator. We considered this the best option, with Docker as the second choice, because maintainability is one of the attributes we value most. We expect easy and straightforward access to the generator so that we can correct errors and improve its functionality while keeping the quality and precision of its results as high as possible. One of the first things we did once we had access to the server was to compile and run test_nvcc.cu with NVCC:

nvcc test_nvcc.cu -o test_nvcc -run

The CPU part of the test passed, but CUDA returned a GPU compatibility error: "non supported HW". This meant that the required CUDA version was not compatible with the one installed on the machine. Before we made any changes to a server that is used by other students and professors, ITBA gave us a system image with almost nothing installed on it, so we could start from scratch. In this way, we could avoid compatibility conflicts with previously installed drivers or software libraries.

As expected, when we logged in to the new server image for the first time, the NVCC test passed successfully, and we followed the steps below to start trying the stylegan2 generator.

Step by step

These were the steps we followed to set up a working machine learning environment:

Python 3.6 installation in virtual environment.
Numpy 1.16 installation in virtual environment. (Higher versions showed warnings)
Tensorflow-gpu 1.14 installation in virtual environment. (The plain tensorflow 1.14 package did not seem to recognize the GPU device.)
CUDA 10.0 toolkit installation on the Titan server, from NVIDIA, and cuDNN 7.5 from the cuDNN Archive.
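
The first three steps can be sketched as a small script. This is an illustration under assumptions, not necessarily the exact commands we ran: the environment name `stylegan2-env`, the `run` and `setup_env` helpers, and the `DRY_RUN` switch are all hypothetical, and `python3.6` is assumed to be on the PATH. By default the script only prints the commands; setting `DRY_RUN=0` executes them.

```shell
#!/bin/sh
# Hypothetical name for the virtual environment directory.
ENV_DIR="stylegan2-env"

# run CMD...: print the command in dry-run mode (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_env() {
    # Step 1: Python 3.6 virtual environment.
    run python3.6 -m venv "$ENV_DIR"
    # Step 2: Numpy 1.16 (newer versions showed warnings).
    run "$ENV_DIR/bin/pip" install "numpy==1.16.*"
    # Step 3: GPU build of TensorFlow 1.14.
    run "$ENV_DIR/bin/pip" install "tensorflow-gpu==1.14.0"
}

setup_env
```

Calling the environment's own `pip` avoids having to activate the environment inside the script. Step 4 (CUDA and cuDNN) is a system-level installation and is not covered by this sketch.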

While installing the CUDA 10.0 toolkit we ran into an issue: CUDA 11.2 was already installed, and TensorFlow could not recognize the GPU device. We found the steps to resolve the conflict on Stack Overflow:

sudo cp -P cuda/targets/ppc64le-linux/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp -P cuda/targets/ppc64le-linux/lib/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*

We then modified the .bashrc file:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH

We also added these lines to the .bashrc file, following this GitHub issue:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
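
The `${PATH:+:${PATH}}` form in the export lines is standard shell parameter expansion: it appends a `:` separator and the old value only when the variable is already set and non-empty. The sketch below reproduces that behavior with a hypothetical helper, `prepend_dir`, so it can be tried without touching the real `PATH`.

```shell
#!/bin/sh
# prepend_dir DIR LIST: print LIST with DIR placed in front of it.
# LIST is a colon-separated search path and may be empty.
# Hypothetical helper that mirrors the ${VAR:+:${VAR}} idiom.
prepend_dir() {
    existing="$2"
    # ${existing:+:${existing}} expands to ":<existing>" only if it is non-empty.
    printf '%s\n' "$1${existing:+:${existing}}"
}

prepend_dir /usr/local/cuda-10.0/bin "/usr/bin:/bin"
# prints: /usr/local/cuda-10.0/bin:/usr/bin:/bin

prepend_dir /usr/local/cuda-10.0/bin ""
# prints: /usr/local/cuda-10.0/bin
```

Because each export prepends its directory, the directory exported last is searched first. The `LD_LIBRARY_PATH` lines do not use this guard, so an empty starting value leaves a trailing `:`, which the dynamic loader may interpret as the current directory.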

The conflict persisted, and this Stack Overflow answer finally helped us solve it:

sudo rm /usr/local/cuda/lib64/libcudnn.so
sudo rm /usr/local/cuda/lib64/libcudnn.so.7
cd /usr/local/cuda/lib64/
sudo ln -s libcudnn.so.7.6.5 libcudnn.so.7
sudo ln -s libcudnn.so.7 libcudnn.so
sudo ldconfig -v
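
The commands above rebuild the usual shared-library link chain, in which `libcudnn.so` points to `libcudnn.so.7`, which in turn points to the versioned file. The sketch below recreates that chain in a scratch directory, so nothing under `/usr/local/cuda` is touched, and resolves it with `readlink -f` (GNU coreutils). The `scratch` variable, the empty stand-in file, and the `resolve_link` helper are hypothetical.

```shell
#!/bin/sh
# Scratch directory standing in for /usr/local/cuda/lib64.
scratch="$(mktemp -d)"

# Empty placeholder for the real versioned library file.
touch "$scratch/libcudnn.so.7.6.5"

# Same relative links as in the commands above.
ln -s libcudnn.so.7.6.5 "$scratch/libcudnn.so.7"
ln -s libcudnn.so.7 "$scratch/libcudnn.so"

# resolve_link PATH: print the name of the file a link chain finally points at.
resolve_link() {
    basename "$(readlink -f "$1")"
}

resolve_link "$scratch/libcudnn.so"    # prints: libcudnn.so.7.6.5
```

On the real system, `ls -l /usr/local/cuda/lib64/libcudnn*` shows the same chain, and `ldconfig -p | grep libcudnn` lists what the loader cache currently knows about.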

After all these steps were done, we were able to run the stylegan2 generator.
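
To confirm that TensorFlow recognizes the GPU after changes like these, a check along the following lines can be used. It is a sketch under assumptions: `check_tf_gpu` is a hypothetical helper, the `PYTHON` variable is assumed to point at the interpreter of the virtual environment, and `tf.test.is_gpu_available()` is the TensorFlow 1.x call that reports whether a GPU device is usable.

```shell
#!/bin/sh
# check_tf_gpu [PYTHON]: print True/False depending on whether TensorFlow
# can use a GPU, or a short message if TensorFlow cannot be imported.
check_tf_gpu() {
    py="${1:-python}"
    if ! "$py" -c 'import tensorflow' >/dev/null 2>&1; then
        echo "tensorflow not importable with $py"
        return 1
    fi
    # TensorFlow writes its device logs to stderr; the result goes to stdout.
    "$py" -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
}

# PYTHON is an assumed override, e.g. the virtual environment's bin/python.
check_tf_gpu "${PYTHON:-python}" || echo "check could not be completed"
```

A result of `False` with TensorFlow installed usually points back to the CUDA and cuDNN library paths described above.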