Using Other Neural Models
Neural Models
This page is still under construction; testing and/or training alternative models is definitely not for beginners. If you want to help with testing or share your findings, feel free to get in contact via issue #217 or send a message to the editors.
Introduction to models
In addition to the aforementioned VGG-19 and NIN models, there are other models available which can be used with neural-style. This page aims to list most of them: what they do, when to use them, how to use them, how they perform and where to get them. It is recommended to never remove the pre-installed files in your /neural-style-master/models folder.
What are neural models
Neural models themselves are quite complicated; some see them as the beginnings of AI, while others see them as a gimmick. For neural-style, however, they are ways to roughly interpret your image like a human would. Some people see a glass as half empty while others see it as half full; some people see a bunch of barf on a white piece of paper, others see true art. The same can be said for different neural models: depending on how they are built and what they have experienced, they interpret things differently.
Trained models vs untrained networks
Neural networks need to be trained with images in order to be useful in neural-style. One should understand the difference between the structure of a neural network and a fully trained model, which is roughly the difference between a computer without software and one with it.
VGG19, for instance, refers to a certain neural network structure defined in the deploy (or train) prototxt file. When we talk of the VGG19 model which we use in neural-style, we are actually referring to a VGG19 network trained on the ILSVRC image dataset to classify images. You should be aware that there can (and do) exist VGG19 networks trained differently, and that this can make a difference in how they behave in neural-style. A model trained with different data will respond differently to features in images. In fact, on this page you will find pointers to several differently trained models based on VGG16 and VGG19. They all share the same structure (VGG16 or VGG19 respectively, so that the same prototxt file works), but the weights contained in the .caffemodel file are different, causing the model to behave in a different way. Note that weight here refers to the weights affecting the operation of the artificial neurons within the model, not the weight parameters in neural-style.
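For example, because the differently trained VGG-16 models listed on this page share the same structure, you can keep the same deploy prototxt and swap only the weights file (a minimal sketch; it assumes both .caffemodel files are already in your models folder and elides the remaining parameters):
th neural_style.lua -proto_file models/VGG_ILSVRC_16_layers_deploy.prototxt -model_file models/VGG_ILSVRC_16_layers.caffemodel [other parameters]
th neural_style.lua -proto_file models/VGG_ILSVRC_16_layers_deploy.prototxt -model_file models/VGG16_SOD_finetune.caffemodel [other parameters]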
Installing new models
Neural-style should work with models in .caffemodel format unless they contain features not supported in torch (such as AlexNet, which needs GROUP).
A model consists of two files: a .caffemodel and a .prototxt. The .caffemodel file is the actual model, and the .prototxt describes the structure of the neural network. These files can in theory be copied to any folder inside your /neural-style-master folder, but it's recommended to copy them to /neural-style-master/models to keep things organized. If you decide to copy them somewhere else, you'll have to adjust the commands accordingly.
NOTE: If you start experimenting with additional models, it is good practice to create subdirectories in the models directory, because there is no convention for naming models, so you may find several different files named model.caffemodel and train.prototxt.
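For example, one possible layout (the directory names here are just examples, not a convention of this repository):
cd neural-style-master
mkdir -p models/deeplab_v1
# Keep each model's files together, then point neural-style at the full path, e.g.:
# -model_file models/deeplab_v1/model.caffemodel -proto_file models/deeplab_v1/deploy_x30.prototxt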
To use a model in neural-style, you need a prototxt with either "deploy" or "train" in the filename. In many cases a deploy prototxt is not provided, but a train.prototxt also has the necessary contents to describe the network. Prototxt files named "solver" only contain settings for the training process.
Basic usage
To use the newly installed models, you'll have to call them using the parameters -model_file models/[modelname] and -proto_file models/[protoname], in which [modelname] and [protoname] are the full names of the files including their extension.
In addition, you also have to specify which layers to use with -content_layers and -style_layers; the layers that have to be called differ for each model. You can find them in the .prototxt file, in blocks like this:
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1" # This is the name you'll use for -content_layers and -style_layers
  type: RELU      # The type needs to be RELU
}
Write down all the type: RELU layer names and separate them with commas; in the end it should look like this:
-content_layers relu2,relu5,relu8,relu11 -style_layers relu2,relu5,relu8,relu11
Note that this example uses the exact same layers for both the content and the style.
Comment: You CAN use or omit layers freely for content and style, according to what you want or whether you want to experiment; different layers contribute to the result in different ways. But you do need to give the layers explicitly for both style and content. Note also that higher layers may give smaller loss values, so if you omit some lower layers you might want to increase the content weight accordingly.
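If you'd rather not scan the whole prototxt by hand, a quick way to list the candidate layer names is to search it for the RELU blocks (a sketch assuming the field order shown above, with name: on the line directly before type:; field order can vary between prototxt files):
grep -B 1 'type: RELU' models/train_val.prototxt | grep 'name:'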
NIN example
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/nin_imagenet_conv.caffemodel -proto_file models/train_val.prototxt -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12
Model List
This section will try to list most working models and all their details; anyone can add their findings using the following skeleton:
model name by company/creator/group
short description of the model, what it is trying to achieve, what the results were and how it performed
Model file: fullmodelname.caffemodel
Proto file: fullprotoname.prototxt
Layers used: relulayer#,relulayer#,relulayer#,relulayer#,relulayer#
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/fullmodelname.caffemodel -proto_file models/fullprotoname.prototxt -content_layers relulayer#,relulayer#,relulayer#,relulayer#,relulayer# -style_layers relulayer#,relulayer#,relulayer#,relulayer#,relulayer#
Source: link to where caffemodel and prototxt can be downloaded
VGG-19 by VGG team
The standard caffemodel used in neural-style; this one should already be installed. Creates good results without tweaking, but uses a high amount of resources even with smaller images. The normalized version is also included.
Model file: VGG_ILSVRC_19_layers.caffemodel or vgg_normalised.caffemodel
Proto file: VGG_ILSVRC_19_layers_deploy.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/VGG_ILSVRC_19_layers.caffemodel -proto_file models/VGG_ILSVRC_19_layers_deploy.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Normalized command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/vgg_normalised.caffemodel -proto_file models/VGG_ILSVRC_19_layers_deploy.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: Github
NIN-Imagenet by ImageNet project
A small caffemodel trained on ImageNet. It uses fewer resources and can therefore be used to achieve higher resolution images. May need heavy tweaking to achieve reasonable results.
Model file: nin_imagenet_conv.caffemodel
Proto file: train_val.prototxt
Layers used: relu0,relu3,relu7,relu12
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/nin_imagenet_conv.caffemodel -proto_file models/train_val.prototxt -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12
Source: Google Drive
https://www.dropbox.com/s/cphemjekve3d80n/nin_imagenet.caffemodel
https://www.dropbox.com/s/faj8t135bh1i45t/train_val.prototxt
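If you prefer the terminal, one option is to fetch the Dropbox mirrors directly (assuming the links above still resolve; ?dl=1 requests a direct download, and note that this mirror's caffemodel is named nin_imagenet.caffemodel rather than nin_imagenet_conv.caffemodel, so adjust -model_file accordingly if you use it):
wget -O models/nin_imagenet.caffemodel "https://www.dropbox.com/s/cphemjekve3d80n/nin_imagenet.caffemodel?dl=1"
wget -O models/train_val.prototxt "https://www.dropbox.com/s/faj8t135bh1i45t/train_val.prototxt?dl=1"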
Illustration2vec model by the Illustration2Vec team
Best used with anime content and/or style images. Outputs can be scaled past 8k resolution using Waifu2x. 8k resolution example. Be warned that it can be difficult to avoid the burn marks that the model sometimes creates.
Model file: illust2vec_ver200.caffemodel
Proto file: illust2vec.prototxt
Basic Command:
th neural_style.lua -style_image style.jpg -content_image content.jpg -output_image out.png -model_file models/illust2vec_ver200.caffemodel -proto_file models/illust2vec.prototxt
Source:
http://illustration2vec.net/models/illust2vec.prototxt
http://illustration2vec.net/models/illust2vec_ver200.caffemodel
Model Files are from this github page:
https://github.com/rezoo/illustration2vec
The links for the prototxt file seem to be broken, so here's a mirror I made: https://gist.github.com/ProGamerGov/9d2bee7110159611519e7fecbd31c1ae
The caffemodel can be downloaded via:
wget https://github.com/rezoo/illustration2vec/releases/download/v2.0.0/illust2vec_ver200.caffemodel
VGG-ILSVRC-16 also known as VGG-16 by VGG team
Similar to VGG-19, but seems to work better with finer details like faces on some content images. Similar resource usage to VGG-19. Released in 2014.
Model file: VGG_ILSVRC_16_layers.caffemodel
Proto file: VGG_ILSVRC_16_layers_deploy.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/VGG_ILSVRC_16_layers.caffemodel -proto_file models/VGG_ILSVRC_16_layers_deploy.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: Github
CNN Object Proposal Models for Salient Object Detection by VGG16_SOD_finetune team
Similar to VGG-ILSVRC-16; hard to tell which is better (it may be better than VGG-ILSVRC-16, but more testing is needed to confirm). Same resource usage as VGG-16. Released in 2016.
Model file: VGG16_SOD_finetune.caffemodel
Proto file: deploy.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/VGG16_SOD_finetune.caffemodel -proto_file models/deploy.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: Github
VGG-16 Places365 by MIT
Made for the Places365-Challenge which includes the Places2 Challenge 2016, the ILSVRC and the COCO joint workshop at ECCV 2016. Places365 is the successor to the Places205 model.
Model file: vgg16_places365.caffemodel
Proto file: deploy_vgg16_places365.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/vgg16_places365.caffemodel -proto_file models/deploy_vgg16_places365.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: Github
VGG16 Hybrid1365 by MIT
Made for the Places365-Challenge which includes the Places2 Challenge 2016, the ILSVRC and the COCO joint workshop at ECCV 2016. Places365 is the successor to the Places205 model.
Model file: vgg16_hybrid1365.caffemodel
Proto file: deploy_vgg16_hybrid1365.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/vgg16_hybrid1365.caffemodel -proto_file models/deploy_vgg16_hybrid1365.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: Github
Prototxt File is missing from the Github page but can be found here: http://places2.csail.mit.edu/models_places365/deploy_vgg16_hybrid1365.prototxt
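For example, to fetch it straight into the models folder (assuming the URL above still resolves):
wget -O models/deploy_vgg16_hybrid1365.prototxt http://places2.csail.mit.edu/models_places365/deploy_vgg16_hybrid1365.prototxt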
PASCAL VOC FCN-32s by University of California, Berkeley
Uses more resources than VGG-19, but can produce better results depending on your style and/or content image.
Model file: fcn32s-heavy-pascal.caffemodel
Proto file: train.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/fcn32s-heavy-pascal.caffemodel -proto_file models/train.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: Github
DeepLab Pre-Trained Model by DeepLab
I was not able to test the higher layers because they require more resources than I had available; the first 3 layers produce very nice results. It seems to perform better with art style images as opposed to real-world picture style images. When using the layers relu1_1,relu1_2,relu2_2 for both style and content, resource usage is similar to VGG-16. Results appear similar to PASCAL VOC FCN-32s, and if you are only using these 3 layers, you can create higher resolution images than PASCAL VOC FCN-32s can.
Model file: model.caffemodel
Proto file: deploy_x30.prototxt
Layers used: relu5_3,relu7,relu5_2,relu5_1,relu4_1,relu3_3,relu3_1,relu2_2,relu1_2,relu1_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/model.caffemodel -proto_file models/deploy_x30.prototxt -content_layers relu1_1,relu1_2,relu2_2 -style_layers relu1_1,relu1_2,relu2_2
Source: Project page
Download Version 2 of this model here. Direct download link here.
SOD Finetune "Low Noise" by ProGamerGov
A specially fine-tuned version of the SOD Finetune model. It was fine-tuned on a custom dataset. This model may not be significantly different from the original SOD Finetune model. A comparison of the Neural-Style outputs produced by the SOD Finetune "Low Noise" model, and the VGG-16 Places365 Hybrid "Plaster Version", can be found here: https://i.imgur.com/0mn3gWI.png
Model file: VGG16_SOD_finetune_low_noise.caffemodel
Proto file: vgg16_low_noise.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/VGG16_SOD_finetune_low_noise.caffemodel -proto_file models/vgg16_low_noise.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: https://drive.google.com/open?id=171Q6xFWuYqAPCFm4TItXNAXpIke4v8eX
VGG-16 Places365 Hybrid "Plaster Version" by ProGamerGov
A specially fine-tuned version of the MIT VGG16-hybrid1365 model. It was fine-tuned on a custom dataset. A comparison of the Neural-Style outputs produced by the SOD Finetune "Low Noise" model, and the VGG-16 Places365 Hybrid "Plaster Version", can be found here: https://i.imgur.com/0mn3gWI.png
Model file: VGG-ImageNetPlaces365-hybrid_plaster.caffemodel
Proto file: vgg16_plaster_train_val.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/VGG-ImageNetPlaces365-hybrid_plaster.caffemodel -proto_file models/vgg16_plaster_train_val.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: https://drive.google.com/open?id=1C6mwikdQTxcav_9u2C8uuCVKd1_tWfyG
Rough_Faces by ProGamerGov
Based on the VGG16_SOD_finetune model architecture. Finetuned using a roughly sorted data set composed of male and female faces. Same resource usage as VGG-16. Some more information on the creation of the model can be found here. Released in 2017.
The Rough Faces model was created to improve the preservation of facial features with style transfer. Models that were saved every 1000 iterations during the training process are available for download, but I would suggest using models from 10000 and above for the best facial preservation effect.
Model file: VGG16_SOD_finetune_rough_faces_iter_10000.caffemodel
Proto file: vgg16_train_val.prototxt
Layers used: relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Basic command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/VGG16_SOD_finetune_rough_faces_iter_10000.caffemodel -proto_file models/vgg16_train_val.prototxt -content_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1
Source: https://drive.google.com/open?id=0BxzV7RTyx-ZAUE9SQ0lpX0tBcGM
Download all of the models listed from NIN to DeepLab V1 by pasting the commands from this gist into the terminal:
https://gist.github.com/ProGamerGov/ccf80be4b63f62dd6cf60db0b5c7ddbf
Note that the following caffemodel and prototxt files had their names changed:
VGG16 SOD Finetune:
- deploy.prototxt to VGG16_SOD_finetune_deploy.prototxt
PASCAL VOC FCN-32s:
- train.prototxt to fcn32s-heavy-pascal_train.prototxt
DeepLab V1:
- model.caffemodel to DeepLab_V1.caffemodel
- train.prototxt to DeepLab_V1_train.prototxt
- train_trainval_aug.prototxt to DeepLab_V1_train_trainval_aug.prototxt
List of working caffemodels
These have yet to be converted to the skeleton format, because they need testing to see what results they can achieve and how many resources they use.
VGG versions with fewer layers (VGG-16, VGG-13, etc.)
- These use fewer resources but have worse recognition as a result. (via: sdz)
- htoyryla: For me, VGG16 ILSVRC works very similarly to VGG19 ILSVRC. On the other hand, the memory footprint is not significantly smaller either. Note also that there are several differently trained VGG16 models (see below).
- ProGamerGov: VGG16 ILSVRC seems to work better for finer details like faces, for some content and style images.
VGG Face for face recognition
- This one is trained to recognize faces, though it has not produced great results. (via: Hannu)
- Download from http://www.robots.ox.ac.uk/~vgg/software/vgg_face/ (vgg_face_caffe.tar.gz)
- This network was trained to recognize the faces of celebrities. I (htoyryla) have tried to use it to process portrait photos without much success; see http://liipetti.net/erratic/2016/02/19/the-endless-possibilities-of-neural-networks/ . It seems to me that the model does not generalize well for facial features (which would be needed for processing portraits).
VGG Places for landscapes/cityscapes
- Produced good results for landscapes and cityscapes. (via: Hannu)
- Download the files from http://places.csail.mit.edu/downloadCNN.html (link for Places205-VGG): a VGG-16 CNN trained on 205 scene categories of the Places Database with 2.5 million images.
- Memory usage is typical for VGG16, around 3.8G on CPU for default images at 512px.
- Basic results to compare with other models here.
Basic Command:
th neural_style.lua -style_image [image1] -content_image [image2] -output_image [outimage] -model_file models/snapshot_iter_765280.caffemodel -proto_file models/deploy_10.prototxt
VGG deeplab
- There are many VGG16 models trained differently at http://ccvl.stat.ucla.edu/ccvl
- Start from model.caffemodel and train.prototxt from http://ccvl.stat.ucla.edu/ccvl/DeepLab/ ; they work well with neural-style defaults. Memory usage is typical for VGG16, around 3.8G on CPU for default images at 512px.
- Tested the FOV model. Can create square images up to 680px with the opencl backend and 8GB of VRAM; CUDA performance should be better. adam has almost no effect on VRAM usage. (via: sdz)
- FOV has great results with "long" images, even with adam; other types of images have slightly worse results than with a normal VGG-16 model. Note that long images can be used with a higher -image_size (see the sketch after this list). (via: sdz)
- FOV has better resource usage than VGG-16 models when using "long" images. (via: ProGamerGov)
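As an illustration of the long-image note above, a sketch only (the model and proto filenames are placeholders for whichever FOV files you downloaded, and the workable -image_size depends on your VRAM):
th neural_style.lua -style_image [image1] -content_image [long_image] -output_image [outimage] -model_file models/[fov_model].caffemodel -proto_file models/[fov_proto].prototxt -image_size 1024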
VGG codalab
- htoyryla: found out that the VGG19 from the codalab site is identical to the VGG19 ILSVRC, which is the default in neural-style.
Unnamed model
- Did not work when using all layers, but after I specified certain layers it would produce good results.
- Great resource usage: got up to 1500px with 8GB of VRAM, comparable to the NIN model.
- These layers were specified: relu1,relu2,relu3_1,relu3_2,relu4_1,relu4_2,relu5_1,relu5_2
VGG Pose-Aware Models
- There are 5 models, released in July 2016.
- Typical VGG-16 resource usage on the PAM 2D in-plane frontal model, which seems to produce solid results, though I am not sure about the others. Results seemed better when the human head in the image was sideways, as opposed to facing forward. More testing is needed to understand how well it works.
This is it for now; more time is needed to test all the other available models. You can find more models and information about them below. Feel free to test them for yourself and add them to this wiki using the provided skeleton.
Sources
Hannu's blog, testing with different neural models
Many thanks to Hannu (htoyryla) for informing me about this; he'll probably correct all the mistakes I unknowingly made in this article, be they direct or indirect.