For Beginners
Basic Neural-Style Commands:
- The `-init` parameter is set to `random` by default, though from what I have seen, most people get better results by setting it to `image`.
- I would also suggest using a `-seed` value of your choice, or a random value. The `-seed` value will let you run the exact same command again, to create the exact same image. This is useful for experimenting with parameters, while ruling out changes caused by the random default `-seed` value that Neural-Style uses (see the first example command after this list).
- Make sure to save the parameters you use in something like a text file, so that you can refer back to them and learn from them in the future.
- The `-save_iter` parameter can be used to monitor the progress of Neural-Style, and to make sure your chosen parameters are working. By default, `-save_iter` outputs an image every 100 iterations, though I have found that outputting an image every 50 iterations works better for debugging.
- The final output image doesn't have to be the artwork piece you choose to share. By using the `-save_iter` parameter effectively, you can end up with many different variations of your output image saved at different iterations. For example, I had a nice output image that looked the best at 50 iterations and at 400 iterations, so while the two images are a bit different, I kept them both to share.
- An interesting Neural-Style phenomenon is increasing or decreasing both the content and style weights so that their ratio stays the same. Many of us know that doing so gives different results: higher absolute weights result in higher loss, which gives higher gradients, which in turn result in larger changes per iteration (similar to a higher learning rate). Source (see the weight examples after this list).
- Rotating, reflecting, and/or cropping either your style or content image will change the output image that Neural-Style produces.
- If your style image has a lot of horizontal or vertical lines/shapes and you want the final output to match the orientation of the content image's geometry, make 3 more copies of your style image, with each copy rotated an additional 90 degrees. Then use all 4 versions of the style image as your new style images (see the rotation example after this list). Feel free to experiment with different numbers of style images, and rotation values. These 2 scripts automate the tedious process of creating rotated and/or reflected versions of your style image(s). Note that this technique will remove geometry like rain falling at an angle, so it will not work for every style image.
- Listing the same style image multiple times, for example `-style_image style1.png,style1.png,style2.png`, will also change the output image, as you are adjusting the style blend weights in a different way than the `-style_blend_weights` parameter does.
- VGG-16 models seem to generally produce "smoother" results than VGG-19 models, which create more fine details.
- Depending on the style image(s) and content image, `-optimizer adam` and `-optimizer lbfgs` can create almost the same result, or very different results. The ADAM optimizer is generally considered to be "worse" than the L-BFGS optimizer, but I have come across situations where ADAM created the better final output (see the optimizer example after this list).
- Neural-Style supports the image formats supported by the Torch Image library, which are `JPEG`, `PNG`, `PPM`, and `PGM`.
- The order in which the style images are listed with the `-style_image` parameter does have an effect on the output image.
- The chosen `-num_iterations` value also affects output images. For example, using `-num_iterations 1500` and `-save_iter 500` will result in a different output at iteration 500 than using `-num_iterations 5500` and `-save_iter 500`.
- The `-cudnn_autotune` parameter can speed up the style transfer process, at the cost of increasing the amount of GPU memory that is required. As a result, omitting the `-cudnn_autotune` parameter leaves more GPU memory available for larger values of parameters like `-image_size` and `-style_scale`.
- When using multiscale resolution, a `-tv_weight` of `0` seems to give a better result (see the multiscale example after this list). Source. But some artists create really nice looking output with a total variation weight that is greater than zero.
- Using an extremely low `-tv_weight` value that's balanced at just the right level, one can destroy the unwanted artifacts that the NIN model creates, while not suffering the issues that a TV weight greater than zero can cause.
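To illustrate the `-init`, `-seed`, and `-save_iter` points above, here is a minimal example command. The file names (`content.jpg`, `style.jpg`, `out.png`) and the seed value are placeholders of my own, not anything from the repository:

```
# Reproducible baseline: initialize from the content image, fix the seed,
# and save a snapshot every 50 iterations for easier debugging.
th neural_style.lua \
  -content_image content.jpg \
  -style_image style.jpg \
  -output_image out.png \
  -init image \
  -seed 42 \
  -save_iter 50
```

Running this exact command a second time should reproduce the same output, because the seed is fixed.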
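For the weight-ratio observation, both of the commands below keep the same 1:20 content-to-style ratio (the first pair matches the defaults), but the absolute magnitudes differ, so the per-iteration changes differ. The values are illustrative, not recommendations:

```
# 1:20 ratio with low absolute weights: smaller per-iteration changes.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -content_weight 5 -style_weight 100

# Same 1:20 ratio with high absolute weights: larger per-iteration changes,
# behaving somewhat like a higher learning rate.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -content_weight 50 -style_weight 1000
```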
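For the rotation trick, here is one way to prepare and use the 4 orientations. The `convert` commands assume ImageMagick is installed, and the file names are placeholders:

```
# Create the three additional rotated copies of the style image.
convert style_0.png -rotate 90  style_90.png
convert style_0.png -rotate 180 style_180.png
convert style_0.png -rotate 270 style_270.png

# Use all four orientations as style images, blended equally.
th neural_style.lua -content_image content.jpg \
  -style_image style_0.png,style_90.png,style_180.png,style_270.png \
  -style_blend_weights 1,1,1,1
```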
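To compare the two optimizers on the same images, you only need to change the `-optimizer` parameter; note that `-learning_rate` only applies to ADAM:

```
# L-BFGS, the default optimizer.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -optimizer lbfgs

# ADAM; typically uses less memory, and -learning_rate (default 10)
# only has an effect with this optimizer.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -optimizer adam -learning_rate 10
```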
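And here is a rough sketch of a simple two-pass multiscale run with `-tv_weight 0`, assuming your version of Neural-Style has the `-init_image` parameter; the sizes and file names are placeholders:

```
# Pass 1: a smaller image to establish the overall style.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -image_size 512 -tv_weight 0 -output_image pass1.png

# Pass 2: re-run at a higher resolution, initializing from the pass 1 output.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -image_size 1024 -tv_weight 0 -init image -init_image pass1.png \
  -output_image pass2.png
```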
Content And Style Image:
- Neural-Style uses a neural network to perform artistic style transfer, and as such it's easier to think of it as an AI (though neural networks are not AI, and are seen as a stepping stone towards the creation of real AI). This means that in addition to your chosen parameters, the model (the "AI brain") you are using will affect the abilities of Neural-Style. It also means that the lighting, details, and other aspects of an image will affect Neural-Style as well.
- Neural-Style does not understand what "image quality" is. This means that using a low quality or low resolution style image will result in a blurry output image.
- Various tools exist that let you see an image as Neural-Style sees it.
More Advanced Observations:
- Style transfer normally runs into issues with extremely bright or extremely dark colors (possibly not with similar style and content image contents, like a photo and a painting of people at the same zoom). Using content and style images with similar shapes/content can alleviate these issues. These Histogram Matching Scripts also try to modify the brightest and darkest colors, and thus can be used to help alleviate the issues with dark and light colors. Histogram matching also seems to improve the spatial coherence of the final output in terms of coloration (i.e., instead of partially blended "color spots", the color locations will be placed in a way that better matches the styled content image).
- Style and content images seem to blend better when they share complementary or similar geometry and shapes (that are also similar sizes). Though this is more of a possible guideline, and it does not apply to every combination of content and style image(s).
- Motion blur, or any other filter or lens effect that makes some of an image blurry, can and will be made non-blurry by the style transfer process if the image is used as a content image.
- Some art mediums seem to work really well as content images, though the exact mediums (realistic paintings?) and the reason why are still a mystery.
- Upwards of 1000 style images have been tested at once with no apparent memory issues, though Neural-Style can take a while to process large numbers of style images used in the same command.
- The `-style_scale` parameter works by changing the size of your style image(s). By default, the style is transferred at a 1:1 scale onto the content image. If you use a value lower than 1, then bigger objects (like people, structures, etc.) and geometry in your style image are made smaller as they are transferred to the content image. If you use a value larger than 1, then the opposite happens (see the example after this list).
- One of the ways to create larger images is to use tiling. Tiling solutions are separated into 2 groups: internal and external. External scripts like Neural-Tile require a bash/shell script and careful planning of the `-save_iter` and `-num_iterations` parameters. Internal solutions like VaKonS' neural-style handle all the tiling internally, at the cost of control over the individual tiles. Internal tiling solutions seem to produce no visible lines/regions between tiles, whereas external solutions do produce visible lines/regions separating tiles.
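As a quick sketch of how `-style_scale` changes results (file names are placeholders):

```
# Style features are transferred smaller than in the original style image.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -style_scale 0.5

# Style features are transferred larger; higher values also use more GPU memory.
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -style_scale 1.5
```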
Troubleshooting:
If you end up getting loss values appearing as `nan` or `inf`, you can try changing the `-image_size` value ever so slightly to resolve the issue, as in the sketch below.
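For example, if a run at `-image_size 1024` produces `nan` or `inf` loss values, nudging the size slightly is often enough. The exact numbers here are only illustrative:

```
# A command that produces nan/inf loss values:
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -image_size 1024

# The same command with the image size changed ever so slightly:
th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -image_size 1020
```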
Scripts:
- Neural-Style scripts are essentially add-ons for Neural-Style, though in some cases they are actual modifications of the Neural-Style code rather than new ways to use the normal code.