Optimization: Size - unhappy-ending/gentoo-clang-served-rice GitHub Wiki

These are Gentoo's default compiler settings when using the LLVM profile, and the following examples are tested against them. dev-build/cmake and dev-build/ninja are used as test subjects because they make minimal use of developer-set *FLAGS and are written in C++, which allows de-virtualization testing.

-O2 -pipe -Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs

# stat /usr/bin/cmake
Size: 8375264

# stat /usr/bin/ninja-reference
Size: 283736
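
The percentage column in the tables on this page appears to be the relative change against these baseline sizes. The following is a minimal sketch of that arithmetic, not the author's tooling: the function names `installed_size` and `size_change_percent` are hypothetical, and the formula (optimized - baseline) / baseline * 100 is an assumption that matches the reported numbers.

```python
import os

# Baseline sizes in bytes, taken from the stat output above.
CMAKE_BASELINE = 8375264
NINJA_BASELINE = 283736


def installed_size(path: str) -> int:
    """Return the file size in bytes, the same value stat prints as 'Size:'."""
    return os.path.getsize(path)


def size_change_percent(baseline: int, optimized: int) -> float:
    """Relative size change in percent; a negative value means the binary shrank."""
    return (optimized - baseline) / baseline * 100.0


# Example using the cmake size reported on this page for -Wl,--icf=all.
print(f"{size_change_percent(CMAKE_BASELINE, 8019008):+.2f}")  # prints -4.25
```

On a live system, `installed_size("/usr/bin/cmake")` could replace the hard-coded numbers after each rebuild.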

Code generation

The following examples show how code generation flags can change a program's size during compile and link time.

-fdata-sections

# stat /usr/bin/cmake
Size: 8375168

# stat /usr/bin/ninja-reference
Size: 283736

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8375168 | -0.001 |
| ninja | 283736 | 283736 | 0.00 |

-ffunction-sections

# stat /usr/bin/cmake
Size: 8375280

# stat /usr/bin/ninja-reference
Size: 283736

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8375280 | +0.0002 |
| ninja | 283736 | 283736 | 0.00 |

-Wl,--gc-sections

# stat /usr/bin/cmake
Size: 8357120

# stat /usr/bin/ninja-reference
Size: 283344

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8357120 | -0.22 |
| ninja | 283736 | 283344 | -0.14 |

-Wl,--icf=all

# stat /usr/bin/cmake
Size: 8019008

# stat /usr/bin/ninja-reference
Size: 263176

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8019008 | -4.25 |
| ninja | 283736 | 263176 | -7.25 |

-Wl,-O2

# stat /usr/bin/cmake
Size: 8359632

# stat /usr/bin/ninja-reference
Size: 283464

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8359632 | -0.19 |
| ninja | 283736 | 283464 | -0.10 |

-fdata-sections -Wl,--gc-sections

# stat /usr/bin/cmake
Size: 8343072

# stat /usr/bin/ninja-reference
Size: 283240

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8343072 | -0.38 |
| ninja | 283736 | 283240 | -0.17 |

-ffunction-sections -Wl,--gc-sections

# stat /usr/bin/cmake
Size: 8175408

# stat /usr/bin/ninja-reference
Size: 273920

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8175408 | -2.39 |
| ninja | 283736 | 273920 | -3.46 |

-fdata-sections -ffunction-sections -Wl,--gc-sections

# stat /usr/bin/cmake
Size: 8184016

# stat /usr/bin/ninja-reference
Size: 262760

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8184016 | -2.28 |
| ninja | 283736 | 262760 | -3.51 |

-fdata-sections -ffunction-sections -Wl,--icf=all

# stat /usr/bin/cmake
Size: 7917680

# stat /usr/bin/ninja-reference
Size: 262760

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 7917680 | -5.46 |
| ninja | 283736 | 262760 | -7.39 |

-fdata-sections -ffunction-sections -Wl,--gc-sections -Wl,--icf=all

# stat /usr/bin/cmake
Size: 7751248

# stat /usr/bin/ninja-reference
Size: 253128

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 7751248 | -7.45 |
| ninja | 283736 | 253128 | -10.79 |

-fdata-sections -ffunction-sections -Wl,--gc-sections -Wl,--icf=all -Wl,-O2

# stat /usr/bin/cmake
Size: 7737568

# stat /usr/bin/ninja-reference
Size: 252952

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 7737568 | -7.61 |
| ninja | 283736 | 252952 | -10.85 |

-Wl,--gc-sections -Wl,--icf=all -Wl,-O2

# stat /usr/bin/cmake
Size: 7992912

# stat /usr/bin/ninja-reference
Size: 262576

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 7992912 | -4.57 |
| ninja | 283736 | 262576 | -7.46 |

-Wl,--gc-sections -Wl,--icf=all

# stat /usr/bin/cmake
Size: 8006736

# stat /usr/bin/ninja-reference
Size: 262784

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8006736 | -4.40 |
| ninja | 283736 | 262784 | -7.38 |

Link time optimization

Passing -flto has the potential to reduce the size of a binary.

-flto

# stat /usr/bin/cmake
Size: 8154144

# stat /usr/bin/ninja-reference
Size: 261312

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8154144 | -2.64 |
| ninja | 283736 | 261312 | -7.90 |

De-virtualization

The following applies only to C++ code.

-fstrict-vtable-pointers

# stat /usr/bin/cmake
Size: 8337136

# stat /usr/bin/ninja-reference
Size: 279416

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8337136 | -0.46 |
| ninja | 283736 | 279416 | -1.52 |

The following examples require -flto to be passed. Passing -fwhole-program-vtables without -flto causes the compiler to fail with an error message. -fvirtual-function-elimination requires -fwhole-program-vtables, but omitting it does not cause a compiler failure; the flag is simply ignored and the resulting binary is unchanged.
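
The dependency rules in the paragraph above can be sketched as a small pre-flight check on a flag list. This is only an illustration of the rules as stated on this page: the helper names `has_lto` and `check_devirt_flags` are hypothetical, and treating `-flto=<mode>` spellings as equivalent to `-flto` is an assumption.

```python
def has_lto(flags):
    """True if -flto is present (assumption: -flto=<mode> spellings count too)."""
    return any(f == "-flto" or f.startswith("-flto=") for f in flags)


def check_devirt_flags(flags):
    """Return (errors, ignored) according to the rules described on this page.

    errors:  combinations the compiler is expected to reject.
    ignored: flags expected to be silently ignored.
    """
    errors, ignored = [], []
    if "-fwhole-program-vtables" in flags and not has_lto(flags):
        errors.append("-fwhole-program-vtables requires -flto")
    if ("-fvirtual-function-elimination" in flags
            and "-fwhole-program-vtables" not in flags):
        ignored.append("-fvirtual-function-elimination")
    return errors, ignored


flags = "-O2 -flto -fvirtual-function-elimination".split()
print(check_devirt_flags(flags))  # ([], ['-fvirtual-function-elimination'])
```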

-fwhole-program-vtables

# stat /usr/bin/cmake
Size: 8169040

# stat /usr/bin/ninja-reference
Size: 262360

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8169040 | -2.46 |
| ninja | 283736 | 262360 | -7.53 |

-fvirtual-function-elimination

# stat /usr/bin/cmake
Size: 8140128

# stat /usr/bin/ninja-reference
Size: 260648

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8140128 | -2.81 |
| ninja | 283736 | 260648 | -8.14 |

Combinations

Combining the above principles can offer various methods to maximize file size reduction. It's not always possible to use all methods for all code. Try to use what works best when possible.

Code generation + link time optimization

-fdata-sections -ffunction-sections -flto -Wl,--gc-sections -Wl,--icf=all -Wl,-O2

# stat /usr/bin/cmake
Size: 7851360

# stat /usr/bin/ninja-reference
Size: 254488

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 7851360 | -6.26 |
| ninja | 283736 | 254488 | -10.31 |

Link time optimization + de-virtualization

-flto -fstrict-vtable-pointers

# stat /usr/bin/cmake
Size: 8111456

# stat /usr/bin/ninja-reference
Size: 257408

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8111456 | -3.15 |
| ninja | 283736 | 257408 | -9.28 |

-flto -fstrict-vtable-pointers -fvirtual-function-elimination -fwhole-program-vtables

# stat /usr/bin/cmake
Size: 8087888

# stat /usr/bin/ninja-reference
Size: 256664

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 8087888 | -3.43 |
| ninja | 283736 | 256664 | -9.54 |

In these particular cases, combining link time optimization with de-virtualization or with the code generation flags doesn't reduce the binary size more than the code generation flags alone (-7.61 for cmake and -10.85 for ninja).

Code generation + Link time optimization + de-virtualization

-fdata-sections -ffunction-sections -flto
-fstrict-vtable-pointers -fvirtual-function-elimination -fwhole-program-vtables
-Wl,--gc-sections -Wl,--icf=all -Wl,-O2

# stat /usr/bin/cmake
Size: 7792240

# stat /usr/bin/ninja-reference
Size: 250080

| Binary | Baseline size (bytes) | Optimized size (bytes) | Size change (%) |
|---|---|---|---|
| cmake | 8375264 | 7792240 | -6.96 |
| ninja | 283736 | 250080 | -11.86 |

Notable results

The following results are notable for good file size reduction.

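| Flag(s) | cmake (%) | ninja (%) |
|---|---|---|
| -Wl,--icf=all | -4.25 | -7.25 |
| -fdata-sections -ffunction-sections -Wl,--gc-sections -Wl,--icf=all -Wl,-O2 | -7.61 | -10.85 |
| -flto | -2.64 | -7.90 |
| CGO + LTO + DVO (code generation + link time optimization + de-virtualization) | -6.96 | -11.86 |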

-Wl,--icf=all is particularly notable for being a single flag with good file size reduction, and it is available when using -fuse-ld=lld or -fuse-ld=mold. Unlike -flto, it doesn't increase build time to the degree that LTO does. It's also relatively safe to flip on, and -Wl,--icf=safe is a less aggressive alternative for those who desire a more conservative approach to cooking rice.

-flto by itself can reduce file size fairly well but comes with the drawback of increased link time. It also doesn't guarantee a performance increase and sometimes ends up with a performance penalty instead. When -flto was added to the code generation options (CGO), the file size reduction regressed compared to CGO alone (-6.26 versus -7.61 for cmake, -10.31 versus -10.85 for ninja).

A good overall option to reduce file size is the full set of code generation options (CGO). It's available for all C and C++ code, doesn't require LTO, and isn't restricted like the de-virtualization options (DVO), which are C++ only and rely heavily on LTO.

The best theoretical option is CGO+LTO+DVO, but as demonstrated with cmake (-6.96 versus -7.61 for CGO alone) it doesn't always produce the best results. It did produce the best file size reduction overall with ninja (-11.86), which is consistent with what the flags promise in theory.
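
The comparison above can be reproduced from the sizes reported on this page. The following is a rough sketch: the `RESULTS` table and the `smallest` helper are hypothetical names, and the short labels (CGO, LTO, DVO) are shorthand for the flag sets used in the corresponding sections.

```python
# Sizes in bytes as reported on this page: (cmake, ninja).
# "CGO" stands for -fdata-sections -ffunction-sections
# -Wl,--gc-sections -Wl,--icf=all -Wl,-O2.
RESULTS = {
    "baseline": (8375264, 283736),
    "-Wl,--icf=all": (8019008, 263176),
    "-flto": (8154144, 261312),  # ninja size taken from the -flto table row
    "CGO": (7737568, 252952),
    "CGO + LTO": (7851360, 254488),
    "CGO + LTO + DVO": (7792240, 250080),
}


def smallest(results, column):
    """Return the label with the smallest binary (column 0 = cmake, 1 = ninja)."""
    return min(results, key=lambda label: results[label][column])


print("cmake:", smallest(RESULTS, 0))  # cmake: CGO
print("ninja:", smallest(RESULTS, 1))  # ninja: CGO + LTO + DVO
```

Keeping raw byte counts rather than percentages in the table avoids rounding differences when results are compared.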


In conclusion

Small file size savings aren't massive gains, but as the OS grows larger the savings grow with it. Across the measured examples, the average is roughly a 9% size reduction. If that rate held for everything in a 10 GB /usr directory, about 0.9 GB of storage space would be saved compared to baseline. The same is true for programs residing in memory: the smaller the footprint, the more RAM is available for other needs.
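
The estimate above is simple arithmetic and can be sketched as follows. The page does not say which measurements were averaged; the mean of the two CGO+LTO+DVO results is used here only as one reading that lands near 9%. The function name `estimated_savings_gb` is hypothetical, and the calculation assumes every byte under /usr shrinks at the same rate, which is optimistic because /usr also holds files that are not compiled binaries.

```python
def estimated_savings_gb(total_gb: float, reduction_percent: float) -> float:
    """Storage saved if everything shrank by reduction_percent (an optimistic assumption)."""
    return total_gb * reduction_percent / 100.0


# One possible source of the ~9% figure: the mean of the CGO+LTO+DVO results
# reported on this page for cmake (6.96) and ninja (11.86).
mean_reduction = (6.96 + 11.86) / 2
print(f"{mean_reduction:.2f}")  # prints 9.41

# A 10 GB /usr directory at a 9% reduction.
print(f"{estimated_savings_gb(10, 9):.1f} GB")  # prints 0.9 GB
```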

As storage and RAM grow in capacity and drop in price, a 10% difference doesn't seem that impressive. People using constrained systems benefit the most, but free space is free space, so why not?


Caution!

It's important to note that dev-build/cmake fails during the test phase when passing -flto, -fforce-emit-vtables, -fvirtual-function-elimination, -fwhole-program-vtables, and -Wl,--icf=all.

The flags tested on this page are for experimental purposes only. Do not enable them system-wide without testing, and without being a capable troubleshooter of build and runtime failures. This applies especially to cmake, since it must build and run properly. It's an integral build tool, don't mess it up!