Link Time Optimization
Link Time Optimization (LTO) is a feature that allows the compiler to retain its internal representation of a program or module and use it later with different compilation units to perform optimizations during linking. Also see Link Time Optimization on the GCC wiki. This page will show you how to compile and link with LTO.
Generally speaking, Link Time Optimization slows the library down. Based on our Benchmark results, and all else being equal, you should probably avoid LTO. Even so, be sure to Benchmark your own program to determine whether LTO is profitable for it.
You should avoid Link Time Optimization with GCC on ARM platforms, where GCC LTO appears to be broken. GCC LTO on x86_64 and Aarch64 appears to be OK. You should also avoid Link Time Optimization with Clang on all platforms, because Clang LTO appears to be broken everywhere.
Issue 865, LTO build fails due to missing "-m" flags in linker command, showed we had gaps in our testing procedures because we were not testing under an LTO configuration.
Also see Can LTO minor version be updated in backward compatible way? on the GCC mailing list for some potential versioning troubles.
Several wiki pages cover command line builds using various compilers and platforms. Also see Category:Command Line.
GCC Options
You can build the library with LTO using the following GCC options. In addition to the GCC options, you must change AR to gcc-ar and RANLIB to gcc-ranlib. Crypto++ drives the link through the compiler, so you don't need to do anything with LD. The same CXXFLAGS are used for compile and link. The number in -flto=6 tells GCC how many parallel jobs to use for the link-time optimization step.
    $ AR=gcc-ar RANLIB=gcc-ranlib \
      CXXFLAGS="-DNDEBUG -O2 -flto=6 -g -fPIC -pthread" make -j 4
    Using testing flags: -DNDEBUG -O2 -flto=6 -g -fPIC -pthread
    g++ -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe -c cryptlib.cpp
    g++ -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe -c cpu.cpp
    g++ -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe -c integer.cpp
    ...
    gcc-ar r libcryptopp.a cryptlib.o cpu.o integer.o ...
    gcc-ranlib libcryptopp.a
    ...
    g++ -o cryptest.exe -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a
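Before starting a long build it may help to confirm that the wrapper tools are actually on PATH. The following is a minimal sketch, not part of the library's build system; `missing_tools` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: print the name of each tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage sketch: refuse to start an LTO build when a wrapper is absent.
# missing=$(missing_tools gcc-ar gcc-ranlib)
# [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
```

An empty result means every named tool was found. Anything printed is a tool to install or to point AR and RANLIB at explicitly.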
Clang Options
You can compile the library with LTO using the following Clang options. However, the build breaks during link. It appears Clang selects the wrong linker. Additionally, there is no clang-ar for AR and no clang-ranlib for RANLIB (the LLVM project names its archive tools llvm-ar and llvm-ranlib, and the build below does not use them). Trying to reuse GCC's gcc-ar and gcc-ranlib results in the same error.
    $ AR=gcc-ar RANLIB=gcc-ranlib \
      CXX=clang++ CXXFLAGS="-DNDEBUG -O2 -flto -g -fPIC -pthread" make -j 4
    Using testing flags: -DNDEBUG -O2 -flto -g -fPIC -pthread
    clang++ -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe -c cryptlib.cpp
    clang++ -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe -c cpu.cpp
    clang++ -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe -c integer.cpp
    ...
    gcc-ar r libcryptopp.a cryptlib.o cpu.o integer.o ...
    gcc-ranlib libcryptopp.a
    ...
    clang++ -o cryptest.exe -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a
    /bin/ld: ./libcryptopp.a: error adding symbols: archive has no index; run ranlib to add one
    clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
Using Bitcodes
An open question is whether the library should be built with LTO at all. We don't know the answer at the moment.
Our initial feeling is that shipping intermediate bitcode is probably a bad idea, because the code remains malleable and the final machine code cannot be audited at compile time. That is, the code can still change after compilation.
The concern is not specific to GCC or Clang LTO; it is a general concern about malleable object code, and it applies to Apple bitcode as well. Nor is the concern specific to Crypto++; it applies to all high integrity code modules, such as Botan and OpenSSL.
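One practical consequence is that it helps to know which object files hold intermediate code rather than finished machine code. GCC stores its LTO intermediate representation in ELF sections whose names begin with `.gnu.lto_`, so a section listing can be scanned for them. The following is a minimal sketch, assuming GNU binutils `readelf` is installed; `has_gcc_lto_sections` is a hypothetical helper name, and the check covers GCC objects only, not LLVM or Apple bitcode.

```shell
# Hypothetical helper: read a section listing on stdin and succeed
# if any line mentions a ".gnu.lto_" section (GCC's LTO IR sections).
has_gcc_lto_sections() {
  grep -q '\.gnu\.lto_'
}

# Usage sketch (needs binutils and a compiled object file):
# readelf -S --wide cryptlib.o | has_gcc_lto_sections \
#   && echo "cryptlib.o carries GCC LTO intermediate code"
```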
GCC ARM Platform
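Here is what the GCC LTO error looks like on ARM platforms. The fatal NEON error at the end appears to be the problem reported in Issue 865: with LTO, code generation happens at link time, and the "-m" architecture flags used when compiling individual source files are missing from the link command.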
    g++ -o cryptest.exe -DNDEBUG -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -flto=6 -g -fpic -fPIC -pthread -fopenmp adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a -lgomp
    pubkey.h:640:26: warning: type ‘struct TF_ObjectImpl’ violates the C++ One Definition Rule [-Wodr]
     class CRYPTOPP_NO_VTABLE TF_ObjectImpl : public TF_ObjectImplBase<BASE, SCHEME_OPTIONS, KEY_CLASS>
                              ^
    pubkey.h:640:26: note: a different type is defined in another translation unit
     class CRYPTOPP_NO_VTABLE TF_ObjectImpl : public TF_ObjectImplBase<BASE, SCHEME_OPTIONS, KEY_CLASS>
                              ^
    pubkey.h:651:11: note: a different type is defined in another translation unit
    ...
    make[1]: *** [/tmp/cc1QfZK2.ltrans17.ltrans.o] Error 1
    /usr/lib/gcc/arm-linux-gnueabihf/7/include/arm_neon.h: In function ‘BLAKE2_Compress32_NEON’:
    /usr/lib/gcc/arm-linux-gnueabihf/7/include/arm_neon.h:10401:47: fatal error: You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.
       return (uint8x16_t)__builtin_neon_vld1v16qi ((const __builtin_neon_qi *) __a);
    ...
                                                   ^
    compilation terminated.
Clang All Platforms
Here is what the Clang LTO error looks like on all platforms. Also see Clang Issue 42684, LTO and error adding symbols: archive has no index; run ranlib to add one.
    clang++ -o cryptest.exe -DNDEBUG -O2 -Wall -flto -g -fpic -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a
    /bin/ld: ./libcryptopp.a: error adding symbols: archive has no index; run ranlib to add one
    clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
    make: *** [GNUmakefile:1324: cryptest.exe] Error 1
Performance
Running the full Benchmark suite on a Skylake machine at 2.7 GHz shows an overall drop in throughput of about 2% with LTO. In the table below, higher Throughput is better.
Configuration | Throughput |
---|---|
With LTO | 1261.012811 |
Without LTO | 1286.652288 |
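The size of the difference can be worked out from the two figures in the table. The following is a minimal sketch using awk; `slowdown_pct` is a hypothetical helper name introduced for illustration, and it assumes both arguments are throughput figures in the same unit.

```shell
# Hypothetical helper: relative slowdown of the LTO build,
# as a percentage of the non-LTO throughput.
#   $1 = throughput with LTO, $2 = throughput without LTO
slowdown_pct() {
  awk -v lto="$1" -v base="$2" \
    'BEGIN { printf "%.2f\n", (base - lto) / base * 100 }'
}

# Figures copied from the table above.
slowdown_pct 1261.012811 1286.652288   # prints 1.99
```

A difference of this size is small enough that run-to-run noise could matter, which is one more reason to repeat the measurement on your own program.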
Common Errors
If you receive a stream of "plugin needed to handle lto object" messages from AR or RANLIB, then you probably did not set AR to gcc-ar and RANLIB to gcc-ranlib.
    ar: creating libcryptopp.a
    ar: cryptlib.o: plugin needed to handle lto object
    ar: cpu.o: plugin needed to handle lto object
    ar: integer.o: plugin needed to handle lto object
    ...
    ranlib libcryptopp.a
    ranlib: libcryptopp.a(cryptlib.o): plugin needed to handle lto object
    ranlib: libcryptopp.a(cpu.o): plugin needed to handle lto object
    ranlib: libcryptopp.a(integer.o): plugin needed to handle lto object
    ...
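In a long or parallel build these messages are easy to miss. The following is a minimal sketch that scans a saved build log for them; `log_needs_lto_wrappers` and the `build.log` file name are hypothetical and not part of the library's build system.

```shell
# Hypothetical helper: succeed if the given build log contains the message
# that plain ar/ranlib print when handed LTO objects.
log_needs_lto_wrappers() {
  grep -q 'plugin needed to handle lto object' "$1"
}

# Usage sketch:
# make 2>&1 | tee build.log
# log_needs_lto_wrappers build.log \
#   && echo "re-run with AR=gcc-ar RANLIB=gcc-ranlib"
```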