Link Time Optimization

Link Time Optimization (LTO) is a feature that allows the compiler to retain its internal representation of a program or module and use it later with different compilation units to perform optimizations during linking. Also see Link Time Optimization on the GCC wiki. This page will show you how to compile and link with LTO.

Generally speaking, Link Time Optimization causes the library to slow down. Based on our Benchmark results, and with all things being equal, you should probably avoid LTO. Even so, be sure to benchmark your own program to determine whether LTO is profitable for it.

You should avoid Link Time Optimization using GCC on 32-bit ARM platforms, where GCC LTO appears to be broken. GCC LTO on x86_64 and Aarch64 appears to be OK. You should also avoid Link Time Optimization using Clang on all platforms. Clang LTO appears to be broken everywhere we tested it.

Issue 865, "LTO build fails due to missing "-m" flags in linker command", showed we had gaps in our testing procedures because we were not testing LTO configurations.

Also see "Can LTO minor version be updated in backward compatible way?" on the GCC mailing list for some potential versioning troubles.

Several wiki pages exist for command line builds using various compilers and platforms. Also see Category:Command Line.

GCC Options

You can build the library with LTO using the following GCC options. The -flto=6 option enables LTO and lets GCC run the link-time work as six parallel jobs. In addition to the GCC options, you must change AR to gcc-ar and RANLIB to gcc-ranlib. Crypto++ drives the link through the compiler, so you don't need to do anything with LD. The same CXXFLAGS are used for compile and link.

$ AR=gcc-ar RANLIB=gcc-ranlib \
  CXXFLAGS="-DNDEBUG -O2 -flto=6 -g -fPIC -pthread" make -j 4
Using testing flags: -DNDEBUG -O2 -flto=6 -g -fPIC -pthread
g++ -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe -c cryptlib.cpp
g++ -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe -c cpu.cpp
g++ -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe -c integer.cpp
...
gcc-ar r libcryptopp.a cryptlib.o cpu.o integer.o ...
gcc-ranlib libcryptopp.a
...
g++ -o cryptest.exe -DNDEBUG -O2 -flto=6 -g -fPIC -pthread -pipe adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a

Clang Options

You can compile the library with LTO using the following Clang options. However, the build breaks during link. It appears Clang selects the wrong linker. Additionally, there is no clang-ar for AR and no clang-ranlib for RANLIB. Trying to reuse GCC's gcc-ar and gcc-ranlib results in the same error.

$ AR=gcc-ar RANLIB=gcc-ranlib \
  CXX=clang++ CXXFLAGS="-DNDEBUG -O2 -flto -g -fPIC -pthread" make -j 4
Using testing flags: -DNDEBUG -O2 -flto -g -fPIC -pthread
clang++ -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe -c cryptlib.cpp
clang++ -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe -c cpu.cpp
clang++ -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe -c integer.cpp
...
gcc-ar r libcryptopp.a cryptlib.o cpu.o integer.o ...
gcc-ranlib libcryptopp.a
...
clang++ -o cryptest.exe -DNDEBUG -O2 -flto -g -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a
/bin/ld: ./libcryptopp.a: error adding symbols: archive has no index; run ranlib to add one
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)

Using Bitcodes

An open question is whether LTO should be used for the library at all. We don't know the answer at the moment.

Our initial feeling is that the intermediate bitcode is probably a bad idea because the code is malleable. The final machine code can change after compilation, so the object code produced at compile time cannot be audited.

The concern is not specific to GCC or Clang LTO; it is a general concern about malleable object code, and it applies to Apple bitcode as well. Nor is the concern specific to Crypto++; it applies to all high integrity code modules, like Botan and OpenSSL, too.

GCC ARM Platform

Here is what the GCC LTO error looks like on ARM platforms.

g++ -o cryptest.exe -DNDEBUG -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -flto=6 -g -fpic -fPIC -pthread -fopenmp adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a  -lgomp
pubkey.h:640:26: warning: type ‘struct TF_ObjectImpl’ violates the C++ One Definition Rule [-Wodr]
 class CRYPTOPP_NO_VTABLE TF_ObjectImpl : public TF_ObjectImplBase<BASE, SCHEME_OPTIONS, KEY_CLASS>
                          ^
pubkey.h:640:26: note: a different type is defined in another translation unit
 class CRYPTOPP_NO_VTABLE TF_ObjectImpl : public TF_ObjectImplBase<BASE, SCHEME_OPTIONS, KEY_CLASS>
                          ^
pubkey.h:651:11: note: a different type is defined in another translation unit
...

make[1]: *** [/tmp/cc1QfZK2.ltrans17.ltrans.o] Error 1
/usr/lib/gcc/arm-linux-gnueabihf/7/include/arm_neon.h: In function ‘BLAKE2_Compress32_NEON’:
/usr/lib/gcc/arm-linux-gnueabihf/7/include/arm_neon.h:10401:47: fatal error: You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.
   return (uint8x16_t)__builtin_neon_vld1v16qi ((const __builtin_neon_qi *) __a);
...
                                               ^
compilation terminated.
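
The fatal error above is consistent with Issue 865: the link command carries no "-m" flags, so code that needs NEON is rejected when it is generated at link time. One way to spot the gap is to compare the "-m" flags in a compile command with those in the link command. The sketch below assumes a POSIX shell. list_m_flags is a hypothetical helper, and both command lines are assumed examples rather than lines from the log above.

```shell
#!/bin/sh
# Hypothetical helper: print each -m* option found in a command line,
# one per line, sorted and de-duplicated.
list_m_flags() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -e '^-m' | sort -u
}

# Assumed example command lines; substitute the ones from your own build log.
compile_cmd="g++ -DNDEBUG -O2 -flto=6 -march=armv7-a -mfpu=neon -c example_simd.cpp"
link_cmd="g++ -o cryptest.exe -DNDEBUG -O2 -flto=6 test.o ./libcryptopp.a"

# Report every -m flag used at compile time that the link command lacks.
for flag in $(list_m_flags "$compile_cmd"); do
    case " $link_cmd " in
        *" $flag "*) ;;
        *) echo "missing at link: $flag" ;;
    esac
done
```

With the assumed inputs, the loop reports both -march=armv7-a and -mfpu=neon as missing from the link command.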

Clang All Platforms

Here is what the Clang LTO error looks like on all platforms. Also see Clang Issue 42684, "LTO and error adding symbols: archive has no index; run ranlib to add one".

clang++ -o cryptest.exe -DNDEBUG -O2 -Wall -flto -g -fpic -fPIC -pthread -DCRYPTOPP_DISABLE_MIXED_ASM -pipe adhoc.o test.o bench1.o bench2.o bench3.o datatest.o dlltest.o fipsalgt.o validat0.o validat1.o validat2.o validat3.o validat4.o validat5.o validat6.o validat7.o validat8.o validat9.o validat10.o regtest1.o regtest2.o regtest3.o regtest4.o ./libcryptopp.a
/bin/ld: ./libcryptopp.a: error adding symbols: archive has no index; run ranlib to add one
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [GNUmakefile:1324: cryptest.exe] Error 1

Performance

Running the full Benchmark suite on a Skylake machine at 2.7 GHz shows an overall drop in performance of about 2% with LTO. In the table below, bigger Throughput is better.

Geometric Average

Configuration    Throughput
With LTO         1261.012811
Without LTO      1286.652288
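
The relative difference between the two geometric averages can be computed directly from the table. This is a small sketch assuming a POSIX shell with awk; pct_change is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: percent change from the old value ($2) to the new
# value ($1), printed with two decimal places.
pct_change() {
    awk -v new="$1" -v old="$2" \
        'BEGIN { printf "%.2f%%\n", (new - old) / old * 100 }'
}

# Throughput with LTO compared against throughput without LTO.
pct_change 1261.012811 1286.652288   # prints -1.99%
```

You can reuse the same helper with your own benchmark numbers to decide whether LTO is profitable for your program.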

Common Errors

If you receive a stream of messages from AR or RANLIB, then you probably did not use gcc-ar or gcc-ranlib.

ar: creating libcryptopp.a
ar: cryptlib.o: plugin needed to handle lto object
ar: cpu.o: plugin needed to handle lto object
ar: integer.o: plugin needed to handle lto object
...

ranlib libcryptopp.a
ranlib: libcryptopp.a(cryptlib.o): plugin needed to handle lto object
ranlib: libcryptopp.a(cpu.o): plugin needed to handle lto object
ranlib: libcryptopp.a(integer.o): plugin needed to handle lto object
...
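
If the build output was saved to a file, the messages above can be detected automatically. The following is a minimal sketch assuming a POSIX shell; log_needs_lto_wrappers is a hypothetical helper, and the temporary log file stands in for your real build log.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given build log contains the
# "plugin needed to handle lto object" message shown above.
log_needs_lto_wrappers() {
    grep -q 'plugin needed to handle lto object' "$1"
}

# Demo with a temporary log holding one of the messages from above.
log=$(mktemp)
printf '%s\n' 'ar: cryptlib.o: plugin needed to handle lto object' > "$log"

if log_needs_lto_wrappers "$log"; then
    echo "Rebuild with AR=gcc-ar RANLIB=gcc-ranlib"
fi

rm -f "$log"
```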