Cryptogams
Cryptogams is Andy Polyakov's project for developing high-speed cryptographic primitives and sharing them with other developers. Andy is renowned for his implementations, which are used by the Linux kernel, OpenSSL and many private implementations. Crypto++ uses the Cryptogams implementations of AES, SHA-1, SHA-256 and SHA-512 on 32-bit ARMv7.
ARMv7 is one of the dominant ARM platforms, but Crypto++ had performance gaps on these devices that our own code (both C++ and ASM) could not close. Andy's code runs 2x to 3x faster than the library's existing implementations and closes the gap for us.
Crypto++ added AES under Issue 683, SHA-1 under Issue 837, SHA-256 under Issue 839, and SHA-512 under Issue 841. After the SHA code was added we learned of a problem with loading the shared object. The problem was fixed under Issue 846. We then changed our approach to the shared object problem and fixed it again under Issue 847.
Andy offers a permissive BSD-style license, so using his code should not be a problem for most projects. If you don't want to use the Cryptogams code, the procedure to disable it is detailed below in config_asm.h. We recommend you take advantage of the code, however.
We documented the general procedures at Cryptogams AES and Cryptogams SHA on the OpenSSL wiki. This page provides the details specific to porting the code to Crypto++.
config_asm.h
The configuration file config_asm.h enables Cryptogams on 32-bit ARM by default. Enabling is limited to Linux because the *.S files use GNU assembler (GAS) syntax. Attempting to enable them on, say, iOS will result in assembler errors. The relevant snippet from config_asm.h is shown below.
#if !defined(CRYPTOPP_DISABLE_ASM) && defined(__arm__) && defined(__linux__)
# if defined(__GNUC__) || defined(__clang__)
#  define CRYPTOGAMS_ARM_AES 1
#  define CRYPTOGAMS_ARM_SHA1 1
#  define CRYPTOGAMS_ARM_SHA256 1
#  define CRYPTOGAMS_ARM_SHA512 1
# endif
#endif
To disable the Cryptogams code you only need to comment out or delete the appropriate define.
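To confirm that a define was actually commented out, a small C++ check can report which Cryptogams paths a translation unit sees. This is a sketch under assumptions: the struct and function names are hypothetical, and it assumes config_asm.h can be included on its own when compiled inside the Crypto++ source directory (located here with the C++17 `__has_include` test). On any build that is not 32-bit ARM Linux, every path reports as disabled, matching the `#if` guard in the snippet.

```cpp
// Sketch: report which Cryptogams code paths a build enables.
// ASSUMPTION: config_asm.h is found by __has_include and is self-contained.
// When it is not found, every path reports as disabled.
#if defined(__has_include)
# if __has_include("config_asm.h")
#  include "config_asm.h"
# endif
#endif

#include <cstdio>

// Hypothetical helper: one flag per define in the config_asm.h snippet.
struct CryptogamsPaths {
    bool aes;
    bool sha1;
    bool sha256;
    bool sha512;
};

constexpr CryptogamsPaths cryptogams_paths() {
    return CryptogamsPaths{
#if defined(CRYPTOGAMS_ARM_AES)
        true,
#else
        false,
#endif
#if defined(CRYPTOGAMS_ARM_SHA1)
        true,
#else
        false,
#endif
#if defined(CRYPTOGAMS_ARM_SHA256)
        true,
#else
        false,
#endif
#if defined(CRYPTOGAMS_ARM_SHA512)
        true
#else
        false
#endif
    };
}

// Print the result, for example from a throwaway main() built with your CXXFLAGS.
inline void print_cryptogams_paths() {
    constexpr CryptogamsPaths p = cryptogams_paths();
    std::printf("AES %d, SHA-1 %d, SHA-256 %d, SHA-512 %d\n",
                p.aes, p.sha1, p.sha256, p.sha512);
}
```

Build it with the same flags as the library; a 0 in the output means the corresponding Cryptogams path is compiled out.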
Source Files
The Cryptogams source files of interest are listed below:
- aes_armv4.h
- aes_armv4.S
- sha1_armv4.h
- sha1_armv4.S
- sha256_armv4.h
- sha256_armv4.S
- sha512_armv4.h
- sha512_armv4.S
The header files were created manually. They merely provide the signatures for the functions exported by the source files. The source files were created with the arm-xlate translator and the Perl source templates, as detailed at Cryptogams AES and Cryptogams SHA.
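Since the headers only declare entry points, it helps to see what such a declaration looks like. Judging by the SHA-1 preamble shown in the SHA Patch section (`ldmia r0,{r3,r4,r5,r6,r7}` loads five state words through `r0`, and `add r2,r1,r2,lsl#6` turns a block count in `r2` into an end pointer past the data in `r1`), a declaration along the lines below should match. This is an assumption, not a copy of sha1_armv4.h. `sha1_block_portable` is a hypothetical portable fallback with the same calling convention, handy for exercising the contract on a non-ARM host.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMED signatures, inferred from the register usage in the preamble:
// r0 = five-word state, r1 = message bytes, r2 = number of 64-byte blocks.
// Defined in sha1_armv4.S; they only link on 32-bit ARM Linux builds.
extern "C" {
void sha1_block_data_order(std::uint32_t* state, const void* data, std::size_t blocks);
void sha1_block_data_order_neon(std::uint32_t* state, const void* data, std::size_t blocks);
}

namespace {
inline std::uint32_t rotl32(std::uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }
}

// Hypothetical portable fallback with the same contract as the assembly.
void sha1_block_portable(std::uint32_t* state, const void* data, std::size_t blocks) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    const unsigned char* end = p + blocks * 64;  // same idea as "add r2,r1,r2,lsl#6"
    for (; p != end; p += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i)  // message words are big-endian
            w[i] = (std::uint32_t(p[4 * i]) << 24) | (std::uint32_t(p[4 * i + 1]) << 16) |
                   (std::uint32_t(p[4 * i + 2]) << 8) | std::uint32_t(p[4 * i + 3]);
        for (int i = 16; i < 80; ++i)
            w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = state[0], b = state[1], c = state[2], d = state[3], e = state[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            const std::uint32_t t = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = t;
        }
        state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e;
    }
}
```

Compressing the single padded block for "abc" from the standard SHA-1 initial state should leave the state at a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d, the well-known SHA-1 digest of "abc". The same test vector can be run against the assembly entry points on an ARM device.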
The recipe to create the *.S files is below. Though arm_arch.h is copied for testing, it cannot be used because it is licensed under the OpenSSL license only. We have to manually pull out the parts relating to ARMv7 and above.
git clone https://github.com/openssl/openssl
mkdir -p cryptogams/

cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/arm_arch.h cryptogams/
cp ./openssl/crypto/sha/asm/sha1-armv4-large.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha256-armv4.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha512-armv4.pl ./cryptogams/

cd cryptogams/
chmod +x *.pl

# Flavor selects an assembler dialect. Apple's assembler
# cannot consume GNU assembler syntax. Other flavors
# include linux64, ios32 and ios64
FLAVOR=linux32

perl sha1-armv4-large.pl "$FLAVOR" sha1_armv4.S
perl sha256-armv4.pl "$FLAVOR" sha256_armv4.S
perl sha512-armv4.pl "$FLAVOR" sha512_armv4.S

sed -i 's/OPENSSL/CRYPTOGAMS/g' *.S
The SHA source files needed extra treatment because loading the Crypto++ shared object failed with the error "unexpected reloc type 0x03". Also see Crypto++ Issue 846, ARM and "unexpected reloc type 0x03" loading shared object. The changes are discussed next in SHA Patch.
SHA Patch
The Cryptogams SHA code is tightly coupled to OpenSSL. The coupling is unwanted and was removed for Crypto++. The changes applied to the SHA files are outlined below. Changes for SHA-1 are shown, but they apply to SHA-256 and SHA-512 as well. Issue 846 took three commits: first Commit 7eaa5837e092, then Commit ea96b9d37504, and finally Commit 81da61fe7b32.
In hindsight we should have performed only the last commit, Commit 81da61fe7b32. We were trying to minimize changes to the Cryptogams code, but in the end we had to abandon that goal and make more changes than desired.
After the changes below there are two global functions, sha1_block_data_order and sha1_block_data_order_neon, and no dependence on CRYPTOGAMS_armcap_P.
First, the loading of the local symbol CRYPTOGAMS_armcaps was removed:
.align 5
sha1_block_data_order:
#if __ARM_MAX_ARCH__>=7
.Lsha1_block:
	ldr r12,.LCRYPTOGAMS_armcap
# if !defined(_WIN32)
	adr r3,.Lsha1_block
	ldr r12,[r3,r12]	@ CRYPTOGAMS_armcap_P
# endif
# if defined(__APPLE__) || defined(_WIN32)
	ldr r12,[r12]
# endif
	tst r12,#ARMV7_NEON
	bne .LNEON
The new function preamble is:
.align 5
.globl sha1_block_data_order
.type sha1_block_data_order,%function
sha1_block_data_order:
.Lsha1_block_data_order:
#if __ARM_ARCH__<7 && !defined(__thumb2__)
	sub r3,pc,#8	@ sha1_block_data_order
#else
	adr r3,.Lsha1_block_data_order
#endif
	stmdb sp!,{r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
	add r2,r1,r2,lsl#6	@ r2 to point at the end of r1
	ldmia r0,{r3,r4,r5,r6,r7}
	...
Second, the remaining CRYPTOGAMS_armcaps local symbol artifacts were removed. For example, these blocks were deleted because they were no longer needed.
#if __ARM_MAX_ARCH__>=7
.LCRYPTOGAMS_armcap_loc:
# ifdef _WIN32
.word CRYPTOGAMS_armcaps
# else
.word CRYPTOGAMS_armcaps-.Lsha1_block
# endif
#endif
And
#if __ARM_MAX_ARCH__>=7
.comm CRYPTOGAMS_armcaps,4,4
#endif
Finally, sha1_block_data_order_neon was made a global symbol:
.align 4
.globl sha1_block_data_order_neon
.type sha1_block_data_order_neon,%function
sha1_block_data_order_neon:
	stmdb sp!,{r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
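With CRYPTOGAMS_armcap_P gone, the assembly no longer picks between its ARMv4 and NEON paths (the removed `tst r12,#ARMV7_NEON` / `bne .LNEON` test did that), so the caller has to. The sketch below shows one way to make that choice in C++. `Sha1BlockFn`, `has_neon` and `select_sha1_block` are hypothetical names, and the parameter list is an assumption inferred from the preamble above (`r0` state, `r1` data, `r2` block count). The Linux probe reads AT_HWCAP with `getauxval`; Crypto++ itself has its own feature probes, such as `HasNEON()` in cpu.h.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__arm__) && defined(__linux__)
# include <sys/auxv.h>
# ifndef HWCAP_NEON
#  define HWCAP_NEON (1UL << 12)  // Linux arm32 AT_HWCAP bit for NEON
# endif
#endif

// ASSUMED signature: r0 = state, r1 = data, r2 = number of 64-byte blocks.
using Sha1BlockFn = void (*)(std::uint32_t* state, const void* data, std::size_t blocks);

// Hypothetical runtime probe. On Linux/ARM it reads the kernel's hwcap bits
// from the auxiliary vector; on every other target it reports no NEON.
inline bool has_neon() {
#if defined(__arm__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return false;
#endif
}

// Choose the entry point once and cache the pointer. Before the patch the
// assembly made this decision internally via CRYPTOGAMS_armcap_P.
inline Sha1BlockFn select_sha1_block(bool neon, Sha1BlockFn armv4_fn, Sha1BlockFn neon_fn) {
    return neon ? neon_fn : armv4_fn;
}
```

On an ARM build you would call `select_sha1_block(has_neon(), sha1_block_data_order, sha1_block_data_order_neon)` once and reuse the returned pointer, so the feature probe is not repeated for every block.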
You can verify a good change by inspecting the object file for the symbol CRYPTOGAMS_armcap_P. There should be no references to it, and sha1_block_data_order and sha1_block_data_order_neon should both be global symbols.
$ objdump -r sha1_armv4.o

sha1_armv4.o: file format elf32-littlearm
...
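`objdump -r` lists relocations; a symbol listing from `nm sha1_armv4.o` shows the globals directly, with `T` marking a defined global function and `U` an undefined reference. The sketch below applies both checks to that text, for example in a CI step. `SymbolCheck` and `check_nm_output` are hypothetical names, and the parser assumes the default BSD-style `nm` output format.

```cpp
#include <sstream>
#include <string>

// Result of checking one object file's nm listing (hypothetical helper).
struct SymbolCheck {
    bool armcap_referenced = false;  // any line naming CRYPTOGAMS_armcap_P
    bool block_global = false;       // "T sha1_block_data_order"
    bool neon_global = false;        // "T sha1_block_data_order_neon"
    bool ok() const { return !armcap_referenced && block_global && neon_global; }
};

// Parse default nm output: "VALUE TYPE NAME", or "TYPE NAME" for undefined symbols.
inline SymbolCheck check_nm_output(const std::string& nm_text) {
    SymbolCheck result;
    std::istringstream lines(nm_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first, second, third;
        fields >> first >> second >> third;
        const bool no_value = third.empty();  // undefined symbols have no value column
        const std::string type = no_value ? first : second;
        const std::string name = no_value ? second : third;
        if (name == "CRYPTOGAMS_armcap_P")
            result.armcap_referenced = true;
        else if (type == "T" && name == "sha1_block_data_order")
            result.block_global = true;
        else if (type == "T" && name == "sha1_block_data_order_neon")
            result.neon_global = true;
    }
    return result;
}
```

For sha256_armv4.o and sha512_armv4.o the expected function names would change accordingly.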
Building AES
The GNUmakefile creates aes_armv4.o from aes_armv4.S. GCC uses the flags -march=armv7-a -Wa,--noexecstack to enable both the ARMv7 and NEON code paths in aes_armv4.S. Clang flags include -mthumb to avoid a crash. The complete set of flags for AES using Clang is -march=armv7-a -mthumb -Wa,--noexecstack.
The relevant snippet from GNUmakefile is shown below. The block is used for both AES and the SHA source files.
# Cryptogams source files. We couple to ARMv7.
# Limit to Linux. The source files target the GNU assembler.
ifeq ($(IS_ARM32)$(IS_LINUX),11)
  ifeq ($(CLANG_COMPILER),1)
    CRYPTOGAMS_ARMV7_FLAG = -march=armv7-a -Wa,--noexecstack
    CRYPTOGAMS_ARMV7_THUMB_FLAG = -march=armv7-a -mthumb -Wa,--noexecstack
  else
    CRYPTOGAMS_ARMV7_FLAG = -march=armv7-a -Wa,--noexecstack
    CRYPTOGAMS_ARMV7_THUMB_FLAG = -march=armv7-a -Wa,--noexecstack
  endif
  SRCS += aes_armv4.S sha1_armv4.S sha256_armv4.S sha512_armv4.S
endif
The recipe to create the aes_armv4.o object file is shown below.
aes_armv4.o : aes_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_THUMB_FLAG) -c) $<
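Because the makefile compiles the *.S files with $(CXX), they pass through the C preprocessor, and the flags above decide which predefined macros their `#if` blocks see (the SHA-1 preamble in the SHA Patch section tests `__thumb2__`, for example). One way to confirm what a flag set produces is to compile a small C++ file with the same flags. The sketch reports the compiler-predefined `__ARM_ARCH`, `__thumb__` and `__thumb2__` macros; `ArmTarget`, `arm_target` and `print_arm_target` are hypothetical names.

```cpp
#include <cstdio>

// Target properties as seen by the preprocessor (hypothetical helper).
struct ArmTarget {
    int arch;     // __ARM_ARCH, or 0 when not targeting ARM
    bool thumb;   // __thumb__: generating Thumb code
    bool thumb2;  // __thumb2__: generating Thumb-2 code
};

constexpr ArmTarget arm_target() {
    return ArmTarget{
#if defined(__ARM_ARCH)
        __ARM_ARCH,
#else
        0,
#endif
#if defined(__thumb__)
        true,
#else
        false,
#endif
#if defined(__thumb2__)
        true
#else
        false
#endif
    };
}

inline void print_arm_target() {
    constexpr ArmTarget t = arm_target();
    std::printf("__ARM_ARCH=%d thumb=%d thumb2=%d\n", t.arch, t.thumb, t.thumb2);
}
```

Built with `-march=armv7-a -mthumb`, this should report architecture 7 with Thumb-2 enabled. With `-march=armv7-a` alone the result depends on the toolchain's configured default instruction set, which is one reason to check rather than assume.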
Building SHA
The GNUmakefile creates sha1_armv4.o from sha1_armv4.S, sha256_armv4.o from sha256_armv4.S and sha512_armv4.o from sha512_armv4.S. All of them use the flags -march=armv7-a -Wa,--noexecstack to enable both the ARMv7 and NEON code paths in the *.S files. There are no special flags for Clang.
The relevant snippet from GNUmakefile is shown below. The block is used for both AES and the SHA source files.
# Cryptogams source files. We couple to ARMv7.
# Limit to Linux. The source files target the GNU assembler.
ifeq ($(IS_ARM32)$(IS_LINUX),11)
  ifeq ($(CLANG_COMPILER),1)
    CRYPTOGAMS_ARMV7_FLAG = -march=armv7-a -Wa,--noexecstack
    CRYPTOGAMS_ARMV7_THUMB_FLAG = -march=armv7-a -mthumb -Wa,--noexecstack
  else
    CRYPTOGAMS_ARMV7_FLAG = -march=armv7-a -Wa,--noexecstack
    CRYPTOGAMS_ARMV7_THUMB_FLAG = -march=armv7-a -Wa,--noexecstack
  endif
  SRCS += aes_armv4.S sha1_armv4.S sha256_armv4.S sha512_armv4.S
endif
The recipes to create the object files are shown below.
# Cryptogams ARM asm implementation.
sha1_armv4.o : sha1_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_FLAG) -c) $<

# Cryptogams ARM asm implementation.
sha256_armv4.o : sha256_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_FLAG) -c) $<

# Cryptogams ARM asm implementation.
sha512_armv4.o : sha512_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_FLAG) -c) $<