Cryptogams

From Crypto++ Wiki

Cryptogams is Andy Polyakov's project for developing high-speed cryptographic primitives and sharing them with other developers. Andy is renowned for his implementations, which are used by the Linux kernel, OpenSSL and many private projects. Crypto++ uses the Cryptogams implementations of AES, SHA-1, SHA-256 and SHA-512 on 32-bit ARMv7.

ARMv7 is one of the dominant use cases for ARM, but Crypto++ had performance gaps on ARMv7 devices. The code we wrote (both C++ and ASM) could not close the gap. Andy's code runs 2x to 3x faster than the library's existing implementations and closes the gap for us.

Crypto++ added AES under Issue 683, SHA-1 under Issue 837, SHA-256 under Issue 839, and SHA-512 under Issue 841. After the SHA cut-ins we learned of a problem with loading shared objects. The problem was fixed under Issue 846. We then changed our approach to the shared object problem and fixed it again under Issue 847.

Andy offers a permissive BSD-style license, so using his code should not be a problem for most projects. If you don't want to use the Cryptogams code, the procedure to disable it is described below under config_asm.h. We recommend you take advantage of the code, however.

We documented the general procedures at Cryptogams AES and Cryptogams SHA on the OpenSSL wiki. This page provides the details specific to porting the code to Crypto++.

config_asm.h

The configuration file config_asm.h enables Cryptogams on 32-bit ARM by default. Enabling is limited to Linux because the *.S files use GNU assembler (GAS) syntax. Attempting to enable the Cryptogams code on other platforms, such as iOS, will result in assembler errors. The relevant snippet from config_asm.h is shown below.

#if !defined(CRYPTOPP_DISABLE_ASM) && defined(__arm__) && defined(__linux__)
# if defined(__GNUC__) || defined(__clang__)
#  define CRYPTOGAMS_ARM_AES      1
#  define CRYPTOGAMS_ARM_SHA1     1
#  define CRYPTOGAMS_ARM_SHA256   1
#  define CRYPTOGAMS_ARM_SHA512   1
# endif
#endif

To disable the Cryptogams code, comment out or delete the appropriate define.
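To see what one of these defines controls, the standalone program below copies the SHA-1 condition from the snippet and folds the result into a constant. It is an illustration only: kUseCryptogamsSha1 and Sha1Backend are hypothetical names, not Crypto++ API. Building with -DCRYPTOPP_DISABLE_ASM, or deleting the #define line, switches it to the non-Cryptogams path.

```cpp
#include <cassert>

// Standalone copy of the SHA-1 condition from config_asm.h (illustration only).
// Compile with -DCRYPTOPP_DISABLE_ASM, or delete the #define line, to disable it.
#if !defined(CRYPTOPP_DISABLE_ASM) && defined(__arm__) && defined(__linux__)
# if defined(__GNUC__) || defined(__clang__)
#  define CRYPTOGAMS_ARM_SHA1 1
# endif
#endif

// Hypothetical names: fold the define into a constant the program can inspect.
#if defined(CRYPTOGAMS_ARM_SHA1)
constexpr bool kUseCryptogamsSha1 = true;   // would call into sha1_armv4.S
#else
constexpr bool kUseCryptogamsSha1 = false;  // would use a non-Cryptogams implementation
#endif

// Report the selected path as text, e.g. for a diagnostic message.
inline const char* Sha1Backend()
{
    return kUseCryptogamsSha1 ? "cryptogams" : "portable";
}
```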

Source Files

The Cryptogams source files of interest are listed below:

  • aes_armv4.h
  • aes_armv4.S
  • sha1_armv4.h
  • sha1_armv4.S
  • sha256_armv4.h
  • sha256_armv4.S
  • sha512_armv4.h
  • sha512_armv4.S

The header files were created manually. They merely provide the signatures for the functions exported by the source files. The source files were generated with the arm-xlate.pl translator and the Perl source templates, as detailed at Cryptogams AES and Cryptogams SHA.
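For illustration, a SHA-1 header in that style might look like the sketch below. The prototypes are an assumption inferred from the SHA-1 preamble shown under SHA Patch, where r0 holds the five-word state, r1 the input and r2 the number of 64-byte blocks; the shipped sha1_armv4.h is the authority on the exact parameter types. The extern "C" block matters because the assembly exports unmangled names.

```cpp
/* Sketch of a Cryptogams SHA-1 header. Prototypes are assumed, not copied. */
#ifndef CRYPTOGAMS_SHA1_ARMV4_SKETCH_H
#define CRYPTOGAMS_SHA1_ARMV4_SKETCH_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* state: five 32-bit chaining words; data: blocks * 64 input bytes. */
void sha1_block_data_order(void* state, const void* data, size_t blocks);

/* Same contract using NEON. The caller must confirm NEON is available. */
void sha1_block_data_order_neon(void* state, const void* data, size_t blocks);

#ifdef __cplusplus
}
#endif

#endif /* CRYPTOGAMS_SHA1_ARMV4_SKETCH_H */
```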

The recipe to create the SHA *.S files is below. Although arm_arch.h is copied for testing, it cannot be used because it is available only under the OpenSSL license. The parts relating to ARMv7 and above have to be pulled out manually.

git clone https://github.com/openssl/openssl
mkdir -p cryptogams/

# Translator used by the scripts, and arm_arch.h for test builds only
cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/arm_arch.h ./cryptogams/

# Perl templates for SHA-1, SHA-256 and SHA-512
cp ./openssl/crypto/sha/asm/sha1-armv4-large.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha256-armv4.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha512-armv4.pl ./cryptogams/

cd cryptogams/
chmod +x *.pl

# Flavor selects an assembler dialect. Apple's assembler
# cannot consume GNU assembler syntax. Other flavors
# include linux64, ios32 and ios64.
FLAVOR=linux32

perl sha1-armv4-large.pl "$FLAVOR" sha1_armv4.S
perl sha256-armv4.pl "$FLAVOR" sha256_armv4.S
perl sha512-armv4.pl "$FLAVOR" sha512_armv4.S

# Rename OPENSSL identifiers to CRYPTOGAMS
sed -i 's/OPENSSL/CRYPTOGAMS/g' *.S
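Because arm_arch.h itself cannot be shipped, the generated files need a replacement for the macros they test in #if directives, such as __ARM_ARCH__ and __ARM_MAX_ARCH__ (both appear in the SHA-1 excerpts under SHA Patch). The sketch below is one hypothetical way to provide that ARMv7-and-above subset from scratch, using the ACLE macro __ARM_ARCH that GCC and Clang predefine for ARM targets. It is not OpenSSL's code and not necessarily what Crypto++ ships. It contains only preprocessor lines, so it could be pasted at the top of a *.S file as well as included from C or C++.

```cpp
/* Hypothetical stand-in for the ARMv7-and-above parts of arm_arch.h.
   Preprocessor only, so the assembler can consume it too. */
#ifndef CRYPTOGAMS_ARM_ARCH_SKETCH_H
#define CRYPTOGAMS_ARM_ARCH_SKETCH_H

/* __ARM_ARCH is predefined by GCC and Clang for ARM targets
   (7 for -march=armv7-a). Use 0 elsewhere so ">= 7" tests fail. */
#if !defined(__ARM_ARCH__)
# if defined(__ARM_ARCH)
#  define __ARM_ARCH__ __ARM_ARCH
# else
#  define __ARM_ARCH__ 0
# endif
#endif

/* Highest architecture the code may use. Without a separate value
   it equals the build architecture. */
#if !defined(__ARM_MAX_ARCH__)
# define __ARM_MAX_ARCH__ __ARM_ARCH__
#endif

#endif /* CRYPTOGAMS_ARM_ARCH_SKETCH_H */
```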

The SHA source files needed extra treatment because loading the Crypto++ shared object failed with the error "unexpected reloc type 0x03". Also see Crypto++ Issue 846, ARM and "unexpected reloc type 0x03" loading shared object. The changes are discussed next in SHA Patch.

SHA Patch

The Cryptogams SHA code is tightly coupled to OpenSSL. The coupling is unwanted and was removed for Crypto++. The changes applied to the SHA files are outlined below. Changes for SHA-1 are shown, but they apply to SHA-256 and SHA-512 as well. Issue 846 took three commits: Commit 7eaa5837e092, Commit ea96b9d37504 and Commit 81da61fe7b32.

In hindsight we should have performed only the last commit, Commit 81da61fe7b32. We were trying to minimize changes to the Cryptogams code, but in the end we had to give up that goal and make more changes than desired.

After the changes below there are two global functions, sha1_block_data_order and sha1_block_data_order_neon, and no dependence on CRYPTOGAMS_armcap_P.
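With the capability check gone from the assembly, the choice between the two entry points moves to the C++ caller. A minimal sketch, assuming both routines share a (state, data, blocks) prototype: SelectSha1Block is a hypothetical helper, and the two stubs stand in for the real assembly routines so the example runs on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed common prototype: state, input, number of 64-byte blocks.
using Sha1BlockFn = void (*)(void* state, const void* data, std::size_t blocks);

// Hypothetical selector. The asm no longer reads CRYPTOGAMS_armcap_P,
// so the caller decides, e.g. from a run-time NEON probe.
inline Sha1BlockFn SelectSha1Block(bool hasNeon, Sha1BlockFn scalar, Sha1BlockFn neon)
{
    return hasNeon ? neon : scalar;
}

// Stand-ins for sha1_block_data_order and sha1_block_data_order_neon.
// They only tag the first state word so a test can tell which one ran.
inline void StubScalar(void* state, const void*, std::size_t)
{
    static_cast<std::uint32_t*>(state)[0] = 1;
}

inline void StubNeon(void* state, const void*, std::size_t)
{
    static_cast<std::uint32_t*>(state)[0] = 2;
}
```

On an ARMv7 Linux build the real call would pass sha1_block_data_order and sha1_block_data_order_neon, with hasNeon taken from run-time CPU feature detection, for example Crypto++'s HasNEON().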

First, the code that loads the local symbol CRYPTOGAMS_armcaps and branches on its NEON bit was removed:

.align 5
sha1_block_data_order:
#if __ARM_MAX_ARCH__>=7
.Lsha1_block:
    ldr    r12,.LCRYPTOGAMS_armcap
# if !defined(_WIN32)
    adr    r3,.Lsha1_block
    ldr    r12,[r3,r12]        @ CRYPTOGAMS_armcap_P
# endif
# if defined(__APPLE__) || defined(_WIN32)
    ldr    r12,[r12]
# endif
    tst    r12,#ARMV7_NEON
    bne    .LNEON

The new function preamble is:

.align    5
.globl    sha1_block_data_order
.type     sha1_block_data_order,%function

sha1_block_data_order:
.Lsha1_block_data_order:

#if __ARM_ARCH__<7 && !defined(__thumb2__)
    sub    r3,pc,#8        @ sha1_block_data_order
#else
    adr    r3,.Lsha1_block_data_order
#endif

    stmdb   sp!,{r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
    add     r2,r1,r2,lsl#6    @ r2 to point at the end of r1
    ldmia   r0,{r3,r4,r5,r6,r7}
    ...
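The add r2,r1,r2,lsl#6 instruction turns the block count into an end pointer: a left shift by 6 multiplies by 64, the SHA-1 block size, so r2 ends up at r1 + 64 * blocks. The C++ sketch below mirrors that arithmetic and the walk over the input. ProcessBlocks and its compress callback are hypothetical, the SHA-1 rounds themselves are left out, and the plain while loop (which also tolerates zero blocks) is our choice, since the excerpt elides how the assembly loop ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the preamble: data arrives in r1, the block count in r2, and
// "add r2,r1,r2,lsl#6" makes r2 = r1 + (r2 << 6), an end pointer.
// compress(data) marks where the real routine runs the 80 SHA-1 rounds.
template <typename Compress>
std::size_t ProcessBlocks(const std::uint8_t* data, std::size_t blocks, Compress compress)
{
    const std::uint8_t* const end = data + (blocks << 6);  // blocks * 64 bytes
    std::size_t processed = 0;
    while (data != end) {  // one 64-byte block per iteration
        compress(data);
        data += 64;
        ++processed;
    }
    return processed;
}
```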

Second, the remaining CRYPTOGAMS_armcaps local symbol artifacts were removed. For example, these blocks were deleted because they were no longer needed.

#if __ARM_MAX_ARCH__>=7
.LCRYPTOGAMS_armcap_loc:
# ifdef    _WIN32
.word    CRYPTOGAMS_armcaps
# else
.word    CRYPTOGAMS_armcaps-.Lsha1_block
# endif
#endif

And

#if __ARM_MAX_ARCH__>=7
.comm    CRYPTOGAMS_armcaps,4,4
#endif

Finally, sha1_block_data_order_neon was made a global symbol:

.align    4
.globl    sha1_block_data_order_neon
.type     sha1_block_data_order_neon,%function

sha1_block_data_order_neon:

    stmdb sp!,{r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}

You can verify the change by checking the object file for the symbol CRYPTOGAMS_armcap_P. There should be no references to it, and sha1_block_data_order and sha1_block_data_order_neon should both be global symbols.

$ objdump -r sha1_armv4.o

sha1_armv4.o:     file format elf32-littlearm
...
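objdump -r lists relocation entries, so any remaining use of CRYPTOGAMS_armcap_P shows up there. To confirm the two global functions, nm sha1_armv4.o (or objdump -t) lists the symbol table. The hypothetical helper below automates both checks over nm-style text, assuming each line reads [address] type name and that type T marks a global symbol in .text; capturing the real nm output, for example from a build script, is left to the caller.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Result of scanning nm-style output for the SHA-1 object (hypothetical).
struct SymbolCheck {
    bool armcapReferenced = false;  // any line naming CRYPTOGAMS_armcap_P
    bool hasBlock = false;          // global sha1_block_data_order
    bool hasBlockNeon = false;      // global sha1_block_data_order_neon
    bool ok() const { return !armcapReferenced && hasBlock && hasBlockNeon; }
};

inline SymbolCheck CheckSha1Symbols(const std::string& nmOutput)
{
    SymbolCheck result;
    std::istringstream lines(nmOutput);
    std::string line;
    while (std::getline(lines, line)) {
        // Split on whitespace; undefined symbols have no address column.
        std::istringstream fields(line);
        std::vector<std::string> tok;
        for (std::string t; fields >> t;)
            tok.push_back(t);
        if (tok.size() < 2)
            continue;
        const std::string& name = tok.back();
        const std::string& type = tok[tok.size() - 2];
        if (name == "CRYPTOGAMS_armcap_P")  // U (undefined) or C (common)
            result.armcapReferenced = true;
        if (type == "T" && name == "sha1_block_data_order")
            result.hasBlock = true;
        if (type == "T" && name == "sha1_block_data_order_neon")
            result.hasBlockNeon = true;
    }
    return result;
}
```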

Building AES

The GNUmakefile creates aes_armv4.o from aes_armv4.S. With GCC, the flags -march=armv7-a -Wa,--noexecstack enable both the ARMv7 and NEON code paths in aes_armv4.S. Clang additionally needs -mthumb to avoid a crash, so the complete set of Clang flags for AES is -march=armv7-a -mthumb -Wa,--noexecstack.

The relevant snippet from GNUmakefile is shown below. The block is used for both the AES and SHA source files.

# Cryptogams source files. We couple to ARMv7.
# Limit to Linux. The source files target the GNU assembler.
ifeq ($(IS_ARM32)$(IS_LINUX),11)
  ifeq ($(CLANG_COMPILER),1)
    CRYPTOGAMS_ARMV7_FLAG = -march=armv7-a -Wa,--noexecstack
    CRYPTOGAMS_ARMV7_THUMB_FLAG = -march=armv7-a -mthumb -Wa,--noexecstack
  else
    CRYPTOGAMS_ARMV7_FLAG = -march=armv7-a -Wa,--noexecstack
    CRYPTOGAMS_ARMV7_THUMB_FLAG = -march=armv7-a -Wa,--noexecstack
  endif
  SRCS += aes_armv4.S sha1_armv4.S sha256_armv4.S sha512_armv4.S
endif

The recipe to create the aes_armv4.o object file is shown below.

aes_armv4.o : aes_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_THUMB_FLAG) -c) $<
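The compiler driver runs the C preprocessor over *.S files with the target's predefined macros, so these flags are visible to tests such as !defined(__thumb2__) in the SHA-1 preamble. The hypothetical probe below, which is not part of Crypto++, reports a few of those macros; building it with the same flags, for example -march=armv7-a -mthumb, shows what a given flag set selects. The macro names are the GCC and Clang predefined ones.

```cpp
#include <cassert>

// Hypothetical probe, not part of Crypto++. Build it with the flags the
// GNUmakefile passes for a Cryptogams *.S file to see what they select.
struct FlagProbe {
    bool arm32;   // __arm__: 32-bit ARM target
    bool armv7a;  // __ARM_ARCH_7A__: set by -march=armv7-a
    bool thumb;   // __thumb__: emitting Thumb code, e.g. with -mthumb
    bool thumb2;  // __thumb2__: Thumb-2, e.g. -mthumb on ARMv7-A
};

constexpr FlagProbe ProbeFlags()
{
    return FlagProbe{
#if defined(__arm__)
        true,
#else
        false,
#endif
#if defined(__ARM_ARCH_7A__)
        true,
#else
        false,
#endif
#if defined(__thumb__)
        true,
#else
        false,
#endif
#if defined(__thumb2__)
        true
#else
        false
#endif
    };
}
```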

Building SHA

The GNUmakefile creates sha1_armv4.o from sha1_armv4.S, sha256_armv4.o from sha256_armv4.S and sha512_armv4.o from sha512_armv4.S. All of them use the flags -march=armv7-a -Wa,--noexecstack to enable both the ARMv7 and NEON code paths in the *.S files. There are no special flags for Clang.

The flags come from the same GNUmakefile block shown in Building AES, which covers both the AES and SHA source files.

The recipes to create the object files are shown below.

# Cryptogams ARM asm implementation.
sha1_armv4.o : sha1_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_FLAG) -c) $<

# Cryptogams ARM asm implementation.
sha256_armv4.o : sha256_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_FLAG) -c) $<

# Cryptogams ARM asm implementation.
sha512_armv4.o : sha512_armv4.S
	$(CXX) $(strip $(CXXFLAGS) $(CRYPTOGAMS_ARMV7_FLAG) -c) $<