[erlang-questions] crypto:hmac/3 using hardware acceleration

Ben Browitt ben.browitt@REDACTED
Fri Feb 21 01:38:56 CET 2020


I've compared:
Intel Skylake - no SHA hardware extension, N1 machine type on GCP [1]
Second generation AMD EPYC Rome processor - has SHA hardware extension, N2D
machine type on GCP [2]

Ubuntu 18.04
OpenSSL 1.1.1
OTP-22.2.7 (erlang-solutions deb package)
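
The OpenSSL library that the crypto NIF is linked against is not necessarily the same build as the `openssl` command-line tool, so it may be worth recording both. A minimal sketch of such a check follows; the module name crypto_env and the function report/0 are hypothetical, while crypto:info_lib/0 and erlang:system_info/1 are standard OTP calls.

```erlang
-module(crypto_env).
-export([report/0]).

%% Collect the OTP release and the cryptolib name/version that the
%% crypto application reports for the running node.
report() ->
    #{otp_release => erlang:system_info(otp_release),
      cryptolib   => crypto:info_lib()}.
```

Calling crypto_env:report() on each machine shows whether both nodes run against the same OpenSSL release.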

openssl speed -evp sha1 on AMD EPYC is about 2X faster than Intel Skylake.
crypto:macN/5 on AMD EPYC is about 25% faster than Intel Skylake.

It doesn't seem like crypto:macN/5 on AMD is using the SHA hardware
extension. The 25% increase is probably just because Skylake is several
years older than second-generation AMD EPYC.
Is my test correct?

Tests:
Key = crypto:strong_rand_bytes(20),
Data = crypto:strong_rand_bytes(1000),
MacLength = 10,
TC = fun(TC_M, TC_F, TC_A, TC_N) when TC_N > 0 ->
    TC_L = tl([begin {TC_T, _Result} = timer:tc(TC_M, TC_F, TC_A), TC_T end
               || _ <- lists:seq(1, TC_N)]),
    TC_Min = lists:min(TC_L),
    TC_Max = lists:max(TC_L),
    TC_Med = lists:nth(round((TC_N - 1) / 2), lists:sort(TC_L)),
    TC_Avg = round(lists:foldl(fun(TC_X, TC_Sum) -> TC_X + TC_Sum end, 0, TC_L)
                   / (TC_N - 1)),
    io:format("Range: ~b - ~b mics~nMedian: ~b mics ~nAverage: ~b mics~n",
              [TC_Min, TC_Max, TC_Med, TC_Avg]),
    TC_Med
end.
TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000000).
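
One caveat with the test above: timer:tc/3 returns whole microseconds, and the per-call medians reported below are only 3 and 4 mics, so each sample sits close to the measurement resolution. A possible cross-check, as a minimal sketch only, is to time one large batch of calls and divide by the count. The module name mac_bench and the function run/1 are hypothetical; the key, data and MAC sizes are the same as in the test above.

```erlang
-module(mac_bench).
-export([run/1]).

%% Time N back-to-back crypto:macN/5 calls as one batch and return the
%% mean cost per call in nanoseconds (a float).
run(N) when is_integer(N), N > 0 ->
    Key = crypto:strong_rand_bytes(20),
    Data = crypto:strong_rand_bytes(1000),
    MacLength = 10,
    %% One untimed warm-up call, so first-call costs are not measured.
    _ = crypto:macN(hmac, sha, Key, Data, MacLength),
    T0 = erlang:monotonic_time(nanosecond),
    loop(N, Key, Data, MacLength),
    T1 = erlang:monotonic_time(nanosecond),
    (T1 - T0) / N.

%% Tail-recursive loop that discards each MAC result.
loop(0, _Key, _Data, _MacLength) ->
    ok;
loop(N, Key, Data, MacLength) ->
    _ = crypto:macN(hmac, sha, Key, Data, MacLength),
    loop(N - 1, Key, Data, MacLength).
```

mac_bench:run(1000000) returns an average that includes the loop and NIF call overhead, so it is an upper bound on the HMAC cost rather than a pure hash timing.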

1) Intel Skylake
TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000000).
Range: 3 - 729 mics
Median: 4 mics
Average: 4 mics

openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 19026044 sha1's in 2.99s
Doing sha1 for 3s on 64 size blocks: 11512925 sha1's in 2.98s
Doing sha1 for 3s on 256 size blocks: 5769743 sha1's in 2.98s
Doing sha1 for 3s on 1024 size blocks: 1927668 sha1's in 2.98s
Doing sha1 for 3s on 8192 size blocks: 265026 sha1's in 2.98s
Doing sha1 for 3s on 16384 size blocks: 133488 sha1's in 2.98s
OpenSSL 1.1.1  11 Sep 2018
built on: Tue Nov 12 16:58:35 2019 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack
-g -O2 -fdebug-prefix-map=/build/openssl-kxN_24/openssl-1.1.1=.
-fstack-protector-strong -Wformat -Werror=format-security
-DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM
-DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM
-DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
-Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1            101811.61k   247257.45k   495655.77k   662393.30k   728554.70k   733915.23k

openssl speed -evp sha1
Doing sha1 for 3s on 16 size blocks: 11590063 sha1's in 2.99s
Doing sha1 for 3s on 64 size blocks: 8259388 sha1's in 2.97s
Doing sha1 for 3s on 256 size blocks: 4853323 sha1's in 2.99s
Doing sha1 for 3s on 1024 size blocks: 1796528 sha1's in 2.98s
Doing sha1 for 3s on 8192 size blocks: 259970 sha1's in 2.99s
Doing sha1 for 3s on 16384 size blocks: 131515 sha1's in 2.99s
OpenSSL 1.1.1  11 Sep 2018
built on: Tue Nov 12 16:58:35 2019 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack
-g -O2 -fdebug-prefix-map=/build/openssl-kxN_24/openssl-1.1.1=.
-fstack-protector-strong -Wformat -Werror=format-security
-DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM
-DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM
-DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
-Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             62020.40k   177980.08k   415535.35k   617330.43k   712265.63k   720649.42k

2) AMD EPYC
TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000000).
Range: 3 - 515 mics
Median: 3 mics
Average: 3 mics

openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 39862496 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 25451866 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 13073739 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 4463324 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 622138 sha1's in 3.00s
Doing sha1 for 3s on 16384 size blocks: 314316 sha1's in 3.00s
OpenSSL 1.1.1  11 Sep 2018
built on: Tue Nov 12 16:58:35 2019 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack
-g -O2 -fdebug-prefix-map=/build/openssl-kxN_24/openssl-1.1.1=.
-fstack-protector-strong -Wformat -Werror=format-security
-DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM
-DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM
-DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
-Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1            212599.98k   542973.14k  1115625.73k  1523481.26k  1698851.50k  1716584.45k

openssl speed -evp sha1
Doing sha1 for 3s on 16 size blocks: 17719869 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 14559842 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 9433054 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 3938020 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 607605 sha1's in 2.99s
Doing sha1 for 3s on 16384 size blocks: 309279 sha1's in 3.00s
OpenSSL 1.1.1  11 Sep 2018
built on: Tue Nov 12 16:58:35 2019 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack
-g -O2 -fdebug-prefix-map=/build/openssl-kxN_24/openssl-1.1.1=.
-fstack-protector-strong -Wformat -Werror=format-security
-DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM
-DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM
-DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
-Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             94505.97k   310609.96k   804953.94k  1344177.49k  1664715.77k  1689075.71k
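
To relate the openssl figures to the per-call macN timings, the "Doing sha1" lines can be converted to a time per hash. The helper below is a small sketch of that arithmetic; the module name sha_speed and both function names are hypothetical.

```erlang
-module(sha_speed).
-export([per_hash_us/2, speedup/2]).

%% Mean microseconds per hash, from a "Doing sha1 for 3s" line that
%% reports Count hashes in Seconds.
per_hash_us(Count, Seconds) when Count > 0 ->
    Seconds * 1.0e6 / Count.

%% Throughput ratio between two entries of the summary table.
speedup(Fast, Slow) when Slow > 0 ->
    Fast / Slow.
```

With the 1024-byte -evp figures, sha_speed:per_hash_us(1796528, 2.98) gives about 1.66 on Skylake and sha_speed:per_hash_us(3938020, 3.00) gives about 0.76 on EPYC, and sha_speed:speedup(1344177.49, 617330.43) gives about 2.18. The absolute gap is under one microsecond per hash, which is comparable to the whole-microsecond resolution of the timer:tc/3 samples.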

[1] https://cloud.google.com/compute/docs/machine-types#n1_machine_type
[2] https://cloud.google.com/compute/docs/machine-types#n2d_machine_types_beta



On Thu, Feb 20, 2020 at 4:34 PM Ben Browitt <ben.browitt@REDACTED> wrote:

> Hope to test soon. AMD servers on GCP will probably be available in the
> next few days.
> This is how I'm going to benchmark unless someone has a better suggestion:
> Key = crypto:strong_rand_bytes(20),
> Data = crypto:strong_rand_bytes(1000),
> MacLength = 10,
> TC = fun(TC_M, TC_F, TC_A, TC_N) when TC_N > 0 ->
>     TC_L = tl([begin {TC_T, _Result} = timer:tc(TC_M, TC_F, TC_A), TC_T end
>                || _ <- lists:seq(1, TC_N)]),
>     TC_Min = lists:min(TC_L),
>     TC_Max = lists:max(TC_L),
>     TC_Med = lists:nth(round((TC_N - 1) / 2), lists:sort(TC_L)),
>     TC_Avg = round(lists:foldl(fun(TC_X, TC_Sum) -> TC_X + TC_Sum end, 0, TC_L)
>                    / (TC_N - 1)),
>     io:format("Range: ~b - ~b mics~nMedian: ~b mics ~nAverage: ~b mics~n",
>               [TC_Min, TC_Max, TC_Med, TC_Avg]),
>     TC_Med
> end.
> TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000).
>
> And with:
> openssl speed -evp sha1
>
> On Thu, Feb 20, 2020 at 2:15 PM Hans Nilsson R <
> hans.r.nilsson@REDACTED> wrote:
>
>> Well, it isn't super clear in the release notes, so it is not strange you
>> didn't know it.
>>
>> I'm VERY interested in the results of your benchmarking!
>>
>> /Hans
>> ------------------------------
>> *From:* Ben Browitt <ben.browitt@REDACTED>
>> *Sent:* 19 February 2020 18:14
>> *To:* Hans Nilsson R <hans.r.nilsson@REDACTED>
>> *Cc:* zxq9@REDACTED <zxq9@REDACTED>; erlang-questions@REDACTED <
>> erlang-questions@REDACTED>
>> *Subject:* Re: [erlang-questions] crypto:hmac/3 using hardware acceleration
>>
>> Thank you Hans, that's great.
>> I probably missed it in the release notes.
>> I'll benchmark and compare hmac on servers with and without SHA hardware
>> acceleration.
>>
>> On Wed, Feb 19, 2020 at 5:18 PM Hans Nilsson R <
>> hans.r.nilsson@REDACTED> wrote:
>>
>> Crypto uses the EVP interface for hash and mac (as well as ciphers) with
>> some conditions:
>>
>> Since OTP-22.1:
>> The hash functions in crypto (hash/2, hash_init/1, hash_update/2 and
>> hash_final/1) use the EVP interface if the underlying cryptolib is OpenSSL
>> 1.0.0 or higher.
>>
>> Since OTP-22.1.3:
>> The mac functions (mac, macN, mac_init, mac_update, mac_final and
>> mac_finalN) use the EVP interface if the underlying cryptolib is OpenSSL
>> 1.1.1 or higher.
>>
>> /Hans
>> ------------------------------
>> *From:* erlang-questions <erlang-questions-bounces@REDACTED> on behalf of Ben
>> Browitt <ben.browitt@REDACTED>
>> *Sent:* 18 February 2020 17:55
>> *To:* zxq9@REDACTED <zxq9@REDACTED>
>> *Cc:* erlang-questions@REDACTED <erlang-questions@REDACTED>
>> *Subject:* Re: [erlang-questions] crypto:hmac/3 using hardware acceleration
>>
>> AWS [1] and GCP [2] provide AMD EPYC servers with SHA hardware
>> acceleration.
>> Intel Ice Lake servers will also have SHA hardware acceleration [3].
>> Is there a chance OTP 23 could use EVP for SHA? This will give a large
>> performance boost.
>>
>> [1] https://aws.amazon.com/ec2/amd/
>> [2] https://cloud.google.com/blog/products/compute/announcing-the-n2d-vm-family-based-on-amd
>> [3] https://en.wikipedia.org/wiki/Ice_Lake_(microprocessor)
>>
>> On Wed, May 8, 2019 at 4:34 PM <zxq9@REDACTED> wrote:
>>
>> On Wednesday, May 8, 2019 14:15:51 JST Ben Browitt wrote:
>> > I've tested the speed with and without evp. evp is slower because Intel
>> > cpus don't have hardware acceleration for sha.
>> > So it's best to leave it without evp for now. Thanks.
>> > openssl speed sha1
>> > openssl speed -evp sha1
>>
>> I think it depends on how your openssl was built and which processor
>> family you have. IIRC Intel has SHA1 hardware support, and AMD has
>> SHA1 and SHA256 hardware instructions since Ryzen.
>>
>> May also depend on if you are running virtualized and whether the
>> hypervisor is exposing the instructions.
>>
>> In the base case I imagine it would "just work", but not if this is
>> disabled in a vanilla Linux/BSD/whatever distribution binary, or if
>> your system is set to a mode that restricts some instructions.
>>
>> -Craig
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>