[erlang-questions] crypto:hmac/3 using hardware acceleration

Ben Browitt ben.browitt@REDACTED
Wed Feb 26 01:29:06 CET 2020


I've run a better test using just hash(sha, Data) instead of hmac, which
does more work, and got a 2x speedup (1 microsecond vs 2 microseconds).
Is there a way to improve crypto:macN/5 so that its gain gets closer to
100% instead of just 25%?
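
As a rough cross-check, the 1024-byte figures from the quoted
`openssl speed -evp sha1` runs can be converted into time per hash. The
sketch below uses only those two published numbers; the variable names are
made up for illustration. It gives roughly 1.7 microseconds on Skylake and
0.8 microseconds on EPYC, i.e. about 0.9 microseconds saved per 1024-byte
hash. That is close to the 1 microsecond difference seen for both hash
(2 vs 1) and macN (4 vs 3), which would be consistent with macN spending
about 2 microseconds on work that does not depend on SHA speed. This
reading is an assumption, not something the measurements prove.

```erlang
%% Throughput in 1000s of bytes per second, copied from the quoted
%% `openssl speed -evp sha1` output for 1024-byte blocks.
IntelKBps = 617330.43,
AmdKBps   = 1344177.49,
%% Microseconds needed to hash one 1024-byte block at a given throughput.
UsPerHash = fun(KBps) -> 1024 / (KBps * 1000) * 1.0e6 end,
IntelUs = UsPerHash(IntelKBps),
AmdUs   = UsPerHash(AmdKBps),
io:format("Intel: ~.2f us, AMD: ~.2f us, saved: ~.2f us~n",
          [IntelUs, AmdUs, IntelUs - AmdUs]).
```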

We can mask out the SHA hardware extension to test its effect by starting
the shell with the following environment variable:
OPENSSL_ia32cap=:~0x20000000 erl
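
Before comparing runs with and without the mask, it may be worth confirming
from inside the shell that the variable actually reached the VM and which
cryptolib and OTP release are in use, since the quoted reply ties the EVP
code path to specific OTP and OpenSSL versions. A minimal sketch using
standard calls; the values in the comments are only examples:

```erlang
%% false if the variable is unset, otherwise the mask string,
%% e.g. ":~0x20000000".
os:getenv("OPENSSL_ia32cap").
%% Name, numeric version and version string of the linked cryptolib.
crypto:info_lib().
%% Major OTP release as a string, e.g. "22".
erlang:system_info(otp_release).
```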

Benchmark code:
Data = crypto:strong_rand_bytes(1024),
TC = fun(TC_M, TC_F, TC_A, TC_N) when TC_N > 1 ->
    %% Time each call separately and drop the first sample as warm-up.
    TC_L = tl([begin
                   {TC_T, _Result} = timer:tc(TC_M, TC_F, TC_A),
                   TC_T
               end || _ <- lists:seq(1, TC_N)]),
    TC_Min = lists:min(TC_L),
    TC_Max = lists:max(TC_L),
    TC_Med = lists:nth(round((TC_N - 1) / 2), lists:sort(TC_L)),
    TC_Avg = round(lists:foldl(fun(TC_X, TC_Sum) -> TC_X + TC_Sum end,
                               0, TC_L) / (TC_N - 1)),
    io:format("Range: ~b - ~b mics~nMedian: ~b mics ~nAverage: ~b mics~n",
              [TC_Min, TC_Max, TC_Med, TC_Avg]),
    TC_Med
end.
TC(crypto, hash, [sha, Data], 1000000).
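
Because timer:tc reports whole microseconds, a call that takes 1 to 4
microseconds is measured at the limit of the clock's resolution, and a
median of 3 vs 4 can hide a wide range of real differences. One way to get
finer numbers is to time many back-to-back calls as a single interval and
divide by the count. A minimal sketch, assuming a hypothetical module name
hash_bench (not part of OTP), compiled rather than pasted into the shell
because shell-defined funs are interpreted and add their own overhead:

```erlang
-module(hash_bench).
-export([ns_per_call/4]).

%% Mean cost in nanoseconds of apply(M, F, A), measured over N
%% back-to-back calls. The result includes the loop and apply overhead.
ns_per_call(M, F, A, N) when N > 0 ->
    T0 = erlang:monotonic_time(nanosecond),
    loop(M, F, A, N),
    T1 = erlang:monotonic_time(nanosecond),
    (T1 - T0) / N.

%% Call M:F(A...) N times and discard the results.
loop(_M, _F, _A, 0) ->
    ok;
loop(M, F, A, N) ->
    _ = apply(M, F, A),
    loop(M, F, A, N - 1).
```

It could be used from the shell as c(hash_bench). followed by
hash_bench:ns_per_call(crypto, hash, [sha, crypto:strong_rand_bytes(1024)], 1000000).
and the same call with macN and its arguments for the HMAC case.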

On Fri, Feb 21, 2020 at 2:38 AM Ben Browitt <ben.browitt@REDACTED> wrote:

> I've compared:
> Intel Skylake - no SHA hardware extension, N1 machine type on GCP [1]
> Second generation AMD EPYC Rome processor - has SHA hardware extension,
> N2D machine type on GCP [2]
>
> Ubuntu 18.04
> OpenSSL 1.1.1
> OTP-22.2.7 (erlang-solutions deb package)
>
> openssl speed -evp sha1 on AMD EPYC is about 2X faster than Intel Skylake.
> crypto:macN/5 on AMD EPYC is about 25% faster than Intel Skylake.
>
> It doesn't seem like crypto:macN/5 on AMD is using the SHA hardware
> extension. The 25% increase is probably just because Skylake is several
> years older than second-generation AMD EPYC.
> Is my test correct?
>
> Tests:
> Key = crypto:strong_rand_bytes(20),
> Data = crypto:strong_rand_bytes(1000),
> MacLength = 10,
> TC = fun(TC_M, TC_F, TC_A, TC_N) when TC_N > 1 ->
>     %% Time each call separately and drop the first sample as warm-up.
>     TC_L = tl([begin
>                    {TC_T, _Result} = timer:tc(TC_M, TC_F, TC_A),
>                    TC_T
>                end || _ <- lists:seq(1, TC_N)]),
>     TC_Min = lists:min(TC_L),
>     TC_Max = lists:max(TC_L),
>     TC_Med = lists:nth(round((TC_N - 1) / 2), lists:sort(TC_L)),
>     TC_Avg = round(lists:foldl(fun(TC_X, TC_Sum) -> TC_X + TC_Sum end,
>                                0, TC_L) / (TC_N - 1)),
>     io:format("Range: ~b - ~b mics~nMedian: ~b mics ~nAverage: ~b mics~n",
>               [TC_Min, TC_Max, TC_Med, TC_Avg]),
>     TC_Med
> end.
> TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000000).
>
> 1) Intel Skylake
> TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000000).
> Range: 3 - 729 mics
> Median: 4 mics
> Average: 4 mics
>
> openssl speed sha1
> Doing sha1 for 3s on 16 size blocks: 19026044 sha1's in 2.99s
> Doing sha1 for 3s on 64 size blocks: 11512925 sha1's in 2.98s
> Doing sha1 for 3s on 256 size blocks: 5769743 sha1's in 2.98s
> Doing sha1 for 3s on 1024 size blocks: 1927668 sha1's in 2.98s
> Doing sha1 for 3s on 8192 size blocks: 265026 sha1's in 2.98s
> Doing sha1 for 3s on 16384 size blocks: 133488 sha1's in 2.98s
> OpenSSL 1.1.1  11 Sep 2018
> built on: Tue Nov 12 16:58:35 2019 UTC
> options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall
> -Wa,--noexecstack -g -O2
> -fdebug-prefix-map=/build/openssl-kxN_24/openssl-1.1.1=.
> -fstack-protector-strong -Wformat -Werror=format-security
> -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ
> -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
> -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM
> -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM
> -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
> -Wdate-time -D_FORTIFY_SOURCE=2
> The 'numbers' are in 1000s of bytes per second processed.
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> sha1            101811.61k   247257.45k   495655.77k   662393.30k   728554.70k   733915.23k
>
> openssl speed -evp sha1
> Doing sha1 for 3s on 16 size blocks: 11590063 sha1's in 2.99s
> Doing sha1 for 3s on 64 size blocks: 8259388 sha1's in 2.97s
> Doing sha1 for 3s on 256 size blocks: 4853323 sha1's in 2.99s
> Doing sha1 for 3s on 1024 size blocks: 1796528 sha1's in 2.98s
> Doing sha1 for 3s on 8192 size blocks: 259970 sha1's in 2.99s
> Doing sha1 for 3s on 16384 size blocks: 131515 sha1's in 2.99s
> The 'numbers' are in 1000s of bytes per second processed.
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> sha1             62020.40k   177980.08k   415535.35k   617330.43k   712265.63k   720649.42k
>
> 2) AMD EPYC
> TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000000).
> Range: 3 - 515 mics
> Median: 3 mics
> Average: 3 mics
>
> openssl speed sha1
> Doing sha1 for 3s on 16 size blocks: 39862496 sha1's in 3.00s
> Doing sha1 for 3s on 64 size blocks: 25451866 sha1's in 3.00s
> Doing sha1 for 3s on 256 size blocks: 13073739 sha1's in 3.00s
> Doing sha1 for 3s on 1024 size blocks: 4463324 sha1's in 3.00s
> Doing sha1 for 3s on 8192 size blocks: 622138 sha1's in 3.00s
> Doing sha1 for 3s on 16384 size blocks: 314316 sha1's in 3.00s
> OpenSSL 1.1.1  11 Sep 2018
> built on: Tue Nov 12 16:58:35 2019 UTC
> options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall
> -Wa,--noexecstack -g -O2
> -fdebug-prefix-map=/build/openssl-kxN_24/openssl-1.1.1=.
> -fstack-protector-strong -Wformat -Werror=format-security
> -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ
> -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
> -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM
> -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM
> -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
> -Wdate-time -D_FORTIFY_SOURCE=2
> The 'numbers' are in 1000s of bytes per second processed.
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> sha1            212599.98k   542973.14k  1115625.73k  1523481.26k  1698851.50k  1716584.45k
>
> openssl speed -evp sha1
> Doing sha1 for 3s on 16 size blocks: 17719869 sha1's in 3.00s
> Doing sha1 for 3s on 64 size blocks: 14559842 sha1's in 3.00s
> Doing sha1 for 3s on 256 size blocks: 9433054 sha1's in 3.00s
> Doing sha1 for 3s on 1024 size blocks: 3938020 sha1's in 3.00s
> Doing sha1 for 3s on 8192 size blocks: 607605 sha1's in 2.99s
> Doing sha1 for 3s on 16384 size blocks: 309279 sha1's in 3.00s
> The 'numbers' are in 1000s of bytes per second processed.
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> sha1             94505.97k   310609.96k   804953.94k  1344177.49k  1664715.77k  1689075.71k
>
> [1] https://cloud.google.com/compute/docs/machine-types#n1_machine_type
> [2]
> https://cloud.google.com/compute/docs/machine-types#n2d_machine_types_beta
>
>
>
> On Thu, Feb 20, 2020 at 4:34 PM Ben Browitt <ben.browitt@REDACTED> wrote:
>
>> Hope to test soon. AMD servers on GCP will probably be available in the
>> next few days.
>> This is how I'm going to benchmark unless someone has a better
>> suggestion:
>> Key = crypto:strong_rand_bytes(20),
>> Data = crypto:strong_rand_bytes(1000),
>> MacLength = 10,
>> TC = fun(TC_M, TC_F, TC_A, TC_N) when TC_N > 1 ->
>>     %% Time each call separately and drop the first sample as warm-up.
>>     TC_L = tl([begin
>>                    {TC_T, _Result} = timer:tc(TC_M, TC_F, TC_A),
>>                    TC_T
>>                end || _ <- lists:seq(1, TC_N)]),
>>     TC_Min = lists:min(TC_L),
>>     TC_Max = lists:max(TC_L),
>>     TC_Med = lists:nth(round((TC_N - 1) / 2), lists:sort(TC_L)),
>>     TC_Avg = round(lists:foldl(fun(TC_X, TC_Sum) -> TC_X + TC_Sum end,
>>                                0, TC_L) / (TC_N - 1)),
>>     io:format("Range: ~b - ~b mics~nMedian: ~b mics ~nAverage: ~b mics~n",
>>               [TC_Min, TC_Max, TC_Med, TC_Avg]),
>>     TC_Med
>> end.
>> TC(crypto, macN, [hmac, sha, Key, Data, MacLength], 1000).
>>
>> And with:
>> openssl speed -evp sha1
>>
>> On Thu, Feb 20, 2020 at 2:15 PM Hans Nilsson R <
>> hans.r.nilsson@REDACTED> wrote:
>>
>>> Well, it isn't super clear in the release notes, so it is not strange
>>> you didn't know it.
>>>
>>> I'm VERY interested in the results of your benchmarking!
>>>
>>> /Hans
>>> ------------------------------
>>> *From:* Ben Browitt <ben.browitt@REDACTED>
>>> *Sent:* 19 February 2020 18:14
>>> *To:* Hans Nilsson R <hans.r.nilsson@REDACTED>
>>> *Cc:* zxq9@REDACTED <zxq9@REDACTED>; erlang-questions@REDACTED <
>>> erlang-questions@REDACTED>
>>> *Subject:* Re: [erlang-questions] crypto:hmac/3 using hardware acceleration
>>>
>>> Thank you Hans, that's great.
>>> I probably missed it in the release notes.
>>> I'll benchmark and compare hmac on a server with and without SHA
>>> hardware acceleration.
>>>
>>> On Wed, Feb 19, 2020 at 5:18 PM Hans Nilsson R <
>>> hans.r.nilsson@REDACTED> wrote:
>>>
>>> Crypto uses the EVP interface for hash and mac (as well as ciphers)
>>> with some conditions:
>>>
>>> Since OTP-22.1:
>>> The hash functions in crypto (hash/2, hash_init/1, hash_update/2 and
>>> hash_final/1) use the EVP interface if the underlying cryptolib is OpenSSL
>>> 1.0.0 or higher.
>>>
>>> Since OTP-22.1.3:
>>> The mac functions (mac, macN, mac_init, mac_update, mac_final and
>>> mac_finalN) use the EVP interface if the underlying cryptolib is OpenSSL
>>> 1.1.1 or higher.
>>>
>>> /Hans
>>> ------------------------------
>>> *From:* erlang-questions <erlang-questions-bounces@REDACTED> on behalf of Ben
>>> Browitt <ben.browitt@REDACTED>
>>> *Sent:* 18 February 2020 17:55
>>> *To:* zxq9@REDACTED <zxq9@REDACTED>
>>> *Cc:* erlang-questions@REDACTED <erlang-questions@REDACTED>
>>> *Subject:* Re: [erlang-questions] crypto:hmac/3 using hardware acceleration
>>>
>>> AWS [1] and GCP [2] provide AMD EPYC servers with SHA hardware
>>> acceleration.
>>> Intel Ice Lake servers will also have SHA hardware acceleration [3].
>>> Is there a chance OTP 23 could use EVP for SHA? This would give a large
>>> performance boost.
>>>
>>> [1] https://aws.amazon.com/ec2/amd/
>>> [2]
>>> https://cloud.google.com/blog/products/compute/announcing-the-n2d-vm-family-based-on-amd
>>> [3] https://en.wikipedia.org/wiki/Ice_Lake_(microprocessor)
>>>
>>> On Wed, May 8, 2019 at 4:34 PM <zxq9@REDACTED> wrote:
>>>
>>> On Wednesday, May 8, 2019 14:15:51 JST Ben Browitt wrote:
>>> > I've tested the speed with and without evp. evp is slower because Intel
>>> > cpus don't have hardware acceleration for sha.
>>> > So it's best to leave it without evp for now. Thanks.
>>> > openssl speed sha1
>>> > openssl speed -evp sha1
>>>
>>> I think it depends on how your openssl was built and which processor
>>> family you have. IIRC Intel has SHA1 hardware support, and AMD has
>>> SHA1 and SHA256 hardware instructions since Ryzen.
>>>
>>> May also depend on if you are running virtualized and whether the
>>> hypervisor is exposing the instructions.
>>>
>>> In the base case I imagine it would "just work", but not if this is
>>> disabled in a vanilla Linux/BSD/whatever distribution binary, or if
>>> your system is set to a mode that restricts some instructions.
>>>
>>> -Craig