[erlang-bugs] Unloading then reloading crypto.so causes erlang core dump on Solaris 11
Sverker Eriksson
sverker.eriksson@REDACTED
Fri Sep 7 15:03:31 CEST 2012
XinFeng Liu wrote:
> This I don't understand at all.
>
> reload() in crypto.c should only be called if the module is upgraded, not if
> it is unloaded and later loaded again.
>
> /Sverker, Erlang/OTP
>
> Thanks for pointing this out. I'm sorry, I drew the wrong conclusion about the workaround
> of modifying the reload() function. I wrongly modified reload() in 15B02.
>
>
Good to know.
> I have some new findings on this issue:
> I find that the latest 15B02 (without any modification) does not cause a core dump when
> running the couchDB test suite. Digging into this issue further, there's a subtle
> difference between 15B02 and 15B01 in crypto.so.
> In 15B01, crypto.so explicitly links libssl.so, while in 15B02 it does not.
>
>
From R15B02 README:
OTP-10064 Remove unnecessary dependency to libssl from crypto NIF
library. This dependency was introduced by accident in
R14B04.
> And more importantly, the libssl.so built by Sun/Oracle seems to be built with "-z
> nodelete", meaning RTLD_NODELETE ("elfdump -d" can show that).
> In 15B01, loading crypto.so causes libssl.so to be loaded. Since libssl.so
> depends on libcrypto.so, libcrypto.so is somehow promoted to RTLD_NODELETE
> (the Solaris runtime LD debugger can show this). So libcrypto.so cannot
> be unloaded by dlclose().
> In 15B02, when running the couchDB test suite, unloading crypto.so causes
> libcrypto.so to be unloaded too, so later reloading both crypto.so and libcrypto.so
> does not cause the previous problem.
>
>
Ok, so it seems that removing the unnecessary dependency on libssl.so in
R15B02 happened to act as a workaround for your problem, as that caused
libcrypto.so to be unloaded, thereby forgetting its old obsolete
callbacks into crypto.
> A new question: each load of crypto.so causes load() to be called, which
> means CRYPTO_set_mem_functions() is called again, and I assumed it would
> correctly set the callback funcs. But from instruction-level tracing and the
> source, it simply returned at line 129 because "!allow_customize" is true.
>
> (dbx) stepi
> t@REDACTED (l@REDACTED) stopped in CRYPTO_set_mem_functions at 0xfd053b0c
> 0xfd053b0c: CRYPTO_set_mem_functions+0x0034: retl
>
> 125 int CRYPTO_set_mem_functions(void *(*m)(size_t), void *(*r)(void *, size_t),
> 126 void (*f)(void *))
> 127 {
> 128 if (!allow_customize)
> 129 return 0;
> 130 if ((m == 0) || (r == 0) || (f == 0))
> 131 return 0;
> 132 malloc_func=m; malloc_ex_func=default_malloc_ex;
> 133 realloc_func=r; realloc_ex_func=default_realloc_ex;
> 134 free_func=f;
> 135 malloc_locked_func=m; malloc_locked_ex_func=default_malloc_locked_ex;
> 136 free_locked_func=f;
> 137 return 1;
> 138 }
>
Yes, I've seen this too. It looks like a simple way for openssl to
"protect" itself against re-registering of callbacks, which it does not
support. Looking at the openssl code, 'allow_customize' seems to be set
to false at the first memory allocation. This means that if you try to
(re)set memory callbacks *after* doing something that allocated memory,
CRYPTO_set_mem_functions() will fail by returning 0. crypto.c ignores
this failure, however, and that's part of this problem I guess.
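To illustrate the guard described above, here is a minimal sketch. It is a
stand-in, not OpenSSL or crypto.c code: every name in it (the mock_* symbols,
checked_load, alloc_a/alloc_b) is hypothetical, and the only behaviour it
assumes is the one visible in the quoted source, namely that the setter
refuses to change callbacks once an allocation has happened.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the library's internal state (not OpenSSL itself). */
int mock_allow_customize = 1;
void *(*mock_malloc_func)(size_t) = malloc;
void (*mock_free_func)(void *) = free;

/* Same shape as the quoted guard: refuse once customization is locked. */
int mock_set_mem_functions(void *(*m)(size_t), void (*f)(void *))
{
    if (!mock_allow_customize)
        return 0;
    if (m == NULL || f == NULL)
        return 0;
    mock_malloc_func = m;
    mock_free_func = f;
    return 1;
}

/* The first allocation locks out any later change of callbacks. */
void *mock_malloc(size_t n)
{
    mock_allow_customize = 0;
    return mock_malloc_func(n);
}

void mock_free(void *p)
{
    mock_free_func(p);
}

/* Hypothetical allocators belonging to two successive loads of a NIF library. */
void *alloc_a(size_t n) { return malloc(n); }
void free_a(void *p) { free(p); }
void *alloc_b(size_t n) { return malloc(n); }
void free_b(void *p) { free(p); }

/* A load step that treats a refused registration as a failure
 * instead of silently ignoring the 0 return value. */
int checked_load(void *(*m)(size_t), void (*f)(void *))
{
    return mock_set_mem_functions(m, f) ? 0 : -1;
}
```

In this sketch a first checked_load(alloc_a, free_a) succeeds, but after any
mock_malloc() a second checked_load(alloc_b, free_b) returns -1 and the
callbacks still point at the first load's functions. If that first library
has been unmapped in the meantime, those pointers are stale.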
I have written a work ticket regarding the more general problem of NIF
libraries that do not support unloading.
Your solution seems to be running R15B02.
Another even safer solution could be to build crypto with static linking
to openssl (configure flag --disable-dynamic-ssl-lib).
/Sverker, Erlang/OTP