[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module

Mikael Pettersson mikpe@REDACTED
Fri Sep 24 14:24:53 CEST 2010


Paul Guyot writes:
 > Hello,
 > 
 > Since I don't know how to send the patch and a comment which is not the commit message with git send-email, I am commenting in a reply to my patch e-mail.
 > 
 > There is a bug in R14B (and dev branch) with native modules having an on_load attribute when OTP is configured with --enable-native-libs.
 > 
 > This bug was reported here:
 > http://www.erlang.org/cgi-bin/ezmlm-cgi?2:mss:1988:201009:nfbmkdhkkahljajmfaai
 > 
 > And a simpler way to reproduce it is to configure R14B with --enable-native-libs and then to start a shell and invoke 'crypto:md5("").' This call will never return.

I grabbed a pristine R14B tarball, unpacked it, did ./configure --enable-native-libs, make,
started bin/erl, and evaluated crypto:md5("").  There was no hang.
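
For reference, the steps above can be sketched as a small script. This is only a sketch under assumptions: the function name repro_md5, the 60-second limit, and the non-interactive erl invocation are illustrative and not part of the original test, which was done by hand in an interactive shell. It also assumes GNU coreutils timeout is available, so that a hang shows up as exit status 124 instead of blocking forever.

```shell
# repro_md5 SRC_DIR
# Hypothetical helper: build an unpacked OTP source tree with native libs
# and evaluate crypto:md5("") once, with a time limit to detect a hang.
repro_md5() {
    src_dir=$1
    # Refuse to run if the source tree is missing.
    [ -d "$src_dir" ] || { echo "no such source tree: $src_dir" >&2; return 2; }
    (
        cd "$src_dir" &&
        ./configure --enable-native-libs &&
        make &&
        # timeout exits with 124 if the call never returns (assumed limit: 60s).
        timeout 60 bin/erl -noshell \
            -eval 'crypto:md5(""), io:format("no hang~n"), halt().'
    )
}
```

Usage would be along the lines of "repro_md5 otp_src_R14B". A printed "no hang" matches the result reported here, while status 124 would correspond to the hang described in the bug report.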

 > The bug is actually a conjunction of the following two patches:
 > * fix of crash in finish_after_on_load_2
 > http://github.com/erlang/otp/commit/90108371943ace300f1dcf1543545a40be035a4a
 > and
 > * load native code for modules loaded before the code server
 > http://github.com/erlang/otp/commit/a8b8ec5e858da86531933b545f752f436e411b58
 > 
 > When Mikael proposed a simplified version of my fix for the crash in finish_after_on_load_2, the second patch had not yet graduated, so the native code of error_handler was not loaded. The fix in R14B ultimately consists in letting the call go through to error_handler:undefined_function/3, and if this function is native (which is the case with the second patch), the result is an infinite loop.
 > 
 > In fact, the infinite loop can be observed on a pristine OTP_R14A installation with 90108371943ace300f1dcf1543545a40be035a4a and the following code entered at the shell prompt:
 > 
 > hipe:c(error_handler),
 > crypto:md5("").

Tried this too in the R14B I built above.  Still no hang.

I agree there _may_ be a recursion between the native-traps-to-beam mechanism
and the error_handler module.  However, the real problem is that the chosen
mechanism (point to target MFA's BEAM code) isn't flexible enough to handle
newer features like on_load or (apparently) a native-mode error_handler.

My planned fix is to make remote calls link to the target's Export* instead,
just like BEAM does, which should solve the problems.  This will however
require HiPE to use different kinds of trap-to-beam stubs for remote and local
calls, since local calls must not and often cannot go via Export entries.

A simpler workaround for the error_handler issue (which I couldn't reproduce)
is to just never compile error_handler to native code.  It's not like there's
a lot to gain by doing that.  Please try the patch below.

/Mikael

--- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~	2010-09-13 19:00:22.000000000 +0200
+++ otp_src_R14B/lib/kernel/src/error_handler.erl	2010-09-24 13:44:09.000000000 +0200
@@ -17,6 +17,7 @@
 %% %CopyrightEnd%
 %%
 -module(error_handler).
+-compile(no_native).
 
 %% A simple error handler.
 

