[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module
Fri Sep 24 14:24:53 CEST 2010
Paul Guyot writes:
> Since I don't know how to send the patch and a comment which is not the commit message with git send-email, I am commenting in a reply to my patch e-mail.
> There is a bug in R14B (and dev branch) with native modules having an on_load attribute when OTP is configured with --enable-native-libs.
> This bug was reported here:
> And a simpler way to reproduce it is to configure R14B with --enable-native-libs and then to start a shell and invoke 'crypto:md5("").' This call will never return.
I grabbed a pristine R14B tarball, unpacked it, did ./configure --enable-native-libs, make,
started bin/erl, and evaluated crypto:md5(""). There was no hang.
> The bug is actually a conjunction of the following two patches:
> * fix of crash in finish_after_on_load_2
> * load native code for modules loaded before the code server
> When Mikael proposed a simplified version of my fix for the crash in finish_after_on_load_2, the second patch had not yet graduated and therefore the native code of error_handler was not loaded. The fix in R14B ultimately consists of letting the call go to error_handler:undefined_function/3, and if this function is native (which is the case with the second patch), the result is an infinite loop.
> In fact, the infinite loop can be observed on a pristine OTP_R14A installation with 90108371943ace300f1dcf1543545a40be035a4a and the following code entered at the shell prompt:
Tried this too in the R14B I built above. Still no hang.
I agree there _may_ be a recursion between the native-traps-to-beam mechanism
and the error_handler module. However, the real problem is that the chosen
mechanism (point to target MFA's BEAM code) isn't flexible enough to handle
newer features like on_load or (apparently) a native-mode error_handler.
My planned fix is to make remote calls link to the target's Export* instead,
just like BEAM does, which should solve the problems. This will however
require HiPE to use different kinds of trap-to-beam stubs for remote and local
calls, since local calls must not and often cannot go via Export entries.
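
To illustrate the difference between linking a call site directly to the
target's code and linking it to an export entry, here is a toy model in
Python. It is only a sketch of the idea, not HiPE or BEAM code: the Export
class, its address field and the two md5 stand-ins are hypothetical names
made up for this example.

```python
class Export:
    """Toy stand-in for an export entry: one mutable slot per remote MFA."""

    def __init__(self, mfa):
        self.mfa = mfa
        # Until code is loaded, the slot points at a placeholder.
        self.address = self._not_loaded

    def _not_loaded(self, *args):
        raise RuntimeError(f"{self.mfa} is not loaded")

    def call(self, *args):
        # Every call re-reads the slot, so it sees later updates.
        return self.address(*args)


def beam_md5(data):
    return ("beam", data)


def native_md5(data):
    return ("native", data)


# Direct linking: the call site captures whatever code existed at link time.
direct_call_site = beam_md5

# Indirect linking: the call site holds the entry, not the code.
entry = Export(("crypto", "md5", 1))
entry.address = beam_md5
indirect_call_site = entry.call

# Later the runtime repoints the entry (for example after a load completes).
entry.address = native_md5

print(direct_call_site(b""))    # still the code captured at link time
print(indirect_call_site(b""))  # follows the updated entry
```

In this model only the indirect call site observes the change, which is the
kind of flexibility the direct link to the target's BEAM code lacks.
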
A simpler workaround for the error_handler issue (which I couldn't reproduce)
is to just never compile error_handler to native code. It's not like there's
a lot to gain by doing that. Please try the patch below.
--- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~ 2010-09-13 19:00:22.000000000 +0200
+++ otp_src_R14B/lib/kernel/src/error_handler.erl 2010-09-24 13:44:09.000000000 +0200
@@ -17,6 +17,7 @@
%% A simple error handler.