[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module

Paul Guyot pguyot@REDACTED
Wed Sep 22 15:55:27 CEST 2010


Hello,

Since I don't know how to send, with git send-email, both the patch and a comment that is not part of the commit message, I am commenting in a reply to my patch e-mail.

There is a bug in R14B (and in the dev branch) affecting native modules that call functions in a module with an on_load attribute, when OTP is configured with --enable-native-libs.

This bug was reported here:
http://www.erlang.org/cgi-bin/ezmlm-cgi?2:mss:1988:201009:nfbmkdhkkahljajmfaai

And a simpler way to reproduce it is to configure R14B with --enable-native-libs and then to start a shell and invoke 'crypto:md5("").' This call will never return.
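
For readers unfamiliar with on_load, here is a minimal sketch of a module carrying such an attribute, as crypto apparently does. The module name on_load_demo and its functions are hypothetical and are not part of the patch or of its test case; they only illustrate the kind of module whose functions trigger the hang when called from native code.

```erlang
%% Hypothetical example module, for illustration only (not from the patch).
-module(on_load_demo).
-export([hello/0]).

%% Ask the code loader to run init/0 when the module is loaded.
-on_load(init/0).

%% The on_load function must return ok for the module to stay loaded;
%% any other return value causes the module to be unloaded.
init() ->
    ok.

%% An ordinary exported function; calling it from another module is
%% what first triggers loading (and thus init/0) in interactive mode.
hello() ->
    world.
```

Once compiled, calling on_load_demo:hello() from the shell should return the atom world; the bug described here concerns the case where the caller is a native-compiled module.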

The bug actually results from the combination of the following two patches:
* fix of crash in finish_after_on_load_2
http://github.com/erlang/otp/commit/90108371943ace300f1dcf1543545a40be035a4a
and
* load native code for modules loaded before the code server
http://github.com/erlang/otp/commit/a8b8ec5e858da86531933b545f752f436e411b58

When Mikael proposed a simplified version of my fix for the crash in finish_after_on_load_2, the second patch had not yet graduated, and therefore the native code of error_handler was not loaded. The fix that ended up in R14B consists in letting the call go through to error_handler:undefined_function/3, and if this function is native (which is the case with the second patch), this results in an infinite loop.

In fact, the infinite loop can be observed on a pristine OTP_R14A installation with 90108371943ace300f1dcf1543545a40be035a4a and the following code entered at the shell prompt:

hipe:c(error_handler),
crypto:md5("").

Since my original patch for the crash changes the function glue to avoid calling error_handler:undefined_function/3 once the module has been loaded and on_load has succeeded, it also fixes the hang by avoiding the infinite loop. My original patch was here:
http://github.com/pguyot/otp/commit/495804b097aea4015e218d7b5da8d1372395580c

The submitted patch is simply a resolved merge of this original patch onto the dev branch, with an updated commit message and a slightly rephrased comment. It was just sent to this list and is also available on github:
git fetch git://github.com/pguyot/otp.git pg/fix-hipe-on_load-hang
http://github.com/pguyot/otp/commit/442599d7ed5464a0915a0f8ee3b822e2ccf8cd16

Mikael, if you want to provide a better patch that would fix the performance issue hinted at here, I'm all for it. I would just like to argue in favor of including the non-regression test case, even if the HiPE team runs a separate, closed-source test suite. Indeed, this test case currently fails on R14B (without the fix) and, from what I understand of the graduation policy, had this test case been included with Mikael's fix for the crash, the second patch would have been rejected and the bug would have been prevented in R14B.

Regards,

Paul
-- 
Semiocast                    http://semiocast.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris


