[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module
Wed Sep 22 15:55:27 CEST 2010
Since I don't know how to use git send-email to send both the patch and a comment that is not part of the commit message, I am commenting in a reply to my patch e-mail.
There is a bug in R14B (and the dev branch) with native modules having an on_load attribute when OTP is configured with --enable-native-libs.
This bug was reported here:
A simpler way to reproduce it is to configure R14B with --enable-native-libs, start a shell, and invoke 'crypto:md5("").' This call never returns.
The bug is actually caused by the combination of the following two patches:
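For readers unfamiliar with the attribute, the following is a minimal sketch of a module with an on_load attribute. The module and function names are hypothetical and this is not the test case included in the patch; crypto uses the same mechanism to load its NIF library.

```erlang
%% Hypothetical module, for illustration only (not from the patch).
-module(onload_demo).
-export([hello/0]).

%% init/0 runs once when the module is loaded. The module's exported
%% functions only become callable after init/0 has returned ok.
-on_load(init/0).

init() ->
    ok.

%% An ordinary exported function. The hang described above concerns
%% calls to functions like this one made from native-compiled code.
hello() ->
    world.
```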
* fix of crash in finish_after_on_load_2
* load native code for modules loaded before the code server
When Mikael proposed a simplified version of my fix for the crash in finish_after_on_load_2, the second patch had not yet graduated, and therefore the native code of error_handler was not loaded. The fix that ended up in R14B lets the call go through to error_handler:undefined_function/3, and if this function is native (which is the case with the second patch), the result is an infinite loop.
In fact, the infinite loop can be observed on a pristine OTP_R14A installation with 90108371943ace300f1dcf1543545a40be035a4a and the following code entered at the shell prompt:
Since my original patch for the crash changes the function glue to avoid calling error_handler:undefined_function/3 when the module has been loaded and on_load succeeded, it also fixes the hang by avoiding the infinite loop. My original patch was here:
The submitted patch is simply a resolved merge of this original patch onto the dev branch, with an updated commit message and a slightly rephrased comment. It was just sent to this list and is also available on github:
git fetch git://github.com/pguyot/otp.git pg/fix-hipe-on_load-hang
Mikael, if you want to provide a better patch that would fix the performance issue hinted at here, I'm all for it. I would just like to argue in favor of including the non-regression test case, even if the HiPE team runs a separate, closed-source test suite. Indeed, this test case currently fails on R14B (without the fix), and from what I understand of the graduation policy, had this test case been included with Mikael's fix for the crash, the second patch would have been rejected and the bug would have been prevented in R14B.