[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module

Mikael Pettersson mikpe@REDACTED
Mon Oct 4 16:15:17 CEST 2010


Tuncer Ayaz writes:
 > On Mon, Oct 4, 2010 at 2:28 PM, Mikael Pettersson wrote:
 > > Paul Guyot writes:
 > >  > >> --- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~   2010-09-13 19:00:22.000000000 +0200
 > >  > >> +++ otp_src_R14B/lib/kernel/src/error_handler.erl       2010-09-24 13:44:09.000000000 +0200
 > >  > >> @@ -17,6 +17,7 @@
 > >  > >>  %% %CopyrightEnd%
 > >  > >>  %%
 > >  > >>  -module(error_handler).
 > >  > >> +-compile(no_native).
 > >  > >>
 > >  > >>  %% A simple error handler.
 > >  > >
 > >  > > Any objections to applying this fix to dev? Otherwise let's
 > >  > > include it as a trivial workaround that makes --enable-native-libs
 > >  > > usable.
 > >  >
 > >  >
 > >  > This is no trivial workaround and it works by sheer luck.
 > >
 > > The loops always involve a native-mode error_handler, so by
 > > eliminating that the loops are eliminated.  That's not "by sheer
 > > luck".
 > >
 > >> What this bug reveals is that there is a major design flaw with
 > >> on_load and HiPE, as illustrated by the fact that the emulator
 > >> goes into an infinite loop if error_handler is natively compiled:
 > >> every time there is a remote call to a function in a module that
 > >> has an on_load attribute, the emulator goes through
 > >> error_handler:undefined_function! I think it doesn't loop infinitely
 > >> when error_handler is not native because it goes through the apply
 > >> BIF, which works around the linkage issue.
 > >
 > > The short explanation is that BEAM-mode code doesn't loop in this
 > > case because BEAM has an additional level of indirection between
 > > caller and callee.  The BEAM code loader updates the middle level
 > > (the code address in Export entries), which allows callers to
 > > immediately see the updates.
 > >
 > >  > A quick benchmark shows that (on dev, without --enable-native-libs)
 > >  > calling a remote function in a module with an on_load attribute
 > >  > currently takes 100+ times longer from HiPE than from BEAM.
 > >
 > > Modules with on_load attributes are supposed to be prevented from
 > > being compiled to native code, so the "from HiPE" case shouldn't
 > > exist.
 > >
 > > If you meant to write that a call from native code in module N1 to
 > > BEAM code in module B2, where B2 also has an on_load, is slow, then
 > > yes I know that and it's because of the extra roundtrip through
 > > error_handler. In the long term my change to how native code links
 > > to BEAM code will fix that. In the short term you can work around it
 > > by splitting modules into on_load and non-on_load bits.
 > 
 > Speaking of short term fixes, what would you prefer to see in the dev
 > branch?
 >  S) split error_handler into on_load/non-on_load bits

The splitting I mentioned referred to the application module (B2 above),
not error_handler.

 >  D) assuming you see no drawback, add the no_native directive to
 >     error_handler

While it doesn't fix Paul's performance issue it does fix the
correctness issue, which is why I suggested it in the first
place some time ago.

 >  N) none of the above two options

error_handler gets no_native to fix the correctness issue.
Application developers who hit the performance issue apply
the splitting workaround.
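A minimal sketch of that splitting workaround, assuming a hypothetical application module b2 that originally combined an -on_load hook (here, loading a NIF) with frequently called pure-Erlang functions. After the split, b2 carries no -on_load attribute, so remote calls into it, including calls from natively compiled modules, should no longer take the extra roundtrip through error_handler. The on_load hook and the code that depends on it move to a separate, equally hypothetical module b2_nif, shown only in the leading comment. All module names, functions and the NIF library path are illustrative assumptions, not part of the original discussion.

```erlang
%% b2.erl: hypothetical application module after the split.
%% It has no -on_load attribute, so callers of its hot-path
%% functions are not affected by the on_load linkage issue.
%%
%% The on_load part lives in a separate hypothetical module, e.g.:
%%
%%   -module(b2_nif).
%%   -on_load(init/0).
%%   -export([checksum/1]).
%%   init() -> erlang:load_nif("./b2_nif", 0).
%%   checksum(_Bin) -> erlang:nif_error(not_loaded).
-module(b2).
-export([double/1, sum/1, checksum/1]).

%% Hot-path functions stay in the module without on_load.
double(X) -> 2 * X.

sum(L) -> lists:foldl(fun(X, Acc) -> X + Acc end, 0, L).

%% Only calls that need the on_load-initialised code are
%% forwarded to the (hypothetical) b2_nif module.
checksum(Bin) -> b2_nif:checksum(Bin).
```

Callers keep using b2:double/1 and b2:sum/1 unchanged; only b2:checksum/1 reaches the module that still has the on_load attribute.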

/Mikael


More information about the erlang-patches mailing list