[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module
Mon Oct 4 16:15:17 CEST 2010
Tuncer Ayaz writes:
> On Mon, Oct 4, 2010 at 2:28 PM, Mikael Pettersson wrote:
> > Paul Guyot writes:
> > > >> --- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~ 2010-09-13 19:00:22.000000000 +0200
> > > >> +++ otp_src_R14B/lib/kernel/src/error_handler.erl 2010-09-24 13:44:09.000000000 +0200
> > > >> @@ -17,6 +17,7 @@
> > > >> %% %CopyrightEnd%
> > > >> %%
> > > >> -module(error_handler).
> > > >> +-compile(no_native).
> > > >>
> > > >> %% A simple error handler.
> > > >
> > > > Any objections to applying this fix to dev? Otherwise let's
> > > > include it as a trivial workaround that makes enable-native-libs
> > > > useable.
> > >
> > >
> > > This is no trivial workaround and it works by sheer luck.
> > The loops always involve a native-mode error_handler, so eliminating
> > that eliminates the loops.  That's not "by sheer
> > luck".
> >> What this bug reveals is that there is a major design flaw with
> >> on_load and HiPE, as illustrated by the fact that the emulator
> >> goes into an infinite loop if error_handler is natively compiled:
> >> every time there is a remote call to a function in a module that
> >> has an on_load attribute, the emulator goes through
> >> error_handler:undefined_function! I think it doesn't loop infinitely
> >> when error_handler is not native because it goes through the apply
> >> BIF, which works around the linkage issue.
> > The short explanation is that BEAM-mode code doesn't loop in this
> > case because BEAM has an additional level of indirection between
> > caller and callee.  The BEAM code loader updates the middle level
> > (the code address in Export entries), which lets callers see the
> > update immediately.
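The indirection argument can be illustrated with a toy model in plain Erlang. This is a hypothetical sketch, not how the emulator or HiPE linker is actually implemented: the "export table" is just a map, the module/function `b2:f/1` is made up, and the stub stands in for the path through `error_handler:undefined_function/3`. A caller that goes through the export entry on every call sees the loader's update; a caller that captured the target once keeps hitting the stale stub.

```erlang
-module(indirection_demo).
-export([run/0]).

%% Toy model only (hypothetical, not OTP internals): an "export table" is a
%% map from {Module, Function, Arity} to the code currently installed for it.
run() ->
    Key = {b2, f, 1},
    %% Before on_load has completed, the entry points at a stub that stands
    %% in for routing the call through error_handler:undefined_function/3.
    Stub = fun(X) -> {undefined_function, X} end,
    Real = fun(X) -> X * 2 end,
    Exports0 = #{Key => Stub},
    %% "Linked" caller: resolves the target once and then calls that code
    %% directly, never consulting the export entry again.
    LinkedTarget = maps:get(Key, Exports0),
    %% The loader finishes on_load and updates only the export entry.
    Exports1 = Exports0#{Key := Real},
    %% "BEAM-style" caller: looks the target up through the export entry on
    %% every call, so it sees the update immediately.
    ViaExportEntry = (maps:get(Key, Exports1))(21),
    ViaStaleLink = LinkedTarget(21),
    {ViaExportEntry, ViaStaleLink}.
```

In this model `indirection_demo:run()` returns `{42, {undefined_function, 21}}`: the indirect call reaches the real code, while the directly linked call keeps landing in the stub.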
> > > A quick benchmark shows that (on dev, without --enable-native-libs)
> > > calling a remote function in a module with an on_load attribute
> > > currently takes 100+ times longer from HiPE than from BEAM.
> > Modules with on_load attributes are supposed to be prevented from
> > being compiled to native code, so the "from HiPE" case shouldn't
> > exist.
> > If you meant to write that a call from native code in module N1 to
> > BEAM code in module B2, where B2 also has an on_load, is slow, then
> > yes I know that and it's because of the extra roundtrip through
> > error_handler. In the long term my change to how native code links
> > to BEAM code will fix that. In the short term you can work around it
> > by splitting modules into on_load and non-on_load bits.
> Speaking of short term fixes, what would you prefer to see in dev?
> S) split error_handler into on_load/non-on_load bits
The splitting I mentioned referred to the application module (B2 above),
not to error_handler.
> D) assuming you see no drawbacks, add the no_native directive to error_handler
While it doesn't fix Paul's performance issue it does fix the
correctness issue, which is why I suggested it in the first
place some time ago.
> N) none of the above two options
error_handler gets no_native to fix the correctness issue.
Application developers who hit the performance issue apply
the splitting workaround.
More information about the erlang-patches mailing list