[erlang-questions] OTP 22.1 socket.erl somehow breaks message delivery or scheduler

Andreas Schultz andreas.schultz@REDACTED
Tue Oct 29 13:06:54 CET 2019

Hi Björn,

Thanks for the detailed and thorough explanation.

Can you point me to the fix for the second bug?

I'm currently chasing what looks very much like the same issue, only this
time the message that does not arrive in time originates in the inet driver
(gen_tcp).
But it could also be something different, or just me being too stupid. A
complete explanation is here

Many thanks,

On Tue, Oct 29, 2019 at 12:40, Björn Gustavsson wrote:

> On Tue, Oct 29, 2019 at 9:14 AM Andreas Schultz
> <andreas.schultz@REDACTED> wrote:
> >
> > I'm sure that over time all the benefits of the new compiler
> > architecture are well worth the price.
> > But for OTP 22.x it has led to a few but highly frustrating problems.
> > Even the few incorrect code generation bugs have led to very hard to
> > understand problems for users.
> >
> > I'm not sure that the scope and impact of this bug are even fully
> > understood.
> >
> > My demonstration code used a bare receive, but the code that actually
> > triggered it used a plain gen_server. The result of the bug was that
> > gen_server:call requests seemed to arrive extremely late (multiple
> > seconds).
> > It would therefore seem that the incorrect code was present in the main
> > receive loop of gen_server (and probably also gen_statem and gen_event).
> > This would mean that almost all Erlang applications on OTP 22.x could be
> > affected.
> >
> > The effects might go unnoticed in many test cases in other projects
> > until they cause unexplainable failures.
> >
> > Since all this was introduced in OTP 22, the sensible suggestion for
> > everyone seems to be to test OTP 22.x as thoroughly as they can, but to
> > stay away from it for production use.
> >
> We have done some more investigating.
>
> The issues you saw are caused by TWO bugs: the compiler bug and a bug in
> the run-time system.
>
> The bug in the compiler caused a position in the message queue to be
> saved even when it was not guaranteed that a receive would be executed.
> The only module in OTP that was hit by this bug was the socket module,
> and in this case it would have been harmless without the other bug.
>
> The bug in the run-time system would cause the saved position to be used
> for a receive that was not supposed to use the saved position. This bug
> was introduced in OTP 21, but it is unlikely that it could ever be
> triggered by code emitted by the compiler in OTP 21.
>
> So what does that mean for production use of OTP 22.x?
>
> We know that OTP 22 is already used in production. The problems you saw
> were caused by the use of the socket module, which was hit by the
> compiler bug. Unless one has another module that uses receive in a
> similar way to how socket uses it, using gen_server and the other
> modules is perfectly safe.
>
> Since the socket module was introduced in OTP 22 and is still
> experimental, it would be advisable to avoid it in production use for
> the moment.
>
> The fix for the compiler should be enough to fix the kind of problems
> you saw, but we will of course fix the bug in the run-time system as
> well.
> /Björn
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB


Andreas Schultz