[erlang-questions] OTP 22.1 socket.erl somehow breaks message delivery or scheduler

Lukas Larsson lukas@REDACTED
Tue Oct 29 13:25:48 CET 2019

On Tue, Oct 29, 2019 at 1:07 PM Andreas Schultz <
andreas.schultz@REDACTED> wrote:

> Hi Björn,
> Thanks for the detailed and thorough explanation.
> Can you point me to the fix for the second bug?

You can get a temporary workaround here:

It is most likely not the fix we are going to merge as it is too
conservative about when the message queue optimization can trigger, but it
is correct.
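
As background, the message queue optimization applies to a receive whose
clauses can only match a reference created shortly before it. Messages that
were already in the queue when the reference was created cannot match, so
they do not need to be scanned. Below is a minimal sketch of that code shape.
The module name ref_receive_sketch and the functions call/2, echo_once/0 and
demo/0 are hypothetical names chosen for illustration; they are not taken
from OTP or from the socket module.

```erlang
-module(ref_receive_sketch).
-export([call/2, echo_once/0, demo/0]).

%% Hypothetical helper: send a request tagged with a fresh reference and
%% wait for the reply that carries the same reference.
call(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Request},
    %% The only clause matches on Ref, which did not exist before
    %% make_ref/0 ran, so no older message in the queue can match it.
    receive
        {Ref, Reply} ->
            {ok, Reply}
    after 5000 ->
        {error, timeout}
    end.

%% Hypothetical server: answer a single request by echoing it back,
%% then terminate.
echo_once() ->
    receive
        {From, Ref, Request} ->
            From ! {Ref, Request}
    end.

%% Put unrelated messages into our own queue first, then make a call.
demo() ->
    Pid = spawn(fun echo_once/0),
    lists:foreach(fun(N) -> self() ! {noise, N} end, lists:seq(1, 1000)),
    call(Pid, hello).
```

Calling ref_receive_sketch:demo() should return {ok, hello}. The {noise, N}
messages stay in the caller's queue, because the receive in call/2 never
matches them.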


> I'm currently chasing what looks very much like the same issue, only the
> message that is not arriving in time is originating in the inet driver
> (gen_tcp) this time.
> But it could also be something different or me being too stupid. A
> complete explanation is here
> http://erlang.org/pipermail/erlang-questions/2019-September/098419.html
> Many thanks,
> Andreas
> On Tue, Oct 29, 2019 at 12:40 PM Björn Gustavsson <
> bjorn@REDACTED> wrote:
>> On Tue, Oct 29, 2019 at 9:14 AM Andreas Schultz
>> <andreas.schultz@REDACTED> wrote:
>> >
>> > I'm sure that over time all the benefits of the new compiler
>> > architecture are well worth the price.
>> > But for OTP 22.x it has led to a few, but highly frustrating, problems.
>> > Even the few incorrect code generation bugs have led to problems that
>> > are very hard for users to understand.
>> >
>> > I'm not sure that the scope and impact of this bug are even fully
>> > understood.
>> >
>> > My demonstration code used a bare receive, but the code that actually
>> > triggered it used a plain gen_server. The result of the bug was that
>> > gen_server:call requests seemed to arrive extremely late (multiple
>> > seconds).
>> > It would therefore seem that the incorrect code was present in the main
>> > receive loop of gen_server (and probably also gen_statem and gen_event).
>> > This would mean almost all Erlang applications on OTP 22.x could be
>> > affected.
>> >
>> > The effects might go unnoticed in many test cases in other projects,
>> > until they cause unexplainable failures.
>> >
>> > Since all this was introduced in OTP 22, the sensible suggestion for
>> > everyone seems to be to test OTP 22.x as well as they can, but to stay
>> > away from it for production use.
>> >
>> We have done some more investigating.
>>
>> The issues you saw are caused by TWO bugs,
>> the compiler bug and a bug in the run-time
>> system.
>>
>> The bug in the compiler caused a position in
>> the message queue to be saved even when it
>> was not guaranteed that a receive would be
>> executed. The only module in OTP that was
>> hit by this bug was the socket module, and
>> in this case it would have been harmless
>> without the other bug.
>>
>> The bug in the run-time system would cause
>> the saved position to be used for a receive that
>> was not supposed to use the saved position.
>> This bug was introduced in OTP 21, but it is
>> unlikely that it could ever be triggered by
>> code emitted by the compiler in OTP 21.
>>
>> So what does that mean for production use
>> of OTP 22.x?
>>
>> We know that OTP 22 is already used in
>> production. The problems you saw were
>> caused by the use of the socket module that
>> was hit by the compiler bug. Unless one has
>> another module that uses receive in a similar
>> way to how socket uses it, using gen_server
>> and the other modules is perfectly safe.
>>
>> Since the socket module was introduced
>> in OTP 22 and is still experimental, it
>> would be advisable to avoid it in production
>> use for the moment.
>>
>> The fix for the compiler should be enough
>> to fix the kind of problems you saw, but we
>> will of course fix the bug in the run-time
>> system as well.
>>
>> /Björn
>> --
>> Björn Gustavsson, Erlang/OTP, Ericsson AB
> --
> Andreas Schultz