<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 29, 2019 at 1:07 PM Andreas Schultz <<a href="mailto:andreas.schultz@travelping.com">andreas.schultz@travelping.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi Björn,<div><br></div><div>Thanks for the detailed and thorough explanation. </div><div><br></div><div>Can you point me to the fix for the second bug?</div></div></div></blockquote><div><br></div><div>You can get a temporary workaround here: <a href="https://github.com/garazdawi/otp/tree/lukas/erts/fix-sigq-save-bug">https://github.com/garazdawi/otp/tree/lukas/erts/fix-sigq-save-bug</a></div><div><br></div><div>It is most likely not the fix we are going to merge as it is too conservative about when the message queue optimization can trigger, but it is correct.</div><div><br></div><div>Lukas</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><br></div><div>I'm currently chasing what looks very much like the same issue, only the message that is not arriving in time is originating in the inet driver (gen_tcp) this time.</div><div>But it could also be something different or me being too stupid. A complete explanation is here: <a href="http://erlang.org/pipermail/erlang-questions/2019-September/098419.html" target="_blank">http://erlang.org/pipermail/erlang-questions/2019-September/098419.html</a></div><div><br></div><div>Many thanks,</div><div>Andreas</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 29, 
2019 at 12:40 PM Björn Gustavsson <<a href="mailto:bjorn@erlang.org" target="_blank">bjorn@erlang.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, Oct 29, 2019 at 9:14 AM Andreas Schultz<br>
<<a href="mailto:andreas.schultz@travelping.com" target="_blank">andreas.schultz@travelping.com</a>> wrote:<br>
><br>
> I'm sure that over time all the benefits of the new compiler architecture are well worth the price.<br>
> But for OTP 22.x it has led to a few, but highly frustrating, problems. Even the few code generation bugs have led to problems that are very hard for users to understand.<br>
><br>
> I'm not sure that the scope and impact of this bug are even fully understood.<br>
><br>
> My demonstration code used a bare receive, but the code that actually triggered it used a plain gen_server. The result of the bug was that gen_server calls seemed to arrive extremely late (multiple seconds).<br>
> It would therefore seem that the incorrect code was present in the main receive loop of gen_server (and probably also gen_statem and gen_event). This would mean almost all Erlang applications on OTP 22.x could be affected.<br>
><br>
> The effects might go unnoticed in many test cases in other projects, until they cause unexplainable failures.<br>
><br>
> Since all this was introduced in OTP 22, the sensible suggestion for everyone seems to be to test OTP 22.x as well as they can, but to stay away from it for production use.<br>
><br>
<br>
We have done some more investigating.<br>
<br>
The issues you saw are caused by TWO bugs,<br>
the compiler bug and a bug in the run-time<br>
system.<br>
<br>
The bug in the compiler caused a position in<br>
the message queue to be saved even when it<br>
was not guaranteed that a receive would be<br>
executed. The only module in OTP that was<br>
hit by this bug was the socket module, and<br>
in this case it would have been harmless<br>
without the other bug.<br>
<br>
The bug in the run-time system would cause<br>
the saved position to be used for a receive that<br>
was not supposed to use the saved position.<br>
This bug was introduced in OTP 21, but it is<br>
unlikely that it could ever be triggered by<br>
code emitted by the compiler in OTP 21.<br>
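<br>
For readers unfamiliar with the saved position: it appears to belong to the optimization that applies when a process creates a fresh reference and then receives only messages containing that reference. No such message can have arrived before the reference existed, so the receive may skip everything already in the queue. The following is a minimal sketch of code that would qualify, assuming that reading is right. The module name, function name, and message shapes are hypothetical and are not taken from the socket module.<br>
<br>
```erlang
-module(recv_mark_sketch).
-export([call/2]).

%% Hypothetical request/reply helper, for illustration only.
%% The reference is created immediately before the receive, and every
%% clause of the receive matches on it, which is the pattern the
%% message queue optimization is designed for.
call(Pid, Request) ->
    %% A fresh reference: nothing already in the queue can contain it.
    Ref = erlang:monitor(process, Pid),
    Pid ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            %% Drop the monitor and flush any pending 'DOWN' message.
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The server died (or was already dead) before replying.
            {error, Reason}
    end.
```
<br>
If a position were saved without the matching receive being executed, a later unrelated receive that wrongly reused it would presumably start scanning past older messages, which would be consistent with the late-arriving calls reported above.<br>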
<br>
So what does that mean for production use<br>
of OTP 22.x?<br>
<br>
We know that OTP 22 is already used in<br>
production. The problems you saw were<br>
caused by the use of the socket module that<br>
was hit by the compiler bug. Unless one has<br>
another module that uses receive in a similar<br>
way to how socket uses it, using gen_server<br>
and the other modules is perfectly safe.<br>
<br>
Since the socket module was introduced<br>
in OTP 22 and is still experimental, it<br>
would be advisable to avoid it in production<br>
use for the moment.<br>
<br>
The fix for the compiler should be enough<br>
to fix the kind of problems you saw, but we<br>
will of course fix the bug in the run-time<br>
system as well.<br>
<br>
/Björn<br>
<br>
-- <br>
Björn Gustavsson, Erlang/OTP, Ericsson AB<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><p><span style="font-family:verdana,geneva,sans-serif;font-size:10pt">Andreas Schultz</span></p></div></div>
</blockquote></div></div>