[erlang-questions] Sender punishment removed

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Mon Jan 22 16:41:46 CET 2018


To be more concrete:

Amazon might have a fancy ALB load balancer.

Amazon might also use HTTP/2 in that fancy ALB load balancer.

HTTP/2 has a flow control system because you are muxing several streams
into one TCP connection. Also note that TCP has a send window.

Amazon might have defaulted their HTTP/2 flow-control window to 16 kilobytes
in the upload direction.

If you plot the TCP traffic, this leads to a fun situation where the
connection flaps between full MSS-sized segments and ack-only packets, and
your throughput is limited by the RTT of the line.
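
A back-of-the-envelope sketch of that ceiling (the 50 ms round trip below is
an assumed number, purely for illustration): with a fixed window and no
window update in flight, at most one window's worth of data crosses the line
per round trip.

    %% Rough throughput ceiling imposed by a fixed receive window.
    Window = 16 * 1024,      %% bytes: the 16 KiB HTTP/2 window
    RTT = 0.050,             %% seconds: an assumed 50 ms round trip
    Ceiling = Window / RTT.  %% 327680.0 bytes/s, roughly 320 KiB/s

No matter how fat the pipe underneath is, the upload tops out around there.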

People running gRPC know that "using the Amazon layer 4 load balancer
works", but they have yet to analyze the problem further. They just know it
"doesn't work with HTTP/2".

gRPC has its own flow control (!!) on top of HTTP/2, because "gRPC has to
be transport agnostic in case you want to use UDP".

The end result is that you are not running at maximal utilization of the
underlying TCP connection.

...

Another thing to think about: if you employ flow control, you have to think
about the "lock" situation in which the flow control locks up the system. If
the sender can be flow-controlled, the problem exists on both the sender and
the receiver side, and that interaction is considerably more complex than an
interaction confined to one side. Corollary: a Go program, with its bounded
channels, has far more points where a quasi-deadlock can occur than an
Erlang program does. The Erlang program, in contrast, can overflow its
mailbox.
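
To make the trade-off concrete, here is a minimal sketch (the function name
and the "every 100th message" threshold are made up for illustration) of the
usual middle ground: mostly asynchronous casts, with a synchronous call mixed
in now and then so the sender cannot run arbitrarily far ahead of the
receiver's mailbox. A Go channel gives you that blocking behaviour
implicitly; in Erlang you have to opt into it.

    %% Mostly fire-and-forget, but every Nth message waits for a reply,
    %% which bounds how many unread messages the sender can pile up.
    send_event(Pid, Event, Seq) when Seq rem 100 =:= 0 ->
        gen_server:call(Pid, {event, Event});
    send_event(Pid, Event, _Seq) ->
        gen_server:cast(Pid, {event, Event}).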


On Mon, Jan 22, 2018 at 3:25 PM Fred Hebert <mononcqc@REDACTED> wrote:

> I have written Handling Overload (https://ferd.ca/handling-overload.html)
> as a tour of the multiple options in Erlang. It may prove helpful.
>
> On Mon, Jan 22, 2018 at 4:52 AM, Karl Nilsson <kjnilsson@REDACTED> wrote:
>
>> Thanks Jesper. Keeping flow control at the input edge makes perfect sense
>> to me, but developing a good feedback mechanism that is both safe and not
>> overly cautious is likely to be quite challenging in non-trivial,
>> potentially distributed domains.
>>
>> I have some thinking to do. :) Any further reading along these lines that
>> anyone knows of would be very welcome.
>>
>> On Fri, 19 Jan 2018 at 21:17 Jesper Louis Andersen <
>> jesper.louis.andersen@REDACTED> wrote:
>>
>>> As Lukas already wrote on Slack somewhere:
>>>
>>> Imagine you have a machine from around the year 2000: single core, around
>>> 400 MHz if you are lucky. In that setting, sender punishment can in some
>>> situations rectify a system that is going overboard. We simply give the
>>> offending process less time on the scheduler, in the hope that the process
>>> with the overloaded mailbox can catch up and do its work. It is not a
>>> surefire solution, but it may avoid some situations in which the system
>>> would otherwise topple.
>>>
>>> Fast forward 18 years. Now the machines are multicore, commonly with at
>>> least 4 hardware threads. Here, a sender might live on one core whereas
>>> the receiver might live on another. It is less clear why the punishment
>>> strategy is good: we get to stop the sender, but there is already a
>>> scheduler for the other core and it is still overloaded. Worse, perhaps
>>> all the other cores are able to keep sending messages to the overloaded
>>> process.
>>>
>>> As for the flow control: Erlang systems already employ flow control,
>>> namely TCP flow control between distributed nodes. I've seen two recent
>>> problems stemming from having flow control inside flow control: gRPC has
>>> three layers of it: gRPC itself, HTTP/2, and TCP, with HTTP/2 adding its
>>> own flow-control window on top of TCP's. This is dangerous, as the flow
>>> control of the underlying system can interfere with the flow control of
>>> the system above it.
>>>
>>> By extension, any Erlang mechanism for flow control needs to protect
>>> against the scenario where your application has its own layer on top, and
>>> make sure the two don't interfere.
>>>
>>> Personally, I think Ulf Wiger's "jobs" framework paved the way[0]: apply
>>> flow control at the input edge of the system, but don't apply it
>>> internally. If you do this correctly, then the system shouldn't overload,
>>> because the limit at the border holds it back. If you apply internal flow
>>> control, you also expose yourself to the danger of an artificial internal
>>> bottleneck. Rather, sample internally and use this as a feedback mechanism
>>> for the input edge.
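>>>
>>> A minimal sketch of that edge-only regulation with jobs (queue name, rate
>>> and function names here are made-up illustration values):
>>>
>>>     %% At startup: one regulated queue at the border of the system.
>>>     start() ->
>>>         jobs:add_queue(incoming_requests, [{standard_rate, 100}]).
>>>
>>>     %% Per external request: pass the regulator exactly once, then do the
>>>     %% work. Internal processes keep messaging each other unregulated.
>>>     handle(Req) ->
>>>         jobs:run(incoming_requests, fun() -> do_handle(Req) end).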
>>>
>>> Also note that distributed flow control is a considerably harder problem to
>>> solve, and since Erlang is distributed by default, any general solution has
>>> to address this as well.
>>>
>>> [0] https://github.com/uwiger/jobs/blob/master/doc/erlang07g-wiger.pdf
>>>
>>> On Fri, Jan 19, 2018 at 9:51 AM Karl Nilsson <kjnilsson@REDACTED>
>>> wrote:
>>>
>>>> So I saw that the sender punishment was removed in [1]. The commit
>>>> message doesn't outline any of the reasoning behind this. Are there any
>>>> more details available anywhere that I can read? I understand it never
>>>> really worked that well, but it would still be interesting to understand
>>>> it a bit further.
>>>>
>>>> On a similar note, what is the current thinking on flow control between
>>>> Erlang processes? Are there any improvements on mixing a few calls in
>>>> with the casts?
>>>>
>>>> [1]
>>>> https://github.com/erlang/otp/commit/2e601a2efc19d64ed0628a5973596e6331ddcc7c
>>>>