How to deploy an upgrade (build with rebar3)

Fred Hebert mononcqc@REDACTED
Mon Feb 24 15:15:05 CET 2020


On Sun, Feb 23, 2020 at 1:37 AM by <by@REDACTED> wrote:

> Although there are some other problems (on my running system, the
> established WebSocket connection (process) was killed after the upgrade),
> the whole thing works fine.
>
> I will investigate more about why the running process was killed after
> the upgrade; I think the link below might be helpful:
>
> https://github.com/lrascao/rebar3_appup_plugin/blob/develop/doc/UPGRADE_DOWNGRADE.md#soft-vs-brutal-purge
>
>
Yeah, it's likely the soft vs. brutal purge distinction is what matters
here. A long-lived process that is idle (just waiting for a message) may
see more than one upgrade take place before it handles its next message,
at which point it gets killed for holding on to an old module version.
Either all these processes need to be sent a message that forces them to
make a fully-qualified call (which loads the new module version), or one
that makes them drop old references (i.e. local funs like fun f/2 or
closures like fun(X) -> X + 1 end), which are technically references to
the module version in which they were declared.
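
As a rough sketch of what that looks like in practice (the module name
and messages are mine, not from any particular project): a long-lived
loop that re-enters itself through a fully-qualified ?MODULE call picks
up the newest loaded code each time it handles a message, whereas a
plain local call or a stored fun would keep pointing at the old version:

    %% Sketch only: module and messages are made up for illustration.
    -module(upgrade_safe_loop).
    -export([start/0, loop/1]).

    start() ->
        spawn(fun() -> ?MODULE:loop(#{}) end).

    loop(State) ->
        receive
            {set, K, V} ->
                %% Fully-qualified call: resolves to the newest loaded
                %% version of this module, so the process survives purges.
                ?MODULE:loop(State#{K => V});
            stop ->
                ok
        end.
    %% A plain loop(State) call (or a kept fun loop/1) would instead stay
    %% bound to the module version it was declared in, and the process
    %% would be killed once that version is purged.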

Another common pattern is seeing servers die when their acceptor pool
crashes for the same reason -- the acceptors keep a reference to an
older module version, and after a couple of reloads they all get killed
at once in a kind of storm. Ranch, which underpins Cowboy, is the one
most frequently seen doing this, since its acceptor pool isn't safe (all
accept calls wait for infinity
<https://github.com/ninenines/ranch/blob/ae84436f7ceed06a09e3fe1afb30e675579b7621/src/ranch_acceptor.erl#L35>,
which as far as I can tell is a performance-based decision); a
workaround is to shrink your acceptor pool so it's small enough that you
can reasonably expect every acceptor to accept a connection between two
upgrades.
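
For what it's worth, here's a hedged sketch of that workaround; it
assumes Cowboy 2.x on Ranch 1.6 or later, where the transport options
map takes a num_acceptors key, and the listener name, port and pool
size are arbitrary examples (check your versions before copying this):

    %% Assumption: cowboy 2.x / ranch >= 1.6.
    start_listener(Dispatch) ->
        cowboy:start_clear(my_http_listener,
                           #{num_acceptors => 2,   %% keep the pool small
                             socket_opts   => [{port, 8080}]},
                           #{env => #{dispatch => Dispatch}}).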

Servers that would otherwise be safe for reloads in that case include
Elli, which has a timeout-then-reload pattern
<https://github.com/elli-lib/elli/blob/c16ac7dca11947cbb1ded6f72764943193e6fdcf/src/elli_http.erl#L48-L54>
that specifically guards against this during upgrades (as long as they
happen less frequently than the accept timeout), and YAWS, which starts
a new acceptor process on a timeout
<https://github.com/klacke/yaws/blob/365d4f83d5f29907945b6efa8575749150af022d/src/yaws_server.erl#L1133-L1146>
for a similar result.
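
In case it helps, here's a rough sketch of that general
timeout-then-reload idea; it is not Elli's or YAWS's actual code, just
the shape of it with made-up names, using gen_tcp directly:

    %% Sketch only: names are invented; real servers do considerably more.
    -module(reload_friendly_acceptor).
    -export([acceptor/1]).

    -define(ACCEPT_TIMEOUT, 10000).  %% upgrades must be rarer than this

    acceptor(ListenSocket) ->
        case gen_tcp:accept(ListenSocket, ?ACCEPT_TIMEOUT) of
            {ok, Socket} ->
                Pid = spawn(fun() -> handle(Socket) end),
                ok = gen_tcp:controlling_process(Socket, Pid),
                ?MODULE:acceptor(ListenSocket);
            {error, timeout} ->
                %% Nothing accepted: re-enter through ?MODULE so the next
                %% iteration runs whatever code version is current.
                ?MODULE:acceptor(ListenSocket);
            {error, Reason} ->
                exit(Reason)
        end.

    handle(Socket) ->
        %% Placeholder for real request handling.
        gen_tcp:close(Socket).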

Do note that this latter point on acceptor pools is only tangentially
related to websockets, since having the server handle its acceptor pool
timing out (and allowing upgrades) will not prevent websocket connections
from dying during upgrades; they're just another thing to test and worry
about.