[erlang-questions] gen_tcp send_timeout and memory leak caveats
Max Lapshin
max.lapshin@REDACTED
Mon Jul 20 13:24:33 CEST 2015
Hi.
It seems I have hit a rather rare situation in which send_timeout does not
protect against a memory leak.
I have an architecture in which a central process sends messages containing
big binary blobs to subscribed processes via plain !
Each process takes these messages and tries to write them to a socket
configured with {send_timeout, 10000}. If the socket replies with {error,
timeout}, the process shuts down and stops receiving messages.
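For concreteness, here is roughly what this looks like (the function names
and message shape here are just for illustration, not my actual code):

    %% Central process: fan a blob out to all subscribers with plain !
    broadcast(Subscribers, Blob) ->
        [Pid ! {blob, Blob} || Pid <- Subscribers],
        ok.

    %% Subscriber: Socket was opened with {send_timeout, 10000}
    subscriber_loop(Socket) ->
        receive
            {blob, Blob} ->
                case gen_tcp:send(Socket, Blob) of
                    ok ->
                        subscriber_loop(Socket);
                    {error, _Reason} ->
                        %% e.g. {error, timeout}: close and stop, so the
                        %% mailbox stops accumulating blobs
                        gen_tcp:close(Socket)
                end
        end.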
This architecture can only work if gen_tcp:send is guaranteed to return
after 10, maybe 15, or at the very least 60 seconds.
After a customer masterfully configured bonding across 4 ethernet ports and
then cut 2 of the 4 cables, we got a rare situation with 50% packet loss.
This led to a very interesting situation in which plenty of processes were
locked in prim_inet:send for many minutes.
gen_tcp:send does not allow passing the nosuspend option to port_command,
so it is impossible to send data to a socket with a real timeout. If the TCP
socket is blocked, erlang:port_command will block for a very long time.
I had to abandon gen_tcp:send in favor of erlang:port_command and add the
missing timeout there myself:
send(Socket, Data) ->
    %% nosuspend: return false instead of suspending the caller
    %% when the port's internal queue is busy
    try erlang:port_command(Socket, Data, [nosuspend]) of
        false ->
            {error, busy};
        true ->
            %% The inet driver replies asynchronously; wait for the
            %% reply, but only for a bounded time
            receive
                {inet_reply, Socket, Status} -> Status
            after
                20000 -> {error, timeout}
            end
    catch
        error:Error ->
            {error, Error}
    end.
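Note that if the receive above times out, the {inet_reply, Socket, _}
message can still arrive later and sit in the mailbox, where it could be
mistaken for the reply to the next port_command. So the next call has to
flush it first, something like:

    %% Drop a stale inet_reply left over from a previous timed-out send
    flush_inet_reply(Socket) ->
        receive
            {inet_reply, Socket, _} -> ok
        after
            0 -> ok
        end.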
Another way to handle it is to launch a process that scans
erlang:processes() and checks their message_queue_len. A very "elegant" and
"well designed" solution, but it would work.
What is the proper way to handle such a situation? And yes, I cannot use
gen_server:call from the central process to the clients, because I need this
to scale to 2000-3000 secondary processes.