[erlang-questions] How does Erlang TCP determine the end of a TCP stream?

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Tue Oct 31 20:38:36 CET 2017


On Tue, Oct 31, 2017 at 3:41 PM code wiget <codewiget95@REDACTED> wrote:

> Handle_info({tcp/ssl, Sock, Data}, State) -> … and every time that I
> receive data into the socket, it comes as the “full package”, meaning that
> when another server sends a message to it over the socket, It is received
> as the full length of the packet every time with no extra bytes here or
> there. On the other hand, a co-worker’s server has to implement socket
> reads and can’t determine one “group” from another. They have to read in
> the length or set delimiters.
>

If you set an option such as {packet, 4} on the socket, then the VM will
expect a 4-byte big-endian length header followed by that many bytes of
payload. Running with {active, once} or {active, N} will have the socket
deliver payloads to your process one message at a time, stripped of said
header.
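
A minimal sketch of that behaviour over a loopback connection. The module
name packet4_demo and the helper recv_once/1 are made up for illustration;
the socket options and gen_tcp/inet calls are the standard ones. Both ends
use {packet, 4}, so the sender prepends the header and the receiver strips
it.

```erlang
-module(packet4_demo).
-export([run/0]).

%% Sends two messages over loopback and returns them as received.
run() ->
    Opts = [binary, {packet, 4}, {active, false}],
    %% Port 0 asks the OS for any free port.
    {ok, Listen} = gen_tcp:listen(0, [{ip, {127,0,0,1}} | Opts]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, Opts),
    %% The accepted socket inherits the listen socket's options.
    {ok, Server} = gen_tcp:accept(Listen),
    ok = gen_tcp:send(Client, <<"hello">>),
    ok = gen_tcp:send(Client, <<"world!">>),
    First = recv_once(Server),
    Second = recv_once(Server),
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    [First, Second].

%% Arm the socket for exactly one message, then wait for it.
recv_once(Sock) ->
    ok = inet:setopts(Sock, [{active, once}]),
    receive
        {tcp, Sock, Payload} -> Payload
    after 5000 ->
        exit(timeout)
    end.
```

Even though the two sends happen back to back, each receive yields exactly
one payload, because the VM does the framing for you.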

If you don't set an option such as {packet, 4}, then the VM can deliver
anything from a single byte to a megabyte that it has buffered for a
while. In this case, you are receiving arbitrary chunks of the stream. It
is common to receive around the MTU of the underlying network if your rate
is fairly low (something like 1440-1460 bytes on Ethernet is typical). You
will have to do some work on your end to handle the case where one chunk
doesn't contain all the data necessary for a correct decode.

The typical solution is to buffer the partial chunk in the process, append
the next message to it when that arrives, and then try to decode again.
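
The buffering approach can be sketched as a pure function. This assumes
the same framing as {packet, 4} (a 4-byte big-endian length followed by
the payload) but done by hand on a raw socket; the module name
frame_buffer and the function feed/2 are hypothetical.

```erlang
-module(frame_buffer).
-export([feed/2]).

%% Append a newly received chunk to the buffered remainder and pull out
%% every complete frame. Returns {Frames, Rest}, where Rest is the
%% incomplete tail to keep in the process state until more data arrives.
feed(Buffer, Chunk) ->
    decode(<<Buffer/binary, Chunk/binary>>, []).

%% Matches only when the full header and the full payload are present.
decode(<<Len:32/big, Payload:Len/binary, Rest/binary>>, Acc) ->
    decode(Rest, [Payload | Acc]);
%% Otherwise the data is incomplete: stop and hand back what is left.
decode(Rest, Acc) ->
    {lists:reverse(Acc), Rest}.
```

In a gen_server you would call something like this from the
handle_info({tcp, Sock, Data}, State) clause, keeping Rest in State. For
example, a frame split across two chunks:

```erlang
{[], B1} = frame_buffer:feed(<<>>, <<0,0,0,5,"he">>),
{[<<"hello">>], B2} = frame_buffer:feed(B1, <<"llo",0,0>>),
{[<<"ok">>], <<>>} = frame_buffer:feed(B2, <<0,2,"ok">>).
```

A production version would probably also cap Len, so that a bogus header
cannot make the process buffer without bound.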

If you are in the latter of the above cases and happen to receive things
in "full package" form, that is a coincidence. It will break in any real
network setting. Beware of localhost as an interface: it often has a 16K
or 64K MTU, so messages are not split there, but they will be on any real
network.

If you can, run with something like {packet, 4}. It is simple and works.
Delimiters are worse because you have to scan for them and you often need
to escape them in your payload. A Length-Value-Type encoding provides
framing in which you don't have to scan and you don't need to escape.
Putting the type last in the packet is debatable, but one advantage is that
implementations in other languages have to allocate a buffer for the full
packet, which can often eliminate certain insecure implementations.
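
A small sketch of what a Length-Value-Type frame could look like. The
exact layout here is an assumption for illustration only (a 32-bit
big-endian length of the value, the value, then a one-byte type tag), and
the module name lvt is made up.

```erlang
-module(lvt).
-export([encode/2, decode/1]).

%% Assumed layout: <<Length:32, Value:Length/binary, Type:8>>.
encode(Type, Value)
  when is_integer(Type), Type >= 0, Type =< 255, is_binary(Value) ->
    <<(byte_size(Value)):32/big, Value/binary, Type:8>>.

%% Returns the first complete frame plus the remaining bytes, or 'more'
%% when the binary does not yet hold a whole frame.
decode(<<Len:32/big, Value:Len/binary, Type:8, Rest/binary>>) ->
    {ok, {Type, Value}, Rest};
decode(_Incomplete) ->
    more.
```

The value may contain any byte, including one that a delimiter-based
protocol would have to escape:

```erlang
Bin = lvt:encode(7, <<"a\nb">>),
{ok, {7, <<"a\nb">>}, <<>>} = lvt:decode(Bin).
```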