[erlang-questions] {active, N} to build an echo TCP server?

Wed Dec 28 10:37:45 CET 2016

Hello.

Can you give an example of what this gen_protocol could look like in use?

Thanks,
Oliver

Sent: Wednesday, 28 December 2016 at 07:08
From: zxq9 <zxq9@REDACTED>
To: erlang-questions@REDACTED
Subject: Re: [erlang-questions] {active, N} to build an echo TCP server?
On Wednesday, 28 December 2016 05:14:10 you wrote:
> Hi Craig
>
> Exactly what I wanted to know. Thank you for the clear explanations.
>
> One last question: how can one determine the best value of N (i.e., not too
> big, to avoid overwhelming the server, and not too small, to avoid N close to 1)?

tl;dr:

{active, once} is sufficient for every case you are likely to ever encounter; departure from The True Path will lead you on a descent into madness and undiagnosable weirdness.

Discussion:

This is a bit tricky. Actually, I think I have only ever used {active, N} a single time -- and its use was a cheap hack based on some overly intimate knowledge of the sender, which is *not* a good way to build networked things (and computers are so insanely fast and getting ever faster as we go... this sort of thing is just not called for). So don't choose N based on your knowledge of the sender; it will just make your hair catch fire some day -- "reap what you sow" and all that.

Because TCP is a stream, you can never know how much data will be in the buffer, so the consumption loop has to be built so that it correctly interprets partial, complete, and overrun inputs (by "overrun" I mean that a single receive also contains part of the next message). If the message is "I am a message\r\n" you may receive the whole thing in one receive, or receive it as a series like "I am ", then "a mess" and finally "age\r\nSurprise! I am the next one\r\n". This is true of whatever protocol you are using. Almost any standard protocol either states the size of each message or ends it with a trailing delimiter (telnet, for example, delimits messages per line, so "\r\n" is the end of a single message -- you read through the input until you see "\r\n").
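
As a rough illustration of the delimiter case, here is a minimal sketch of a buffer-carrying line splitter. The module and function names (line_framing, feed/2) are made up for this example; the only assumption about the protocol is that "\r\n" ends each message.

```erlang
-module(line_framing).
-export([feed/2]).

%% feed(Buffer, Chunk) -> {CompleteLines, NewBuffer}
%% Buffer holds bytes left over from earlier receives; Chunk is whatever
%% arrived in the latest one. Hypothetical helper, not a library API.
-spec feed(binary(), binary()) -> {[binary()], binary()}.
feed(Buffer, Chunk) ->
    split(<<Buffer/binary, Chunk/binary>>, []).

%% Peel off complete lines; whatever trails the last "\r\n" is a partial
%% message and becomes the new buffer.
split(Data, Acc) ->
    case binary:split(Data, <<"\r\n">>) of
        [Line, Rest] -> split(Rest, [Line | Acc]);
        [Partial]    -> {lists:reverse(Acc), Partial}
    end.
```

Feeding it "I am ", then "a mess", then "age\r\nSurprise! I am the next one\r\n" yields no lines for the first two chunks and both complete messages on the third, with an empty buffer left over.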

Some protocols pack the beginning of a message with data you might want to interpret before you read the rest of the stream. HTTP/1.1, for example, does this with a header (Content-Length) that tells you the size of the body; the headers themselves are of unknown size, so their end is marked with "\r\n\r\n". Nearly every binary protocol goes one better by putting a fixed-size field at the front that tells you how many bytes to expect -- so most binary protocols are very easy to interpret in Erlang with something like <<Size:32/big, Message:Size/binary, Rest/binary>> and a few additional function clauses that match on variations of that + "remaining length" checking + concatenating to the buffer as partials are received.
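
A minimal sketch of that length-prefixed case, assuming a 32-bit big-endian size field in front of every message. The module name (length_framing) and feed/2 are invented for illustration, and a real handler would probably also reject absurdly large Size values before buffering.

```erlang
-module(length_framing).
-export([feed/2]).

%% feed(Buffer, Chunk) -> {CompleteMessages, NewBuffer}
%% Hypothetical helper: appends the new chunk to the leftover buffer and
%% extracts every complete frame.
-spec feed(binary(), binary()) -> {[binary()], binary()}.
feed(Buffer, Chunk) ->
    parse(<<Buffer/binary, Chunk/binary>>, []).

%% A whole frame is available: take it and keep parsing the remainder.
parse(<<Size:32/big, Message:Size/binary, Rest/binary>>, Acc) ->
    parse(Rest, [Message | Acc]);
%% Fewer than 4 header bytes, or an incomplete body: keep it as the
%% buffer and wait for more data.
parse(Partial, Acc) ->
    {lists:reverse(Acc), Partial}.
```

The second clause covers both the "partial header" and "partial body" situations, because the first pattern only matches when all Size bytes of the body are present.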

[Digression: I have a sort of canonical generic behavior for dealing with this that we used at Tsuriai (and now elsewhere). It makes the protocol definition itself a callback module (or, in cases where the callback overhead was an issue, the protocol handling functions are simply defined in-module, so that the binary interpreter and loop are basically boilerplate). I'm actually sort of surprised something like this isn't a generic OTP behavior, like gen_protocol or something.]

In none of these cases have I ever found it useful to do anything other than recv from a passive socket or (usually better) receive Erlang messages from an {active, once} socket, resetting the socket's active state at the top of the receive loop. Setting the socket state is very fast, and there is almost never a case where I want "N receives of unknown size" because that just gives me an aggregate of unknown size. Re-arming the socket only when the loop is ready for more data also naturally pushes back on the sender whenever interpreting or handling the received messages is slow, without requiring any more consideration.
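
For the echo server this thread started with, the {active, once} loop could be sketched roughly as follows. The module name (echo_once) and start/0 are invented for this example, it accepts only a single connection, and it binds to 127.0.0.1 on an OS-assigned port purely to keep the sketch self-contained.

```erlang
-module(echo_once).
-export([start/0]).

%% Opens a listen socket on a free local port and spawns one acceptor.
%% The calling process owns the listen socket, so it must stay alive.
start() ->
    Opts = [binary, {active, false}, {reuseaddr, true}, {ip, {127,0,0,1}}],
    {ok, Listen} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(Listen),
    spawn(fun() -> accept(Listen) end),
    {ok, Port}.

accept(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    loop(Socket).

%% Re-arm the socket at the top of every iteration: exactly one
%% {tcp, Socket, Data} message is delivered, then the socket goes
%% passive again until we come back around.
loop(Socket) ->
    case inet:setopts(Socket, [{active, once}]) of
        ok ->
            receive
                {tcp, Socket, Data} ->
                    ok = gen_tcp:send(Socket, Data),
                    loop(Socket);
                {tcp_closed, Socket} ->
                    ok;
                {tcp_error, Socket, _Reason} ->
                    gen_tcp:close(Socket)
            end;
        {error, _Closed} ->
            ok
    end.
```

If the echo (or any real handling) is slow, no further data is pulled off the socket in the meantime, so the kernel buffers fill and TCP flow control slows the sender.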

As a side note, I usually find socket handling code inside gen_* modules horribly ugly and often confusing to read if the sole purpose of the process is to handle a socket. When combined with other activities it might make sense, but that is usually not the case -- normally socket handlers handle sockets, other stuff does other stuff (they usually provide a 1:1 abstraction for whatever is on the other end of the connection). Anyway, it is fairly common to see socket handlers written using proc_lib, probably for the same reason I tend to do it. Or maybe that's just a trend that exists only here in Japan and we're techno outcasts living on the fringe of opinion.
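
To give an idea of what a proc_lib-based handler can look like, here is one possible minimal shape. This is only a sketch under my own assumptions, not the code described above: the module name (sock_handler) is hypothetical, it echoes data back as a stand-in for real protocol handling, and it implements only the system_continue/3 and system_terminate/4 callbacks.

```erlang
-module(sock_handler).
-export([start_link/1, init/2]).
-export([system_continue/3, system_terminate/4]).

%% Start a handler that will accept one connection on Listen.
start_link(Listen) ->
    proc_lib:start_link(?MODULE, init, [self(), Listen]).

init(Parent, Listen) ->
    %% Tell the parent we are up before blocking in accept.
    ok = proc_lib:init_ack(Parent, {ok, self()}),
    Debug = sys:debug_options([]),
    {ok, Socket} = gen_tcp:accept(Listen),
    loop(Parent, Debug, Socket).

loop(Parent, Debug, Socket) ->
    case inet:setopts(Socket, [{active, once}]) of
        ok ->
            receive
                {tcp, Socket, Data} ->
                    ok = gen_tcp:send(Socket, Data),
                    loop(Parent, Debug, Socket);
                {tcp_closed, Socket} ->
                    exit(normal);
                {tcp_error, Socket, Reason} ->
                    exit(Reason);
                %% Let sys:* calls (suspend, get_status, ...) work.
                {system, From, Request} ->
                    sys:handle_system_msg(Request, From, Parent,
                                          ?MODULE, Debug, Socket)
            end;
        {error, _Closed} ->
            exit(normal)
    end.

%% Callbacks invoked by sys:handle_system_msg/6.
system_continue(Parent, Debug, Socket) ->
    loop(Parent, Debug, Socket).

system_terminate(Reason, _Parent, _Debug, _Socket) ->
    exit(Reason).
```

The whole process is just the receive loop plus a little OTP plumbing, which is the readability argument for not wrapping a pure socket handler in a gen_server.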
