Parsing an incoming Shoutcast request

Joe Armstrong (AL/EAB) joe.armstrong@REDACTED
Tue May 30 10:15:38 CEST 2006


 

> -----Original Message-----
> From: Andrew Lentvorski [mailto:bsder@REDACTED] 
> Sent: 30 May 2006 00:06
> To: Joe Armstrong (AL/EAB)
> Cc: erlang-questions@REDACTED
> Subject: Re: Parsing an incoming Shoutcast request
> 
> Joe Armstrong (AL/EAB) wrote:
> > This is covered in "Joe's spitting in the sawdust Erlang tutorials 
> > Tutorial number 2"
> > (http://www.sics.se/~joe/tutorials/web_server/web_server.html)
> > 
> > You want to spawn (presumably) one handler per connection.
> > 
> > I have a module that does this:
> >   
> >    http://www.sics.se/~joe/tutorials/web_server/tcp_server.erl
> 
> I stared at the code a while.  I clearly don't understand all 
> of the subtleties because I don't quite get why the dire 
> comments are needed.

I guess you mean the "don't mess with the following code unless you
really know what you're doing" bit.

This is a frightener, and it means what it says. Code like this looks
simple (hopefully it is the least complex code that does the job), but
getting it correct is not simple. *Proving* code like this to be correct
is impossible (proving code very similar to this, but with certain
simplifications, is just about possible).

So I believe this code to be correct; the comments bear witness to the
fact that, while I was writing it, it went through several versions that
were not correct.

Fortunately, you don't have to understand it in order to use it; just
understand the API.

The basic idea of tcp_server.erl is to provide a routine

	start_raw_server(Port, Max, Fun, Len)

You call this and you get the following:

One process per session, which evaluates Fun(Socket, Controller).

Controller is a global process that can be used to synchronise
all the child processes (if you need to do this).

So the simplest possible server is created with something like
this:

	start_raw_server(Port, Max, fun handle/2, 0).

where

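	handle(Socket, Controller) ->
	    receive
	        {tcp, Socket, Bin} ->
	            %% parse and respond to the data in Bin, then wait for more
	            ...,
	            handle(Socket, Controller);
	        {tcp_closed, Socket} ->
	            %% the client has closed the connection; clean up and stop
	            ...
	    end.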

Which is pretty much all that you need. The rest is just concerned with
parsing and responding to the packets in Bin.
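For readers who want to see the mechanism underneath, here is a minimal sketch of the one-process-per-connection idea written directly against gen_tcp. It is not tcp_server.erl: the module name per_conn is hypothetical, there is no Max limit on sessions and no Controller process, and the handler simply echoes data back where a real server would parse Bin.

```erlang
-module(per_conn).
-export([start/1]).

%% Hypothetical sketch, NOT tcp_server.erl: one handler process per
%% connection, with no Max limit and no Controller process.
start(Port) ->
    %% {active, false} so no data is delivered before the handler owns
    %% the socket; the handler switches to active mode itself.
    {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, 0},
                                         {reuseaddr, true},
                                         {active, false}]),
    spawn(fun() -> accept_loop(Listen) end),
    {ok, Listen}.

accept_loop(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Pid = spawn(fun() -> wait_for_socket() end),
    %% hand ownership of the new socket to its handler process
    ok = gen_tcp:controlling_process(Socket, Pid),
    Pid ! {go, Socket},
    accept_loop(Listen).

wait_for_socket() ->
    receive
        {go, Socket} ->
            ok = inet:setopts(Socket, [{active, true}]),
            handle(Socket)
    end.

%% Same shape as handle/2 above, minus the Controller: echo each chunk
%% back until the client closes the connection.
handle(Socket) ->
    receive
        {tcp, Socket, Bin} ->
            gen_tcp:send(Socket, Bin),
            handle(Socket);
        {tcp_closed, Socket} ->
            ok
    end.
```

The listening socket is opened in passive mode, and each handler switches its own socket to active mode only after it has become the controlling process, so no data is delivered to the accepting process by mistake.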
 		
> 
> So, my question is:
> 
> What concerns prevented you from using the 
> gen_server/supervisor behaviors?
>

Nothing. You could merge the code in tcp_server into a gen_server, but
the result would probably be a mess and would obscure what you are
trying to do.

gen_server is designed primarily for writing servers in a classic
client-server architecture.

The underlying assumption in gen_server was that the client was an
Erlang client, and that RPCs between the client and server are Erlang
RPCs. gen_server provides a number of "bells and whistles" in the form
of debugging support, decent diagnostics when things crash, etc. It was
not designed for lightweight servers that need no error recovery, nor
for handling non-Erlang messages.

In the TCP case, packets might be fragmented, and some protocol will be
involved. Recombining the packets and parsing the protocol can be done
in a gen_server, but this is not what it was designed for.

The architecture I would choose is as follows:


               p +-------------+  u  +--------+   t
    ---socket ---|  defragment |-----| parse  |---------- Server
                 +-------------+     +--------+       |
                                                      |
               p +-------------+  u  +--------+   t   |
    ---socket ---|  defragment |-----| parse  |-------|
                 +-------------+     +--------+       |
                                                      |
                                                      |
               p +-------------+  u  +--------+   t   |
    ---socket ---|  defragment |-----| parse  |-------+
                 +-------------+     +--------+
                     

For each session you create a pipeline of two processes: the first
defragments the packets, and the second parses complete packets.

In the diagram, p means "fragmented TCP packets", u means "unfragmented
packets", and t means Erlang terms. After parsing, only Erlang terms are
"seen".
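A minimal sketch of one session's pipeline follows, assuming a Shoutcast/HTTP-style request that is complete once an empty line (\r\n\r\n) has been received. The module name pipeline, the {data, Bin} and {packet, P} messages, and the {request, Method, Path, Headers} term are all hypothetical choices made for illustration.

```erlang
-module(pipeline).
-export([start/1, split_requests/1, parse/1]).

%% Hypothetical sketch of the per-session pipeline. Framing assumption:
%% a complete request ends with an empty line (<<"\r\n\r\n">>), as in an
%% HTTP-style request. The session's socket handler forwards each raw
%% fragment with DefragPid ! {data, Bin}.
start(Server) ->
    Parser = spawn_link(fun() -> parser_loop(Server) end),
    spawn_link(fun() -> defrag_loop(Parser, <<>>) end).

%% Defragmenter: fragments (p) in, complete requests (u) out.
defrag_loop(Parser, Buf) ->
    receive
        {data, Bin} ->
            {Complete, Rest} = split_requests(<<Buf/binary, Bin/binary>>),
            lists:foreach(fun(P) -> Parser ! {packet, P} end, Complete),
            defrag_loop(Parser, Rest)
    end.

%% Split a buffer into complete requests plus the trailing partial one.
split_requests(Buf) -> split_requests(Buf, []).

split_requests(Buf, Acc) ->
    case binary:split(Buf, <<"\r\n\r\n">>) of
        [Req, Rest] -> split_requests(Rest, [Req | Acc]);
        [Partial]   -> {lists:reverse(Acc), Partial}
    end.

%% Parser: complete requests (u) in, Erlang terms (t) out.
parser_loop(Server) ->
    receive
        {packet, P} ->
            Server ! parse(P),
            parser_loop(Server)
    end.

%% <<"GET /path HTTP/1.0\r\nName: Value">> -> {request, Method, Path, Headers}.
%% Malformed input simply crashes the parser.
parse(Req) ->
    [RequestLine | HeaderLines] = binary:split(Req, <<"\r\n">>, [global]),
    [Method, Path | _] = binary:split(RequestLine, <<" ">>, [global]),
    {request, Method, Path, [header(L) || L <- HeaderLines, L =/= <<>>]}.

header(Line) ->
    case binary:split(Line, <<":">>) of
        [Name, Value] -> {Name, string:trim(Value)};
        [Name]        -> {Name, <<>>}
    end.
```

A request split across two TCP fragments, for example <<"GET /a HTTP/1.0\r\nHo">> followed by <<"st: x\r\n\r\n">>, reaches the server as a single term once the second fragment arrives.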

Now, if the protocol is particularly simple, you might like to merge
defragmentation and parsing into the same process, but you should
remember that this is an optimisation.

Another optimisation is not to parse the entire packet, since the
application might not wish to examine all the data in the protocol.
So some kind of lazy parser might be appropriate.
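One way a lazy parser might look, as a hypothetical sketch (the module lazy_parse and its term format are illustrative assumptions, for an HTTP-style request with the request line first and one header per line): only the request line is parsed up front, and the header block is kept as a binary until the application asks for it.

```erlang
-module(lazy_parse).
-export([parse/1, headers/1]).

%% Hypothetical lazy parser: only the request line is parsed eagerly.
%% The header block stays an unparsed binary until headers/1 is called.
parse(Req) ->
    {RequestLine, Block} =
        case binary:split(Req, <<"\r\n">>) of
            [Line, Rest] -> {Line, Rest};
            [Line]       -> {Line, <<>>}
        end,
    [Method, Path | _] = binary:split(RequestLine, <<" ">>, [global]),
    {request, Method, Path, {unparsed, Block}}.

%% Parse the headers only when the application actually asks for them.
headers({request, _Method, _Path, {unparsed, Block}}) ->
    [header(L) || L <- binary:split(Block, <<"\r\n">>, [global]), L =/= <<>>].

header(Line) ->
    case binary:split(Line, <<":">>) of
        [Name, Value] -> {Name, string:trim(Value)};
        [Name]        -> {Name, <<>>}
    end.
```

An application that only routes on Method and Path never pays for header parsing.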

Having parsed the protocol, we are now in the Erlang world, where
everything is represented by pure Erlang terms and pure Erlang messages,
so there are no messy fragmented packets or protocol parsing problems.

NOW we can use gen_servers. The back-end server *is* appropriate to
write as a gen_server.
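A hypothetical back end written as a gen_server might look like this. It only ever sees parsed terms of the (assumed) form {request, Method, Path, Headers}; as a stand-in for real application logic it just counts requests per path. The module name backend and the request/2 API are illustrative, not taken from any existing code.

```erlang
-module(backend).
-behaviour(gen_server).
-export([start_link/0, request/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical back-end server: it receives parsed Erlang terms such as
%% {request, Method, Path, Headers}, never raw TCP fragments.
%% As an illustration, it counts requests per path.
start_link() ->
    gen_server:start_link(?MODULE, #{}, []).

request(Server, {request, _Method, Path, _Headers}) ->
    gen_server:call(Server, {request, Path}).

init(Counts) ->
    {ok, Counts}.

handle_call({request, Path}, _From, Counts) ->
    N = maps:get(Path, Counts, 0) + 1,
    {reply, {ok, N}, Counts#{Path => N}}.

handle_cast(_Msg, Counts) ->
    {noreply, Counts}.
```

Each parser process in the pipeline would call backend:request/2 with the terms it produces.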

What about supervisors? These are totally orthogonal to the above: they
are added to define error recovery zones. A small application with only
a couple of servers will probably not need supervisors.

Supervisors are mainly used when you have dozens of servers to monitor.

The OTP behaviours are not magic bullets; they are just libraries of
Erlang code for performing repetitious tasks in a consistent manner.

The main benefit of using (say) gen_servers is organisational: if you
have a large team of programmers (say a few hundred) and they are all
writing client-servers, then it might be a good idea if they all go
about this the same way.

The OTP libraries were written to standardise the way servers, etc.
were written, so that one programmer in a large organisation could
understand the code of another programmer in the same organisation.

There are, of course, no such benefits in a small one-person project.

Writing a client-server in Erlang is really easy: you need to understand
send, receive, and spawn. Making it fault tolerant is also easy (you
need to understand spawn_link, links, and exit signals).

You can have 95% of all the fun by understanding how to roll your own
client-servers using spawn, send, receive, etc.
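A sketch of such a hand-rolled client-server, using only spawn, send and receive, plus a tiny restarter built from spawn_link, trap_exit and exit signals. The module name counter and its messages are hypothetical, and this is a simplified illustration rather than an OTP supervisor (for example, it restarts forever and every restart begins with fresh state).

```erlang
-module(counter).
-export([start/0, rpc/2, keep_alive/1]).

%% Hypothetical hand-rolled client-server using only spawn, send (!) and
%% receive. The server state is a single integer.
start() ->
    spawn(fun() -> loop(0) end).

%% Client side: tag each request with a unique reference so the reply
%% can be matched unambiguously.
rpc(Server, Request) ->
    Ref = make_ref(),
    Server ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        exit(timeout)
    end.

loop(N) ->
    receive
        {From, Ref, increment} ->
            From ! {Ref, N + 1},
            loop(N + 1);
        {From, Ref, get} ->
            From ! {Ref, N},
            loop(N)
    end.

%% Fault tolerance with spawn_link, links and exit signals: a tiny
%% restarter that traps exits and registers a fresh server under Name
%% whenever the current one dies.
keep_alive(Name) ->
    spawn(fun() ->
                  process_flag(trap_exit, true),
                  restart_loop(Name)
          end).

restart_loop(Name) ->
    Pid = spawn_link(fun() -> loop(0) end),
    register(Name, Pid),
    receive
        {'EXIT', Pid, _Reason} -> restart_loop(Name)
    end.
```

Killing the registered server with exit(whereis(Name), kill) delivers an 'EXIT' message to the restarter, which starts a new server under the same name; clients calling rpc(Name, ...) carry on, but the count starts again from zero.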

gen_server provides one commonly used architectural pattern, in a
context where it is suitable for large programmer teams.

Another source of information is the set of examples I wrote years ago:

http://www.erlang.org/examples/examples-2.0.html

Scroll down and look at:

FTP 

Some code which might help you is the FTP client and server, which
shows how an FTP-like server would have been implemented if the
implementation language had been Erlang.

SOS

A simple operating system (think of this as OTP very lite)

and

RSA

Everything you ever wanted to secure your applications :-)

Have fun

/Joe



 
> I'd rather not run blindly into them given that the comments 
> indicate that it took two Erlang experts to get this code 
> correct originally.
> 
> Thanks,
> -a
> 


