Erlang philosophy explained (was Re: Joe's "deep trickery" )
Joe Armstrong
joe@REDACTED
Fri Feb 28 17:15:44 CET 2003
On Fri, 28 Feb 2003, Chris Pressey wrote:
> But I've been using a very much simpler method for starting generic TCP/IP
> servers, roughly:
>
... cut ...
> This version seems to work ok for me, and even with those improvements, it
> could hardly be called deep trickery; so in deference to Joe's wisdom, and
> knowing how much of a fan he is of simplicity, I must assume either:
>
> a) mine has a fatal flaw or flaws that I can't presently perceive, or
> b) Joe forgot the word "efficiently" before "arrange" in his sentence
>
Neither, really. If you compare the two you'll find our code is
pretty similar - I have a bit of extra stuff so I can limit the
maximum number of simultaneously open sockets, close down all the
sockets, etc.
I think if you added this to your code they'd end up being pretty
similar in length - they *should* be similar since they solve the same
problem.
> My concern is mainly that an intimidating module like tcp_server.erl could
> scare off new Erlang programmers by giving them the impression that
> working with TCP/IP in Erlang is a bit nasty. It's not nasty at all, at
> least I don't think it is unless you want to push the envelope.
Perhaps for the tutorial I should put in the simpler version.
> This is otherwise a fine tutorial. I like it.
>
> Middlemen gets me thinking: are there generic consumer/producer patterns
> that can be packaged? I find I'm writing a lot of processes along the
> lines of: accept data in arbitrary-sized chunks from one process and
> dispense it in chunks of some calculated size to another process,
> sometimes reformatting it on the way.
yes yes yes ^ 100
Congratulations you have discovered the Erlang philosophy
Let me reformulate what you said in another way to clarify this.
> Middlemen gets me thinking: are there generic consumer/producer patterns
> that can be packaged? I find I'm writing a lot of processes along the
> lines of: accept data in arbitrary-sized chunks from one process and
> SEND IT AS AN ERLANG MESSAGE TO ANOTHER PROCESS
I'm kicking myself here, this way of programming was so obvious to
me that I never explicitly wrote it down. I always used to *say* it
when giving lectures but never actually committed it to paper.
The Erlang "philosophy" is "everything is an Erlang process"
Remember, Erlang processes share no data and only interact by exchanging
Erlang messages.
So if you have a non-Erlang thing you should fake it up so that the
other things in the system think that it *is* an Erlang process.
Then everything becomes ridiculously easy.
That's where the middle-man comes in:
Back to my tutorial. A web server is like this:
+---------------------+ +--------+
------>------| Middle man |--------->--------| Web |
TCP/packets | defragments packets | {get,URL,Args} | server |
| parse HTTP requests | | |
------<------| and formats HTTP |---------<--------| |
| responses | {Header,Data} +--------+
+---------------------+
The middle man turns the HTTP data stream (where TCP can fragment the
packets) into a nice stream of fully parsed Erlang terms.
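To make that concrete, here is a minimal sketch of such a middle man,
assuming a gen_tcp socket in binary active mode. The toy
parse_request/1 below only understands a bare "GET <url> ..." request
line terminated by a blank line - a real middle man would parse full
HTTP - but the shape of the process is the point:

```erlang
%% Minimal middle-man sketch: owns the socket, reassembles fragmented
%% TCP data into complete requests, forwards each request to Server as
%% an Erlang term, and sends the server's reply back down the socket.
middle_man(Socket, Server, Buffer0) ->
    receive
        {tcp, Socket, Bytes} ->
            Buffer1 = <<Buffer0/binary, Bytes/binary>>,
            case parse_request(Buffer1) of
                {ok, Request, Rest} ->
                    Server ! {self(), Request},
                    receive
                        {Server, {Header, Data}} ->
                            gen_tcp:send(Socket, [Header, "\r\n\r\n", Data])
                    end,
                    middle_man(Socket, Server, Rest);
                more ->
                    %% Not a whole request yet - keep buffering.
                    middle_man(Socket, Server, Buffer1)
            end;
        {tcp_closed, Socket} ->
            ok
    end.

%% Toy parser: a request is complete when we have seen "\r\n\r\n".
parse_request(Buffer) ->
    case binary:split(Buffer, <<"\r\n\r\n">>) of
        [ReqLine, Rest] ->
            [<<"GET">>, URL | _] = binary:split(ReqLine, <<" ">>, [global]),
            {ok, {get, URL, []}, Rest};
        _ ->
            more
    end.
```

Everything downstream of this process sees only whole {get,URL,Args}
terms; the fragmentation problem has been pushed to the edge.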
An HTTP/1.0 server is trivial:

    server() ->
        receive
            {From, {get,URL,Args}} ->
                Response = process_get(URL, Args),
                From ! {self(), Response}
        end.
And an HTTP/1.1 server with keep-alive sockets:

    server() ->
        loop().

    loop() ->
        receive
            {From, {get,URL,Args}} ->
                Response = process_get(URL, Args),
                From ! {self(), Response},
                loop()
        after 10000 ->
            exit(timeout)
        end.
Which is *very* clear and easy to write etc.
If you munge these into a single process you get an unholy mess
(this is what I call getting the concurrency model wrong) - using one
process per connection is simple, obvious and highly efficient (as I've
said earlier, YAWS beats the socks off Apache)
<aside> - in a sequential language you are virtually *forced* to get
the concurrency model wrong - remember the world *is* parallel, in the
world things really do happen *concurrently* and trying to program
concurrent things in a sequential language is just plain stupid -
often the biggest mistake people make in Erlang is not using enough
processes - the best code maps the concurrent structure of the problem
1:1 onto a set of processes.
If you think about the web server - when a server has 12,456
simultaneous connections there are actually at that instant in time
12,456 clients connected to the server, and 12,456 people are staring
at the screen waiting for an answer - kind of scary really :-) - the
program should at this point in time have spawned exactly 24,912
processes to handle this, one middle man and one server per connection
(which is why you can't do it in Java or anything that eventually
creates an OS process to do this)
</aside>
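To make "one process per connection" concrete, the accept loop can be
as small as this sketch. Here handle/1 is just a stand-in echo handler;
in the web server it would be the middle man + server pair described
above:

```erlang
%% The acceptor does nothing but accept and spawn: every connection
%% gets its own process, so 12,456 connections means 12,456 handlers.
acceptor(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Pid = spawn(fun() -> handle(Socket) end),
    %% Hand the socket over so its messages go to the new process.
    gen_tcp:controlling_process(Socket, Pid),
    acceptor(Listen).

%% Stand-in handler: echo whatever arrives back to the client.
handle(Socket) ->
    receive
        {tcp, Socket, Bytes} ->
            gen_tcp:send(Socket, Bytes),
            handle(Socket);
        {tcp_closed, Socket} ->
            ok
    end.
```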
Look what we've done here, we've kind of "lifted" the abstraction level
of a device driver.
In unix things are nice because *everything* is a producer or
consumer of flat streams of bytes - sockets and pipes are just the
plumbing that carry the data from a producer to a consumer.
In Erlang the data level is lifted: instead of a flat stream of bytes,
everything is an object of type "term", but *no parsing or deparsing
is necessary* and no fragmentation of the term can occur.
We might like to ask what a unix pipe:

    cat <file1 | x | y | z > file2

might look like in Erlang.

This is surely 4 processes linked together.
cat is a process which sends a stream of
{self(), {line, Str}}
followed by a stream of
{self(), eof}
messages
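A sketch of cat as a process under these conventions might look like
this (it streams {line, Str} messages to the next process in the pipe,
then signals end of file):

```erlang
%% "cat" as a process: read File line by line and stream each line to
%% Out, then send eof.
cat(File, Out) ->
    {ok, Dev} = file:open(File, [read]),
    send_lines(Dev, Out),
    file:close(Dev).

send_lines(Dev, Out) ->
    case io:get_line(Dev, '') of
        eof ->
            Out ! {self(), eof};
        Str ->
            Out ! {self(), {line, Str}},
            send_lines(Dev, Out)
    end.
```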
x and y are processes that look like

    loop(In, Out) ->
        receive
            {In, Msg} ->
                ...
                Out ! {self(), Msg2},
                loop(In, Out)
        end.

etc.
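Gluing the whole pipe together might then look like the sketch below.
Since the producer's Pid travels inside each message, a stage only
needs to be told the Pid of the *next* stage; the transformation
functions XFun and YFun are placeholders supplied by the caller, and
sink/2 is a stand-in for z writing to file2:

```erlang
%% A generic pipe stage: apply F to each line and pass it on.
stage(F, Out) ->
    receive
        {_From, {line, Str}} ->
            Out ! {self(), {line, F(Str)}},
            stage(F, Out);
        {_From, eof} ->
            Out ! {self(), eof}
    end.

%% A sink: gather the lines and hand the result back to Caller.
sink(Caller, Acc) ->
    receive
        {_From, {line, Str}} ->
            sink(Caller, [Str | Acc]);
        {_From, eof} ->
            Caller ! {lines, lists:reverse(Acc)}
    end.

%% Wire up  producer | x | y | sink  back to front and feed it.
pipe(Lines, XFun, YFun) ->
    Caller = self(),
    Z = spawn(fun() -> sink(Caller, []) end),
    Y = spawn(fun() -> stage(YFun, Z) end),
    X = spawn(fun() -> stage(XFun, Y) end),
    [X ! {Caller, {line, L}} || L <- Lines],
    X ! {Caller, eof},
    receive {lines, Out} -> Out end.
```

Note that the stages are wired up back to front, because each process
must exist before its producer can be given its Pid.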
All of this makes me wonder if perhaps the modules-with-APIs way of
programming is wrong.
Perhaps we should be thinking more in terms of abstractions that
allow us to glue things together with pipes etc.
This seems to be related to my "bang bang" notation but I haven't yet
made the connection - I'm still thinking about it.
> Is there something like a
> gen_stream that I've overlooked in OTP?
No
> then I start thinking: why the hell do I want more gen_* modules when
> I rarely ever use the existing ones? For better or worse, I usually build
> my own with receive and !, which I find easier to read (at least while
> coding,) with the assumption that some day, if it becomes really
> important, I'll rewrite them to be gen_*'s. So I sat myself down the
> other day and forced myself to write one each gen_server, gen_event and
> gen_fsm, to practice.
Me too :-) The gen_ things were put together for projects with lots
of programmers in the same team - without gen_server (say) in a
20-programmer project we'd end up with 20 ways of writing a server -
using one way means that people can understand each other's code.
For small projects you can happily "roll your own"
>
> I learned more about them, but unfortunately I learned more about why I
> never end up using them. I totally understand Suresh's confusion over
> gen_event:call. It's unintuitive until you think about how Erlang grew up
> - in an ideal world, if you wanted an event handler you could treat like a
> server, you'd say -behaviour([gen_event, gen_server]). Clearly, you can't
> do that, and gen_event:call smells like the workaround for it.
>
> Also, it would be really nice if the event handler could actually *handle*
> events and not just react to them after they've happened - i.e. if
> gen_event:notify took, and returned, a term to represent state, perhaps
> modified by the event handlers. (That way you wouldn't need
> gen_event:call either; you could configure an event handler using events.)
>
> Anyway, sorry that got off on sort of a tangent.
>
> -Chris
>