Erlang philosophy explained (was Re: Joe's "deep trickery" )

Sat Mar 1 02:56:38 CET 2003

On Fri, 28 Feb 2003 17:15:44 +0100 (CET)
Joe Armstrong <joe@REDACTED> wrote:

> On Fri, 28 Feb 2003, Chris Pressey wrote:
> 
> > But I've been using a very much simpler method for starting generic
> > TCP/IP servers, roughly:
> > 
> ... cut ...
> 
> > This version seems to work ok for me, and even with those
> > improvements, it could hardly be called deep trickery; so in deference
> > to Joe's wisdom, and knowing how much of a fan he is of simplicity, I
> > must assume either:
> > 
> > a) mine has a fatal flaw or flaws that I can't presently perceive, or
> > b) Joe forgot the word "efficiently" before "arrange" in his sentence
> > 
> 
>   Neither  really.  If you compare  the two  you'll find  our  code is
> pretty  similar -  I have  a bit  of extra  stuff so  I can  limit the
> maximum  number of  simultaneously open  sockets, and  close  down all
> sockets etc.
> 
>   I think  if you added this to  your code they'd end  up being pretty
> similar in length - they *should* be similar since they solve the same
> problem.

Yes, I realize now, after thinking about how to code the max servers
thing, that there appears to be no super simple way like I thought there
might be.

I still think that the warning in tcp_server, without any explanation of
what is going on, is pretty harsh for a tutorial, though.

> > My concern is mainly that an intimidating module like tcp_server.erl
> > could scare off new Erlang programmers by giving them the impression
> > that working with TCP/IP in Erlang is a bit nasty.  It's not nasty at
> > all, at least I don't think it is unless you want to push the
> > envelope.
> 
> Perhaps for the tutorial I should put in the simpler version.

Yes :)  If you could include a simpler version in the body of the
tutorial, and say something about how tcp_server basically boils down to
it, I think that would make it mind-blowingly approachable.

> 
> > This is otherwise a fine tutorial.  I like it.
> > 
> > Middlemen gets me thinking: are there generic consumer/producer
> > patterns that can be packaged?  I find I'm writing a lot of processes
> > along the lines of: accept data in arbitrary-sized chunks from one
> > process and dispense it in chunks of some calculated size to another
> > process, sometimes reformatting it on the way.  
> 
>   yes yes yes ^ 100
> 
>   Congratulations you have discovered the Erlang philosophy

Cool :)

I guess what tripped the switch for me was seeing that any problem can be
decomposed into a bunch of middlemen with state - that is, a middleman
that translates carriage returns to linefeeds isn't very interesting, but
one that buffers data until the next linefeed, then sends a whole line, is
much more interesting.
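
A minimal sketch of that line-buffering middleman, for illustration only.
The module name, the {data, Chunk} and eof input messages, and demo/0 are
assumptions made up for this example; the {self(), {line, Str}} output
format follows the convention used in the pipe discussion further down.

```erlang
-module(line_middleman).
-export([start/1, demo/0]).

%% Spawn a middleman that forwards only complete lines to the process Out.
start(Out) ->
    spawn(fun() -> loop(Out, []) end).

%% Buf is the state: the characters received since the last linefeed.
loop(Out, Buf) ->
    receive
        {data, Chunk} ->
            Rest = emit_lines(Out, Buf ++ Chunk),
            loop(Out, Rest);
        eof ->
            %% Flush any unterminated tail, then pass eof downstream.
            case Buf of
                [] -> ok;
                _  -> Out ! {self(), {line, Buf}}
            end,
            Out ! {self(), eof}
    end.

%% Send every complete line in Data to Out and return the leftover.
emit_lines(Out, Data) ->
    case lists:splitwith(fun(C) -> C =/= $\n end, Data) of
        {Line, [$\n | Rest]} ->
            Out ! {self(), {line, Line}},
            emit_lines(Out, Rest);
        {Partial, []} ->
            Partial
    end.

%% Feed arbitrary-sized chunks in, collect whole lines out.
demo() ->
    M = start(self()),
    M ! {data, "hel"},
    M ! {data, "lo\nwor"},
    M ! {data, "ld\n"},
    M ! eof,
    collect(M, []).

collect(M, Acc) ->
    receive
        {M, {line, L}} -> collect(M, [L | Acc]);
        {M, eof}       -> lists:reverse(Acc)
    end.
```

The only state is the partial line in Buf, which is what makes this
middleman more interesting than a stateless character translator.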

>   Let me reformulate what you said in another way to clarify this.
> 
> > Middlemen gets me thinking: are there generic consumer/producer
> > patterns that can be packaged?  I find I'm writing a lot of processes
> > along the lines of: accept data in arbitrary-sized chunks from one
> > process and SEND IT AS AN ERLANG MESSAGE TO ANOTHER PROCESS
>   
>   I'm kicking myself  here, this way of programming  was so obvious to
> me that I never explicitly wrote  it down.  I always used to *say* it
> when giving lectures but never actually committed it to paper.
> 
>   The Erlang "philosophy" is "everything is an Erlang process" 
>
>   Remember, Erlang processes share no data and only interact by
> exchanging Erlang messages.
> 
>   So if you have a non-Erlang thing  you should fake it up so that the
> other things in the system think that it *is* an Erlang process.
> 
>   Then everything becomes ridiculously easy.
> 
>   That's where the middle-man comes in:
> 
> 
>   Back to my tutorial. A web server is like this:
> 
>                  +---------------------+                  +--------+
>     ------>------| Middle man          |--------->--------| Web    |
>     TCP/packets  | defragments packets |  {get,URL,Args}  | server |
>                  | parse HTTP requests |                  |        |
>     ------<------| and formats HTTP    |---------<--------|        |
>                  | responses           |  {Header,Data}   +--------+
>                  +---------------------+
>   
>   The middle man turns the HTTP data stream (where TCP can fragment the
> packets) into a nice stream of fully parsed Erlang terms.
> 
>   An HTTP/1.0 server is trivial:
> 
> 	server() ->
> 	    receive
> 		{From, {get,URL,Args}} ->
> 			Response = process_get(URL, Args),
> 		        From ! {self(), Response}
> 	    end.
> 
>   And an HTTP/1.1 server with keep-alive sockets
> 
> 	server() ->
> 	    loop().
> 
> 	loop() ->
> 	    receive
> 		{From, {get,URL,Args}} ->
> 		    Response = process_get(URL, Args),
> 		    From ! {self(), Response},
> 		    %% no ';' here: 'after' is not another receive clause
> 		    loop()
> 	    after 10000 ->
> 		    exit(timeout)
> 	    end.
> 
>    Which is *very* clear and easy to write etc.

Yes.
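
For illustration, here is roughly what the exchange looks like from the
middle man's side once it has parsed a complete request.  This is only a
sketch: the module name, request/3, demo/0, the 5000 ms client timeout and
the stub process_get/2 are all hypothetical; only the message shapes
{From, {get,URL,Args}} and {Header,Data} come from the code and diagram
above.

```erlang
-module(http_sketch).
-export([start/0, request/3, demo/0]).

%% Start the keep-alive style server loop in its own process.
start() ->
    spawn(fun() -> loop() end).

loop() ->
    receive
        {From, {get, URL, Args}} ->
            From ! {self(), process_get(URL, Args)},
            loop()
    after 10000 ->
        exit(timeout)
    end.

%% Hypothetical stand-in for the real page handler.
process_get(URL, _Args) ->
    {[{"Content-Type", "text/plain"}], "you asked for " ++ URL}.

%% What a middle man would do after parsing a full request:
%% send one term, wait for one term back.
request(Server, URL, Args) ->
    Server ! {self(), {get, URL, Args}},
    receive
        {Server, Response} -> Response
    after 5000 ->
        {error, timeout}
    end.

demo() ->
    Server = start(),
    {_Header, Data} = request(Server, "/index.html", []),
    Data.
```

The server never sees a socket or a partial packet; as far as it can
tell, the middle man is just another Erlang process.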

>   If you  munge these  into a  single process you  get an  unholy mess
> (this is what I call getting  the concurrency model wrong) - using one
> process per connection is simple, obvious and highly efficient (as I've
> said earlier YAWS beats the socks off Apache)
> 
>   <aside> - in a sequential  language you are virtually *forced* to get
> the concurrency model wrong - remember the world *is* parallel, in the
> world  things really do  happen *concurrently*  and trying  to program
> concurrent  things in  a sequential  language is  just plain  stupid -
> often the  biggest mistake people make  in Erlang is  not using enough
> processes - the best code maps the concurrent structure of the problem
> 1:1 onto a set of processes.
> 
>   If  you think  about  the web  server  - when  a  server has  12,456
> simultaneous connections  there are actually  at that instant  in time
> 12,456 clients connected  to the server, and 12,456 people are staring
> at the screen waiting for an answer  - kind of scary really :-) - this
> problem  should at  this point  of  time have  spawned exactly  24,912
> processes to  handle this  (which is why  you can't  do it in  Java or
> anything that eventually creates an OS process to do this)
> </aside>
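
A hedged sketch of the one-process-per-connection idea using gen_tcp.
This is not the tutorial's tcp_server, and it does none of the connection
limiting discussed at the top of this message; the module name, the echo
behaviour, the go handshake message and demo/0 are assumptions chosen to
keep the example short.

```erlang
-module(per_conn_sketch).
-export([start/0, demo/0]).

%% Start an acceptor that owns the listen socket; return {ok, Port}.
%% Port 0 asks the OS for any free port.
start() ->
    Parent = self(),
    spawn(fun() ->
        {ok, Listen} = gen_tcp:listen(0, [list, {packet, line},
                                          {active, false},
                                          {reuseaddr, true}]),
        {ok, P} = inet:port(Listen),
        Parent ! {listening, P},
        accept_loop(Listen)
    end),
    receive {listening, Port} -> {ok, Port} end.

%% One new process per accepted connection.
accept_loop(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Pid = spawn(fun() -> receive go -> handle(Socket) end end),
    %% Hand the socket over before the handler starts using it.
    ok = gen_tcp:controlling_process(Socket, Pid),
    Pid ! go,
    accept_loop(Listen).

%% The per-connection process: echo each line back until the peer closes.
handle(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Line} ->
            gen_tcp:send(Socket, Line),
            handle(Socket);
        {error, _Reason} ->
            gen_tcp:close(Socket)
    end.

%% Connect to our own server over loopback and echo one line.
demo() ->
    {ok, Port} = start(),
    {ok, S} = gen_tcp:connect({127,0,0,1}, Port,
                              [list, {packet, line}, {active, false}]),
    ok = gen_tcp:send(S, "hello\n"),
    {ok, Reply} = gen_tcp:recv(S, 0, 5000),
    gen_tcp:close(S),
    Reply.
```

In the web server picture above, handle/1 would be the middle man, and it
would in turn spawn the server process it talks to, giving the two
processes per connection counted in the aside.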
> 
>    Look what we've done here, we've kind of "lifted" the abstraction
> level of a device driver.
>
>   In  unix things  are  nice  because *everything*  is  a producer  or
> consumer of  flat streams of  bytes - sockets  and pipes are  just the
> plumbing that carry the data from a producer to a consumer.
> 
>   In Erlang the data level is lifted: instead of a flat stream of bytes,
> everything is an object of type "term", but *no parsing or deparsing is
> necessary* and no fragmentation of the term can occur.
> 
>   We might like to ask what a unix pipe:
> 
> 	cat <file1 | x | y | z > file2
> 
>   Might look like in Erlang
> 
> 	This is surely 4 processes linked together
> 
>    cat is a process which sends a stream of
> 
> 	{self(), {line, Str}}
> 
>    followed by a stream of 
> 
> 	{self(), eof}
> 
>    messages
> 
> 	x and y are processes that look like
> 
> 	loop(In, Out) ->
> 		receive
> 			{In, Msg} ->
> 				...
> 				Out ! {self(), Msg2},
> 				loop(In, Out)
> 		end.
> 
> 
>    etc.
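
One way this pipeline might be wired up, as a sketch only.  The message
formats {self(), {line, Str}} and {self(), eof} are the ones given above;
everything else is an assumption of this example: the module name, run/2,
demo/0, and the {upstream, Pid} introduction message, which is there
because each stage has to learn its In pid somehow before it can match on
{In, Msg}.

```erlang
-module(pipe_sketch).
-export([run/2, demo/0]).

%% run(Lines, Funs): push Lines through one filter process per fun in
%% Funs, like `cat | x | y | z`, and return what comes out of the end.
run(Lines, Funs) ->
    %% Spawn the filters last-to-first so each one already knows its Out.
    First = lists:foldr(
              fun(F, Out) -> spawn(fun() -> filter(F, Out) end) end,
              self(), Funs),
    spawn(fun() -> cat(Lines, First) end),
    receive {upstream, In} -> collect(In, []) end.

%% The source: introduce ourselves, send the lines, then eof.
cat(Lines, Out) ->
    Out ! {upstream, self()},
    lists:foreach(fun(Str) -> Out ! {self(), {line, Str}} end, Lines),
    Out ! {self(), eof}.

%% A filter introduces itself downstream, then waits to learn its In.
filter(F, Out) ->
    Out ! {upstream, self()},
    receive {upstream, In} -> loop(F, In, Out) end.

loop(F, In, Out) ->
    receive
        {In, {line, Str}} ->
            Out ! {self(), {line, F(Str)}},
            loop(F, In, Out);
        {In, eof} ->
            Out ! {self(), eof}
    end.

%% The sink: gather lines until eof arrives.
collect(In, Acc) ->
    receive
        {In, {line, Str}} -> collect(In, [Str | Acc]);
        {In, eof}         -> lists:reverse(Acc)
    end.

demo() ->
    run(["one", "two"],
        [fun lists:reverse/1, fun(S) -> S ++ "!" end]).
```

Here every filter is stateless and maps one line to one line; a stage
that buffers or reformats would carry extra state in loop, as the
middlemen discussed earlier do.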
> 
>   All of this makes me wonder if perhaps the modules-with-APIs way of
> programming is wrong.

Well, I definitely have some thoughts on APIs that coincide with that.

Current conceptions of what makes an API are crude.  What you generally
have is a list of entry points (names of synchronous function calls) with
the number of arguments and their types for each.  I think this is because
it fits in well with current systems-construction linker technology.

But it's not as powerful as it could be if it were to provide more
information (such as the complexity of the exported functions) and to
provide it in a more flexible way (in patterns, which may or may not be
synchronous, i.e. like Erlang messages.)

>   Perhaps we  should be thinking  more in terms of  abstractions that
> allow us to glue things together with pipes etc.
> 
>   This seems to be related to my "bang bang" notation but I haven't yet
> made the connection - I'm still thinking about it.
> 
> > Is there something like a
> > gen_stream that I've overlooked in OTP?
> 
>   No
> 
> > then I start thinking: why the hell do I want more gen_* modules when
> > I rarely ever use the existing ones?  For better or worse, I usually
> > build my own with receive and !, which I find easier to read (at least
> > while coding,) with the assumption that some day, if it becomes really
> > important, I'll rewrite them to be gen_*'s.  So I sat myself down the
> > other day and forced myself to write one each gen_server, gen_event
> > and gen_fsm, to practice.
> 
>   Me too :-) The gen_ things  were put together for projects with lots
> of programmers  in the same  team - without  gen_server (say) in  a 20
> programmer project  we'd end up  with 20 ways  of writing a  server -
> using one way means the people can understand each other's code.

Yes, I guess I can see how any convention would help with that.

>   For small projects you can happily "roll your own"

And for that, it's just as important to know how they work, that is, to
be able to recognize the design pattern rather than just knowing how to
use the off-the-shelf implementation of it.  I take that point well now...

-Chris