Erlang philosophy explained (was Re: Joe's "deep trickery" )
Joe Armstrong
joe@REDACTED
Fri Feb 28 17:15:44 CET 2003
On Fri, 28 Feb 2003, Chris Pressey wrote:
> But I've been using a very much simpler method for starting generic TCP/IP
> servers, roughly:
>
... cut ...
> This version seems to work ok for me, and even with those improvements, it
> could hardly be called deep trickery; so in deference to Joe's wisdom, and
> knowing how much of a fan he is of simplicity, I must assume either:
>
> a) mine has a fatal flaw or flaws that I can't presently perceive, or
> b) Joe forgot the word "efficiently" before "arrange" in his sentence
>
Neither, really. If you compare the two you'll find our code is
pretty similar - I have a bit of extra stuff so I can limit the
maximum number of simultaneously open sockets, close down all the
sockets, etc.
I think if you added this to your code they'd end up being pretty
similar in length - they *should* be similar since they solve the same
problem.
> My concern is mainly that an intimidating module like tcp_server.erl could
> scare off new Erlang programmers by giving them the impression that
> working with TCP/IP in Erlang is a bit nasty. It's not nasty at all, at
> least I don't think it is unless you want to push the envelope.
Perhaps for the tutorial I should put in the simpler version.
> This is otherwise a fine tutorial. I like it.
>
> Middlemen gets me thinking: are there generic consumer/producer patterns
> that can be packaged? I find I'm writing a lot of processes along the
> lines of: accept data in arbitrary-sized chunks from one process and
> dispense it in chunks of some calculated size to another process,
> sometimes reformatting it on the way.
yes yes yes ^ 100
Congratulations you have discovered the Erlang philosophy
Let me reformulate what you said in another way to clarify this.
> Middlemen gets me thinking: are there generic consumer/producer patterns
> that can be packaged? I find I'm writing a lot of processes along the
> lines of: accept data in arbitrary-sized chunks from one process and
> SEND IT AS AN ERLANG MESSAGE TO ANOTHER PROCESS
I'm kicking myself here, this way of programming was so obvious to
me that I never explicitly wrote it down. I always used to *say* it
when giving lectures but never actually committed it to paper.
The Erlang "philosophy" is "everything is an Erlang process"
Remember, Erlang processes share no data and only interact by exchanging
Erlang messages.
So if you have a non-Erlang thing you should fake it up so that the
other things in the system think that it *is* an Erlang process.
Then everything becomes ridiculously easy.
That's where the middle-man comes in:
Back to my tutorial. A web server is like this:
+---------------------+ +--------+
------>------| Middle man |--------->--------| Web |
TCP/packets | defragments packets | {get,URL,Args} | server |
| parse HTTP requests | | |
------<------| and formats HTTP |---------<--------| |
| responses | {Header,Data} +--------+
+---------------------+
The middle man turns the HTTP data stream (where TCP can fragment the
packets) into a nice stream of fully parsed Erlang terms.
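To make that concrete, here is a minimal sketch of such a middle man,
assuming a gen_tcp socket in binary active mode. The toy
parse_request/1 below only understands a bare "GET <url> ..." request
line terminated by a blank line - a real middle man would parse full
HTTP - but the shape of the process is the point:

```erlang
%% Minimal middle-man sketch: owns the socket, reassembles fragmented
%% TCP data into complete requests, forwards each request to Server as
%% an Erlang term, and sends the server's reply back down the socket.
middle_man(Socket, Server, Buffer0) ->
    receive
        {tcp, Socket, Bytes} ->
            Buffer1 = <<Buffer0/binary, Bytes/binary>>,
            case parse_request(Buffer1) of
                {ok, Request, Rest} ->
                    Server ! {self(), Request},
                    receive
                        {Server, {Header, Data}} ->
                            gen_tcp:send(Socket, [Header, "\r\n\r\n", Data])
                    end,
                    middle_man(Socket, Server, Rest);
                more ->
                    %% Not a whole request yet - keep buffering.
                    middle_man(Socket, Server, Buffer1)
            end;
        {tcp_closed, Socket} ->
            ok
    end.

%% Toy parser: a request is complete when we have seen "\r\n\r\n".
parse_request(Buffer) ->
    case binary:split(Buffer, <<"\r\n\r\n">>) of
        [ReqLine, Rest] ->
            [<<"GET">>, URL | _] = binary:split(ReqLine, <<" ">>, [global]),
            {ok, {get, URL, []}, Rest};
        _ ->
            more
    end.
```

Everything downstream of this process sees only whole {get,URL,Args}
terms; the fragmentation problem has been pushed to the edge.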
An HTTP/1.0 server is trivial:

    server() ->
        receive
            {From, {get,URL,Args}} ->
                Response = process_get(URL, Args),
                From ! {self(), Response}
        end.
And an HTTP/1.1 server with keep-alive sockets:

    server() ->
        loop().

    loop() ->
        receive
            {From, {get,URL,Args}} ->
                Response = process_get(URL, Args),
                From ! {self(), Response},
                loop()
        after 10000 ->
            exit(timeout)
        end.
Which is *very* clear and easy to write etc.
If you munge these into a single process you get an unholy mess
(this is what I call getting the concurrency model wrong) - using one
process per connection is simple, obvious and highly efficient (as I've
said earlier, YAWS beats the socks off Apache)
<aside> - in a sequential language you are virtually *forced* to get
the concurrency model wrong - remember the world *is* parallel, in the
world things really do happen *concurrently* and trying to program
concurrent things in a sequential language is just plain stupid -
often the biggest mistake people make in Erlang is not using enough
processes - the best code maps the concurrent structure of the problem
1:1 onto a set of processes.
If you think about the web server - when a server has 12,456
simultaneous connections there are actually at that instant in time
12,456 clients connected to the server, and 12,456 people are staring
at the screen waiting for an answer - kind of scary really :-) - the
program should at this point in time have spawned exactly 24,912
processes to handle this, one middle man and one server per connection
(which is why you can't do it in Java or anything that eventually
creates an OS process to do this)
</aside>
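To make "one process per connection" concrete, the accept loop can be
as small as this sketch. Here handle/1 is just a stand-in echo handler;
in the web server it would be the middle man + server pair described
above:

```erlang
%% The acceptor does nothing but accept and spawn: every connection
%% gets its own process, so 12,456 connections means 12,456 handlers.
acceptor(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Pid = spawn(fun() -> handle(Socket) end),
    %% Hand the socket over so its messages go to the new process.
    gen_tcp:controlling_process(Socket, Pid),
    acceptor(Listen).

%% Stand-in handler: echo whatever arrives back to the client.
handle(Socket) ->
    receive
        {tcp, Socket, Bytes} ->
            gen_tcp:send(Socket, Bytes),
            handle(Socket);
        {tcp_closed, Socket} ->
            ok
    end.
```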
Look what we've done here, we've kind of "lifted" the abstraction level
of a device driver.
In unix things are nice because *everything* is a producer or
consumer of flat streams of bytes - sockets and pipes are just the
plumbing that carry the data from a producer to a consumer.
In Erlang the data level is lifted: instead of a flat stream of bytes,
everything is an object of type "term", but *no parsing or deparsing
is necessary* and no fragmentation of the term can occur.
We might like to ask what a unix pipe:

    cat <file1 | x | y | z > file2

might look like in Erlang.

This is surely 4 processes linked together.
cat is a process which sends a stream of
{self(), {line, Str}}
followed by a stream of
{self(), eof}
messages
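A sketch of cat as a process under these conventions might look like
this (it streams {line, Str} messages to the next process in the pipe,
then signals end of file):

```erlang
%% "cat" as a process: read File line by line and stream each line to
%% Out, then send eof.
cat(File, Out) ->
    {ok, Dev} = file:open(File, [read]),
    send_lines(Dev, Out),
    file:close(Dev).

send_lines(Dev, Out) ->
    case io:get_line(Dev, '') of
        eof ->
            Out ! {self(), eof};
        Str ->
            Out ! {self(), {line, Str}},
            send_lines(Dev, Out)
    end.
```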
x and y are processes that look like

    loop(In, Out) ->
        receive
            {In, Msg} ->
                ...
                Out ! {self(), Msg2},
                loop(In, Out)
        end.

etc.
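Gluing the whole pipe together might then look like the sketch below.
Since the producer's Pid travels inside each message, a stage only
needs to be told the Pid of the *next* stage; the transformation
functions XFun and YFun are placeholders supplied by the caller, and
sink/2 is a stand-in for z writing to file2:

```erlang
%% A generic pipe stage: apply F to each line and pass it on.
stage(F, Out) ->
    receive
        {_From, {line, Str}} ->
            Out ! {self(), {line, F(Str)}},
            stage(F, Out);
        {_From, eof} ->
            Out ! {self(), eof}
    end.

%% A sink: gather the lines and hand the result back to Caller.
sink(Caller, Acc) ->
    receive
        {_From, {line, Str}} ->
            sink(Caller, [Str | Acc]);
        {_From, eof} ->
            Caller ! {lines, lists:reverse(Acc)}
    end.

%% Wire up  producer | x | y | sink  back to front and feed it.
pipe(Lines, XFun, YFun) ->
    Caller = self(),
    Z = spawn(fun() -> sink(Caller, []) end),
    Y = spawn(fun() -> stage(YFun, Z) end),
    X = spawn(fun() -> stage(XFun, Y) end),
    [X ! {Caller, {line, L}} || L <- Lines],
    X ! {Caller, eof},
    receive {lines, Out} -> Out end.
```

Note that the stages are wired up back to front, because each process
must exist before its producer can be given its Pid.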
All of this makes me wonder if perhaps the modules-with-APIs way of
programming is wrong.
Perhaps we should be thinking more in terms of abstractions that
allow us to glue things together with pipes etc.
This seems to be related to my "bang bang" notation but I haven't yet
made the connection - I'm still thinking about it.
> Is there something like a
> gen_stream that I've overlooked in OTP?
No
> then I start thinking: why the hell do I want more gen_* modules when
> I rarely ever use the existing ones? For better or worse, I usually build
> my own with receive and !, which I find easier to read (at least while
> coding,) with the assumption that some day, if it becomes really
> important, I'll rewrite them to be gen_*'s. So I sat myself down the
> other day and forced myself to write one each gen_server, gen_event and
> gen_fsm, to practice.
Me too :-) The gen_ things were put together for projects with lots
of programmers in the same team - without gen_server (say) in a
20-programmer project we'd end up with 20 ways of writing a server -
using one way means that people can understand each other's code.
For small projects you can happily "roll your own"
>
> I learned more about them, but unfortunately I learned more about why I
> never end up using them. I totally understand Suresh's confusion over
> gen_event:call. It's unintuitive until you think about how Erlang grew up
> - in an ideal world, if you wanted an event handler you could treat like a
> server, you'd say -behaviour([gen_event, gen_server]). Clearly, you can't
> do that, and gen_event:call smells like the workaround for it.
>
> Also, it would be really nice if the event handler could actually *handle*
> events and not just react to them after they've happened - i.e. if
> gen_event:notify took, and returned, a term to represent state, perhaps
> modified by the event handlers. (That way you wouldn't need
> gen_event:call either; you could configure an event handler using events.)
>
> Anyway, sorry that got off on sort of a tangent.
>
> -Chris
>