Distributed programming

Tue Mar 28 10:10:07 CEST 2006

> -----Original Message-----
> From: owner-erlang-questions@REDACTED 
> [mailto:owner-erlang-questions@REDACTED] On Behalf Of Pupeno
> Sent: den 27 mars 2006 13:07
> To: erlang-questions@REDACTED
> Subject: Distributed programming
> 
> Hello.
> I have a basic question on distributed programming. My case 
> is: I have a module called launcher which opens a tcp port 
> and listens to connections. 
> When a new connection is made, another process is launched to 
> attend that connection.
> Now, when I think about distributing this for load balancing 
> I see these possibilities:
> - Run various launchers on various computers and do the 
> balancing through DNS.

Yes - easy.

> - Run one launcher on one computer and make the launched 
> processes run on other computers.

Yes - but since you open the connection on one machine, and the
processing takes place on some other computer, you have three
possibilities:

	1) tunnel all the data through the original machine to the new
	   machine
	2) migrate the live session to a new computer
	3) send a re-direct to the originating computer and ask it to
	   re-connect to the new machine

1) is ok, but only provided the ratio of computation in the back-end to
   the work done pushing the data through the front-end is acceptable
2) is *possible* though difficult - there was an (Erlang) paper
   published on moving live TCP sessions between machines - but I'm not
   sure if the code is stable and available
3) is IMHO the best possible method. BUT it needs active participation
   in the client. Protocols like HTTP have redirect and "moved
   permanently" built into the protocol.

So if you have control of the protocol use method 3. By far the best. I
note this technique is used by MSN messenger - you start off by logging
in to a "login server"; it immediately redirects you to a "traffic
server" - if you start chatting to somebody, both partners might in
principle be abruptly redirected to yet another server - so you can
take advantage of locality (i.e. it would be silly for two people in,
say, Sweden to be chatting through a common server in France - in this
case one would redirect both parties to a server in Sweden)
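
As a rough sketch of what the "login server" side of method 3 could
look like: the module name, the region atoms, the host names and the
{redirect, Host, Port} reply below are all made up for illustration -
this is not how MSN does it, just one way to pick a nearby server and
tell the client to go there.

```erlang
-module(redirect_demo).
-export([choose_server/2, login_reply/2]).

%% Servers is a list of {Region, Host, Port} tuples (hypothetical data).
%% Prefer a server in the client's own region, otherwise fall back to
%% the first server in the list.
choose_server(Region, Servers) ->
    case lists:keyfind(Region, 1, Servers) of
        {Region, Host, Port} ->
            {Host, Port};
        false ->
            [{_AnyRegion, Host, Port} | _] = Servers,
            {Host, Port}
    end.

%% The reply the login server would send instead of serving the
%% session itself.
login_reply(Region, Servers) ->
    {Host, Port} = choose_server(Region, Servers),
    {redirect, Host, Port}.
```

With a server list containing one entry for fr and one for se, a
client from se is redirected to the se host, and a client from an
unknown region gets the first entry.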

> The first has the advantage of providing high-availability as 
> well and all the processes may access the same (mnesia) 
> database. This is something that I could possibly do in C (or 
> C++, Python or any language) using Mysql and MySQL 
> clustering, am I wrong ?

Doing it is not wrong but inadvisable - every time you change languages
(and I count changing between MySQL and C as a language change) you get
a semantic "mismatch" between the bits.

Data base operations have "transaction semantics" (or should have) -
many programming languages do not. Consider the following pseudo-code
fragment:

	foo(N) ->
	    database(do this),   <- this is a data base call
	    ...                  <- some code in some programming language
	    1/N,                 <- some arithmetic
	    ...
	    database(do that).

Now doing foo(1) will end up with "this" and "that" being done to the
data base.

But doing foo(0) will cause only "this" to be done to the data base,
since 1/0 raises an exception before "that" is reached.

Really "this" should be undone if the following computation failed.

In other words, code and data base transactions are not composable.
This is a consequence of mixing things that have different semantics
and it makes programming very difficult.

Ok - let's do this in Erlang. Now mnesia is written in Erlang, and by
judiciously trapping any exceptions in our Erlang code we can make the
code and the database updates have transaction semantics.

	foo(N) ->
	    mnesia:transaction(
	      fun() ->
	          database(do this),
	          1/N,
	          database(do that)
	      end).

This will either succeed, in which case "this" and "that" are done - or
it will fail and the data base will have its original state.

So code and database updates are composable.
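
To make the fragment above concrete, here is a minimal runnable sketch.
The module name tx_demo, the kv table and the setup/read helpers are
invented for the example, and it assumes a RAM-only mnesia on a single
node; the mnesia calls themselves (start/0, create_table/2, write/1,
dirty_read/2, transaction/1) are the real ones.

```erlang
-module(tx_demo).
-export([setup/0, foo/1, read/1]).

%% Hypothetical table: one row per thing that "has been done".
-record(kv, {key, value}).

%% Start mnesia without a disc schema (RAM only) and create the table.
setup() ->
    ok = mnesia:start(),
    {atomic, ok} =
        mnesia:create_table(kv, [{attributes, record_info(fields, kv)}]),
    ok.

%% "this", some arithmetic, then "that" - all inside one transaction.
foo(N) ->
    mnesia:transaction(
      fun() ->
          ok = mnesia:write(#kv{key = this, value = done}),
          _ = 1 / N,                       % fails with badarith when N is 0
          ok = mnesia:write(#kv{key = that, value = done})
      end).

%% Look at the table from outside any transaction.
read(Key) ->
    mnesia:dirty_read(kv, Key).
```

After tx_demo:setup(), calling tx_demo:foo(0) returns {aborted, Reason}
and tx_demo:read(this) is still [], so the first write was undone.
Calling tx_demo:foo(1) returns {atomic, ok} and both rows are there.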

Then there is the problem of efficiency - changing languages (from C to
MySQL to Erlang or whatever) means you have to muck around changing the
internal representations of all your data types - how are integers
represented in C, Erlang, MySQL? - answer "you're not supposed to
know" - but try sending an Erlang bignum to C or storing it in MySQL
and you'll soon learn the slow and painful way.

If you keep within the same language framework you have none of these
mismatches - and you have the added benefit of only having to learn one
thing.

For "webby" things I go for erlang+yaws+mnesia - alternatives like
php+apache+mySQL have me shuddering with horror - not only do I have to
learn three different things but the bits don't fit together properly.

Now in the Erlang case the bits fit together properly - some people
call this "conceptual integrity" - but believe me, fitting things
together when they are all written in the same language is bad enough
but fitting them together when they are written in different unsafe
languages is a pain in the thing which I am sitting upon.

> The second... the second, is it possible at all ? Can I 
> launch a process in another node and still let it handle a 
> local socket ? 

yes - if you're mad

> And is it possible to have a pool of nodes and 
> launch new process in the one with less load ?

Yes - virtually anything is possible - even pretty easy :-)
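
A minimal sketch of the "least loaded node" idea, assuming the nodes
are already connected and that the length of the scheduler run queue
is a good enough measure of load (that is my assumption - pick
whatever measure suits you). The module and function names are made
up; rpc:call/4, erlang:statistics/1 and spawn/4 are standard. The
pool module in stdlib does something in the same spirit, see
pool:pspawn/3.

```erlang
-module(least_loaded).
-export([load/1, pick/1, spawn_on_least_loaded/4]).

%% Ask a node for the length of its run queue. A node that cannot be
%% reached gets the atom 'infinity', which sorts after every integer.
load(Node) ->
    case rpc:call(Node, erlang, statistics, [run_queue]) of
        N when is_integer(N) -> N;
        _BadRpc -> infinity
    end.

%% Given a non-empty list of {Node, Load} pairs, return the node with
%% the smallest load.
pick(Loads) ->
    [{Node, _Load} | _] = lists:keysort(2, Loads),
    Node.

%% Measure every node, then start M:F(A...) on the least loaded one.
spawn_on_least_loaded(Nodes, M, F, A) ->
    Node = pick([{N, load(N)} || N <- Nodes]),
    spawn(Node, M, F, A).
```

For example, least_loaded:pick([{a,3},{b,1},{c,2}]) picks b. Note that
the load is only a snapshot - by the time the process is spawned it may
have changed.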

> Somehow I feel like I am not seeing the whole picture (or 
> that I am missing some important Erlang feature).
> Can anybody enlighten me ? (reading material is welcome).

Enlightenment comes from building a few systems - just keep at it - 
after about 30 years you'll either have seen the light - or become
a management consultant. 

Incidentally, I think you should think very carefully about the
protocols and not how they are terminated - you can terminate a
protocol in any language - but you cannot correct a protocol design
error with the smartest and fastest compiler in the world.

Having a "redirect" message in your protocol which could occur at ANY
point would make your architecture much better.
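
On the client side, "at ANY point" just means the receive loop always
has a redirect clause. A sketch, with everything invented for the
example: the message shapes {data, D}, {redirect, Host, Port} and
closed, and the two callbacks Connect and Recv, which stand in for
whatever really opens a socket and decodes the next message from it.

```erlang
-module(redirect_client).
-export([run/3]).

%% Connect: fun({Host, Port}) -> Conn
%% Recv:    fun(Conn) -> {Msg, Conn1}
%% Both are hypothetical callbacks, so the loop itself stays
%% independent of the transport.
run(Addr, Connect, Recv) ->
    loop(Connect(Addr), Connect, Recv, []).

loop(Conn, Connect, Recv, Acc) ->
    case Recv(Conn) of
        {{redirect, Host, Port}, _OldConn} ->
            %% Drop the old connection and carry on at the new server.
            loop(Connect({Host, Port}), Connect, Recv, Acc);
        {{data, D}, Conn1} ->
            loop(Conn1, Connect, Recv, [D | Acc]);
        {closed, _Conn1} ->
            lists:reverse(Acc)
    end.
```

The data collected so far survives the redirect, so the rest of the
client need not know which server it is talking to.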

If you have control over the client software then life gets even nicer
- you can let the client try different hosts, until it finds one that
it is happy with.
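
Trying hosts in turn is only a few lines. In this sketch the module
name, the 2 second timeout and the socket options are arbitrary
choices of mine; gen_tcp:connect/4 is the real call. "Happy with" here
just means "the connect succeeded" - a real client might also check
latency or wait for a greeting.

```erlang
-module(probe).
-export([first_ok/2, tcp_connect/1]).

%% Try each address in order using Connect, a fun returning
%% {ok, Conn} or {error, Reason}. Return the first one that works.
first_ok([], _Connect) ->
    {error, no_host_available};
first_ok([Addr | Rest], Connect) ->
    case Connect(Addr) of
        {ok, Conn} -> {ok, Addr, Conn};
        {error, _Reason} -> first_ok(Rest, Connect)
    end.

%% One possible Connect fun: plain TCP with a 2 second timeout.
tcp_connect({Host, Port}) ->
    gen_tcp:connect(Host, Port, [binary, {active, false}], 2000).
```

A client would call probe:first_ok(Hosts, fun probe:tcp_connect/1)
with Hosts being a list of {Host, Port} pairs it already knows about.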

After all why bother with DNS if the clients can probe multiple
machines - most modern P2P systems just need DNS to bootstrap
themselves - thereafter they serve their own namespaces.

Cheers

/Joe 

> --
> Pupeno <pupeno@REDACTED> (http://pupeno.com)
> 
> PS: When I mention servers think about typical Internet 
> servers: web, smtp, pop3, imap, dns, jabber, etc.
>