clusters

Sat Oct 29 00:37:53 CEST 2005

Hello Joe,

If I understand correctly, we need to rebuild the whole mnesia database each
time we add a new node pair, because the hash key depends on the number
of nodes. Is that right?

Renyi.

>From: "Joe Armstrong (AL/EAB)" <joe.armstrong@REDACTED>
>To: "Renyi Xiong" <renyix1@REDACTED>
>CC: <bdoyle@REDACTED>
>Subject: RE: clusters
>Date: Mon, 24 Oct 2005 09:59:22 +0200
>
>Hello Renyi,
>
>Interesting question - I'll give a short answer. (Actually, why not post this
>to the Erlang list? To join the list, follow the instructions in
>http://www.erlang.org/faq.html)
>
>I've no idea what the Windows 2003 clustering service is :-)
>
>Firstly - let E = # exposed servers, I = # internal servers, U = # users
>
>questions
>
>	- is E + I large
>	- is U very large (i.e. outside the mnesia address space)?
>	- how many U's/machine do you allocate
>
>IMHO you can get a long way with a pool of PCs - assume a transaction takes
>50 ms of CPU - then you can do 1.7 M transactions/day. So if we have 1.7 M
>users doing one transaction/day, and each needs (say) 10 KB of data, you'd
>need 17 GB of data.
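
The arithmetic above can be checked with a short sketch. It assumes a single CPU that is busy all day and reads "10 KB" as 10,000 bytes; the function names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted above.
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 ms in a day


def transactions_per_day(cpu_ms_per_transaction):
    """Transactions one fully busy CPU can complete in a day."""
    return MS_PER_DAY // cpu_ms_per_transaction


def storage_bytes(users, bytes_per_user):
    """Total data volume if every user needs bytes_per_user bytes."""
    return users * bytes_per_user


print(transactions_per_day(50))                  # 1728000, i.e. about 1.7 M
print(storage_bytes(1_700_000, 10_000) / 1e9)    # 17.0 (GB, decimal)
```
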
>
>i.e. a low-end PC (1 GB memory, 2 GHz processor, 80 GB disk) could easily
>handle (say) 1.5 M users
>
>Now you need at least TWO PCs (fault tolerance)
>
>So if you make them in pairs, each pair can handle 1.5 M users - use a
>replicated mnesia disk/ram table.
>
>Now you want to scale up ...
>
>Easy.
>
>The unit of scaling is the pair I have just described.
>
>Call these pairs P1, P2, P3, ... In each pair the machine with the lowest
>IP is the primary - the other is the take-over machine.
>
>Assume a user makes an HTTP request to the primary in ANY pair - all you now
>need to do is figure out which of the pairs P1 .. Pn is "the correct machine"
>(i.e. the one that stores their data) - then send them an HTTP redirect to
>the correct machine.
>
>If the address space is small you can just use a ram-replicated mnesia
>table for the redirection table.
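
A minimal sketch of the redirect step, assuming a small address space. A plain Python dict stands in for the replicated mnesia table, and the user keys, IP addresses and the `route` function are hypothetical names for illustration only.

```python
# Hypothetical dispatch table: user key -> IP of the primary in the pair
# that stores this user's data. Stands in for the replicated mnesia table.
DISPATCH = {
    "alice": "10.0.0.1",
    "bob": "10.0.0.3",
}


def route(user_key, local_ip, path):
    """Decide how the primary at local_ip answers a request for user_key.

    Returns (status, headers): 200 if this pair holds the data,
    302 with a Location header if another pair does, 404 if unknown.
    """
    owner = DISPATCH.get(user_key)
    if owner is None:
        return 404, {}
    if owner == local_ip:
        return 200, {}
    return 302, {"Location": "http://%s%s" % (owner, path)}


print(route("bob", "10.0.0.1", "/profile"))
```

Because every primary holds the same table, the request can arrive at any pair and still be redirected in one step.
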
>
>If it is very large use consistent hashing. Call the IP addresses of the
>primaries in the pairs Ip1, Ip2, ... Ipn. Assume the user key is K.
>
>Compute hash values of Ip1, Ip2, ... K using some hash algorithm, say
>md5(X) mod 2^32.
>
>Call these IpH1, IpH2, ... KH - now the data corresponding to key K is
>found on the machine with hash IpHk, where IpHk is the smallest of the
>IpH values such that IpHk > KH
>
>(look up the "chord" algorithm for details)
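
The lookup rule above can be sketched as follows. This is an illustration under stated assumptions, not a Chord implementation: the IP addresses and user keys are made up, and wrapping around to the first machine when no hash is greater than KH is an assumption borrowed from ring-based schemes.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_hash(value):
    """md5(value) mod 2^32, as an integer position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(primary_ips):
    """Sorted list of (hash, ip), one entry per pair's primary."""
    return sorted((ring_hash(ip), ip) for ip in primary_ips)


def lookup(ring, user_key):
    """Return the primary with the smallest hash greater than the key's hash.

    Wraps to the first entry if no hash is greater (assumption).
    """
    kh = ring_hash(user_key)
    hashes = [h for h, _ in ring]
    i = bisect.bisect_right(hashes, kh)   # index of first hash > kh
    return ring[i % len(ring)][1]


ring = build_ring(["10.0.0.1", "10.0.0.3", "10.0.0.5"])
print(lookup(ring, "user-42"))
```

Since the hash does not depend on the number of pairs, adding a pair only reassigns the keys that now fall to the new primary; every other key stays where it was.
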
>
>- here's what I'd do
>
>Phase A
>	- build a basic pair of processors (as I have described)
>	- deploy it (it will take some time to get millions of customers)
>
>Phase B
>	- when you get more customers build more pairs
>	- use mnesia and a ram-replicated dispatch table
>
>Phase C
>	- when you get outside the addressing limits of mnesia (G users)
>	- make a layer with consistent hashing to replace the mnesia replicated
>	  table
>
>I hope you make it to C
>
>/Joe
>
>
> > -----Original Message-----
> > From: Renyi Xiong [mailto:renyix1@REDACTED]
> > Sent: den 22 oktober 2005 04:54
> > To: Joe Armstrong (AL/EAB)
> > Cc: bdoyle@REDACTED
> > Subject:
> >
> >
> > Hello Joe,
> >
> > I'm a programmer working for Brian. I have a question for you about
> > concurrent programming.
> >
> > On the client side, customers only see a fixed number of servers, based
> > on IP addresses. My understanding is that these exposed servers listen
> > for client requests, dispatch transactions to a variable number of
> > internal ERLANG servers, collect the replies and forward them to clients.
> >
> > So is one of our jobs here to write an ERLANG program that implements a
> > kind of clustering service, or does ERLANG already include such a server
> > (like the Windows 2003 clustering service)?
> >
> > Thanks,
> > Renyi.
> >
> >
> >