clusters

Hakan Mattsson hakan@REDACTED
Mon Oct 31 13:36:24 CET 2005


On Mon, 31 Oct 2005, Hakan Mattsson wrote:

HM> Date: Mon, 31 Oct 2005 11:43:00 +0100 (CET)
HM> From: Hakan Mattsson <hakan@REDACTED>
HM> To: Renyi Xiong <renyix1@REDACTED>
HM> Cc: erlang-questions@REDACTED, joe.armstrong@REDACTED, bdoyle@REDACTED
HM> Subject: RE: clusters
HM> 
HM> 
HM> Take a look at the "linear hashing" algorithm and its
HM> relative "linear hashing star". With linear hashing you
HM> may add nodes smoothly without rebuilding the entire
HM> database.
HM> 
HM> We are using that for fragmented tables in Mnesia:
HM> 
HM>   http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/part_frame.html

Both links in my previous post were intended to point
to individual pages, not to the enclosing frames.

The first should have been a link to:

  http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/Mnesia_App_D.html#11

HM> 
HM> When you add a new fragment you only need to split one
HM> of the old fragments, regardless of the total number of
HM> existing fragments.
HM> 
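[The split-one-fragment property of linear hashing can be sketched like
this - a hypothetical Python illustration with md5 as a stand-in hash,
not Mnesia's actual code:]

```python
import hashlib

def key_to_fragment(key, n_frags):
    """Map a key to a fragment in 0..n_frags-1 using linear hashing."""
    h = int(hashlib.md5(repr(key).encode()).hexdigest(), 16)
    p = 1                      # smallest power of two >= n_frags
    while p < n_frags:
        p *= 2
    b = h % p
    if b >= n_frags:           # that fragment does not exist yet,
        b = h % (p // 2)       # so fall back to the previous level
    return b
```

Growing from n to n+1 fragments moves keys out of only one old fragment;
every other key stays where it was.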
HM> If you plan to outgrow Mnesia, it might be a good idea
HM> to customize the hash algorithm for fragmented tables
HM> with one that fits your needs:
HM> 
HM>   http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/application_frame.html

and the second a link to:

  http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/mnesia_frag_hash.html

/Håkan

HM> 
HM> I don't know how the "chord" algorithm works, but if it
HM> can handle addition of new nodes smoothly, it might be a
HM> good candidate for your customized mnesia_frag_hash
HM> implementation.
HM> 
HM> /Håkan
HM> 
HM> On Fri, 28 Oct 2005, Renyi Xiong wrote:
HM> 
HM> RX> Date: Fri, 28 Oct 2005 15:37:53 -0700
HM> RX> From: Renyi Xiong <renyix1@REDACTED>
HM> RX> To: erlang-questions@REDACTED
HM> RX> Cc: joe.armstrong@REDACTED, bdoyle@REDACTED
HM> RX> Subject: RE: clusters
HM> RX> 
HM> RX> Hello Joe,
HM> RX> 
HM> RX> If I understand correctly, we need to rebuild the whole mnesia
HM> RX> database each time we add a new node pair, because the hash key
HM> RX> is dependent on the number of nodes. Is that right?
HM> RX> 
HM> RX> Renyi.
HM> RX> 
HM> RX> 
HM> RX> > From: "Joe Armstrong (AL/EAB)" <joe.armstrong@REDACTED>
HM> RX> > To: "Renyi Xiong" <renyix1@REDACTED>
HM> RX> > CC: <bdoyle@REDACTED>
HM> RX> > Subject: RE: clusters
HM> RX> > Date: Mon, 24 Oct 2005 09:59:22 +0200
HM> RX> > 
HM> RX> > Hello Renyi,
HM> RX> > 
HM> RX> > Interesting question - I'll give a short answer (actually, why not
HM> RX> > post this to the Erlang list - to join the list follow the
HM> RX> > instructions in http://www.erlang.org/faq.html)
HM> RX> > 
HM> RX> > I've no idea what the Windows 2003 clustering service is :-)
HM> RX> > 
HM> RX> > Firstly - let E = # exposed servers, I = # internal servers, U = # users
HM> RX> > 
HM> RX> > questions
HM> RX> > 
HM> RX> > 	- is E + I large
HM> RX> > 	- is U very large (ie outside the mnesia address space?)
HM> RX> > 	- how many U's/machine do you allocate
HM> RX> > 
HM> RX> > IMHO you can get a long way with a pool of PC's - assume a transaction
HM> RX> > takes 50 ms of CPU - then you can do 1.7 M transactions/day. So if we
HM> RX> > have 1.7 M users doing one transaction/day, then if each needs (say)
HM> RX> > 10KB of data you'd need 17G of data.
HM> RX> > 
HM> RX> > ie a low-end PC (1 G memory, 2 GHz processor, 80 G disk) could easily
HM> RX> > handle (say) 1.5M users
HM> RX> > 
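[A quick sanity check of these back-of-envelope figures, using decimal
GB as Joe does:]

```python
# Sanity-check the capacity arithmetic above.
seconds_per_day = 24 * 60 * 60            # 86,400 s
tx_per_day = seconds_per_day / 0.050      # 50 ms of CPU per transaction
print(int(tx_per_day))                    # 1728000, i.e. ~1.7 M/day

users = 1_700_000
bytes_per_user = 10_000                   # ~10 KB each
print(users * bytes_per_user / 1e9)       # 17.0 (GB) - the "17G of data"
```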
HM> RX> > Now you need at least TWO PC's (fault-tolerance)
HM> RX> > 
HM> RX> > So if you make them in pairs, each pair can handle 1.5M users - use a
HM> RX> > replicated mnesia disk/ram table.
HM> RX> > 
HM> RX> > Now you want to scale up ...
HM> RX> > 
HM> RX> > Easy.
HM> RX> > 
HM> RX> > The unit of scaling is the pair I have just described.
HM> RX> > 
HM> RX> > Call these pairs P1, P2, P3, ..... In each pair the machine with the
HM> RX> > lowest IP is the
HM> RX> > primary - the other is the take-over machine.
HM> RX> > 
HM> RX> > Assume a user makes an HTTP request to the primary in ANY pair - all you
HM> RX> > now need to
HM> RX> > do is figure out which of the Pairs P1 .. Pn is "the correct machine" (ie
HM> RX> > the one that stores their data) - then send them an HTTP re-direct to the
HM> RX> > correct machine.
HM> RX> > 
HM> RX> > If the address space is small you can just use a ram-replicated mnesia
HM> RX> > table for the redirection table.
HM> RX> > 
HM> RX> > If it is very large use consistent hashing. Call the IP addresses of
HM> RX> > the primaries in the pairs Ip1, Ip2, ... Ipn. Assume the user Key is K.
HM> RX> > 
HM> RX> > Compute hash values of Ip1, Ip2, ... K using some hash algorithm. Say
HM> RX> > md5(X) mod 2^32
HM> RX> > 
HM> RX> > Call these IpH1, IpH2, ..., KH - now the data corresponding to key K
HM> RX> > is found on the machine with hash IpHk, where IpHk is the smallest of
HM> RX> > these hashes such that IpHk > KH
HM> RX> > 
HM> RX> > (look up the "chord" algorithm for details)
HM> RX> > 
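[The successor rule just described can be sketched as follows - a
hypothetical Python illustration, using md5 mod 2^32 as suggested above:]

```python
import hashlib

def h32(x):
    """md5(X) mod 2^32, as suggested above."""
    return int(hashlib.md5(str(x).encode()).hexdigest(), 16) % 2**32

def node_for_key(key, node_ips):
    """Pick the node whose hash IpHk is the smallest one above the
    key's hash KH, wrapping around the ring if none is larger."""
    ring = sorted((h32(ip), ip) for ip in node_ips)
    kh = h32(key)
    for nh, ip in ring:
        if nh > kh:
            return ip
    return ring[0][1]          # wrap around to the smallest hash
```

Adding a pair only claims keys from its successor on the ring; all the
other pairs keep their data untouched.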
HM> RX> > - here's what I'd do
HM> RX> > 
HM> RX> > Phase A
HM> RX> > 	- build a basic pair of processors (as I have described)
HM> RX> > 	- deploy it (it will take some time to get millions of customers)
HM> RX> > 
HM> RX> > Phase B
HM> RX> > 	- when you get more customers build more pairs
HM> RX> > 	- use mnesia and a ram-replicated dispatch table
HM> RX> > 
HM> RX> > Phase C
HM> RX> > 	- when you get outside the addressing limits of mnesia (G users)
HM> RX> > 	- make a layer with consistent hashing to replace the mnesia
HM> RX> > replicated table
HM> RX> > 
HM> RX> > I hope you make it to C
HM> RX> > 
HM> RX> > /Joe
HM> RX> > 
HM> RX> > 
HM> RX> > > -----Original Message-----
HM> RX> > > From: Renyi Xiong [mailto:renyix1@REDACTED]
HM> RX> > > Sent: den 22 oktober 2005 04:54
HM> RX> > > To: Joe Armstrong (AL/EAB)
HM> RX> > > Cc: bdoyle@REDACTED
HM> RX> > > Subject:
HM> RX> > >
HM> RX> > >
HM> RX> > > Hello Joe,
HM> RX> > >
HM> RX> > > I'm a programmer working for Brian. I have a question for you
HM> RX> > > in terms of
HM> RX> > > concurrent programming.
HM> RX> > >
HM> RX> > > On client side, customers only see fixed number of servers
HM> RX> > > based on IP
HM> RX> > > addresses. My understanding is these exposed servers are
HM> RX> > > listening for
HM> RX> > > client requests, dispatching transactions to internal
HM> RX> > > variable number of
HM> RX> > > ERLANG servers, collecting replies and forwarding them to clients.
HM> RX> > >
HM> RX> > > So one of our jobs here is to write an ERLANG program to
HM> RX> > > implement a kind of
HM> RX> > > clustering service, or does ERLANG already include such a server
HM> RX> > > (like the Windows 2003 clustering service)?
HM> RX> > >
HM> RX> > > Thanks,
HM> RX> > > Renyi.
