clusters

Sun Nov 6 08:11:41 CET 2005

Thanks all,

It really saves us time to well understand this ELANG/Mnesia stuff.

Renyi.

>From: Robert Virding <robert.virding@REDACTED>
>To: Hakan Mattsson <hakan@REDACTED>
>CC: Renyi Xiong <renyix1@REDACTED>,  erlang-questions@REDACTED,  
>joe.armstrong@REDACTED,  bdoyle@REDACTED
>Subject: Re: clusters
>Date: Wed, 02 Nov 2005 00:36:17 +0100
>
>You will find that linear hashing was also used internally for ets tables. 
>It is also used in the standard modules dict and sets. where it is 
>implemented in Erlang. There is (was) also a reference to the the original 
>paper I discovered which describes it. It is truly impressive being both 
>dynamic as yu grow and shrink the table.
>
>Robert
>
>Hakan Mattsson wrote:
>
>>Take a look at the "linear hashing" algorithm and its
>>relative "linear hashing star". With linear hashing you
>>may add nodes smoothly without rebuilding the entire
>>database.
>>
>>We are using that for fragmented tables in Mnesia:
>>
>>  http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/part_frame.html
>>
>>When you add a new fragment you only need to split one
>>of the old fragments, regardless of the total number of
>>existing fragments.
>>
>>If you plan to outgrow Mnesia, it might be a good idea
>>to customize the hash algorithm for fragmented tables
>>with one that fits your needs:
>>
>>  
>>http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/application_frame.html
>>
>>I don't know how the "chord" algorithm works, but if it
>>can handle addition of new nodes smoothly, it might be a
>>good candidate for your customized mnesia_frag_hash
>>implementation.
>>
>>/Håkan
>>
>>On Fri, 28 Oct 2005, Renyi Xiong wrote:
>>
>>RX> Date: Fri, 28 Oct 2005 15:37:53 -0700
>>RX> From: Renyi Xiong <renyix1@REDACTED>
>>RX> To: erlang-questions@REDACTED
>>RX> Cc: joe.armstrong@REDACTED, bdoyle@REDACTED
>>RX> Subject: RE: clusters
>>RX> RX> Hello Joe,
>>RX> RX> If I understand correctly, we need to rebuild the whole mnesia
>>RX> database each time we add a new node pair. Cause the hash key
>>RX> is dependant on the number of nodes. Is that right?
>>RX> RX> Renyi.
>>RX> RX> RX> > From: "Joe Armstrong (AL/EAB)" <joe.armstrong@REDACTED>
>>RX> > To: "Renyi Xiong" <renyix1@REDACTED>
>>RX> > CC: <bdoyle@REDACTED>
>>RX> > Subject: RE: clusters
>>RX> > Date: Mon, 24 Oct 2005 09:59:22 +0200
>>RX> > RX> > Hello Renyi,
>>RX> > RX> > Interesting question - I'll give a short answer (actually why 
>>not post
>>RX> > this to the
>>RX> > Erlang list - (to join the list follow the instruction in
>>RX> > http://www.erlang.org/faq.html)
>>RX> > RX> > I've no idea what the windows 2003 clusting service is :-)
>>RX> > RX> > Firstly - let E = # exposed servers. I = # internal servers U 
>>= # users
>>RX> > RX> > questions
>>RX> > RX> > 	- is E + I large
>>RX> > 	- is U  very large (ie outside the mnesia adress space?)
>>RX> > 	- how many U's/machine do you allocate
>>RX> > RX> > IMHO you can get a long way with a pool of PC's - assume a 
>>transaction
>>RX> > takes
>>RX> > 50 ms. CPU - then you can do 1,7 M transactions/day. So if we have 
>>1.7 M
>>RX> > users
>>RX> > doing one transaction/day then if each needs (say) 10KB data you'd 
>>need
>>RX> > 17G of data.
>>RX> > RX> > ie a low-end PC (1 Gmemory, 2GHz processor, 80 G disk) could 
>>easly handle
>>RX> > (say) 1.5M users
>>RX> > RX> > Now you need at least TWO PC's (fault-tolerence)
>>RX> > RX> > So if you make them in pairs each pair can handle 1.5M users - 
>>use a
>>RX> > replictaed mnesia
>>RX> > disk/ram table.
>>RX> > RX> > Now you want to scale up ...
>>RX> > RX> > Easy.
>>RX> > RX> > The unit of scaling is the pair I have just described.
>>RX> > RX> > Call these pairs P1, P2, P3, ..... In each pair the machine 
>>with the
>>RX> > lowest IP is the
>>RX> > primary - the other is the take-over machine.
>>RX> > RX> > Assume a user makes a HTTP request to the primary in ANY pair 
>>- all you
>>RX> > now need to
>>RX> > do is figure out which of the Pairs P1 .. Pn is "the correct 
>>machine" (ie
>>RX> > the one that stores their data) - then send them an HTTP re-direct 
>>to the
>>RX> > correct machine.
>>RX> > RX> > If the address space is small you can just use a 
>>ram-replicated mnesia
>>RX> > table for the
>>RX> > redicrection table.
>>RX> > RX> > If it is very large use consistent hashing. Call the IP 
>>address of the
>>RX> > primaries in
>>RX> > in the pairs Ip1, Ip2, ... Ipn. Assume the user Key is K.
>>RX> > RX> > Compute hash values of Ip1, Ip2, ... K using some hash 
>>algorithm. Say
>>RX> > md5(X) mod 2^32
>>RX> > RX> > Call theses IpH1, IpH2, .... KH - now the data corresponding 
>>to key K is
>>RX> > found on the
>>RX> > machine with hash IpHk where k is the smallest value in IpHk such 
>>that
>>RX> > IpHk > KH
>>RX> > RX> > (look up the "chord" algorithm for details)
>>RX> > RX> > - here's what I'd do
>>RX> > RX> > Phase A
>>RX> > 	- build a basic pair of processors (as I have described)
>>RX> > 	- deploy it (it will take some time to get millions of customers)
>>RX> > RX> > Phase B
>>RX> > 	- when you get more customers build more pairs
>>RX> > 	- user mnesia and a ram replicated dispatch table
>>RX> > RX> > Phase C
>>RX> > 	- when you get outside the addressing limits of mnesia (G users)
>>RX> > 	- make a layer with consistent hashing to replace the mnesia
>>RX> > replicated table
>>RX> > RX> > I hope you make it to C
>>RX> > RX> > /Joe
>>RX> > RX> > RX> > > -----Original Message-----
>>RX> > > From: Renyi Xiong [mailto:renyix1@REDACTED]
>>RX> > > Sent: den 22 oktober 2005 04:54
>>RX> > > To: Joe Armstrong (AL/EAB)
>>RX> > > Cc: bdoyle@REDACTED
>>RX> > > Subject:
>>RX> > >
>>RX> > >
>>RX> > > Hello Joe,
>>RX> > >
>>RX> > > I'm a programmer working for Brian. I have a question for you
>>RX> > > in terms of
>>RX> > > concurrent programming.
>>RX> > >
>>RX> > > On client side, customers only see fixed number of servers
>>RX> > > based on IP
>>RX> > > addresses. My understanding is these exposed servers are
>>RX> > > listening for
>>RX> > > client requests, dispatching transactions to internal
>>RX> > > variable number of
>>RX> > > ERLANG servers, collecting replies and forwarding them to clients.
>>RX> > >
>>RX> > > So one of our jobs here is to write an ERLANG program to
>>RX> > > implement a kind of
>>RX> > > clustering service or ERLANG already has such kind of server
>>RX> > > included?(like
>>RX> > > WIndows 2003 clustering service?)
>>RX> > >
>>RX> > > Thanks,
>>RX> > > Renyi.
>>