clusters

Robert Virding <>
Wed Nov 2 00:36:17 CET 2005


You will find that linear hashing was also used internally for ets 
tables. It is also used in the standard modules dict and sets. where it 
is implemented in Erlang. There is (was) also a reference to the the 
original paper I discovered which describes it. It is truly impressive 
being both dynamic as yu grow and shrink the table.

Robert

Hakan Mattsson wrote:

>Take a look at the "linear hashing" algorithm and its
>relative "linear hashing star". With linear hashing you
>may add nodes smoothly without rebuilding the entire
>database.
>
>We are using that for fragmented tables in Mnesia:
>
>  http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/part_frame.html
>
>When you add a new fragment you only need to split one
>of the old fragments, regardless of the total number of
>existing fragments.
>
>If you plan to outgrow Mnesia, it might be a good idea
>to customize the hash algorithm for fragmented tables
>with one that fits your needs:
>
>  http://erlang.se/doc/doc-5.4.8/lib/mnesia-4.2.2/doc/html/application_frame.html
>
>I don't know how the "chord" algorithm works, but if it
>can handle addition of new nodes smoothly, it might be a
>good candidate for your customized mnesia_frag_hash
>implementation.
>
>/Håkan
>
>On Fri, 28 Oct 2005, Renyi Xiong wrote:
>
>RX> Date: Fri, 28 Oct 2005 15:37:53 -0700
>RX> From: Renyi Xiong <>
>RX> To: 
>RX> Cc: , 
>RX> Subject: RE: clusters
>RX> 
>RX> Hello Joe,
>RX> 
>RX> If I understand correctly, we need to rebuild the whole mnesia
>RX> database each time we add a new node pair. Cause the hash key
>RX> is dependant on the number of nodes. Is that right?
>RX> 
>RX> Renyi.
>RX> 
>RX> 
>RX> > From: "Joe Armstrong (AL/EAB)" <>
>RX> > To: "Renyi Xiong" <>
>RX> > CC: <>
>RX> > Subject: RE: clusters
>RX> > Date: Mon, 24 Oct 2005 09:59:22 +0200
>RX> > 
>RX> > Hello Renyi,
>RX> > 
>RX> > Interesting question - I'll give a short answer (actually why not post
>RX> > this to the
>RX> > Erlang list - (to join the list follow the instruction in
>RX> > http://www.erlang.org/faq.html)
>RX> > 
>RX> > I've no idea what the windows 2003 clusting service is :-)
>RX> > 
>RX> > Firstly - let E = # exposed servers. I = # internal servers U = # users
>RX> > 
>RX> > questions
>RX> > 
>RX> > 	- is E + I large
>RX> > 	- is U  very large (ie outside the mnesia adress space?)
>RX> > 	- how many U's/machine do you allocate
>RX> > 
>RX> > IMHO you can get a long way with a pool of PC's - assume a transaction
>RX> > takes
>RX> > 50 ms. CPU - then you can do 1,7 M transactions/day. So if we have 1.7 M
>RX> > users
>RX> > doing one transaction/day then if each needs (say) 10KB data you'd need
>RX> > 17G of data.
>RX> > 
>RX> > ie a low-end PC (1 Gmemory, 2GHz processor, 80 G disk) could easly handle
>RX> > (say) 1.5M users
>RX> > 
>RX> > Now you need at least TWO PC's (fault-tolerence)
>RX> > 
>RX> > So if you make them in pairs each pair can handle 1.5M users - use a
>RX> > replictaed mnesia
>RX> > disk/ram table.
>RX> > 
>RX> > Now you want to scale up ...
>RX> > 
>RX> > Easy.
>RX> > 
>RX> > The unit of scaling is the pair I have just described.
>RX> > 
>RX> > Call these pairs P1, P2, P3, ..... In each pair the machine with the
>RX> > lowest IP is the
>RX> > primary - the other is the take-over machine.
>RX> > 
>RX> > Assume a user makes a HTTP request to the primary in ANY pair - all you
>RX> > now need to
>RX> > do is figure out which of the Pairs P1 .. Pn is "the correct machine" (ie
>RX> > the one that stores their data) - then send them an HTTP re-direct to the
>RX> > correct machine.
>RX> > 
>RX> > If the address space is small you can just use a ram-replicated mnesia
>RX> > table for the
>RX> > redicrection table.
>RX> > 
>RX> > If it is very large use consistent hashing. Call the IP address of the
>RX> > primaries in
>RX> > in the pairs Ip1, Ip2, ... Ipn. Assume the user Key is K.
>RX> > 
>RX> > Compute hash values of Ip1, Ip2, ... K using some hash algorithm. Say
>RX> > md5(X) mod 2^32
>RX> > 
>RX> > Call theses IpH1, IpH2, .... KH - now the data corresponding to key K is
>RX> > found on the
>RX> > machine with hash IpHk where k is the smallest value in IpHk such that
>RX> > IpHk > KH
>RX> > 
>RX> > (look up the "chord" algorithm for details)
>RX> > 
>RX> > - here's what I'd do
>RX> > 
>RX> > Phase A
>RX> > 	- build a basic pair of processors (as I have described)
>RX> > 	- deploy it (it will take some time to get millions of customers)
>RX> > 
>RX> > Phase B
>RX> > 	- when you get more customers build more pairs
>RX> > 	- user mnesia and a ram replicated dispatch table
>RX> > 
>RX> > Phase C
>RX> > 	- when you get outside the addressing limits of mnesia (G users)
>RX> > 	- make a layer with consistent hashing to replace the mnesia
>RX> > replicated table
>RX> > 
>RX> > I hope you make it to C
>RX> > 
>RX> > /Joe
>RX> > 
>RX> > 
>RX> > > -----Original Message-----
>RX> > > From: Renyi Xiong [mailto:]
>RX> > > Sent: den 22 oktober 2005 04:54
>RX> > > To: Joe Armstrong (AL/EAB)
>RX> > > Cc: 
>RX> > > Subject:
>RX> > >
>RX> > >
>RX> > > Hello Joe,
>RX> > >
>RX> > > I'm a programmer working for Brian. I have a question for you
>RX> > > in terms of
>RX> > > concurrent programming.
>RX> > >
>RX> > > On client side, customers only see fixed number of servers
>RX> > > based on IP
>RX> > > addresses. My understanding is these exposed servers are
>RX> > > listening for
>RX> > > client requests, dispatching transactions to internal
>RX> > > variable number of
>RX> > > ERLANG servers, collecting replies and forwarding them to clients.
>RX> > >
>RX> > > So one of our jobs here is to write an ERLANG program to
>RX> > > implement a kind of
>RX> > > clustering service or ERLANG already has such kind of server
>RX> > > included?(like
>RX> > > WIndows 2003 clustering service?)
>RX> > >
>RX> > > Thanks,
>RX> > > Renyi.
>



More information about the erlang-questions mailing list