safedets

Hakan Mattsson hakan@REDACTED
Thu Sep 30 01:52:09 CEST 1999


On 30 Sep 1999, Torbjorn Tornkvist wrote:

tobbe>> By using the feature of fragmented tables in Mnesia,
tobbe>> a table may be split into lots of fragments where
tobbe>> each fragment is implemented as a normal Mnesia
tobbe>> table. 
tobbe>
tobbe>I guess this is something new (i.e not in the Open Source Erlang) ?

Yes, it is "new".

tobbe>Can you give an example of how to do this ?

Ok, here follows an extract from the User's Guide,
about fragmented tables in Mnesia: 

/Håkan


The Concept
-----------
A concept of table fragmentation has been introduced in order to cope
with very large tables. The idea is to split a table into several more
manageable fragments. Each fragment is implemented as a first class
Mnesia table and may be replicated, have indices etc. as any other
table. But the tables may neither have local_content nor have the snmp
connection activated.  

In order to be able to access a record in a fragmented table, Mnesia
must determine to which fragment the actual record belongs. This is
done by the mnesia_frag module, which implements the mnesia_access
callback behaviour. Please read the documentation about
mnesia:activity/4 to see how mnesia_frag can be used as a
mnesia_access callback module.  

At each record access mnesia_frag first computes a hash value from the
record key. Secondly the name of the table fragment is determined from
the hash value. And finally the actual table access is performed by
the same functions as for non-fragmented tables. When the key is not
known beforehand, all fragments are searched for matching records. The
following piece of code illustrates how an existing Mnesia table is
converted to be a fragmented table and how more fragments are added
later on.  

Eshell V4.7.3.3  (abort with ^G)
(a@REDACTED)1> mnesia:start().
ok
(a@REDACTED)2> mnesia:system_info(running_db_nodes).
[b@REDACTED,c@REDACTED,a@REDACTED]
(a@REDACTED)3> Tab = dictionary.
dictionary
(a@REDACTED)4> mnesia:create_table(Tab, [{ram_copies, [a@REDACTED, b@REDACTED]}]).
{atomic,ok}
(a@REDACTED)5> Write = fun(Keys) -> [mnesia:write({Tab,K,-K}) || K <- Keys], ok end.
#Fun<erl_eval>
(a@REDACTED)6> mnesia:activity(sync_dirty, Write, [lists:seq(1, 256)], mnesia_frag).
ok
(a@REDACTED)7> mnesia:change_table_frag(Tab, {activate, []}).
{atomic,ok}
(a@REDACTED)8> mnesia:table_info(Tab, frag_properties).
[{base_table,dictionary},
 {foreign_key,undefined},
 {n_doubles,0},
 {n_fragments,1},
 {next_n_to_split,1},
 {node_pool,[a@REDACTED,b@REDACTED,c@REDACTED]}]
(a@REDACTED)9> Info = fun(Item) -> mnesia:table_info(Tab, Item) end.
#Fun<erl_eval>
(a@REDACTED)10> Dist = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{c@REDACTED,0},{a@REDACTED,1},{b@REDACTED,1}]
(a@REDACTED)11> mnesia:change_table_frag(Tab, {add_frag, Dist}).
{atomic,ok}
(a@REDACTED)12> Dist2 = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{b@REDACTED,1},{c@REDACTED,1},{a@REDACTED,2}]
(a@REDACTED)13> mnesia:change_table_frag(Tab, {add_frag, Dist2}).
{atomic,ok}
(a@REDACTED)14> Dist3 = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{a@REDACTED,2},{b@REDACTED,2},{c@REDACTED,2}]
(a@REDACTED)15> mnesia:change_table_frag(Tab, {add_frag, Dist3}).
{atomic,ok}
(a@REDACTED)16> Read = fun(Key) -> mnesia:read({Tab, Key}) end.
#Fun<erl_eval>
(a@REDACTED)17> mnesia:activity(transaction, Read, [12], mnesia_frag).
[{dictionary,12,-12}]
(a@REDACTED)18> mnesia:activity(sync_dirty, Info, [frag_size], mnesia_frag).
[{dictionary,64},
 {dictionary_frag2,64},
 {dictionary_frag3,64},
 {dictionary_frag4,64}]
(a@REDACTED)19> 
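
The hash-then-name lookup described above can be sketched in a few
lines of Erlang. This is only an illustration for a current Erlang/OTP
release, not the code in mnesia_frag: the module name frag_sketch and
its functions are hypothetical, and erlang:phash2/2 with a plain range
is an assumed stand-in for whatever hash Mnesia really uses. Only the
naming scheme (Tab, Tab_frag2, Tab_frag3, ...) is taken from the
session above.

```erlang
-module(frag_sketch).
-export([frag_number/2, frag_name/2, frag_for_key/3]).

%% Hash Key into a fragment number in the range 1..NFrags.
%% erlang:phash2/2 returns a value in 0..NFrags-1, hence the + 1.
%% ASSUMPTION: simple range hashing, not Mnesia's own algorithm.
frag_number(Key, NFrags) when is_integer(NFrags), NFrags > 0 ->
    erlang:phash2(Key, NFrags) + 1.

%% The first fragment keeps the base table name; fragment N (N > 1)
%% is named Base_fragN, as in the frag_size output above.
frag_name(Base, 1) when is_atom(Base) ->
    Base;
frag_name(Base, N) when is_atom(Base), is_integer(N), N > 1 ->
    list_to_atom(atom_to_list(Base) ++ "_frag" ++ integer_to_list(N)).

%% Combine both steps: key -> fragment number -> fragment table name.
frag_for_key(Base, Key, NFrags) ->
    frag_name(Base, frag_number(Key, NFrags)).
```

With four fragments, frag_sketch:frag_for_key(dictionary, 12, 4)
returns one of the four fragment names listed by frag_size above;
which one depends on the hash function.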
          
Fragmentation Properties
------------------------
There is a table property called frag_properties, which may be read
with mnesia:table_info(Tab, frag_properties). The fragmentation
properties are a list of tagged tuples of arity 2. By default the list
is empty, but when it is non-empty it triggers Mnesia to regard the
table as fragmented. The fragmentation properties are:

{n_fragments, Int} 
    n_fragments regulates how many fragments the table currently
    has. This property may explicitly be set at table creation and later be
    changed with {add_frag, NodesOrDist} or del_frag. n_fragments defaults
    to 1.  
{node_pool, List} 
    The node pool contains a list of nodes and may explicitly be set
    at table creation and later be changed with {add_node, Node} or
    {del_node, Node}. At table creation Mnesia tries to distribute the
    replicas of each fragment evenly over all the nodes in the node
    pool. Hopefully all nodes will end up with the same number of
    replicas. node_pool defaults to the return value from
    mnesia:system_info(db_nodes).  
{n_ram_copies, Int} 
    Regulates how many ram_copies replicas that each fragment should
    have. This property may explicitly be set at table creation. The
    default is 0, but if n_disc_copies and n_disc_only_copies are also 0,
    n_ram_copies will by default be set to 1.  
{n_disc_copies, Int} 
    Regulates how many disc_copies replicas that each fragment should
    have. This property may explicitly be set at table creation. The
    default is 0.  
{n_disc_only_copies, Int} 
    Regulates how many disc_only_copies replicas that each fragment should
    have. This property may explicitly be set at table creation. The
    default is 0.  
{foreign_key, ForeignKey} 
    ForeignKey may either be the atom undefined or the tuple {ForeignTab,
    Attr}, where Attr denotes an attribute which should be interpreted as
    a key in another fragmented table named ForeignTab. Mnesia will ensure
    that the number of fragments in this table and in the foreign table
    are always the same. When fragments are added or deleted Mnesia will
    automatically propagate the operation to all fragmented tables that
    have a foreign key referring to this table. Instead of using the record
    key to determine which fragment to access, the value of the Attr field
    is used. This feature makes it possible to automatically co-locate
    records in different tables to the same node. foreign_key defaults to
    undefined. However if the foreign key is set to something else it will
    cause the default values of the other fragmentation properties to be
    the same values as the actual fragmentation properties of the foreign
    table.  

    Eshell V4.7.3.3  (abort with ^G)
    (a@REDACTED)1> mnesia:start().
    ok
    (a@REDACTED)2> PrimProps = [{n_fragments, 7}, {node_pool, [node()]}].
    [{n_fragments,7},{node_pool,[a@REDACTED]}]
    (a@REDACTED)3> mnesia:create_table(prim_dict, [{frag_properties, PrimProps},
					      {attributes,[prim_key,prim_val]}]).
    {atomic,ok}
    (a@REDACTED)4> SecProps = [{foreign_key, {prim_dict, sec_val}}].
    [{foreign_key,{prim_dict,sec_val}}]
    (a@REDACTED)5> mnesia:create_table(sec_dict, [{frag_properties, SecProps},
    (a@REDACTED)5>                                {attributes, [sec_key, sec_val]}]).
    {atomic,ok}
    (a@REDACTED)6> Write = fun(Rec) -> mnesia:write(Rec) end.
    #Fun<erl_eval>
    (a@REDACTED)7> PrimKey = 11.
    11
    (a@REDACTED)8> SecKey = 42.
    42
    (a@REDACTED)9> mnesia:activity(sync_dirty, Write,
			      [{prim_dict, PrimKey, -11}], mnesia_frag).
    ok
    (a@REDACTED)10> mnesia:activity(sync_dirty, Write,
			       [{sec_dict, SecKey, PrimKey}], mnesia_frag).
    ok
    (a@REDACTED)11> mnesia:change_table_frag(prim_dict, {add_frag, [node()]}).
    {atomic,ok}
    (a@REDACTED)12> SecRead = fun(PrimKey, SecKey) ->
			   mnesia:read({sec_dict, PrimKey}, SecKey, read) end.
    #Fun<erl_eval>
    (a@REDACTED)13> mnesia:activity(transaction, SecRead,
			       [PrimKey, SecKey], mnesia_frag).
    [{sec_dict,42,11}]
    (a@REDACTED)14> Info = fun(Tab, Item) -> mnesia:table_info(Tab, Item) end.
    #Fun<erl_eval>
    (a@REDACTED)15> mnesia:activity(sync_dirty, Info,
			       [prim_dict, frag_size], mnesia_frag).
    [{prim_dict,0},
     {prim_dict_frag2,0},
     {prim_dict_frag3,0},
     {prim_dict_frag4,1},
     {prim_dict_frag5,0},
     {prim_dict_frag6,0},
     {prim_dict_frag7,0},
     {prim_dict_frag8,0}]
    (a@REDACTED)16> mnesia:activity(sync_dirty, Info,
			       [sec_dict, frag_size], mnesia_frag).
    [{sec_dict,0},
     {sec_dict_frag2,0},
     {sec_dict_frag3,0},
     {sec_dict_frag4,1},
     {sec_dict_frag5,0},
     {sec_dict_frag6,0},
     {sec_dict_frag7,0},
     {sec_dict_frag8,0}]
    (a@REDACTED)17>
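
The default rule for the three replica counts can be written down as a
small pure function. This is a sketch of the rule as stated above, not
Mnesia code: the module frag_defaults and the function replica_counts/1
are hypothetical names, the inheritance of defaults through foreign_key
is ignored, and an explicitly given all-zero setting is assumed to be
treated like the default case.

```erlang
-module(frag_defaults).
-export([replica_counts/1]).

%% Return {NRam, NDisc, NDiscOnly} for a frag_properties list.
%% Each count defaults to 0, except that n_ram_copies becomes 1 when
%% all three counts would otherwise be 0.
%% ASSUMPTION: an explicit {n_ram_copies, 0} with no disc replicas is
%% handled the same way as leaving the property out.
replica_counts(Props) when is_list(Props) ->
    Ram = proplists:get_value(n_ram_copies, Props, 0),
    Disc = proplists:get_value(n_disc_copies, Props, 0),
    DiscOnly = proplists:get_value(n_disc_only_copies, Props, 0),
    case {Ram, Disc, DiscOnly} of
        {0, 0, 0} -> {1, 0, 0};
        Counts -> Counts
    end.
```

For the PrimProps list used in the session above, which sets only
n_fragments and node_pool, the function yields {1,0,0}, i.e. one
ram_copies replica per fragment.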
        
Management of Fragmented Tables
-------------------------------
The function mnesia:change_table_frag(Tab, Change) is intended to be
used for reconfiguration of fragmented tables. The Change argument
should have one of the following values:  

{activate, FragProps} 
    Activates the fragmentation properties of an existing table. FragProps
    should either contain {node_pool, Nodes} or be empty.  

deactivate 
    Deactivates the fragmentation properties of a table. The number of
    fragments must be 1. No other tables may refer to this table in their
    foreign keys.  

{add_frag, NodesOrDist} 
    Adds one new fragment to a fragmented table. All records in one of the
    old fragments will be rehashed and about half of them will be moved to
    the new (last) fragment. All other fragmented tables, which refer to
    this table in their foreign key, will automatically get a new
    fragment, and their records will also be dynamically rehashed in the
    same manner as for the main table.  

    The NodesOrDist argument may either be a list of nodes or the
    result from mnesia:table_info(Tab, frag_dist). The NodesOrDist
    argument is assumed to be a sorted list with the best nodes to host
    new replicas first in the list. The new fragment will get the same
    number of replicas as the first fragment (see n_ram_copies,
    n_disc_copies and n_disc_only_copies). The NodesOrDist list must at
    least contain one element for each replica that needs to be allocated.

del_frag 
    Deletes one fragment from a fragmented table. All records in the last
    fragment will be moved to one of the other fragments. All other
    fragmented tables which refer to this table in their foreign key
    will automatically lose their last fragment and their records will
    also be dynamically rehashed in the same manner as for the main table.

{add_node, Node} 
    Adds a new node to the node_pool. The new node pool will affect
    the list returned from mnesia:table_info(Tab, frag_dist).  

{del_node, Node} 
    Deletes a node from the node_pool. The new node pool will affect
    the list returned from mnesia:table_info(Tab, frag_dist).  
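
The add_frag step from the first shell session can be wrapped in a
helper. The following is a sketch, assuming a running Mnesia and an
already fragmented table; the module name frag_admin and both function
names are hypothetical. pick_hosts/2 only illustrates the requirement
that NodesOrDist must contain at least one element per replica.

```erlang
-module(frag_admin).
-export([pick_hosts/2, add_fragment/1]).

%% Take the first NReplicas nodes from a frag_dist-style list of
%% {Node, Count} tuples (best candidates first), or report that the
%% list is too short to host every replica.
pick_hosts(Dist, NReplicas)
  when is_list(Dist), is_integer(NReplicas), NReplicas >= 0 ->
    case length(Dist) >= NReplicas of
        true ->
            {ok, [Node || {Node, _Count} <- lists:sublist(Dist, NReplicas)]};
        false ->
            {error, {too_few_nodes, length(Dist), NReplicas}}
    end.

%% Add one fragment to Tab, passing the current frag_dist as the
%% NodesOrDist argument, exactly as in the shell session.
%% Requires Mnesia to be started and Tab to be fragmented.
add_fragment(Tab) ->
    Info = fun(Item) -> mnesia:table_info(Tab, Item) end,
    Dist = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag),
    mnesia:change_table_frag(Tab, {add_frag, Dist}).
```

For example, frag_admin:pick_hosts([{c,0},{a,1},{b,1}], 2) returns
{ok,[c,a]}, the two least loaded nodes in that distribution.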

Extensions of Existing Functions
--------------------------------
The function mnesia:create_table/2 is used to create a brand new
fragmented table, by setting the table property frag_properties to
some proper values.  

The function mnesia:delete_table/1 is used to delete a fragmented
table including all its fragments. There must however not exist any
other fragmented tables which refer to this table in their foreign
key.  

The function mnesia:table_info/2 now understands the frag_properties
item. If the function mnesia:table_info/2 is invoked in the activity
context of the mnesia_frag module, information of several new items
may be obtained:  

base_table 
    the name of the fragmented table 
n_fragments 
    the actual number of fragments 
node_pool 
    the pool of nodes 
n_ram_copies 
n_disc_copies 
n_disc_only_copies 
    the number of replicas with storage type ram_copies, disc_copies and
    disc_only_copies respectively. The actual values are dynamically
    derived from the first fragment. The first fragment serves as a
    prototype and when the actual values need to be computed (e.g. when
    adding new fragments) they are simply determined by counting the
    number of replicas for each storage type. This means that when the
    functions mnesia:add_table_copy/3, mnesia:del_table_copy/2 and
    mnesia:change_table_copy_type/3 are applied on the first fragment, it
    will affect the settings of n_ram_copies, n_disc_copies, and
    n_disc_only_copies.  
foreign_key 
    the foreign key. 
foreigners 
    all other tables that refer to this table in their foreign key. 
frag_names 
    the names of all fragments. 
frag_dist 
    a list of {Node, Count} tuples sorted in increasing Count
    order. The Count is the total number of replicas that this
    fragmented table hosts on each Node. The list always contains at
    least all nodes in the node_pool. The nodes which do not belong to
    the node_pool are put last in the list even if their Count is lower.  
frag_size 
    a list of {Name, Size} tuples where Name is a fragment Name and
    Size is how many records it contains.  
frag_memory 
    a list of {Name, Memory} tuples where Name is a fragment Name and
    Memory is how much memory it occupies.  
size 
    total size of all fragments 
memory 
    the total memory of all fragments 
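
The ordering rule for frag_dist can be sketched as a pure function:
sort by increasing Count, but keep nodes outside the node_pool at the
end. The module frag_dist_sketch and the function order/2 are
hypothetical, the input is assumed to already contain one
{Node, Count} tuple per node, and the order among nodes with equal
counts is an assumption (the input order is kept).

```erlang
-module(frag_dist_sketch).
-export([order/2]).

%% Counts   : [{Node, Count}], replicas hosted per node.
%% NodePool : [Node], the node_pool of the fragmented table.
%% Nodes in the pool come first, sorted by increasing Count; nodes
%% outside the pool follow, also by increasing Count.
order(Counts, NodePool) when is_list(Counts), is_list(NodePool) ->
    InPool = fun({Node, _Count}) -> lists:member(Node, NodePool) end,
    {Inside, Outside} = lists:partition(InPool, Counts),
    ByCount = fun({_, C1}, {_, C2}) -> C1 =< C2 end,
    lists:sort(ByCount, Inside) ++ lists:sort(ByCount, Outside).
```

Here node d has the lowest possible count but is not in the pool, so
frag_dist_sketch:order([{a,2},{b,1},{c,0},{d,0}], [a,b,c]) returns
[{c,0},{b,1},{a,2},{d,0}].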

Load Balancing
--------------
There are several algorithms for distributing records in a fragmented
table evenly over a pool of nodes. None is best; it simply depends
on the application's needs. Here follow some examples of situations
which may need some attention:  

permanent change of nodes 
    When a new permanent db_node is introduced or dropped, it may be
    time to change the pool of nodes and re-distribute the replicas
    evenly over the new pool of nodes. It may also be time to add or
    delete a fragment before the replicas are re-distributed.  

size/memory threshold 
    When the total size or total memory of a fragmented table (or a
    single fragment) exceeds some application specific threshold, it
    may be time to dynamically add a new fragment in order to obtain a
    better distribution of records.  

temporary node down 
    When a node temporarily goes down it may be time to compensate
    some fragments with new replicas in order to keep the desired
    level of redundancy. When the node comes up again it may be time
    to remove the superfluous replica.  

overload threshold 
    When the load on some node exceeds some application specific
    threshold, it may be time to either add or move some fragment
    replicas to nodes with lesser load. Extra care should be taken if
    the table has a foreign key relation to some other table. In order
    to avoid severe performance penalties, the same re-distribution
    must be performed for all of the related tables.  

Use mnesia:change_table_frag/2 to add new fragments and apply the
usual schema manipulation functions (such as mnesia:add_table_copy/3,
mnesia:del_table_copy/2 and mnesia:change_table_copy_type/3) on each
fragment to perform the actual re-distribution. 
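
A size threshold check of the kind mentioned above could look roughly
like this. It is a sketch of one possible application policy, not
something Mnesia provides: the module frag_policy, its functions and
the consider_add_frag tag are all hypothetical. The input is a list in
the format returned for the frag_size item.

```erlang
-module(frag_policy).
-export([total_size/1, oversized/2, suggest/2]).

%% Total number of records over all fragments.
total_size(FragSizes) ->
    lists:sum([Size || {_Name, Size} <- FragSizes]).

%% Names of the fragments holding more than MaxPerFrag records.
oversized(FragSizes, MaxPerFrag) ->
    [Name || {Name, Size} <- FragSizes, Size > MaxPerFrag].

%% Suggest an action; the caller decides whether to follow it.
%% ASSUMPTION: a single per-fragment record limit is the only criterion.
suggest(FragSizes, MaxPerFrag) ->
    case oversized(FragSizes, MaxPerFrag) of
        [] -> none;
        Names -> {consider_add_frag, Names}
    end.
```

With the four 64-record fragments from the first session,
total_size/1 gives 256 and suggest/2 with a limit of 100 returns none.
Since add_frag rehashes only one of the old fragments each time,
several additions may be needed before a particular oversized fragment
is split.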




