safedets
Hakan Mattsson
hakan@REDACTED
Thu Sep 30 01:52:09 CEST 1999
On 30 Sep 1999, Torbjorn Tornkvist wrote:
tobbe>> By using the feature of fragmented tables in Mnesia,
tobbe>> a table may be split into lots of fragments where
tobbe>> each fragment is implemented as a normal Mnesia
tobbe>> table.
tobbe>
tobbe>I guess this is something new (i.e not in the Open Source Erlang) ?
Yes, it is "new".
tobbe>Can you give an example of how to do this ?
Ok, here follows an extract from the User's Guide,
about fragmented tables in Mnesia:
/Håkan
The Concept
-----------
A concept of table fragmentation has been introduced in order to cope
with very large tables. The idea is to split a table into several more
manageable fragments. Each fragment is implemented as a first-class
Mnesia table and may be replicated, have indices etc. like any other
table. However, the tables may neither have local_content nor have the
snmp connection activated.
In order to be able to access a record in a fragmented table, Mnesia
must determine to which fragment the actual record belongs. This is
done by the mnesia_frag module, which implements the mnesia_access
callback behaviour. Please read the documentation about
mnesia:activity/4 to see how mnesia_frag can be used as a
mnesia_access callback module.
At each record access, mnesia_frag first computes a hash value from the
record key. Next, the name of the table fragment is determined from
the hash value. Finally, the actual table access is performed by
the same functions as for non-fragmented tables. When the key is not
known beforehand, all fragments are searched for matching records. The
following piece of code illustrates how an existing Mnesia table is
converted to a fragmented table and how more fragments are added
later on.
Eshell V4.7.3.3 (abort with ^G)
(a@REDACTED)1> mnesia:start().
ok
(a@REDACTED)2> mnesia:system_info(running_db_nodes).
[b@REDACTED,c@REDACTED,a@REDACTED]
(a@REDACTED)3> Tab = dictionary.
dictionary
(a@REDACTED)4> mnesia:create_table(Tab, [{ram_copies, [a@REDACTED, b@REDACTED]}]).
{atomic,ok}
(a@REDACTED)5> Write = fun(Keys) -> [mnesia:write({Tab,K,-K}) || K <- Keys], ok end.
#Fun<erl_eval>
(a@REDACTED)6> mnesia:activity(sync_dirty, Write, [lists:seq(1, 256)], mnesia_frag).
ok
(a@REDACTED)7> mnesia:change_table_frag(Tab, {activate, []}).
{atomic,ok}
(a@REDACTED)8> mnesia:table_info(Tab, frag_properties).
[{base_table,dictionary},
{foreign_key,undefined},
{n_doubles,0},
{n_fragments,1},
{next_n_to_split,1},
{node_pool,[a@REDACTED,b@REDACTED,c@REDACTED]}]
(a@REDACTED)9> Info = fun(Item) -> mnesia:table_info(Tab, Item) end.
#Fun<erl_eval>
(a@REDACTED)10> Dist = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{c@REDACTED,0},{a@REDACTED,1},{b@REDACTED,1}]
(a@REDACTED)11> mnesia:change_table_frag(Tab, {add_frag, Dist}).
{atomic,ok}
(a@REDACTED)12> Dist2 = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{b@REDACTED,1},{c@REDACTED,1},{a@REDACTED,2}]
(a@REDACTED)13> mnesia:change_table_frag(Tab, {add_frag, Dist2}).
{atomic,ok}
(a@REDACTED)14> Dist3 = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{a@REDACTED,2},{b@REDACTED,2},{c@REDACTED,2}]
(a@REDACTED)15> mnesia:change_table_frag(Tab, {add_frag, Dist3}).
{atomic,ok}
(a@REDACTED)16> Read = fun(Key) -> mnesia:read({Tab, Key}) end.
#Fun<erl_eval>
(a@REDACTED)17> mnesia:activity(transaction, Read, [12], mnesia_frag).
[{dictionary,12,-12}]
(a@REDACTED)18> mnesia:activity(sync_dirty, Info, [frag_size], mnesia_frag).
[{dictionary,64},
{dictionary_frag2,64},
{dictionary_frag3,64},
{dictionary_frag4,64}]
(a@REDACTED)19>
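The lookup steps described before the session (hash the key, derive the
fragment name, then access that table) can be sketched as follows. This
is only an illustration: the module name frag_sketch and the use of
erlang:phash2/2 with plain modulo hashing are assumptions, and this is
not the hash function that mnesia_frag actually uses. The naming
scheme, where the first fragment keeps the base table name and later
fragments get a _fragN suffix, follows the frag_size output in the
session.

```erlang
-module(frag_sketch).
-export([frag_name/2, key_to_frag/3]).

%% Name of fragment number N (1-based) of the base table Tab.
%% Fragment 1 is the base table itself; the others are Tab_fragN.
frag_name(Tab, 1) when is_atom(Tab) ->
    Tab;
frag_name(Tab, N) when is_atom(Tab), is_integer(N), N > 1 ->
    list_to_atom(atom_to_list(Tab) ++ "_frag" ++ integer_to_list(N)).

%% ASSUMPTION: plain modulo hashing via erlang:phash2/2, chosen only
%% to keep the sketch short. It maps Key to a fragment number in
%% 1..NFragments and returns the name of that fragment.
key_to_frag(Tab, Key, NFragments) when is_integer(NFragments), NFragments > 0 ->
    FragNum = erlang:phash2(Key, NFragments) + 1,
    frag_name(Tab, FragNum).
```

With four fragments, frag_sketch:key_to_frag(dictionary, 12, 4) returns
one of dictionary, dictionary_frag2, dictionary_frag3 or
dictionary_frag4, and always the same one for the same key.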
Fragmentation Properties
------------------------
There is a table property called frag_properties, which may be read
with mnesia:table_info(Tab, frag_properties). The fragmentation
properties are a list of tagged tuples of arity 2. By default the list
is empty, but when it is non-empty it triggers Mnesia to regard the
table as fragmented. The fragmentation properties are:
{n_fragments, Int}
n_fragments regulates how many fragments the table currently has.
This property may explicitly be set at table creation and later be
changed with {add_frag, NodesOrDist} or del_frag. n_fragments defaults
to 1.
{node_pool, List}
The node pool contains a list of nodes and may explicitly be set
at table creation and later be changed with {add_node, Node} or
{del_node, Node}. At table creation Mnesia tries to distribute the
replicas of each fragment evenly over all the nodes in the node
pool. Hopefully all nodes will end up with the same number of
replicas. node_pool defaults to the return value from
mnesia:system_info(db_nodes).
{n_ram_copies, Int}
Regulates how many ram_copies replicas each fragment should
have. This property may explicitly be set at table creation. The
default is 0, but if n_disc_copies and n_disc_only_copies are also 0,
n_ram_copies will by default be set to 1.
{n_disc_copies, Int}
Regulates how many disc_copies replicas each fragment should
have. This property may explicitly be set at table creation. The
default is 0.
{n_disc_only_copies, Int}
Regulates how many disc_only_copies replicas each fragment should
have. This property may explicitly be set at table creation. The
default is 0.
{foreign_key, ForeignKey}
ForeignKey may either be the atom undefined or the tuple {ForeignTab,
Attr}, where Attr denotes an attribute which should be interpreted as
a key in another fragmented table named ForeignTab. Mnesia will ensure
that the number of fragments in this table and in the foreign table
are always the same. When fragments are added or deleted, Mnesia will
automatically propagate the operation to all fragmented tables that
have a foreign key referring to this table. Instead of using the record
key to determine which fragment to access, the value of the Attr field
is used. This feature makes it possible to automatically co-locate
records in different tables on the same node. foreign_key defaults to
undefined. However, if the foreign key is set to something else, it
will cause the default values of the other fragmentation properties to
be the same as the actual fragmentation properties of the foreign
table.
Eshell V4.7.3.3 (abort with ^G)
(a@REDACTED)1> mnesia:start().
ok
(a@REDACTED)2> PrimProps = [{n_fragments, 7}, {node_pool, [node()]}].
[{n_fragments,7},{node_pool,[a@REDACTED]}]
(a@REDACTED)3> mnesia:create_table(prim_dict, [{frag_properties, PrimProps},
{attributes,[prim_key,prim_val]}]).
{atomic,ok}
(a@REDACTED)4> SecProps = [{foreign_key, {prim_dict, sec_val}}].
[{foreign_key,{prim_dict,sec_val}}]
(a@REDACTED)5> mnesia:create_table(sec_dict, [{frag_properties, SecProps},
(a@REDACTED)5> {attributes, [sec_key, sec_val]}]).
{atomic,ok}
(a@REDACTED)6> Write = fun(Rec) -> mnesia:write(Rec) end.
#Fun<erl_eval>
(a@REDACTED)7> PrimKey = 11.
11
(a@REDACTED)8> SecKey = 42.
42
(a@REDACTED)9> mnesia:activity(sync_dirty, Write,
[{prim_dict, PrimKey, -11}], mnesia_frag).
ok
(a@REDACTED)10> mnesia:activity(sync_dirty, Write,
[{sec_dict, SecKey, PrimKey}], mnesia_frag).
ok
(a@REDACTED)11> mnesia:change_table_frag(prim_dict, {add_frag, [node()]}).
{atomic,ok}
(a@REDACTED)12> SecRead = fun(PrimKey, SecKey) ->
mnesia:read({sec_dict, PrimKey}, SecKey, read) end.
#Fun<erl_eval>
(a@REDACTED)13> mnesia:activity(transaction, SecRead,
[PrimKey, SecKey], mnesia_frag).
[{sec_dict,42,11}]
(a@REDACTED)14> Info = fun(Tab, Item) -> mnesia:table_info(Tab, Item) end.
#Fun<erl_eval>
(a@REDACTED)15> mnesia:activity(sync_dirty, Info,
[prim_dict, frag_size], mnesia_frag).
[{prim_dict,0},
{prim_dict_frag2,0},
{prim_dict_frag3,0},
{prim_dict_frag4,1},
{prim_dict_frag5,0},
{prim_dict_frag6,0},
{prim_dict_frag7,0},
{prim_dict_frag8,0}]
(a@REDACTED)16> mnesia:activity(sync_dirty, Info,
[sec_dict, frag_size], mnesia_frag).
[{sec_dict,0},
{sec_dict_frag2,0},
{sec_dict_frag3,0},
{sec_dict_frag4,1},
{sec_dict_frag5,0},
{sec_dict_frag6,0},
{sec_dict_frag7,0},
{sec_dict_frag8,0}]
(a@REDACTED)17>
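As a complement to the session, here is a minimal sketch of setting
the replica-count properties at table creation. It assumes Mnesia is
already running on the nodes given. The module name frag_create, the
attribute names key and val, and the choice of 4 fragments with one
ram_copies replica each are illustrative assumptions only.

```erlang
-module(frag_create).
-export([create/2]).

%% Create a fragmented table Tab whose fragments are spread over the
%% given node pool. The numbers below are example values, not
%% recommendations.
create(Tab, NodePool) when is_atom(Tab), is_list(NodePool) ->
    FragProps = [{n_fragments, 4},        % start with four fragments
                 {node_pool, NodePool},   % nodes that may host replicas
                 {n_ram_copies, 1}],      % one ram_copies replica per fragment
    mnesia:create_table(Tab, [{frag_properties, FragProps},
                              {attributes, [key, val]}]).
```

A call such as frag_create:create(my_dict, [node()]) is expected to
return {atomic,ok} on success or {aborted,Reason} otherwise, just like
any other mnesia:create_table/2 call.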
Management of Fragmented Tables
-------------------------------
The function mnesia:change_table_frag(Tab, Change) is intended to be
used for reconfiguration of fragmented tables. The Change argument
should have one of the following values:
{activate, FragProps}
Activates the fragmentation properties of an existing table. FragProps
should either contain {node_pool, Nodes} or be empty.
deactivate
Deactivates the fragmentation properties of a table. The number of
fragments must be 1. No other tables may refer to this table in their
foreign key.
{add_frag, NodesOrDist}
Adds one new fragment to a fragmented table. All records in one of the
old fragments will be rehashed and about half of them will be moved to
the new (last) fragment. All other fragmented tables which refer to
this table in their foreign key will automatically get a new
fragment, and their records will also be dynamically rehashed in the
same manner as for the main table.
The NodesOrDist argument may either be a list of nodes or the
result from mnesia:table_info(Tab, frag_dist). The NodesOrDist
argument is assumed to be a sorted list with the best nodes to host
new replicas first in the list. The new fragment will get the same
number of replicas as the first fragment (see n_ram_copies,
n_disc_copies and n_disc_only_copies). The NodesOrDist list must at
least contain one element for each replica that needs to be allocated.
del_frag
Deletes one fragment from a fragmented table. All records in the last
fragment will be moved to one of the other fragments. All other
fragmented tables which refer to this table in their foreign key
will automatically lose their last fragment, and their records will
also be dynamically rehashed in the same manner as for the main table.
{add_node, Node}
Adds a new node to the node_pool. The new node pool will affect
the list returned from mnesia:table_info(Tab, frag_dist).
{del_node, Node}
Deletes a node from the node_pool. The new node pool will affect
the list returned from mnesia:table_info(Tab, frag_dist).
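The {add_frag, NodesOrDist} change can be wrapped in a small helper
that adds several fragments in a row, re-reading frag_dist before each
step so that every new fragment is placed according to the current
distribution. This is a sketch under the assumption that the table is
already fragmented and Mnesia is running; the module name frag_grow
and the function add_frags/2 are hypothetical.

```erlang
-module(frag_grow).
-export([add_frags/2]).

%% Add N fragments to the fragmented table Tab, one at a time.
%% Returns ok, or {error, Reason} as soon as one step is aborted.
add_frags(_Tab, 0) ->
    ok;
add_frags(Tab, N) when is_integer(N), N > 0 ->
    Info = fun(Item) -> mnesia:table_info(Tab, Item) end,
    %% frag_dist is only understood in the mnesia_frag activity context.
    Dist = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag),
    case mnesia:change_table_frag(Tab, {add_frag, Dist}) of
        {atomic, ok}      -> add_frags(Tab, N - 1);
        {aborted, Reason} -> {error, Reason}
    end.
```

Calling frag_grow:add_frags(dictionary, 3) on a table with one fragment
corresponds to the three add_frag steps in the first shell session.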
Extensions of Existing Functions
--------------------------------
The function mnesia:create_table/2 is used to create a brand new
fragmented table, by setting the table property frag_properties to
some proper values.
The function mnesia:delete_table/1 is used to delete a fragmented
table including all its fragments. There must however not exist any
other fragmented tables which refer to this table in their foreign
key.
The function mnesia:table_info/2 now understands the frag_properties
item. If the function mnesia:table_info/2 is invoked in the activity
context of the mnesia_frag module, information about several new items
may be obtained:
base_table
the name of the fragmented table
n_fragments
the actual number of fragments
node_pool
the pool of nodes
n_ram_copies
n_disc_copies
n_disc_only_copies
the number of replicas with storage type ram_copies, disc_copies and
disc_only_copies respectively. The actual values are dynamically
derived from the first fragment. The first fragment serves as a
prototype and when the actual values need to be computed (e.g. when
adding new fragments) they are simply determined by counting the
number of replicas of each storage type. This means that when the
functions mnesia:add_table_copy/3, mnesia:del_table_copy/2 and
mnesia:change_table_copy_type/3 are applied to the first fragment, it
will affect the settings of n_ram_copies, n_disc_copies, and
n_disc_only_copies.
foreign_key
the foreign key.
foreigners
all other tables that refer to this table in their foreign key.
frag_names
the names of all fragments.
frag_dist
a list of {Node, Count} tuples, sorted in increasing Count order.
The Count is the total number of replicas that this fragmented table
hosts on each Node. The list always contains at least all nodes in
the node_pool. Nodes which do not belong to the node_pool are put
last in the list even if their Count is lower.
frag_size
a list of {Name, Size} tuples where Name is a fragment Name and
Size is how many records it contains.
frag_memory
a list of {Name, Memory} tuples where Name is a fragment Name and
Memory is how much memory it occupies.
size
total size of all fragments
memory
the total memory of all fragments
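The ordering rule described for frag_dist can be mimicked with a small
pure function. This is a hypothetical sketch, not Mnesia's own code:
the module name frag_dist_sketch is made up, and the tie-breaking
among nodes with equal counts (input order is kept) is an assumption.

```erlang
-module(frag_dist_sketch).
-export([sort_dist/2]).

%% Counts is a list of {Node, Count} tuples, NodePool a list of nodes.
%% Nodes in the pool come first, in increasing Count order; nodes
%% outside the pool are put last even if their Count is lower.
sort_dist(Counts, NodePool) ->
    InPool = fun({Node, _Count}) -> lists:member(Node, NodePool) end,
    {Inside, Outside} = lists:partition(InPool, Counts),
    %% lists:keysort/2 is stable, so equal counts keep their order.
    lists:keysort(2, Inside) ++ lists:keysort(2, Outside).
```

For example, frag_dist_sketch:sort_dist([{a,2},{b,1},{c,0}], [a,b])
gives [{b,1},{a,2},{c,0}]: node c has the lowest count but is outside
the pool, so it ends up last.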
Load Balancing
--------------
There are several algorithms for distributing records in a fragmented
table evenly over a pool of nodes. No single one is best; it depends
on the needs of the application. Here follow some examples of
situations which may need some attention:
permanent change of nodes
When a new permanent db_node is introduced or dropped, it may be time
to change the pool of nodes and re-distribute the replicas evenly over
the new pool of nodes. It may also be time to add or delete a fragment
before the replicas are re-distributed.
size/memory threshold
When the total size or total memory of a fragmented table (or a single
fragment) exceeds some application specific threshold, it may be time
to dynamically add a new fragment in order to obtain a better
distribution of records.
temporary node down
When a node temporarily goes down, it may be time to compensate some
fragments with new replicas in order to keep the desired level of
redundancy. When the node comes up again, it may be time to remove the
superfluous replica.
overload threshold
When the load on some node exceeds some application specific
threshold, it may be time to either add or move some fragment replicas
to nodes with a lower load. Extra care should be taken if the table
has a foreign key relation to some other table. In order to avoid
severe performance penalties, the same re-distribution must be
performed for all of the related tables.
Use mnesia:change_table_frag/2 to add new fragments and apply the
usual schema manipulation functions (such as mnesia:add_table_copy/3,
mnesia:del_table_copy/2 and mnesia:change_table_copy_type/3) on each
fragment to perform the actual re-distribution.
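A size threshold check of the kind mentioned among the situations above
might look like the sketch below. It works on the list of
{Name, Size} tuples returned by the frag_size item. The module name
frag_threshold, both function names and the threshold arguments are
hypothetical; what the application does when a threshold is exceeded
is left open.

```erlang
-module(frag_threshold).
-export([needs_new_fragment/2, oversized/2]).

%% True if the total number of records in all fragments exceeds
%% the application specific limit MaxTotal.
needs_new_fragment(FragSizes, MaxTotal) ->
    lists:sum([Size || {_Name, Size} <- FragSizes]) > MaxTotal.

%% Names of the fragments holding more than MaxPerFrag records.
oversized(FragSizes, MaxPerFrag) ->
    [Name || {Name, Size} <- FragSizes, Size > MaxPerFrag].
```

With the frag_size result from the first shell session (four fragments
of 64 records each), needs_new_fragment(Sizes, 200) is true, since the
total is 256, while oversized(Sizes, 64) is the empty list.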