Author: Walter Weinmann
Status: Draft
Type: Standards Track
Created: 06-Dec-2016
Erlang-Version: OTP 20.0
Post-History: 6-Dec-2016
****
EEP 46: B-trees: balanced search trees of order n
----
Abstract
========
This EEP proposes the creation of a new module named b\_trees for the
administration of b-trees. Both the optional persistence and the sort
order should be implemented by pluggable functionality.
Copyright
=========
This document has been placed in the public domain.
Specification
=============
## Data Structure ##
{MinimumSubtrees,
MaximumKeys,
SizeKeyValues,
SortFunction/2,
State,
Tree}
`Tree` is composed of nodes of the form
{KeyNumber,
SubtreeNumber,
[{Key, Value}],
[Tree]}
and the "empty b-tree" node
nil
`State` is a tuple composed of the following parameters:
{StateTarget,
DeleteFunction/3,
InsertFunction/3,
LookupFunction/3}
Since the b-trees are always balanced, there is no need for a balance
operation.
## DATA TYPES ##
b_tree() = {pos_integer(),
pos_integer(),
non_neg_integer(),
sort_function(),
state(),
tree()}
A general balanced tree.
iterator() = [{key_values(), subtrees()}]
A general balanced tree iterator.
## EXPORTS ##
### copy(Tree1, Tree2) -> Tree3 ###
Types:
Tree1 = Tree2 = Tree3 = b_tree() | gb_trees:tree()
Copies tree Tree1 to an empty tree Tree2. Both trees may be either
of type b-tree or binary tree (gb_trees). Returns the new tree
Tree3 of the same type as tree Tree2.
### delete(Key, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 and returns the new
b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1,
crashes otherwise.
### delete\_any (Key, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 if key Key is present
in b-tree B-Tree1, otherwise does nothing. Returns the new b-tree B-Tree2.
### empty (Order) -> B-Tree ###
Types:
Order = pos_integer()
B-Tree = b_tree()
Returns a new empty b-tree. The order is defined as the maximum number
of children nodes a non-leaf node may hold.
### enter (Key, Value, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
Inserts key Key with value Value into b-tree B-Tree1 if key Key is not
present in b-tree B-Tree1, otherwise updates the current value of key Key
to value Value in b-tree B-Tree1. Returns a new b-tree B-Tree2.
### from\_dict (B-Tree1, List) -> B-Tree2 ###
Types:
B-Tree1 = B-Tree2 = b_tree()
List = [{Key, Value}]
Turns an ordered list List of key value tuples into a b-tree. The given
b-tree B-Tree1 must be empty. The list must not contain duplicate keys.
### get (Key, B-Tree) -> Value ###
Types:
Key = any()
B-Tree = b_tree()
Value = any()
Retrieves the value stored with key Key in b-tree B-Tree. Assumes that
key Key is present in b-tree B-Tree, crashes otherwise.
### height (B-Tree) -> integer() >= 0 ###
Types:
B-Tree = b_tree()
Returns the height of b-tree B-Tree as an integer. Assumes that b-tree
B-Tree is non-empty.
### insert (Key, Value, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
Inserts key Key with value Value into b-tree B-Tree1 and returns the new
b-tree B-Tree2. Assumes that key Key is **not** present in b-tree B-Tree1,
crashes otherwise.
### is\_defined (Key, B-Tree) -> boolean() ###
Types:
Key = any()
B-Tree = b_tree()
Returns `true` if key Key is present in b-tree B-Tree, otherwise `false`.
### is\_empty (B-Tree) -> boolean() ###
Types:
B-Tree = b_tree()
Returns `true` if b-tree B-Tree is an empty b-tree, otherwise `false`.
### iterator (B-Tree) -> Iterator ###
Types:
B-Tree = b_tree()
Iterator = iterator()
Returns iterator Iterator that can be used for traversing the entries
of b-tree B-Tree; see `next/1`. The implementation of this iterator is
very efficient; traversing the whole b-tree using `next/1` is only slightly
slower than getting the list of all key-value pairs using `to_list/1`
and traversing that. The main advantage of the iterator approach is that
it does not require the complete list of all key-value pairs to be built
in memory at one time.
### iterator\_from (Key, B-Tree) -> Iterator ###
Types:
Key = any(9
B-Tree = b_tree()
Iterator = iterator()
Returns iterator Iterator that can be used for traversing the entries
of b-tree B-Tree; see `next/1`. The difference, as compared to the
iterator returned by iterator/1, is that the first key greater than
or equal to key Key is returned.
### keys (B-Tree) -> [Key] ###
Types:
B-Tree = b_tree()
Key = any()
Returns the keys in b-tree B-Tree as an ordered list.
### largest (B-Tree) -> {Key, Value} ###
Types:
B-Tree = b_tree()
Key = any()
Value = any()
Returns a tuple {Key, Value}, where Key is the largest key in b-tree
B-Tree, and Value is the value associated with this key. Assumes that
b-tree B-Tree is not empty.
### lookup (Key, B-Tree) -> none | {value, Value} ###
Types:
Key = any()
B-Tree = b_tree()
Value = any()
Looks up key Key in b-tree B-Tree. Returns {value, Value}, or none if
key Key is not present.
### map (Function, B-Tree1) -> B-Tree2 ###
Types:
Function = fun((Key, Value1) -> Value2)
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value1 = Value2 = any()
Maps function Function(Key, Value1) -> Value2 to all key value pairs of
b-tree B-Tree1. Returns the new b-tree B-Tree2 with the same set of
keys as b-tree B-Tree1 and the new set of values.
### next (Iterator1) -> 'none' | {Key, Value, Iterator2} ###
Types:
Iterator1 = Iterator2 = iterator()
Key = any()
Value = any()
Returns the tuple {Key, Value, Iterator2}, where Key is the smallest
key referred to by iterator Iterator1, and iterator Iterator2 is the
new iterator to be used for traversing the remaining nodes, or the
atom '**none**' if no nodes remain.
### set\_parameter (B-Tree1, Name, Value) -> B-Tree2 ###
Types:
B-Tree1 = B-Tree2 = b_tree()
Name : Value = sort : Function = fun((Key1, Key2) -> equal |
greater |
less)
| state : {StateTarget,
Function = fun(StateTarget, delete, Key)
-> true,
Function = fun(StateTarget, insert, Subtrees)
-> Key,
Function = fun(StateTarget, lookup, Key)
-> Subtrees}
Sets the parameter Name to value Value in the empty b-tree B-Tree1 and
returns the new b-tree B-Tree2. This function can only be used in
conjunction with an empty b-tree.
### size\_key\_values (B-Tree) -> integer() >= 0 ###
Types:
B-Tree = b_tree()
Returns the number of key value pairs in b-tree B-Tree as an integer.
Returns 0 (zero) if b-tree B-Tree is empty.
### size\_nodes (B-Tree) -> {integer() >= 0, integer() >= 0} ###
Types:
B-Tree = b_tree()
Returns the number of total nodes and the number of leaf nodes in b-tree
B-Tree as a tuple of two integers. Returns {0, 0} (zero) if b-tree B-Tree
is empty.
### smallest (B-Tree) -> {Key, Value} ###
Types:
B-Tree = b_tree()
Key = any()
Value = any()
Returns tuple {Key, Value}, where Key is the smallest key in b-tree
B-Tree, and Value is the value associated with this key. Assumes that
b-tree B-Tree is not empty.
### sort\_ascending (Key1, Key2) -> 'equal' | 'greater' | 'less' ###
Types:
Key1 = Key2 = any()
equal = greater = less = atom()
Returns the atom '**greater**' if Key1 > Key2, the atom '**less**'
if Key1 < Key2 and otherwise the atom '**equal**'.
### sort\_descending (Key1, Key2) -> 'equal' | 'greater' | 'less' ###
Types:
Key1 = Key2 = any()
equal = greater = less = atom()
Returns the atom '**less**' if Key1 > Key2, the atom '**greater**'
if Key1 < Key2 and otherwise the atom '**equal**'.
### take(Key, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 and returns the new
b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1,
crashes otherwise.
### delete\_any (Key, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 if key Key is present
in b-tree B-Tree1, otherwise does nothing. Returns the new b-tree B-Tree2.
### take\_largest (B-Tree1) -> {Key, Value, B-Tree2} ###
Types:
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value = any()
Returns tuple {Key, Value, B-Tree2}, where Key is the largest key in
b-tree B-Tree1, Value is the value associated with this key, and b-tree
B-Tree2 is this b-tree with the corresponding key value pair deleted.
Assumes that b-tree B-Tree1 is not empty.
### take\_smallest (B-Tree1) -> {Key, Value, B-Tree2} ###
Types:
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value = any()
Returns tuple {Key, Value, B-Tree2}, where Key is the smallest key in
b-tree B-Tree1, Value is the value associated with this key, and b-tree
B-Tree2 is this b-tree with the corresponding key value pair deleted.
Assumes that b-tree B-Tree1 is not empty.
### to\_list (B-Tree) -> [{Key, Value}] ###
Types:
B-Tree = b_tree()
Key = any()
Value = any()
Converts b-tree B-Tree into an ordered list of key value tuples.
### update (Key, Value, B-Tree1) -> B-Tree2 ###
Types:
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
Updates key Key to value Value in b-tree B-Tree1 and returns the new
b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1.
### values (B-Tree) -> [Value] ###
Types:
B-Tree = b_tree()
Value = any()
Returns the values in b-tree B-Tree as an ordered list, sorted by their
corresponding keys. Duplicates are not removed.
## Pluggable Persistence Functionality ##
### Format: ###
{StateTarget, DeleteFunction, InsertFunction, LookupFunction}
StateTarget = any()
DeleteFunction(StateTarget, delete, Key) -> true
InsertFunction(StateTarget, insert, Subtrees) -> Key
LookupFunction(StateTarget, lookup, Key) -> Subtrees
Examples for state targets are a Dets table or a Mnesia table. The delete
function takes a state target, the atom `delete` and a key as arguments
and returns the atom `true` if successful. The insert function takes a
state target, the atom `insert` and a subtrees data structure as arguments
and returns a key if successful. The lookup function takes a state target,
the atom `lookup` and a key as arguments and returns a subtrees data
structure if successful.
### Example functions: ###
The following examples are based on Mnesia.
persistence_by_mnesia(_, delete, SubtreesKey)
when is_list(SubtreesKey) ->
true;
persistence_by_mnesia(StateTarget, delete, SubtreesKey) ->
F = fun() ->
ok = mnesia:delete({StateTarget, SubtreesKey}),
true
end,
mnesia:activity(transaction, F);
persistence_by_mnesia(_, insert, []) ->
[];
persistence_by_mnesia(StateTarget, insert,
[{_, _, [{Key, _} | _], _} | _] = Subtrees) ->
SubtreesKey = list_to_binary(Key),
F = fun() ->
ok = mnesia:write(StateTarget,
#subtrees{subtreesKey = SubtreesKey,
subtrees = Subtrees}, write),
SubtreesKey
end,
mnesia:activity(transaction, F);
persistence_by_mnesia(_, lookup, SubtreesKey)
when is_list(SubtreesKey) ->
SubtreesKey;
persistence_by_mnesia(StateTarget, lookup, SubtreesKey) ->
F = fun() ->
[{subtrees, SubtreesKey, Subtrees}] = mnesia:read(StateTarget,
SubtreesKey),
Subtrees
end,
mnesia:activity(transaction, F).
### Example usage: ###
Creating the Mnesia table:
-record(subtrees, {subtreesKey, subtrees}).
{atomic, ok} = mnesia:create_table(StateTargetName, [{record_name,
subtrees}]),
Creating the b-tree:
BTree1 = b_trees:empty(500),
BTree2 = b_trees:set_parameter(BTree1, state,
{StateTargetName,
fun persistence_by_mnesia/3,
fun persistence_by_mnesia/3,
fun persistence_by_mnesia/3}),
## Pluggable Sort Functionality ##
### Format: ###
FunctionName(Key1, Key2) -> equal | greater | less
Key1 = Key2 = any()
The sort function takes two keys as arguments and returns the atom `less`
if Key1 < Key2, the atom `greater` if Key1 > Key2 and otherwise the
atom `equal`.
### Example function: ###
-spec sort_descending(key(), key()) -> sort_result().
sort_descending(Key_1, Key_2) ->
if
Key_1 < Key_2 -> greater;
Key_1 > Key_2 -> less;
true -> equal
end.
### Example usage: ###
BTree1 = b_trees:empty(500),
BTree2 = b_trees:set_parameter(BTree1, sort, fun sort_descending/2),
Motivation
==========
B-trees are self-balancing tree data structures that keep data sorted
and allow searches, sequential access, insertions, and deletions in
logarithmic time. B-trees are a generalization of a binary search
trees in that a node can have more than two children. Unlike self-balancing
binary search trees, the b-tree is optimized for systems that read and
write large blocks of data. B-trees are a good example of a data structure
for external memory.
Rationale
=========
The functional design of the module b\_trees is based on the module gb\_trees:
b_trees | gb_trees
------------------|---------
n/a | balance/1
copy/2 | n/a
delete/2 | delete/2
delete_any/2 | delete_any/2
empty/1 | empty/0
enter/3 | enter/3
from_dict/2 | from_orddict/1
get/2 | get/2
height/1 | n/a
insert/3 | insert/3
is_defined/2 | is_defined/2
is_empty/1 | is_empty/1
iterator/1 | iterator/1
iterator_from/2 | iterator_from/2
keys/1 | keys/1
largest/1 | largest/1
lookup/2 | lookup/2
map/2 | map/2
next/1 | next/1
set_parameter/3 | n/a
size_key_values/1| size/1
size_nodes/1 | n/a
smallest/1 | smallest/1
sort_ascending/2 | n/a
sort_descending/2| n/a
take/2 | take/2
take_any/2 | take_any/2
take_largest/1 | take_largest/1
take_smallest/1 | take_smallest/1
to_list/1 | to_list/1
update/3 | update/3
values/1 | values/1
The functions `delete/2` and `insert/3` are implementations of the algorithms
of Cormen, Thomas; Leiserson, Charles; Rivest, Ronald; Stein, Clifford (2009),
Introduction to Algorithms (Third ed.), MIT Press and McGraw-Hill, pp. 484-504,
ISBN 0-262-03384-4. Chapter 18: B-Trees.
Backwards Compatibility
=======================
No issues - except module name collisions.
Reference Implementation
========================
The reference implementation can be fetched from Github:
https://github.com/walter-weinmann/b_trees
[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"
[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "