gb_trees:search/3

Sven-Olof Nystr|m svenolof@REDACTED
Wed Sep 14 13:03:02 CEST 2005


Ulf Wiger (AL/EAB) writes:
 > 
 > Sven-Olof Nyström wrote:
 > > 
 > > Ulf Wiger (AL/EAB) writes:
 > >  > The first example doesn't really introduce
 > >  > anything new, as it scans the whole tree
 > >  > (most likely less efficient than iterator())
 > > 
 > > or to_list().
 > 
 > Yes, well, I've grown a bit allergic to functions
 > that return the whole contents of a data structure
 > designed to be scalable. Given how the garbage 
 > collector works, you should only use them if you
 > are sure that the number of objects is relatively
 > small; and then you might as well use orddict or
 > similar with very little performance penalty. (:
 > 
 > That's not to say that to_list() isn't useful.

See your next mail :-) 

In general, scanning the tree using to_list is about as fast as using
iterator, as long as you need to look at all nodes (iterator will also
allocate stuff on the heap). If you only look at a small number of
nodes, the situation is different, of course.

 > 
 > > So you scan the part of a tree where keys are smaller than 5.

 > With ets:select(), you can do much more than that.
 > A fairly common type of pattern could be to 
 > have objects with a structured key, e.g.:
 > 
 > {{DocId, [Chapter, SubChapter, Table, Row]}, Value}
 > (fictitious example)
 > 
 > and with ordered_set ets, you can quite easily
 > then extract all rows in a given table in a given
 > sub-chapter, ... etc in this fashion
 > 
 > ets:select(
 >   docs, 
 >   [{{DocId, [Chapter,SubChapter,Table,'$1']},'$2'},
 >    [],
 >    [{{'$1', '$2'}}]}])
 > 
 > The syntax is horrible, but that's partly because 
 > you can't really copy each object to the process
 > heap just to determine whether it's a match, and 
 > you don't have a context within ets for evaluating
 > a fun. gb_trees can do much better, IMO.

In the documentation, it is claimed that ets tables provide constant
time access. Obviously, this does not hold if you use select() to look
for tuples where, say, the third element + the fourth is less than
4711.

Gb-trees (or other standard datastructures like for example
hash-tables) only allow efficient access using one key. It is possible
to have efficent search on several keys if one maintains one gb-tree
for each key. 

 > 
 > You could read up on the match specification
 > syntax in the ERTS user's guide
 > (http://erlang.se/doc/doc-5.4.8/erts-5.4.8/doc/html/match_spec.html#1)
 > but I think that a functional data structure
 > should just disregard match specifications
 > and allow the user to provide a fun instead.

I'm a little bit curious about the efficency of select(). In the
general case, select must be implemented using a search through the
entire table. It seems reasonable to assume that select() may
sometimes use a more efficient search. (If you accept the cost of
linear search, implementing select over a gb-tree is easy, of course.)


 > > Another way to do partial scans might be a function 
 > > 
 > > to_list(Key1, Key2) 
 > > 
 > > that returns a list of the entries in the tree between
 > > the two keys.
 > 
 > Nice, but decidedly less flexible than the ets:select()
 > function.
 > 
 > Perhaps it's just me, but I would like to see some 
 > functional data structures offer some of the power
 > of ets, and perhaps manage to lure some people 
 > away from ets. For one thing, using a functional 
 > data structure, you get "undo" for free.


It should be possible to build a general ets-like data structure on
top of gb-trees. One tricky thing when designing such a datastructure
is specifying the set of operations that need to be
efficient. 

Personally, I have found that building specialized data structures
using several gb-trees works fine, but perhaps people who have grown
accustomed to using ets-tables see things differently.



Sven-Olof








More information about the erlang-questions mailing list