Erlang -- qlc

qlc

Query Interface to Mnesia, ETS, Dets, etc

The qlc module provides a query interface to Mnesia, ETS, Dets and other data structures that implement an iterator style traversal of objects.

The qlc module implements a query interface to QLC tables. Typical QLC tables are ETS, Dets, and Mnesia tables. There is also support for user defined tables, see the Implementing a QLC table section. A query is stated using Query List Comprehensions (QLCs). The answers to a query are determined by data in QLC tables that fulfill the constraints expressed by the QLCs of the query. QLCs are similar to ordinary list comprehensions as described in the Erlang Reference Manual and Programming Examples except that variables introduced in patterns cannot be used in list expressions. In fact, in the absence of optimizations and options such as cache and unique (see below), every QLC free of QLC tables evaluates to the same list of answers as the identical ordinary list comprehension.

While ordinary list comprehensions evaluate to lists, calling qlc:q/1,2 returns a Query Handle. To obtain all the answers to a query, qlc:eval/1,2 should be called with the query handle as first argument. Query handles are essentially functional objects ("funs") created in the module calling q/1,2. As the funs refer to the module's code, one should be careful not to keep query handles too long if the module's code is to be replaced. Code replacement is described in the Erlang Reference Manual. The list of answers can also be traversed in chunks by use of a Query Cursor. Query cursors are created by calling qlc:cursor/1,2 with a query handle as first argument. Query cursors are essentially Erlang processes. One answer at a time is sent from the query cursor process to the process that created the cursor.

Syntactically QLCs have the same parts as ordinary list comprehensions:

[Expression || Qualifier1, Qualifier2, ...]

Expression (the template) is an arbitrary Erlang expression. Qualifiers are either filters or generators. Filters are Erlang expressions returning bool(). Generators have the form Pattern <- ListExpression, where ListExpression is an expression evaluating to a query handle or a list. Query handles are returned from qlc:table/2, qlc:append/1,2, qlc:sort/1,2, qlc:keysort/2,3, qlc:q/1,2, and qlc:string_to_handle/1,2,3.

The evaluation of a query handle begins by the inspection of options and the collection of information about tables. As a result qualifiers are modified during the optimization phase. Next all list expressions are evaluated. If a cursor has been created evaluation takes place in the cursor process. For those list expressions that are QLCs, the list expressions of the QLCs' generators are evaluated as well. One has to be careful if list expressions have side effects since the order in which list expressions are evaluated is unspecified. Finally the answers are found by evaluating the qualifiers from left to right, backtracking when some filter returns false, or collecting the template when all filters return true.

Filters that do not return bool() but fail are handled differently depending on their syntax: if the filter is a guard it returns false, otherwise the query evaluation fails. This behavior makes it possible for the qlc module to do some optimizations without affecting the meaning of a query. For example, when testing some position of a table and one or more constants for equality, only the objects with equal values are candidates for further evaluation. The other objects are guaranteed to make the filter return false, but never fail. The (small) set of candidate objects can often be found by looking up some key values of the table or by traversing the table using a match specification. It is necessary to place the guard filters immediately after the table's generator, otherwise the candidate objects will not be restricted to a small set. The reason is that objects that could make the query evaluation fail must not be excluded by looking up a key or running a match specification.

The qlc module supports fast join of two query handles. Fast join is possible if some position P1 of one query handler and some position P2 of another query handler are tested for equality. Two fast join methods have been implemented:

Lookup join traverses all objects of one query handle and finds objects of the other handle (a QLC table) such that the values at P1 and P2 match or compare equal. The qlc module does not create any indices but looks up values using the key position and the indexed positions of the QLC table.
Merge join sorts the objects of each query handle if necessary and filters out objects where the values at P1 and P2 do not compare equal. If there are many objects with the same value of P2 a temporary file will be used for the equivalence classes.

The qlc module warns at compile time if a QLC combines query handles in such a way that more than one join is possible. In other words, there is no query planner that can choose a good order between possible join operations. It is up to the user to order the joins by introducing query handles.

The join is to be expressed as a guard filter. The filter must be placed immediately after the two joined generators, possibly after guard filters that use variables from no other generators but the two joined generators. The qlc module inspects the operands of =:=/2, ==/2, is_record/2, element/2, and logical operators (and/2, or/2, andalso/2, orelse/2, xor/2) when determining which joins to consider.

The following options are accepted by cursor/2, eval/2, fold/4, and info/2:

{cache_all, Cache} where Cache is equal to ets or list adds a {cache, Cache} option to every list expression of the query except tables and lists. Default is {cache_all, no}. The option cache_all is equivalent to {cache_all, ets}.
{max_list_size, MaxListSize} where MaxListSize is the size in bytes of terms on the external format. If the accumulated size of collected objects exceeds MaxListSize the objects are written onto a temporary file. This option is used by the {cache, list} option as well as by the merge join method. Default is 512*1024 bytes.
{tmpdir_usage, TmpFileUsage} determines the action taken when qlc is about to create temporary files on the directory set by the tmpdir option. If the value is not_allowed an error tuple is returned, otherwise temporary files are created as needed. Default is allowed which means that no further action is taken. The values info_msg, warning_msg, and error_msg mean that the function with the corresponding name in the module error_logger is called for printing some information (currently the stacktrace).
{tmpdir, TempDirectory} sets the directory used by merge join for temporary files and by the {cache, list} option. The option also overrides the tmpdir option of keysort/3 and sort/2. The default value is "" which means that the directory returned by file:get_cwd() is used.
{unique_all, true} adds a {unique, true} option to every list expression of the query. Default is {unique_all, false}. The option unique_all is equivalent to {unique_all, true}.

QueryCursor = {qlc_cursor, term()}
QueryHandle = {qlc_handle, term()}
QueryHandleOrList = QueryHandle | list()
Answers = [Answer]
Answer = term()
AbstractExpression = - parse trees for Erlang expressions, see the abstract format documentation in the ERTS User's Guide -
MatchExpression = - match specifications, see the match specification documentation in the ERTS User's Guide and ms_transform(3) -
SpawnOptions = default | spawn_options()
SortOptions = [SortOption] | SortOption
SortOption = {compressed, bool()} | {no_files, NoFiles} | {order, Order} | {size, Size} | {tmpdir, TempDirectory} | {unique, bool()} - see file_sorter(3) -
Order = ascending | descending | OrderFun
OrderFun = fun(term(), term()) -> bool()
TempDirectory = "" | filename()
Size = int() > 0
NoFiles = int() > 1
KeyPos = int() > 0 | [int() > 0]
MaxListSize = int() >= 0
bool() = true | false
Cache = ets | list | no
TmpFileUsage = allowed | not_allowed | info_msg | warning_msg | error_msg
filename() = - see filename(3) -
spawn_options() = - see erlang(3) -

As already mentioned queries are stated in the list comprehension syntax as described in the Erlang Reference Manual. In the following some familiarity with list comprehensions is assumed. There are examples in Programming Examples that can get you started. It should be stressed that list comprehensions do not add any computational power to the language; anything that can be done with list comprehensions can also be done without them. But they add a syntax for expressing simple search problems which is compact and clear once you get used to it.

Many list comprehension expressions can be evaluated by the qlc module. Exceptions are expressions such that variables introduced in patterns (or filters) are used in some generator later in the list comprehension. As an example consider an implementation of lists:append(L): [X ||Y <- L, X <- Y]. Y is introduced in the first generator and used in the second. The ordinary list comprehension is normally to be preferred when there is a choice as to which to use. One difference is that qlc:eval/1,2 collects answers in a list which is finally reversed, while list comprehensions collect answers on the stack which is finally unwound.

What the qlc module primarily adds to list comprehensions is that data can be read from QLC tables in small chunks. A QLC table is created by calling qlc:table/2. Usually qlc:table/2 is not called directly from the query but via an interface function of some data structure. There are a few examples of such functions in Erlang/OTP: mnesia:table/1,2, ets:table/1,2, and dets:table/1,2. For a given data structure there can be several functions that create QLC tables, but common for all these functions is that they return a query handle created by qlc:table/2. Using the QLC tables provided by OTP is probably sufficient in most cases, but for the more advanced user the section Implementing a QLC table describes the implementation of a function calling qlc:table/2.

Besides qlc:table/2 there are other functions that return query handles. They might not be used as often as tables, but are useful from time to time. qlc:append traverses objects from several tables or lists after each other. If, for instance, you want to traverse all answers to a query QH and then finish off by a term {finished}, you can do that by calling qlc:append(QH, [{finished}]). append first returns all objects of QH, then {finished}. If there is one tuple {finished} among the answers to QH it will be returned twice from append.

As another example, consider concatenating the answers to two queries QH1 and QH2 while removing all duplicates. The means to accomplish this is to use the unique option:

qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true})

The cost is substantial: every returned answer will be stored in an ETS table. Before returning an answer it is looked up in the ETS table to check if it has already been returned. Without the unique options all answers to QH1 would be returned followed by all answers to QH2. The unique options keeps the order between the remaining answers.

If the order of the answers is not important there is the alternative to sort the answers uniquely:

qlc:sort(qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true})).

This query also removes duplicates but the answers will be sorted. If there are many answers temporary files will be used. Note that in order to get the first unique answer all answers have to be found and sorted. Both alternatives find duplicates by comparing answers, that is, if A1 and A2 are answers found in that order, then A2 is a removed if A1 == A2.

To return just a few answers cursors can be used. The following code returns no more than five answers using an ETS table for storing the unique answers:

C = qlc:cursor(qlc:q([X || X <- qlc:append(QH1, QH2)],{unique,true})),
R = qlc:next_answers(C, 5),
ok = qlc:delete_cursor(C),
R.

Query list comprehensions are convenient for stating constraints on data from two or more tables. An example that does a natural join on two query handles on position 2:

qlc:q([{X1,X2,X3,Y1} || 
          {X1,X2,X3} <- QH1, 
          {Y1,Y2} <- QH2, 
          X2 =:= Y2])

The qlc module will evaluate this differently depending on the query handles QH1 and QH2. If, for example, X2 is matched against the key of a QLC table the lookup join method will traverse the objects of QH2 while looking up key values in the table. On the other hand, if neither X2 nor Y2 is matched against the key or an indexed position of a QLC table, the merge join method will make sure that QH1 and QH2 are both sorted on position 2 and next do the join by traversing the objects one by one.

The join option can be used to force the qlc module to use a certain join method. For the rest of this section it is assumed that the excessively slow join method called "nested loop" has been chosen:

qlc:q([{X1,X2,X3,Y1} || 
          {X1,X2,X3} <- QH1, 
          {Y1,Y2} <- QH2, 
          X2 =:= Y2],
      {join, nested_loop})

In this case the filter will be applied to every possible pair of answers to QH1 and QH2, one at a time. If there are M answers to QH1 and N answers to QH2 the filter will be run M*N times.

If QH2 is a call to the function for gb_trees as defined in the Implementing a QLC table section, gb_table:table/1, the iterator for the gb-tree will be initiated for each answer to QH1 after which the objects of the gb-tree will be returned one by one. This is probably the most efficient way of traversing the table in that case since it takes minimal computational power to get the following object. But if QH2 is not a table but a more complicated QLC, it can be more efficient use some RAM memory for collecting the answers in a cache, particularly if there are only a few answers. It must then be assumed that evaluating QH2 has no side effects so that the meaning of the query does not change if QH2 is evaluated only once. One way of caching the answers is to evaluate QH2 first of all and substitute the list of answers for QH2 in the query. Another way is to use the cache option. It is stated like this:

QH2' = qlc:q([X || X <- QH2], {cache, ets})

or just

QH2' = qlc:q([X || X <- QH2], cache)

The effect of the cache option is that when the generator QH2' is run the first time every answer is stored in an ETS table. When next answer of QH1 is tried, answers to QH2' are copied from the ETS table which is very fast. As for the unique option the cost is a possibly substantial amount of RAM memory. The {cache, list} option offers the possibility to store the answers in a list on the process heap. While this has the potential of being faster than ETS tables since there is no need to copy answers from the table it can often result in slower evaluation due to more garbage collections of the process' heap as well as increased RAM memory consumption due to larger heaps. Another drawback with cache lists is that if the size of the list exceeds a limit a temporary file will be used. Reading the answers from a file is very much slower than copying them from an ETS table. But if the available RAM memory is scarce setting the limit to some low value is an alternative.

There is an option cache_all that can be set to ets or list when evaluating a query. It adds a cache or {cache, list} option to every list expression except QLC tables and lists on all levels of the query. This can be used for testing if caching would improve efficiency at all. If the answer is yes further testing is needed to pinpoint the generators that should be cached.

As an example of how to use the qlc:table/2 function the implementation of a QLC table for the gb_trees module is given:

-module(gb_table).

-export([table/1]).

table(T) ->
    TF = fun() -> qlc_next(gb_trees:next(gb_trees:iterator(T))) end,
    InfoFun = fun(num_of_objects) -> gb_trees:size(T);
                 (keypos) -> 1;
                 (is_sorted_key) -> true;
                 (is_unique_objects) -> true;
                 (_) -> undefined
              end,
    LookupFun =
        fun(1, Ks) ->
                lists:flatmap(fun(K) ->
                                      case gb_trees:lookup(K, T) of
                                          {value, V} -> [{K,V}];
                                          none -> []
                                      end
                              end, Ks)
        end,
    FormatFun =
        fun({all, NElements, ElementFun}) ->
                ValsS = io_lib:format("gb_trees:from_orddict(~w)",
                                      [gb_nodes(T, NElements, ElementFun)]),
                io_lib:format("gb_table:table(~s)", [ValsS]);
           ({lookup, 1, KeyValues, _NElements, ElementFun}) ->
                ValsS = io_lib:format("gb_trees:from_orddict(~w)",
                                      [gb_nodes(T, infinity, ElementFun)]),
                io_lib:format("lists:flatmap(fun(K) -> "
                              "case gb_trees:lookup(K, ~s) of "
                              "{value, V} -> [{K,V}];none -> [] end "
                              "end, ~w)",
                              [ValsS, [ElementFun(KV) || KV <- KeyValues]])
        end,
    qlc:table(TF, [{info_fun, InfoFun}, {format_fun, FormatFun},
                   {lookup_fun, LookupFun},{key_equality,'=='}]).

qlc_next({X, V, S}) ->
    [{X,V} | fun() -> qlc_next(gb_trees:next(S)) end];
qlc_next(none) ->
    [].

gb_nodes(T, infinity, ElementFun) ->
    gb_nodes(T, -1, ElementFun);
gb_nodes(T, NElements, ElementFun) ->
    gb_iter(gb_trees:iterator(T), NElements, ElementFun).

gb_iter(_I, 0, _EFun) ->
    '...';
gb_iter(I0, N, EFun) ->
    case gb_trees:next(I0) of
        {X, V, I} ->
            [EFun({X,V}) | gb_iter(I, N-1, EFun)];
        none ->
            []
    end.

TF is the traversal function. The qlc module requires that there is a way of traversing all objects of the data structure; in gb_trees there is an iterator function suitable for that purpose. Note that for each object returned a new fun is created. As long as the list is not terminated by [] it is assumed that the tail of the list is a nullary function and that calling the function returns further objects (and functions).

The lookup function is optional. It is assumed that the lookup function always finds values much faster than it would take to traverse the table. The first argument is the position of the key. Since qlc_next returns the objects as {Key, Value} pairs the position is 1. Note that the lookup function should return {Key, Value} pairs, just as the traversal function does.

The format function is also optional. It is called by qlc:info to give feedback at runtime of how the query will be evaluated. One should try to give as good feedback as possible without showing too much details. In the example at most 7 objects of the table are shown. The format function handles two cases: all means that all objects of the table will be traversed; {lookup, 1, KeyValues} means that the lookup function will be used for looking up key values.

Whether the whole table will be traversed or just some keys looked up depends on how the query is stated. If the query has the form

qlc:q([T || P <- LE, F])

and P is a tuple, the qlc module analyzes P and F in compile time to find positions of the tuple P that are tested for equality to constants. If such a position at runtime turns out to be the key position, the lookup function can be used, otherwise all objects of the table have to be traversed. It is the info function InfoFun that returns the key position. There can be indexed positions as well, also returned by the info function. An index is an extra table that makes lookup on some position fast. Mnesia maintains indices upon request, thereby introducing so called secondary keys. The qlc module prefers to look up objects using the key before secondary keys regardless of the number of constants to look up.

In Erlang there are two operators for testing term equality, namely ==/2 and =:=/2. The difference between them is all about the integers that can be represented by floats. For instance, 2 == 2.0 evaluates to true while 2 =:= 2.0 evaluates to false. Normally this is a minor issue, but the qlc module cannot ignore the difference, which affects the user's choice of operators in QLCs.

If the qlc module can find out at compile time that some constant is free of integers, it does not matter which one of ==/2 or =:=/2 is used:

1> E1 = ets:new(t, [set]), % uses =:=/2 for key equality
Q1 = qlc:q([K ||
{K} <- ets:table(E1),
K == 2.71 orelse K == a]),
io:format("~s~n", [qlc:info(Q1)]).
ets:match_spec_run(lists:flatmap(fun(V) ->
                                        ets:lookup(20493, V)
                                 end,
                                 [a,2.71]),
                   ets:match_spec_compile([{{'$1'},[],['$1']}]))

In the example the ==/2 operator has been handled exactly as =:=/2 would have been handled. On the other hand, if it cannot be determined at compile time that some constant is free of integers and the table uses =:=/2 when comparing keys for equality (see the option key_equality), the qlc module will not try to look up the constant. The reason is that there is in the general case no upper limit on the number of key values that can compare equal to such a constant; every combination of integers and floats has to be looked up:

2> E2 = ets:new(t, [set]),
true = ets:insert(E2, [{{2,2},a},{{2,2.0},b},{{2.0,2},c}]),
F2 = fun(I) ->
qlc:q([V || {K,V} <- ets:table(E2), K == I])
end,
Q2 = F2({2,2}),
io:format("~s~n", [qlc:info(Q2)]).
ets:table(53264,
          [{traverse,
            {select,[{{'$1','$2'},[{'==','$1',{const,{2,2}}}],['$2']}]}}])
3> lists:sort(qlc:e(Q2)).
[a,b,c]

Looking up just {2,2} would not return b and c.

If the table uses ==/2 when comparing keys for equality, the qlc module will look up the constant regardless of which operator is used in the QLC. However, ==/2 is to be preferred:

4> E3 = ets:new(t, [ordered_set]), % uses ==/2 for key equality
true = ets:insert(E3, [{{2,2.0},b}]),
F3 = fun(I) ->
qlc:q([V || {K,V} <- ets:table(E3), K == I])
end,
Q3 = F3({2,2}),
io:format("~s~n", [qlc:info(Q3)]).
ets:match_spec_run(ets:lookup(86033, {2,2}),
                   ets:match_spec_compile([{{'$1','$2'},[],['$2']}]))
5> qlc:e(Q3).
[b]

Lookup join is handled analogously to lookup of constants in a table: if the join operator is ==/2 and the table where constants are to be looked up uses =:=/2 when testing keys for equality, the qlc module will not consider lookup join for that table.

Types:

QHL = [QueryHandleOrList]

QH = QueryHandle

Returns a query handle. When evaluating the query handle QH all answers to the first query handle in QHL is returned followed by all answers to the rest of the query handles in QHL.

Types:

QH1 = QH2 = QueryHandleOrList

QH3 = QueryHandle

Returns a query handle. When evaluating the query handle QH3 all answers to QH1 are returned followed by all answers to QH2.

append(QH1, QH2) is equivalent to append([QH1, QH2]).

Types:

Options = [Option] | Option

Option = {cache_all, Cache} | cache_all | {max_list_size, MaxListSize} | {spawn_options, SpawnOptions} | {tmpdir_usage, TmpFileUsage} | {tmpdir, TempDirectory} | {unique_all, bool()} | unique_all

Creates a query cursor and makes the calling process the owner of the cursor. The cursor is to be used as argument to next_answers/1,2 and (eventually) delete_cursor/1. Calls erlang:spawn_opt to spawn and link a process which will evaluate the query handle. The value of the option spawn_options is used as last argument when calling spawn_opt. The default value is [link].

1> QH = qlc:q([{X,Y} || X <- [a,b], Y <- [1,2]]),
QC = qlc:cursor(QH),
qlc:next_answers(QC, 1).
[{a,1}]
2> qlc:next_answers(QC, 1).
[{a,2}]
3> qlc:next_answers(QC, all_remaining).
[{b,1},{b,2}]
4> qlc:delete_cursor(QC).
ok

Deletes a query cursor. Only the owner of the cursor can delete the cursor.

Types:

Options = [Option] | Option

Option = {cache_all, Cache} | cache_all | {max_list_size, MaxListSize} | {tmpdir_usage, TmpFileUsage} | {tmpdir, TempDirectory} | {unique_all, bool()} | unique_all

Error = {error, module(), Reason}

Reason = - as returned by file_sorter(3) -

Evaluates a query handle in the calling process and collects all answers in a list.

1> QH = qlc:q([{X,Y} || X <- [a,b], Y <- [1,2]]),
qlc:eval(QH).
[{a,1},{a,2},{b,1},{b,2}]

Types:

Function = fun(Answer, AccIn) -> AccOut

Acc0 = Acc1 = AccIn = AccOut = term()

Options = [Option] | Option

Option = {cache_all, Cache} | cache_all | {max_list_size, MaxListSize} | {tmpdir_usage, TmpFileUsage} | {tmpdir, TempDirectory} | {unique_all, bool()} | unique_all

Error = {error, module(), Reason}

Reason = - as returned by file_sorter(3) -

Calls Function on successive answers to the query handle together with an extra argument AccIn. The query handle and the function are evaluated in the calling process. Function must return a new accumulator which is passed to the next call. Acc0 is returned if there are no answers to the query handle.

1> QH = [1,2,3,4,5,6],
qlc:fold(fun(X, Sum) -> X + Sum end, 0, QH).
21

Types:

Error = {error, module(), term()}

Chars = [char() | Chars]

Returns a descriptive string in English of an error tuple returned by some of the functions of the qlc module or the parse transform. This function is mainly used by the compiler invoking the parse transform.

Types:

Options = [Option] | Option

Option = EvalOption | ReturnOption

EvalOption = {cache_all, Cache} | cache_all | {max_list_size, MaxListSize} | {tmpdir_usage, TmpFileUsage} | {tmpdir, TempDirectory} | {unique_all, bool()} | unique_all

ReturnOption = {depth, Depth} | {flat, bool()} | {format, Format} | {n_elements, NElements}

Depth = infinity | int() >= 0

Format = abstract_code | string

NElements = infinity | int() > 0

Info = AbstractExpression | string()

Returns information about a query handle. The information describes the simplifications and optimizations that are the results of preparing the query for evaluation. This function is probably useful mostly during debugging.

The information has the form of an Erlang expression where QLCs most likely occur. Depending on the format functions of mentioned QLC tables it may not be absolutely accurate.

The default is to return a sequence of QLCs in a block, but if the option {flat, false} is given, one single QLC is returned. The default is to return a string, but if the option {format, abstract_code} is given, abstract code is returned instead. In the abstract code port identifiers, references, and pids are represented by strings. The default is to return all elements in lists, but if the {n_elements, NElements} option is given, only a limited number of elements are returned. The default is to show all of objects and match specifications, but if the {depth, Depth} option is given, parts of terms below a certain depth are replaced by '...'.

1> QH = qlc:q([{X,Y} || X <- [x,y], Y <- [a,b]]),
io:format("~s~n", [qlc:info(QH, unique_all)]).
begin
    V1 =
        qlc:q([ 
               SQV ||
                   SQV <- [x,y]
              ],
              [{unique,true}]),
    V2 =
        qlc:q([ 
               SQV ||
                   SQV <- [a,b]
              ],
              [{unique,true}]),
    qlc:q([ 
           {X,Y} ||
               X <- V1,
               Y <- V2
          ],
          [{unique,true}])
end

In this example two simple QLCs have been inserted just to hold the {unique, true} option.

1> E1 = ets:new(e1, []),
E2 = ets:new(e2, []),
true = ets:insert(E1, [{1,a},{2,b}]),
true = ets:insert(E2, [{a,1},{b,2}]),
Q = qlc:q([{X,Z,W} ||
{X, Z} <- ets:table(E1),
{W, Y} <- ets:table(E2),
X =:= Y]),
io:format("~s~n", [qlc:info(Q)]).
begin
    V1 =
        qlc:q([ 
               P0 ||
                   P0 = {W,Y} <- ets:table(17)
              ]),
    V2 =
        qlc:q([ 
               [G1|G2] ||
                   G2 <- V1,
                   G1 <- ets:table(16),
                   element(2, G1) =:= element(1, G2)
              ],
              [{join,lookup}]),
    qlc:q([ 
           {X,Z,W} ||
               [{X,Z}|{W,Y}] <- V2
          ])
end

In this example the query list comprehension V2 has been inserted to show the joined generators and the join method chosen. A convention is used for lookup join: the first generator (G2) is the one traversed, the second one (G1) is the table where constants are looked up.

Types:

QH1 = QueryHandleOrList

QH2 = QueryHandle

Returns a query handle. When evaluating the query handle QH2 the answers to the query handle QH1 are sorted by file_sorter:keysort/4 according to the options.

The sorter will use temporary files only if QH1 does not evaluate to a list and the size of the binary representation of the answers exceeds Size bytes, where Size is the value of the size option.

Types:

NumberOfAnswers = all_remaining | int() > 0

Error = {error, module(), Reason}

Reason = - as returned by file_sorter(3) -

Returns some or all of the remaining answers to a query cursor. Only the owner of Cursor can retrieve answers.

The optional argument NumberOfAnswersdetermines the maximum number of answers returned. The default value is 10. If less than the requested number of answers is returned, subsequent calls to next_answers will return [].

Types:

QueryListComprehension = - literal query listcomprehension -

Options = [Option] | Option

Option = {max_lookup, MaxLookup} | {cache, Cache} | cache | {join, Join} | {lookup, Lookup} | {unique, bool()} | unique

MaxLookup = int() >= 0 | infinity

Join = any | lookup | merge | nested_loop

Lookup = bool() | any

Returns a query handle for a query list comprehension. The query list comprehension must be the first argument to qlc:q/1,2 or it will be evaluated as an ordinary list comprehension. It is also necessary to add the line

-include_lib("stdlib/include/qlc.hrl").

to the source file. This causes a parse transform to substitute a fun for the query list comprehension. The (compiled) fun will be called when the query handle is evaluated.

When calling qlc:q/1,2 from the Erlang shell the parse transform is automatically called. When this happens the fun substituted for the query list comprehension is not compiled but will be evaluated by erl_eval(3). This is also true when expressions are evaluated by means of file:eval/1,2 or in the debugger.

To be very explicit, this will not work:

...
A = [X || {X} <- [{1},{2}]], 
QH = qlc:q(A),
...

The variable A will be bound to the evaluated value of the list comprehension ([1,2]). The compiler complains with an error message ("argument is not a query list comprehension"); the shell process stops with a badarg reason.

The {cache, ets} option can be used to cache the answers to a query list comprehension. The answers are stored in one ETS table for each cached query list comprehension. When a cached query list comprehension is evaluated again, answers are fetched from the table without any further computations. As a consequence, when all answers to a cached query list comprehension have been found, the ETS tables used for caching answers to the query list comprehension's qualifiers can be emptied. The option cache is equivalent to {cache, ets}.

The {cache, list} option can be used to cache the answers to a query list comprehension just like {cache, ets}. The difference is that the answers are kept in a list (on the process heap). If the answers would occupy more than a certain amount of RAM memory a temporary file is used for storing the answers. The option max_list_size sets the limit in bytes and the temporary file is put on the directory set by the tmpdir option.

The cache option has no effect if it is known that the query list comprehension will be evaluated at most once. This is always true for the top-most query list comprehension and also for the list expression of the first generator in a list of qualifiers. Note that in the presence of side effects in filters or callback functions the answers to query list comprehensions can be affected by the cache option.

The {unique, true} option can be used to remove duplicate answers to a query list comprehension. The unique answers are stored in one ETS table for each query list comprehension. The table is emptied every time it is known that there are no more answers to the query list comprehension. The option unique is equivalent to {unique, true}. If the unique option is combined with the {cache, ets} option, two ETS tables are used, but the full answers are stored in one table only. If the unique option is combined with the {cache, list} option the answers are sorted twice using keysort/3; once to remove duplicates, and once to restore the order.

The cache and unique options apply not only to the query list comprehension itself but also to the results of looking up constants, running match specifications, and joining handles.

1> Q = qlc:q([{A,X,Z,W} ||
A <- [a,b,c],
{X,Z} <- [{a,1},{b,4},{c,6}],
{W,Y} <- [{2,a},{3,b},{4,c}],
X =:= Y],
{cache, list}),
io:format("~s~n", [qlc:info(Q)]).
begin
    V1 =
        qlc:q([ 
               P0 ||
                   P0 = {X,Z} <-
                       qlc:keysort(1, [{a,1},{b,4},{c,6}], [])
              ]),
    V2 =
        qlc:q([ 
               P0 ||
                   P0 = {W,Y} <-
                       qlc:keysort(2, [{2,a},{3,b},{4,c}], [])
              ]),
    V3 =
        qlc:q([ 
               [G1|G2] ||
                   G1 <- V1,
                   G2 <- V2,
                   element(1, G1) == element(2, G2)
              ],
              [{join,merge},{cache,list}]),
    qlc:q([ 
           {A,X,Z,W} ||
               A <- [a,b,c],
               [{X,Z}|{W,Y}] <- V3,
               X =:= Y
          ])
end

In this example the cached results of the merge join are traversed for each value of A. Note that without the cache option the join would have been carried out three times, once for each value of A

sort/1,2 and keysort/2,3 can also be used for caching answers and for removing duplicates. When sorting answers are cached in a list, possibly stored on a temporary file, and no ETS tables are used.

Sometimes (see qlc:table/2 below) traversal of tables can be done by looking up key values, which is assumed to be fast. Under certain (rare) circumstances it could happen that there are too many key values to look up. The {max_lookup, MaxLookup} option can then be used to limit the number of lookups: if more than MaxLookup lookups would be required no lookups are done but the table traversed instead. The default value is infinity which means that there is no limit on the number of keys to look up.

1> T = gb_trees:empty(),
QH = qlc:q([X || {{X,Y},_} <- gb_table:table(T),
((X == 1) or (X == 2)) andalso
((Y == a) or (Y == b) or (Y == c))]),
io:format("~s~n", [qlc:info(QH)]).
ets:match_spec_run(
       lists:flatmap(fun(K) ->
                            case
                                gb_trees:lookup(K,
                                                gb_trees:from_orddict([]))
                            of
                                {value,V} ->
                                    [{K,V}];
                                none ->
                                    []
                            end
                     end,
                     [{1,a},{1,b},{1,c},{2,a},{2,b},{2,c}]),
       ets:match_spec_compile([{{{'$1','$2'},'_'},[],['$1']}]))

In this example using the gb_table module from the Implementing a QLC table section there are six keys to look up: {1,a}, {1,b}, {1,c}, {2,a}, {2,b}, and {2,c}. The reason is that the two elements of the key {X, Y} are compared separately.

The {lookup, true} option can be used to ensure that the qlc module will look up constants in some QLC table. If there are more than one QLC table among the generators' list expressions, constants have to be looked up in at least one of the tables. The evaluation of the query fails if there are no constants to look up. This option is useful in situations when it would be unacceptable to traverse all objects in some table. Setting the lookup option to false ensures that no constants will be looked up ({max_lookup, 0} has the same effect). The default value is any which means that constants will be looked up whenever possible.

The {join, Join} option can be used to ensure that a certain join method will be used: {join, lookup} invokes the lookup join method; {join, merge} invokes the merge join method; and {join, nested_loop} invokes the method of matching every pair of objects from two handles. The last method is mostly very slow. The evaluation of the query fails if the qlc module cannot carry out the chosen join method. The default value is any which means that some fast join method will be used if possible.

Types:

QH1 = QueryHandleOrList

QH2 = QueryHandle

Returns a query handle. When evaluating the query handle QH2 the answers to the query handle QH1 are sorted by file_sorter:sort/3 according to the options.

The sorter will use temporary files only if QH1 does not evaluate to a list and the size of the binary representation of the answers exceeds Size bytes, where Size is the value of the size option.

Types:

QueryString = string()

Options = [Option] | Option

Option = {max_lookup, MaxLookup} | {cache, Cache} | cache | {join, Join} | {lookup, Lookup} | {unique, bool()} | unique

MaxLookup = int() >= 0 | infinity

Join = any | lookup | merge | nested_loop

Lookup = bool() | any

Bindings = - as returned by erl_eval:bindings/1 -

Error = {error, module(), Reason}

Reason = - ErrorInfo as returned by erl_scan:string/1 or erl_parse:parse_exprs/1 -

A string version of qlc:q/1,2. When the query handle is evaluated the fun created by the parse transform is interpreted by erl_eval(3). The query string is to be one single query list comprehension terminated by a period.

1> L = [1,2,3],
Bs = erl_eval:add_binding('L', L, erl_eval:new_bindings()),
QH = qlc:string_to_handle("[X+1 || X <- L].", [], Bs),
qlc:eval(QH).
[2,3,4]

This function is probably useful mostly when called from outside of Erlang, for instance from a driver written in C.

Types:

TraverseFun = TraverseFun0 | TraverseFun1

TraverseFun0 = fun() -> TraverseResult

TraverseFun1 = fun(MatchExpression) -> TraverseResult

TraverseResult = Objects | term()

Objects = [] | [term() | ObjectList]

ObjectList = TraverseFun0 | Objects

Options = [Option] | Option

Option = {format_fun, FormatFun} | {info_fun, InfoFun} | {lookup_fun, LookupFun} | {parent_fun, ParentFun} | {post_fun, PostFun} | {pre_fun, PreFun} | {key_equality, KeyComparison}

FormatFun = undefined | fun(SelectedObjects) -> FormatedTable

SelectedObjects = all | {all, NElements, DepthFun} | {match_spec, MatchExpression} | {lookup, Position, Keys} | {lookup, Position, Keys, NElements, DepthFun}

NElements = infinity | int() > 0

DepthFun = fun(term()) -> term()

FormatedTable = {Mod, Fun, Args} | AbstractExpression | character_list()

InfoFun = undefined | fun(InfoTag) -> InfoValue

InfoTag = indices | is_unique_objects | keypos | num_of_objects

InfoValue = undefined | term()

LookupFun = undefined | fun(Position, Keys) -> LookupResult

LookupResult = [term()] | term()

ParentFun = undefined | fun() -> ParentFunValue

PostFun = undefined | fun() -> void()

PreFun = undefined | fun([PreArg]) -> void()

PreArg = {parent_value, ParentFunValue} | {stop_fun, StopFun}

ParentFunValue = undefined | term()

StopFun = undefined | fun() -> void()

KeyComparison = '=:=' | '=='

Position = int() > 0

Keys = [term()]

Mod = Fun = atom()

Args = [term()]

Returns a query handle for a QLC table. In Erlang/OTP there is support for ETS, Dets and Mnesia tables, but it is also possible to turn many other data structures into QLC tables. The way to accomplish this is to let function(s) in the module implementing the data structure create a query handle by calling qlc:table/2. The different ways to traverse the table as well as properties of the table are handled by callback functions provided as options to qlc:table/2.

The callback function TraverseFun is used for traversing the table. It is to return a list of objects terminated by either [] or a nullary fun to be used for traversing the not yet traversed objects of the table. Any other return value is immediately returned as value of the query evaluation. Unary TraverseFuns are to accept a match specification as argument. The match specification is created by the parse transform by analyzing the pattern of the generator calling qlc:table/2 and filters using variables introduced in the pattern. If the parse transform cannot find a match specification equivalent to the pattern and filters, TraverseFun will be called with a match specification returning every object. Modules that can utilize match specifications for optimized traversal of tables should call qlc:table/2 with a unary TraverseFun while other modules can provide a nullary TraverseFun. ets:table/2 is an example of the former; gb_table:table/1 in the Implementing a QLC table section is an example of the latter.

PreFun is a unary callback function that is called once before the table is read for the first time. If the call fails, the query evaluation fails. Similarly, the nullary callback function PostFun is called once after the table was last read. The return value, which is caught, is ignored. If PreFun has been called for a table, PostFun is guaranteed to be called for that table, even if the evaluation of the query fails for some reason. The order in which pre (post) functions for different tables are evaluated is not specified. Other table access than reading, such as calling InfoFun, is assumed to be OK at any time. The argument PreArgs is a list of tagged values. Currently there are two tags, parent_value and stop_fun, used by Mnesia for managing transactions. The value of parent_value is the value returned by ParentFun, or undefined if there is no ParentFun. ParentFun is called once just before the call of PreFun in the context of the process calling eval, fold, or cursor. The value of stop_fun is a nullary fun that deletes the cursor if called from the parent, or undefined if there is no cursor.

The binary callback function LookupFun is used for looking up objects in the table. The first argument Position is the key position or an indexed position and the second argument Keys is a sorted list of unique values. The return value is to be a list of all objects (tuples) such that the element at Position is a member of Keys. Any other return value is immediately returned as value of the query evaluation. LookupFun is called instead of traversing the table if the parse transform at compile time can find out that the filters match and compare the element at Position in such a way that only Keys need to be looked up in order to find all potential answers. The key position is obtained by calling InfoFun(keypos) and the indexed positions by calling InfoFun(indices). If the key position can be used for lookup it is always chosen, otherwise the indexed position requiring the least number of lookups is chosen. If there is a tie between two indexed positions the one occurring first in the list returned by InfoFun is chosen. Positions requiring more than max_lookup lookups are ignored.

The unary callback function InfoFun is to return information about the table. undefined should be returned if the value of some tag is unknown:

indices. Returns a list of indexed positions, a list of positive integers.
is_unique_objects. Returns true if the objects returned by TraverseFun are unique.
keypos. Returns the position of the table's key, a positive integer.
is_sorted_key. Returns true if the objects returned by TraverseFun are sorted on the key.
num_of_objects. Returns the number of objects in the table, a non-negative integer.

The unary callback function FormatFun is used by qlc:info/1,2 for displaying the call that created the table's query handle. The default value, undefined, means that info/1,2 displays a call to '$MOD':'$FUN'/0. It is up to FormatFun to present the selected objects of the table in a suitable way. However, if a character list is chosen for presentation it must be an Erlang expression that can be scanned and parsed (a trailing dot will be added by qlc:info though). FormatFun is called with an argument that describes the selected objects based on optimizations done as a result of analyzing the filters of the QLC where the call to qlc:table/2 occurs. The possible values of the argument are:

{lookup, Position, Keys, NElements, DepthFun}. LookupFun is used for looking up objects in the table.
{match_spec, MatchExpression}. No way of finding all possible answers by looking up keys was found, but the filters could be transformed into a match specification. All answers are found by calling TraverseFun(MatchExpression).
{all, NElements, DepthFun}. No optimization was found. A match specification matching all objects will be used if TraverseFun is unary.

NElements is the value of the info/1,2 option n_elements, and DepthFun is a function that can be used for limiting the size of terms; calling DepthFun(Term) substitutes '...' for parts of Term below the depth specified by the info/1,2 option depth. If calling FormatFun with an argument including NElements and DepthFun fails, FormatFun is called once again with an argument excluding NElements and DepthFun ({lookup, Position, Keys} or all).

The value of key_equality is to be '=:=' if the table considers two keys equal if they match, and to be '==' if two keys are equal if they compare equal. The default is '=:='.

See ets(3), dets(3) and mnesia(3) for the various options recognized by table/1,2 in respective module.

dets(3), Erlang Reference Manual, erl_eval(3), erlang(3), ets(3), file(3), error_logger(3), file_sorter(3), mnesia(3), Programming Examples, shell(3)

qlc

MODULE

MODULE SUMMARY

DESCRIPTION

Overview

Syntax

Evaluation

Join

Common options

Common data types

Getting started

Implementing a QLC table

Key equality

EXPORTS

See Also