# `file_sorter` [🔗](https://github.com/garazdawi/otp/blob/lukas/shell_docs/fix-bugs/lib/stdlib/src/file_sorter.erl#L22)

File sorter.

This module contains functions for sorting terms on files, merging already sorted files, and checking files for sortedness. Chunks containing binary terms are read from a sequence of files, sorted internally in memory and written on temporary files, which are merged producing one sorted file as output. Merging is provided as an optimization; it is faster when the files are already sorted, but it always works to sort instead of merge.

On a file, a term is represented by a header and a binary. Two options define the format of terms on files:

- **`{header, HeaderLength}`** - `HeaderLength` determines the number of bytes preceding each binary and containing the length of the binary in bytes. Defaults to 4. The order of the header bytes is defined as follows: if `B` is a binary containing a header only, size `Size` of the binary is calculated as `<<Size:HeaderLength/unit:8>> = B`.

- **`{format, Format}`** - Option `Format` determines the function that is applied to binaries to create the terms to be sorted. Defaults to `binary_term`, which is equivalent to `fun binary_to_term/1`. Value `binary` is equivalent to `fun(X) -> X end`, which means that the binaries are sorted as they are. This is the fastest format. If `Format` is `term`, `io:read/2` is called to read terms. In that case, only the default value of option `header` is allowed.

  Option `format` also determines what is written to the sorted output file: if `Format` is `term`, then `io:format/3` is called to write each term, otherwise the binary prefixed by a header is written. Notice that the binary written is the same binary that was read; the results of applying function `Format` are thrown away when the terms have been sorted. Reading and writing terms using the `io` module is much slower than reading and writing binaries.
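The on-file format described above can be illustrated with a minimal sketch. The module name `file_sorter_format_demo` and the helpers `encode/2`, `decode_all/2`, and `demo/1` are hypothetical names chosen for this example and are not part of `file_sorter`. The sketch assumes the default options (`{header, 4}` and `{format, binary_term}`) and a writable path passed in by the caller.

```erlang
-module(file_sorter_format_demo).
-export([encode/2, decode_all/2, demo/1]).

%% Frame one term for the default `binary_term` format: a header of
%% HeaderLength bytes holding the size, followed by the binary itself.
encode(Term, HeaderLength) ->
    Bin = term_to_binary(Term),
    Size = byte_size(Bin),
    <<Size:HeaderLength/unit:8, Bin/binary>>.

%% Split file contents back into terms using the same header length.
decode_all(<<>>, _HeaderLength) ->
    [];
decode_all(Data, HeaderLength) ->
    <<Size:HeaderLength/unit:8, Bin:Size/binary, Rest/binary>> = Data,
    [binary_to_term(Bin) | decode_all(Rest, HeaderLength)].

%% Write three unsorted terms, sort the file in place, read them back.
%% File is a caller-supplied path (an assumption of this sketch).
demo(File) ->
    Terms = [{b, 2}, {c, 3}, {a, 1}],
    ok = file:write_file(File, [encode(T, 4) || T <- Terms]),
    ok = file_sorter:sort(File),
    {ok, Sorted} = file:read_file(File),
    decode_all(Sorted, 4).
```

Calling `file_sorter_format_demo:demo("demo.bin")` should return `[{a,1},{b,2},{c,3}]`. Because the output file holds the same binaries that were read, `decode_all/2` can decode it with the same header length that was used for writing.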
Other options are:

- **`{order, Order}`** - The default is to sort terms in ascending order, but that can be changed by value `descending` or by specifying an ordering function `Fun`. An ordering function is antisymmetric, transitive, and total. `Fun(A, B)` is to return `true` if `A` comes before `B` in the ordering, otherwise `false`. An example of a typical ordering function is less than or equal to, `=</2`.

Input and output can also be specified as functions instead of file names (see `infun()` and `outfun()`). The following example sorts the terms on a disk log:

```erlang
sort(Log) ->
    {ok, _} = disk_log:open([{name,Log}, {mode,read_only}]),
    Input = input(Log, start),
    Output = output([]),
    Reply = file_sorter:sort(Input, Output, {format,term}),
    ok = disk_log:close(Log),
    Reply.

input(Log, Cont) ->
    fun(close) ->
            ok;
       (read) ->
            case disk_log:chunk(Log, Cont) of
                {error, Reason} ->
                    {error, Reason};
                {Cont2, Terms} ->
                    {Terms, input(Log, Cont2)};
                {Cont2, Terms, _Badbytes} ->
                    {Terms, input(Log, Cont2)};
                eof ->
                    end_of_input
            end
    end.

output(L) ->
    fun(close) ->
            lists:append(lists:reverse(L));
       (Terms) ->
            output([Terms | L])
    end.
```

For more examples of functions as input and output, see the end of the `file_sorter` module; the `term` format is implemented with functions.

The possible values of `Reason` returned when an error occurs are:

- `bad_object`, `{bad_object, FileName}` - Applying the format function failed for some binary, or the key(s) could not be extracted from some term.
- `{bad_term, FileName}` - `io:read/2` failed to read some term.
- `{file_error, FileName, file:posix()}` - For an explanation of [`file:posix()`](`t:file:posix/0`), see `m:file`.
- `{premature_eof, FileName}` - End-of-file was encountered inside some binary term.

# `file_name`

*not exported*

```erlang
-type file_name() :: file:name().
```

# `file_names`

*not exported*

```erlang
-type file_names() :: [file:name()].
```

# `format`

*not exported*

```erlang
-type format() :: binary_term | term | binary | format_fun().
```

# `format_fun`

*not exported*

```erlang
-type format_fun() :: fun((binary()) -> term()).
```

# `header_length`

*not exported*

```erlang
-type header_length() :: pos_integer().
```

# `i_command`

*not exported*

```erlang
-type i_command() :: read | close.
```

# `i_reply`

*not exported*

```erlang
-type i_reply() ::
          end_of_input |
          {end_of_input, value()} |
          {[object()], infun()} |
          input_reply().
```

# `infun`

*not exported*

```erlang
-type infun() :: fun((i_command()) -> i_reply()).
```

# `input`

*not exported*

```erlang
-type input() :: file_names() | infun().
```

# `input_reply`

*not exported*

```erlang
-type input_reply() :: term().
```

# `key_pos`

*not exported*

```erlang
-type key_pos() :: pos_integer() | [pos_integer()].
```

# `no_files`

*not exported*

```erlang
-type no_files() :: pos_integer().
```

# `o_command`

*not exported*

```erlang
-type o_command() :: {value, value()} | [object()] | close.
```

# `o_reply`

*not exported*

```erlang
-type o_reply() :: outfun() | output_reply().
```

# `object`

*not exported*

```erlang
-type object() :: term() | binary().
```

# `option`

*not exported*

```erlang
-type option() ::
          {compressed, boolean()} |
          {header, header_length()} |
          {format, format()} |
          {no_files, no_files()} |
          {order, order()} |
          {size, size()} |
          {tmpdir, tmp_directory()} |
          {unique, boolean()}.
```

# `options`

*not exported*

```erlang
-type options() :: [option()] | option().
```

# `order`

*not exported*

```erlang
-type order() :: ascending | descending | order_fun().
```

# `order_fun`

*not exported*

```erlang
-type order_fun() :: fun((term(), term()) -> boolean()).
```

# `outfun`

*not exported*

```erlang
-type outfun() :: fun((o_command()) -> o_reply()).
```

# `output`

*not exported*

```erlang
-type output() :: file_name() | outfun().
```

# `output_reply`

*not exported*

```erlang
-type output_reply() :: term().
```

# `reason`

```erlang
-type reason() ::
          bad_object |
          {bad_object, file_name()} |
          {bad_term, file_name()} |
          {file_error, file_name(), file:posix() | badarg | system_limit} |
          {premature_eof, file_name()}.
```

# `size`

*not exported*

```erlang
-type size() :: non_neg_integer().
```

# `tmp_directory`

*not exported*

```erlang
-type tmp_directory() :: [] | file:name().
```

# `value`

*not exported*

```erlang
-type value() :: term().
```

# `check`

```erlang
-spec check(FileName) -> Reply
               when
                   FileName :: file_name(),
                   Reply :: {ok, [Result]} | {error, reason()},
                   Result :: {FileName, TermPosition, term()},
                   TermPosition :: pos_integer().
```

# `check`

```erlang
-spec check(FileNames, Options) -> Reply
               when
                   FileNames :: file_names(),
                   Options :: options(),
                   Reply :: {ok, [Result]} | {error, reason()},
                   Result :: {FileName, TermPosition, term()},
                   FileName :: file_name(),
                   TermPosition :: pos_integer().
```

Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.

# `keycheck`

```erlang
-spec keycheck(KeyPos, FileName) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileName :: file_name(),
                      Reply :: {ok, [Result]} | {error, reason()},
                      Result :: {FileName, TermPosition, term()},
                      TermPosition :: pos_integer().
```

# `keycheck`

```erlang
-spec keycheck(KeyPos, FileNames, Options) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileNames :: file_names(),
                      Options :: options(),
                      Reply :: {ok, [Result]} | {error, reason()},
                      Result :: {FileName, TermPosition, term()},
                      FileName :: file_name(),
                      TermPosition :: pos_integer().
```

Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.

# `keymerge`

```erlang
-spec keymerge(KeyPos, FileNames, Output) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileNames :: file_names(),
                      Output :: output(),
                      Reply :: ok | {error, reason()} | output_reply().
```

# `keymerge`

```erlang
-spec keymerge(KeyPos, FileNames, Output, Options) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileNames :: file_names(),
                      Output :: output(),
                      Options :: options(),
                      Reply :: ok | {error, reason()} | output_reply().
```

Merges tuples on files.
Each input file is assumed to be sorted on key(s).

# `keysort`

```erlang
-spec keysort(KeyPos, FileName) -> Reply
                 when
                     KeyPos :: key_pos(),
                     FileName :: file_name(),
                     Reply :: ok | {error, reason()} | input_reply() | output_reply().
```

Sorts tuples on files.

# `keysort`

```erlang
-spec keysort(KeyPos, Input, Output) -> Reply
                 when
                     KeyPos :: key_pos(),
                     Input :: input(),
                     Output :: output(),
                     Reply :: ok | {error, reason()} | input_reply() | output_reply().
```

# `keysort`

```erlang
-spec keysort(KeyPos, Input, Output, Options) -> Reply
                 when
                     KeyPos :: key_pos(),
                     Input :: input(),
                     Output :: output(),
                     Options :: options(),
                     Reply :: ok | {error, reason()} | input_reply() | output_reply().
```

Sorts tuples on files. The sort is performed on the element(s) mentioned in `KeyPos`. If two tuples compare equal (`==`) on one element, the next element according to `KeyPos` is compared. The sort is stable.

# `merge`

```erlang
-spec merge(FileNames, Output) -> Reply
               when
                   FileNames :: file_names(),
                   Output :: output(),
                   Reply :: ok | {error, reason()} | output_reply().
```

# `merge`

```erlang
-spec merge(FileNames, Output, Options) -> Reply
               when
                   FileNames :: file_names(),
                   Output :: output(),
                   Options :: options(),
                   Reply :: ok | {error, reason()} | output_reply().
```

Merges terms on files. Each input file is assumed to be sorted.

# `sort`

```erlang
-spec sort(FileName) -> Reply
              when
                  FileName :: file_name(),
                  Reply :: ok | {error, reason()} | input_reply() | output_reply().
```

Sorts terms on files.

# `sort`

```erlang
-spec sort(Input, Output) -> Reply
              when
                  Input :: input(),
                  Output :: output(),
                  Reply :: ok | {error, reason()} | input_reply() | output_reply().
```

# `sort`

```erlang
-spec sort(Input, Output, Options) -> Reply
              when
                  Input :: input(),
                  Output :: output(),
                  Options :: options(),
                  Reply :: ok | {error, reason()} | input_reply() | output_reply().
```

Sorts terms on files.

---

*Consult [api-reference.md](api-reference.md) for complete listing*
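As a rough illustration of how `merge/2` and `check/1` documented above fit together, the sketch below merges two already sorted files and then checks the result. The module name `file_sorter_merge_demo`, the helpers `write_terms/2` and `read_terms/1`, and the file names are hypothetical. The sketch assumes the default options (4-byte header, `binary_term` format) and a writable directory passed in by the caller.

```erlang
-module(file_sorter_merge_demo).
-export([demo/1]).

%% Write Terms to File in the default on-file format:
%% a 4-byte size header followed by term_to_binary/1 output.
write_terms(File, Terms) ->
    Frame = fun(T) ->
                    B = term_to_binary(T),
                    <<(byte_size(B)):32, B/binary>>
            end,
    file:write_file(File, [Frame(T) || T <- Terms]).

%% Read every term back from a file in the same format.
read_terms(File) ->
    {ok, Data} = file:read_file(File),
    [binary_to_term(B) || <<Size:32, B:Size/binary>> <= Data].

%% Dir is a caller-supplied writable directory (an assumption).
demo(Dir) ->
    [A, B, Out] = [filename:join(Dir, N)
                   || N <- ["a.bin", "b.bin", "merged.bin"]],
    ok = write_terms(A, [1, 3, 5]),
    ok = write_terms(B, [2, 4, 6]),
    %% Both inputs are sorted, so merging is enough.
    ok = file_sorter:merge([A, B], Out),
    %% A sorted file yields an empty result list.
    {ok, []} = file_sorter:check(Out),
    %% In [3, 1, 2] the first out-of-order term is at position 2.
    ok = write_terms(A, [3, 1, 2]),
    {ok, [{_File, 2, _Term}]} = file_sorter:check(A),
    read_terms(Out).
```

Under these assumptions, `file_sorter_merge_demo:demo(".")` should return `[1,2,3,4,5,6]`. If the inputs were not known to be sorted, `file_sorter:sort/2` with the same file list and output would be the safe choice.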