[erlang-questions] External sorting for large files in Erlang
Joe Armstrong
erlang@REDACTED
Wed Aug 1 16:12:48 CEST 2012
On Tue, Jul 31, 2012 at 11:32 PM, Zabrane Mickael <zabrane3@REDACTED> wrote:
> Hi,
>
> I'm looking for something similar to this, but in Erlang:
> http://code.google.com/p/externalsortinginjava/
>
> I found an old post suggesting file_sorter:
> http://www.erlang.org/doc/man/file_sorter.html
> But file_sorter seems to only work on binary files.
This is one of my favorite modules - it is very fast.
file_sorter sorts binary-encoded terms.
Each entry is a 4-byte length header followed by term_to_binary(Term).
Here's an example of how to encode some terms, write them to a file,
sort the file, and read them back.
--- example
-module(test1).
-compile(export_all).

%% Encode some terms, write them to "foo", sort the file in place,
%% then read the sorted terms back.
test() ->
    L = [encode(I) || I <- [{yes,6,1}, no, {yes,1,2},
                            {hello,22}, {yes,12,10}, {hello,12}]],
    ok = file:write_file("foo", L),
    ok = file_sorter:sort("foo"),
    {ok, Bin} = file:read_file("foo"),
    decode(Bin).

%% encode(Term) makes a 4 byte length header followed
%% by term_to_binary(Term)
encode(T) ->
    B = term_to_binary(T),
    Len = byte_size(B),
    <<Len:32, B/binary>>.

%% decode(Bin) is the inverse: it splits Bin into entries and
%% converts each one back to a term.
decode(<<Len:32, B:Len/binary, B2/binary>>) ->
    T = binary_to_term(B),
    [T | decode(B2)];
decode(<<>>) ->
    [].
--- end
It happily sorts extremely large files; it's well worth using.
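On the "only works on binary files" point: a plain text file can be put into the same length-header layout first. Here is a rough sketch of that idea, not a tested library. The module name line_sort, the ".packed" temp file name and the helper names are made up for illustration. It assumes the {format, binary} option of file_sorter, which compares the raw binaries as they are, so the order is bytewise over each line including its newline.

```erlang
-module(line_sort).
-export([sort_lines/2]).

%% Sort the lines of text file In and write them to Out.
%% Each line is stored as a 4 byte length header plus the line itself,
%% which is the layout file_sorter expects.
sort_lines(In, Out) ->
    Tmp = Out ++ ".packed",          % hypothetical temp file name
    ok = pack(In, Tmp),
    ok = file_sorter:sort([Tmp], Tmp, [{format, binary}]),
    ok = unpack(Tmp, Out),
    file:delete(Tmp).

%% Copy In to Tmp, one length-prefixed record per line.
pack(In, Tmp) ->
    {ok, R} = file:open(In, [read, raw, binary, read_ahead]),
    {ok, W} = file:open(Tmp, [write, raw, binary, delayed_write]),
    ok = pack_loop(R, W),
    ok = file:close(R),
    file:close(W).

pack_loop(R, W) ->
    case file:read_line(R) of
        {ok, Line0} ->
            Line = ensure_nl(Line0),
            ok = file:write(W, [<<(byte_size(Line)):32>>, Line]),
            pack_loop(R, W);
        eof ->
            ok
    end.

%% The last line of a file may lack a newline; add one so that
%% every record ends the same way and no record is empty.
ensure_nl(Line) ->
    case binary:last(Line) of
        $\n -> Line;
        _   -> <<Line/binary, $\n>>
    end.

%% Strip the length headers again, leaving plain text lines.
unpack(Tmp, Out) ->
    {ok, R} = file:open(Tmp, [read, raw, binary, read_ahead]),
    {ok, W} = file:open(Out, [write, raw, binary, delayed_write]),
    ok = unpack_loop(R, W),
    ok = file:close(R),
    file:close(W).

unpack_loop(R, W) ->
    case file:read(R, 4) of
        {ok, <<Len:32>>} ->
            {ok, Line} = file:read(R, Len),
            ok = file:write(W, Line),
            unpack_loop(R, W);
        eof ->
            ok
    end.
```

Both passes stream through the file, so only file_sorter itself decides how much is held in memory. Calling line_sort:sort_lines("in.txt", "out.txt") should leave the sorted lines in out.txt.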
> In my case, I need something more flexible.
>
> What about controlling the Unix sort command from Erlang?
os:cmd("sort <in >out").
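os:cmd only hands back the command's output, not its exit status. If you want to know whether the sort actually worked, a port can report that. A minimal sketch, assuming a Unix sort on the PATH that accepts the POSIX -o option; the module name unix_sort and the error tuples are invented for illustration.

```erlang
-module(unix_sort).
-export([sort/2]).

%% Run the external sort program on file In, writing the result to Out.
%% Returns ok, or {error, Reason} if sort is missing or exits non-zero.
sort(In, Out) ->
    case os:find_executable("sort") of
        false ->
            {error, sort_not_found};
        Sort ->
            %% Arguments are passed as a list, so no shell quoting is needed.
            Port = open_port({spawn_executable, Sort},
                             [{args, ["-o", Out, In]},
                              exit_status, stderr_to_stdout, binary]),
            wait(Port, [])
    end.

%% Collect whatever sort prints (normally only error messages)
%% until the exit status arrives.
wait(Port, Acc) ->
    receive
        {Port, {data, Data}} ->
            wait(Port, [Data | Acc]);
        {Port, {exit_status, 0}} ->
            ok;
        {Port, {exit_status, Status}} ->
            {error, {Status, iolist_to_binary(lists:reverse(Acc))}}
    end.
```

Extra flags such as a key or a numeric sort would go into the args list in the same way.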
/Joe
>
> Any hints, ideas, suggestions, code?
>
> Regards,
> Zabrane