[erlang-questions] External sorting for large files in Erlang

Joe Armstrong erlang@REDACTED
Wed Aug 1 16:12:48 CEST 2012


On Tue, Jul 31, 2012 at 11:32 PM, Zabrane Mickael <zabrane3@REDACTED> wrote:
> Hi,
>
> I'm looking for something similar to this, but in Erlang:
> http://code.google.com/p/externalsortinginjava/
>
> I found an old post suggesting file_sorter:
> http://www.erlang.org/doc/man/file_sorter.html
> But file_sorter seems to work only on binary files.

This is one of my favorite modules - it is very fast.

By default, file_sorter sorts binary-encoded terms.
Each entry is a 4-byte big-endian length header followed by
term_to_binary(Term).

Here's an example of how to encode some terms, write them to a file,
sort the file in place, and read them back:

-- example

-module(test1).
-export([test/0]).

%% Returns the terms in standard Erlang term order:
%% [no,{hello,12},{hello,22},{yes,1,2},{yes,6,1},{yes,12,10}]
test() ->
    L = [encode(I) || I <- [{yes,6,1}, no, {yes,1,2},
                            {hello,22}, {yes,12,10}, {hello,12}]],
    ok = file:write_file("foo", L),
    ok = file_sorter:sort("foo"),      % sorts "foo" in place
    {ok, Bin} = file:read_file("foo"),
    decode(Bin).

%% encode(Term) makes a 4 byte length header followed
%% by term_to_binary(Term)

encode(T) ->
    B = term_to_binary(T),
    Len = byte_size(B),
    <<Len:32, B/binary>>.

%% decode(Bin) walks the length-prefixed entries and
%% returns the list of decoded terms

decode(<<Len:32, B:Len/binary, B2/binary>>) ->
    T = binary_to_term(B),
    [T|decode(B2)];
decode(<<>>) ->
    [].

--- end

It happily sorts extremely large files, so it is well worth using.
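
If the input is a plain text file rather than encoded terms, the same
4-byte-header format can be used as an intermediate step. The following
is only a sketch under some assumptions: the module name line_sort, the
function sort_lines/2 and the ".tmp" scratch file are made up for
illustration, each line is assumed to fit in memory (the file as a whole
need not), and lines are compared as raw bytes, with no locale-aware
collation and no special handling of CRLF.

```erlang
-module(line_sort).
-export([sort_lines/2]).

%% Sort the lines of the text file In into Out.
%% Each line is wrapped as <<Len:32, term_to_binary(Line)>> in a scratch
%% file, which file_sorter then sorts in place.
sort_lines(In, Out) ->
    Tmp = Out ++ ".tmp",                  % hypothetical scratch file name
    {ok, Src} = file:open(In, [read, raw, binary, read_ahead]),
    {ok, Enc} = file:open(Tmp, [write, raw, binary, delayed_write]),
    ok = encode_lines(Src, Enc),
    ok = file:close(Src),
    ok = file:close(Enc),
    ok = file_sorter:sort(Tmp),
    {ok, Sorted} = file:open(Tmp, [read, raw, binary, read_ahead]),
    {ok, Dst} = file:open(Out, [write, raw, binary, delayed_write]),
    ok = decode_lines(Sorted, Dst),
    ok = file:close(Sorted),
    ok = file:close(Dst),
    file:delete(Tmp).

%% Read one line at a time and append it to Enc as a length-prefixed term.
encode_lines(Src, Enc) ->
    case file:read_line(Src) of
        {ok, Line} ->
            B = term_to_binary(strip_newline(Line)),
            ok = file:write(Enc, <<(byte_size(B)):32, B/binary>>),
            encode_lines(Src, Enc);
        eof ->
            ok
    end.

%% Read the sorted entries back one at a time and write them as text lines.
decode_lines(Sorted, Dst) ->
    case file:read(Sorted, 4) of
        {ok, <<Len:32>>} ->
            {ok, B} = file:read(Sorted, Len),
            ok = file:write(Dst, [binary_to_term(B), $\n]),
            decode_lines(Sorted, Dst);
        eof ->
            ok
    end.

%% Drop a single trailing newline, if there is one.
strip_newline(Line) ->
    case byte_size(Line) of
        0 ->
            Line;
        N ->
            case binary:last(Line) of
                $\n -> binary:part(Line, 0, N - 1);
                _ -> Line
            end
    end.
```

Calling line_sort:sort_lines("in.txt", "out.txt") should leave the
sorted lines in "out.txt", each terminated by a newline. Reading and
writing one entry at a time keeps memory use small on the Erlang side;
the heavy lifting is left to file_sorter.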


> In my case, I need something more flexible.
>
> What about controlling the Unix sort command from Erlang?

   os:cmd("sort <in >out").
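
os:cmd/1 goes through the shell and does not report the exit status, so
a failed sort is easy to miss. If that matters, a port can be used
instead. This is a minimal sketch, assuming a POSIX sort(1) is on the
PATH; the module name unix_sort and the function sort_file/2 are made up
for illustration.

```erlang
-module(unix_sort).
-export([sort_file/2]).

%% Run the external sort program on In, writing the result to Out.
%% Returns ok, or {error, Reason} if sort is missing or exits non-zero.
sort_file(In, Out) ->
    case os:find_executable("sort") of
        false ->
            {error, sort_not_found};
        Path ->
            %% Arguments are passed as a list, so no shell quoting is needed.
            Port = open_port({spawn_executable, Path},
                             [{args, ["-o", Out, "--", In]},
                              exit_status, stderr_to_stdout, binary]),
            wait(Port, [])
    end.

%% Collect any output (normally only error messages) until sort exits.
wait(Port, Acc) ->
    receive
        {Port, {data, Data}} ->
            wait(Port, [Data | Acc]);
        {Port, {exit_status, 0}} ->
            ok;
        {Port, {exit_status, Status}} ->
            {error, {Status, iolist_to_binary(lists:reverse(Acc))}}
    end.
```

unix_sort:sort_file("in", "out") then plays the role of the os:cmd call
above, but a non-zero exit status comes back as {error, {Status, Output}}
instead of being silently ignored.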

/Joe


>
> Any hints, ideas, suggestions, code?
>
> Regards,
> Zabrane


