[erlang-questions] Python Generators In Erlang and constant memory usage

Sun Mar 22 07:26:52 CET 2009

I'd certainly love having this functionality in Reia

2009/3/19 Frederic T-H <mononcqc@REDACTED>

> First time posting here, I've been reading this mailing lists for a few
> months now, programming Erlang for a few months too.
>
> Python and its most functionnal parts is what eventually brought me to
> Erlang. One feature that I loved in Python and instantly noticed was not
> available in Erlang was generators. Generators are basically normal
> functions, except they return an iterator, letting you fetch each result one
> by one, effectively letting you work with constant memory. An example of the
> way Python does it:
>
>   def gen(n):
>       i = 0
>       while i<n:
>           yield i
>           i+=1
>       return
>
>   x = gen(2)
>   x.next() # returns 0
>   x.next() # returns 1
>   x.next()  # throws an exception
>
> In python, using yield creates a 'next' method, which is automatically
> called by most functions that can iterate over text, lists, tuples,
> dictionaries, etc.
>
> The idea is to not load all the results from a function at once and pass it
> around, but to send each part individually.
>
> As an example, say I have a result set consisting of 1000x5kb tuples
> (whatever they contain) which I want to map a function over. The way Erlang
> works, you'd need to have a list that is 5000kb in size in memory and
> manipulate it every single time the mapped function iterates over it.
>
> The end goal is to attempt to bypass having to evaluate the whole result
> set at once (heavy) and instead get each element individually (lighter).
> This is especially good for smaller systems like phones, web tablets (Nokia)
> or really anything with limited amounts of RAM.
>
> While possible to implement in Erlang, generators are quite problematic
> because every list BIF at the moment only operates on complete lists. This
> means adding generators in Erlang would require one to do something like
> duplicate the lists module.
>
> This may not be the first time this whole idea is discussed, but fast
> googling didn't reveal much so I've let myself write a small naive and
> simple prototype of what the implementation could be.
>
> The idea is quite simple: Have a recursive function as a different spawned
> process waiting to receive {Pid, next} before returning data. I've attached
> the file demonstrating the implementation, but in case I got it wrong, I
> also copy/pasted it there: http://pastebin.com/f31ae9731
>
> What the implementation does:
>
>   - define tail-recursive fibonacci functions, a normal one and a
>   generator version
>   - define mockups for the sum and fold (foldl) functions from the
>   list module adapted to generators
>   - short unscientific tests to compare performances.
>
> What it doesn't do:
>
>   - consider any kind of safety in case there is an error in a generator
>   - contain anything past what you'd do for a proof of concept.
>
>
> The way it works, a normal tail-recursive way to get all the fibonacci
> numbers until the nth one could be done the following way:
>
>   t_fib(1) -> [1];
>   t_fib(2) -> [1,1];
>   t_fib(L) -> t_fib(3,L,[1,1]).
>
>   t_fib(L,L,[N1,N2|_]=F) ->
>       lists:reverse([N1+N2|F]);
>   t_fib(C, L, [N1,N2|_]=F) ->
>       t_fib(C+1,L,[N1+N2|F]).
>
> That's not the prettiest function in the world, but it still does the job.
> Here's how it would be as a generator (with the macros I have defined in the
> file):
>
>   y_fib(L) ->
>       y_fib(1, L, 1, 0).
>   y_fib(L, L, N1 ,N2) ->
>       ?last_yield(N1+N2);
>   y_fib(C, L, N1, N2) ->
>       N = N1+N2,
>       ?yield(N),
>       y_fib(C+1, L, N2, N).
>
> The code overhead one gets writing it that way is minimal and denies the
> need to use a list as an accumulator.
>
> The generator sum function is written the following way:
>
>   sum({M,Gen,Args}) ->
>       Pid = spawn(M, Gen, Args),
>       sum(Pid,0).
>   sum(Pid,S) ->
>       Pid ! {self(), next},
>       receive
>           {Pid, N} -> sum(Pid, S+N);
>           {Pid, N, done} -> S+N
>       end.
>
> Called with the custom sum function in the generators module (This could be
> made better, I agree):
>
>    > generators:sum({generators,y_fib,[100]}).     927372692193078999175
>
> Now for performance, I've tested it  with Erlang R13A. My computer:
>   AMD phenom 9950 quad-core 2.6ghz processor, 4gb RAM, Windows 7 Beta build
> 7000 (32 bits).
> batch_test/3 will compare both versions with a sequence/list of Fibonacci
> numbers of length 10000 and 50000, 20 times each, and round the results
> showing how much faster the generator is (1.0 is equal).
>
>   26> generators:batch_test(sum, 10000, 20).
>   0.9438942747654998
>   27> generators:batch_test(sum, 50000, 20).
>   2.406425935803285
>   28> generators:batch_test(fold, 10000, 20).
>   0.6706634763086374
>   29> generators:batch_test(fold, 50000, 20).
>   1.3918125332818239
>
> This seems to show that using a generator scales much better, while
> becoming faster as the list grows and becomes expansive to generate and
> carry around.
>
> The real advantage, though, is the fact that using a function that won't
> accumulate lots of data (sum, folds, foreach, all, any, nth, etc.) will be
> done with constant memory. This is especially good for small systems or
> scaling as a whole.
>
> The testing I've done is minimal, and while I don't expect the Erlang
> community to jump on the idea and adopt it, I'm one of these people who'd
> like to eventually use this in production code. What kind of stuff would you
> see missing, or likely to cause trouble in the current implementation?
> What ways would I have to test this giving better results?
>
> Any other comments, tips? Thanks in advance! (Have a nice day also.)
>
> -module(generators).
> -author([{name,"Fred T-H"},{email, "mononcqc@REDACTED"}]).
> -description("An example of generators implemented in erlang,
>  allowing to generate the equivalent of lists with constant memory.").
> -vsn(0.0).
> -export([t_fib/1, y_fib/1, sum/1, fold/3, test/2, batch_test/3]).
>
> % macros to make the creation of generators a tad more user-friendly.
> -define(yield(X), receive {Pid, next} -> Pid ! {self(), X} end).
> -define(last_yield(X), receive {Pid, next} -> Pid ! {self(), X, done} end).
> %-compile(export_all).
>
> %% Tail-recursive fibonacci function.
> t_fib(1) -> [1];
> t_fib(2) -> [1,1];
> t_fib(L) -> t_fib(3,L,[1,1]).
>
> t_fib(L,L,[N1,N2|_]=F) ->
>    lists:reverse([N1+N2|F]);
> t_fib(C, L, [N1,N2|_]=F) ->
>    t_fib(C+1,L,[N1+N2|F]).
>
> %% Generator-like tail-recursive fibonacci function.
> y_fib(L) ->
>    y_fib(1,L,1,0).
>
> y_fib(L,L,N1,N2) ->
>    ?last_yield(N1+N2);
> y_fib(C, L, N1,N2) ->
>    N = N1+N2,
>    ?yield(N),
>    y_fib(C+1,L,N2,N).
>
> %% what the lists:sum version for generators could look like
> sum({M,Gen,Args}) ->
>    Pid = spawn(M, Gen, Args),
>    sum(Pid,0).
> sum(Pid,S) ->
>    Pid ! {self(), next},
>    receive
>        {Pid, N} -> sum(Pid, S+N);
>        {Pid, N, done} -> S+N
>    end.
>
> %% ... and lists:foldl/3 (no foldr possible with generators).
> fold(F, Acc,{M,Gen,Args}) when is_function(F,2) ->
>    Pid = spawn(M,Gen,Args),
>    fold(F, Acc, Pid);
> fold(F, Acc, Pid) ->
>    Pid ! {self(), next},
>    receive
>        {Pid, N} -> fold(F, F(N,Acc), Pid);
>        {Pid, N, done} -> F(N,Acc)
>    end.
>
> %% Comparing execution times of lists vs generators
> test(fold,N) ->
>    erlang:statistics(wall_clock),
>    F = fun(X, Acc) ->
>          if X rem 2 == 0 -> Acc+1;
>             true -> Acc
>          end
>        end,
>    ?MODULE:fold(F, 0, {?MODULE, y_fib, [N]}),
>    io:format("generator ok~n"),
>    {_, T1} = erlang:statistics(wall_clock),
>    lists:foldl(F, 0, t_fib(N)),
>    io:format("list ok~n"),
>    {_, T2} = erlang:statistics(wall_clock),
>    if T1>0 -> {T1,T2, T2/T1};
>       T2 == T1, T1 == 0 -> {T1,T2,1};
>       true -> {T1,T2, '?'}
>    end;
> test(sum, N) ->
>    erlang:statistics(wall_clock),
>    ?MODULE:sum({?MODULE, y_fib, [N]}),
>    io:format("generator ok~n"),
>    {_, T1} = erlang:statistics(wall_clock),
>    lists:sum(t_fib(N)),
>    io:format("list ok~n"),
>    {_, T2} = erlang:statistics(wall_clock),
>    if T1>0 -> {T1,T2, T2/T1};
>       T2 == T1, T1 == 0 -> {T1,T2,1};
>       true -> {T1,T2, '?'}
>    end.
>
> %% Testing lists vs generators many times.
> batch_test(Type, N, Lim) ->
>    batch_test(Type, N, 0, Lim, 0).
> batch_test(_Type,_N,Lim,Lim,Acc) ->
>    Acc/Lim;
> batch_test(Type,N,Ct,Lim,Acc) ->
>    {_,_,X} = test(Type,N),
>    batch_test(Type,N,Ct+1,Lim,Acc+X).
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>


-- 
Tony Arcieri
medioh.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20090322/fa97976e/attachment.htm>