[erlang-questions] Python Generators In Erlang and constant memory usage

Fri Mar 20 09:21:31 CET 2009

I admit not read throughfully your post, but I think you're looking for continuations.

I've done continuous based fibonacci iterator in Python and Erlang so it should be working solution.

But your generator in continuations:

-module(continuation).

-export([gen/1]).

gen(N) ->
  {0, fun() -> gen_cont(0, N) end}.

gen_cont(I, N) ->
  case I < N of
    true -> {I + 1, fun() -> gen_cont(I + 1, N) end};
    false -> ok
  end.

How it works...
7> {_, Fun1} = continuation:gen(3).
{0,#Fun<continuation.0.17007156>}
8> {_, Fun2} = Fun1().
{1,#Fun<continuation.1.15616427>}
9> {_, Fun3} = Fun2().
{2,#Fun<continuation.1.15616427>}
10> {_, Fun4} = Fun3().
{3,#Fun<continuation.1.15616427>}
11> {_, Fun5} = Fun4().

----- Alkuperäinen viesti -----
Lähettäjä: Frederic T-H <mononcqc@REDACTED>
Päiväys: perjantai, maaliskuu 20, 2009 3:28 am
Aihe: [erlang-questions] Python Generators In Erlang and constant memory usage
Vastaanottaja: erlang-questions@REDACTED

> First time posting here, I've been reading this mailing lists 
> for a few 
> months now, programming Erlang for a few months too.
> 
> Python and its most functionnal parts is what eventually brought 
> me to 
> Erlang. One feature that I loved in Python and instantly noticed 
> was not 
> available in Erlang was generators. Generators are basically 
> normal 
> functions, except they return an iterator, letting you fetch 
> each result 
> one by one, effectively letting you work with constant memory. 
> An 
> example of the way Python does it:
> 
>     def gen(n):
>         i = 0
>         while i<n:
>             yield i
>             i+=1
>         return
> 
>     x = gen(2)
>     x.next() # returns 0
>     x.next() # returns 1
>     x.next()  # throws an exception
> 
> In python, using yield creates a 'next' method, which is 
> automatically 
> called by most functions that can iterate over text, lists, 
> tuples, 
> dictionaries, etc.
> 
> The idea is to not load all the results from a function at once 
> and pass 
> it around, but to send each part individually.
> 
> As an example, say I have a result set consisting of 1000x5kb 
> tuples 
> (whatever they contain) which I want to map a function over. The 
> way 
> Erlang works, you'd need to have a list that is 5000kb in size 
> in memory 
> and manipulate it every single time the mapped function iterates 
> over it.
> 
> The end goal is to attempt to bypass having to evaluate the 
> whole result 
> set at once (heavy) and instead get each element individually 
> (lighter). 
> This is especially good for smaller systems like phones, web 
> tablets 
> (Nokia) or really anything with limited amounts of RAM.
> 
> While possible to implement in Erlang, generators are quite 
> problematic 
> because every list BIF at the moment only operates on complete 
> lists. 
> This means adding generators in Erlang would require one to do 
> something 
> like duplicate the lists module.
> 
> This may not be the first time this whole idea is discussed, but 
> fast 
> googling didn't reveal much so I've let myself write a small 
> naive and 
> simple prototype of what the implementation could be.
> 
> The idea is quite simple: Have a recursive function as a 
> different 
> spawned process waiting to receive {Pid, next} before returning 
> data. 
> I've attached the file demonstrating the implementation, but in 
> case I 
> got it wrong, I also copy/pasted it there: 
> http://pastebin.com/f31ae9731
> What the implementation does:
> 
>     - define tail-recursive fibonacci functions, 
> a normal one and a
>     generator version
>     - define mockups for the sum and fold (foldl) 
> functions from the
>     list module adapted to generators
>     - short unscientific tests to compare performances.
> 
> What it doesn't do:
> 
>     - consider any kind of safety in case there 
> is an error in a generator
>     - contain anything past what you'd do for a 
> proof of concept.
> 
> 
> The way it works, a normal tail-recursive way to get all the 
> fibonacci 
> numbers until the nth one could be done the following way:
> 
>     t_fib(1) -> [1];
>     t_fib(2) -> [1,1];
>     t_fib(L) -> t_fib(3,L,[1,1]).
> 
>     t_fib(L,L,[N1,N2|_]=F) ->
>         lists:reverse([N1+N2|F]);
>     t_fib(C, L, [N1,N2|_]=F) ->
>         t_fib(C+1,L,[N1+N2|F]).
> 
> That's not the prettiest function in the world, but it still 
> does the 
> job. Here's how it would be as a generator (with the macros I 
> have 
> defined in the file):
> 
>     y_fib(L) ->
>         y_fib(1, L, 1, 0).
>     y_fib(L, L, N1 ,N2) ->
>         ?last_yield(N1+N2);
>     y_fib(C, L, N1, N2) ->
>         N = N1+N2,
>         ?yield(N),
>         y_fib(C+1, L, N2, N).
> 
> The code overhead one gets writing it that way is minimal and 
> denies the 
> need to use a list as an accumulator.
> 
> The generator sum function is written the following way:
> 
>     sum({M,Gen,Args}) ->
>         Pid = spawn(M, Gen, Args),
>         sum(Pid,0).
>     sum(Pid,S) ->
>         Pid ! {self(), next},
>         receive
>             {Pid, N} -> sum(Pid, S+N);
>             {Pid, N, done} -> S+N
>         end.
> 
> Called with the custom sum function in the generators module 
> (This could 
> be made better, I agree):
> 
>      > 
> generators:sum({generators,y_fib,[100]}).  
>     927372692193078999175
> 
> Now for performance, I've tested it  with Erlang R13A. My 
> computer:    AMD phenom 9950 quad-core 2.6ghz 
> processor, 4gb RAM, Windows 7 Beta 
> build 7000 (32 bits).
> batch_test/3 will compare both versions with a sequence/list of 
> Fibonacci numbers of length 10000 and 50000, 20 times each, and 
> round 
> the results showing how much faster the generator is (1.0 is equal).
> 
>     26> generators:batch_test(sum, 10000, 20).
>     0.9438942747654998
>     27> generators:batch_test(sum, 50000, 20).
>     2.406425935803285
>     28> generators:batch_test(fold, 10000, 20).
>     0.6706634763086374
>     29> generators:batch_test(fold, 50000, 20).
>     1.3918125332818239
> 
> This seems to show that using a generator scales much better, 
> while 
> becoming faster as the list grows and becomes expansive to 
> generate and 
> carry around.
> 
> The real advantage, though, is the fact that using a function 
> that won't 
> accumulate lots of data (sum, folds, foreach, all, any, nth, 
> etc.) will 
> be done with constant memory. This is especially good for small 
> systems 
> or scaling as a whole.
> 
> The testing I've done is minimal, and while I don't expect the 
> Erlang 
> community to jump on the idea and adopt it, I'm one of these 
> people 
> who'd like to eventually use this in production code. What kind 
> of stuff 
> would you see missing, or likely to cause trouble in the current 
> implementation?
> What ways would I have to test this giving better results?
> 
> Any other comments, tips? Thanks in advance! (Have a nice day also.)
>