[erlang-questions] Python Generators In Erlang and constant memory usage
Frederic T-H
mononcqc@REDACTED
Fri Mar 20 02:06:35 CET 2009
First time posting here; I've been reading this mailing list for a few
months now, and programming Erlang for a few months too.
Python and its more functional parts are what eventually brought me to
Erlang. One feature I loved in Python and instantly noticed was missing
in Erlang is generators. Generators are basically normal functions,
except that calling them returns an iterator, letting you fetch each
result one by one, so you can effectively work in constant memory. An
example of the way Python does it:
def gen(n):
    i = 0
    while i < n:
        yield i
        i += 1
    return

x = gen(2)
x.next()  # returns 0
x.next()  # returns 1
x.next()  # throws an exception
In Python, putting yield in a function turns it into a generator:
calling it returns an object with a 'next' method, which is called
automatically by anything that can iterate over strings, lists, tuples,
dictionaries, etc.
The idea is not to compute all of a function's results at once and pass
them around, but to hand out each element individually.
As an example, say I have a result set consisting of 1000 tuples of 5 kB
each (whatever they contain) that I want to map a function over. The way
Erlang works, you need to hold a 5000 kB list in memory and carry it
around every single time the mapped function iterates over it.
The end goal is to avoid evaluating the whole result set at once (heavy)
and instead get each element individually (lighter). This is especially
good for smaller systems like phones, web tablets (Nokia), or really
anything with a limited amount of RAM.
While generators are possible to implement in Erlang, they are quite
problematic because every function in the lists module currently
operates on complete lists. This means adding generators to Erlang would
require something like duplicating the lists module.
This may not be the first time this idea has been discussed, but a quick
Google search didn't reveal much, so I let myself write a small, naive
and simple prototype of what the implementation could be.
The idea is quite simple: run the recursive function as a separate
spawned process that waits to receive {Pid, next} before returning each
piece of data.
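To make the protocol concrete before the real code, here is a minimal
hand-rolled generator and consumer following the same idea (module and
function names here are just for illustration; none of this is in the
attached file):

-module(gen_sketch).
-export([count/1, collect/1]).

%% Generator: produces the integers 1..N, one per {Pid, next} request.
count(N) -> count(1, N).

count(N, N) ->
    receive {From, next} -> From ! {self(), N, done} end;
count(I, N) ->
    receive {From, next} -> From ! {self(), I} end,
    count(I + 1, N).

%% Consumer: pulls every element. Here it just collects them into a list,
%% which defeats the purpose, but it shows the message protocol.
collect(Pid) ->
    Pid ! {self(), next},
    receive
        {Pid, X}       -> [X | collect(Pid)];
        {Pid, X, done} -> [X]
    end.

%% 1> Pid = spawn(gen_sketch, count, [5]).
%% 2> gen_sketch:collect(Pid).
%% [1,2,3,4,5]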
I've attached the file demonstrating the implementation, and in case the
attachment doesn't come through, I've also copy/pasted it here:
http://pastebin.com/f31ae9731
What the implementation does:
- define tail-recursive Fibonacci functions, a normal version and a
generator version
- define mockups of the sum and fold (foldl) functions from the lists
module, adapted to generators
- run short, unscientific tests to compare performance.
What it doesn't do:
- consider any kind of safety in case there is an error in a generator
- contain anything past what you'd do for a proof of concept.
The way it works: a normal tail-recursive way to get all the Fibonacci
numbers up to the nth one could be done the following way:
t_fib(1) -> [1];
t_fib(2) -> [1,1];
t_fib(L) -> t_fib(3, L, [1,1]).

t_fib(L, L, [N1,N2|_]=F) ->
    lists:reverse([N1+N2|F]);
t_fib(C, L, [N1,N2|_]=F) ->
    t_fib(C+1, L, [N1+N2|F]).
That's not the prettiest function in the world, but it still does the
job. Here's how it would be as a generator (with the macros I have
defined in the file):
y_fib(L) ->
    y_fib(1, L, 1, 0).

y_fib(L, L, N1, N2) ->
    ?last_yield(N1+N2);
y_fib(C, L, N1, N2) ->
    N = N1+N2,
    ?yield(N),
    y_fib(C+1, L, N2, N).
The code overhead from writing it that way is minimal, and it removes
the need to use a list as an accumulator.
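The macro definitions themselves live in the attached file; roughly,
they boil down to a receive/send pair matching the protocol the consumer
functions below expect (this is a sketch, not necessarily the exact
attached code):

%% Wait for the consumer to ask for the next element, then send it.
-define(yield(X),
        receive {From, next} -> From ! {self(), X} end).

%% Same, but tag the element as the last one so the consumer can stop.
-define(last_yield(X),
        receive {From, next} -> From ! {self(), X, done} end).

Note that the expansion binds From, so a given clause should only yield
once, which is how y_fib above is written.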
The generator sum function is written the following way:
sum({M, Gen, Args}) ->
    Pid = spawn(M, Gen, Args),
    sum(Pid, 0).

sum(Pid, S) ->
    Pid ! {self(), next},
    receive
        {Pid, N}       -> sum(Pid, S+N);
        {Pid, N, done} -> S+N
    end.
Calling it with the custom sum function from the generators module (this
could be made nicer, I agree):
> generators:sum({generators,y_fib,[100]}).
927372692193078999175
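The fold mockup follows the same receive loop as sum/2; its general
shape is roughly the following (the attached file has the real version,
and the argument order here just mirrors lists:foldl/3 as a guess):

fold(F, Acc0, {M, Gen, Args}) ->
    Pid = spawn(M, Gen, Args),
    fold_loop(F, Acc0, Pid).

fold_loop(F, Acc, Pid) ->
    Pid ! {self(), next},
    receive
        {Pid, N}       -> fold_loop(F, F(N, Acc), Pid);
        {Pid, N, done} -> F(N, Acc)
    end.

%% e.g., redoing the sum above with it:
%% generators:fold(fun(X, Acc) -> X + Acc end, 0, {generators, y_fib, [100]}).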
Now for performance. I've tested it with Erlang R13A on my computer: an
AMD Phenom 9950 quad-core 2.6 GHz processor, 4 GB of RAM, Windows 7 Beta
build 7000 (32-bit).
batch_test/3 compares both versions on a sequence/list of Fibonacci
numbers of a given length (here 10000 and 50000), 20 times each, and
reports how many times faster the generator version is (1.0 means they
are equal).
26> generators:batch_test(sum, 10000, 20).
0.9438942747654998
27> generators:batch_test(sum, 50000, 20).
2.406425935803285
28> generators:batch_test(fold, 10000, 20).
0.6706634763086374
29> generators:batch_test(fold, 50000, 20).
1.3918125332818239
This seems to show that the generator scales much better, becoming
comparatively faster as the list grows and gets more expensive to
generate and carry around.
The real advantage, though, is that any function that doesn't accumulate
lots of data (sum, folds, foreach, all, any, nth, etc.) will run in
constant memory. This is especially good for small systems, or for
scaling as a whole.
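To give one more of those as an example, a foreach over a generator only
ever holds a single element at a time; a sketch following the same
pattern as sum/2 above (again not from the attached file):

foreach(F, {M, Gen, Args}) ->
    Pid = spawn(M, Gen, Args),
    foreach_loop(F, Pid).

foreach_loop(F, Pid) ->
    Pid ! {self(), next},
    receive
        {Pid, N}       -> F(N), foreach_loop(F, Pid);
        {Pid, N, done} -> F(N), ok
    end.

%% foreach(fun(X) -> io:format("~p~n", [X]) end, {generators, y_fib, [10]}).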
The testing I've done is minimal, and while I don't expect the Erlang
community to jump on the idea and adopt it, I'm one of those people who'd
like to eventually use this in production code. What kind of stuff would
you see missing, or likely to cause trouble, in the current
implementation?
How could I test this in ways that would give more meaningful results?
Any other comments or tips? Thanks in advance! (Have a nice day also.)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: generators.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20090319/a45c746e/attachment.ksh>