[erlang-questions] Erlang shows its slow face!

Morten Krogh mk@REDACTED
Tue Nov 16 10:30:48 CET 2010


I couldn't wrap my head around why my prime number algorithm was so slow and had complexity N^2 or worse.
I realized that it was simply because I forgot to terminate some recursions as early as possible, hence doing an increasing number of irrelevant computations. 

Theoretically I had expected N*log^2(N) behavior or so, because the relevant powers of the prime numbers falls of like log(P), which should maybe give N*log(N) behavior, 
and then there is also the sorting in the end because the prime number algorithm will produce the triples in another order.

Anyway, I corrected the error, and see something like N*log^alpha(N) behavior.

The runtime for N = 300 is 300 microseconds, and for N = 1,000,000 8 seconds.

The final sorting of the result is actually a major overhead.


pythag6(N) when is_integer(N) ->
  Primes = sieve(N div 2),
  M1M2s = incorporate_primes([{1,1}], [], N, Primes),
  lists:usort(lists:flatten([ [{A,B,C}, {B,A,C}] || {M1, M2} <- M1M2s, M1 > M2, A <- [M1-M2], B <- [2*round(math:sqrt(M1*M2))], C <- [M1+M2], A+B+C =< N])).
incorporate_primes(Active_all, Passive_all, _N, []) ->
  Active_all ++ Passive_all;
incorporate_primes(Active_all, Passive_all, N, [P|Primes]) ->
  {Active, Passive} = incorporate_prime(Active_all, [], [], N, P),
  incorporate_primes(Active, Passive ++ Passive_all, N, Primes).

incorporate_prime([], Active, Passive, _N, _P) ->
  {Active, Passive};
incorporate_prime([{M1, M2}|M1M2s], Active, Passive, N, P) ->
  {EvensM1, OddsM1} = m_times_powers_of_p(M1, N div 2, P, even, [], []),
  {EvensM2, OddsM2} = m_times_powers_of_p(M2, N div 2, P, even, [], []),
  case {EvensM1, OddsM1, EvensM2, OddsM2} of
    {[_], [], [_], []} ->
      incorporate_prime(M1M2s, Active, [{M1,M2}| Passive], N, P);
    {[_], [_], [_], []} ->
      incorporate_prime(M1M2s, Active, [{M1,M2}| Passive], N, P);
    {[_], [], [_], [_]} ->
      incorporate_prime(M1M2s, Active, [{M1,M2}| Passive], N, P);
    _ ->
      Evens = [{X, Y} || X <- EvensM1, Y <- EvensM2],
      Odds = [{X, Y} || X <- OddsM1, Y <- OddsM2],
      incorporate_prime(M1M2s, Odds ++ Evens ++ Active, Passive, N, P)

m_times_powers_of_p(M, L, _P, _Parity, Evens, Odds) when M > L ->
  {Evens, Odds};
m_times_powers_of_p(M, L, P, even, Evens, Odds) ->
  m_times_powers_of_p(M*P, L, P, odd, [M|Evens], Odds);
m_times_powers_of_p(M, L, P, odd, Evens, Odds) ->
  m_times_powers_of_p(M*P, L, P, even, Evens, [M|Odds]).

sieve(N) when is_integer(N) ->

sieve(N, K) when K >= N ->
 [X || X <- lists:seq(2, N), erase(X) == undefined];
sieve(N, K) ->
  cross_off(K, K, N div K - 1),
  sieve(N, find_next_in_sieve(K + 1)).
cross_off(_K, _Current, 0) ->
cross_off(K, Current, Left) ->
  Next = Current + K,
  put(Next, out),
  cross_off(K, Next, Left - 1).

find_next_in_sieve(K) ->
  case get(K) of
    undefined ->
    _ ->

On Nov 16, 2010, at 2:55 AM, Richard O'Keefe wrote:

> On 16/11/2010, at 4:02 AM, Edmond Begumisa wrote:
>> Hello Richard,
>> I have a question...
>>> Erlang is a fine tool for jobs like this one.
>> Interesting... I would have never thought this. I would have instinctively reached for a language with mutable memory, thinking that this would make the most sense for combinatorial* work (I don't know this for fact, and I could be very wrong, but that's what instinct would tell me.) 
>> * I've been calling this permutation but I don't know if that's accurate.
> It isn't.  "Permutation" has a very specific meaning, which does not apply here.
> Let me quibble slightly:  *memoisation* can help a lot for some combinatorial
> algorithms, but mutable memory as such doesn't strike me as obviously useful.
> (Indeed, since mutable memory makes memoisation invalid, it can seriously get
> in your way.)  Good support for large integers helps a lot.
> Different algorithms may well have different needs.
>>> There is little point in optimising a bad algorithm.
>> Well put. But say you have an 'ok' algorithm. Not the best, but not blatantly flawed either.
> I have a C program to compute various things from a data set which consists of
> a hundred million lines, each containing four non-negative integers.  My
> program had to read it several times.  (The first pass determines the sizes
> of major areas; the second pass determines the size of minor areas within
> major areas; the third pass loads the data.)  It gets the effect of sorting
> without actually sorting (or more precisely, by count-sorting).  The first
> version of the program took 68 seconds per pass on the fastest machine I
> had access to.  Considering that wc takes 20 seconds to _not_ process the
> numbers in the file, that didn't seem too bad.  But bypassing stdio and using
> my own code let me get the time down to 6 seconds per pass, which means that
> I can afford to run the program a lot more often.
> That was a case where there was a _good_ algorithm, but a horrible library.
> (Time is linear in the size of the input; results depend on all of the input;
> up to a constant factor the algorithm is as good as it gets.)
> But in the case we are discussing here, the initial algorithm was O(N**3)
> when an O(N**2) algorithm -- in fact, several of them, there's even a
> Wikipedia article on enumerating Pythagorean triples -- was not hard to find.
>> So my question is: if version 1 isn't performing "reasonably" acceptably for Garcia's purpose, and version 1 isn't blatantly flawed.
> To me, version 1 *IS* blatantly flawed.
> In a pythagorean triple (a,b,c), a and b uniquely determine c,
> so it seems *obvious* that a quadratic algorithm should be sought.
> (It is not obvious that one *exists*, just that it should be sought.)
> Make no mistake:  I have used brute force algorithms quite happily myself.
> But I never blame the language for the performance I get when I do so!
>> Isn't this a strong indicator that he's using the wrong tool?
> Ah, but what precisely is "the tool" in question?
> - functional languages as such?
> - Erlang?
> - list comprehensions as such?
> - Erlang's current implementation of list comprehensions?
> Here are some more times.  Erlang R14A, MacBook Pro.
> 8> pyth:t(p3, 300).
> {p3,300,{length,146},{msec,10}}
> 9> pyth:t(p2, 300). 
> {p2,300,{length,146},{msec,210}}
> 10> pyth:t(p1, 300). 
> {p1,300,{length,146},{msec,970}}
> 11> pyth:t(p0, 300). 
> {p0,300,{length,146},{msec,420}}
> What's p0?
> It's the utterly naive cubic loop nest, with the loops written
> as function calls working on integers, with the only list in
> existence being the one that's being built.
> The time for p0 was 420 milliseconds in Erlang.
> On the same machine, the same function, the time was
> 91 milliseconds in SML/NJ.
>            SML/NJ  MLton   Erlang
> n=100 L= 34 s=0.004 m=0.004 e=0.020
> n=200 L= 86 s=0.028 m=0.027 e=0.130
> n=300 L=146 s=0.090 m=0.086 e=0.410
> n=400 L=210 s=0.213 m=0.202 e=0.980
> n=500 L=274 s=0.415 m=0.394 e=1.900
> We see that Erlang is about 5 times slower than SML.
> Here's the SML code.
> fun p0 n =
>    let fun fa     0 l = l
>          | fa     a l = fb n a l
>        and fb   0 a l = fa (a-1) l
>          | fb   b a l = fc n b a l
>        and fc 0 b a l = fb (b-1) a l
>          | fc c b a l = fc (c-1) b a (
>                            if a+b+c <= n andalso a*a+b*b = c*c
>                            then (a,b,c)::l else l)
>    in fa n []
>    end
> It's completely tail recursive, building the list from right to
> left.  *THIS* is the version that is strictly comparable to any
> C or C# code one might plausibly write; it allocates no
> storage not part of the result, all looping is done with "while"
> loops that compare and increment integers.  What's more, SML is
> strongly typed, and the Dialyzer should be able to handle this,
> so the compiler *knows* it is dealing with integers.
> I grant you that this code is not immediately clear.
> Clear code in SML might look like this:
> fun cat_tab l u f =     (* concatenate + tabulate *)
>    let fun loop i r = if i < l then r else loop (i-1) (f i @ r)
>     in loop u []
>    end                 (* _happens_ not to be in List *)
> fun p1 n =
>    cat_tab 1 n (fn a =>
>        cat_tab 1 n (fn b =>
>            cat_tab 1 n (fn c =>
>                if a+b+c <= n andalso a*a+b*b = c*c
>                then [(a,b,c)] else [])))
> What happens to the times?
> 	    (----------direct-----) (------cat_tab------)   original!
>            SML/NJ  MLton   Erlang  SML/NJ  Mlton   Erlang  Erlang
> n=100 L= 34 s=0.004 m=0.004 e=0.020 s=0.015 m=0.006 e=0.040 o=0.040
> n=200 L= 86 s=0.028 m=0.027 e=0.130 s=0.120 m=0.039 e=0.300 o=0.290
> n=300 L=146 s=0.090 m=0.086 e=0.410 s=0.403 m=0.128 e=0.950 o=0.980
> n=400 L=210 s=0.213 m=0.202 e=0.980 s=0.954 m=0.303 e=2.240 o=2.280
> n=500 L=274 s=0.415 m=0.394 e=1.900 s=1.857 m=0.591 e=4.300 o=4.400
> The MLton compiler is a whole-program compiler, and I've a
> strong suspicion that it is able to inline cat_tab, so it
> may be able to avoid building the answer twice.  I'm pretty
> sure that SML/NJ isn't inlining cat_tab, and is building the
> answer twice.  While the Erlang compiler _could_ be taught
> about cat_tab, it's not generally wonderful at inlining recursive
> procedures, which is why the rightmost column shows list
> comprehension not doing so well.
> There are two basic issues here:
> (1) The Erlang compiler doesn't really handle list comprehension as
>    native loops; it compiles list comprehensions to recursive
>    functions.  lists:seq(1, N) is not special syntax -- unlike Haskell --
>    and almost surely builds an actual list.
>    This could be changed, and maybe some day it will change.
> (2) Erlang is compiled incrementally to support hot loading and
>    (again in order to support hot loading) does not do much if any
>    cross-module inlining.
>    MLton is a whole-program optimising compiler, it can SML _are_
>    allowed to do cross-module inlining by the nature of the language.
>    This *can't* be changed without making Erlang less useful for its job.    
> Ah, but there's one more wrinkle!
>            SML/NJ  Erlang  SML/NJ     Smalltalk
> n=100 L= 34 s=0.004 e=0.020 s= 0.250   p1=0.016
> n=200 L= 86 s=0.028 e=0.130 s= 1.904   p1=0.109
> n=300 L=146 s=0.090 e=0.410 s= 6.567   p1=0.369
> n=400 L=210 s=0.213 e=0.980 s=15.581   p1=0.873
> n=500 L=274 s=0.415 e=1.900 s=30.218   p1=1.705 
> (3) Erlang integer arithmetic is *not* limited to some ridiculous
>    size like 32 bits, it is _semantically_ unbounded integer
>    arithmetic, with a space-efficient representation and fast
>    calculation path for small integers.  To get something
>    comparable in SML, you have to use the type IntInf.int
>    instead of Int.int.  And when you do *that*, Erlang really
>    really shines.
>    The difference here is that if you use int in C or Java or C#
>    or SML, then you'll find a positive integer n such that n+1 < 0.
>    That kind of nonsense simply doesn't happen in Erlang.
>    It comes with a price, but the price of *not* using a Lispy
>    (or Smalltalky, or Erlangy) approach to integers is the price
>    of getting wrong answers without knowing.
>    The Smalltalk column was done using my Smalltalk-via-C compiler.
>    It might be possible to go a factor of 2 faster using machine-
>    specific assembly code, but what we seem to be looking at in
>    the Erlang column really does seem to be no more than the
>    price of arithmetic that gives correct results.
> And by the way, no, I am not advocating writing code like p0 above
> *UNTIL* it has been determined that the function is a performance
> problem and that the algorithm is good.  Like I said, you should
> stick with the clean high level code until you understand the
> problem space better.
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED

More information about the erlang-questions mailing list