[erlang-questions] Erlang shows its slow face!

Tue Nov 16 14:52:54 CET 2010

Thanks very much for your insight. You lost me at some point but I'm  
basically with you (I hope!)

This thread has been very, very useful and educational for me. I'm trying  
some of the algo's presented in C for further study and comparision. It's  
clearer now that my assumptions about what should and should not be  
attempted in Erlang were completely wrong.

Although that does introduce some confusion since I have to now think  
harder! The naive decisions were much easier to make :)

There is a particular side project I wanted to do where I instinctively  
decided certain parts would be out-sourced to C/C++. As a result of this  
thread, I'm not so sure anymore. Venturing outside the emulator might not  
be worth it. I'll have to look much closer.

One more question though! From real experiences, are there specific  
problems that you would just NEVER EVER attempt in Erlang? (I know of the  
list on the FAQ but even that seems shaky now.)

- Edmond -

PS: When this thread first appeared, I chose to ignore it, thinking  
"that's a flame thread with nothing worth reading." This mailing list is  
unique in resisting flame temptation and sticking to the issues. Other  
lists are not like that :)

On Tue, 16 Nov 2010 12:55:01 +1100, Richard O'Keefe <ok@REDACTED>  
wrote:

>
> On 16/11/2010, at 4:02 AM, Edmond Begumisa wrote:
>
>> Hello Richard,
>>
>> I have a question...
>>
>>> Erlang is a fine tool for jobs like this one.
>>
>> Interesting... I would have never thought this. I would have  
>> instinctively reached for a language with mutable memory, thinking that  
>> this would make the most sense for combinatorial* work (I don't know  
>> this for fact, and I could be very wrong, but that's what instinct  
>> would tell me.)
>
>> * I've been calling this permutation but I don't know if that's  
>> accurate.
>
> It isn't.  "Permutation" has a very specific meaning, which does not  
> apply here.
>
> Let me quibble slightly:  *memoisation* can help a lot for some  
> combinatorial
> algorithms, but mutable memory as such doesn't strike me as obviously  
> useful.
> (Indeed, since mutable memory makes memoisation invalid, it can  
> seriously get
> in your way.)  Good support for large integers helps a lot.
>
> Different algorithms may well have different needs.
>>
>>> There is little point in optimising a bad algorithm.
>>
>> Well put. But say you have an 'ok' algorithm. Not the best, but not  
>> blatantly flawed either.
>
> I have a C program to compute various things from a data set which  
> consists of
> a hundred million lines, each containing four non-negative integers.  My
> program had to read it several times.  (The first pass determines the  
> sizes
> of major areas; the second pass determines the size of minor areas within
> major areas; the third pass loads the data.)  It gets the effect of  
> sorting
> without actually sorting (or more precisely, by count-sorting).  The  
> first
> version of the program took 68 seconds per pass on the fastest machine I
> had access to.  Considering that wc takes 20 seconds to _not_ process the
> numbers in the file, that didn't seem too bad.  But bypassing stdio and  
> using
> my own code let me get the time down to 6 seconds per pass, which means  
> that
> I can afford to run the program a lot more often.
>
> That was a case where there was a _good_ algorithm, but a horrible  
> library.
> (Time is linear in the size of the input; results depend on all of the  
> input;
> up to a constant factor the algorithm is as good as it gets.)
>
> But in the case we are discussing here, the initial algorithm was O(N**3)
> when an O(N**2) algorithm -- in fact, several of them, there's even a
> Wikipedia article on enumerating Pythagorean triples -- was not hard to  
> find.
>
>>
>> So my question is: if version 1 isn't performing "reasonably"  
>> acceptably for Garcia's purpose, and version 1 isn't blatantly flawed.
>
> To me, version 1 *IS* blatantly flawed.
> In a pythagorean triple (a,b,c), a and b uniquely determine c,
> so it seems *obvious* that a quadratic algorithm should be sought.
> (It is not obvious that one *exists*, just that it should be sought.)
>
> Make no mistake:  I have used brute force algorithms quite happily  
> myself.
> But I never blame the language for the performance I get when I do so!
>
>
>> Isn't this a strong indicator that he's using the wrong tool?
>
> Ah, but what precisely is "the tool" in question?
> - functional languages as such?
> - Erlang?
> - list comprehensions as such?
> - Erlang's current implementation of list comprehensions?
>
> Here are some more times.  Erlang R14A, MacBook Pro.
>
> 8> pyth:t(p3, 300).
> {p3,300,{length,146},{msec,10}}
> 9> pyth:t(p2, 300).
> {p2,300,{length,146},{msec,210}}
> 10> pyth:t(p1, 300).
> {p1,300,{length,146},{msec,970}}
> 11> pyth:t(p0, 300).
> {p0,300,{length,146},{msec,420}}
>
> What's p0?
> It's the utterly naive cubic loop nest, with the loops written
> as function calls working on integers, with the only list in
> existence being the one that's being built.
>
> The time for p0 was 420 milliseconds in Erlang.
> On the same machine, the same function, the time was
> 91 milliseconds in SML/NJ.
>             SML/NJ  MLton   Erlang
> n=100 L= 34 s=0.004 m=0.004 e=0.020
> n=200 L= 86 s=0.028 m=0.027 e=0.130
> n=300 L=146 s=0.090 m=0.086 e=0.410
> n=400 L=210 s=0.213 m=0.202 e=0.980
> n=500 L=274 s=0.415 m=0.394 e=1.900
>
> We see that Erlang is about 5 times slower than SML.
> Here's the SML code.
>
> fun p0 n =
>     let fun fa     0 l = l
>           | fa     a l = fb n a l
>         and fb   0 a l = fa (a-1) l
>           | fb   b a l = fc n b a l
>         and fc 0 b a l = fb (b-1) a l
>           | fc c b a l = fc (c-1) b a (
>                             if a+b+c <= n andalso a*a+b*b = c*c
>                             then (a,b,c)::l else l)
>     in fa n []
>     end
>
> It's completely tail recursive, building the list from right to
> left.  *THIS* is the version that is strictly comparable to any
> C or C# code one might plausibly write; it allocates no
> storage not part of the result, all looping is done with "while"
> loops that compare and increment integers.  What's more, SML is
> strongly typed, and the Dialyzer should be able to handle this,
> so the compiler *knows* it is dealing with integers.
>
> I grant you that this code is not immediately clear.
> Clear code in SML might look like this:
>
> fun cat_tab l u f =     (* concatenate + tabulate *)
>     let fun loop i r = if i < l then r else loop (i-1) (f i @ r)
>      in loop u []
>     end                 (* _happens_ not to be in List *)
>
> fun p1 n =
>     cat_tab 1 n (fn a =>
>         cat_tab 1 n (fn b =>
>             cat_tab 1 n (fn c =>
>                 if a+b+c <= n andalso a*a+b*b = c*c
>                 then [(a,b,c)] else [])))
>
> What happens to the times?
> 	    (----------direct-----) (------cat_tab------)   original!
>             SML/NJ  MLton   Erlang  SML/NJ  Mlton   Erlang  Erlang
> n=100 L= 34 s=0.004 m=0.004 e=0.020 s=0.015 m=0.006 e=0.040 o=0.040
> n=200 L= 86 s=0.028 m=0.027 e=0.130 s=0.120 m=0.039 e=0.300 o=0.290
> n=300 L=146 s=0.090 m=0.086 e=0.410 s=0.403 m=0.128 e=0.950 o=0.980
> n=400 L=210 s=0.213 m=0.202 e=0.980 s=0.954 m=0.303 e=2.240 o=2.280
> n=500 L=274 s=0.415 m=0.394 e=1.900 s=1.857 m=0.591 e=4.300 o=4.400
>
> The MLton compiler is a whole-program compiler, and I've a
> strong suspicion that it is able to inline cat_tab, so it
> may be able to avoid building the answer twice.  I'm pretty
> sure that SML/NJ isn't inlining cat_tab, and is building the
> answer twice.  While the Erlang compiler _could_ be taught
> about cat_tab, it's not generally wonderful at inlining recursive
> procedures, which is why the rightmost column shows list
> comprehension not doing so well.
>
> There are two basic issues here:
> (1) The Erlang compiler doesn't really handle list comprehension as
>     native loops; it compiles list comprehensions to recursive
>     functions.  lists:seq(1, N) is not special syntax -- unlike Haskell  
> --
>     and almost surely builds an actual list.
>
>     This could be changed, and maybe some day it will change.
>
> (2) Erlang is compiled incrementally to support hot loading and
>     (again in order to support hot loading) does not do much if any
>     cross-module inlining.
>     MLton is a whole-program optimising compiler, it can SML _are_
>     allowed to do cross-module inlining by the nature of the language.
>
>     This *can't* be changed without making Erlang less useful for its  
> job.
>
> Ah, but there's one more wrinkle!
>
>             SML/NJ  Erlang  SML/NJ     Smalltalk
> n=100 L= 34 s=0.004 e=0.020 s= 0.250   p1=0.016
> n=200 L= 86 s=0.028 e=0.130 s= 1.904   p1=0.109
> n=300 L=146 s=0.090 e=0.410 s= 6.567   p1=0.369
> n=400 L=210 s=0.213 e=0.980 s=15.581   p1=0.873
> n=500 L=274 s=0.415 e=1.900 s=30.218   p1=1.705
>
> (3) Erlang integer arithmetic is *not* limited to some ridiculous
>     size like 32 bits, it is _semantically_ unbounded integer
>     arithmetic, with a space-efficient representation and fast
>     calculation path for small integers.  To get something
>     comparable in SML, you have to use the type IntInf.int
>     instead of Int.int.  And when you do *that*, Erlang really
>     really shines.
>
>     The difference here is that if you use int in C or Java or C#
>     or SML, then you'll find a positive integer n such that n+1 < 0.
>     That kind of nonsense simply doesn't happen in Erlang.
>
>     It comes with a price, but the price of *not* using a Lispy
>     (or Smalltalky, or Erlangy) approach to integers is the price
>     of getting wrong answers without knowing.
>
>     The Smalltalk column was done using my Smalltalk-via-C compiler.
>     It might be possible to go a factor of 2 faster using machine-
>     specific assembly code, but what we seem to be looking at in
>     the Erlang column really does seem to be no more than the
>     price of arithmetic that gives correct results.
> And by the way, no, I am not advocating writing code like p0 above
> *UNTIL* it has been determined that the function is a performance
> problem and that the algorithm is good.  Like I said, you should
> stick with the clean high level code until you understand the
> problem space better.
>
>
>
>
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/