[erlang-questions] Why erlang's computing performance is enormously less than c++
Richard A. O'Keefe
ok@REDACTED
Mon Nov 14 01:14:49 CET 2016
On 11/11/16 8:26 PM, 谈广云 wrote:
> I compared Erlang's computing with C++.
>
> Erlang runs test_sum_0 100000000 times:
>
> test_sum_0(N) ->
>     bp_eva_delta([1,2,3,4],[3,4,5,6],[]),
>     test_sum_0(N-1).
>
> bp_eva_delta([],_,L) ->
>     lists:reverse(L);
> bp_eva_delta([O|Output],[S|Sigma],L) ->
>     bp_eva_delta(Output,Sigma,[S * O * (1-O) |L]).
Here are some times I got.
Erlang (native compilation)  :  10.1 seconds.
Erlang (unrolled loop)       :   2.8 seconds.
Standard ML                  :   2.7 seconds.
Clean (default lazy lists)   :   8.3 seconds.
Clean (unrolled strict data) :   3.0 seconds.
A fair comparison in C       : 118.4 seconds.
The thing is that the C and Erlang code may be computing
the same function (technically they aren't), but they are
not doing it the same WAY, so the comparison is not a
comparison of LANGUAGES but a comparison of
*list processing* in one language with
*array processing* in another language.
When you compare Erlang with statically typed languages
doing the same thing (well, not quite) the same *way*,
you find the numbers pleasantly close.
A list is made up of pairs.
A fairer analogue of this in C would be
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct Node {
    struct Node *next;
    int item;
};

/* Sentinel node playing the role of the empty list []. */
struct Node dummy = {0, 0};

/* Reverse L onto R, copying each cell, as lists:reverse/2 does. */
struct Node *revloop(
    struct Node *L,
    struct Node *R
) {
    while (L != &dummy) {
        struct Node *N = malloc(sizeof *N);
        N->next = R, N->item = L->item;
        R = N, L = L->next;
    }
    return R;
}

struct Node *reverse(
    struct Node *L
) {
    return revloop(L, &dummy);
}

struct Node *bp_eva_delta(
    struct Node *Output,
    struct Node *Sigma
) {
    struct Node *L = &dummy;
    while (Output != &dummy && Sigma != &dummy) {
        int O = Output->item, S = Sigma->item;
        struct Node *N = malloc(sizeof *N);
        N->next = L, N->item = S * O * (1 - O);
        L = N;
        Output = Output->next, Sigma = Sigma->next;
    }
    return reverse(L);
}

struct Node *cons(
    int item,
    struct Node *next
) {
    struct Node *N = malloc(sizeof *N);
    N->next = next, N->item = item;
    return N;
}

void test_sum_0(
    void
) {
    struct Node *Output =
        cons(1, cons(2, cons(3, cons(4, &dummy))));
    struct Node *Sigma =
        cons(3, cons(4, cons(5, cons(6, &dummy))));
    struct Node *R;
    int N;
    for (N = 100*1000*1000; N > 0; N--) {
        /* Nothing is ever freed: C has no GC to reclaim these nodes. */
        R = bp_eva_delta(Output, Sigma);
    }
    (void)R;
}

int main(void) {
    clock_t t0, t1;
    t0 = clock();
    test_sum_0();
    t1 = clock();
    printf("%g\n", (t1 - t0)/(double)CLOCKS_PER_SEC);
    return 0;
}
> Erlang spends 29 s and C++ spends 2.78 s.
>
> Why is Erlang so much slower than C++?
On the contrary, why is C so staggeringly slow compared
with Erlang, Clean, and SML? (On my desktop machine, that
is. On my laptop, it ran for a LONG time and then other
things started dying. Hint: no GC.)
There are at least five differences between your C++
and Erlang examples:
(1) List processing vs array processing.
(2) Memory allocation costs (malloc() can be S L O W).
(3) Static type system.
(4) Truncating arithmetic.
(5) Loop unrolling.
and there may be an issue of
(6) native code compilation vs emulated code.
The SML, Clean, C, and C++ programs use *truncating*
integer arithmetic. The Erlang program uses unbounded
integer arithmetic, with no prospect of overflow. It
takes extra time to be ready for that.
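A quick shell illustration (mine, not part of the benchmark)
of what that readiness buys:

1> 1 bsl 100.                    %% 2^100, computed exactly
1267650600228229401496703205376
2> 16#7FFFFFFF * 16#7FFFFFFF.    %% would overflow a 32-bit int in C
4611686014132420609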
The fast Erlang code doesn't use a list; it uses an
*unrolled* list:
-type urlist(T) :: {T,T,T,T,urlist(T)}
                 | {T,T,T} | {T,T} | {T} | {}.
For example, {1,2,3,4,{5,6,7,8,{}}}.
The Erlang (unrolled data) code does this, with manual
loop unrolling. I have library code for unrolled
strict lists in Haskell, Clean, and SML, but not (yet)
for Erlang.
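To make the idea concrete, here is a sketch of what
bp_eva_delta could look like over the urlist type above
(my sketch, not the code behind the 2.8-second figure):

%% Each four-element node is handled in one clause, so the loop
%% body does four multiplications per recursive call.  Being
%% body-recursive, it builds the result in order: no final reverse.
bp_eva_delta_u({O1,O2,O3,O4,Os}, {S1,S2,S3,S4,Ss}) ->
    {S1*O1*(1-O1), S2*O2*(1-O2), S3*O3*(1-O3), S4*O4*(1-O4),
     bp_eva_delta_u(Os, Ss)};
bp_eva_delta_u({O1,O2,O3}, {S1,S2,S3}) ->
    {S1*O1*(1-O1), S2*O2*(1-O2), S3*O3*(1-O3)};
bp_eva_delta_u({O1,O2}, {S1,S2}) ->
    {S1*O1*(1-O1), S2*O2*(1-O2)};
bp_eva_delta_u({O1}, {S1}) ->
    {S1*O1*(1-O1)};
bp_eva_delta_u({}, {}) ->
    {}.

For the four-element inputs in the benchmark the whole result
is a single five-tuple instead of four cons cells.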
Thinking about unrolling is fair because this is something
that C and C++ compilers routinely do these days.
>
> Or did I not configure the right parameters?
Assuming we are using similar machines, it is possible that
your Erlang code was running emulated, not native.
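One way to check, assuming an OTP build with HiPE support
(the module name below is just a placeholder):

1> c(test_sum, [native, {hipe, [o3]}]).   %% native-compile via HiPE
{ok,test_sum}
2> code:is_module_native(test_sum).       %% true only if it really is native
true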