surprising result with hipe compilation

Wed Nov 5 17:37:53 CET 2003

Ulf Wiger wrote:
  > I called timer:tc(...) several times and picked the fastest one.
  > 
  > >If one repeats the timer:tc call, the runtime for both BEAM and
  > >native code is reduced to normal levels, and native code is
  > >consistently (for your code) faster than BEAM.
  > 
  > This is not what happens on my machine (a 400 MHz Ultra 10):
  > .... DELETED ....
  > When compiled with hipe, the code runs significantly slower.

Ulf, I have looked at your program and have trouble obtaining the
behaviour that you are observing.  When compiled to native code,
the code is consistently 20-40 % faster than BEAM (and arguably
more than that).  There is indeed a variation in the times that
are reported; see below.

What I get here is:

1. ON SPARC
-----------
@hamberg [~/HiPE/tests/uffe] uname -a
SunOS hamberg.it.uu.se 5.9 Generic_112233-08 sun4u sparc SUNW,Ultra-80
@hamberg [~/HiPE/tests/uffe] ~/HiPE/otp/bin/erlc *.erl
@hamberg [~/HiPE/tests/uffe] ~/HiPE/otp/bin/erl
Erlang (BEAM) emulator version 5.4.2003.10.26 [source] [hipe]

Eshell V5.4.2003.10.26  (abort with ^G)
1> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[4722,4789,4791,4824,4835,4859,4873,4943,4958,4960,5035,5394,5444,5654,101111]
2> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[4594,4699,4699,4761,4785,4816,4840,4847,4899,4900,4904,4982,5068,5569,6389]
3> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[4591,4650,4711,4732,4782,4787,4792,4797,4838,4846,4871,4958,4980,5030,5179]
4> halt().

@hamberg [~/HiPE/tests/uffe] ~/HiPE/otp/bin/erlc +native *.erl
@hamberg [~/HiPE/tests/uffe] ~/HiPE/otp/bin/erl
Erlang (BEAM) emulator version 5.4.2003.10.26 [source] [hipe]

Eshell V5.4.2003.10.26  (abort with ^G)
1> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[3225,3265,3314,3354,3379,3405,3427,3475,3500,3557,3590,3638,3645,3987,295080]
2> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[3194,3288,3328,3432,3480,3483,3502,3525,3532,3534,3540,3719,3884,3972,4086]
3> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[3190,3231,3279,3348,3359,3365,3419,3461,3474,3574,3685,3779,3990,4213,4446]

Things to notice
  - loading native code takes about 3x longer (slowest run: 295080 vs
    101111 microseconds)
  - one can argue that Ulf's benchmark is indeed a random-number
    generator, but one can more or less claim that:
       - Times for BEAM are in the range [4591 - 5000]
       - Times for HiPE are in the range [3190 - 3700]
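
To make such ranges less of an eyeball estimate, the sorted lists can be
reduced to a minimum, a median and a maximum, with one untimed warm-up call
so that code loading does not show up as an outlier.  A minimal sketch; the
module name bench_summary and the functions summarise/1 and run/4 are made up
for illustration and are not part of Ulf's benchmark:

```erlang
-module(bench_summary).
-export([summarise/1, run/4]).

%% Summarise a non-empty list of timings in microseconds, e.g. values
%% obtained with element(1, timer:tc(M, F, A)).
%% Returns {Min, Median, Max}; for an even number of samples the lower
%% of the two middle values is used as the median.
summarise(Times) when Times =/= [] ->
    Sorted = lists:sort(Times),
    Median = lists:nth((length(Sorted) + 1) div 2, Sorted),
    {hd(Sorted), Median, lists:last(Sorted)}.

%% Call M:F with arguments A once as an untimed warm-up (so that code
%% loading is excluded), then time N further calls and summarise them.
run(M, F, A, N) when N > 0 ->
    _ = timer:tc(M, F, A),
    summarise([element(1, timer:tc(M, F, A)) || _ <- lists:seq(1, N)]).
```

With this, bench_summary:run(test, run, [], 15) would replace the list
comprehension used in the shell sessions above.  Applied to the third SPARC
list of each session, summarise/1 gives {4591,4797,5179} for BEAM and
{3190,3461,4446} for HiPE.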

2. ON x86
---------
@fan [~/HiPE/tests/uffe] uname -a
Linux fan.it.uu.se 2.4.20-20.9custom #1 SMP Tue Nov 4 21:55:46 CET 2003 i686 i686 i386 GNU/Linux
@fan [~/HiPE/tests/uffe] ~/HiPE/otp-x86/bin/erlc *.erl
@fan [~/HiPE/tests/uffe] ~/HiPE/otp-x86/bin/erl       
Erlang (BEAM) emulator version 5.4.2003.10.26 [source] [hipe]

Eshell V5.4.2003.10.26  (abort with ^G)
1> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[1304,1316,1319,1321,1328,1332,1332,1353,1356,1369,1392,1422,1439,1619,14387]
2> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[1275,1276,1292,1316,1328,1338,1348,1360,1376,1380,1384,1409,1421,1461,1590]
3> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[1260,1261,1270,1291,1304,1318,1327,1342,1348,1348,1359,1393,1413,1483,1522]
4> halt().

@fan [~/HiPE/tests/uffe] ~/HiPE/otp-x86/bin/erlc +native *.erl
@fan [~/HiPE/tests/uffe] ~/HiPE/otp-x86/bin/erl               
Erlang (BEAM) emulator version 5.4.2003.10.26 [source] [hipe]

Eshell V5.4.2003.10.26  (abort with ^G)
1> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[843,865,866,873,880,894,897,902,908,917,919,923,951,979,52391]
2> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[863,872,917,920,934,941,957,958,959,960,975,983,1043,1078,1357]
3> lists:sort([element(1,timer:tc(test,run,[])) || _ <- lists:seq(1,15) ]).
[857,859,870,871,881,884,895,906,911,917,931,936,950,962,1091]

More or less we get a similar picture here.
       - Times for BEAM are in the range [1260 - 1600]
       - Times for HiPE are in the range [ 850 - 1100]

Some more comments:
  - The benchmark reads data from a file which is handled as a stream.
    Performing I/O can give a big fluctuation in times.  Ideally,
    the benchmark should be re-written so that the data is read once
    from the file (converted to a list or binary), and only the time
    to process the data is reported.
  - timer:tc is NOT the best possible way to measure time, since it
    reports elapsed wall-clock time; ideally, a more accurate time
    measurement should be used.
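
The first suggestion could look roughly as follows: read the whole file into
a binary before any clock is started, then time only the processing step.
This is a sketch under assumptions, not a rewrite of Ulf's code; the module
name bench_io and the function time_processing/3 are hypothetical, and Fun
stands in for whatever the benchmark does with its input.  It still uses
timer:tc, so it addresses the I/O fluctuation only, not the accuracy of the
clock.

```erlang
-module(bench_io).
-export([time_processing/3]).

%% Read File completely into a binary first, so that no file I/O is
%% included in the measurement.  Then apply Fun to the binary N times,
%% timing each call with timer:tc/3, and return the sorted list of
%% times in microseconds.
%% Fun is a placeholder for the processing part of the benchmark.
time_processing(File, Fun, N) when is_function(Fun, 1), N > 0 ->
    {ok, Data} = file:read_file(File),
    lists:sort([element(1, timer:tc(erlang, apply, [Fun, [Data]]))
                || _ <- lists:seq(1, N)]).
```

A call would then be something like
bench_io:time_processing("input.dat", fun my_mod:process/1, 15), where both
the file name and my_mod:process/1 are placeholders.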

Kostis