[erlang-questions] Strange optimization result

Sun Oct 21 22:12:20 CEST 2007

On 10/21/07, Anders Nygren <anders.nygren@REDACTED> wrote:
>
> On 10/21/07, Steve Vinoski <vinoski@REDACTED> wrote:
> > On 21 Oct 2007 11:41:12 +0200, Bjorn Gustavsson <bjorn@REDACTED>
> > wrote:
> > > "Anders Nygren" <anders.nygren@REDACTED> writes:
> > > >
> > > > My only idea is that this is caused by some pathological CPU cache
> > > > behaviour. Does anyone have a better explanation?
> > >
> > > Could be. I have another GUESS: using a tuple instead of dict will
> > > result in a smaller heap size for the spawned process, which will lead
> > > to more frequent garbage collections. You could use the spawn_opt()
> > > BIF with the min_heap_size option.
> > >
> > > To get a similar heap size as when using a dict, use
> > >
> > >         erts_debug:size(wfbm3:init()).
> > >
> > > which is 1120.
> > >
> >
> > I didn't try this or Bob's suggestion of simulating a tuple, but I did
> > eliminate a dict lookup in the code that finds shift values, and also
> > changed the macros to use hard-coded string lengths (since the strings
> are
> > fixed) rather than recalculating them with length/1, and got another 25%
> > speedup:
> >
> > <http://steve.vinoski.net/blog/2007/10/21/faster-wf-still/>
> >
> >
>
> So I then took Steve's latest and
> - changed to using a tuple instead of dict and VICTORY :)

<snip/>

Anders sent me his code and I ran it on my 8-core Linux box, with the
following performance results. VICTORY is right! :-)

real    0m1.904s
user    0m7.917s
sys     0m1.185s

Like I mentioned to Anders in private email, it's nice to have someone more
experienced with Erlang finally taking a look at this; I'm still a relative
newbie.
One thing I've liked about this entire exercise is that the early attempts
at solving Tim Bray's Wide Finder in Erlang were taking minutes to execute
and were providing only partial answers. Several of us then started
whittling away at it, and because of the richness of the language, we had a
variety of different avenues to explore. Over time, we've vastly increased
the performance of our solutions. Anders's solution now beats Ruby on the
same machine by about 0.3s, and because of the way it uses multiple cores,
it will likely execute extremely quickly and efficiently when Tim gets a
chance to try it on his T5120.

Yes, fast solutions in other languages were quickly found, but those had
almost nowhere to go beyond their initial forms in terms of improvement, not
because they were already so fast, but because the languages ran out of
alternatives. This is especially true when it comes to taking advantage of
the T5120's many cores. I'm a fan of many languages, including Ruby, Python,
Perl, and C++, all of which have figured prominently in the collection of
various Wide Finder solutions. But for my money, Erlang has fulfilled Tim's
original wishes the best, which is to take the best possible advantage of
all those cores.

Anders, I look forward to even more speed when you finish those GC tweaks!
:-)

thanks,
--steve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20071021/efb13931/attachment.htm>