[erlang-questions] Strange optimization result

Mon Oct 22 15:00:36 CEST 2007

On 10/22/07, Thomas Lindgren <thomasl_erlang@REDACTED> wrote:
>
> --- Steve Vinoski <vinoski@REDACTED> wrote:
>
> > Anders sent me his code and I ran it on my 8-core
> > Linux box, with the
> > following performance results. VICTORY is right! :-)
> >
> > real    0m1.904s
> > user    0m7.917s
> > sys     0m1.185s
> >
> > Like I mentioned to Anders in private email, it's
> > nice to have someone more
> > experienced with Erlang finally taking a look at
> > this; I'm still a relative
> > newbie.
> > One thing I've liked about this entire exercise is
> > that the early attempts
> > at solving Tim Bray's Wide Finder in Erlang were
> > taking minutes to execute
> > and were providing only partial answers. Several of
> > us then started
> > whittling away at it, and because of the richness of
> > the language, we had a
> > variety of different avenues to explore. Over time,
> > we've vastly increased
> > the performance of our solutions. Anders's solution
> > now beats Ruby on the
> > same machine by about 0.3s, and because of the way
> > it uses multiple cores,
> > it will likely execute extremely quickly and
> > efficiently when Tim gets a
> > chance to try it on his T5120.
> >
> > Yes, fast solutions in other languages were quickly
> > found, but those had
> > almost nowhere to go beyond their initial forms in
> > terms of improvement, not
> > because they were already so fast, but because the
> > languages ran out of
> > alternatives. This is especially true when it comes
> > to taking advantage of
> > the T5120's many cores. I'm a fan of many languages,
> > including Ruby, Python,
> > Perl, and C++, all of which have figured prominently
> > in the collection of
> > various Wide Finder solutions. But for my money,
> > Erlang has fulfilled Tim's
> > original wishes the best, which is to take the best
> > possible advantage of
> > all those cores.
>
> Well done, everyone. You've chewed this over pretty
> well, I'd say, and it has been interesting to see how
> things have improved over time. Here are a couple of
> thoughts on further improvements:
>
> 1. Native code compilation? It's a bit hit-and-miss,
> but this could be the sort of problem that gains from
> it.
>
> 2. The speedup is 4.15 on 8 cores (if I'm reading
> things right: user/real). What is the bottleneck? Too
> small input, too much I/O, or is there something that
> could be improved or tuned further?
>
> And for language fans, the language itself could be a
> bit more helpful. While we haven't emphasized regexp
> crunching, it still seems like things could be easier.
> Here are some quick thoughts.
>
> 0. Not having to write the Boyer-Moore stuff by hand
>
> 1. Working with binaries like strings should be easier
>
> 2. Reading lines from files
>
> 3. Appropriate data chunking for file processing
> (people tried all from a few KB to several MB per
> chunk -- could the system figure out an appropriate
> size on its own?)
>
> 4. Perhaps a streaming interface would be even better?
> Jay Nelson suggested one a few years ago.
>
> 5. When looking at the use of dictionaries, it struck
> me that ets:update_counter/3 could have avoided the
> use of dictionaries and merging altogether. But, alas,
> there is the well-known snag that if the key does not
> exist in the table, you need to insert it yourself.
> Which means you get a race condition that, as far as I
> can see, ets can't handle safely. (Or could I be more
> creative?)
>
> 6. More intuitive APIs -- it's been instructive, and a
> bit alarming, to see how people outside the "core"
> community have had to struggle with false starts on
> this. More and better documentation and tutorials.
>
> Best,
> Thomas
>

I made the change to ets:update_counter last night. On my dual-core
laptop it improved my previously reported timing of
real    0m6.305s
user    0m10.149s
sys     0m0.380s

to
real    0m5.094s
user    0m8.781s
sys     0m0.352s

To avoid the update_counter race condition I have only one process
that all workers report their matches to.
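
The single-writer arrangement described above could be sketched roughly as
follows. This is not the code from the thread: the module name match_counter,
the message shapes, and the function names are all assumptions made for
illustration. Because only one process ever touches the table, the "insert if
the key is missing" step cannot interleave with another update.

```erlang
-module(match_counter).
-export([start/0, report/2, totals/1, stop/1]).

%% Hypothetical sketch: one process owns the ETS table and is its only
%% writer, so check-then-insert cannot race with another process.

start() ->
    spawn(fun() ->
        Tab = ets:new(match_counts, [set, private]),
        loop(Tab)
    end).

%% Called by workers. Asynchronous, so workers never wait on the counter.
report(Counter, Key) ->
    Counter ! {match, Key},
    ok.

%% Synchronous read of all {Key, Count} pairs currently in the table.
totals(Counter) ->
    Ref = make_ref(),
    Counter ! {totals, self(), Ref},
    receive
        {Ref, List} -> List
    end.

stop(Counter) ->
    Counter ! stop,
    ok.

loop(Tab) ->
    receive
        {match, Key} ->
            bump(Tab, Key),
            loop(Tab);
        {totals, From, Ref} ->
            From ! {Ref, ets:tab2list(Tab)},
            loop(Tab);
        stop ->
            ok
    end.

%% ets:update_counter/3 raises badarg when Key is absent,
%% so fall back to inserting the first count.
bump(Tab, Key) ->
    try
        ets:update_counter(Tab, Key, 1)
    catch
        error:badarg ->
            ets:insert(Tab, {Key, 1})
    end.
```

Workers would call match_counter:report(Counter, Key) for each match and one
caller would collect match_counter:totals(Counter) at the end. Later OTP
releases also provide ets:update_counter/4, which takes a default object to
insert when the key is missing and so removes the need for the try/catch.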

I had some problems with native compilation but finally got it working,
and the result is:
real    0m2.192s
user    0m3.260s
sys     0m0.336s
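
For comparing these runs with the user/real figure used in the quoted message,
a small helper along these lines would do. The module name cpu_ratio and the
run labels are made up for illustration; the timings are the ones reported in
this thread. The ratio is only a rough indicator of how many cores were kept
busy, since it ignores sys time.

```erlang
-module(cpu_ratio).
-export([ratio/2, report/0]).

%% user/real: the same rough parallelism measure used in the quoted message.
ratio(UserSecs, RealSecs) when RealSecs > 0 ->
    UserSecs / RealSecs.

%% Timings copied from this thread; the labels are descriptive only.
report() ->
    Runs = [{"8-core Linux box",                     7.917, 1.904},
            {"dual core, before ets:update_counter", 10.149, 6.305},
            {"dual core, with ets:update_counter",   8.781, 5.094},
            {"dual core, native compiled",           3.260, 2.192}],
    [{Label, ratio(User, Real)} || {Label, User, Real} <- Runs].
```

Calling cpu_ratio:report() returns a list of {Label, Ratio} tuples, one per
run, in the order listed.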

/Anders