[erlang-questions] Strange optimization result

Thomas Lindgren <>
Mon Oct 22 09:48:32 CEST 2007

--- Steve Vinoski <> wrote:

> Anders sent me his code and I ran it on my 8-core Linux box, with the
> following performance results. VICTORY is right! :-)
>
> real    0m1.904s
> user    0m7.917s
> sys     0m1.185s
>
> Like I mentioned to Anders in private email, it's nice to have someone
> more experienced with Erlang finally taking a look at this; I'm still a
> relative newbie.
>
> One thing I've liked about this entire exercise is that the early
> attempts at solving Tim Bray's Wide Finder in Erlang were taking minutes
> to execute and were providing only partial answers. Several of us then
> started whittling away at it, and because of the richness of the
> language, we had a variety of different avenues to explore. Over time,
> we've vastly increased the performance of our solutions. Anders's
> solution now beats Ruby on the same machine by about 0.3s, and because
> of the way it uses multiple cores, it will likely execute extremely
> quickly and efficiently when Tim gets a chance to try it on his T5120.
>
> Yes, fast solutions in other languages were quickly found, but those had
> almost nowhere to go beyond their initial forms in terms of improvement,
> not because they were already so fast, but because the languages ran out
> of alternatives. This is especially true when it comes to taking
> advantage of the T5120's many cores. I'm a fan of many languages,
> including Ruby, Python, Perl, and C++, all of which have figured
> prominently in the collection of various Wide Finder solutions. But for
> my money, Erlang has fulfilled Tim's original wishes the best, which is
> to take the best possible advantage of all those cores.

Well done, everyone. You've chewed this over pretty
well, I'd say, and it has been interesting to see how
things have improved over time. Here are a couple of
thoughts on further improvements:

1. Native code compilation? It's a bit hit-and-miss,
but this could be the sort of problem that gains from

2. The speedup is 4.15 on 8 cores (if I'm reading
things right: user/real). What is the bottleneck? Too
small input, too much I/O, or is there something that
could be improved or tuned further?
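
For anyone who wants to redo the arithmetic in item 2, here is a minimal sketch in Erlang. It takes the user/real ratio from the timings quoted above as the speedup, which is only a rough reading (it ignores sys time and is not a comparison against a sequential run). The module and function names are made up for illustration.

```erlang
-module(speedup_sketch).
-export([speedup/2, efficiency/3]).

%% Rough speedup estimate: CPU seconds spent in user mode divided by
%% wall-clock seconds, as reported by time(1).
speedup(UserSecs, RealSecs) ->
    UserSecs / RealSecs.

%% Fraction of the available cores that the speedup corresponds to.
efficiency(UserSecs, RealSecs, Cores) ->
    speedup(UserSecs, RealSecs) / Cores.
```

With the numbers above, speedup_sketch:speedup(7.917, 1.904) comes out at just under 4.16, and speedup_sketch:efficiency(7.917, 1.904, 8) at roughly 0.52, so about half of the 8 cores are kept busy on average.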

And for language fans, the language itself could be a
bit more helpful. While we haven't emphasized regexp
crunching, it still seems like things could be easier.
Here are some quick thoughts.

0. Not having to write the Boyer-Moore stuff by hand

1. Working with binaries like strings should be easier

2. Reading lines from files

3. Appropriate data chunking for file processing
(people tried everything from a few KB to several MB
per chunk -- could the system figure out an
appropriate size on its own?)

4. Perhaps a streaming interface would be even better?
Jay Nelson suggested one a few years ago.

5. When looking at the use of dictionaries, it struck
me that ets:update_counter/3 could have avoided the
use of dictionaries and merging altogether. But, alas,
there is the well-known snag that if the key does not
exist in the table, you need to insert it yourself.
Which means you get a race condition that, as far as I
can see, ets can't handle safely. (Or could I be more

6. More intuitive APIs -- it's been instructive, and a
bit alarming, to see how people outside the "core"
community have had to struggle with false starts on
this. More and better documentation and tutorials.
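
On item 5, one workaround that may be worth trying is to call ets:insert_new/2 before every ets:update_counter/3. insert_new/2 is atomic and leaves an existing object alone, so concurrent writers should not be able to lose an update as long as nobody deletes keys at the same time. This is only a sketch under that assumption, not something taken from any of the Wide Finder entries; the module name, table name and key are made up for illustration.

```erlang
-module(counter_sketch).
-export([bump/2, demo/0]).

%% Increment the counter stored under Key, creating it first if needed.
bump(Tab, Key) ->
    %% Atomic: inserts {Key, 0} only if Key is absent, otherwise
    %% returns false and leaves the existing counter untouched.
    ets:insert_new(Tab, {Key, 0}),
    %% The key is now guaranteed to exist (assuming no concurrent deletes).
    ets:update_counter(Tab, Key, 1).

%% Eight processes each bump the same key 1000 times in a shared table.
demo() ->
    Tab = ets:new(hits, [set, public]),
    Key = <<"example-key">>,            % hypothetical key
    Parent = self(),
    Workers = [spawn_link(fun() ->
                   [bump(Tab, Key) || _ <- lists:seq(1, 1000)],
                   Parent ! {done, self()}
               end) || _ <- lists:seq(1, 8)],
    %% Wait for every worker before reading the final count.
    [receive {done, W} -> ok end || W <- Workers],
    [{Key, Count}] = ets:lookup(Tab, Key),
    ets:delete(Tab),
    Count.
```

counter_sketch:demo() should return 8000 if no increments are lost. The cost is an extra ets call per update, and all writers share one public table, so whether this beats per-process dictionaries plus a merge would have to be measured. Later OTP releases also provide ets:update_counter/4, which takes a default object to insert when the key is missing.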

