[erlang-questions] Strange optimization result

Mon Oct 22 09:48:32 CEST 2007

--- Steve Vinoski <vinoski@REDACTED> wrote:

> Anders sent me his code and I ran it on my 8-core Linux box, with the
> following performance results. VICTORY is right! :-)
> 
> real    0m1.904s
> user    0m7.917s
> sys     0m1.185s
> 
> Like I mentioned to Anders in private email, it's nice to have someone
> more experienced with Erlang finally taking a look at this; I'm still a
> relative newbie.
>
> One thing I've liked about this entire exercise is that the early
> attempts at solving Tim Bray's Wide Finder in Erlang were taking minutes
> to execute and were providing only partial answers. Several of us then
> started whittling away at it, and because of the richness of the
> language, we had a variety of different avenues to explore. Over time,
> we've vastly increased the performance of our solutions. Anders's
> solution now beats Ruby on the same machine by about 0.3s, and because
> of the way it uses multiple cores, it will likely execute extremely
> quickly and efficiently when Tim gets a chance to try it on his T5120.
>
> Yes, fast solutions in other languages were quickly found, but those
> had almost nowhere to go beyond their initial forms in terms of
> improvement, not because they were already so fast, but because the
> languages ran out of alternatives. This is especially true when it
> comes to taking advantage of the T5120's many cores. I'm a fan of many
> languages, including Ruby, Python, Perl, and C++, all of which have
> figured prominently in the collection of various Wide Finder solutions.
> But for my money, Erlang has fulfilled Tim's original wishes the best,
> which is to take the best possible advantage of all those cores.

Well done, everyone. You've chewed this over pretty
well, I'd say, and it has been interesting to see how
things have improved over time. Here are a couple of
thoughts on further improvements:

1. Native code compilation? It's a bit hit-and-miss,
but this could be the sort of problem that gains from
it.

2. The speedup is about 4.16 on 8 cores (if I'm reading
things right: user/real = 7.917/1.904). What is the
bottleneck? Too small an input, too much I/O, or is
there something that could be improved or tuned further?

And for language fans, the language itself could be a
bit more helpful. While we haven't emphasized regexp
crunching, it still seems like things could be easier.
Here are some quick thoughts.

0. Not having to write the Boyer-Moore stuff by hand

1. Working with binaries like strings should be easier

2. Reading lines from files

3. Appropriate data chunking for file processing
(people tried everything from a few KB to several MB
per chunk -- could the system figure out an appropriate
size on its own?)

4. Perhaps a streaming interface would be even better?
Jay Nelson suggested one a few years ago.

5. When looking at the use of dictionaries, it struck
me that ets:update_counter/3 could have avoided the
use of dictionaries and merging altogether. But, alas,
there is the well-known snag that if the key does not
exist in the table, you need to insert it yourself.
Which means you get a race condition that, as far as I
can see, ets can't handle safely. (Or could I be more
creative?)

6. More intuitive APIs -- it's been instructive, and a
bit alarming, to see how people outside the "core"
community have had to struggle with false starts on
this. More and better documentation and tutorials.
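
On point 5, one possible way around the missing-key race is to
pair ets:update_counter/3 with ets:insert_new/2, which inserts
only if the key is absent and does so atomically. What follows is
a minimal sketch, not a tested Wide Finder solution. It assumes
insert_new/2 is available in your release, and the module name
counter_sketch, the function names bump/2 and demo/0, and the
sample key are made up for illustration.

```erlang
-module(counter_sketch).
-export([bump/2, demo/0]).

%% Increment the counter stored under Key, creating it if needed.
%% Tab is assumed to be a public set table of {Key, Count} tuples.
bump(Tab, Key) ->
    try
        ets:update_counter(Tab, Key, 1)
    catch
        error:badarg ->
            %% Key is missing. insert_new/2 is atomic: if another
            %% process got there first it returns false and leaves
            %% the existing count alone. Either way the key exists
            %% afterwards, so the retry succeeds.
            ets:insert_new(Tab, {Key, 0}),
            ets:update_counter(Tab, Key, 1)
    end.

%% Eight workers bump the same (hypothetical) key 1000 times each.
demo() ->
    Tab = ets:new(hits, [set, public]),
    Parent = self(),
    Key = <<"/ongoing/x">>,
    Pids = [spawn(fun() ->
                      [bump(Tab, Key) || _ <- lists:seq(1, 1000)],
                      Parent ! {done, self()}
                  end) || _ <- lists:seq(1, 8)],
    %% Wait for every worker before reading the final count.
    [receive {done, P} -> ok end || P <- Pids],
    ets:lookup(Tab, Key).
```

Calling counter_sketch:demo() should return a single tuple whose
count is 8000 if no increments are lost. The catch clause also
fires when the table itself is missing, in which case insert_new/2
raises badarg again and the error propagates, which seems
acceptable for a sketch.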

Best,
Thomas
