<div><span class="gmail_quote">On 10/20/07, <b class="gmail_sendername">Anders Nygren</b> <<a href="mailto:email@example.com">firstname.lastname@example.org</a>> wrote:</span><blockquote class="gmail_quote" style="margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi<br>I am playing around with Tim Bray's widefinder problem.<br>Actually I am playing with Steve V's latest version<br>see <a href="http://steve.vinoski.net/blog">http://steve.vinoski.net/blog</a>/2007/10/18/ok-just-one-more
-wf/<br><br>In it he is using a dict for Boyer-Moore searching, the dict is just<br>a map from char code to shift amount i.e. the keys are 1 - 255<br>and the values are ints.<br>I figured that it would be faster to just have a tuple with the shift values.
<br>So I changed the code to generate a tuple instead of a dict. And the<br>lookups changed from<br>dict:fetch(C1, Tbl)<br>to<br>element(C1,Tbl)<br><br>To my big surprise the program now runs slower.<br>On a file with 1M rows approx. 192 M.
<br><br>Steve's original code, using dics, as reported by time<br>real 0m20.444s<br>user 0m37.718s<br>sys 0m0.612s<br><br>My modified, using a tuple<br>real 0m25.849s<br>user 0m44.203s<br>sys 0m3.332s
<br><br>Interesting to note is the high value for sys in my version.<br>Also the tuple version does not manage to max out the CPU use<br>it peaks at ~85%, while the dict version gets 100% CPU load.<br>The program starts 64 erlang processes on my dual core laptop.
<br><br>My only idea is that this is caused by some pathological CPU cache<br>behaviour. Does anyone have a better explanation?<br></blockquote></div><br><div>Hi Anders, I don't know the answer, but I agree that it seems strange. I wrote some tests that did simple lookups against dicts and tuples in a loop and timed them, and tuples were faster, but when I used them in my Wide Finder code, they were slower, just as you saw.