Performance Tests and Tuning

Tue Mar 15 14:06:54 CET 2011

Graeme,

Maybe it's just me, but it looks like the largest chunk of time is spent in
garbage collection, thus the natural way to optimize that is to make fewer
copies of things.  Also, it's natural that your list version of the Index
counter is slightly slower, tuples have faster access since they are of
fixed length.

Sean

On Tue, Mar 15, 2011 at 3:36 AM, Graeme Defty <graeme.defty@REDACTED>wrote:

> Sorry all. This is likely to be a biggish post (and may well turn out to be
> unreadable, if the tables dont reproduce well ;-)
>
> I would certainly appreciate someone commenting on the logic of my
> approach.
>
> OK - heres what I have done so far, using the Reia PEG as a test case....
>
> It seemed that fprof was the best tool to use, but it measures elapsed
> time, not CPU. I could not find a tool to give me processor time used.
> :-(    (Is that really so hard to do in a VM?!?)
>
> As a baseline, I used the last version of the memory branch that I posted.
> The top few lines of the result were as follows:
>
>   "<0.39.0>"
>
> 1940718 undefined 52034.028
>
>
> 13.2247
>
>
> Count Tot Own Unit %ageTotal
>
>  neotoma_parse p_all 4 138932 72683.399 3934.601 0.02832 7.56%
> 1.0000  neotoma_parse p_attempt 4 130960 69777.613 3884.261 0.02966 7.46%
> 0.9872  neotoma_parse p 5 51709 73383.129 3834.867 0.07416 7.37%
> 0.9747  neotoma_parse get_memo 1 51709 5872.097 2540.128 0.04912 4.88%
> 0.6456  lists foldl 3 95022 5179.214 2277.290 0.02397 4.38%
> 0.5788  neotoma_parse -p_charclass/1-fun-0-' 4 14558 6096.801 2265.258
> 0.15560 4.35%
> 0.5757  proplists lookup 2 137078 3229.624 2216.337 0.01617 4.26%
> 0.5633  neotoma_parse -p_string/1-fun-0-' 4 110587 4273.063 2200.807
> 0.01990 4.23%
> 0.5593  neotoma_parse p_advance_index 2 95022 10534.828 2080.734 0.02190
> 4.00%
> 0.5288  neotoma_parse memoize 2 42376 4736.719 1954.612 0.04613 3.76%
> 0.4968  neotoma_parse p_scan 4 53795 72608.579 1629.444 0.03029 3.13%
> 0.4141  unicode characters_to_list 1 47433 3260.534 1588.838 0.03350 3.05%
> 0.4038  neotoma_parse -p_not/1-fun-0-' 3 37517 3341.361 1537.576 0.04098
> 2.95%
> 0.3908  neotoma_parse -p_anything/0-fun-0-' 2 34785 9531.222 1458.524
> 0.04193 2.80%
> 0.3707  neotoma_parse memo_table_name 0 95476 1899.382 1354.767 0.01419
> 2.60%
> 0.3443  proplists is_defined 2 74994 1764.019 1212.677 0.01617 2.33%
> 0.3082  garbage_collect 537
> 1 1039.164 1039.164 1039.16400 2.00%
> 0.2641  re run 3 14558 1056.997 994.229 0.06829 1.91%
> 0.2527  ets lookup 2 52794 1155.080 950.885 0.01801 1.83%
> 0.2417  ets insert 2 42676 1240.016 877.454 0.02056 1.69%
> 0.2230
> For those unfamiliar with fprof, 'count' is the number of times a function
> is executed, Tot is the Total time spent in a function, including time in
> the called functions, and Own is the time spent in a function EXcluding
> called functions (the bit in which we are most interested)
>
> The figures on the top line level with <0,39,0> are the totals for the
> whole process.
>
> The 'unit' column I added, and shows the 'own' time for a function divided
> by the number of times it is executed (i.e. the unit time for an execution
> ;-) ) and the % of total column shows the Own time as a percentage of the
> total own time.
>
> The bad news, as can be seen from this %age column, is that the is no
> particular 'hotspot', so no obvious place to focus our efforts.
>
> SO . . . I just started at the top of the list. The function p_all seemed
> the biggest user, but looking at the code, t is not easy to see how to make
> it any slicker. The same is true of p_attempt and p.
>
> The first line of interest is that for get_memo. I have wondered for a
> while about whether it would be better to use a proplist to store the
> caching information, so that was my first experiment. My initial try showed
> an improvement of about 50% reduction in time, then I remembered this was
> elapsed time, and so highly sensitive to the resources being used on my
> machine at the time.
>
> To combat this variable, the best I could think of was to scale everything
> relative to the p_all time (which I did not expect to be improving any).
> This is what you see in the far right column, where the own time for each
> function is shown as a fraction of the p_all time.
>
> *Memoisation*
> Having changed the memoisation routines to use a proplist, I got the
> following results:
>   "<0.39.0>"
>
> 1940528 undefined 26105.719
> 50.17%
> 11.9094
>
>
> Count Tot Own Unit %ageTotal
>
>  neotoma_parse p_all 4 138932 49375.193 2192.023 0.01578 8.40%
> 1.0000  neotoma_parse p 5 51709 49669.187 1948.556 0.03768 7.46%
> 0.8889  neotoma_parse p_attempt 4 130960 47517.007 1896.901 0.01448 7.27%
> 0.8654  neotoma_parse get_memo 1 51709 4467.727 1271.97 0.02460 4.87%
> 0.5803  neotoma_parse memoize 2 42376 3598.339 1195.318 0.02821 4.58%
> 0.5453  neotoma_parse p_advance_index 2 95022 7174.674 1161.685 0.01223
> 4.45%
> 0.5300  lists foldl 3 95022 3465.169 1090.941 0.01148 4.18%
> 0.4977  neotoma_parse -p_string/1-fun-0-' 4 110587 2499.068 1051.826
> 0.00951 4.03%
> 0.4798  neotoma_parse p_scan 4 53795 49261.369 957.463 0.01780 3.67%
> 0.4368  proplists lookup 2 137078 2055.560 953.497 0.00696 3.65%
> 0.4350  re run 3 14558 1030.919 844.034 0.05798 3.23%
> 0.3850  unicode characters_to_list 1 47433 2274.426 708.93 0.01495 2.72%
> 0.3234  neotoma_parse memo_table_name 0 95476 1422.423 676.268 0.00708
> 2.59%
> 0.3085  garbage_collect 537
> 1 674.150 669.530 1.00000 2.56%
> 0.3054  neotoma_parse -p_not/1-fun-0-' 3 37517 2014.170 630.088 0.01679
> 2.41%
> 0.2874  ets lookup 2 52794 1037.220 611.673 0.01159 2.34%
> 0.2790  proplists is_defined 2 74994 1209.329 579.755 0.00773 2.22%
> 0.2645  ets insert 2 42676 936.285 577.493 0.01353 2.21%
> 0.2635  neotoma_parse -p_charclass/1-fun-0-' 4 14558 3818.591 564.022
> 0.03874 2.16%
> 0.2573  neotoma_parse -p_anything/0-fun-0-' 2 34785 6316.239 489.329
> 0.01407 1.87%
> 0.2232
>
> As can be seen, there is some improvement in get_memo (down from 0.6456 to
> 0.5803), sadly there is a corresponding INcrease in memoize (up from 0.4968
> to 0.5453) which reduces the gain to just 0.05 p_all execution-equivalents
> (if you will excuse my somewhat clumsy scale). Actually, the impacts are
> complex, because all sorts of other things have shifted, so I concentrated
> on the figure at the top which shows the total elapsed time in terms of
> p_all executions. This has reduced from 13.225 to 11.909, a reduction of
> about 10%. Not stunning, but a worthwhile improvement.
>
> The next obvious target was advance_index. It seemed to me that the index
> structure was somewhat over-engineered, so I tried replacing it with a
> simple dotted pair ([x|y]). HAS to produce some benefits, right?
>
> Well the results were as follows:
>
>   "<0.39.0>"
>
> 1940530 undefined 26095.127
> 50.15%
> 13.4288
>
>
> Count Tot Own Unit %ageTotal
>
>  neotoma_parse p_all 4 138932 51173.965 1943.224 0.01399 7.45%
> 1.0000  neotoma_parse p 5 51709 51657.472 1923.512 0.03720 7.37%
> 0.9899  neotoma_parse p_attempt 4 130960 49330.829 1855.634 0.01417 7.11%
> 0.9549  neotoma_parse get_memo 1 51709 4568.310 1265.543 0.02447 4.85%
> 0.6513  neotoma_parse p_advance_index 2 95022 7860.237 1165.753 0.01227
> 4.47%
> 0.5999  proplists lookup 2 137078 2279.512 1160.885 0.00847 4.45%
> 0.5974  lists foldl 3 95022 3769.014 1107.37 0.01165 4.24%
> 0.5699  neotoma_parse -p_string/1-fun-0-' 4 110587 2909.453 1047.395
> 0.00947 4.01%
> 0.5390  neotoma_parse memoize 2 42376 3365.578 962.101 0.02270 3.69%
> 0.4951  neotoma_parse p_scan 4 53795 51087.806 910.642 0.01693 3.49%
> 0.4686  re run 3 14558 1021.338 885.071 0.06080 3.39%
> 0.4555  neotoma_parse -p_charclass/1-fun-0-' 4 14558 4297.158 766.53
> 0.05265 2.94%
> 0.3945  garbage_collect 537
> 1 738.733 732.901 1.00000 2.81%
> 0.3772  unicode characters_to_list 1 47433 2487.362 723.41 0.01525 2.77%
> 0.3723  neotoma_parse memo_table_name 0 95476 1529.930 663.84 0.00695
> 2.54%
> 0.3416  ets lookup 2 52794 1100.060 629.197 0.01192 2.41%
> 0.3238  proplists is_defined 2 74994 1355.632 606.665 0.00809 2.32%
> 0.3122  ets insert 2 42676 876.211 588.44 0.01379 2.25%
> 0.3028  neotoma_parse -p_not/1-fun-0-' 3 37517 2047.480 547.493 0.01459
> 2.10%
> 0.2817  neotoma_parse -p_anything/0-fun-0-' 2 34785 6542.401 545.422
> 0.01568 2.09%
> 0.2807
> So the total has gone from 13.2247 to 13.4288. An INCREASE of about 1.5% !
> ! !
>
> Hmmm - Well maybe dotted pair handling is not so great after all. I will
> try a 2-item list next.
>
> But you can see that this does not look like it is going to be easy. One
> way to improve things is to tighten up the structure of the PEG. I am sure I
> could simplify it, especially around the definition of the lowest level
> elements, and thus reduce the number of times all these key functions are
> executed (p_all, p, etyc.)  but that seems a bit like putting the cart
> before the horse. I do not really want to bend my PEG (if you will excuse
> the expression) for good performance, but of course if that is what it takes
> . . .
>
> All ideas gratefully received.
>
> g
> _____________________________________________
> On 15 March 2011 03:56, Tony Arcieri <tony.arcieri@REDACTED> wrote:
>
>> Any plans for further performance improvements in the roadmap? :)
>>
>>
>> On Sun, Mar 13, 2011 at 4:25 PM, Sean Cribbs <seancribbs@REDACTED>wrote:
>>
>>> It's been a year coming, so I'm proud to announce a new version of
>>> Neotoma, the PEG/Packrat toolkit and parser-generator.  The primary
>>> feature of version 1.5 is that it uses significantly less memory and
>>> also supports UTF8 (via binaries), thanks to some awesome work by
>>> Graeme Defty, who has also converted Reia to use a PEG.
>>>
>>> You can get the latest source on Github:
>>> https://github.com/seancribbs/neotoma
>>>
>>> Neotoma is rebar-friendly as well, so adding it to a rebar project is
>>> a simple deps declaration like so:
>>>
>>>    {neotoma, "1.5", {git, "git://github.com/seancribbs/neotoma", {tag,
>>> "1.5"}}}
>>>
>>> Hope to see some of you at Erlang Factory SF next week.
>>>
>>> Cheers,
>>>
>>> Sean Cribbs
>>>
>>
>>
>>
>> --
>> Tony Arcieri
>> Medioh! Kudelski
>>
>
>