<div dir="ltr"><div class="gmail_quote"><div>On Wed, Jul 5, 2017 at 12:46 PM Kacper Mentel &lt;<a href="mailto:mentel.kk@gmail.com">mentel.kk@gmail.com</a>&gt; wrote:<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>- mean acces time to the table in a unit of time</div><div><br></div></div></div></blockquote><br></div><div class="gmail_quote">This is a pet peeve of mine: the mean access time is often misleading, and usually completely wrong.<br><br></div><div class="gmail_quote">1. You can only use the mean for something once you know the statistical model the data fits. Say we *assume* the data is normally distributed. Then we can use the mean, as long as we also report the variance. But a general rule of computer science is that data is rarely normally distributed. It is much more common that data is (bi-)modal: there is a fast case, and then a slow code path for some pathological case. Thus, the mean will land somewhere between the fast class and the slow class, at a value where there are few or no actual data points!<br><br></div><div class="gmail_quote">2. Reporting the median (50th percentile) is slightly better. But it signals "I don't care for half of my customers" in the sense that you ignore the slower half of the requests. I'm far more interested in the 90th, 95th, 99th, 99.9th, 99.99th and 99.999th percentiles, and the maximal value, than in the mean for anything I do. Tracking these is easily done with HdrHistogram (see Gil Tene's work - the idea is to make the histogram buckets follow the structure of a floating point number representation, with exponent and mantissa, which keeps the resolution high around 0.0).<br><br></div><div class="gmail_quote">3. I'm interested in a histogram over the latencies. 
But since histograms require you to pick a bin width, a kernel density plot is almost always what I go after for these.<br><br></div><div class="gmail_quote">One of the interesting things I've found is that if you plot the above, the conclusions tend to change quite a lot. For instance, an algorithm which is *really* fast in the common case may turn out to be *really* slow when it hits the slow path. It may be so slow it is unusable. But if you report the mean, the system can "hide" the slow query by amortizing it over the fast ones. I don't find this to be fair.<br><br></div><div class="gmail_quote">Another takeaway is that improving the 99th percentile tends to improve the latency curve for the system as a whole. ETS is a system in which lookups should not take more than 1-2 microseconds. But this means that bound should also hold at the 99.99th percentile.<br><br></div><div class="gmail_quote">Finally, I have a hunch the {read_concurrency, true} option will have a far greater impact on parallel access to the table if you have a large number of cores. Reporting the mean would allow the system to "hide" that it is stalling one core.<br><br></div><div class="gmail_quote">Aside: If you haven't, your work should have a section which describes how the test cases work around the problem of "coordinated omission", in which the load generator inadvertently coordinates with the system under test (for example, by waiting for a slow response before it sends the next request), so that the recorded latencies come out lower than what requests would really have experienced.<br><br></div><div class="gmail_quote">Have fun working on the project! Take or leave the above suggestions as you see fit!<br></div></div>