[erlang-questions] Erlang Cost Model

Thu Sep 17 19:04:32 CEST 2015

On 09/18, zxq9 wrote:
>These are costs specific to a particular operation happening on its own 
>time, but tell you nothing meaningful about your system as a whole 
>because all of this stuff is going on at once in different processes at 
>different times. Sometimes you are in the situation where doubling your 
>processing speed just means adding more cores. Sometimes not. You will 
>either *know* at the outset of a project or *have no idea whether this 
>is true* until you actually have something up and running that you can 
>measure. Usually it is the latter.
>

If you have any sizeable data set, picking a better algorithm can 
improve speed by orders of magnitude, which is a lot more worthwhile 
than adding cores (that only divides run time by a constant factor, at 
best).

If you have an O(n²) program running on 48 cores, and an O(log n) one on 
one core, any significant input will leave your one-core program going 
faster sooner or later.
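
To put rough numbers on that, here is a minimal sketch. It assumes both 
programs cost exactly one unit per step and that the O(n²) one splits 
perfectly across cores, which is far kinder to the 48-core box than real 
life would be. The module and function names are made up for the 
example.

```erlang
-module(crossover).
-export([quadratic_on_cores/2, logarithmic_on_one_core/1, ratio/2]).

%% Idealized cost of an O(n^2) job split perfectly across Cores.
%% Assumption: constant factor of 1 and zero coordination overhead.
quadratic_on_cores(N, Cores) when N > 1, Cores > 0 ->
    (N * N) / Cores.

%% Idealized cost of an O(log n) job on a single core.
%% Assumption: constant factor of 1.
logarithmic_on_one_core(N) when N > 1 ->
    math:log2(N).

%% How many times more work the parallel quadratic job does than the
%% single-core logarithmic one. Above 1.0, the single core is ahead.
ratio(N, Cores) ->
    quadratic_on_cores(N, Cores) / logarithmic_on_one_core(N).
```

Under those assumptions, crossover:ratio(8, 48) is below 1.0, so the 48 
cores are still ahead on a tiny input, while crossover:ratio(1000000, 48) 
is on the order of a billion. Real constant factors move the crossover 
point around, but they do not make it go away.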

Disregarding the time and memory complexity of your algorithms because 
~cores~ is just a plain terrible design approach. Hell, you also have 
to care about memory, network, etc., which are often addressed by the 
complexity of algorithms too.

All in all, you've got N units of work to do, and processors let you do 
M of them at once, all cores summed up. The best way to optimize is to 
find how to reduce the amount of work by a lot rather than augment the 
number of cores.

Not that you should spit on cores.

>Very often you will find yourself able to approach a concurrent ideal 
>after you've already got something implemented that does basically what 
>you want, but not before. This is true whether or not you've got months 
>of paid time to toy with an idea (HA HA! Like *that* ever happens!). It 
>is nearly always faster to experiment with a prototype in Erlang than 
>to just muse about it until the concept is perfect. Once you have a 
>prototype, tweaking it is easy, and when that exists you can already 
>measure stuff for real. This is why I am calling the *unqualified* 
>utility of cost models into question.
>

That is 100% unrelated to knowing which data structures are already 
available and what their properties are. If you've got 300,000 pieces 
of data to store, you don't need months of prototyping to realize 
orddicts are a bad idea and maps a better one. And if you often want 
the smallest of these 300,000 elements, then maybe gb_trees are a 
better choice than maps, because the doc says you can get it in 
O(log n) while maps require you to do an O(n) scan.
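
As a rough illustration of that choice, here is a sketch using only 
stdlib calls (orddict, maps, gb_trees). The module name, the demo/0 
function and the made-up data set of squares are assumptions for the 
example, not anything from a real system.

```erlang
-module(smallest).
-export([demo/0]).

demo() ->
    %% 300,000 key/value pairs, already sorted by key.
    Pairs = [{K, K * K} || K <- lists:seq(1, 300000)],

    %% orddict: a sorted list underneath, so a lookup near the end
    %% walks most of the 300,000 entries.
    Dict = orddict:from_list(Pairs),
    {ok, _} = orddict:find(300000, Dict),

    %% maps: fast lookups, but no "smallest key" operation, so we
    %% scan every key to find the minimum.
    Map = maps:from_list(Pairs),
    MinFromMap = lists:min(maps:keys(Map)),

    %% gb_trees: keys are kept in order, so smallest/1 only follows
    %% the left edge of the tree.
    Tree = gb_trees:from_orddict(Pairs),
    {MinFromTree, _Value} = gb_trees:smallest(Tree),

    {MinFromMap, MinFromTree}.
```

Both approaches return the same key; the difference is how much of the 
structure each one has to visit to get there, and you can read that off 
the documentation before writing a single line of prototype.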

You may find out you need a fancy custom data structure after 
prototyping, but in no way do you find yourself in a bad situation for 
knowing the complexity of tools you already have available.