[erlang-questions] Retrieving "semi-constant" data from a function versus Mnesia

Joe Armstrong erlang@REDACTED
Wed May 13 12:26:54 CEST 2015

On Wed, May 13, 2015 at 1:00 AM, Peter Johansson <flexchap@REDACTED> wrote:
> Hi Joe !
> Thank you very much for a resolving answer/response indeed !
> This is my debut / first ever communication/message-interchange with you by
> the way   :-)
> ( but I have been aware of Erlang  ....you ..and some of the other
> community-active users for a few years now ).

Welcome to the Erlang mailing list, the place where we try to solve all the
world's programming problems :-)

> And also ... Thank you way ...way more for your contribution of/to a
> language being absolutely marvelously flexible & simplistic to use in real
> life prototyping/"testing out"-situations ( it even beats Python in this
> regard if you ask me ).
> It struck me yesterday that I actually have two other more "generic"
> Erlang-question as well .....but which happen to fit in completely naturally
> & closely under this current question-thread by their nature .....so I put
> them here too.
> 1:
> Are fixed data, defined & compiled inside functions, always stored inside &
> referenced from the constant-pool of the particular modules regardless of
> what Erlang term-types/structures they holds ....or does it exist special
> situations when such fixed data becomes partly or fully copied into the
> heap/stack of the calling process ?

Answer 1)

   You're not supposed to know. Your expectation should be that
fixed data will be compiled and implemented as efficiently as possible.
If this is not the case and your expectation is not met you should
complain loudly.

aside: this has happened on several occasions. If your expectations are
not met then tell us. Inefficient handling of fixed data is a bug.

Even if I told you the answer - then whatever you observe today might
not be true in a year's time.

It seems to me to be crazy to optimise code on the basis of how it
performs *today* and expect these optimisations to hold true in a few
years' time. The hardware will change.

Optimisation should be the very last thing you do.

You should

    - write clean short easy to understand code

      (the goal is that *you* can understand your own code in a year's time)


      How many of you can easily understand your own undocumented
      code a few months after you stopped working on it?

      I could have a book-long rant about this here, but the mail would
      take a year or so to compose.

      A quick 'ask-around' of my colleagues revealed that very few of them
      can easily understand their own code a few months after they stop
      working on it.

      When you work on something it's in your cerebral cache - you write
      no documentation because it's "so obvious that it needs no explanation".

      You stop working on it - flush the cache - the next time you see it
      you have to rebuild the cache - which takes a long time. Worse, somebody
      else takes over and they have no cache to rebuild.

      I think it really takes years of practice to get to the point where
      you can write code, store it, and be reasonably confident that you
      will understand it in ten to twenty years' time.

     40 years ago I wrote and distributed some code. A month
afterwards I got a bug report. I'd stopped working on the program.
There was no documentation and the code was *completely*
incomprehensible. A total rewrite with a far simpler program and
documentation was the result.

    You don't want to know how many projects I've seen that have ground into
the mud of incomprehensibility and been cancelled due to overwhelming

    Programming is all about *understanding*

    Once you understand things you *can* write efficient code - but when you
    *do* understand you won't want to.

    But I digress ...


    - measure, measure, measure
    - optimize if *absolutely* necessary

If you want your program to go a thousand times faster wait 10 years. Do
nothing and wait. This is by far the easiest way to optimize a program.

<aside> I stuck an SSD in my old macbook and doubled the memory; now it
whizzes along - I can now throw away all my failed attempts to make
indexing software etc. go faster ..

It's hardware changes that make software go faster (assuming
you've already found decent algorithms) - they that can first program
the million core NOC win. We have tens to hundreds of billions
of computers sitting in idle loops - doing nothing for 99% of the time
and we still talk about making *this* computer that I have right now in
front of me go faster.

   For a computation to take place, data and computation power
   must meet at the same place in time and space.
   This is why the "cloud" is popular - we'll send all our data and
   computations to the same place (the "cloud") - problem solved.
   Only it's not. Why move GBytes of data to the computation
   when we should move the computation to the data?
   We need to figure out how to easily move computations
    and use all these computers that are sitting around doing nothing
   rather than optimising any single computer ...

 Bit off topic, but I was talking about optimization, and I do feel we
optimize the wrong things ...

 The only thing I want to optimize is the time taken to write a program
 not the execution speed of that program (come back Prolog all is forgiven :-)



This is why having a small clean code base wins in the long run. Erlang
(today) is millions of times faster than the mid 1980s version -
this speed up has *not* come from software optimisations but from hardware

Resisting the desire to optimize requires saintly dedication - you can
optimize if and only if your program becomes clearer shorter and more

Answer 2)

Measure measure measure

Answer 3)

Wait ten years

Answer 4)

Buy/Borrow a faster machine

Answer 5)

Yes (ish) it is my understanding that fixed data is special cased to keep
it off-stack and heap (or at least it should be)
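
To make "fixed data defined & compiled inside functions" concrete, here is a
minimal sketch of such a module. The module name site_config and every key
and value in it are made up for illustration; nothing here comes from Peter's
project. The term is written as a literal, so the expectation stated above is
that the compiler and VM handle it efficiently - how they do it is
deliberately not something the code relies on.

```erlang
-module(site_config).
-export([all/0, get/1]).

%% Hypothetical "semi-constant" data, written as a literal term.
%% Where the VM keeps this term is an implementation detail.
all() ->
    #{max_upload_kb => 888,
      languages     => [<<"sv">>, <<"en">>],
      landing_page  => <<"/index.yaws">>}.

%% Look up a single key; crashes with badkey if the key is unknown.
get(Key) ->
    maps:get(Key, all()).
```

Changing the data means editing the literal and recompiling/reloading the
module, which is the "fast reads, slow writes" trade-off.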

> 2:
> In the current web-application project I work with (implemented on top of
> Yaws) I have the following type of function-call construct ( to achieve
> server-side method-dispatching )
> Variable_module_name:fixed_function_name(),
> According to the efficiency-guide of the Erlang-documentation this type of
> call-construct is more "expensive" (about six times) compare to a fully
> fixed name function-call.
> In what sense is it more expensive ?  ....is it about the time-lag between
> the point when the VM catch/discovers this call-construct and the point when
> the functional content (the prepared sub-routines) actually can be executed
> ?

measure ^ 3 (again)
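
One way to do the measuring - a rough sketch, not a rigorous benchmark. The
module name call_bench and the loop helpers are invented for this example;
only timer:tc/1 is a real library call. It times N calls through a fixed
module name against N calls through a module name held in a variable. The
numbers will depend on the machine, the OTP release and what else is running,
so run it several times with a large N before drawing any conclusion.

```erlang
-module(call_bench).
-export([run/1, fixed_function_name/0]).

%% The function being dispatched to; it does nothing on purpose.
fixed_function_name() -> ok.

%% N calls with a fully fixed module and function name.
static_loop(0) -> ok;
static_loop(N) ->
    call_bench:fixed_function_name(),
    static_loop(N - 1).

%% N calls where the module name is only known at run time.
dynamic_loop(_Mod, 0) -> ok;
dynamic_loop(Mod, N) ->
    Mod:fixed_function_name(),
    dynamic_loop(Mod, N - 1).

%% Returns the elapsed wall-clock time of each loop in microseconds.
run(N) ->
    {StaticUs, ok}  = timer:tc(fun() -> static_loop(N) end),
    {DynamicUs, ok} = timer:tc(fun() -> dynamic_loop(?MODULE, N) end),
    #{static_us => StaticUs, dynamic_us => DynamicUs}.
```

Something like call_bench:run(10000000) in the shell gives two numbers to
compare. If the difference per call is tiny next to the work a request
actually does, the dispatch style is not worth worrying about.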



> Once again ....thank you very much for contributing this language to the
> programmer-community.
> Sending my best regards !
> Peter , Lund Sverige
> 2015-05-11 14:32 GMT+02:00 Joe Armstrong <erlang@REDACTED>:
>> On Sun, May 10, 2015 at 10:32 PM, Benoit Chesneau <bchesneau@REDACTED>
>> wrote:
>> >
>> >
>> > On Sun, May 10, 2015 at 10:23 PM Joe Armstrong <erlang@REDACTED> wrote:
>> >>
>> >> How large is the total data?
>> >>
>> >> If it's small you could define this in a module and not use a database
>> >> or process at all.
>> >
>> >
>> > How small should it be? What if the compiled beam is about 888
>> > kilobytes?
>> It depends :-)
>> There are many factors involved:
>>    a) How much memory do you have
>>    b) How often do you want to change the configuration data
>>    c) How quick do you want the change to be
>> Assume
>>   a) is a smallish number of GBytes (normal these days)
>>   b) is "relatively rarely" (I don't know what this means - (aside - this
>>      is why I always ask people to provide numbers in their questions))
>>      Say once a day
>>   c) is a few seconds
>> Then it should be perfectly doable. Making a module containing all the
>> config data and recompiling when necessary is certainly the fastest way
>> to access the data - this has very fast reads but very slow writes - you
>> could think of a module as a database optimized for reads but not writes :-)
>> /Joe
>> >
>> > - benoit
