[erlang-questions] pre-load large data files when the application start

Sat Mar 26 16:53:41 CET 2016

On 03/26/2016 04:37 PM, Benoit Chesneau wrote:
>
>
> On Sat, Mar 26, 2016 at 10:31 AM Roger Lipscombe <roger@REDACTED
> <mailto:roger@REDACTED>> wrote:
>
>     Is it the .erl -> .beam compilation step that runs out of memory? What
>     happens if you use compile:forms/1,2 instead? Maybe you can write a
>     relatively simple escript that turns your CSV into forms and compiles
>     those (and then writes out the beam file)? See, for example, a script
>     we use to embed binary resources as beam files:
>     https://gist.github.com/rlipscombe/770ce8fc75add11e16f1
>
>     On 25 March 2016 at 22:32, Benoit Chesneau <bchesneau@REDACTED
>     <mailto:bchesneau@REDACTED>> wrote:
>      >
>      >
>      > On Fri, Mar 25, 2016 at 11:19 PM Michael Truog <mjtruog@REDACTED
>     <mailto:mjtruog@REDACTED>> wrote:
>      >>
>      >> On 03/25/2016 02:33 PM, Benoit Chesneau wrote:
>      >>
>      >>
>      >>
>      >> On Friday, March 25, 2016, Michael Truog <mjtruog@REDACTED
>     <mailto:mjtruog@REDACTED>> wrote:
>      >>>
>      >>>
>      >>> Having the build process generate the module file and the beam
>     file seems
>      >>> decent.  There isn't a need to build the module dynamically
>     (during runtime,
>      >>> upon startup) or store the unicode data in global storage due
>     to the unicode
>      >>> changes being infrequent.   Then, if you do need to update due
>     to unicode
>      >>> changes, you can always hot-load a new version of the module,
>     during runtime
>      >>> and the usage of the module shouldn't have problems with that,
>     if it is kept
>      >>> as a simple utility/library module.  This problem reminds me of
>     the code at
>      >>> https://github.com/rambocoder/unistring and there might be
>     overlap in the
>      >>> goals of these two repositories.
>      >>
>      >>
>      >>
>      >> This is what the current release (1.2) does. But it doesn't
>      >> compile in containers or on machines with =< 1GB of RAM. The
>      >> build crashes. This is why I'm looking at shipping a
>      >> pre-compiled beam, or maybe including the data in a db. But for
>      >> now my tests with a db file (ets) show it's much slower:
>      >> 30-40ms vs 6ms using maps and a pre-compiled beam. Also, maps
>      >> use less storage compared to simply using function pattern
>      >> matching in the beam.
>      >>
>      >> - benoît
>      >>
>      >> I think you need to switch to using function pattern matching, when
>      >> keeping it in a module to keep memory usage down.  Storing
>     everything in a
>      >> map has to deal with a big chunk of map data, but storing
>     everything in the
>      >> module as function pattern matching cases is just part of the
>     module data
>      >> (should be better for GC due to less heap usage and should be more
>      >> efficient).  You probably want to try and keep all the function
>     pattern
>      >> matching cases in-order, though it isn't mentioned as helpful at
>      >> http://erlang.org/doc/efficiency_guide/functions.html#id67975
>     (might affect
>      >> the compiler execution, if not the efficiency of the pattern
>     matching).  If
>      >> you use more formal processing of the unicode CSV data it will
>     be easier,
>      >> perhaps with a python script (instead of awk/shell-utilities, also
>      >> portability is better as a single script), to create the Erlang
>     module.  If
>      >> necessary, you could use more than a single Erlang module to
>     deal with
>      >> separate functions, but a single function should require a
>     single module to
>      >> keep its update atomic (not trying to split a function into
>     multiple modules
>      >> based on the input).
>      >
>      >
>      > I agree pattern matching should probably be better than the maps
>     for GC
>      > (they are only 1ms faster on lookup). But the problem is really not
>      > generating the module:
>      >
>     https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl
>      >
>      > The current issue with the above is the amount of RAM needed to
>      > compile the beam. If the application is built on a machine with
>      > RAM =< 1GB it will fail. I guess I could just generate the beam
>      > with pattern matching and ship it like I do in the "precompiled"
>      > branch. Unless someone comes up with a better idea, I think I
>      > will go with it. What do you think? The annoying thing is having
>      > to do the `-on_load` hack (just because I'm lazy). Using rebar or
>      > erlang.mk <http://erlang.mk> I would just generate and ship it in
>      > the ebin dir. But rebar3 doesn't copy any content from it to its
>      > _build directory :|
>      >
>      >
>      > - benoît
>      >
>      >
>
>
> Michael, using integers is a good idea indeed. I am making the change
> so I won't have to do the transformations at runtime, which is
> already good.
>
> Roger, I can do that, but how would you link it to the build system? It
> seems to me I will need to provide a plugin for both rebar3 and
> erlang.mk <http://erlang.mk>. rebar3 is the most problematic there due
> to the "relative" arch it's using in the _build folder. I will have to
> find a way to move it into the right ebin. Any ideas?

If you follow his advice of writing an escript, then adding support for 
Erlang.mk should just be a matter of adding

   app:: ; escript my_escript.erl
   BEAM_FILES += ebin/generated.beam

before the include erlang.mk line. If that's not enough, ping me, 
because it's probably a bug.

The same also works with any other command or script.

Alternatively, if you do the generation directly in the Makefile, the same
advice applies, except that the first line becomes:

   app:: gen
   BEAM_FILES += ebin/generated.beam

And then define the gen target (before, after, doesn't matter). You 
could then have rebar just call "make gen".
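
For illustration, here is a minimal sketch of how the second variant might look as a complete Makefile. The project name, gen_data.escript and priv/data.csv are hypothetical placeholders, not names from this thread; only the "app:: gen" and BEAM_FILES lines come from the advice above. Recipe lines must be indented with a tab.

```makefile
PROJECT = my_app

# Hook the generation step into the app build and register the extra beam.
# Both lines go before the include line.
app:: gen
BEAM_FILES += ebin/generated.beam

include erlang.mk

# Hypothetical generator: gen_data.escript and priv/data.csv are placeholders
# for whatever script and data file turn the CSV into a beam.
.PHONY: gen
gen: ebin/generated.beam

# Rebuild the beam only when the data file or the generator script changes.
ebin/generated.beam: priv/data.csv gen_data.escript
	mkdir -p ebin
	escript gen_data.escript $< $@
```

With something like this, "make gen" can also be run on its own, which is what a rebar hook would call.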

-- 
Loïc Hoguin
http://ninenines.eu
Author of The Erlanger Playbook,
A book about software development using Erlang