[erlang-questions] pre-load large data files when the application start

Benoit Chesneau <>
Sat Mar 26 16:37:01 CET 2016


On Sat, Mar 26, 2016 at 10:31 AM Roger Lipscombe <> wrote:

> Is it the .erl -> .beam compilation step that runs out of memory? What
> happens if you use compile:forms/1,2 instead? Maybe you can write a
> relatively simple escript that turns your CSV into forms and compiles
> those (and then writes out the beam file)? See, for example, a script
> we use to embed binary resources as beam files:
> https://gist.github.com/rlipscombe/770ce8fc75add11e16f1
>
> On 25 March 2016 at 22:32, Benoit Chesneau <> wrote:
> >
> >
> > On Fri, Mar 25, 2016 at 11:19 PM Michael Truog <> wrote:
> >>
> >> On 03/25/2016 02:33 PM, Benoit Chesneau wrote:
> >>
> >>
> >>
> >> On Friday, March 25, 2016, Michael Truog <> wrote:
> >>>
> >>>
> >>> Having the build process generate the module file and the beam file
> >>> seems decent.  There isn't a need to build the module dynamically
> >>> (during runtime, upon startup) or store the unicode data in global
> >>> storage due to the unicode changes being infrequent.  Then, if you do
> >>> need to update due to unicode changes, you can always hot-load a new
> >>> version of the module during runtime, and the usage of the module
> >>> shouldn't have problems with that, if it is kept as a simple
> >>> utility/library module.  This problem reminds me of the code at
> >>> https://github.com/rambocoder/unistring and there might be overlap in
> >>> the goals of these two repositories.
> >>
> >>
> >>
> >> This is what the current release (1.2) does, but it doesn't compile in
> >> containers or on machines with =< 1GB of RAM: the build crashes. This is
> >> why I'm looking at shipping a pre-compiled beam, or maybe including the
> >> data in a db. But for now my tests with a db file (ets) show it's much
> >> slower: 30-40ms vs 6ms using maps and a pre-compiled beam. Also, maps
> >> use less storage than simply using function pattern matching in the
> >> beam.
> >>
> >> - benoît
> >>
> >> I think you need to switch to using function pattern matching, when
> >> keeping it in a module to keep memory usage down.  Storing everything
> >> in a map has to deal with a big chunk of map data, but storing
> >> everything in the module as function pattern matching cases is just
> >> part of the module data (should be better for GC due to less heap usage
> >> and should be more efficient).  You probably want to try and keep all
> >> the function pattern matching cases in-order, though it isn't mentioned
> >> as helpful at
> >> http://erlang.org/doc/efficiency_guide/functions.html#id67975 (might
> >> affect the compiler execution, if not the efficiency of the pattern
> >> matching).  If you used more formal processing of the unicode CSV data
> >> it will be easier, perhaps with a python script (instead of
> >> awk/shell-utilities, also portability is better as a single script), to
> >> create the Erlang module.  If necessary, you could use more than a
> >> single Erlang module to deal with separate functions, but a single
> >> function should require a single module to keep its update atomic (not
> >> trying to split a function into multiple modules based on the input).
> >
> >
> > I agree pattern matching should probably be better than maps for GC
> > (they are only 1ms faster on lookup). But the problem is really not
> > generating the module:
> >
> > https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl
> >
> > The current issue with the above is the amount of RAM needed to compile
> > the beam. If the application is built on a machine with RAM =< 1GB it
> > will fail. I guess I could just generate the beam with pattern matching
> > and ship it like I do in the "precompiled" branch. Unless someone comes
> > up with a better idea, I think I will go with it. What do you think? The
> > annoying thing is having to do the `-on_load` hack (just because I'm
> > lazy). Using rebar or erlang.mk I would just generate it and ship it in
> > the ebin dir. But rebar3 doesn't copy any content from it to its _build
> > directory :|
> >
> >
> > - benoît
> >
> >
>

Michael, the idea of using integers is indeed a good one. I am making the
change so I won't have to do the transformations at runtime, which is
already good.
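
To illustrate the generation step discussed above, here is a minimal sketch
in Python (as Michael suggested), not the script erlang-idna actually uses.
It assumes the input is UnicodeData-style semicolon-separated lines and emits
an Erlang module with one function clause per code point, keyed by integer
and kept in order. The module name `demo_unicode_data`, the function
`lookup/1`, and the choice of the general-category column are all
hypothetical, picked only for the example.

```python
def parse_rows(text):
    """Parse UnicodeData-style lines into (codepoint, category) pairs.

    Assumption: field 0 is the hex code point and field 2 is the general
    category, as in the Unicode Character Database's UnicodeData.txt.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(";")
        rows.append((int(fields[0], 16), fields[2]))
    return sorted(rows)  # keep the clauses in code point order


def generate_module(module_name, text):
    """Return Erlang source with one lookup/1 clause per code point."""
    lines = [
        "-module(%s)." % module_name,
        "-export([lookup/1]).",
        "",
    ]
    for codepoint, category in parse_rows(text):
        # Integer keys: no conversion is needed at lookup time.
        lines.append('lookup(%d) -> <<"%s">>;' % (codepoint, category))
    # Catch-all clause terminates the function definition.
    lines.append("lookup(_) -> undefined.")
    return "\n".join(lines) + "\n"
```

Calling `generate_module("demo_unicode_data", text)` returns the Erlang
source as a string. Writing it to a .erl file and compiling it is left to the
build, so this sketch does not by itself address the memory used by the
compile step.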

Roger, I can do that, but how would you link it to the build system? It
seems to me I will need to provide a plugin for both rebar3 and erlang.mk.
rebar3 is the most problematic there due to the "relative" arch it's using
in the _build folder. I will have to find a way to move it into the right
ebin. Any ideas?

- benoit