[erlang-questions] pre-load large data files when the application start
Loïc Hoguin
essen@REDACTED
Sat Mar 26 16:53:41 CET 2016
On 03/26/2016 04:37 PM, Benoit Chesneau wrote:
>
>
> On Sat, Mar 26, 2016 at 10:31 AM Roger Lipscombe <roger@REDACTED> wrote:
>
> Is it the .erl -> .beam compilation step that runs out of memory? What
> happens if you use compile:forms/1,2 instead? Maybe you can write a
> relatively simple escript that turns your CSV into forms and compiles
> those (and then writes out the beam file)? See, for example, a script
> we use to embed binary resources as beam files:
> https://gist.github.com/rlipscombe/770ce8fc75add11e16f1
>
> On 25 March 2016 at 22:32, Benoit Chesneau <bchesneau@REDACTED> wrote:
> >
> >
> > On Fri, Mar 25, 2016 at 11:19 PM Michael Truog <mjtruog@REDACTED> wrote:
> >>
> >> On 03/25/2016 02:33 PM, Benoit Chesneau wrote:
> >>
> >>
> >>
> >> On Friday, March 25, 2016, Michael Truog <mjtruog@REDACTED> wrote:
> >>>
> >>>
> >>> Having the build process generate the module file and the beam file seems
> >>> decent. There isn't a need to build the module dynamically (during runtime,
> >>> upon startup) or store the unicode data in global storage due to the unicode
> >>> changes being infrequent. Then, if you do need to update due to unicode
> >>> changes, you can always hot-load a new version of the module, during runtime
> >>> and the usage of the module shouldn't have problems with that, if it is kept
> >>> as a simple utility/library module. This problem reminds me of the code at
> >>> https://github.com/rambocoder/unistring and there might be overlap in the
> >>> goals of these two repositories.
> >>
> >>
> >>
> >> This is what the current release (1.2) does. But it doesn't compile in
> >> containers or machines with =< 1GB of RAM. The build crashes. This is why
> >> I'm looking at shipping a pre-compiled beam, or maybe including the data in
> >> a db. But for now my tests with a db file (ets) show it's really slower:
> >> 30-40ms vs 6ms using maps and a pre-compiled beam. Also, maps use less
> >> storage compared to simply using function pattern matching in the beam.
> >>
> >> - benoît
> >>
> >> I think you need to switch to using function pattern matching, when
> >> keeping it in a module to keep memory usage down. Storing everything in a
> >> map has to deal with a big chunk of map data, but storing everything in the
> >> module as function pattern matching cases is just part of the module data
> >> (should be better for GC due to less heap usage and should be more
> >> efficient). You probably want to try and keep all the function pattern
> >> matching cases in-order, though it isn't mentioned as helpful at
> >> http://erlang.org/doc/efficiency_guide/functions.html#id67975 (might affect
> >> the compiler execution, if not the efficiency of the pattern matching). If
> >> you used more formal processing of the unicode CSV data it will be easier,
> >> perhaps with a python script (instead of awk/shell-utilities, also
> >> portability is better as a single script), to create the Erlang module. If
> >> necessary, you could use more than a single Erlang module to deal with
> >> separate functions, but a single function should require a single module to
> >> keep its update atomic (not trying to split a function into multiple modules
> >> based on the input).
> >
> >
> > I agree pattern matching should probably be better than the maps for GC
> > (they are only 1ms faster on lookup). But the problem is really not
> > generating the module:
> >
> > https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl
> >
> > The current issue with the above is the amount of RAM needed to compile the
> > beam. If the application is built on a machine with RAM =< 1GB it will fail.
> > I guess I could just generate the beam with pattern matching and ship it
> > like I do in the "precompiled" branch. Unless someone comes up with a better
> > idea, I think I will go with it. What do you think? The annoying thing is
> > having to do the `-on_load` hack (just because I'm lazy). Using rebar or
> > erlang.mk I would just generate and ship it in the ebin dir. But rebar3
> > doesn't copy any content from it to its _build directory :|
> >
> >
> > - benoît
> >
> >
>
>
> Michael, using integers is indeed a good idea. I am making the
> change so I won't have to do the transformations while running, which
> is already good.
>
> Roger, I can do that, but how would you link it to the build system? It
> seems to me I will need to provide a plugin for both rebar3 and
> erlang.mk. rebar3 is the most problematic there due
> to the "relative" arch it's using in the _build folder. I will have to
> find a way to move it into the right ebin. Any ideas?

If you follow his advice of writing an escript, then adding support for
Erlang.mk should just be a matter of adding

    app:: ; escript my_escript.erl
    BEAM_FILES += ebin/generated.beam

before the "include erlang.mk" line. If that's not enough, ping me,
because it's probably a bug.

The same also works with any other command or script.

Alternatively, if you do it directly in the Makefile, the same advice
applies, except that you replace the first line, giving:

    app:: gen
    BEAM_FILES += ebin/generated.beam

And then define the gen target (before or after, it doesn't matter). You
could then have rebar just call "make gen".
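
For illustration, a complete Makefile using the gen variant might look roughly like the sketch below. Several things in it are assumptions rather than part of the advice above: the project name my_app, the input file priv/data.csv, and the premise that my_escript.erl writes ebin/generated.beam are all hypothetical. The .DEFAULT_GOAL line is also an addition: GNU make otherwise takes the first target it sees as the default, which here would be app rather than all.

```makefile
# Hypothetical project name, for illustration only.
PROJECT = my_app

# Precaution (not part of the original advice): keep a bare "make"
# running "all" even though "app" is declared before erlang.mk is included.
.DEFAULT_GOAL := all

# Hook the generation step into the normal application build.
app:: gen

# Tell Erlang.mk about the generated beam file.
BEAM_FILES += ebin/generated.beam

.PHONY: gen
gen: ebin/generated.beam

# Rebuild the beam only when the data or the script changes.
# priv/data.csv is a hypothetical input; the escript is assumed
# to write ebin/generated.beam itself.
ebin/generated.beam: priv/data.csv my_escript.erl
	mkdir -p ebin
	escript my_escript.erl

include erlang.mk
```

With a file target behind gen, calling "make gen" repeatedly (for example from another build tool) only reruns the escript when its inputs have changed.
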
--
Loïc Hoguin
http://ninenines.eu
Author of The Erlanger Playbook,
A book about software development using Erlang