[erlang-questions] pre-load large data files when the application start

Roger Lipscombe
Sat Mar 26 10:31:13 CET 2016

Is it the .erl -> .beam compilation step that runs out of memory? What
happens if you use compile:forms/1,2 instead? Maybe you can write a
relatively simple escript that turns your CSV into forms and compiles
those (and then writes out the beam file)? See, for example, a script
we use to embed binary resources as beam files:
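
As a minimal sketch of the compile:forms/2 idea (not the script referred to above), the escript below builds a lookup module directly from abstract forms and writes the resulting beam. The module name `demo_data`, the function `lookup/1`, and the two hard-coded rows are hypothetical; a real script would read the rows from the CSV file. Whether this actually lowers peak memory for a very large table would need to be measured.

```erlang
#!/usr/bin/env escript
%% Sketch only: generate a lookup module from abstract forms, skipping the
%% .erl source step. The module name demo_data, the function lookup/1 and
%% the rows below are hypothetical stand-ins for the parsed CSV data.

main(_) ->
    Rows = [{65, "LATIN CAPITAL LETTER A"},
            {66, "LATIN CAPITAL LETTER B"}],
    Forms = forms(demo_data, Rows),
    %% compile:forms/2 returns the beam code as a binary by default.
    {ok, Mod, Beam} = compile:forms(Forms, [report_errors]),
    File = atom_to_list(Mod) ++ ".beam",
    ok = file:write_file(File, Beam),
    %% Load the fresh code and sanity-check it before shipping the file.
    {module, Mod} = code:load_binary(Mod, File, Beam),
    {ok, "LATIN CAPITAL LETTER A"} = Mod:lookup(65),
    error = Mod:lookup(0),
    io:format("wrote ~s~n", [File]).

%% One function clause per row, plus a catch-all clause returning error.
forms(Mod, Rows) ->
    L = 1,
    Clauses =
        [{clause, L, [erl_parse:abstract(K)], [],
          [erl_parse:abstract({ok, V})]} || {K, V} <- Rows]
        ++ [{clause, L, [{var, L, '_'}], [], [{atom, L, error}]}],
    [{attribute, L, module, Mod},
     {attribute, L, export, [{lookup, 1}]},
     {function, L, lookup, 1, Clauses}].
```

Saved under a name of your choice and run with escript, this leaves demo_data.beam in the current directory, ready to be copied into ebin.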

On 25 March 2016 at 22:32, Benoit Chesneau wrote:
> On Fri, Mar 25, 2016 at 11:19 PM Michael Truog wrote:
>> On 03/25/2016 02:33 PM, Benoit Chesneau wrote:
>> On Friday, March 25, 2016, Michael Truog wrote:
>>> Having the build process generate the module file and the beam file seems
>>> decent.  There isn't a need to build the module dynamically (at runtime,
>>> upon startup) or to store the Unicode data in global storage, because
>>> Unicode changes are infrequent.  If you do need to update for a Unicode
>>> change, you can always hot-load a new version of the module at runtime,
>>> and users of the module shouldn't have problems with that if it is kept
>>> as a simple utility/library module.  This problem reminds me of the code at
>>> https://github.com/rambocoder/unistring and there might be overlap in the
>>> goals of these two repositories.
>> This is what the current release (1.2) does. But it doesn't compile in
>> containers or on machines with =< 1GB of RAM: the build crashes. This is why
>> I'm looking at shipping a pre-compiled beam, or maybe including the data in
>> a db. But for now my tests with a db file (ets) show it's much slower:
>> 30-40ms vs 6ms using maps and a pre-compiled beam. Also, maps use less
>> storage than plain function pattern matching in the beam.
>> - benoît
>> I think you need to switch to using function pattern matching, when
>> keeping it in a module to keep memory usage down.  Storing everything in a
>> map has to deal with a big chunk of map data, but storing everything in the
>> module as function pattern matching cases is just part of the module data
>> (should be better for GC due to less heap usage and should be more
>> efficient).  You probably want to try and keep all the function pattern
>> matching cases in-order, though it isn't mentioned as helpful at
>> http://erlang.org/doc/efficiency_guide/functions.html#id67975 (might affect
>> the compiler execution, if not the efficiency of the pattern matching).  If
>> you use more formal processing of the Unicode CSV data, it will be easier
>> to create the Erlang module, perhaps with a Python script (instead of
>> awk/shell utilities; a single script is also more portable).  If
>> necessary, you could use more than a single Erlang module to deal with
>> separate functions, but a single function should require a single module to
>> keep its update atomic (not trying to split a function into multiple modules
>> based on the input).
> I agree pattern matching should probably be better than the maps for GC
> (they are only 1ms faster on lookup). But the problem is really not
> generating the module:
> https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl
> The current issue with the above is the amount of RAM needed to compile the
> beam. If the application is built on a machine with =< 1GB of RAM it will fail.
> I guess I could just generate the beam with pattern matching and ship it
> like I do in the "precompiled" branch. Unless someone comes up with a better
> idea, I think I will go with it. What do you think? The annoying thing is
> having to do the `-on_load` hack (just because I'm lazy). Using rebar or
> erlang.mk I would just generate it and ship it in the ebin dir. But rebar3
> doesn't copy any content from it to its _build directory :|
> - benoît