[erlang-questions] pre-load large data files when the application start

Fri Mar 25 23:32:59 CET 2016

On Fri, Mar 25, 2016 at 11:19 PM Michael Truog <mjtruog@REDACTED> wrote:

> On 03/25/2016 02:33 PM, Benoit Chesneau wrote:
>
>
>
> On Friday, March 25, 2016, Michael Truog <mjtruog@REDACTED> wrote:
>
>>
>> Having the build process generate the module file and the beam file seems
>> decent.  There isn't a need to build the module dynamically (during
>> runtime, upon startup) or store the unicode data in global storage due to
>> the unicode changes being infrequent.   Then, if you do need to update due
>> to unicode changes, you can always hot-load a new version of the module,
>> during runtime and the usage of the module shouldn't have problems with
>> that, if it is kept as a simple utility/library module.  This problem
>> reminds me of the code at https://github.com/rambocoder/unistring and
>> there might be overlap in the goals of these two repositories.
>>
>
>
> This is what the current release (1.2) does, but it doesn't compile in
> containers or on machines with =< 1GB of RAM: the build crashes. This is
> why I'm looking at shipping a pre-compiled beam, or maybe including the
> data in a db. But for now my tests with a db file (ets) show it's much
> slower: 30-40ms vs 6ms using maps and a pre-compiled beam. Also, maps use
> less storage compared to simply using function pattern matching in the beam.
>
> - benoît
>
> I think you need to switch to using function pattern matching, when
> keeping it in a module to keep memory usage down.  Storing everything in a
> map has to deal with a big chunk of map data, but storing everything in the
> module as function pattern matching cases is just part of the module data
> (should be better for GC due to less heap usage and should be more
> efficient).  You probably want to try and keep all the function pattern
> matching cases in-order, though it isn't mentioned as helpful at
> http://erlang.org/doc/efficiency_guide/functions.html#id67975 (might
> affect the compiler execution, if not the efficiency of the pattern
> matching).  If you process the Unicode CSV data more formally, perhaps
> with a Python script (instead of awk/shell utilities; a single script is
> also more portable), creating the Erlang module will be easier.  If
> necessary, you could use more than a single
> Erlang module to deal with separate functions, but a single function should
> require a single module to keep its update atomic (not trying to split a
> function into multiple modules based on the input).
>
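
A minimal sketch of the generation step suggested in the quoted message,
written in Python. It is not the generator erlang-idna uses. The module name
`unidata_sketch`, the function `lowercase/1`, and the choice of the simple
lowercase mapping (field 13 of a UnicodeData.txt-style line) are assumptions
made for illustration only. It assumes semicolon-separated input and emits one
function clause per code point, sorted, with a catch-all clause last.

```python
def generate_module(unicode_data, module="unidata_sketch"):
    """Return Erlang source for a hypothetical lowercase/1 lookup function.

    `unicode_data` is text in the semicolon-separated UnicodeData.txt
    layout, where field 0 is the code point and field 13 is the simple
    lowercase mapping (empty when there is none).
    """
    mapping = {}
    for line in unicode_data.splitlines():
        fields = line.split(";")
        if len(fields) < 14 or not fields[13]:
            continue  # malformed line or no lowercase mapping
        mapping[int(fields[0], 16)] = int(fields[13], 16)

    lines = [
        f"-module({module}).",
        "-export([lowercase/1]).",
        "",
    ]
    # Emit the clauses in code point order, as suggested in the thread.
    for cp in sorted(mapping):
        lines.append(f"lowercase(16#{cp:04X}) -> 16#{mapping[cp]:04X};")
    # Catch-all clause: code points without a mapping are returned as-is.
    lines.append("lowercase(CP) -> CP.")
    return "\n".join(lines) + "\n"


# Three sample lines, deliberately out of order.
SAMPLE = (
    "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;\n"
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041\n"
)

if __name__ == "__main__":
    print(generate_module(SAMPLE), end="")
```

The resulting text could be written to a `.erl` file and compiled as part of
the build. The sketch only covers generating the source; it does nothing about
the amount of RAM the compiler needs for a very large module.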

I agree pattern matching should probably be better than maps for GC
(the maps are only 1ms faster on lookup). But the problem is really not
generating the module:
https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl

The current issue with the above is the amount of RAM needed to compile the
beam. If the application is built on a machine with RAM =< 1GB it will
fail.  I guess I could just generate the beam with pattern matching and
ship it like I do in the "precompiled" branch. Unless someone comes up with
a better idea, I think I will go with it. What do you think? The annoying
thing is having to do the `-on_load` hack (just because I'm lazy). Using
rebar or erlang.mk I would just generate it and ship it in the ebin dir. But
rebar3 doesn't copy any content from it to its _build directory :|

- benoît