[erlang-questions] pre-load large data files when the application start

Michael Truog <>
Fri Mar 25 19:49:12 CET 2016


On 03/25/2016 11:11 AM, Benoit Chesneau wrote:
>
>
> On Fri, Mar 25, 2016 at 7:06 PM Garrett Smith < <mailto:>> wrote:
>
>     On Fri, Mar 25, 2016 at 12:09 PM Benoit Chesneau < <mailto:>> wrote:
>
>         Hi all,
>
>         I have a large data file provided as comma-separated values (unicode data). I need to load and parse it as early as possible, since it will be used by all the functions.
>
>
>     What's the interface?
>
>         The current implementation parses the file and generates either a source file or an include file that is then compiled. My issue with it for now is that the compilation uses more than 1GB of memory and then crashes on small machines or containers.
>
>         Other solutions I tried:
>
>         - use merl + `-on_load` to build a module on first call of the module (too slow the first time)
>         - store an ETS file and load it later, which can be an issue if you later need to create an escript with all the modules
>         - load and parse in a gen_server (same result as using merl)
>
>         Things I have in mind:
>
>         - generate a DETS file or small binary tree on disk and cache the content on demand
>         - generate a beam and ship it
>
>         Is there anything else I can do?  I am curious how others handle this case.
>
>
>     I think this depends entirely on your interface :)
>
>     Do you have to scan the entire table? If so, why? If not, why not treat this as an indexing problem and start from disk, assuming you can defer loading of any data until it's read?
>
>
>
> Sorry, I should have just posted the code I was working on (the advantage of working on open source stuff).
>
> The code I'm referring to is here: https://github.com/benoitc/erlang-idna
> and the recent change I described: https://github.com/benoitc/erlang-idna/tree/precompile
>
> The table really needs to be in memory somehow, or needs to be accessed very fast while reading, since it will be used to encode any domain name used in a request (can be XMPP, HTTP, ...).
>
> It basically checks the code for each char in a string and tries to compose/decompose it.
>
> - benoît
>
>
> _______________________________________________
> erlang-questions mailing list
> 
> http://erlang.org/mailman/listinfo/erlang-questions

Having the build process generate the module file and the beam file seems decent.  There is no need to build the module dynamically (at runtime, upon startup) or to store the unicode data in global storage, since the unicode data changes infrequently.  Then, if you do need to update for a unicode change, you can always hot-load a new version of the module at runtime; users of the module shouldn't have problems with that, as long as it is kept as a simple utility/library module.  This problem reminds me of the code at https://github.com/rambocoder/unistring and there might be overlap in the goals of these two repositories.
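To make the suggestion concrete, here is a minimal sketch of what such a build-time generated module could look like. The module name (idna_table), the lookup/1 function, and the sample mappings are all hypothetical illustrations, not code from erlang-idna; a real build step would emit one clause per code point from the CSV data. Because the mappings are compile-time literals, the loader keeps them in the module's constant pool, so lookups are fast and nothing needs to be parsed or stored globally at runtime:

```erlang
%% idna_table.erl -- HYPOTHETICAL generated module (names and data
%% are illustrative only; a build script would emit the real clauses
%% from the unicode CSV file).
-module(idna_table).
-export([lookup/1]).

%% One function clause per code point. The compiler turns this into
%% an efficient dispatch; the binaries live in the constant pool and
%% are shared, not copied per process.
lookup($a)      -> {valid, <<"a">>};
lookup($A)      -> {mapped, <<"a">>};
lookup(16#00DF) -> {mapped, <<"ss">>};   %% example: a one-to-many mapping
lookup(_)       -> disallowed.
```

When the unicode data changes, the build regenerates the source, and the new beam can be hot-loaded with the usual `code:purge/1` followed by `code:load_file/1`; since the module is a pure lookup table with no state, running callers simply pick up the new version on their next fully qualified call.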