[erlang-questions] pre-load large data files when the application start

Benoit Chesneau bchesneau@REDACTED
Fri Mar 25 19:03:25 CET 2016


On Fri, Mar 25, 2016 at 6:50 PM zxq9 <zxq9@REDACTED> wrote:

> On Friday, 25 March 2016 17:08:49 Benoit Chesneau wrote:
> > Hi all,
> >
> > I have a large data file provided as comma separated values (unicode data)
> > I need to load and parse it ASAP since it will be used by all the
> > functions.
>
> ...snip...
>
> > Is there anything else I can do?  I am curious how others are doing in that
> > case.
>
> Does it all need to be in memory all the time?
>
> Based on whether or not this is true and the context of use, I opt for
>
> - generate a smaller, more Erlangish version of the dataset
>   (what you're doing with DETS, for example)
> - load it into a database that is a common resource
>   (not always an option)
> - write a routine that makes smarter use of file reads than loading
>   everything at once -- this can be surprisingly fast, even in Erlang,
>   and be made to utilize a fixed amount of memory
>   (but is not always a good fit for the problem)
>
> But use-case drives everything.
>
> Honestly, you're one of the guys I tend to grep posts from when looking for
> answers to my own questions, so I reckon my ideas above are things you have
> already considered.
>
> Also, with regard to datasets in general, if there is any way to rule out
> any of the data on load, a combination of a filter + a constant memory
> read-in can be a big win if you do need it all in memory, but have some
> criteria by which the data you need all at once can be reduced (again,
> though, not always the case).
>
> -Craig
>
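
The "filter + constant memory read-in" idea quoted above could be sketched
roughly as follows. This is only an illustration under assumptions: the module
name csv_stream, the function fold_lines/4 and the 64 KB read-ahead size are
made up for the example and do not come from the thread.

```erlang
-module(csv_stream).   % hypothetical module name
-export([fold_lines/4]).

%% Fold over a file one line at a time. Only lines for which Keep(Line)
%% returns true are passed to Fun. Memory use is bounded by the read-ahead
%% buffer plus whatever Fun chooses to accumulate.
fold_lines(Path, Keep, Fun, Acc0) ->
    %% raw + read_ahead keeps file:read_line/1 reasonably fast.
    case file:open(Path, [read, binary, raw, {read_ahead, 65536}]) of
        {ok, Fd} ->
            try
                loop(Fd, Keep, Fun, Acc0)
            after
                file:close(Fd)
            end;
        {error, _} = Err ->
            Err
    end.

loop(Fd, Keep, Fun, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            %% Line is a binary and still carries its trailing newline.
            case Keep(Line) of
                true  -> loop(Fd, Keep, Fun, Fun(Line, Acc));
                false -> loop(Fd, Keep, Fun, Acc)
            end;
        eof ->
            {ok, Acc};
        {error, _} = Err ->
            Err
    end.
```

For example, Keep could reject comment lines and Fun could insert the parsed
row into an ETS table, so that only the rows actually needed end up in memory.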


Heh, thanks :)

DETS sounds like a good idea. I haven't tried it yet; doing it now.
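
For reference, a minimal sketch of what the DETS route might look like: build
the table once from the parsed rows, then open it read-only at startup. The
module name dets_cache, the table names and the {Key, Value} row shape are
assumptions for illustration, not something taken from the idna code.

```erlang
-module(dets_cache).   % hypothetical module name
-export([build/2, lookup/2]).

%% Build step: store a list of {Key, Value} tuples in a DETS file.
%% Intended to run once, e.g. at build time, after parsing the CSV.
build(File, Rows) ->
    {ok, Tab} = dets:open_file(dets_cache_build, [{file, File}, {type, set}]),
    ok = dets:insert(Tab, Rows),
    dets:close(Tab).

%% Lookup step: open the existing file read-only, look up one key, close.
%% A real application would more likely keep the table open in a process.
lookup(File, Key) ->
    {ok, Tab} = dets:open_file(dets_cache_read, [{file, File}, {access, read}]),
    try
        dets:lookup(Tab, Key)
    after
        dets:close(Tab)
    end.
```

Note that every dets:lookup/2 is a disk access, so whether this is fast enough
depends on how hot the lookups are.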

In the meantime I tried another idea that I still need to test on different
platforms:

- I build the beam first and place it in the priv dir
- when the application is loaded, I load the beam into memory using
`code:load_binary/3`

The code is here:
https://github.com/benoitc/erlang-idna/tree/precompile

The beam is generated using a simple shell file for now:
https://github.com/benoitc/erlang-idna/blob/precompile/mkdata.sh

and loaded here:
https://github.com/benoitc/erlang-idna/blob/precompile/src/idna_unicode.erl#L3
https://github.com/benoitc/erlang-idna/blob/precompile/src/idna_unicode.erl#L216-L226
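
As a rough illustration of the loading step (this is not the code from the
links above), reading a precompiled beam from the priv dir and handing it to
`code:load_binary/3` could look something like this. The module name
precompiled_loader and the assumption that the file is named priv/<Mod>.beam
are hypothetical.

```erlang
-module(precompiled_loader).   % hypothetical module name
-export([load_from_priv/2, load_beam/2]).

%% Locate priv/<Mod>.beam inside application App and load it.
load_from_priv(App, Mod) ->
    case code:priv_dir(App) of
        {error, bad_name} ->
            {error, {no_priv_dir, App}};
        PrivDir ->
            load_beam(Mod, filename:join(PrivDir, atom_to_list(Mod) ++ ".beam"))
    end.

%% Read the beam file into a binary and load it into the running VM.
load_beam(Mod, BeamPath) ->
    case file:read_file(BeamPath) of
        {ok, Bin} ->
            %% Returns {module, Mod} on success or {error, What} on failure.
            code:load_binary(Mod, BeamPath, Bin);
        {error, Reason} ->
            {error, {read_failed, BeamPath, Reason}}
    end.
```

Such a function could be called from the application's start callback or from
an -on_load hook; which of the two behaves better across upgrades is exactly
the open question.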

Tests pass. Right now the only annoying thing is the need to also include
that separate beam file when you escriptize, but I guess I can live with
it.

I also need to check what the impact of such a trick is when you need to
upgrade the application (i.e. how the -on_load attribute is handled). Maybe
I don't need that. Just copying the beam to the correct ebin dir should be
enough. Doing it for all the build tools around (erlang.mk, rebar3, rebar, ...)
makes it hard though.

Thoughts?

- benoit