[erlang-questions] pre-load large data files when the application start
Benoit Chesneau
bchesneau@REDACTED
Fri Mar 25 19:03:25 CET 2016
On Fri, Mar 25, 2016 at 6:50 PM zxq9 <zxq9@REDACTED> wrote:
> On Friday, 25 March 2016 17:08:49 Benoit Chesneau wrote:
> > Hi all,
> >
> > I have a large data file provided as comma separated values (unicode
> > data). I need to load and parse it ASAP since it will be used by all
> > the functions.
>
> ...snip...
>
> > Is there anything else I can do? I am curious how others are doing in
> > that case.
>
> Does it all need to be in memory all the time?
>
> Based on whether or not this is true and the context of use, I opt for
>
> - generate a smaller, more Erlangish version of the dataset
> (what you're doing with DETS, for example)
> - load it into a database that is a common resource
> (not always an option)
> - write a routine that makes smarter use of file reads than loading
> everything at once -- this can be surprisingly fast, even in Erlang,
> and be made to utilize a fixed amount of memory
> (but is not always a good fit for the problem)
>
> But use-case drives everything.
>
> Honestly, you're one of the guys I tend to grep posts from when looking for
> answers to my own questions, so I reckon my ideas above are things you have
> already considered.
>
> Also, with regard to datasets in general, if there is any way to rule out
> any of the data on load, a combination of a filter + a constant memory
> read-in can be a big win if you do need it all in memory, but have some
> criteria by which the data you need all at once can be reduced (again,
> though, not always the case).
>
> -Craig
>
Heh, thanks :)
DETS sounds like a good idea. I haven't tried it yet. Doing it now.
In the meantime I tried another idea that I still need to test on different
platforms:
- I build the beam first and place it in the priv dir
- when the application is loaded, I load the beam into memory using
`code:load_binary/3`
The code is here:
https://github.com/benoitc/erlang-idna/tree/precompile
The beam is generated using a simple shell file for now:
https://github.com/benoitc/erlang-idna/blob/precompile/mkdata.sh
and loaded here:
https://github.com/benoitc/erlang-idna/blob/precompile/src/idna_unicode.erl#L3
https://github.com/benoitc/erlang-idna/blob/precompile/src/idna_unicode.erl#L216-L226
Tests pass. Right now the only annoying thing with it is the need to also
include that separate beam file when you escriptize, but I guess I can live
with it.
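For readers who do not want to follow the links, here is a minimal sketch of the load step described above. It is an illustration only, not the code from the linked branch: the module name `precompiled_loader` and the function names `load/2` and `load_from/2` are hypothetical, and it assumes the precompiled file is named `<Mod>.beam` and sits directly in the application's priv dir.

```erlang
-module(precompiled_loader).
-export([load/2, load_from/2]).

%% Hypothetical helper: locate Mod.beam in App's priv dir and load it.
load(App, Mod) ->
    case code:priv_dir(App) of
        {error, bad_name} -> {error, {no_priv_dir, App}};
        PrivDir -> load_from(PrivDir, Mod)
    end.

%% Read the precompiled beam from Dir and hand the binary to the code
%% server. Returns ok, or {error, Reason} if the file cannot be read or
%% the binary cannot be loaded.
load_from(Dir, Mod) ->
    File = filename:join(Dir, atom_to_list(Mod) ++ ".beam"),
    case file:read_file(File) of
        {ok, Bin} ->
            case code:load_binary(Mod, File, Bin) of
                {module, Mod} -> ok;
                {error, _} = Err -> Err
            end;
        {error, _} = Err -> Err
    end.
```

A call would look like `precompiled_loader:load(idna, idna_unicode_data)`, where both atoms are placeholders. If something like this is called from a function named in an `-on_load` attribute, that function has to return `ok`, otherwise the module carrying the attribute is not loaded.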
I also need to check what the impact of such a trick is when you need
to upgrade the application (i.e. how the -on_load attribute is handled). Maybe
I don't need that. Just copying the beam to the correct ebin dir should be
enough. Doing it for all the build tools around (erlang.mk, rebar3, rebar, ...)
makes it hard though.
Thoughts?
- benoit