[erlang-questions] pre-load large data files when the application start

Fri Mar 25 18:50:10 CET 2016

On Friday, 25 March 2016 17:08:49 Benoit Chesneau wrote:
> Hi all,
> 
> I have a large data file provided as comma-separated values (Unicode data).
> I need to load and parse it ASAP since it will be used by all the
> functions.

...snip...

> Is there anything else I can do?  I am curious how others handle that
> case.

Does it all need to be in memory all the time?

Depending on the answer and the context of use, I opt for one of:

- generate a smaller, more Erlangish version of the dataset
  (what you're doing with DETS, for example)
- load it into a database that is a common resource
  (not always an option)
- write a routine that makes smarter use of file reads than loading
  everything at once -- this can be surprisingly fast, even in Erlang,
  and be made to utilize a fixed amount of memory
  (but is not always a good fit for the problem)
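
To make the last option concrete, here is a minimal sketch of a read routine
that folds a function over the file one line at a time, so only the current
line and the accumulator are held in memory. The module name line_fold, the
function names, and the 64 KB read-ahead size are all made up for
illustration and are not taken from your code; tune or rename as needed.

```erlang
-module(line_fold).
-export([fold_lines/3, count_lines/1]).

%% Fold Fun over every line of the file at Path.
%% Lines are read one at a time, so memory use does not grow with file size
%% (as long as the accumulator itself stays small).
fold_lines(Path, Fun, Acc0) ->
    %% raw + read_ahead keeps file:read_line/1 reasonably fast;
    %% the 65536-byte buffer is an arbitrary choice for this sketch.
    {ok, Fd} = file:open(Path, [read, raw, binary, {read_ahead, 65536}]),
    try
        loop(Fd, Fun, Acc0)
    after
        file:close(Fd)
    end.

loop(Fd, Fun, Acc) ->
    case file:read_line(Fd) of
        {ok, Line}      -> loop(Fd, Fun, Fun(Line, Acc));
        eof             -> Acc;
        {error, Reason} -> erlang:error({read_failed, Reason})
    end.

%% Usage example: count the lines without loading the whole file.
count_lines(Path) ->
    fold_lines(Path, fun(_Line, N) -> N + 1 end, 0).
```

Each Line arrives as a binary that still includes its trailing newline, so
the fun you pass in is the place to strip and parse it.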

But use-case drives everything.

Honestly, you're one of the guys I tend to grep posts from when looking for
answers to my own questions, so I reckon my ideas above are things you have
already considered.

Also, with regard to datasets in general: if there is any way to rule out
some of the data on load, combining a filter with a constant-memory read-in
can be a big win. That applies when you do need the data in memory but have
some criterion by which the part you need all at once can be reduced (again,
though, not always the case).

-Craig