[erlang-questions] pre-load large data files when the application start

Fri Mar 25 23:19:12 CET 2016

On 03/25/2016 02:33 PM, Benoit Chesneau wrote:
>
>
> On Friday, March 25, 2016, Michael Truog <mjtruog@REDACTED> wrote:
>
>
>     Having the build process generate the module file and the beam file seems decent.  There is no need to build the module dynamically (at runtime, upon startup) or to store the Unicode data in global storage, because Unicode changes are infrequent.  If you do need to update for Unicode changes, you can always hot-load a new version of the module at runtime, and users of the module shouldn't have problems with that as long as it is kept as a simple utility/library module.  This problem reminds me of the code at https://github.com/rambocoder/unistring and there might be overlap in the goals of these two repositories.
>
>
>
> This is what the current release (1.2) does, but it doesn't compile in containers or on machines with =< 1GB: the build crashes. This is why I'm looking at shipping a pre-compiled beam, or maybe including the data in a db. For now, though, my tests with a db file (ets) show it is much slower: 30-40ms vs. 6ms using maps and a pre-compiled beam. Also, maps use less storage than plain function pattern matching in the beam.
>
> - benoît
>
I think you need to switch to function pattern matching when keeping the data in a module, to keep memory usage down.  Storing everything in a map means dealing with one big chunk of map data, whereas storing everything in the module as function pattern matching cases makes it just part of the module data (this should be better for GC due to less heap usage, and should be more efficient).  You probably want to keep all the function pattern matching cases in order, though ordering isn't mentioned as helpful at http://erlang.org/doc/efficiency_guide/functions.html#id67975 (it might affect compiler execution, if not the efficiency of the pattern matching).

Processing the Unicode CSV data more formally would make it easier to create the Erlang module, perhaps with a Python script instead of awk/shell utilities (a single script is also more portable).  If necessary, you could use more than one Erlang module for separate functions, but a single function should live in a single module to keep its update atomic (don't try to split one function across multiple modules based on the input).
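
A minimal sketch of what such a generator script could look like, assuming the input uses the semicolon-separated layout of UnicodeData.txt (field 0 is the code point in hex, field 2 is the general category). The module name `unicode_category`, the function name `category`, and the choice of the category field are hypothetical and only for illustration; the real project may extract different fields. The clauses are emitted sorted by code point, with a catch-all clause last.

```python
# Hypothetical sample in the semicolon-separated layout of UnicodeData.txt.
# Field 0 is the code point (hex), field 2 is the general category.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def parse_categories(text):
    """Return (code_point, category) pairs sorted by code point."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(";")
        pairs[int(fields[0], 16)] = fields[2]
    return sorted(pairs.items())


def render_module(module, function, pairs):
    """Render Erlang source with one function clause per code point."""
    lines = [
        "%% Generated file, do not edit by hand.",
        f"-module({module}).",
        f"-export([{function}/1]).",
        "",
    ]
    for code_point, category in pairs:
        # 16#XXXX is Erlang's base-16 integer literal; the category
        # becomes a quoted atom such as 'Lu'.
        lines.append(f"{function}(16#{code_point:04X}) -> '{category}';")
    # Catch-all clause for code points that are not in the table.
    lines.append(f"{function}(_) -> undefined.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    source = render_module("unicode_category", "category",
                           parse_categories(SAMPLE))
    print(source, end="")
```

Writing the printed source to a `.erl` file and compiling it in the build step would keep the whole function in one module, so a hot-load replaces every clause at once.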