<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Sat, Mar 26, 2016 at 10:31 AM Roger Lipscombe <<a href="mailto:roger@differentpla.net">roger@differentpla.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Is it the .erl -> .beam compilation step that runs out of memory? What<br>

happens if you use compile:forms/1,2 instead? Maybe you can write a<br>

relatively simple escript that turns your CSV into forms and compiles<br>

those (and then writes out the beam file)? See, for example, a script<br>

we use to embed binary resources as beam files:<br>

<a href="https://gist.github.com/rlipscombe/770ce8fc75add11e16f1" rel="noreferrer" target="_blank">https://gist.github.com/rlipscombe/770ce8fc75add11e16f1</a><br>

<br>

On 25 March 2016 at 22:32, Benoit Chesneau <<a href="mailto:bchesneau@gmail.com" target="_blank">bchesneau@gmail.com</a>> wrote:<br>

><br>

><br>

> On Fri, Mar 25, 2016 at 11:19 PM Michael Truog <<a href="mailto:mjtruog@gmail.com" target="_blank">mjtruog@gmail.com</a>> wrote:<br>

>><br>

>> On 03/25/2016 02:33 PM, Benoit Chesneau wrote:<br>

>><br>

>><br>

>><br>

>> On Friday, March 25, 2016, Michael Truog <<a href="mailto:mjtruog@gmail.com" target="_blank">mjtruog@gmail.com</a>> wrote:<br>

>>><br>

>>><br>

>>> Having the build process generate the module file and the beam file seems<br>

>>> decent.  There isn't a need to build the module dynamically (during runtime,<br>

>>> upon startup) or store the unicode data in global storage due to the unicode<br>

>>> changes being infrequent.   Then, if you do need to update due to unicode<br>

>>> changes, you can always hot-load a new version of the module, during runtime<br>

>>> and the usage of the module shouldn't have problems with that, if it is kept<br>

>>> as a simple utility/library module.  This problem reminds me of the code at<br>

>>> <a href="https://github.com/rambocoder/unistring" rel="noreferrer" target="_blank">https://github.com/rambocoder/unistring</a> and there might be overlap in the<br>

>>> goals of these two repositories.<br>

>><br>

>><br>

>><br>

>> This is what the current release (1.2) does. But it doesn't compile in<br>

>> containers or on machines with 1GB of RAM or less: the build crashes. This is why I'm looking at<br>

>> shipping a pre-compiled beam, or maybe including the data in a db. But for now<br>

>> my tests with a db file (ets) show it's much slower: 30-40ms vs 6ms using<br>

>> maps and a pre-compiled beam. Also, maps use less storage than simply<br>

>> using function pattern matching in the beam.<br>

>><br>

>> - benoît<br>

>><br>

>> I think you need to switch to using function pattern matching, when<br>

>> keeping it in a module to keep memory usage down.  Storing everything in a<br>

>> map has to deal with a big chunk of map data, but storing everything in the<br>

>> module as function pattern matching cases is just part of the module data<br>

>> (should be better for GC due to less heap usage and should be more<br>

>> efficient).  You probably want to try and keep all the function pattern<br>

>> matching cases in-order, though it isn't mentioned as helpful at<br>

>> <a href="http://erlang.org/doc/efficiency_guide/functions.html#id67975" rel="noreferrer" target="_blank">http://erlang.org/doc/efficiency_guide/functions.html#id67975</a> (might affect<br>

>> the compiler execution, if not the efficiency of the pattern matching).  It<br>

>> will be easier to create the Erlang module if you process the unicode CSV<br>

>> data more formally, perhaps with a python script (instead of awk/shell<br>

>> utilities; a single script is also more portable).  If<br>

>> necessary, you could use more than a single Erlang module to deal with<br>

>> separate functions, but a single function should require a single module to<br>

>> keep its update atomic (not trying to split a function into multiple modules<br>

>> based on the input).<br>

><br>

><br>

> I agree pattern matching should probably be better than maps for GC<br>

> (they are only 1ms faster on lookup). But the problem is really not<br>

> generating the module:<br>

> <a href="https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl" rel="noreferrer" target="_blank">https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl</a><br>

><br>

> The current issue with the above is the amount of RAM needed to compile the<br>

> beam. If the application is built on a machine with 1GB of RAM or less, it will fail.<br>

> I guess I could just generate the beam with pattern matching and ship it<br>

> like I do in the "precompiled" branch. Unless someone comes up with a better<br>

> idea, I think I will go with it. What do you think? The annoying thing is<br>

> having to do the `-on_load` hack (just because I'm lazy). Using rebar or<br>

> <a href="http://erlang.mk" rel="noreferrer" target="_blank">erlang.mk</a> I would just generate it and ship it in the ebin dir. But rebar3 doesn't<br>

> copy any content from it to its _build directory :|<br>

><br>

><br>

> - benoît<br>

><br>

><br></blockquote><div><br></div><div>Michael, using integers is indeed a good idea. I am making the change so I won't have to do the transformations at runtime, which is already an improvement. <br></div><div><br></div><div>Roger, I can do that, but how would you link it to the build system? It seems to me I will need to provide a plugin for both rebar3 and <a href="http://erlang.mk">erlang.mk</a>. rebar3 is the most problematic because of the "relative" layout it uses in the _build folder. I will have to find a way to move the beam file into the right ebin directory. Any ideas?<br></div><div><br></div><div>- benoit</div></div></div>