<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 03/25/2016 03:32 PM, Benoit Chesneau
wrote:<br>
</div>
<blockquote
cite="mid:CAJNb-9ous6av4YNF-PsxaRrWffjeZgKqY-O0rWXT_FGNXQRmvQ@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<br>
<div class="gmail_quote">
<div dir="ltr">On Fri, Mar 25, 2016 at 11:19 PM Michael Truog
&lt;<a moz-do-not-send="true"
href="mailto:mjtruog@gmail.com">mjtruog@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 03/25/2016 02:33 PM, Benoit Chesneau wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<br>
On Friday, March 25, 2016, Michael Truog &lt;<a
moz-do-not-send="true"
href="mailto:mjtruog@gmail.com" target="_blank">mjtruog@gmail.com</a>&gt;
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
</div>
<tt>Having the build process generate the module
file and the beam file seems decent. There
isn't a need to build the module dynamically
(at runtime, upon startup) or store the
unicode data in global storage, because
unicode changes are infrequent. If you
do need to update due to unicode changes, you
can always hot-load a new version of the module
at runtime, and usage of the module
shouldn't have problems with that if it is kept
as a simple utility/library module. This
problem reminds me of the code at </tt><a
moz-do-not-send="true"
href="https://github.com/rambocoder/unistring"
target="_blank">https://github.com/rambocoder/unistring</a>
and there might be overlap in the goals of these
two repositories.<br>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>This is what the current release (1.2) does, but it
doesn't compile in containers or on machines with &lt;= 1GB
of RAM: the build crashes. This is why I'm looking at
shipping a pre-compiled beam, or maybe including the data
in a db. For now, though, my tests with a db file (ets)
show it's much slower: 30-40ms vs 6ms using maps and a
pre-compiled beam. Also, maps use less storage than
simply using function pattern matching in the beam.</div>
<div><br>
</div>
<div>- benoît</div>
<div><br>
</div>
</div>
</blockquote>
</div>
<div bgcolor="#FFFFFF" text="#000000"><tt>I think you need
to switch to using function pattern matching, when
keeping it in a module to keep memory usage down.
Storing everything in a map has to deal with a big chunk
of map data, but storing everything in the module as
function pattern matching cases is just part of the
module data (should be better for GC due to less heap
usage and should be more efficient).</tt> You probably
want to try and keep all the function pattern matching
cases in-order, though it isn't mentioned as helpful at <a
moz-do-not-send="true"
href="http://erlang.org/doc/efficiency_guide/functions.html#id67975"
target="_blank">http://erlang.org/doc/efficiency_guide/functions.html#id67975</a>
(it might affect compilation, if not the
efficiency of the pattern matching). Creating the Erlang
module would be easier with more formal processing of the
unicode CSV data, perhaps with a Python script (instead of
awk/shell utilities; a single script is also more
portable). If necessary, you could use more than one
Erlang module to hold separate functions, but each
function should stay within a single module to keep its
update atomic (don't split one function across multiple
modules based on the input).<br>
</div>
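<div>A minimal sketch of the kind of Python generator suggested
above, not the script erlang-idna actually uses. It assumes
semicolon-separated input in the UnicodeData.txt layout and emits
one function clause per code point, sorted by code point. The
module name, the integer <tt>lookup/1</tt> key, and the choice of
tuple fields (combining class, decomposition, lowercase mapping)
are hypothetical.</div>
<pre>
```python
# Hypothetical generator sketch: names and field choices are assumptions.

def parse_unicode_data(lines):
    """Parse UnicodeData.txt-style lines (semicolon-separated fields)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        # Assumed tuple: combining class (field 3), decomposition
        # (field 5), simple lowercase mapping (field 13).
        entries.append((code, fields[3], fields[5], fields[13]))
    # Keep the clauses in code point order.
    entries.sort(key=lambda entry: entry[0])
    return entries


def render_module(module_name, entries):
    """Render Erlang source with one lookup/1 clause per code point."""
    out = ["-module(%s)." % module_name, "-export([lookup/1]).", ""]
    for code, ccc, decomposition, lower in entries:
        out.append('lookup(16#%04X) -> {"%s", "%s", "%s"};'
                   % (code, ccc, decomposition, lower))
    # Catch-all clause for code points without an entry.
    out.append("lookup(_) -> false.")
    return "\n".join(out) + "\n"
```
</pre>
<div>Calling <tt>render_module("idna_unicode_data_sketch",
parse_unicode_data(open("UnicodeData.txt")))</tt> and writing the
result to a .erl file would be the whole build step; the file
name here is only an example.</div>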
</blockquote>
<div><br>
</div>
<div>I agree pattern matching should probably be better than
maps for GC (they are only 1ms faster on lookup). But
the problem is not really generating the module:</div>
<div><a moz-do-not-send="true"
href="https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl">https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl</a><br>
</div>
<div><br>
</div>
<div>The current issue with the above is the amount of RAM needed
to compile the beam. If the application is built on a
machine with &lt;= 1GB of RAM, it will fail. I guess I could
just generate the beam with pattern matching and ship it,
like I do in the "precompiled" branch. Unless someone comes up
with a better idea, I think I will go with it. What do you
think? The annoying thing is having to do the `-on_load`
hack (just because I'm lazy). Using rebar or <a
moz-do-not-send="true" href="http://erlang.mk">erlang.mk</a>
I would just generate it and ship it in the ebin dir. But rebar3
doesn't copy any content from it to its _build directory :| </div>
<div><br>
</div>
<div><br>
</div>
<div>- benoît</div>
<div> </div>
</div>
</div>
</blockquote>
<tt>You can split the returned tuple into 3 separate modules with
separate functions, each for a tuple element. That should reduce
the amount of memory necessary for compilation. I know that is a
bit odd, and would make the module update process less atomic
(which is a bad thing), but as long as a separate module is used
to call the 3 separate modules (their respective functions for
each element) it can handle an error when the data is inconsistent
between them. I don't think there is a problem with treating this
as a normal module, and I don't think you will be forced to use a
beam file for deployment.<br>
<br>
<br>
</tt>
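<div>A rough sketch of the split described above, written as a
Python generator so it can run at build time. It emits three
Erlang modules, one per tuple element, plus a wrapper module that
calls all three and raises an error when they disagree about
whether a code point exists. All module names, the integer
<tt>lookup/1</tt> key, and the <tt>inconsistent_unicode_data</tt>
error term are hypothetical, and whether this lowers compile-time
memory enough would need to be measured.</div>
<pre>
```python
# Hypothetical sketch: module names and the error term are assumptions.

def render_split_modules(base_name, entries):
    """Return {module_name: erlang_source} for 3 data modules and a wrapper.

    Each entry is (code_point, element1, element2, element3).
    """
    sources = {}
    names = []
    for index in range(3):
        name = "%s_%d" % (base_name, index + 1)
        names.append(name)
        out = ["-module(%s)." % name, "-export([lookup/1]).", ""]
        # One clause per code point, in code point order.
        for entry in sorted(entries):
            out.append('lookup(16#%04X) -> "%s";'
                       % (entry[0], entry[index + 1]))
        out.append("lookup(_) -> false.")
        sources[name] = "\n".join(out) + "\n"

    calls = ", ".join("%s:lookup(Code)" % name for name in names)
    wrapper = [
        "-module(%s)." % base_name,
        "-export([lookup/1]).",
        "",
        "lookup(Code) -&gt;".replace("&gt;", chr(62)),
        "    case {%s} of" % calls,
        "        {false, false, false} -> false;",
        "        {A, B, C} when A =/= false, B =/= false, C =/= false ->",
        "            {A, B, C};",
        # Mixed results mean the three modules hold different data versions.
        "        _ -> erlang:error({inconsistent_unicode_data, Code})",
        "    end.",
    ]
    sources[base_name] = "\n".join(wrapper) + "\n"
    return sources
```
</pre>
<div>Callers would only use the wrapper module, so a partially
completed hot-load of the three data modules surfaces as an
explicit error instead of a silently mixed tuple.</div>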
</body>
</html>