<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 03/25/2016 11:11 AM, Benoit Chesneau
wrote:<br>
</div>
<blockquote
cite="mid:CAJNb-9qfC=Yy_Zb2r1n2prijqi3EZRcoUdYz7PQ7Khz+JJ8cCA@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<br>
<div class="gmail_quote">
<div dir="ltr">On Fri, Mar 25, 2016 at 7:06 PM Garrett Smith
<<a moz-do-not-send="true" href="mailto:g@rre.tt">g@rre.tt</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">On Fri, Mar 25, 2016 at 12:09 PM Benoit
Chesneau <<a moz-do-not-send="true"
href="mailto:bchesneau@gmail.com" target="_blank">bchesneau@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I have a large data file provided as comma-separated
values (Unicode data) that I need to load and parse as
soon as possible, since it will be used by all the
functions. </div>
</div>
</blockquote>
<div><br>
</div>
</div>
</div>
<div dir="ltr">
<div class="gmail_quote">
<div>What's the interface?</div>
</div>
</div>
<div dir="ltr">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>The current implementation parses the file and
generates either a source file or an include file
that is then compiled. My issue with it for now is
that the compilation uses more than 1GB of memory
and then crashes on small machines or
containers.</div>
<div><br>
</div>
<div>Other solutions I tried:</div>
<div><br>
</div>
<div>- use merl + `-on_load` to build the module on
its first call (too slow the first time)</div>
<div>- store an ETS file and load it later, which
can be an issue if you later need to build an
escript with all the modules</div>
<div>- load and parse it in a gen_server (same result as
using merl)</div>
<div><br>
</div>
<div>Things I have in mind:</div>
<div><br>
</div>
<div>- generate a DETS file or a small binary tree on
disk and cache the content on demand</div>
<div>- generate a beam and ship it</div>
<div><br>
</div>
<div>Is there anything else I can do? I am curious
how others handle this case. </div>
</div>
</blockquote>
<div><br>
</div>
</div>
</div>
<div dir="ltr">
<div class="gmail_quote">
<div>I think this depends entirely on your interface :)</div>
<div><br>
</div>
<div>Do you have to scan the entire table? If so, why? If
not, why not treat this as an indexing problem and
start from disk, assuming you can defer loading any
data until it's read?</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>Sorry, I should have just posted the code I was working on
(the advantage of working on open source stuff).</div>
<div><br>
</div>
<div>The code I'm referring to is here: <a
moz-do-not-send="true"
href="https://github.com/benoitc/erlang-idna">https://github.com/benoitc/erlang-idna</a> </div>
<div>and the recent change I described: <a
moz-do-not-send="true"
href="https://github.com/benoitc/erlang-idna/tree/precompile">https://github.com/benoitc/erlang-idna/tree/precompile</a></div>
<div><br>
</div>
<div>The table really needs to be in memory somehow, or at
least be very fast to access, since it will be used to
encode any domain name used in a request (XMPP, HTTP,
...).</div>
<div><br>
</div>
<div>It basically looks up the code point of each character in
a string and tries to compose/decompose it.</div>
<div><br>
</div>
<div>- benoît</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
erlang-questions mailing list
<a class="moz-txt-link-abbreviated" href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>
<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a>
</pre>
</blockquote>
<br>
<tt>Having the build process generate the module file and the beam
file seems decent. There is no need to build the module
dynamically (at runtime, on startup) or to keep the Unicode
data in global storage, since the Unicode data changes
infrequently. If you do need to update it for a new Unicode
release, you can always hot-load a new version of the module
at runtime, and callers of the module shouldn't have problems
with that as long as it is kept as a simple utility/library module. This
problem reminds me of the code at </tt><a class="moz-txt-link-freetext" href="https://github.com/rambocoder/unistring">https://github.com/rambocoder/unistring</a>,
and the goals of these two repositories might overlap.<br>
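<br>
<tt>As a rough illustration of that build step, here is a minimal
sketch of a generator written in Python that turns a CSV table into
an Erlang module with one function clause per code point. The input
format (a hex code point, then a space-separated list of hex code
points), the module name idna_table_sketch and the function
decomposition/1 are hypothetical and not taken from erlang-idna.
The generated .erl file would then be compiled with erlc as part of
the build; this does not by itself reduce the memory the compiler
needs for a very large module.</tt><br>
<pre wrap="">"""Build-time sketch: turn a CSV table into an Erlang source module.

Hypothetical input format: each row is "CODEPOINT,MAPPING" where
CODEPOINT is hex and MAPPING is a space-separated list of hex code points.
"""
import csv
import io


def generate_module(csv_text, module="idna_table_sketch"):
    """Return Erlang source with one decomposition/1 clause per CSV row.

    The module and function names are hypothetical placeholders.
    """
    lines = [
        "-module(%s)." % module,
        "-export([decomposition/1]).",
        "",
    ]
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        code, mapping = row[0].strip(), row[1].split()
        targets = ", ".join("16#" + m for m in mapping)
        # Clauses of one Erlang function are separated by ";".
        lines.append("decomposition(16#%s) -> [%s];" % (code, targets))
    # Catch-all clause: code points absent from the table map to [].
    lines.append("decomposition(_) -> [].")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "00C0,0041 0300\n00C1,0041 0301\n"
    print(generate_module(sample), end="")
</pre>
<tt>For the two sample rows, the output contains two decomposition/1
clauses followed by the catch-all decomposition(_) -> [] clause.</tt><br>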
</body>
</html>