<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 03/25/2016 11:11 AM, Benoit Chesneau

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAJNb-9qfC=Yy_Zb2r1n2prijqi3EZRcoUdYz7PQ7Khz+JJ8cCA@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <br>

        <div class="gmail_quote">

          <div dir="ltr">On Fri, Mar 25, 2016 at 7:06 PM Garrett Smith

            &lt;<a moz-do-not-send="true" href="mailto:g@rre.tt">g@rre.tt</a>&gt;

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div class="gmail_quote">

                <div dir="ltr">On Fri, Mar 25, 2016 at 12:09 PM Benoit

                  Chesneau &lt;<a moz-do-not-send="true"

                    href="mailto:bchesneau@gmail.com" target="_blank">bchesneau@gmail.com</a>&gt;

                  wrote:<br>

                </div>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  <div dir="ltr">Hi all,

                    <div><br>

                    </div>

                    <div>I have a large data file provided as comma

                      separated values (Unicode data). I need to load and

                      parse it ASAP, since it will be used by all the

                      functions. </div>

                  </div>

                </blockquote>

                <div><br>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_quote">

                <div>What's the interface?</div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_quote">

                <div> <br>

                </div>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  <div dir="ltr">

                    <div>The current implementation parses

                      the file and generates either a source file or an

                      include file that is then compiled. My issue

                      with it for now is that the compilation uses

                      more than 1GB of memory and then crashes on small machines or

                      containers.</div>

                    <div><br>

                    </div>

                    <div>Other solutions I tried:</div>

                    <div><br>

                    </div>

                    <div>- use merl + `-on_load` to build the module on

                      the first call to it (too slow the first time)</div>

                    <div>- store an ets file and load it later, which

                      can be an issue if you need to create an escript

                      with all modules later</div>

                    <div>- load and parse in a gen_server (same result as

                      using merl)</div>

                    <div><br>

                    </div>

                    <div>Things I have in mind:</div>

                    <div><br>

                    </div>

                    <div>- generate a DETS file or small binary tree on

                      disk and cache the content on demand</div>

                    <div>- generate a beam and ship it</div>

                    <div><br>

                    </div>

                    <div>Is there anything else I can do?  I am curious

                      how others handle this case. </div>

                  </div>

                </blockquote>

                <div><br>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_quote">

                <div>I think this depends entirely on your interface :)</div>

                <div><br>

                </div>

                <div>Do you have to scan the entire table? If so why? If

                  not, why not treat this as an indexing problem and

                  start from disk, assuming you can defer loading of any

                  data until it's read?</div>

              </div>

            </div>

          </blockquote>

          <div><br>

          </div>

          <div><br>

          </div>

          <div>Sorry, I should have just posted the code I was working on

            (the advantage of working on open source stuff).</div>

          <div><br>

          </div>

          <div>The code I'm referring to is here: <a

              moz-do-not-send="true"

              href="https://github.com/benoitc/erlang-idna">https://github.com/benoitc/erlang-idna</a> </div>

          <div>and the recent change I describe: <a

              moz-do-not-send="true"

              href="https://github.com/benoitc/erlang-idna/tree/precompile">https://github.com/benoitc/erlang-idna/tree/precompile</a></div>

          <div><br>

          </div>

          <div>The table really needs to be in memory somehow, or needs to

            be accessed very fast while reading it, since it will be

            used to encode any domain names used in a request (can be

            xmpp, http...).</div>

          <div><br>

          </div>

          <div>It basically checks the code of each character in a string

            and tries to compose/decompose it.</div>

          <div><br>

          </div>

          <div>- benoît</div>

        </div>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

erlang-questions mailing list

<a class="moz-txt-link-abbreviated" href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>

<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a>

</pre>

    </blockquote>

    <br>

    <tt>Having the build process generate the module source file and the beam

      file seems decent.  There isn't a need to build the module

      dynamically (at runtime, upon startup) or to store the Unicode

      data in global storage, because the Unicode data changes

      infrequently.  If you do need to update after a Unicode

      change, you can always hot-load a new version of the module

      at runtime.  Usage of the module shouldn't have problems

      with that, as long as it is kept as a simple utility/library module.  This

      problem reminds me of the code at </tt><a class="moz-txt-link-freetext" href="https://github.com/rambocoder/unistring">https://github.com/rambocoder/unistring</a>

    and there might be overlap in the goals of these two repositories.<br>
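    <br>
    <tt>To illustrate the suggestion above, here is a minimal sketch of
      what a build-generated lookup module could look like.  The module
      name unicode_table and the function decomposition/1 are
      hypothetical and are not taken from erlang-idna.  A real generated
      file would have one clause per entry in the data file rather than
      the two sample entries shown here.</tt>
    <pre>
%% Generated at build time from the Unicode data file; do not edit by hand.
%% The module and function names are assumptions made for this sketch.
-module(unicode_table).
-export([decomposition/1]).

%% Return the canonical decomposition of a code point, or 'undefined'
%% when the code point has no entry in the table.
-spec decomposition(char()) -&gt; [char()] | undefined.
decomposition(16#00C0) -&gt; [16#0041, 16#0300];  % A with grave
decomposition(16#00E9) -&gt; [16#0065, 16#0301];  % e with acute
decomposition(_) -&gt; undefined.
</pre>
    <tt>Because a module like this holds no state and owns no processes,
      a regenerated version can be swapped in on a running node, for
      example with l(unicode_table) in the Erlang shell, which purges
      the old code and loads the new beam file.</tt><br>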

  </body>

</html>