<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 03/25/2016 04:11 PM, Michael Truog

      wrote:<br>

    </div>

    <blockquote cite="mid:56F5C5BC.2010207@gmail.com" type="cite">


      <div class="moz-cite-prefix">On 03/25/2016 03:32 PM, Benoit

        Chesneau wrote:<br>

      </div>

      <blockquote

cite="mid:CAJNb-9ous6av4YNF-PsxaRrWffjeZgKqY-O0rWXT_FGNXQRmvQ@mail.gmail.com"

        type="cite">

        <div dir="ltr"><br>

          <br>

          <div class="gmail_quote">

            <div dir="ltr">On Fri, Mar 25, 2016 at 11:19 PM Michael

              Truog &lt;<a moz-do-not-send="true"
                href="mailto:mjtruog@gmail.com">mjtruog@gmail.com</a>&gt;

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div>On 03/25/2016 02:33 PM, Benoit Chesneau wrote:<br>

                </div>

                <blockquote type="cite">

                  <div dir="ltr"><br>

                    <br>

                    On Friday, March 25, 2016, Michael Truog &lt;<a
                      moz-do-not-send="true"
                      href="mailto:mjtruog@gmail.com" target="_blank">mjtruog@gmail.com</a>&gt;

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div bgcolor="#FFFFFF" text="#000000">

                        <div><br>

                        </div>

                        <tt>Having the build process generate the module
                          file and the beam file seems decent.  There
                          isn't a need to build the module dynamically
                          (at runtime, upon startup) or to store the
                          Unicode data in global storage, because
                          Unicode changes are infrequent.  If you do
                          need to update after a Unicode change, you can
                          always hot-load a new version of the module at
                          runtime, and callers shouldn't have problems
                          with that as long as it is kept as a simple
                          utility/library module.  This problem reminds
                          me of the code at </tt><a moz-do-not-send="true"

                          href="https://github.com/rambocoder/unistring"

                          target="_blank">https://github.com/rambocoder/unistring</a>

                        and there might be overlap in the goals of these

                        two repositories.<br>

                      </div>

                    </blockquote>

                    <div><br>

                    </div>

                    <div><br>

                    </div>

                    <div>This is what the current release (1.2) does,
                      but it doesn't compile in containers or on machines
                      with 1GB of RAM or less: the build crashes. This is
                      why I'm looking at shipping a pre-compiled beam, or
                      maybe including the data in a db. For now, though,
                      my tests with a db file (ets) show it's much
                      slower: 30-40ms vs 6ms using maps and a
                      pre-compiled beam. Also, maps use less storage
                      than plain function pattern matching in the
                      beam.</div>

                    <div><br>

                    </div>

                    <div>- benoît</div>

                    <div><br>

                    </div>

                  </div>

                </blockquote>

              </div>

              <div bgcolor="#FFFFFF" text="#000000"><tt>I think you need
                  to switch to function pattern matching when keeping
                  the data in a module, to keep memory usage down.
                  Storing everything in a map means dealing with one big
                  chunk of map data, whereas storing everything in the
                  module as function pattern matching cases makes it
                  just part of the module data (should be better for GC
                  due to less heap usage, and should be more
                  efficient).</tt>
                You probably want to try to keep all the function
                pattern matching cases in order, though that isn't
                mentioned as helpful at <a moz-do-not-send="true"
                  href="http://erlang.org/doc/efficiency_guide/functions.html#id67975"
                  target="_blank">http://erlang.org/doc/efficiency_guide/functions.html#id67975</a>
                (it might affect how the compiler runs, if not the
                efficiency of the pattern matching).  Creating the
                Erlang module will be easier if you process the Unicode
                CSV data more formally, perhaps with a Python script
                (instead of awk/shell utilities; a single script is also
                more portable).  If necessary, you could use more than
                one Erlang module to hold separate functions, but a
                single function should stay in a single module to keep
                its update atomic (don't try to split a function across
                multiple modules based on the input).<br>

              </div>
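              <div>A minimal sketch of the Python-script approach
                suggested above, assuming a simplified two-column
                <tt>codepoint;value</tt> input rather than the real
                Unicode data files. The module name
                <tt>unicode_data_example</tt>, the function name
                <tt>lookup/1</tt>, and the sample rows are made up for
                illustration; it only shows emitting one function clause
                per codepoint, in ascending order.</div>
<pre>
```python
import csv
import io

# Hypothetical two-column sample ("codepoint;value"); the real Unicode
# data files have more fields and are not reproduced here.
SAMPLE_CSV = "0042;b\n0041;a\n00C5;a-ring\n"


def parse_rows(text):
    """Parse 'HEX;value' rows into (int codepoint, value) pairs, sorted."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = [(int(fields[0], 16), fields[1]) for fields in reader if fields]
    return sorted(rows)


def render_module(module_name, rows):
    """Render one lookup/1 clause per codepoint, in ascending order."""
    lines = [
        "-module({}).".format(module_name),
        "-export([lookup/1]).",
        "",
    ]
    for codepoint, value in rows:
        lines.append('lookup(16#{:04X}) -> "{}";'.format(codepoint, value))
    # The catch-all clause terminates the function definition.
    lines.append("lookup(_) -> undefined.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_module("unicode_data_example", parse_rows(SAMPLE_CSV)))
```
</pre>
              <div>Sorting in the generator keeps the clauses ordered
                no matter how the input file is ordered, and the
                generated <tt>.erl</tt> file can then be compiled like
                any other module.</div>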

            </blockquote>

            <div><br>

            </div>

            <div>I agree pattern matching should probably be better than
              maps for GC (the maps are only 1ms faster on lookup). But
              the real problem isn't generating the module:</div>

            <div><a moz-do-not-send="true"

href="https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl">https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl</a><br>

            </div>

            <div><br>

            </div>

            <div>The current issue with the above is the amount of RAM
              needed to compile the beam. If the application is built on
              a machine with 1GB of RAM or less, it will fail.  I guess I
              could just generate the beam with pattern matching and
              ship it like I do in the "precompiled" branch. Unless
              someone comes up with a better idea, I think I will go with
              it. What do you think? The annoying thing is having to do
              the `-on_load` hack (just because I'm lazy). With rebar or <a
                moz-do-not-send="true" href="http://erlang.mk">erlang.mk</a>
              I would just generate it and ship it in the ebin dir, but
              rebar3 doesn't copy any content from ebin to its _build
              directory :|</div>

            <div><br>

            </div>

            <div><br>

            </div>

            <div>- benoît</div>

            <div>&nbsp;</div>

          </div>

        </div>

      </blockquote>

      <tt>You can split the returned tuple across 3 separate modules with
        separate functions, one for each tuple element.  That should
        reduce the amount of memory needed for compilation.  I know
        that is a bit odd, and it would make the module update process
        less atomic (which is a bad thing), but as long as a separate
        module is used to call the 3 modules (their respective
        functions for each element), it can handle an error when the
        data is inconsistent between them.  I don't think there is a
        problem with treating this as a normal module, and I don't think
        you will be forced to ship a beam file for deployment.<br>

        <br>

      </tt></blockquote>
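    <div>A rough sketch of the three-module split described in the
      quoted message, written as a Python generator. The field names
      <tt>field_a</tt>, <tt>field_b</tt>, <tt>field_c</tt>, the sample
      rows, and the <tt>inconsistent_data</tt> error reason are
      placeholders, not the real layout of the data. It assumes one
      generated module per tuple element plus a small caller module that
      reassembles the tuple and raises an error if the three modules
      disagree about whether a codepoint exists.</div>
<pre>
```python
# Hypothetical rows: codepoint mapped to a 3-element tuple. The field
# names below are placeholders, not the real data layout.
FIELDS = ("field_a", "field_b", "field_c")
ROWS = {
    0x0041: ("0", "none", "0061"),
    0x00C5: ("0", "0041 030A", "00E5"),
}


def render_element_module(base, index, rows):
    """Render a module holding only element `index` of each tuple."""
    name = "{}_{}".format(base, FIELDS[index])
    lines = ["-module({}).".format(name), "-export([lookup/1]).", ""]
    for codepoint in sorted(rows):
        value = rows[codepoint][index]
        lines.append('lookup(16#{:04X}) -> "{}";'.format(codepoint, value))
    lines.append("lookup(_) -> undefined.")
    return name, "\n".join(lines) + "\n"


def render_facade(base):
    """Render the caller module that reassembles and checks the tuple."""
    calls = ", ".join("{}_{}:lookup(C)".format(base, f) for f in FIELDS)
    lines = [
        "-module({}).".format(base),
        "-export([lookup/1]).",
        "",
        "lookup(C) ->",
        "    case {" + calls + "} of",
        "        {undefined, undefined, undefined} -> undefined;",
        "        {R1, R2, R3} when R1 =/= undefined, R2 =/= undefined,",
        "                          R3 =/= undefined -> {R1, R2, R3};",
        # A partial match means the three modules are out of sync.
        "        _ -> error(inconsistent_data)",
        "    end.",
    ]
    return base, "\n".join(lines) + "\n"


if __name__ == "__main__":
    for i in range(len(FIELDS)):
        print(render_element_module("ud", i, ROWS)[1])
    print(render_facade("ud")[1])
```
</pre>
    <div>Each element module carries only a third of the literal data,
      which is the part that should lower peak memory while compiling;
      whether it lowers it enough would need to be measured.</div>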

    <tt>Also, I forgot to mention: it is better to handle the data
      as integers rather than strings, since that saves memory and is
      more efficient.  That is the better change to make first, before
      splitting up the tuple return type; then, if still necessary,
      split the tuple return type.  Just use the hexadecimal format
      for integers as needed.  The result may need to be a list of
      hexadecimal integers, but that is better than a longer
      string.</tt><br>

    <br>
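    <div>To illustrate the integers-instead-of-strings point, here is a
      small sketch of how a generator script could emit the values. The
      helper names <tt>to_erlang_term</tt> and <tt>render_clause</tt>,
      the function name <tt>lookup/1</tt>, and the space-separated hex
      input format are assumptions for the example. An Erlang
      double-quoted string is itself a list of character codes, so
      <tt>"0041 030A"</tt> is a 9-element list while
      <tt>[16#0041, 16#030A]</tt> has 2 elements.</div>
<pre>
```python
def to_erlang_term(field):
    """Turn a 'HHHH HHHH' hex field into an Erlang integer or list term."""
    codepoints = [int(part, 16) for part in field.split()]
    literals = ["16#{:04X}".format(cp) for cp in codepoints]
    if len(literals) == 1:
        # A single codepoint is emitted as a bare integer literal.
        return literals[0]
    # Zero or several codepoints are emitted as a list of integers.
    return "[" + ", ".join(literals) + "]"


def render_clause(codepoint_hex, field):
    """Render one clause whose argument and result are integers."""
    return "lookup(16#{:04X}) -> {};".format(
        int(codepoint_hex, 16), to_erlang_term(field))


if __name__ == "__main__":
    print(render_clause("00C5", "0041 030A"))
```
</pre>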

  </body>

</html>