<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 03/25/2016 03:32 PM, Benoit Chesneau

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAJNb-9ous6av4YNF-PsxaRrWffjeZgKqY-O0rWXT_FGNXQRmvQ@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <br>

        <div class="gmail_quote">

          <div dir="ltr">On Fri, Mar 25, 2016 at 11:19 PM Michael Truog

            &lt;<a moz-do-not-send="true"

              href="mailto:mjtruog@gmail.com">mjtruog@gmail.com</a>&gt;

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000">

              <div>On 03/25/2016 02:33 PM, Benoit Chesneau wrote:<br>

              </div>

              <blockquote type="cite">

                <div dir="ltr"><br>

                  <br>

                  On Friday, March 25, 2016, Michael Truog &lt;<a

                    moz-do-not-send="true"

                    href="mailto:mjtruog@gmail.com" target="_blank">mjtruog@gmail.com</a>&gt;

                  wrote:<br>

                  <blockquote class="gmail_quote" style="margin:0 0 0

                    .8ex;border-left:1px #ccc solid;padding-left:1ex">

                    <div bgcolor="#FFFFFF" text="#000000">

                      <div><br>

                      </div>

                      <tt>Having the build process generate the module

                        file and the beam file seems decent.  There

                        is no need to build the module dynamically

                        (at runtime, upon startup) or to store the

                        Unicode data in global storage, because

                        Unicode changes are infrequent.  Then, if you

                        do need to update due to Unicode changes, you

                        can always hot-load a new version of the module

                        at runtime, and users of the module

                        shouldn't have problems with that if it is kept

                        as a simple utility/library module.  This

                        problem reminds me of the code at </tt><a

                        moz-do-not-send="true"

                        href="https://github.com/rambocoder/unistring"

                        target="_blank">https://github.com/rambocoder/unistring</a>

                      and there might be overlap in the goals of these

                      two repositories.<br>

                    </div>

                  </blockquote>

                  <div><br>

                  </div>

                  <div><br>

                  </div>

                  <div>This is what the current release (1.2) does, but

                    it doesn't compile in containers or on machines with

                    1GB of RAM or less: the build crashes. This is why I'm

                    looking at shipping a pre-compiled beam, or maybe

                    including the data in a db. For now, though, my tests

                    with a db file (ets) show it is much slower: 30-40ms

                    vs 6ms using maps and a pre-compiled beam. Also, maps

                    use less storage than plain

                    function pattern matching in the beam.</div>

                  <div><br>

                  </div>

                  <div>- benoît</div>

                  <div><br>

                  </div>

                </div>

              </blockquote>

            </div>

            <div bgcolor="#FFFFFF" text="#000000"><tt>I think you need

                to switch to function pattern matching when

                keeping the data in a module, to keep memory usage down.

                Storing everything in a map means dealing with a big chunk

                of map data, while storing everything in the module as

                function pattern matching cases makes it just part of the

                module data (should be better for GC due to less heap

                usage and should be more efficient).</tt>  You probably

              want to try and keep all the function pattern matching

              cases in-order, though it isn't mentioned as helpful at <a

                moz-do-not-send="true"

                href="http://erlang.org/doc/efficiency_guide/functions.html#id67975"

                target="_blank">http://erlang.org/doc/efficiency_guide/functions.html#id67975</a>

              (it might affect compiler execution, if not the

              efficiency of the pattern matching).  If you used more

              formal processing of the Unicode CSV data to create the

              Erlang module, it would be easier, perhaps with a Python

              script (instead of awk/shell utilities; portability is

              also better with a single script).  If

              necessary, you could use more than one Erlang module

              to hold separate functions, but a single function

              should live in a single module to keep its update atomic

              (don't try to split a function across multiple modules

              based on the input).<br>

            </div>
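            <div>A minimal sketch of the Python-script idea above,
              assuming the input follows the semicolon-separated
              UnicodeData.txt layout (code point in field 0, combining
              class in field 3, decomposition in field 5, lowercase
              mapping in field 13). The names parse_rows and
              render_module, the module name passed in, and the choice of
              tuple elements are hypothetical and not taken from
              erlang-idna. The clauses are emitted sorted by code
              point.</div>
<pre>
```python
# Hypothetical generator: turns UnicodeData.txt-style lines into the source of
# an Erlang module whose lookup/1 function has one clause per code point.

SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;
0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
"""


def parse_rows(text):
    """Return (codepoint, combining_class, decomposition, lowercase) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        rows.append((int(fields[0], 16), int(fields[3]), fields[5], fields[13]))
    return rows


def render_module(module_name, rows):
    """Render Erlang source with the lookup/1 clauses sorted by code point."""
    lines = [
        "-module({}).".format(module_name),
        "-export([lookup/1]).",
        "",
    ]
    for cp, ccc, decomp, lower in sorted(rows):
        # Doubled braces are literal braces of the Erlang tuple.
        lines.append(
            'lookup(16#{:04X}) -> {{{}, "{}", "{}"}};'.format(cp, ccc, decomp, lower)
        )
    # Catch-all clause for code points that are not in the table.
    lines.append("lookup(_) -> false.")
    return "\n".join(lines) + "\n"
```
</pre>
            <div>Calling render_module("unicode_data_sketch",
              parse_rows(SAMPLE)) returns the module source as a string;
              writing it to a .erl file and compiling it is left
              out.</div>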

          </blockquote>

          <div><br>

          </div>

          <div>I agree pattern matching should probably be better than

            maps for GC (the maps are only 1ms faster on lookup). But

            generating the module is really not the problem:</div>

          <div><a moz-do-not-send="true"

href="https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl">https://github.com/benoitc/erlang-idna/blob/v1.x/src/idna_unicode_data1.erl</a><br>

          </div>

          <div><br>

          </div>

          <div>The current issue with the above is the amount of RAM needed

            to compile the beam. If the application is built on a

            machine with 1GB of RAM or less, it will fail.  I guess I could

            just generate the beam with pattern matching and ship it

            like I do in the "precompiled" branch. Unless someone comes up

            with a better idea, I think I will go with it. What do you

            think? The annoying thing is having to do the `-on_load`

            hack (just because I'm lazy). Using rebar or <a

              moz-do-not-send="true" href="http://erlang.mk">erlang.mk</a>

            I would just generate it and ship it in the ebin dir. But rebar3

            doesn't copy any content from it to its _build directory :| </div>

          <div><br>

          </div>

          <div><br>

          </div>

          <div>- benoît</div>

          <div> </div>

        </div>

      </div>

    </blockquote>

    <tt>You can split the returned tuple into 3 separate modules with

      separate functions, each for a tuple element.  That should reduce

      the amount of memory necessary for compilation.  I know that is a

      bit odd, and would make the module update process less atomic

      (which is a bad thing), but as long as a separate module is used

      to call the 3 separate modules (their respective functions for

      each element) it can handle an error when the data is inconsistent

      between them.  I don't think there is a problem with treating this

      as a normal module, and I don't think you will be forced to use a

      beam file for deployment.<br>

      <br>

      <br>

    </tt>
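    <tt>A rough sketch of the split, written as a Python generator for
      the Erlang sources. Everything named here is an assumption made
      for illustration: the three element names, the version/0 function
      each element module exports, and the {error, inconsistent_data}
      return value are not part of erlang-idna. The wrapper module
      rebuilds the tuple and only reports an error when the three
      element modules disagree on their version.</tt>
<pre>
```python
# Hypothetical sketch: one Erlang module per tuple element, plus a wrapper
# module that calls all three and rebuilds the original tuple.

FIELDS = ("combining_class", "decomposition", "lowercase")  # assumed names

FACADE_TEMPLATE = """\
-module(@BASE@).
-export([lookup/1]).

%% Calls the three element modules and rebuilds the original tuple.
lookup(C) ->
    V = @BASE@_combining_class:version(),
    case {@BASE@_decomposition:version(), @BASE@_lowercase:version()} of
        {V, V} ->
            case @BASE@_combining_class:lookup(C) of
                false -> false;
                CC -> {CC, @BASE@_decomposition:lookup(C), @BASE@_lowercase:lookup(C)}
            end;
        _ ->
            {error, inconsistent_data}
    end.
"""


def erl_term(value):
    """Render an int as an Erlang integer and a str as an Erlang string."""
    if isinstance(value, int):
        return str(value)
    return '"{}"'.format(value)


def render_element_module(base, index, rows, version):
    """Render the module holding tuple element `index` for every code point."""
    name = "{}_{}".format(base, FIELDS[index])
    lines = [
        "-module({}).".format(name),
        "-export([version/0, lookup/1]).",
        "",
        'version() -> "{}".'.format(version),
        "",
    ]
    for row in sorted(rows):
        lines.append("lookup(16#{:04X}) -> {};".format(row[0], erl_term(row[index + 1])))
    lines.append("lookup(_) -> false.")
    return name, "\n".join(lines) + "\n"


def render_all(base, rows, version):
    """Return a dict mapping module name to Erlang source (3 elements + wrapper)."""
    modules = {base: FACADE_TEMPLATE.replace("@BASE@", base)}
    for index in range(len(FIELDS)):
        name, source = render_element_module(base, index, rows, version)
        modules[name] = source
    return modules
```
</pre>
    <tt>Each row is a (codepoint, combining_class, decomposition,
      lowercase) tuple. Checking the versions on every lookup costs
      extra calls, so this only illustrates where the consistency check
      could live, not how it should be tuned.</tt>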

  </body>

</html>