<div class="moz-cite-prefix">On 06/18/2016 03:54 AM, John Smith
wrote:<br>
</div>
<blockquote
cite="mid:CAGf2Rru6YFSjwgupiAyL1nnMKGeZO9izUGA=Xyth4kDEsEU+KA@mail.gmail.com"
type="cite">
<div dir="ltr">
> For one of my systems in the financial area, I am in need of a
> disk-backed log that I could use as a backend for an Event
> Sourcing/CQRS store. Recently, I have read a bit about Kafka [1] and
> it seems like a good fit but, unfortunately, it runs on the JVM
> (written in Scala, to be exact) and depends heavily on ZooKeeper [2]
> for distribution, while I would prefer something similar for the
> Erlang ecosystem. Thus, ideally, I would like to have something that
> is:
>
>  * small,
>  * durable (checksummed, with a clear recovery procedure),
>  * pure Erlang/Elixir (maybe with some native code, but tightly
>    integrated),
>  * (almost) not distributed - the data fits on a single node (at
>    least for now; with replication for durability, though).
>
> Before jumping right into the implementation, I have some questions:
>
>  1. Is there anything already available that fulfils the above
>     requirements?
>  2. Kafka uses a different approach to persistence - instead of
>     filling in-process buffers and transferring the data to disk, it
>     writes straight to the filesystem, which in practice means
>     writing to the page cache [3]. Can I achieve the same thing from
>     Erlang, or does it buffer writes in some other way?
>  3. ...also, Kafka has log compaction [4], which can work not only
>     in the time dimension but also in the key dimension - I need
>     this, as I need to persist the last state for every key seen
>     (user, transfer, etc.). As in Redis, Kafka uses UNIX
>     copy-on-write semantics (a process fork) to avoid needless
>     memory usage for log fragments (segments, in Kafka nomenclature)
>     that have not changed. Can I mimic similar behaviour in Erlang?
>     Or, if not, how can I deal with biggish (say, a couple of GB)
>     logs that need to be compacted?
>
> In other words, I would like to create something like a *Minimum
> Viable Log* (in the Kafka style), only in Erlang/Elixir. I would be
> grateful for any kind of design or implementation hints.
>
> [1] http://kafka.apache.org/
> [2] https://zookeeper.apache.org/
> [3] http://kafka.apache.org/documentation.html#persistence
> [4] https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction

If you use https://github.com/CloudI/cloudi_service_queue with
https://github.com/CloudI/cloudi_core, it would satisfy those
requirements. However, its approach is simpler: it doesn't require
checksums, and recovery is automatic (upon restart).
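
If you did want the checksum and recovery-procedure behaviour from the
original list of requirements, one common framing - a generic sketch,
not what cloudi_service_queue does - is a size-plus-CRC header in front
of every record:

    %% Each record is framed as <<Size:32, CRC32:32, Payload/binary>>.
    encode_record(Payload) when is_binary(Payload) ->
        <<(byte_size(Payload)):32, (erlang:crc32(Payload)):32,
          Payload/binary>>.

    %% Recovery is then a scan: read records until a size or CRC check
    %% fails, and truncate the log after the last good record.
    decode_record(<<Size:32, Crc:32, Payload:Size/binary, Rest/binary>>) ->
        case erlang:crc32(Payload) of
            Crc -> {ok, Payload, Rest};
            _   -> corrupted
        end;
    decode_record(_Partial) ->
        truncated.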
Normal filesystem writes are buffered, but they can be flushed to disk
in Erlang with file:datasync/1, which is what the cloudi_service_queue
source code uses.
The logs are not compacted, and they do not shrink, in order to keep
things efficient.
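
If you did need the key-based compaction from question 3, a generic
approach - again, a sketch rather than anything cloudi_service_queue
does - is to fold over a segment keeping only the newest record per
key, write the survivors to a new file, and atomically rename it over
the old segment:

    %% Records is the segment contents in log order, e.g. [{Key, Value}];
    %% folding left to right means later records overwrite earlier ones.
    compact_records(Records) ->
        Latest = lists:foldl(fun({Key, Value}, Acc) ->
                                     maps:put(Key, Value, Acc)
                             end, #{}, Records),
        maps:to_list(Latest).

    %% Write the compacted records to a temporary file, then atomically
    %% replace the old segment (rename(2) is atomic within a POSIX
    %% filesystem).  encode_record/1 is the framing sketched above.
    rewrite_segment(Path, Compacted) ->
        Tmp = Path ++ ".compact",
        Data = [encode_record(term_to_binary(R)) || R <- Compacted],
        ok = file:write_file(Tmp, Data),
        ok = file:rename(Tmp, Path).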
If necessary, the incoming request rate can always be limited through
the cloudi_service_queue service configuration options (any of
queue_limit, queue_size, or rate_request_max) to control the size of
the logs.
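
Those options are plain Erlang terms in the service configuration; the
values below are made up, and the exact semantics and units are
described in the cloudi_service_api documentation:

    %% Illustrative values only.
    [{queue_limit, 1024},        % bound on the number of queued requests
     {queue_size, 16384},        % bound on the total size of queued requests
     {rate_request_max, 1000}]   % bound on the incoming request rate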