<div class="moz-cite-prefix">On 06/18/2016 03:54 AM, John Smith
wrote:<br>
</div>
<blockquote
cite="mid:CAGf2Rru6YFSjwgupiAyL1nnMKGeZO9izUGA=Xyth4kDEsEU+KA@mail.gmail.com"
type="cite">
<div dir="ltr">
> For one of my systems in the financial area, I am in need of a
> disk-backed log that I could use as a backend for an Event
> Sourcing/CQRS store. Recently, I have read a bit about Kafka [1] and
> it seems like a good fit but, unfortunately, it runs on the JVM
> (written in Scala, to be exact) and depends heavily on ZooKeeper [2]
> for distribution, while I would prefer something similar for the
> Erlang ecosystem. Thus, ideally, I would like to have something that
> is:
>
>  * small,
>  * durable (checksummed, with a clear recovery procedure),
>  * pure Erlang/Elixir (maybe with some native code, but tightly
>    integrated),
>  * (almost) not distributed - the data fits on a single node (at
>    least for now; with replication for durability, though).
>
> Before jumping right into the implementation, I have some questions:
>
>  1. Is there anything already available that fulfils the above
>     requirements?
>  2. Kafka uses a different approach to persistence - instead of
>     filling in-process buffers and transferring the data to disk, it
>     writes straight to the filesystem, which in practice means
>     writing to the page cache [3]. Can I achieve the same thing from
>     Erlang, or does it buffer writes in some other way?
>  3. ...also, Kafka has log compaction [4], which can work not only
>     in the time dimension but also in the key dimension - I need
>     this, as I need to persist the last state for every key seen
>     (user, transfer, etc.). As in Redis, Kafka uses UNIX
>     copy-on-write semantics (a process fork) to avoid needless
>     memory usage for log fragments (segments, in Kafka nomenclature)
>     that have not changed. Can I mimic similar behaviour in Erlang?
>     Or, if not, how can I deal with biggish (say, a couple of GB)
>     logs that need to be compacted?
>
> In other words, I would like to create something like a *Minimum
> Viable Log* (in the Kafka style), only in Erlang/Elixir. I would be
> grateful for any kind of design or implementation hints.
>
> [1] http://kafka.apache.org/
> [2] https://zookeeper.apache.org/
> [3] http://kafka.apache.org/documentation.html#persistence
> [4] https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction

If you use https://github.com/CloudI/cloudi_service_queue with
https://github.com/CloudI/cloudi_core, it would satisfy those
requirements. However, its approach is simpler: it doesn't require
checksums, and recovery is automatic (upon restart).
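
If you did want the checksum and recovery-procedure behaviour from the
original list of requirements, one common framing - a generic sketch,
not what cloudi_service_queue does - is a size-plus-CRC header in front
of every record:

    %% Each record is framed as <<Size:32, CRC32:32, Payload/binary>>.
    encode_record(Payload) when is_binary(Payload) ->
        <<(byte_size(Payload)):32, (erlang:crc32(Payload)):32,
          Payload/binary>>.

    %% Recovery is then a scan: read records until a size or CRC check
    %% fails, and truncate the log after the last good record.
    decode_record(<<Size:32, Crc:32, Payload:Size/binary, Rest/binary>>) ->
        case erlang:crc32(Payload) of
            Crc -> {ok, Payload, Rest};
            _   -> corrupted
        end;
    decode_record(_Partial) ->
        truncated.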
Normal filesystem writes are buffered, but they can be flushed to disk
in Erlang with file:datasync/1, which is what the cloudi_service_queue
source code uses.
The logs are not compacted, and they do not shrink, in order to keep
things efficient.
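
If you did need the key-based compaction from question 3, a generic
approach - again, a sketch rather than anything cloudi_service_queue
does - is to fold over a segment keeping only the newest record per
key, write the survivors to a new file, and atomically rename it over
the old segment:

    %% Records is the segment contents in log order, e.g. [{Key, Value}];
    %% folding left to right means later records overwrite earlier ones.
    compact_records(Records) ->
        Latest = lists:foldl(fun({Key, Value}, Acc) ->
                                     maps:put(Key, Value, Acc)
                             end, #{}, Records),
        maps:to_list(Latest).

    %% Write the compacted records to a temporary file, then atomically
    %% replace the old segment (rename(2) is atomic within a POSIX
    %% filesystem).  encode_record/1 is the framing sketched above.
    rewrite_segment(Path, Compacted) ->
        Tmp = Path ++ ".compact",
        Data = [encode_record(term_to_binary(R)) || R <- Compacted],
        ok = file:write_file(Tmp, Data),
        ok = file:rename(Tmp, Path).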
If necessary, the incoming request rate can always be limited through
the cloudi_service_queue service configuration options (any of
queue_limit, queue_size, or rate_request_max) to control the size of
the logs.
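
Those options are plain Erlang terms in the service configuration; the
values below are made up, and the exact semantics and units are
described in the cloudi_service_api documentation:

    %% Illustrative values only.
    [{queue_limit, 1024},        % bound on the number of queued requests
     {queue_size, 16384},        % bound on the total size of queued requests
     {rate_request_max, 1000}]   % bound on the incoming request rate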