[erlang-questions] Disk-backed log

Michael Truog <>
Sat Jun 18 20:53:12 CEST 2016


On 06/18/2016 03:54 AM, John Smith wrote:
> For one of my systems in the financial area, I am in need of a disk-backed log that I could use as a backend for an Event Sourcing/CQRS store. Recently, I have read a bit about Kafka [1] and it seems like a good fit but, unfortunately, it is on JVM (written in Scala, to be exact) and depends heavily on ZooKeeper [2] for distribution, while I would prefer something similar for an Erlang ecosystem. Thus, ideally, I would like to have something that is:
>
>   * small,
>   * durable (checksummed, with a clear recovery procedure),
>   * pure Erlang/Elixir (maybe with some native code, but tightly integrated),
>   * (almost) not distributed - data fits on the single node (at least now; with replication for durability, though).
>
> Before jumping right into implementation, I have some questions:
>
>   1. Is there anything already available that fulfils the above requirements?
>   2. Kafka uses a different approach to persistence - instead of using in-process buffers and transferring data to disk, it writes straight to the filesystem, which actually uses the pagecache [3]. Can I achieve the same thing using Erlang, or does it buffer writes in some other way?
>   3. ...also, Kafka has log compaction [4], which can work not only in the time dimension but also in a key dimension - I need this, as I need to persist the last state for every key seen (user, transfer, etc.). As in Redis, Kafka uses UNIX copy-on-write semantics (process fork) to avoid needless memory usage for log fragments (segments, in Kafka nomenclature) that have not changed. Can I mimic similar behaviour in Erlang? Or if not, how can I deal with biggish (say, a couple of GB) logs that need to be compacted?
>
> In other words, I would like to create something like a *Minimum Viable Log* (in Kafka style), only in Erlang/Elixir. I would be grateful for any kind of design/implementation hints.
>
> [1] http://kafka.apache.org/
> [2] https://zookeeper.apache.org/
> [3] http://kafka.apache.org/documentation.html#persistence
> [4] https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction
>
>
> _______________________________________________
> erlang-questions mailing list
> 
> http://erlang.org/mailman/listinfo/erlang-questions

If you use https://github.com/CloudI/cloudi_service_queue with https://github.com/CloudI/cloudi_core it would satisfy those requirements.  However, its approach is simpler: it doesn't require checksums, and recovery is automatic upon restart.  Normal filesystem writes are buffered, but they can be flushed in Erlang with file:datasync/1, which is used within the cloudi_service_queue source code.  The logs are not compacted, and they don't shrink to keep things efficient.  If necessary, the incoming request rate can always be limited through the cloudi_service_queue service configuration options (any of queue_limit, queue_size, or rate_request_max) to control the size of the logs.
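To illustrate the buffering point from question 2: a minimal append-only log sketch in plain Erlang, where writes pass through the file driver and OS pagecache until file:datasync/1 forces them to stable storage (the same call cloudi_service_queue uses).  The module name, record framing, and API are hypothetical, not anything from CloudI:

```erlang
%% Minimal append-only log sketch (hypothetical module and format).
-module(mini_log).
-export([open/1, append/2, sync/1, close/1]).

open(Path) ->
    %% raw + binary bypasses the intermediate file server process;
    %% append positions every write at end-of-file.
    file:open(Path, [raw, binary, append]).

append(Fd, Bin) when is_binary(Bin) ->
    %% Length-prefix each record so the log can be scanned on recovery.
    file:write(Fd, <<(byte_size(Bin)):32, Bin/binary>>).

sync(Fd) ->
    %% Flush buffered data to disk; this is the durability point.
    file:datasync(Fd).

close(Fd) ->
    file:close(Fd).
```

A caller decides how often to sync; syncing on every append is durable but slow, while syncing on a timer or batch boundary trades a small loss window for throughput.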
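On question 3: you don't need fork()-style copy-on-write to compact by key in Erlang.  One sketch, under the assumption that a closed segment holds length-prefixed term_to_binary({Key, Value}) records, is to fold the segment into a map (last value per key wins) and write a fresh segment; the module and record format are hypothetical:

```erlang
%% Hypothetical key-based compaction over a closed log segment:
%% keep only the last record per key, then rewrite the segment.
-module(mini_compact).
-export([compact/2, fold_records/2]).

compact(InPath, OutPath) ->
    {ok, Bin} = file:read_file(InPath),
    Last = fold_records(Bin, #{}),            % last value per key wins
    {ok, Fd} = file:open(OutPath, [raw, binary, write]),
    ok = maps:fold(fun(K, V, ok) ->
                           Rec = term_to_binary({K, V}),
                           file:write(Fd, <<(byte_size(Rec)):32, Rec/binary>>)
                   end, ok, Last),
    ok = file:datasync(Fd),
    file:close(Fd).

%% Scan length-prefixed records, keeping the latest value per key.
fold_records(<<Len:32, Rec:Len/binary, Rest/binary>>, Acc) ->
    {Key, Value} = binary_to_term(Rec),
    fold_records(Rest, Acc#{Key => Value});
fold_records(<<>>, Acc) ->
    Acc.
```

Because a closed segment is immutable, compaction can run in a separate Erlang process (or even read the file in chunks rather than all at once, for multi-GB segments) while writers keep appending to the active segment; the old file is only deleted after the compacted one is synced.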

