[erlang-questions] Disk-backed log

Sun Jun 19 07:34:19 CEST 2016

Can you shard your event log by aggregate type and thus avoid the
deletion/compaction issue altogether?  I've read some people suggest
sharding by aggregate ID if the aggregate type shard is not small enough
[1].
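
To make the sharding idea concrete, here is a rough sketch of one log file per aggregate type, with an optional second level that splits a type by aggregate ID. The module name `shard_log`, the `.log` file layout, and the 32-bit length-prefixed framing are my own assumptions for illustration, not something taken from [1].

```erlang
-module(shard_log).
-export([path/2, path/4, append/2]).

%% One log file per aggregate type, e.g. "data/account.log".
path(Dir, Type) when is_atom(Type) ->
    filename:join(Dir, atom_to_list(Type) ++ ".log").

%% Optional second level: spread one aggregate type over N files,
%% picking the file by hashing the aggregate ID (phash2 gives 0..N-1).
path(Dir, Type, Id, N) when is_atom(Type), is_integer(N), N > 0 ->
    Shard = erlang:phash2(Id, N),
    Name = atom_to_list(Type) ++ "-" ++ integer_to_list(Shard) ++ ".log",
    filename:join(Dir, Name).

%% Append one event as a length-prefixed external-term binary.
%% (Hypothetical framing; a real log would also want a checksum.)
append(Path, Event) ->
    Bin = term_to_binary(Event),
    file:write_file(Path, <<(byte_size(Bin)):32, Bin/binary>>, [append]).
```

With this layout, dropping or archiving a whole aggregate type is a file operation rather than a compaction pass over a shared log.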

If you haven't already, you may want to write a simple benchmark that
appends gobs of data to a file. I found a 2013 thread [2] on Erlang
Questions with a similar sequential append use case where the OP was not
happy with Erlang's speed.  But I can't follow his math: writing 5 Gigabits
in 104 seconds seems like a lot more than 504 Hz.  I also found people
complaining that get_line was slow.  I guess parallel reads would be
possible inside an aggregate boundary ...
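
For scale, 5 gigabits in 104 seconds works out to about 48 Mbit/s, or roughly 6 MB/s. A minimal sketch of the kind of append benchmark I mean is below; the module name `append_bench` and its parameters are hypothetical, and the numbers it reports will depend heavily on block size, disk, and whether you count the final sync.

```erlang
-module(append_bench).
-export([run/3]).

%% Append Count blocks of BlockSize bytes to Path, sync once at the end,
%% and return {Seconds, MegabytesPerSecond}.
run(Path, BlockSize, Count) ->
    Block = binary:copy(<<0>>, BlockSize),
    %% raw: bypass the file server process; no delayed_write, so each
    %% file:write/2 is handed to the OS rather than buffered in the VM.
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    {Micros, ok} = timer:tc(fun() ->
        ok = write_n(Fd, Block, Count),
        file:sync(Fd)
    end),
    ok = file:close(Fd),
    Secs = Micros / 1.0e6,
    MB = (BlockSize * Count) / (1024 * 1024),
    {Secs, MB / max(Secs, 1.0e-6)}.

write_n(_Fd, _Block, 0) ->
    ok;
write_n(Fd, Block, N) when N > 0 ->
    ok = file:write(Fd, Block),
    write_n(Fd, Block, N - 1).
```

Something like `append_bench:run("/tmp/bench.log", 4096, 100000)` would append about 390 MB in 4 KB writes; trying a few block sizes should show how much of the cost is per-call overhead.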

CQRS is a topic I am very interested in; I hope you post again!

[1] http://cqrs.nu/Faq/event-sourcing
[2] http://erlang.org/pipermail/erlang-questions/2013-June/074190.html

P.S. I've wondered why people don't treat a snapshot (or "compaction") as a
command. One that emits a "special" event with the current state of that
aggregate.  Write this event to a fast durable key/value store (again, one
per aggregate type) where the key is aggregate id and the value is the
aggregate state and an offset into the main log where you should pick up
reading from.
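
A rough sketch of that snapshot store, using dets as a stand-in for the "fast durable key/value store". The module name `snapshot_store`, the `.snap` file naming, and the `{AggId, State, Offset}` record shape are assumptions of mine; dets is only one possible choice and has its own size limits.

```erlang
-module(snapshot_store).
-export([open/2, save/4, load/2, close/1]).

%% One dets table per aggregate type, stored as "<Dir>/<Type>.snap".
open(Dir, Type) when is_atom(Type) ->
    File = filename:join(Dir, atom_to_list(Type) ++ ".snap"),
    dets:open_file(Type, [{file, File}, {type, set}]).

%% Handle the "snapshot" command: record the aggregate's current state
%% together with the offset in the main log to resume reading from.
save(Tab, AggId, State, Offset) when is_integer(Offset), Offset >= 0 ->
    ok = dets:insert(Tab, {AggId, State, Offset}),
    dets:sync(Tab).

%% Return the latest snapshot, or 'none' if the aggregate must be
%% rebuilt by replaying the main log from the start.
load(Tab, AggId) ->
    case dets:lookup(Tab, AggId) of
        [{AggId, State, Offset}] -> {ok, State, Offset};
        [] -> none
    end.

close(Tab) ->
    dets:close(Tab).
```

On recovery you would call `load/2`, seek the main log to the returned offset, and apply only the events after it.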

On Sat, Jun 18, 2016 at 6:54 AM, John Smith <4crzen62cwqszy68g7al@REDACTED>
wrote:

> For one of my systems in the financial area, I am in need of a disk-backed
> log that I could use as a backend for an Event Sourcing/CQRS store.
> Recently, I have read a bit about Kafka [1] and it seems like a good fit
> but, unfortunately, it is on JVM (written in Scala, to be exact) and
> depends heavily on ZooKeeper [2] for distribution, while I would prefer
> something similar for an Erlang ecosystem. Thus, ideally, I would like to
> have something that is:
>
>   * small,
>   * durable (checksummed, with a clear recovery procedure),
>   * pure Erlang/Elixir (maybe with some native code, but tightly
> integrated),
>   * (almost) not distributed - data fits on the single node (at least now;
> with replication for durability, though).
>
> Before jumping right into implementation, I have some questions:
>
>   1. Is there anything already available that fulfils the above requirements?
>   2. Kafka uses a different approach to persistence - instead of using
> in-process buffers and transferring data to disk, it writes straight to the
> filesystem which, actually, uses pagecache [3]. Can I achieve the same
> thing using Erlang or does it buffer writes in some other way?
>   3. ...also, Kafka has a log compaction [4] which can work not only in
> time but also in a key dimension - I need this, as I need to persist the
> last state for every key seen (user, transfer, etc.). As in Redis, Kafka
> uses the UNIX copy-on-write semantics (process fork) to avoid needless
> memory usage for log fragments (segments, in Kafka nomenclature) that have
> not changed. Can I mimic a similar behaviour in Erlang? Or if not, how can
> I deal with biggish (say, a couple of GB) logs that need to be compacted?
>
> In other words, I would like to create something like a *Minimum Viable
> Log* (in Kafka style), only in Erlang/Elixir. I would be grateful for any
> kind of design/implementation hints.
>
> [1] http://kafka.apache.org/
> [2] https://zookeeper.apache.org/
> [3] http://kafka.apache.org/documentation.html#persistence
> [4] https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction