[erlang-questions] [ANN] ActorDB a distributed SQL database (Sergej Jurecko)

Fri Jan 24 22:31:29 CET 2014

On Jan 24, 2014, at 8:22 PM, Fred Hebert wrote:
> 
> Does this mean there's a risk of configuration changes not making it out
> properly, but nodes still connecting? i.e. this could potentially lead
> to your nodes not seeing a majority where there should be one, or the
> exact opposite?
> 
> In most of the papers I have read (and in systems like Riak), the
> opposite is often true: you actually want strong consistency when
> modifying the cluster, and eventual one is tolerable otherwise. This
> seems like a funny (and risky) choice.
> 

Nodes that have not claimed any shards are inactive. The only communication they will receive are periodic state broadcasts.  

If master does not get a majority confirmation of commit, it will itself not commit and the system will eventually correct itself. It does leave a chance of that commit to propagate to the rest of the cluster: master went down before committing itself, the node that has committed has highest rank of all once a majority of nodes are connected back and it becomes master.

Global state change guarantees that when it says state has been saved, it actually has been. If it returns error, it leaves a small window of chance that state will propagate once network corrects itself.

But that is actually ok. Because every save to global state is of the nature that it needs to succeed. If it does not, it will be tried periodically until it does. When a shard has been moved to another node and that node is ready to tell all other nodes it is new owner, there is no going back. It must eventually be succeed.

It is an unusual choice I admit. If we find fault in it, it will be changed.

> 
> How do you ensure that quorum consensus works with ACID? Is it doing
> majority reads? It would be important to describe it right.

Writes and reads are done by master. Master does not process any writes or reads until it has quorum. Requests will be queued if master needs to restore db from other nodes.

> 
> I can see how that works. An interestign question is what happens if all
> nodes go down, and need to be booted back up. Is there a way to reload
> from incompletely booted nodes? I assume that by default you only allow
> to copy from healthy nodes in the cluster, but if all of them became
> unhealthy, there has to be a way to restore state back -- otherwise all
> nodes will be waiting for all peers to come back up, and you're more or
> less deadlocked.

Global state contains name of master node. Once there are enough nodes online and if master is one of them, it will remain master. If master is not, it will be determined. The node that becomes master is considered to have valid state.

> 
> That is a timeout. The question is how long does it take for a timeout
> to be seen as a failure, more or less. You do not keep waiting forever,
> and therefore you have timeouts.

Heh whoops, yeah that was dumb of me.

> I'm not sure what you mean by 'all slaves are told to close' -- would
> that conflict with the idea that any of 3 nodes can fail? I'm going with
> the assumption that the slave nodes stay up if they are to re-elect a
> master.

Calls to actors are calls to gen_server. If result of gen_server call is that it died with normal exit, call will restart actor and continue to do it until it either receives some other answer other than normal exit. So actors are reactionary to calls requests. Once started anew, they will contact other nodes in cluster to figure out master.

> 
> A tricky issue there is that you may have asymetric netsplits. For
> example, it's possible that your client can talk to the master, but that
> the masters and slaves cannot communicate together. At that point you
> have a master node that gets requests from the client, and a set of
> slaves that elect a master among themselves. Depending on how things
> work, it is possible that both masters may receive queries at the same
> time.

Yes but writes will succeed only on the two nodes that have elected a new master because they have majority.

> 
> When the network issue is resolved, you currently have 3 nodes, two of
> which believe they are masters, one of the masters which may be lagging
> behind one of the nodes it expects to be a slave. That kind of conflict
> resolution should be considered.

Any node that receives a write request from the wrong node (what it thinks is master) will reinitialize itself to check other nodes. Which will cause it to figure out who is the right master. If it was master previously, it will back down. 

> 
> I think this is the same consideration as the previous point. I guess
> the question/recommendation here is to be able to make a clear
> distinction to the user of the database on whether the transaction
> clearly succeeded, clearly failed, or that you actually do not know, and
> to be careful not to report a timeout/failed proxy as a failure, in
> which case one could believe the transaction is safe to retry.

Yes you are right thank you. Timeout on proxy is an oversight. At the moment a proxy timeout will be considered a failure. We are limited by mysqls error messages however, hopefully it has something that applies.

> 
> Anything specific happens if the node it was transfering to dies? I'm
> guessing here the lock isn't there indefinitely?

It's there until tcp connection is open and it is waiting for response. Without a successful response nothing happens. The origin actor remains unchanged.

> 
> In which case, be careful because it could be one of them damn
> netsplits, and you suddenly have duplicated shards that both believe
> they are canonical!
> 

You've uncovered a bug. Thank you. A fix should be quite straightforward.

> Regards,
> Fred.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140124/340b76dd/attachment.htm>