[erlang-questions] what is the "race condition bug in core Erlang" mentioned by @damienkatz? (was: Re: erlang-questions Digest, Vol 95, Issue 9)

Michael Truog
Fri Jan 11 19:45:40 CET 2013


Thank you for providing an explanation of the problems and the
information about the use of Erlang with databases. Often these points
for (or against) the use of Erlang get lost in passionate arguments
and sales pitches, but this provides some level-headed information.

On 01/11/2013 10:12 AM, Aliaksey Kandratsenka wrote:
> Sorry for replying to the digest. I normally just listen passively here.
>
> I'm one of the guys directly involved in the final stages of figuring
> out that bug.
>
> Here's the story.
>
> As we approached the finalization of Couchbase Server 2.0, we started
> seeing http://www.couchbase.com/issues/browse/MB-6638. Given that we
> have a bunch of custom NIFs, we weren't sure until the very last minute
> whether it was an Erlang VM bug or ours. Initially, even reproducing it
> reliably was hard, and once we learned how to reproduce it, it still
> required a few hours of running our full stack. That's why we never
> asked for help on the Erlang mailing lists.
>
> Backtraces pointed to something in the efile driver related to async
> I/O threads, so our folks tried disabling them and observed that the
> crashes went away. They also tried to reproduce the problem at a
> smaller scale, but only found a different bug, which Filipe fixed recently:
> https://github.com/erlang/otp/commit/5ddf4118617d7e5bac5b889025aa0f3903796a49
>
> We had to ship 2.0 without getting on top of this, so 2.0 _does not
> have_ async I/O threads enabled. This means heavy disk I/O (which we
> do) can cause unpredictable delays for any Erlang process, and thus
> some end-user badness.
>
> BTW, why is something as crucial as async I/O threads off by default?
> When I was arguing that we should not disable async I/O threads before
> 2.0 and should instead fight this issue "to the death", I heard the
> argument: "it's an experimental feature, because it's off by default".
> Is it?
>
> In the end we found that when the process linked to a raw file dies,
> its linked file driver port is stopped. As part of that, the
> underlying OS file or gzip stream (depending on the compressed option)
> is closed, without taking into account any async call for that file
> that may still be in flight. Reading or writing a closed fd is fairly
> harmless for plain files, but reading from a closed (and freed) gzip
> stream will clearly cause a crash. And of course, the tiny possibility
> of reading from or writing to another file that happens to reuse the
> same fd is not fun either.
>
> We found that file_sorter actually passes the compressed option "just
> in case" all the time, and we confirmed that the crashes do indeed
> happen because of those "compressed but not really" raw file ports.
>
> Couchbase Server 2.0.1 will ship with a workaround that replaces
> file_sorter from stdlib with a tiny fork of it that cuts the compressed
> option out. I've seen that Filipe produced an Erlang VM patch for this
> issue too, but what I've seen only covers closing compressed files.
> IMHO the right fix would cover both cases (plain and compressed files).
>
> I also see some folks in this thread being unhappy and somewhat
> angry. They seem to interpret Damien's opinion as Erlang bashing,
> which IMHO is not the case. I think his arguments apply to core
> database software.
>
> And in my humble opinion, candid expressions like that should be
> encouraged and studied with a cool head.
>
> It is true that we have found getting performance out of Erlang, and
> understanding in general what happens inside the VM, next to impossible.
>
> And personally, even without knowing all I know about the challenges
> of getting performance out of the Erlang VM, I'd still say that writing
> a core database in Erlang (or in any other language that isn't
> low-level and C-like) is just crazy. IMHO.
>
> BTW, perhaps not everybody here is aware that Damien received the 2009
> "Erlanger of the Year" award, I guess for CouchDB. Indeed, it's very
> much like a love affair that's over :) But hey, being un-loved is not
> necessarily bad, right?
>
> As for our plans for using Erlang in Couchbase (formerly Membase): we
> do indeed plan to gradually rewrite performance-sensitive pieces in C
> or C++, but there are no concrete plans to get rid of Erlang entirely,
> yet. It works OK for our cluster management layer.
>
> And IMHO, compared to some of our competitors, which do everything
> either in a low-level language (Mongo, RethinkDB) or in a high-level
> one (Riak), our approach of combining a low-level language for "moving
> bits around" with a high-level language for cluster management and
> orchestration seems to work best.
>
> So even if we move off Erlang completely, we'll very likely still use
> something much higher-level than C for cluster management and other
> pieces that are not performance-sensitive but are sometimes tricky.
>
> On Fri, Jan 11, 2013 at 3:00 AM, <> wrote:
>
>     Date: Fri, 11 Jan 2013 09:43:36 +0100
>     From: Henning Diedrich
>     To: Anton Lebedevich
>     Subject: Re: [erlang-questions] what is the "race condition bug in
>             core Erlang" mentioned by @damienkatz?
>
>     I love how languages can be love affairs, etc.
>
>     As for a race condition in core Erlang, I am sure Damien will
>     share his find.
>
>     In the meantime, maybe it's worth looking at the political
>     circumstances.
>
>     Some might note not only that when you fall out of love you're
>     irrationally and deeply disappointed, and find that the whole
>     feeling of understanding was an illusion in the first place (and
>     sometimes you're even right), but also that CouchDB surfed the
>     Erlang hype, that a while ago Damien was able to close a deal, and
>     that, for reasons I don't think anyone quite understood, he
>     announced that he'd reprogram it all in C.
>
>     Maybe it was an astounding proposition to program a transactional,
>     local (!) database in the age of Big Data in a language that
>     happens to be transactional by nature but is really made for
>     distribution, and it's not too surprising that that premise is now
>     abandoned. CouchDB is great for certain things, I have no doubt
>     about that; how else could it be so successful?
>
>     But maybe one could ask: with the distribution layer of Couchbase
>     coming from Membase [1] (which means it would still be Erlang?)
>     but the local storage being in C (coming from memcached, I
>     believe), was there simply a necessity at play, because C would be
>     a better fit with the rest of the local part of Membase? As in:
>     after the renaming, the CouchDB principle would be reprogrammed to
>     replace or amend the memcached parts of Membase and become
>     Couchbase, so it had to be in C? And for dealing only with the
>     local storage parts of a database, which was probably the task, I
>     am not sure Erlang is a natural fit.
>
>     You wouldn't think someone would publicly talk himself into
>     loving his partner in a forced marriage, would you?
>
>     Me, for instance: I love C. Erlang always makes me feel stupid.
>     Who wants that?
>
>     Henning
>
>
>     [1] old: http://blog.couchbase.com/why-membase-uses-erlang
>
> _______________________________________________
> erlang-questions mailing list
> 
> http://erlang.org/mailman/listinfo/erlang-questions
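The race described in this thread comes down to resource lifetime. The port's stop path frees the fd or gzip stream while an async job may still be using it. One common remedy is to count in-flight jobs and defer the actual close until the last one finishes. Below is a minimal sketch of that idea in Python. It assumes a hypothetical `DeferredCloseHandle` class and a caller-supplied release function. It is not the efile driver's code or Filipe's patch, only an illustration of the pattern, and it treats plain and compressed handles the same way.

```python
import threading


class DeferredCloseHandle:
    """Hypothetical stand-in for a driver's file state (plain fd or gzip stream).

    close() may be called at any time (e.g. when the owning process dies),
    but the underlying resource is released only once no async job uses it.
    """

    def __init__(self, release_fn):
        self._lock = threading.Lock()
        self._in_flight = 0            # async jobs currently holding the handle
        self._closing = False          # owner has requested a close
        self._released = False         # release_fn has already run
        self._release_fn = release_fn  # hypothetical: would wrap os.close(fd) or gz.close()

    def acquire(self):
        """Register an async job before queueing it; refused once closing."""
        with self._lock:
            if self._closing:
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Mark an async job as finished; the last one after close() frees."""
        with self._lock:
            self._in_flight -= 1
            self._maybe_free_locked()

    def close(self):
        """Request a close without blocking; frees now only if nothing is in flight."""
        with self._lock:
            self._closing = True
            self._maybe_free_locked()

    @property
    def released(self):
        with self._lock:
            return self._released

    def _maybe_free_locked(self):
        # Caller must hold self._lock; ensures release_fn runs exactly once.
        if self._closing and self._in_flight == 0 and not self._released:
            self._released = True
            self._release_fn()


# Usage: an async read is in flight when the owning process dies.
events = []
h = DeferredCloseHandle(lambda: events.append("freed"))
go = threading.Event()


def async_read():
    go.wait()  # the worker proceeds only after close() has been requested
    events.append(f"read, freed={h.released}")  # the resource is still alive here
    h.release()


queued = h.acquire()  # job registered before it is queued
t = threading.Thread(target=async_read)
t.start()
h.close()             # owner dies: the close is deferred, not performed
go.set()
t.join()
print(events)         # ['read, freed=False', 'freed']
```

In this sketch `close()` never blocks. On a VM scheduler thread, waiting for a slow disk job would bring back the latency that async threads are meant to avoid, so the last finishing job performs the release instead.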
