[erlang-questions] Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

Thu Jul 2 18:40:58 CEST 2009

Very sorry -- I somehow managed to get Gmail to send an incomplete
message unintentionally.

The unfinished part was:

> e.g. Atomic+isolated operations on multiple objects are really
> emulating simple transactions.  What kind of isolation guarantee is
> offered?
>   - 'serializable' isolation would

I was about to say that achieving full 'serializable' isolation may
require locking keys/objects that are currently absent (but which will
be inserted by the current insert operation).
(This is analogous to the 'phantom rows' problem in RDBMSs.)
That is, it requires two-pass locking: first accumulate all necessary
locks, including locks on the logical slots for any keys that are to be
created, and only then modify anything.
If you don't do this, then concurrent multi-object writes may appear to
be interleaved (not serializable) after the fact.
(Of course, acquiring a table-level lock achieves this but is
undesirable in other ways.)

There are of course other standard isolation levels to choose
from, e.g. 'repeatable-read'.

So whichever type of isolation you choose to support for ETS, it may
help users if the documentation describes it in standard vocabulary
(assuming one of the standard levels matches the semantics of ETS.)
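
To make the operations under discussion concrete, here is a minimal
sketch of the compound writes in question. The module name, table name
and keys are made up for illustration; only the ets calls are real API.
It shows single-process behaviour only: it does not prove anything about
isolation under concurrency, it just shows why absent keys matter
(insert_new/2 with a list has to check every key, including ones that
do not exist yet, before it inserts anything).

```erlang
-module(ets_compound_demo).
-export([run/0]).

%% Hypothetical demo module; names and keys are illustrative only.
run() ->
    T = ets:new(demo, [set, public, {write_concurrency, true}]),

    %% Multi-object insert: one call, several objects.
    true = ets:insert(T, [{k1, 1}, {k2, 1}]),

    %% insert_new/2 with a list checks every key first. k2 already
    %% exists, so nothing is inserted, not even the absent key k3.
    false = ets:insert_new(T, [{k2, 2}, {k3, 2}]),
    [] = ets:lookup(T, k3),
    [{k2, 1}] = ets:lookup(T, k2),

    %% When all keys are absent, all objects are inserted.
    true = ets:insert_new(T, [{k3, 3}, {k4, 3}]),
    [{k3, 3}] = ets:lookup(T, k3),
    [{k4, 3}] = ets:lookup(T, k4),

    true = ets:delete(T),
    ok.
```

Calling ets_compound_demo:run() returns ok if every match above holds.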

Apologies again for the earlier incomplete message.

regards,

Chris

On Thu, Jul 2, 2009 at 8:02 AM, Chris Newcombe <chris.newcombe@REDACTED> wrote:
>> Don't you think it's enough if stated about {write_concurrency,bool()}
>> that it does not break any semantic promises about atomicity and
>> isolation? Maybe note that operations that make such promises will
>> gain less (or nothing) from {write_concurrency,true}.
>
> Yes, that would be fine IMO.
>
>
>> I don't think we should need to describe the current internal locking
>> strategy.
>
> Sorry I wasn't clearer.  I wasn't advocating exposing/documenting
> implementation details -- I realize that you need as much
> implementation freedom as possible.
> I was trying to show that some examples of potential implementation
> strategies might help users understand what is not guaranteed.
> But it was a bad suggestion -- it would be better to simply refer
> indirectly to the API semantics documented elsewhere, as you say.
>
>
>> Also, any guarantees about atomicity and isolation only apply to the data
>> that the function is operating on.
>
> Yes.  IMO the API semantics should describe this.  However, I think
> you need more detail.
> e.g. Atomic+isolated operations on multiple objects are really
> emulating simple transactions.  What kind of isolation guarantee is
> offered?
>   - 'serializable' isolation would
>
> On Thu, Jul 2, 2009 at 3:21 AM, Sverker Eriksson <sverker@REDACTED> wrote:
>> Chris Newcombe wrote:
>>>
>>> I just noticed that the new {write_concurrency, true} option says that
>>> write-locks might no longer be taken at the table level.
>>>
>>>    "Different parts of the same table can be mutated (and read) by
>>> concurrent processes."
>>>
>>> (full text below)
>>>
>>> It does not say which write/read APIs are allowed to be concurrent.
>>>
>>>
>>
>> The idea with write_concurrency was that it should be pure performance
>> tuning and not change any guarantees about API semantics.
>>
>>> So there's the usual natural tension between clean API semantics for
>>> compound write operations, and increased concurrency.  e.g. Some
>>> applications might want atomicity, but might care more about increased
>>> concurrency than full isolation.  Other applications (like my current
>>> one) might really need strong isolation.
>>>
>>> But I guess that backwards-compatibility reasons will dominate your
>>> decision (quite understandably).  Given the historic implicit behavior
>>> (strong isolation) for delete_all_objects, insert, and insert_new, it
>>> would be dangerous to change them now.   Also, strong isolation
>>> follows the principle of least surprise.
>>>
>>>
>>
>> True, backward-compatibility was the main reason for deciding now to
>> guarantee the atomic and isolated semantics of insert, insert_new and
>> delete_all_objects in the docs.
>> The introduction of write_concurrency, however, was about
>> backward-compatibility with respect to performance only, not semantics.
>>
>>> It would be great if the updated documentation for the APIs
>>> specifically described the isolation semantics when
>>> {write_concurrency,true} is used.
>>>  And vice-versa too; e.g. it would be good if the documentation for
>>> write_concurrency mentioned that compound-write operations will either
>>> acquire multiple fine-grain write-locks (i.e. acquire all necessary
>>> locks before modifying anything), or may choose to acquire a
>>> table-level lock, to ensure (in either case) that their historic
>>> isolation behavior is preserved.
>>> Therefore applications that make heavy use of compound-write
>>> operations might see less benefit from {write_concurrency, true}.
>>>
>>>
>>
>> Don't you think it's enough if stated about {write_concurrency,bool()}
>> that it does not break any semantic promises about atomicity and
>> isolation? Maybe note that operations that make such promises will
>> gain less (or nothing) from {write_concurrency,true}.
>> I don't think we should need to describe the current internal locking
>> strategy.
>>
>> Also, any guarantees about atomicity and isolation only apply to the data
>> that the function is operating on.
>>
>> /Sverker, Erlang/OTP
>>
>>
>>
>