[erlang-bugs] Funny behaviour of dirty_next in mnesia?

Wed May 18 08:22:57 CEST 2011

The lesson is: don't use the mnesia dirty API, especially not in a
distributed setting.

There is a reason "dirty" is in the name, and really the dirty
operations shouldn't exist at all.
They are the product of a customer need: "we don't care if it's correct in
all cases, but it should be fast".

/Dan
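
A minimal sketch of the non-dirty variant, combining the sync_transaction
suggestion quoted further down with transactional reads in place of
dirty_next and dirty_read. The module and function names are made up for
illustration, and the code assumes mnesia is already running and that the
rec table from the test below exists and holds {rec,4,1}. It has not been
run against the two-node setup in this thread.

```erlang
-module(sync_delete_sketch).
-export([delete_then_check/0]).

%% Hypothetical helper: delete the record, then inspect the table
%% without using any mnesia:dirty_* call.
delete_then_check() ->
    %% sync_transaction/1 behaves like transaction/1, but waits until
    %% the commit has been logged on every involved node before it
    %% returns.
    {atomic, ok} =
        mnesia:sync_transaction(
          fun() -> mnesia:delete_object({rec,4,1}) end),
    %% Read back inside a transaction instead of via the dirty API.
    {atomic, Result} =
        mnesia:transaction(
          fun() -> {mnesia:first(rec), mnesia:read(rec, 4)} end),
    Result.
```

If {rec,4,1} was the only record in the table, the result should be
{'$end_of_table', []}.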

On Tue, May 17, 2011 at 11:32 PM, Ahmed Omar <spawn.think@REDACTED> wrote:
> Well, it's not the same process. When mnesia finds out that:
> - there are N updates to commit,
> - the protocol to use is async,
> - the node of the tid is not the local node,
>
> it spawns a new process to do the commit.
>
> On Tue, May 17, 2011 at 10:44 PM, John Hughes <john.hughes@REDACTED> wrote:
>>
>>
>>
>>
>> From: Ahmed Omar
>> I'm not a mnesia expert, but I THINK the race condition is in the test, not
>> in mnesia. The transaction is still being committed and logged when the
>> dirty read is issued. If you add a sleep in between or, better, use
>> mnesia:sync_transaction
>> (http://www.erlang.org/doc/man/mnesia.html#sync_transaction-3) instead of
>> mnesia:transaction, the test will fail, i.e. the case disappears.
>> Isn't that the expected behavior, or am I missing something?
>>
>>
>> Adding a sleep (I added a second) or using sync_transaction instead
>> changes the behaviour to what I would expect, so it sounds as though you may
>> be right about what's happening. But even so, it's not the behaviour I would
>> expect, at least!
>>
>> There isn't any concurrency in the test. There's only distribution--and
>> there's only one copy of the table, on the slave node. Isn't it weird that
>> when the transaction returns, the SAME process that ran the transaction does
>> not see its side effects?
>>
>> By the way, if I swap the last two operations (which Ulf Wiger suggested),
>> then I see the same kind of behaviour... but now the first operation (which
>> is now a dirty_read) actually retrieves the deleted tuple from the table,
>> while the second operation (now the dirty_next) sees no keys in the table.
>>
>> This doesn't happen if the table is on the same node as the test is
>> executed on, so distribution certainly is not transparent in this case.
>>
>> John
>>
>>
>>
>> On Tue, May 17, 2011 at 6:57 PM, John Hughes <john.hughes@REDACTED>
>> wrote:
>>>
>>> QuickCheck turned up another case of odd behaviour at Klarna.
>>>
>>> The test runs mnesia on two nodes, creates a table on the OTHER node,
>>> then adds and deletes a record. After this the record is indeed not IN the
>>> table, but dirty_next finds its key anyway! Surely it shouldn't?
>>>
>>> Here's the test:
>>>
>>> test() ->
>>>     Slave = start_mnesia_with_slave(),
>>>     {atomic,ok} = mnesia:create_table(rec,[{type,set},
>>>         {disc_only_copies,[Slave]}]),
>>>     ok          = mnesia:dirty_write({rec,4,1}),
>>>     %% The next command MUST be done in a transaction, otherwise dirty_next works
>>>     {atomic,ok} =
>>>         mnesia:transaction(fun()->mnesia:delete_object({rec,4,1}) end),
>>>     %% Here's the problem: dirty_next returns 4, but this key is not in the table!
>>>     4           = mnesia:dirty_next(rec,0),
>>>     []          = mnesia:dirty_read(rec,4).
>>>
>>> I'm starting mnesia and the slave node like this:
>>>
>>> start_mnesia_with_slave() ->
>>>     {ok,Dir} = file:get_cwd(),
>>>     ok = error_logger:tty(false),
>>>     mnesia:stop(),
>>>     ok = error_logger:tty(true),
>>>     delete_file("mnesia"),
>>>     delete_file("slave"),
>>>     ok = file:make_dir("mnesia"),
>>>     ok = file:make_dir("slave"),
>>>     Slave = slave(),
>>>     ok = application:set_env(mnesia,dir,Dir++"/mnesia"),
>>>     ok = rpc:call(Slave,application,set_env,[mnesia,dir,Dir++"/slave"]),
>>>     ok = mnesia:create_schema([node(),Slave]),
>>>     ok = mnesia:start(),
>>>     ok = rpc:call(Slave,mnesia,start,[]),
>>>     Slave.
>>>
>>> slave() ->
>>>     case slave:start_link(net_adm:localhost(),"slave") of
>>>         {ok,Slave} ->
>>>             Slave;
>>>         {error,{already_running,Slave}} ->
>>>             Slave
>>>     end.
>>>
>>> I also have code to delete a file or directory, easy on Linux, darn
>>> difficult on Windows. You don't need this really, just run the test in an
>>> empty directory.
>>>
>>> delete_file(Name) ->
>>>     case filelib:is_dir(Name) of
>>>         true ->
>>>             [delete_file(Name++"/"++X) || X <- list_dir(Name)],
>>>             file:del_dir(Name),
>>>             %% Call again to check that the directory is really gone.
>>>             delete_file(Name);
>>>         %% filelib:is_dir/1 only returns true or false, so the two
>>>         %% {error,_} clauses below never match.
>>>         {error,eaccess} ->
>>>             delete_file(Name);
>>>         {error,enoent} ->
>>>             io:format("Could not find ~p\n",[Name]),
>>>             ok;
>>>         false ->
>>>             case file:delete(Name) of
>>>                 {error,enoent} ->
>>>                     ok;
>>>                 {error,eacces} ->
>>>                     %% Retry until the file can be deleted.
>>>                     io:format("Could not access ~p\n",[Name]),
>>>                     delete_file(Name);
>>>                 ok ->
>>>                     delete_file(Name)
>>>             end
>>>     end.
>>>
>>> list_dir(Name) ->
>>>     case file:list_dir(Name) of
>>>         {ok,Files} ->
>>>             Files;
>>>         {error,eacces} ->
>>>             io:format("Could not list directory ~p\n",[Name]),
>>>             list_dir(Name);
>>>         {error,enoent} ->
>>>             io:format("Could not find directory ~p\n",[Name]),
>>>             []
>>>     end.
>>>
>>> John
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>>
>>
>>
>>
>> --
>> Best Regards,
>> - Ahmed Omar
>> http://nl.linkedin.com/in/adiaa
>> Follow me on twitter
>> @spawn_think