[erlang-bugs] Funny behaviour of dirty_next in mnesia?
Dan Gudmundsson
dgud@REDACTED
Wed May 18 08:22:57 CEST 2011
The lesson is: don't use the mnesia dirty API, especially not in a
distributed setting.
There is a reason dirty is in the name, and really the dirty
operations shouldn't exist at all.
They are a product of customer need "we don't care if it's correct in
all cases but it should be fast".
/Dan
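
A minimal sketch of what staying inside transactions could look like for the
test quoted below. It is an illustration, not a confirmed fix: the module name
txn_sketch and the function demo/0 are made up here, and the table is a plain
RAM table on a single local node, so it shows the API shape only and does not
reproduce the distributed race. The write and delete go through
mnesia:sync_transaction, which the thread reports makes the odd behaviour
disappear, and the reads use mnesia:all_keys/1 and mnesia:read/2 inside a
transaction instead of dirty_next and dirty_read.

```erlang
-module(txn_sketch).
-export([demo/0]).

%% Hypothetical single-node sketch: no slave node, table kept in RAM.
%% Assumes a fresh node where the table rec does not exist yet.
demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(rec, [{type, set}]),
    %% Write and delete inside synchronous transactions rather than
    %% mixing dirty_write with an asynchronous transaction.
    {atomic, ok} = mnesia:sync_transaction(
                     fun() -> mnesia:write({rec, 4, 1}) end),
    {atomic, ok} = mnesia:sync_transaction(
                     fun() -> mnesia:delete_object({rec, 4, 1}) end),
    %% Read back inside a transaction as well: all keys in the table,
    %% and the records stored under key 4.
    {atomic, Result} = mnesia:transaction(
                         fun() ->
                             {mnesia:all_keys(rec), mnesia:read(rec, 4)}
                         end),
    Result.
```

On a fresh node, txn_sketch:demo() should return {[], []}: no keys and no
record under key 4. Transactions cost more than dirty operations, which is the
trade-off described above.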
On Tue, May 17, 2011 at 11:32 PM, Ahmed Omar <spawn.think@REDACTED> wrote:
> Well, it's not the same process. When mnesia finds out that:
> - there are N updates to commit,
> - the protocol to use is async,
> - the node of the tid is not the local node,
>
> it spawns a new process to do the commit.
>
> On Tue, May 17, 2011 at 10:44 PM, John Hughes <john.hughes@REDACTED> wrote:
>>
>>
>>
>>
>> From: Ahmed Omar
>> I'm not a mnesia expert, but I THINK the race condition is in the test, not
>> in mnesia. The transaction is still being committed and logged when the dirty
>> read is issued. If you add a sleep in between, or better, if you use
>> mnesia:sync_transaction
>> (http://www.erlang.org/doc/man/mnesia.html#sync_transaction-3) instead of
>> mnesia:transaction, the test will fail, i.e. the case disappears.
>> Isn't that the expected behavior, or am I missing something?
>>
>>
>> Adding a sleep (I added a second) or using sync_transaction instead
>> changes the behaviour to what I would expect, so it sounds as though you may
>> be right about what's happening. But even so, it's not the behaviour I would
>> expect, at least!
>>
>> There isn't any concurrency in the test. There's only distribution--and
>> there's only one copy of the table, on the slave node. Isn't it weird that
>> when the transaction returns, the SAME process that ran the transaction does
>> not see its side effects?
>>
>> By the way, if I swap the last two operations (which Ulf Wiger suggested),
>> then I see the same kind of behaviour... but now the first operation (which
>> is now a dirty_read) actually retrieves the deleted tuple from the table,
>> while the second operation (now the dirty_next) sees no keys in the table.
>>
>> This doesn't happen if the table is on the same node as the test is
>> executed on, so distribution certainly is not transparent in this case.
>>
>> John
>>
>>
>>
>> On Tue, May 17, 2011 at 6:57 PM, John Hughes <john.hughes@REDACTED>
>> wrote:
>>>
>>> QuickCheck turned up another case of odd behaviour at Klarna.
>>>
>>> The test runs mnesia on two nodes, creates a table on the OTHER node,
>>> then adds and deletes a record. After this the record is indeed not IN the
>>> table, but dirty_next finds its key anyway! Surely it shouldn't?
>>>
>>> Here's the test:
>>>
>>> test() ->
>>>     Slave = start_mnesia_with_slave(),
>>>     {atomic,ok} = mnesia:create_table(rec,[{type,set},
>>>                                            {disc_only_copies,[Slave]}]),
>>>     ok = mnesia:dirty_write({rec,4,1}),
>>>     %% The next command MUST be done in a transaction, otherwise
>>>     %% dirty_next works
>>>     {atomic,ok} =
>>>         mnesia:transaction(fun()->mnesia:delete_object({rec,4,1}) end),
>>>     %% Here's the problem: dirty_next returns 4, but this key is not in
>>>     %% the table!
>>>     4 = mnesia:dirty_next(rec,0),
>>>     [] = mnesia:dirty_read(rec,4).
>>>
>>> I'm starting mnesia and the slave node like this:
>>>
>>> start_mnesia_with_slave() ->
>>>     {ok,Dir} = file:get_cwd(),
>>>     ok = error_logger:tty(false),
>>>     mnesia:stop(),
>>>     ok = error_logger:tty(true),
>>>     delete_file("mnesia"),
>>>     delete_file("slave"),
>>>     ok = file:make_dir("mnesia"),
>>>     ok = file:make_dir("slave"),
>>>     Slave = slave(),
>>>     ok = application:set_env(mnesia,dir,Dir++"/mnesia"),
>>>     ok = rpc:call(Slave,application,set_env,[mnesia,dir,Dir++"/slave"]),
>>>     ok = mnesia:create_schema([node(),Slave]),
>>>     ok = mnesia:start(),
>>>     ok = rpc:call(Slave,mnesia,start,[]),
>>>     Slave.
>>>
>>> slave() ->
>>>     case slave:start_link(net_adm:localhost(),"slave") of
>>>         {ok,Slave} ->
>>>             Slave;
>>>         {error,{already_running,Slave}} ->
>>>             Slave
>>>     end.
>>>
>>> I also have code to delete a file or directory, easy on Linux, darn
>>> difficult on Windows. You don't need this really, just run the test in an
>>> empty directory.
>>>
>>> delete_file(Name) ->
>>>     %% filelib:is_dir/1 only returns true or false, so no error clauses
>>>     %% are needed here.
>>>     case filelib:is_dir(Name) of
>>>         true ->
>>>             [delete_file(Name++"/"++X) || X <- list_dir(Name)],
>>>             file:del_dir(Name),
>>>             %% Retry until the directory is really gone.
>>>             delete_file(Name);
>>>         false ->
>>>             case file:delete(Name) of
>>>                 {error,enoent} ->
>>>                     ok;
>>>                 {error,eacces} ->
>>>                     io:format("Could not access ~p\n",[Name]),
>>>                     delete_file(Name);
>>>                 ok ->
>>>                     delete_file(Name)
>>>             end
>>>     end.
>>>
>>> list_dir(Name) ->
>>>     case file:list_dir(Name) of
>>>         {ok,Files} ->
>>>             Files;
>>>         {error,eacces} ->
>>>             io:format("Could not list directory ~p\n",[Name]),
>>>             list_dir(Name);
>>>         {error,enoent} ->
>>>             io:format("Could not find directory ~p\n",[Name]),
>>>             []
>>>     end.
>>>
>>> John
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>>
>>
>>
>>
>> --
>> Best Regards,
>> - Ahmed Omar
>> http://nl.linkedin.com/in/adiaa
>> Follow me on twitter
>> @spawn_think