[erlang-questions] towards a unified dict api

Witold Baryluk baryluk@REDACTED
Thu Dec 29 10:39:00 CET 2011


On 12-23 23:09, Richard Carlsson wrote:
> A thing I've been tinkering with on and off for years is making a
> unified API for dictionaries in Erlang (dict, orddict, gb_trees,
> ets, dets). This requires figuring out a set of function names and
> calling conventions that are mostly familiar but which don't already
> have a conflicting definition in one or more of the modules
> involved.
> 
> In the end I went for using most of the dict module API unchanged,
> with some new synonyms, and a number of additional useful functions.
> I also made the dict module define a new 'dict' behaviour (even
> though it's just an interface rather than a complete behaviour).
> 
> One particular detail is that gb_trees (with its user-unfriendly
> name and rather different calling conventions) can now quite simply
> be used through the dict module as an ordered variant of dict, and
> you can pretend you never heard of the gb_trees module unless you
> want to use one of its specially implementation-dependent functions.
> An ordered dict can be created through dict:new([ordered_set]). This
> also resolved some major problems with clashing function
> definitions.
> 
> The code (based on the OTP maint branch) can be found here:
> 
> https://github.com/richcarl/otp/tree/dict-api
> 

I agree that the dict API is the way to go, because it is so widely used.
However, I would be against introducing any synonyms; they may become
confusing.

What I do not like in the dict API is how the update/4 function behaves.
It is called like update(Key, Fun, Initial, Dict) -> Dict.
The problem is that Initial is evaluated before the call, so if it is
a complex and costly value to compute, the work is wasted most of the
time, because the key usually exists already and Initial is not needed.

This is why I often use a function like this:

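%% Like dict:update/4, but the initial value is computed lazily by
%% FunInit/0, and only when Key is missing.
update_full(Key, Fun, FunInit, Dict) when is_function(Fun, 1), is_function(FunInit, 0) ->
	case dict:is_key(Key, Dict) of
	true ->
		dict:update(Key, Fun, Dict);
	false ->
		Val = FunInit(),
		dict:store(Key, Val, Dict)
	end.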
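
To make the cost difference concrete, here is a minimal sketch that counts
how often the initial value is computed with dict:update/4 versus the lazy
variant. The module name lazy_init_demo, the helper costly_init/0 and the
process-dictionary counter are assumptions made up for this illustration;
update_full/4 is repeated only so the module compiles on its own.

```erlang
-module(lazy_init_demo).
-export([update_full/4, eager_calls/0, lazy_calls/0]).

%% Lazy variant: FunInit/0 is called only when Key is missing.
update_full(Key, Fun, FunInit, Dict)
  when is_function(Fun, 1), is_function(FunInit, 0) ->
    case dict:is_key(Key, Dict) of
        true  -> dict:update(Key, Fun, Dict);
        false -> dict:store(Key, FunInit(), Dict)
    end.

%% Hypothetical costly initialiser. It bumps a counter kept in the
%% process dictionary so that the number of calls can be observed.
costly_init() ->
    put(init_calls, get(init_calls) + 1),
    [].

%% dict:update/4: the argument costly_init() is evaluated on every call.
eager_calls() ->
    put(init_calls, 0),
    Add = fun(L) -> [x | L] end,
    D1 = dict:update(k, Add, costly_init(), dict:new()),
    D2 = dict:update(k, Add, costly_init(), D1),
    _  = dict:update(k, Add, costly_init(), D2),
    get(init_calls).

%% update_full/4: the initialiser runs only for the first, missing key.
lazy_calls() ->
    put(init_calls, 0),
    Add = fun(L) -> [x | L] end,
    Init = fun costly_init/0,
    D1 = update_full(k, Add, Init, dict:new()),
    D2 = update_full(k, Add, Init, D1),
    _  = update_full(k, Add, Init, D2),
    get(init_calls).
```

With three updates of the same key, eager_calls() reports 3 evaluations
and lazy_calls() reports 1.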

Of course, integrating it directly into dict would make it even faster,
since the data structure would not need to be traversed twice.
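
The internals of dict are opaque, so as an illustration of the
single-traversal idea here is a sketch for the orddict representation
instead (a list of {Key, Value} pairs sorted by key). The module name
orddict_lazy and the function update_lazy/4 are hypothetical; nothing like
this exists in OTP as far as this sketch assumes.

```erlang
-module(orddict_lazy).
-export([update_lazy/4]).

%% Single pass over a key-sorted list of {Key, Value} pairs.
%% FunInit/0 is called only if Key turns out to be missing.
update_lazy(Key, Fun, FunInit, [{K, _} = E | Rest]) when Key > K ->
    %% Key sorts after this entry: keep the entry and continue.
    [E | update_lazy(Key, Fun, FunInit, Rest)];
update_lazy(Key, Fun, _FunInit, [{K, Val} | Rest]) when Key == K ->
    %% Found the key: apply Fun to the old value, no initialiser needed.
    [{Key, Fun(Val)} | Rest];
update_lazy(Key, _Fun, FunInit, Rest) ->
    %% Key sorts before the next entry, or the list is exhausted:
    %% only now is the initial value computed and inserted.
    [{Key, FunInit()} | Rest].
```

The result is still a valid orddict, so it can be passed on to
orddict:fetch/2 and friends.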

I'm not really sure that integrating abstract collections into the
existing dict module is a good idea. It should not break compatibility,
but it will surely bring some performance hit (due to pattern matching
in the dict module and delegation to other modules).

I think the dict module should be left as is, and a new module should be
introduced, like gen_dict. Sure, in some sense it is easier to just
find all dict:new() calls using a simple grep and change them to
dict:new([...]) where appropriate, without worrying about other call
sites. But if for some reason one changes dict:new([dict]) to something
else, some functions may subtly change how they work, so I think dict
should not be messed with too much.

And running sed s/\bdict:\b/gen_dict:/ or something similar shouldn't be
hard, but it will make sure the developer understands that the result can:
 1) be slightly slower, due to the additional level of indirection
 2) have a slightly different API
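
A rough sketch of what such a separate wrapper could look like, and where
the extra indirection comes from. The module name gen_dict_sketch, the
{Module, Data} tagging and the choice of only three functions are
assumptions for illustration; this is not the code from the dict-api
branch or from any existing library.

```erlang
-module(gen_dict_sketch).
-export([new/1, store/3, find/2]).

%% The backend module is remembered as a tag next to the data.
new(dict)     -> {dict, dict:new()};
new(orddict)  -> {orddict, orddict:new()};
new(gb_trees) -> {gb_trees, gb_trees:empty()}.

%% Every call pattern-matches on the tag and delegates; this is the
%% additional level of indirection.
store(Key, Val, {gb_trees, Tree}) ->
    {gb_trees, gb_trees:enter(Key, Val, Tree)};
store(Key, Val, {Mod, Data}) ->
    {Mod, Mod:store(Key, Val, Data)}.

%% gb_trees uses a different calling convention and return values, so
%% they are translated to the dict-style {ok, Value} | error.
find(Key, {gb_trees, Tree}) ->
    case gb_trees:lookup(Key, Tree) of
        {value, Val} -> {ok, Val};
        none         -> error
    end;
find(Key, {Mod, Data}) ->
    Mod:find(Key, Data).
```

Switching the backend then means changing only the argument of new/1,
for example from gen_dict_sketch:new(dict) to gen_dict_sketch:new(gb_trees).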

In fact I prepared a much simpler (but ugly) wrapper years ago, and I
often use it when I know I need to store and randomly access potentially
lots of elements, but want to test whether it is actually faster than
using a simple orddict, proplist, gb_trees, or my own data structures:

https://github.com/baryluk/common_collection  % I have an even newer version somewhere on disk, will need to search for it

No tests, no real documentation, no nothing, no simple way of adding
collections without editing the file (however, that should be simple to
fix). I would surely write it in a different way now (for some time I was
also using parametrized modules as wrappers for dict and proplists, but
because of their unknown future in Erlang - I personally like them very
much! - I stopped using them, despite them being even simpler to use).
But it works.

It looks like https://github.com/erlware/erlware_commons/tree/master/src
is more modular and easier for other people to extend without changing
the source code. It is probably faster. But it lacks many very useful
functions, and it is not a separate project of its own (as it should be).

-- 
Witold Baryluk
JID: witold.baryluk // jabster.pl


