[erlang-questions] Best Practice in Map keys: Atoms or Binaries (or Strings)?

Björn-Egil Dahlberg XB bjorn-egil.xb.dahlberg@REDACTED
Mon Oct 3 10:54:30 CEST 2016


On 10/02/2016 01:07 PM, Grzegorz Junka wrote:
>
> I don't think Ruel was asking about differences between maps and 
> records, only what datatype is optimal for map's keys?
>
> I was hoping someone from the OTP team will take on this question, but 
> from what I understand the key is always hashed using some C 
> functions. So, the shorter is the key the faster will be the hashing, 
> but the difference will be so small, that only noticeable on really 
> big data structures (like long lists/strings, binaries or deep data 
> structures, like dicts, where it has to be traversed, and by long I 
> mean a few dozens of bytes).
>
Maps were designed to handle all types as keys but it's best suited for 
literal keys, i.e. 'this_is_a_literal_key', "this_is_a_literal_key", 
<<"this_is_also_a_literal_key">>. It is true that comparisons are more 
expensive for larger keys, immediates (atoms and fixnums) are word 
comparisons, but for big maps this is not so problematic. The HAMT 
implementation only does one comparison on the key itself and that it is 
when it reaches the leaf, otherwise it uses the hash to traverses the 
hash value. Also, for literal keys the hashing is done at *load time* so 
it will only hash again when it that hash exhausted, typically this will 
happen if Maps are larger then 50000 pairs. For small maps there are 
more comparisons but the Map is small so who cares =)

As for optimal keys .. it depends on your need. I would say atoms are 
optimal if you don't need to transcode the keys to atoms and back when 
you use them. Otherwise use the source type.

// Björn-Egil


> Grzegorz
>
>
> On 01/10/2016 19:24, Jesper Louis Andersen wrote:
>> Hi,
>>
>> You have to define optimal. Do you want efficient lookup or do you 
>> want to save space? For static keys, using atoms has the advantage of 
>> being fast to compare and rather small (1 machine word). For small 
>> maps, lookup speeds are so quick I don't think it really matters too 
>> much if a record is marginally faster. I'd go for readability over 
>> efficiency in almost all situations.
>>
>> As for the record vs maps discussion: I tend to use an algebraic 
>> datatype instead, so I can avoid representing state which is not 
>> valid. In some circumstances, a map is fit for the representation. 
>> One weakness of a record is that if we define
>>
>> -record(state, { name, socket }).
>>
>> then we need to have #state { socket = undefined } at some point if 
>> we don't have a valid socket. With a map, we can simply initialize to 
>> the state #{ name => Name } and then when the socket is opened later, 
>> we can do State#{ socket => Sock } in the code and extend the state 
>> with a socket key. In a language such as OCaml, we could represent it 
>> as the type
>>
>> -type state() :: {unconnected, name()} | {connected, name(), socket()}.
>>
>> And I've done this in Erlang as well. It depends on the complexity of 
>> the representation. As for the goal: the goal is to build a state 
>> which is very hard to accidentally pattern match wrongly on in the 
>> code base. If your state has no socket, a match such as
>>
>> barney(#{ socket := Sock }) -> ...
>>
>> cannot match, even by accident. In turn, it forces the code to fail 
>> on the match itself, not later on when you try to do something with 
>> an undefined socket.
>>
>>
>> On Sat, Oct 1, 2016 at 12:37 PM, Pagayon, Ruel <ruel@REDACTED 
>> <mailto:ruel@REDACTED>> wrote:
>>
>>     Hi everyone,
>>
>>     I'm just wondering (assuming the keys of my maps in my
>>     application is not dynamically generated):
>>
>>     1. What is the most optimal key type for maps?
>>     2. If there is little to no effect in performance (or resources
>>     in general), as a convention, which is the best to use?
>>
>>     Thank you in advance for your responses.
>>
>>     Cheers,
>>     Ruel
>>
>>     _______________________________________________
>>     erlang-questions mailing list
>>     erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
>>     http://erlang.org/mailman/listinfo/erlang-questions
>>     <http://erlang.org/mailman/listinfo/erlang-questions>
>>
>>
>>
>>
>> -- 
>> J.
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20161003/0664f810/attachment.htm>


More information about the erlang-questions mailing list