[erlang-questions] Changing the representation of sets
Tue Apr 25 08:51:10 CEST 2017
The sets module implements an opaque structure to represent sets. The current implementation is based on hashes and nested tuples. Since OTP 18, we have maps that are an excellent and performant data structure for storing arbitrary keys & values. This means, they can be used to efficiently implement sets, by storing set elements as keys with some default value.
My measurements  show that a set implementation based on maps is 2-4 times faster in all operations than the current implementation from the sets module. Comparing performance tradeoffs to other set implementations, it seems that map-based sets have exactly the same characteristics as current sets, except they are faster. Map-based sets also have a huge advantage of being easily readable when printed in the console. Given the sets data structure is opaque it should be possible to just switch implementations and make everybody's code faster without changing a line of code. The only issue might be incorrect uses of the sets module that assume some particular shape for the structure - I'm not sure how widespread that is.
Another possibility would be to implement a new map_sets module, but I think 4 set implementation is already too many - no need to add another. There's also no need to write a new implementation since there's already a well tested, map-based set implementation available in OTP in the cerl_sets module in the compiler application .
A similar consideration could be made for the dict module, but I haven't looked closely at the performance differences there - I would expect something similar, though, since sets and dict use a similar implementation.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the erlang-questions