phash2 isn't very resistant to collisions?

Sverker Eriksson sverker.eriksson@REDACTED
Wed Dec 15 17:34:09 CET 2021


To get 5 more bits:

phash2(Term, 1 bsl 32)

You get less collisions, but it still does not get close to a uniform distribution for your example.

Test(500) -> 249993


/Sverker

________________________________
From: erlang-questions <erlang-questions-bounces@REDACTED> on behalf of Roger Lipscombe <roger@REDACTED>
Sent: Wednesday, December 15, 2021 12:57 PM
To: erlang-questions <erlang-questions@REDACTED>
Subject: phash2 isn't very resistant to collisions?

I'm sure this has come up previously, but all I could find talked about atoms.

I know it's not intended to be cryptographically strong, but
erlang:phash2 is woefully collidey with 2-tuples:

Test = fun(N) ->
    L = [{X, Y} || X <- lists:seq(0, N - 1), Y <- lists:seq(0, N - 1)],
    Expected = N * N,
    Expected = length(L),
    Expected = length(lists:usort([erlang:phash2(Term) || Term <- L]))
end.

1> Test(10).
100

2> Test(100).
** exception error: no match of right hand side value 9999

3> Test(99).
** exception error: no match of right hand side value 9800

4> Test(500).
** exception error: no match of right hand side value 249763

What alternatives are there to phash2, if I still care about performance?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20211215/2f47bacd/attachment.htm>


More information about the erlang-questions mailing list