[erlang-questions] String versus variable in binary literal
Ulf Wiger
ulf@REDACTED
Thu May 17 12:42:44 CEST 2012
If we're discussing ways to extend the bit syntax, here is one gripe of mine:
We all try to use binaries as much as possible in ets tables, not least since there is a huge difference in space requirements on 64-bit between binaries and lists.
But lists have one great advantage when it comes to match specifications and the select() function:
You can do a prefix match on the key, e.g.
ets:select(Tab, [{ {"foo" ++ '_', '_'}, [], ['$_']} }])
…and on an ordered_set table, this will be executed efficiently, with a cost that is roughly proportional to the number of matching elements.
I was going to write that the only way to do something similar with binary keys is to bind the key to a variable (forcing a full table scan) and then doing the test in a guard, but (realizing that I'd never actually done that and consulting the docs), it seems to be even worse: you can't do it at all!
(Unless I missed something, or the docs fail to describe some relevant guard function).
In order to avoid scanning the whole table, you instead have to resort to next-ing your way through the table. This has roughly similar complexity to the select() above, but is slower, and requires more code. Something like:
pfx_match(T, P) when is_binary(P) ->
lists:concat([ets:lookup(T, P) | pfx_match_(ets:next(T, P), T, P)]).
pfx_match_(K, T, P) when is_binary(K) ->
Sz = byte_size(P),
case K of
<<P:Sz/binary, _/binary>> ->
[ets:lookup(T, K) | pfx_match_(ets:next(T, K), T, P)];
_ ->
[]
end;
pfx_match_(_, _, _) ->
[].
So basically, we're talking about two extensions:
(1) allowing something similar to "foo" ++ '_' in bit syntax, which is probably a very bad idea, and/or
(2) extending the match specifications, possibly with even more bizzare syntax, to at the very least allow som basic inspection of binaries.
Obviously, if the condition can be expressed in a match_spec guard, the select operation could figure out that the first part of the key is bound. I see two problems with this:
- No one, not even Patrik Nyblom who wrote it, seems to want to mess with the match spec compiler and evaluator
- There is obviously a tradeoff between performance and expressivity, since the compilation is done on the fly.
All things considered, I guess this is quite a challenge. :)
BR,
Ulf W
On 17 May 2012, at 00:12, Richard O'Keefe wrote:
> The real answer is that <<"foo">> is ad hoc special purpose syntax;
> there is (as yet) no provision for building a binary using any old
> string, only for string *literals*.
>
> This hack went in basically to provide a more-or-less convenient
> way to write string-as-binary literals.
>
> Yes, it's confusing, and yes, it would be nice if any
> list-of-character-code expressions were allowed.
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com
More information about the erlang-questions
mailing list