[erlang-questions] String versus variable in binary literal

Ulf Wiger ulf@REDACTED
Thu May 17 12:42:44 CEST 2012


If we're discussing ways to extend the bit syntax, here is one gripe of mine:

We all try to use binaries as much as possible in ets tables, not least since there is a huge difference in space requirements on 64-bit between binaries and lists.

But lists have one great advantage when it comes to match specifications and the select() function:

You can do a prefix match on the key, e.g.

ets:select(Tab, [{ {"foo" ++ '_', '_'}, [], ['$_']} }])

…and on an ordered_set table, this will be executed efficiently, with a cost that is roughly proportional to the number of matching elements.

I was going to write that the only way to do something similar with binary keys is to bind the key to a variable (forcing a full table scan) and then doing the test in a guard, but (realizing that I'd never actually done that and consulting the docs), it seems to be even worse: you can't do it at all!

(Unless I missed something, or the docs fail to describe some relevant guard function).

In order to avoid scanning the whole table, you instead have to resort to next-ing your way through the table. This has roughly similar complexity to the select() above, but is slower, and requires more code. Something like:

pfx_match(T, P) when is_binary(P) ->
    lists:concat([ets:lookup(T, P) | pfx_match_(ets:next(T, P), T, P)]).

pfx_match_(K, T, P) when is_binary(K) ->
    Sz = byte_size(P),
    case K of
	<<P:Sz/binary, _/binary>> ->
	    [ets:lookup(T, K) | pfx_match_(ets:next(T, K), T, P)];
	_ ->
	    []
    end;
pfx_match_(_, _, _) ->
    [].

So basically, we're talking about two extensions: 

(1) allowing something similar to "foo" ++ '_' in bit syntax, which is probably a very bad idea, and/or
(2) extending the match specifications, possibly with even more bizzare syntax, to at the very least allow som basic inspection of binaries.

Obviously, if the condition can be expressed in a match_spec guard, the select operation could figure out that the first part of the key is bound. I see two problems with this:

- No one, not even Patrik Nyblom who wrote it, seems to want to mess with the match spec compiler and evaluator
- There is obviously a tradeoff between performance and expressivity, since the compilation is done on the fly.

All things considered, I guess this is quite a challenge. :)

BR,
Ulf W

On 17 May 2012, at 00:12, Richard O'Keefe wrote:

> The real answer is that <<"foo">> is ad hoc special purpose syntax;
> there is (as yet) no provision for building a binary using any old
> string, only for string *literals*.
> 
> This hack went in basically to provide a more-or-less convenient
> way to write string-as-binary literals.
> 
> Yes, it's confusing, and yes, it would be nice if any
> list-of-character-code expressions were allowed.
> 
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com






More information about the erlang-questions mailing list