[erlang-questions] Did Erlang's grammar change in R16A?

Anthony Ramine n.oxyde@REDACTED
Fri Feb 15 02:04:09 CET 2013


My (unfinished) implementation dates back to when Erlang didn't have UTF-8 atoms,
and I didn't think they would arrive that fast.

So I didn't have to mess with the arity field and just used this structure:

	typedef struct local_atom_ {
	    Eterm  header;
	    Eterm  equivrep;
	    Uint32 hash;
	    Uint32 len;
	    Eterm  name[1]; // by the way can we use C99/C11 flexible array members (name[]) in OTP?
	} LocalAtom;

I didn't use the same structure as global atoms because they have a member specific
to their hash table structure.

When I make it handle UTF-8 atoms, I'll just split Uint32 len into two Uint16 fields,
bytes_len and char_len; 16 bits ought to be enough, right?
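
To make the planned split concrete, here is a rough sketch, not the code from my branch. The typedefs are stand-ins for the real ERTS ones, the name is stored as raw bytes in a flexible array member rather than as Eterm words, and LocalAtomUtf8, count_utf8_chars and local_atom_new are hypothetical names invented for this example. The hash is left as a placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the ERTS typedefs (assumption: Eterm is one machine word). */
typedef uintptr_t Eterm;
typedef uint16_t  Uint16;
typedef uint32_t  Uint32;

typedef struct local_atom_utf8_ {
    Eterm  header;
    Eterm  equivrep;
    Uint32 hash;
    Uint16 bytes_len;      /* length of the name in bytes */
    Uint16 char_len;       /* length of the name in Unicode code points */
    unsigned char name[];  /* flexible array member, bytes_len bytes */
} LocalAtomUtf8;

/* Count code points by skipping continuation bytes (10xxxxxx).
 * Assumes the input is already valid UTF-8. */
static size_t count_utf8_chars(const unsigned char *s, size_t n)
{
    size_t chars = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Hypothetical constructor: returns NULL if the name does not fit in
 * 16 bits or if allocation fails. */
static LocalAtomUtf8 *local_atom_new(const char *utf8)
{
    size_t n = strlen(utf8);
    if (n > UINT16_MAX)
        return NULL;

    LocalAtomUtf8 *a = malloc(sizeof *a + n);
    if (a == NULL)
        return NULL;

    a->header    = 0;  /* placeholder, real tagging omitted */
    a->equivrep  = 0;
    a->hash      = 0;  /* placeholder, hashing omitted */
    a->bytes_len = (Uint16)n;
    a->char_len  = (Uint16)count_utf8_chars((const unsigned char *)utf8, n);
    memcpy(a->name, utf8, n);
    return a;
}
```

Both lengths are computed once at creation, so reading either one later is a constant-time field access, and the fixed part of the structure stays the same size as with a single Uint32 len.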

https://github.com/nox/otp/blob/bf3334c/erts/emulator/beam/erl_term.h#L561-567

That's an overhead of 3 words on 64-bit, 2 words on 64-bit with the halfword emulator
and 4 words on 32-bit. Should we worry about 4 words when safety is at stake?
Should we worry about 4 words when the OTP XML parser cannot be used in production
with user input because it uses atoms for XML names? We shouldn't.
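
For the record, the word counts come from simple arithmetic: two Eterm fields plus two 32-bit fields, divided by the machine word size. The helper below is only an illustration (local_atom_overhead_words is a made-up name), and it assumes an Eterm is 8 bytes on plain 64-bit, 4 bytes on 32-bit, and 4 bytes with the halfword emulator on a 64-bit machine.

```c
#include <stddef.h>

/* Header overhead in machine words: header + equivrep (two Eterms)
 * plus hash + len (two 32-bit fields), rounded up to whole words. */
static size_t local_atom_overhead_words(size_t eterm_bytes, size_t word_bytes)
{
    size_t header_bytes = 2 * eterm_bytes + 2 * 4;
    return (header_bytes + word_bytes - 1) / word_bytes;
}
```

With (8, 8) this gives 3, with (4, 8) it gives 2, and with (4, 4) it gives 4.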

-- 
Anthony Ramine

On 15 Feb 2013, at 01:48, Richard A. O'Keefe wrote:

>> .. first thought you were messing with the arity thing meaning .. perhaps i should sleep. putting more stuff in the header .. seems good
> 
> EEP 20 was written with no knowledge of Erlang's low level implementation details.
> The background for it is a WAM-like architecture, with a 2-bit tag
> 00 immediate
> 01 box-of-bits
> 10 pointer to [_|_] (which has no header)
> 11 pointer to box-of-tagged-words
> and boxes have an "arity" field (the size of a tuple, the length of a binary)
> that includes a few "supertag" bits that say what kind of box it is.
> The "arity" field used in EEP 20 holds the bits that say "I am a local atom"
> and a length, encoded to make *both* "number of bytes" and "number of
> Unicode characters" constant-time operations.
> 
> The equivrep field is what enables atoms that have been found to be equal
> to be chained together; if a 3-word header is too big (despite being the same
> size as or smaller than a binary's header), that word could be sacrificed.
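
One way to read the equivrep chaining described above is as a small union-find: once two atoms have been compared byte by byte and found equal, one is pointed at the other, so later comparisons reduce to a pointer check. The sketch below is my interpretation only, not code from EEP 20 or from OTP; Atom, atom_rep and atom_equal are hypothetical names, and the structure is simplified to the fields the idea needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct atom_ {
    struct atom_ *equivrep;  /* NULL, or an atom already found equal */
    size_t        len;
    const char   *name;
} Atom;

/* Follow the chain to its representative, then shorten the chain
 * so the next lookup is faster (path compression). */
static Atom *atom_rep(Atom *a)
{
    Atom *root = a;
    while (root->equivrep != NULL)
        root = root->equivrep;

    while (a->equivrep != NULL) {
        Atom *next = a->equivrep;
        a->equivrep = root;
        a = next;
    }
    return root;
}

/* Compare two atoms; on a successful byte comparison, chain them
 * together so the work is not repeated. */
static bool atom_equal(Atom *a, Atom *b)
{
    Atom *ra = atom_rep(a);
    Atom *rb = atom_rep(b);

    if (ra == rb)
        return true;
    if (ra->len != rb->len || memcmp(ra->name, rb->name, ra->len) != 0)
        return false;

    ra->equivrep = rb;  /* remember the equality */
    return true;
}
```

The first comparison of two distinct but equal atoms costs a memcmp; every later comparison between members of the same chain is a pointer comparison after a short walk. Dropping the equivrep word saves space at the price of repeating the byte comparison each time.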



