Query about external format for lists (strings)
Kent Boortz
kent@REDACTED
Fri Jan 17 03:24:57 CET 2003
Lawrie Brown <Lawrie.Brown@REDACTED> writes:
> As part of my work developing the runtime for the EC Erlang Compiler,
> I've recently been implementing its support for the external binary
> format, based heavily on the low-level (en/decode*.c) erl_interface
> code, but adapted to suit the rather different term coding we use. I
> have a query about the list coding. From my reading of the code, and
> playing with binaries created on a standard erlang node, I take it that
> a list is always terminated with the ER_NIL_EXT (106) tag. However the
> code for encoding very large strings in encode_string.c as lists does
> not seem to write this trailing marker. Should it do so??? It confused
> me at first since I based my overall list encoder on it, then wondered
> why the binary from a conventional Erlang node for a list which I
> decoded and then regenerated as a binary wasn't the same size as the
> original, sigh.
>
> Anyway, I'd appreciate any feedback:
>
> 1) confirming that ALL lists (incl those generated for very large strings)
> should always be terminated by ER_NIL_EXT;
>
> 2) and if so, whether the code in erl_interface*/src/en/decode_string.c
> is hence incorrect.
I'm currently looking over the erl_interface/ei source but haven't
gone deeply into the external format yet. But this seems to be a
bug. Strangely enough, the decoding function also ignores the tail of
the list (i.e. the term that follows the last element).
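To make the expected layout concrete, here is a minimal sketch of a long string written in list form with the trailing empty-list tag. It does not use erl_interface: encode_string_as_list() is a hypothetical helper, the enum names are local to the example, and the numeric tag values are the ones I understand the external format to use (97 small integer, 106 nil, 108 list). The leading version byte is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag values of the external format; the names are local to this sketch. */
enum { SMALL_INTEGER_EXT = 97, NIL_EXT = 106, LIST_EXT = 108 };

/* Hypothetical helper, not part of erl_interface: writes `s` as a list
 * of small integers followed by the empty-list tail.
 * Returns the number of bytes written into buf. */
size_t encode_string_as_list(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);
    size_t i = 0, k;

    assert(len > 0);  /* an empty string is just NIL_EXT on its own */

    buf[i++] = LIST_EXT;
    buf[i++] = (unsigned char)((len >> 24) & 0xff);  /* 4-byte big-endian */
    buf[i++] = (unsigned char)((len >> 16) & 0xff);  /* element count     */
    buf[i++] = (unsigned char)((len >> 8) & 0xff);
    buf[i++] = (unsigned char)(len & 0xff);

    for (k = 0; k < len; k++) {
        buf[i++] = SMALL_INTEGER_EXT;
        buf[i++] = (unsigned char)s[k];
    }

    buf[i++] = NIL_EXT;  /* the tail that the large-string path leaves out */
    return i;
}
```

For "abc" this writes 1 + 4 + 2*3 + 1 = 12 bytes, the last of which is 106. Without the final tag the result is one byte short, which would match the size difference Lawrie describes.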
Because I'm working on improving the erl_interface code, we have just
begun to discuss some aspects of the interpretation of the external
format. There is very little written about it. Some examples of
problems with the interpretation:
- An Erlang node in the current implementation will always pack an
  Erlang integer into the smallest container possible in the
  external format, but nothing is said about this in
  "erts/emulator/internal_doc/erl_ext_dist.txt". Erl_interface can't,
  for example, decode a long string with ei_decode_string() if one of
  the elements is between 0 and 255 but coded as a bignum in the
  external format. I think even the emulator will break if the
  number 42 is sent to it as a large integer or bignum in the
  external format. The value would fit into a small integer on the
  heap but will be built as a bignum. If I'm not mistaken, a compare
  for equality between the integer 42 created on the heap from the
  external format and the integer 42 created from your program could
  fail, because the internal type is checked first and it is assumed
  that if the types differ then the integers can't be the same
  (I'm not 100% sure about this).
- Reading the list header will not give the list length in all
  cases. If the tail of the list is itself a list, the elements of
  the tail are appended to the list, e.g.
      ei_encode_list_header(buf, &i, 2);
      ei_encode_integer(buf, &i, 'a');
      ei_encode_integer(buf, &i, 'b');
      ei_encode_list_header(buf, &i, 1);
      ei_encode_integer(buf, &i, 'c');
      ei_encode_empty_list(buf, &i);
  is actually the list [$a,$b,$c], i.e. a list of length 3. Again,
  the emulator will not create lists coded like this, but if a list
  like this were created by erl_interface then decoding it with
  ei_decode_string() would break. With the current bug of not
  checking for [], it will instead return the string "ab".
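A decoder that makes neither assumption could look roughly like the sketch below. This is not the erl_interface code: lenient_decode_string() and its helpers are hypothetical, it works on raw bytes without the version byte, and the tag values are the ones I understand the external format to use (97 small integer, 98 integer, 106 nil, 108 list, 110 small bignum). It follows chained list headers until it reaches the empty list and accepts an element in the range 0..255 in any of the three integer containers.

```c
#include <assert.h>
#include <stddef.h>

/* Tag values of the external format; the names are local to this sketch. */
enum { SMALL_INTEGER_EXT = 97, INTEGER_EXT = 98, NIL_EXT = 106,
       LIST_EXT = 108, SMALL_BIG_EXT = 110 };

static unsigned long get32be(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Decode one integer in 0..255, whatever container it was packed into.
 * Returns the number of bytes consumed, or 0 on failure. */
static size_t decode_byte(const unsigned char *p, size_t avail,
                          unsigned char *out)
{
    if (avail >= 2 && p[0] == SMALL_INTEGER_EXT) {
        *out = p[1];
        return 2;
    }
    if (avail >= 5 && p[0] == INTEGER_EXT) {
        unsigned long v = get32be(p + 1);
        if (v > 255) return 0;          /* also rejects negative values */
        *out = (unsigned char)v;
        return 5;
    }
    if (avail >= 3 && p[0] == SMALL_BIG_EXT) {
        size_t n = p[1], k;
        if (avail < 3 + n || p[2] != 0) return 0;  /* sign must be 0 */
        for (k = 1; k < n; k++)         /* digits are little-endian */
            if (p[3 + k] != 0) return 0;
        *out = n > 0 ? p[3] : 0;
        return 3 + n;
    }
    return 0;
}

/* Hypothetical decoder: returns the string length, or -1 on error
 * (including a list that is not terminated by the empty list). */
long lenient_decode_string(const unsigned char *buf, size_t len,
                           char *out, size_t cap)
{
    size_t i = 0, n = 0;

    assert(buf != NULL && out != NULL && cap > 0);

    for (;;) {
        unsigned long arity;

        if (i < len && buf[i] == NIL_EXT) break;     /* proper end */
        if (len - i < 5 || buf[i] != LIST_EXT) return -1;
        arity = get32be(buf + i + 1);
        i += 5;
        while (arity-- > 0) {
            unsigned char c;
            size_t used = decode_byte(buf + i, len - i, &c);
            if (used == 0 || n + 1 >= cap) return -1;
            out[n++] = (char)c;
            i += used;
        }
        /* whatever follows is the tail: NIL_EXT or another LIST_EXT */
    }
    out[n] = '\0';
    return (long)n;
}
```

Fed the bytes that the six ei_encode_* calls above are meant to produce, this yields "abc" rather than "ab", and a list whose trailing 106 is missing is rejected instead of silently accepted. Whether a decoder should be this tolerant is of course exactly the open question.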
It has not yet been decided whether we will tighten up the definition
of the external format (what is allowed and what is not) or correct
all source code that makes false assumptions about how things are
coded.
kent