Query about external format for lists (strings)

Kent Boortz kent@REDACTED
Fri Jan 17 03:24:57 CET 2003


Lawrie Brown <Lawrie.Brown@REDACTED> writes:
> As part of my work developing the runtime for the EC Erlang Compiler,
> I've recently been implementing its support for the external binary
> format, based heavily on the low-level (en/decode*.c) erl_interface
> code, but adapted to suit the rather different term coding we use. I
> have a query about the list coding.  From my reading of the code, and
> playing with binaries created on a standard erlang node, I take it that
> a list is always terminated with the ER_NIL_EXT (106) tag. However the
> code for encoding very large strings in encode_string.c as lists does
> not seem to write this trailing marker.  Should it do so??? It confused
> me at first since I based my overall list encoder on it, then wondered
> why the binary from a conventional Erlang node for a list which I
> decoded and then regenerated as a binary wasn't the same size as the
> original, sigh.
> 
> Anyway, I'd appreciate any feedback:
> 
> 1) confirming that ALL lists (incl those generated for very large strings)
>    should always be terminated by ER_NIL_EXT;
> 
> 2) and if so, whether the code in erl_interface*/src/en/decode_string.c
>    is hence incorrect.

I'm currently looking over the erl_interface/ei source but haven't
gone deeply into the external format yet. But this seems to be a
bug. Strangely enough, the decoding function also ignores the tail of
the list (i.e. the last element).
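
To make the expected layout concrete, here is a minimal sketch of a
string encoder that writes the trailing NIL_EXT in the long-string
case. It is not the erl_interface code: sketch_encode_string() is a
hypothetical name, and the layout assumed is STRING_EXT (107) with a
2-byte big-endian length for strings up to 65535 bytes, otherwise
LIST_EXT (108) with a 4-byte big-endian length, one SMALL_INTEGER_EXT
(97) per character, and a closing NIL_EXT (106).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag values of the external format used in this sketch. */
enum {
    SMALL_INTEGER_EXT = 97,
    NIL_EXT           = 106,
    STRING_EXT        = 107,
    LIST_EXT          = 108
};

/*
 * Hypothetical helper, not part of erl_interface.
 * Encodes len bytes of s at buf[*index] and advances *index.
 * If buf is NULL nothing is written and only the size is counted.
 */
static void sketch_encode_string(unsigned char *buf, size_t *index,
                                 const unsigned char *s, size_t len)
{
    size_t i = *index;
    size_t k;

    assert(len <= 0xFFFFFFFFu);   /* list length field is 32 bits */

    if (len == 0) {
        /* The empty string is just the empty list. */
        if (buf) buf[i] = NIL_EXT;
        i += 1;
    } else if (len <= 0xFFFF) {
        /* Short form: tag, 2-byte length, raw bytes, no terminator. */
        if (buf) {
            buf[i]     = STRING_EXT;
            buf[i + 1] = (unsigned char)((len >> 8) & 0xFF);
            buf[i + 2] = (unsigned char)(len & 0xFF);
            memcpy(buf + i + 3, s, len);
        }
        i += 3 + len;
    } else {
        /* Long form: a real list, so it needs its NIL_EXT tail. */
        if (buf) {
            buf[i]     = LIST_EXT;
            buf[i + 1] = (unsigned char)((len >> 24) & 0xFF);
            buf[i + 2] = (unsigned char)((len >> 16) & 0xFF);
            buf[i + 3] = (unsigned char)((len >> 8) & 0xFF);
            buf[i + 4] = (unsigned char)(len & 0xFF);
        }
        i += 5;
        for (k = 0; k < len; k++) {
            if (buf) {
                buf[i]     = SMALL_INTEGER_EXT;
                buf[i + 1] = s[k];
            }
            i += 2;
        }
        if (buf) buf[i] = NIL_EXT;   /* the marker the query is about */
        i += 1;
    }
    *index = i;
}
```

With this layout a string of n > 65535 characters takes 5 + 2*n + 1
bytes, the final byte being the terminator; leaving it out gives the
one-byte size difference described in the query.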

Because I'm working on improving the erl_interface code we have just
begun to discuss some aspects of the interpretation of the external
format. There is very little written about it. Some examples of
problems with the interpretation:

  - An Erlang node in the current implementation will always pack an
    Erlang integer into the smallest container possible in the
    external format but there is nothing said about this in
    "erts/emulator/internal_doc/erl_ext_dist.txt". Erl_interface can't
    for example decode a long string with ei_decode_string() if one of
    the elements is between 0 and 255 but coded into a bignum in the
    external format. I think even the emulator will break if the
    number 42 is sent to it as a large integer or bignum in the
    external format. It should fit into a small integer on the heap
    but will be coded into a bignum. If I'm not mistaken a compare for
    equality between the integer 42 created on the heap from the
    external format and the integer 42 created from your program could
    fail because the internal type is checked first and it is assumed
    that if the type is different then the integer can't be the same
    (I'm not 100% sure about this).

  - Reading the list header will not give the list length in all
    cases. If the tail of the list is itself a list, its elements are
    appended to the list, e.g.

      ei_encode_list_header(buf, &i, 2);
      ei_encode_integer(buf, &i, 'a');
      ei_encode_integer(buf, &i, 'b');
      ei_encode_list_header(buf, &i, 1);
      ei_encode_integer(buf, &i, 'c');
      ei_encode_empty_list(buf, &i);

    is actually the list [$a,$b,$c], i.e. a list of length 3.  Again
    the emulator will not create lists coded like this but if a list
    like this was created by erl_interface then decoding it with
    ei_decode_string() will break. Because of the current bug, which
    skips the check for [], it will return the string "ab".
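
Both problems above can be illustrated with a decoder that makes
neither assumption. This is only a sketch under stated assumptions,
not the erl_interface implementation: sketch_decode_byte() and
sketch_decode_list_string() are hypothetical names. Each element may
be coded as SMALL_INTEGER_EXT (97), INTEGER_EXT (98) or SMALL_BIG_EXT
(110) as long as its value is 0..255, and the tail after the counted
elements is followed until NIL_EXT (106) is found.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SMALL_INTEGER_EXT = 97,
    INTEGER_EXT       = 98,
    NIL_EXT           = 106,
    LIST_EXT          = 108,
    SMALL_BIG_EXT     = 110
};

/* Hypothetical helper: read one integer in 0..255 whatever its
 * container. Returns the value, or -1 on any other input. */
static int sketch_decode_byte(const unsigned char *buf, size_t size,
                              size_t *index)
{
    size_t i = *index, n, k;
    unsigned long v;

    if (i >= size) return -1;
    switch (buf[i]) {
    case SMALL_INTEGER_EXT:
        if (i + 2 > size) return -1;
        v = buf[i + 1];
        i += 2;
        break;
    case INTEGER_EXT:               /* 32-bit big-endian, signed */
        if (i + 5 > size) return -1;
        v = ((unsigned long)buf[i + 1] << 24) |
            ((unsigned long)buf[i + 2] << 16) |
            ((unsigned long)buf[i + 3] << 8)  |
             (unsigned long)buf[i + 4];
        i += 5;                     /* negatives show up as v > 255 */
        break;
    case SMALL_BIG_EXT:             /* tag, n, sign, n LSB-first digits */
        if (i + 3 > size) return -1;
        n = buf[i + 1];
        if (i + 3 + n > size) return -1;
        v = (n > 0) ? buf[i + 3] : 0;
        for (k = 1; k < n; k++)
            if (buf[i + 3 + k] != 0) return -1;   /* value above 255 */
        if (buf[i + 2] != 0 && v != 0) return -1; /* negative */
        i += 3 + n;
        break;
    default:
        return -1;
    }
    if (v > 255) return -1;
    *index = i;
    return (int)v;
}

/* Hypothetical helper: decode a string coded as a list, following
 * list headers found in the tail. Returns the length, or -1. */
static long sketch_decode_list_string(const unsigned char *buf,
                                      size_t size, size_t *index,
                                      unsigned char *out, size_t out_size)
{
    size_t i = *index, n_out = 0;
    unsigned long count, k;
    int c;

    for (;;) {
        if (i >= size) return -1;          /* ran out before [] */
        if (buf[i] == NIL_EXT) { i++; break; }
        if (buf[i] != LIST_EXT || i + 5 > size) return -1;
        count = ((unsigned long)buf[i + 1] << 24) |
                ((unsigned long)buf[i + 2] << 16) |
                ((unsigned long)buf[i + 3] << 8)  |
                 (unsigned long)buf[i + 4];
        i += 5;
        for (k = 0; k < count; k++) {
            c = sketch_decode_byte(buf, size, &i);
            if (c < 0 || n_out >= out_size) return -1;
            out[n_out++] = (unsigned char)c;
        }
        /* Loop again: the tail is either [] or another list header. */
    }
    *index = i;
    return (long)n_out;
}
```

Given the bytes of the example above (assuming each element is
written as SMALL_INTEGER_EXT), this returns 3 and the characters a,
b, c rather than stopping after "ab". A tail coded as STRING_EXT is
not handled here and is reported as an error.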

It is not currently decided if we will tighten up the definition of
the external format, what is allowed or not, or if we will correct
all source code that makes false assumptions about how things are
coded.

kent


