Bit syntax frustrations, again (LONG)

Tony Rogvall tony@REDACTED
Thu Oct 3 14:20:49 CEST 2002


Matthias Lang wrote:

>
>  4. None of the above options. I'll have to go out and take more
>     drugs to twist my brain some more. 
>  
>
Buy some for me as well :-)

>Matthias
>
>(* Bits-in-a-byte the wrong way around can happen, but surely not in a
>file format. When you transmit SS7 traffic over an E1/T1 PCM line, you
>send the LSB first. When you send voice over the same line, you send
>the MSB first. Some hardware, such as ours, doesn't distinguish
>between voice and data at lower levels, so sometimes you get
>bit-reversed data up at the CPU. This is kinda annoying, but easily
>dealt with in hardware.)
>  
>

This is an attempt to solve this question once and for all :-)

First the conclusion! (The investigation follows below.)

/Tony


Conclusion:

    When we bit-match data from a raw bit field structure, we must know
    the size of the word the bits were packed in (8, 16, 32, ... bits).

    For big endian data the word size does not matter, since both the
    byte order and the bit order correspond to the bit syntax default
    (the natural layout, most significant bit first).

    For little endian data the bytes must be reversed group-wise, one
    group per word the bits were packed in. Then the bit field spec
    must be reversed as well.

    So if a big endian spec (generated from a packed bit field
    structure in C) is:

       <<A:3, B:2, C:1, D:4, E:6>> = Bin

    the little endian spec would look like this (bin_reverse/1 reverses
    the bytes within each 16-bit word):

       <<E:6,D:4,C:1,B:2,A:3>> = bin_reverse(Bin)
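    As a rough illustration of what bin_reverse/1 has to do, here is a
    minimal C sketch, assuming the word size is known up front.
    bin_reverse_words is a hypothetical name, not an existing library
    function; it swaps the byte order inside each group of word_size
    bytes and leaves the bit order inside each byte untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bin_reverse/1: reverses the byte order inside
 * each consecutive group of word_size bytes (2 for 16-bit words, 4 for
 * 32-bit words). The bit order inside each byte is left alone. */
void bin_reverse_words(unsigned char *buf, size_t len, size_t word_size)
{
    assert(word_size > 0 && len % word_size == 0);
    for (size_t w = 0; w < len; w += word_size) {
        /* swap from both ends of the current word towards the middle */
        for (size_t i = w, j = w + word_size - 1; i < j; i++, j--) {
            unsigned char tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }
    }
}
```

    With word_size 2, the little endian X16 bytes from the details
    below, 2#00010001 2#00011000, come out as 2#00011000 2#00010001.
    With word_size 1 the buffer is unchanged, matching the X8 case.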

    To solve this once and for all, I vote for the (re)introduction of
    bit groups!

    To give an example, here is how the above bit pattern could be
    specified as a bit group (here :2 is the size of the enclosing
    word, two bytes):

        << <<A:3, B:2, C:1, D:4, E:6>>:2/little >>

        << <<A:3, B:2, C:1, D:4, E:6>>:2/big >> ==
            <<A:3, B:2, C:1, D:4, E:6>>

    Even better would be to give the endianness at run time; then we
    could have ONE pattern:

        << <<A:3, B:2, C:1, D:4, E:6>>:2/Endian >>  ==

        case Endian of
            big ->
                << <<A:3, B:2, C:1, D:4, E:6>>:2/big >>;
            little ->
                << <<A:3, B:2, C:1, D:4, E:6>>:2/little >>
        end
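    The run-time variant can be approximated in C. This is a minimal
    sketch under the same assumptions as the conclusion (a 16-bit word,
    bits allocated from the most significant bit on big endian and from
    the least significant bit on little endian). decode_group16 and the
    ORDER_* names are hypothetical, and the code only mimics the
    proposed bit group semantics; it does not implement any Erlang
    feature.

```c
#include <assert.h>
#include <stdint.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* One field list, byte order chosen at run time: decodes the 16-bit group
 * <<A:3, B:2, C:1, D:4, E:6>> from raw[0..1] into out[0..4] (A..E).
 * ORDER_LITTLE swaps the two bytes and then takes the fields from the most
 * significant bit down in reverse order (E first), as in the conclusion. */
void decode_group16(const unsigned char raw[2], enum byte_order order,
                    uint32_t out[5])
{
    static const unsigned widths[5] = {3, 2, 1, 4, 6};   /* A..E */
    assert(order == ORDER_BIG || order == ORDER_LITTLE);

    /* 16-bit word, most significant byte first, after the optional swap */
    uint32_t w = (order == ORDER_BIG)
                     ? (((uint32_t)raw[0] << 8) | raw[1])
                     : (((uint32_t)raw[1] << 8) | raw[0]);

    unsigned used = 0;                 /* bits consumed from the top */
    for (int k = 0; k < 5; k++) {
        int i = (order == ORDER_BIG) ? k : 4 - k;   /* A..E or E..A */
        used += widths[i];
        out[i] = (w >> (16 - used)) & ((1u << widths[i]) - 1u);
    }
}
```

    For example, with A=5, B=1, C=1, D=9, E=33 the big endian layout
    gives the bytes 0xAE 0x61, and under the LSB-first rule the little
    endian layout gives 0x6D 0x86. Both decode to the same five values,
    so the caller keeps a single field list and only passes the order.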

THE GORY DETAILS
----------------------------------

Given the C program test.c:

#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* write() */

typedef struct {
    unsigned a:2;
    unsigned b:3;
    unsigned c:1;
    unsigned d:2;
} __attribute__((packed)) b8_t;

typedef struct {
    unsigned a:3;
    unsigned b:7;
    unsigned c:1;
    unsigned d:5;
} __attribute__((packed)) b16_t;


int main(int argc, char** argv)
{
    b8_t x8;
    b16_t x16;

    x8.a = 1; x8.b = 2; x8.c = 0; x8.d = 3;
    x16.a = 1; x16.b = 2; x16.c = 0; x16.d = 3;

    write(1, &x8, sizeof(x8));
    write(1, &x16, sizeof(x16));
    exit(0);
}
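To see where the byte patterns below come from without trusting any
particular compiler, the two allocation orders can be modelled
arithmetically. This C sketch assumes, as the dumps below suggest, that
little endian gcc allocates fields starting at the least significant
bit of the word and big endian gcc starting at the most significant
bit; pack_lsb_first and pack_msb_first are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Model of little endian gcc: the first declared field goes into the
 * least significant bits of the word, the next one just above it, ... */
uint32_t pack_lsb_first(const unsigned *widths, const uint32_t *values, int n)
{
    uint32_t word = 0;
    unsigned shift = 0;
    for (int i = 0; i < n; i++) {
        assert(widths[i] > 0 && widths[i] < 32);
        assert(shift + widths[i] <= 32 && values[i] < (1u << widths[i]));
        word |= values[i] << shift;
        shift += widths[i];
    }
    return word;
}

/* Model of big endian gcc: the first declared field goes into the most
 * significant bits of a word_bits-wide word, the next one just below it. */
uint32_t pack_msb_first(const unsigned *widths, const uint32_t *values, int n,
                        unsigned word_bits)
{
    uint32_t word = 0;
    unsigned used = 0;
    assert(word_bits > 0 && word_bits <= 32);
    for (int i = 0; i < n; i++) {
        assert(widths[i] > 0 && widths[i] < 32 && values[i] < (1u << widths[i]));
        used += widths[i];
        assert(used <= word_bits);
        word |= values[i] << (word_bits - used);
    }
    return word;
}
```

With the values from test.c (1, 2, 0, 3), pack_msb_first gives 0x53
for b8_t and 0x2083 for b16_t, and pack_lsb_first gives 0xC9 and
0x1811. Written out high byte first (big endian) or low byte first
(little endian), these are exactly the X8 and X16 bytes shown below.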


We want to be able to read the output from Erlang with the bit
syntax.

We can use open_port or a file to read the raw data written by the
above program.

  X8 = <read 1 raw byte>
  X16 = <read 2 raw bytes>



BIG ENDIAN 

  (compile the above program with gcc on a big endian machine)

   X8 = <<2#01010011>>
   -------------------

Layout:

        +---+---+---+---+---+---+---+---+
     X8 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
        +---+---+---+---+---+---+---+---+
        |   A   |     B     | C |   D   |
        +---+---+---+---+---+---+---+---+

Match: <<A:2, B:3, C:1, D:2>> = X8
Values A=1, B=2, C=0, D=3


   X16 = <<2#00100000,2#10000011>>
   -------------------------------

Layout:

        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    X16 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
        |     A     |             B             | C |         D         |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+


Match: <<A:3, B:7, C:1, D:5>> = X16
Values A=1, B=2, C=0, D=3
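Both big endian matches simply consume bits from the most significant
bit of the first byte onwards, which is the bit syntax default. Here is
a minimal C emulation of an unsigned Field:Width segment, assuming only
that default (no signed, little or float segments); take_bits is a
hypothetical name, and the caller must make sure the buffer is long
enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates an unsigned, big endian bit syntax segment Field:Width: reads
 * `width` bits starting at bit offset *pos, where offset 0 is the most
 * significant bit of buf[0], and advances *pos past them. */
uint32_t take_bits(const unsigned char *buf, size_t *pos, unsigned width)
{
    assert(buf != NULL && pos != NULL && width <= 32);
    uint32_t v = 0;
    for (unsigned i = 0; i < width; i++, (*pos)++) {
        unsigned bit = (buf[*pos / 8] >> (7 - *pos % 8)) & 1u;
        v = (v << 1) | bit;            /* earlier bits end up more significant */
    }
    return v;
}
```

Calling take_bits with widths 2, 3, 1, 2 on X8 = 0x53, or 3, 7, 1, 5 on
X16 = 0x20 0x83, yields A=1, B=2, C=0, D=3 in both cases.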


LITTLE ENDIAN

    (compile the above C program with gcc on a little endian machine)

   X8 = <<2#11001001>>
   -------------------

Layout:

        +---+---+---+---+---+---+---+---+
     X8 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
        +---+---+---+---+---+---+---+---+
        |   D   | C |     B     |   A   |
        +---+---+---+---+---+---+---+---+

Match: <<D:2, C:1, B:3, A:2>> = X8

Values A=1, B=2, C=0, D=3
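Since X8 is a single-byte word there is nothing to byte-swap, and
matching the fields MSB first in reverse order is the same as picking
them out from the least significant bit upwards in declaration order.
A C sketch of the latter, assuming the LSB-first allocation seen above;
lsb_field is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Reads field number `index` of a word whose fields were allocated
 * starting at the least significant bit (the little endian layout above).
 * widths[] lists the fields in C declaration order. */
uint32_t lsb_field(uint32_t word, const unsigned *widths, int index)
{
    unsigned shift = 0;
    for (int i = 0; i < index; i++)
        shift += widths[i];            /* skip the fields declared earlier */
    assert(widths[index] > 0 && widths[index] < 32);
    assert(shift + widths[index] <= 32);
    return (word >> shift) & ((1u << widths[index]) - 1u);
}
```

For X8 = 0xC9 with widths {2, 3, 1, 2}, fields 0 to 3 give A=1, B=2,
C=0, D=3. Applied to the little endian 16-bit word 0x1811 (the X16
bytes below, read low byte first) with widths {3, 7, 1, 5}, it gives
the same values.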

 THE FUN PART

   X16 = <<2#00010001, 2#00011000>>
   --------------------------------
Layout:

        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    X16 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
        |      Bl           |     A     |         D         | C |  Bh   |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Match: <<Bl:5,A:3,D:5,C:1,Bh:2>> = X16

Values A=1, Bl+(Bh<<5)=2, C=0, D=3
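The split of B can be checked by hand. This C sketch (rebuild_b is a
hypothetical name) takes the two raw bytes, extracts Bl as the top 5
bits of the first byte and Bh as the bottom 2 bits of the second, as
the match above does, and glues them back together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild the 7-bit field B of a little endian b16_t from the raw bytes,
 * following <<Bl:5, A:3, D:5, C:1, Bh:2>>: Bl is the top 5 bits of the
 * first byte, Bh the bottom 2 bits of the second byte. */
uint32_t rebuild_b(const unsigned char x16[2])
{
    assert(x16 != NULL);
    uint32_t bl = (uint32_t)x16[0] >> 3;     /* Bl:5, low bits of B  */
    uint32_t bh = (uint32_t)x16[1] & 0x3u;   /* Bh:2, high bits of B */
    return bl | (bh << 5);                   /* B = Bl + (Bh << 5)   */
}
```

For X16 = 0x11 0x18 it returns 2. A b16_t with only B = 127 set would
be written as 0xF8 0x03 under the LSB-first layout, giving Bl = 31,
Bh = 3 and B = 127, which shows that Bh really holds the high bits.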

From the little endian case we can see that, for little endian, the
bit fields within a word must be reversed. Depending on the word size
the bits were stuffed into, we must first reverse the bytes within
each word and then reverse the fields.


I.e. if we reverse the bytes of X16 we get the layout:

        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    X16 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
        |         D         | C |  Bh   |      Bl           |     A     |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Match: <<D:5, C:1, B:7, A:3>> = X16

At least this is consistent with the X8 case, where we reverse all
fields (for a one-byte word, reversing the bytes changes nothing).
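Putting the two steps together, here is a C sketch of decoding the
little endian b16_t: swap the two bytes, then read D:5, C:1, B:7, A:3
from the most significant bit down. decode_le16 and fields16 are
hypothetical names, and the hard-coded shifts assume the packed 16-bit
layout of test.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field values of b16_t from test.c */
typedef struct { uint32_t a, b, c, d; } fields16;

/* Decode the two bytes test.c writes for b16_t on a little endian machine:
 * reverse the bytes of the 16-bit word, then match MSB first with the
 * field order reversed, i.e. <<D:5, C:1, B:7, A:3>>. */
fields16 decode_le16(const unsigned char raw[2])
{
    assert(raw != NULL);
    uint32_t w = ((uint32_t)raw[1] << 8) | raw[0];  /* bytes reversed */
    fields16 f;
    f.d = (w >> 11) & 0x1Fu;   /* D:5, bits 15..11 */
    f.c = (w >> 10) & 0x01u;   /* C:1, bit 10      */
    f.b = (w >> 3)  & 0x7Fu;   /* B:7, bits 9..3   */
    f.a =  w        & 0x07u;   /* A:3, bits 2..0   */
    return f;
}
```

For the bytes 2#00010001, 2#00011000 this yields a=1, b=2, c=0, d=3,
the values test.c stored.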

END





