Bit syntax frustrations, again

Thu Sep 26 22:39:53 CEST 2002

James Hague writes:
 > I've been writing an Erlang module to decode swf (Flash) files.  The swf
 > format is documented at http://www.openswf.org.  The problem I've been
 > having is that Erlang starts decoding bits with the most significant bit,
 > whereas the swf format--which is very bit-oriented--starts with the least
 > significant bit.  As best I can tell, the "little" qualifier in Erlang's bit
 > syntax refers to bytes in multi-byte values, not bits.
 > 
 > To give an example, suppose the next byte in the input stream contains
 > 16#fe, and I need to grab a 3-bit field followed by a 5-bit field.  The
 > obvious match of:
 > 
 > 	<<Field1:3, Field2:5, _/binary>> = Binary
 > 
 > results in:
 > 
 > 	Field1 = 2#111
 > 	Field2 = 2#11110
 > 
 > In the swf format, though, the correct answer is:
 > 
 > 	Field1 = 2#110
 > 	Field2 = 2#11111

Nobody else has answered, maybe because they're as confused as I am
about what the problem actually is. Options:

  1. Shockwave _really_ does pack the bits the wrong way around, i.e.
     when you look at a byte, what my CPU considers to be the LSB
     is actually the MSB. This just cannot be true. (*)

or
  2. Shockwave numbers the bits in a byte the opposite way to Erlang.
     This is quite common: for instance, in Motorola manuals the MSB
     is bit 0, whereas Infineon calls the LSB bit 0. So if the
     documentation says

     Octet   Bits       Meaning
     ----------------------------------------------------------------------
     0       0,1        Filter mask
             2,3,4,5,6  Packet counter
             7          Reserved

     then you'd write it differently for Motorola and Infineon:

     <<Reserved:1, Counter:5, Mask:2>> = Infineon,
     <<Mask:2, Counter:5, Reserved:1>> = Motorola.

     But you seemed to say it wasn't that simple...

or
  3. Shockwave numbers the bits in a word in a topsy-turvy
     little-endian inspired way, e.g. if I got the following
     16 bits (32 bits is the same idea, just more confusion):

     MSB             LSB
     0000 0001 1100 0000

     then the Shockwave format considers them to be numbered after
     the way a 16 bit word would be laid out in memory on a little
     endian machine, i.e.

     Byte Address   Value
     0              0xc0 
     1              0x01

     Expressing the set bits in Motorola numbering, we have bits
     7, 8 and 9 set. In Infineon numbering we have 6, 7 and 8. In
     Bizarro numbering (swap the two bytes, then count from the
     LSB) it's 0, 14 and 15.

     If this is the case, then when the manual says

     Bits   Meaning
     -------------------
     0-4    Annoyingness factor
     5-8    Bullshit power
     9-15   Convolution correction

     Then expressed in sane notation that means

     BBBA AAAA CCCC CCCB  

     The only general way I can think of decoding that with the bit
     syntax is

     <<H:16/little, T/binary>> = Bin,
     <<C:7, B:4, A:5>> = <<H:16>>.

     All of the above could also be 32-bits at a time little endian.

  4. None of the above options. I'll have to go out and take more
     drugs to twist my brain some more. 
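
For anyone who wants to try options 2 and 3 against real data, here is
a minimal sketch of both as Erlang functions. The module and function
names are made up for illustration, and nothing here is checked against
the swf documentation. option2/1 simply matches the two fields from the
quoted 16#fe example in the opposite order, and option3/1 wraps the
two-step match from option 3.

```erlang
-module(bitorder_sketch).
-export([option2/1, option3/1]).

%% Option 2: the format lists fields starting from the LSB, while the
%% bit syntax matches from the MSB, so the fields are written in
%% reverse order. Takes exactly one byte holding a 3-bit field
%% followed (in the format's numbering) by a 5-bit field.
option2(<<Field2:5, Field1:3>>) ->
    {Field1, Field2}.

%% Option 3: read 16 bits as a little-endian word, then split that
%% word from the MSB down. A is the field in the lowest 5 bits of the
%% word, B the next 4 bits, C the top 7 bits.
option3(<<H:16/little, Rest/binary>>) ->
    <<C:7, B:4, A:5>> = <<H:16>>,
    {A, B, C, Rest}.
```

With the byte from the quoted example, option2(<<16#fe>>) gives
{2#110, 2#11111}, which is the answer the swf format is said to expect.
If that holds for the other fields too, the plain reversed match may be
all that is needed.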

Matthias

(* Bits-in-a-byte the wrong way around can happen, but surely not in a
file format. When you transmit SS7 traffic over an E1/T1 PCM line, you
send the LSB first. When you send voice over the same line, you send
the MSB first. Some hardware, such as ours, doesn't distinguish
between voice and data at lower levels, so sometimes you get
bit-reversed data up at the CPU. This is kinda annoying, but easily
dealt with in hardware.)
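
If the bit-reversed case ever did have to be handled in software, a
rough sketch could look like the following. It assumes a release of
Erlang that has binary comprehensions, and the module and function
names are hypothetical. It reverses the bit order inside each byte and
leaves the byte order alone.

```erlang
-module(bitrev_sketch).
-export([reverse_bits/1]).

%% Reverse the order of the eight bits in every byte of Bin.
%% The generator pulls one byte at a time as eight 1-bit integers,
%% and the constructor writes them back out in the opposite order.
reverse_bits(Bin) when is_binary(Bin) ->
    << <<H:1, G:1, F:1, E:1, D:1, C:1, B:1, A:1>>
       || <<A:1, B:1, C:1, D:1, E:1, F:1, G:1, H:1>> <= Bin >>.
```

Note that this flips the bits inside each field as well, so it only
undoes a line-level reversal. It is not a substitute for matching the
fields in the right order.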