Bit syntax frustrations, again
Matthias Lang
matthias@REDACTED
Thu Sep 26 22:39:53 CEST 2002
James Hague writes:
> I've been writing an Erlang module to decode swf (Flash) files. The swf
> format is documented at http://www.openswf.org. The problem I've been
> having is that Erlang starts decoding bits with the most significant bit,
> whereas the swf format--which is very bit-oriented--starts with the least
> significant bit. As best I can tell, the "little" qualifier in Erlang's bit
> syntax refers to bytes in multi-byte values, not bits.
>
> To give an example, suppose the next byte in the input stream contains
> 16#fe, and I need to grab a 3-bit field followed by a 5-bit field. The
> obvious match of:
>
> <<Field1:3, Field2:5, _/binary>> = Binary
>
> results in:
>
> Field1 = 2#111
> Field2 = 2#11110
>
> In the swf format, though, the correct answer is:
>
> Field1 = 2#110
> Field2 = 2#11111
Nobody else has answered, maybe because they're as confused as I am
about what the problem actually is. Options:
1. Shockwave _really_ does pack the bits the wrong way around, i.e.
when you look at a byte, what my CPU considers to be the LSB
is actually the MSB. This just cannot be true. (*)
or
2. Shockwave numbers the bits in a byte the opposite way to Erlang.
This is quite common, for instance in Motorola manuals the MSB
is bit 0, whereas Infineon calls bit 0 the LSB. So if the
documentation says
Octet  Bits       Meaning
----------------------------------------------------------------------
0      0,1        Filter mask
       2,3,4,5,6  Packet counter
       7          Reserved
then you'd write it differently for Motorola and Infineon:
<<Reserved:1, Counter:5, Mask:2>> = Infineon,
<<Mask:2, Counter:5, Reserved:1>> = Motorola.
But you seemed to say it wasn't that simple...
or
3. Shockwave numbers the bits in a word in a topsy-turvy
little-endian inspired way, e.g. if I got the following
16 bits (32 bits is the same idea, just more confusion):
MSB             LSB
0000 0001 1100 0000
then the shockwave format considers them to be numbered after
the way a 16 bit word would be laid out in memory on a little
endian machine, i.e.
Byte Address   Value
0              0xc0
1              0x01
Expressing the set bits in Motorola numbering, we have bits
7,8 and 9 set. In Infineon numbering we have 6,7 and 8. In
Bizarro numbering it's 0, 14 and 15.
If this is the case, then when the manual says
Bits    Meaning
-------------------
0-4     Annoyingness factor
5-8     Bullshit power
9-15    Convolution correction
then expressed in sane notation that means
BBBA AAAA CCCC CCCB
The only general way I can think of decoding that with the bit
syntax is
<<H:16/little, T/binary>> = Bin,
<<C:7, B:4, A:5>> = <<H:16>>.
All of the above could also be done 32 bits at a time, little-endian.
4. None of the above options. I'll have to go out and take more
drugs to twist my brain some more.
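To make options 2 and 3 concrete, here is a minimal sketch. The module name bit_order and all function names are made up for illustration, and a reasonably recent Erlang/OTP is assumed. It only shows how each reading of the documentation turns into a match; it says nothing about which reading the swf format really needs.

```erlang
-module(bit_order).
-export([lsb0_octet/1, msb0_octet/1, lsb_first_3_5/1, le16_fields/1]).

%% Option 2, bit 0 = LSB (the Infineon-style numbering):
%% bits 0,1 are the two lowest bits, so Mask is matched last.
lsb0_octet(<<Reserved:1, Counter:5, Mask:2>>) ->
    {Mask, Counter, Reserved}.

%% Option 2, bit 0 = MSB (the Motorola-style numbering):
%% bits 0,1 are the two highest bits, so Mask is matched first.
msb0_octet(<<Mask:2, Counter:5, Reserved:1>>) ->
    {Mask, Counter, Reserved}.

%% The 3-bit/5-bit case from the quoted mail, assuming the only
%% difference is that fields are listed starting from the LSB.
lsb_first_3_5(<<Field2:5, Field1:3, Rest/binary>>) ->
    {Field1, Field2, Rest}.

%% Option 3: fetch 16 bits as a little-endian integer, then split
%% that integer MSB first. Widths 7, 4 and 5 are the ones used in
%% the match in option 3.
le16_fields(<<H:16/little, Rest/binary>>) ->
    <<C:7, B:4, A:5>> = <<H:16>>,
    {A, B, C, Rest}.
```

With the byte 16#fe from the quoted mail, bit_order:lsb_first_3_5(<<16#fe>>) returns {6, 31, <<>>}, i.e. 2#110 and 2#11111. For option 3, bit_order:le16_fields(<<2#10100011, 2#11111110>>) returns {3, 5, 127, <<>>}: the top bit of B comes from the second byte, the other three from the first.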
Matthias
(* Bits-in-a-byte the wrong way around can happen, but surely not in a
file format. When you transmit SS7 traffic over an E1/T1 PCM line, you
send the LSB first. When you send voice over the same line, you send
the MSB first. Some hardware, such as ours, doesn't distinguish
between voice and data at lower levels, so sometimes you get
bit-reversed data up at the CPU. This is kinda annoying, but easily
dealt with in hardware.)
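If bit-reversed octets ever did have to be dealt with in software (option 1), a minimal sketch could look like the following. The module and function names are hypothetical, and the binary comprehension assumes a reasonably recent Erlang/OTP.

```erlang
-module(bit_reverse).
-export([reverse_octet/1, reverse_each/1]).

%% Reverse the eight bits of a single octet: match them MSB first
%% and rebuild the octet with the bits in the opposite order.
reverse_octet(<<B7:1, B6:1, B5:1, B4:1, B3:1, B2:1, B1:1, B0:1>>) ->
    <<B0:1, B1:1, B2:1, B3:1, B4:1, B5:1, B6:1, B7:1>>.

%% Reverse the bits inside every octet of a binary. The order of
%% the octets themselves is left unchanged.
reverse_each(Bin) when is_binary(Bin) ->
    << <<(reverse_octet(<<Byte>>))/binary>> || <<Byte>> <= Bin >>.
```

For example, bit_reverse:reverse_octet(<<16#fe>>) returns <<16#7f>>, and bit_reverse:reverse_each(<<16#01, 16#80>>) returns <<16#80, 16#01>>.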