[erlang-questions] decoding nmea messages

Fri Aug 13 02:08:22 CEST 2010

On Aug 13, 2010, at 2:19 AM, info wrote:

> Hi Richard,
> The fields don't contain commas but sometimes they are empty. Therefore the size of the messages is not a constant.

That's why I pointed out that you can "extract the rest
(because you know HOW LONG the binary is)".

I didn't mean that you know _without looking_, but that
first you _find out_ how long the binary is and _then_
you do the binary match for the fixed length parts.
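
A minimal sketch of that two-step idea, assuming the trailing CR LF has
already been stripped. The module name nmea_frame and the function
frame/1 are made up for illustration. It measures the binary first and
then uses the result as the size of the one variable-length part.

```erlang
-module(nmea_frame).
-export([frame/1]).

%% Hypothetical helper: split "$<body>*hh" into the body and the two
%% checksum characters.  The checksum is extracted, not verified.
frame(Sentence) when is_binary(Sentence), byte_size(Sentence) >= 4 ->
    %% Step 1: find out how long the binary is.
    N = byte_size(Sentence),
    %% Step 2: "$" (1 byte) and "*hh" (3 bytes) are fixed, so the
    %% body must be the remaining N-4 bytes.
    BodySize = N - 4,
    case Sentence of
        <<"$", Body:BodySize/binary, "*", Chk:2/binary>> -> {ok, Body, Chk};
        _ -> fail
    end;
frame(_) -> fail.
```

Because the size is computed before the match, empty fields in the
body do not matter: the body is simply whatever lies between the fixed
prefix and the fixed suffix.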

> And what about the syntax analysis if each field is considered as a binary ?
> Someone suggest to use regexp ...

The question about how to analyse the fields depends on what the
fields are supposed to look like.  You'd really like to stick
fairly close to the specification.  For example, suppose you are
given something like this:

DBS	Depth Below Surface
       1   2 3   4 5   6 7
       |   | |   | |   | |
$--DBS,x.x,f,x.x,M,x.x,F*hh
1) Depth, feet
2) f = feet
3) Depth, meters
4) M = meters
5) Depth, Fathoms
6) F = Fathoms
7) Checksum

(NMEAdescription.pdf, page 7, found on the web)

I'd actually think about writing a tiny pattern compiler that
took this and generated

%% DBS	Depth Below Surface
%% $--DBS,x.x,f,x.x,M,x.x,F*hh

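depth_below_surface(Talker, N, Sentence) ->
    %% N is the sentence length from "$" through "hh": "$--DBS," takes
    %% 7 bytes and "*hh" takes 3, so the fields occupy N-10 bytes.
    %% A size cannot be bound by another argument in the same function
    %% head, so it is computed first and the match is done in a case.
    DataSize = N - 10,
    case Sentence of
        <<"$", Talker:2/binary, "DBS,", Data:DataSize/binary, _/binary>> ->
            case split(Data) of
                [F1,<<"f">>,F3,<<"M">>,F5,<<"F">>] ->
                    case {nchk(F1),nchk(F3),nchk(F5)} of
                        {N1,N3,N5} when is_number(N1),
                                        is_number(N3),
                                        is_number(N5) ->
                            {Talker,dbs,N1,N3,N5};
                        _ -> fail
                    end;
                _ -> fail
            end;
        _ -> fail
    end.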
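
The clause above relies on split and nchk, which are not defined in
this message. One possible reading of them, purely as an assumption
(the module name nmea_fields is invented, and the accepted number
syntax is a guess rather than something taken from the specification),
might look like this:

```erlang
-module(nmea_fields).
-export([split/1, nchk/1]).

%% Break the comma-separated data part into fields.  Empty fields are
%% kept as <<>> so that positions still line up with the specification.
split(Data) ->
    binary:split(Data, <<",">>, [global]).

%% Convert an "x.x" field to a number, or return fail.
%% Assumption: a field is either a float such as "12.5" or an integer
%% such as "12"; anything else, including an empty field, is a failure.
nchk(Field) ->
    try binary_to_float(Field)
    catch error:badarg ->
        try binary_to_integer(Field)
        catch error:badarg -> fail
        end
    end.
```

Whether an empty field should count as a failure or as a missing value
is a decision the specification has to settle; this sketch simply
rejects it.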

or something like that.  It would be necessary to read the specification
of all the items one might want to recognise to come up with a suitable
vocabulary.  There are
	- distance (a number)
	- angle (a number)
	- waypoint id (no idea of format)
	- hexadecimal of specified length
	- fixed letter
	- one of several letters
	- mode
	- satellite ID
	- time
	- time difference
	- frequency
	...
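
To make the pattern compiler idea a little more concrete, here is a
rough sketch that handles only two items of such a vocabulary: "x.x"
(a number) and a fixed letter. It turns the data part of a template
into a list of field specifications and then interprets that list
instead of generating Erlang source. The module name nmea_spec, the
spec terms number and {letter, L}, and the accepted number syntax are
all invented for illustration.

```erlang
-module(nmea_spec).
-export([compile/1, decode/2]).

%% Turn a template such as <<"x.x,f,x.x,M,x.x,F">> into a list of
%% field specifications.  Only two field kinds are recognised here;
%% any other template item crashes with function_clause.
compile(Template) ->
    [classify(F) || F <- binary:split(Template, <<",">>, [global])].

classify(<<"x.x">>) -> number;
classify(<<C>>) when C >= $A, C =< $Z; C >= $a, C =< $z -> {letter, <<C>>}.

%% Check the fields of a sentence's data part against a compiled
%% specification.  Returns the decoded numbers in order, or fail.
decode(Spec, Data) ->
    decode(Spec, binary:split(Data, <<",">>, [global]), []).

decode([], [], Acc) -> lists:reverse(Acc);
decode([number | Spec], [F | Fs], Acc) ->
    case to_number(F) of
        fail -> fail;
        X -> decode(Spec, Fs, [X | Acc])
    end;
%% A fixed letter must match exactly and contributes no value.
decode([{letter, L} | Spec], [L | Fs], Acc) -> decode(Spec, Fs, Acc);
decode(_, _, _) -> fail.

%% Assumed number syntax: a float like "12.5" or an integer like "12".
to_number(F) ->
    try binary_to_float(F)
    catch error:badarg ->
        try binary_to_integer(F)
        catch error:badarg -> fail
        end
    end.
```

Adding a new field type then means adding one classify clause and one
decode clause, rather than touching every sentence format that uses it.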

Now regular expressions CAN be used to match these, but not
to decode them.  And supposing you do decide to use regular
expressions for taking things apart,

 (a) it would be better not to do this by hand.  Complex
     regular expressions are not easy to get right.   It
     is much better to write a little AWK or Python (or Erlang!)
     program to read the specifications and do the conversions.
     It is so much easier to check that you have copied information
     verbatim than to check that you have translated it correctly.
     If a match or translation for some field type is wrong, it is
     so much easier to fix it ONCE in the pattern compiler than to
     fix every occurrence of that pattern.  And with a pattern
     compiler (NOT a complex program in this case), it is so much
     easier for maintainers to add new formats or revise old ones.

 (b) the different semantic types imply _range_ checks that are
     difficult to do with regular expressions, so you would STILL
     need custom code to validate what you found, which for the
     reasons given in (a) should be generated, not hand written.

Which means that regular expressions really give you very very
little for this problem.
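
A small illustration of point (b), under a stated assumption: suppose
an angle field must satisfy 0 =< A < 360 (a range picked for the
example, not taken from the NMEA specification). A regular expression
can confirm that "720.0" looks like a number, but the range check
still needs ordinary code. The module and function names are
hypothetical.

```erlang
-module(nmea_range).
-export([looks_numeric/1, angle/1]).

%% Syntactic check only: digits, a dot, digits.
looks_numeric(Field) ->
    re:run(Field, "^[0-9]+\\.[0-9]+$", [{capture, none}]) =:= match.

%% Decode a field as an angle in degrees, or return fail.
%% Assumption for illustration: a valid angle satisfies 0 =< A < 360.
angle(Field) ->
    case to_number(Field) of
        A when is_number(A), A >= 0, A < 360 -> A;
        _ -> fail
    end.

%% Assumed number syntax: a float like "12.5" or an integer like "12".
to_number(F) ->
    try binary_to_float(F)
    catch error:badarg ->
        try binary_to_integer(F)
        catch error:badarg -> fail
        end
    end.
```

Here looks_numeric(<<"720.0">>) succeeds while angle(<<"720.0">>)
fails, so the regular expression has not saved any of the real work.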

I note that these messages are limited to 80 bytes, so I would
not be scared to turn them into lists and parse them that way.
And it appears that the fields cannot contain commas, so breaking
one of these things into fields is easy; it's only checking and
converting them that's tricky.
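
As a hedged sketch of the list-based route (the module name nmea_list
is invented, and string:split/3 needs a reasonably recent OTP
release): the one thing to watch is that empty fields must survive
the split, otherwise field positions no longer match the
specification.

```erlang
-module(nmea_list).
-export([fields/1]).

%% Turn a sentence binary into a list of field strings.
%% string:split/3 with 'all' keeps empty fields (as []), unlike
%% string:tokens/2, which silently drops them.
fields(Sentence) when is_binary(Sentence) ->
    string:split(binary_to_list(Sentence), ",", all).
```

The first and last elements still carry the "$--DBS" header and the
"*hh" checksum, which would be peeled off separately.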