[erlang-questions] tokenising broken code

Tue Jan 14 21:14:22 CET 2014

Hello,

Does anybody have a tokeniser for broken Erlang code (where broken means
unparsable)?

I just want to render variables, atoms, strings, etc. in different
colors and typefaces.
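Once the code is split into tagged tokens, coloring is just a lookup per tag. Here is a minimal sketch in Python. It assumes hypothetical tags named var, atom, string and comment, and it uses ANSI terminal escapes as the output format. A real editor or HTML renderer would substitute its own styles.

```python
# Hypothetical tag -> ANSI escape mapping; the tag names are assumptions.
STYLES = {
    "var": "\033[34m",        # blue
    "atom": "\033[32m",       # green
    "string": "\033[31m",     # red
    "comment": "\033[3;90m",  # italic grey
}
RESET = "\033[0m"

def render(tokens):
    """Concatenate (tag, text) pairs, wrapping styled tags in ANSI escapes.

    Tokens whose tag has no style (whitespace, punctuation, ...) are
    emitted unchanged, so the visible text is exactly the source text.
    """
    out = []
    for tag, text in tokens:
        style = STYLES.get(tag)
        out.append(f"{style}{text}{RESET}" if style else text)
    return "".join(out)

print(render([("atom", "foo"), ("punct", "("), ("var", "X"), ("punct", ")")]))
```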

So I need a tokeniser that

  - retains everything (comments and all)
  - does not do any token conversions (i.e. 16#abc is not tokenised
    as {int,2748}, but as {integer,"16#abc"})
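A regular-expression scanner gets close to both requirements. Every character ends up in exactly one token, comments and whitespace are tokens in their own right, and each token keeps its raw text. This is a Python sketch, not erl_scan. The tag names are assumptions, and so are the simplified patterns: no Unicode atoms or variables, no digit separators, and only simple $ character escapes.

```python
import re

# Token classes tried in order at each position; earlier alternatives win.
# Tag names and (deliberately simplified) patterns are assumptions.
TOKEN_SPEC = [
    ("comment",     r"%[^\n]*"),
    ("white_space", r"\s+"),
    ("string",      r'"(?:[^"\\]|\\.)*"'),
    ("char",        r"\$(?:\\.|.)"),
    ("float",       r"\d+\.\d+(?:[eE][+-]?\d+)?"),
    ("integer",     r"\d+#[0-9A-Za-z]+|\d+"),  # raw text kept, e.g. "16#abc"
    ("var",         r"[A-Z_][A-Za-z0-9_@]*"),
    ("atom",        r"[a-z][A-Za-z0-9_@]*|'(?:[^'\\]|\\.)*'"),
    ("punct",       r"=:=|=/=|\.\.\.|->|=>|:=|<-|<=|=<|>=|==|/=|\+\+|--"
                    r"|\|\||::|<<|>>|\.\.|[-+*/=<>!?#|.,;:()\[\]{}]"),
    ("error",       r"."),                      # any other single character
]
MASTER = re.compile(
    "|".join(f"(?P<{tag}>{pat})" for tag, pat in TOKEN_SPEC), re.DOTALL)

def tokenise(src):
    """Return [(tag, text), ...] covering src exactly, with no value conversion."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(src)]

print(tokenise("f(X) -> 16#abc. % hi\n"))
```

The final `error` alternative matches any single character, so the scan never gets stuck or skips input. Broken code therefore still round-trips. An unterminated string degrades into an `error` token for the quote, followed by ordinary tokens.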

It needs to handle broken code gracefully: for example, if the closing
quote of a string is missing, it should still do something sensible.
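Here is one possible policy for a missing closing quote, sketched in Python. A properly closed string is accepted even if it spans lines, as Erlang allows. Only when no closing quote follows at all does the string end at the end of the line, tagged `bad_string`. The function name, the tag and the end-of-line rule are assumptions, not an established convention. A quote missing in the middle of a file will still pair up with the next string's opening quote, because a lexer alone cannot reliably tell which quote was forgotten.

```python
import re

# A closed string may span lines; backslash escapes are skipped over.
CLOSED = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)
# Recovery heuristic (an assumption): an unclosed string runs to end of line.
UNCLOSED = re.compile(r'"(?:[^"\\\n]|\\.)*')

def scan_string(src, pos):
    """Scan a string literal starting at src[pos] == '"'.

    Returns (tag, text, next_pos), where tag is "string" or "bad_string".
    The text is always a non-empty slice of src, so nothing is lost.
    """
    m = CLOSED.match(src, pos)
    if m:
        return "string", m.group(), m.end()
    m = UNCLOSED.match(src, pos)
    return "bad_string", m.group(), m.end()

print(scan_string('x = "oops\ny', 4))
```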

Assuming that the tokeniser I want turns a string S into a sequence
[{Tag1,S1},{Tag2,S2},...], I'd like S1 ++ S2 ++ ... = S, i.e. the tokeniser
should be lossless.
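The lossless property is easy to test mechanically, and random input is a cheap way to exercise the broken-code paths. This Python sketch assumes a tokeniser that returns (tag, text) pairs. `fuzz`, its alphabet and the toy tokeniser are illustrative only.

```python
import random
import re

def is_lossless(tokenise, src):
    """True if the token texts concatenate back to src (S1 ++ S2 ++ ... = S)."""
    return "".join(text for _tag, text in tokenise(src)) == src

def fuzz(tokenise, trials=1000, seed=0):
    """Check losslessness on random, mostly broken inputs.

    Returns the first failing input, or None if every trial passed.
    """
    rng = random.Random(seed)
    alphabet = "aZ_0#9\"'$%\\ \n.(),->"  # Erlang-ish characters, chosen ad hoc
    for _ in range(trials):
        src = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 30)))
        if not is_lossless(tokenise, src):
            return src
    return None

# Toy tokeniser for demonstration only: runs of whitespace / non-whitespace.
def toy_tokenise(src):
    return [("ws" if t.isspace() else "other", t)
            for t in re.findall(r"\s+|\S+", src)]

print(fuzz(toy_tokenise))  # None: the toy tokeniser never drops characters
```

A tokeniser that throws away whitespace (for example, one built on `str.split()`) fails this check almost immediately.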

Cheers

/Joe