[erlang-questions] tokenising broken code

Tue Jan 14 21:20:01 CET 2014

Hello Joe -

ANTLR 4 has an Erlang grammar that might help, if you are willing to
work with Java.

https://github.com/antlr/grammars-v4/blob/master/erlang/Erlang.g4

Cheers -
Dave

On Tue, Jan 14, 2014 at 3:14 PM, Joe Armstrong <erlang@REDACTED> wrote:

> Hello,
>
> Does anybody have a tokeniser for broken Erlang code (broken means
> unparsable)?
>
> I just want to render variables, atoms, strings, etc. in different
> colors and typefaces.
>
> So I need a tokeniser that
>
>   - retains everything (comments and all)
>   - does not do any token conversions (i.e. 16#abc is not tokenised
>     as {int,2748}, but as {integer,"16#abc"})
>
> It needs to handle broken code in a sensible way - for example if a
> string end quote is missing - do something sensible.
>
> Assuming that what I want tokenises a string S into a sequence
> [{Tag1,S1},{Tag2,S2},...], I'd like S1 ++ S2 ++ ... = S, i.e. the
> tokeniser should be lossless.
>
> Cheers
>
> /Joe
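
To make the requirements above concrete, here is a minimal sketch of a
lossless, error-tolerant scanner. It is written in Python purely for
illustration; it is not the ANTLR grammar and not an existing Erlang tool.
The token classes and regular expressions are assumptions covering only a
simplified subset of Erlang's lexical syntax, and "do something sensible"
is interpreted here as: an unterminated string or quoted atom runs to the
end of its line.

```python
import re

# Ordered (tag, pattern) pairs; earlier alternatives win.
# Hypothetical, simplified subset of Erlang's lexical grammar.
TOKEN_SPEC = [
    ("comment",    r"%[^\n]*"),
    ("whitespace", r"\s+"),
    # Closing quote is optional, so a missing one still yields a token
    # that stops at the end of the line.
    ("string",     r'"(?:\\.|[^"\\\n])*"?'),
    ("char",       r"\$(?:\\.|.)"),
    ("atom",       r"'(?:\\.|[^'\\\n])*'?|[a-z][A-Za-z0-9_@]*"),
    ("variable",   r"[A-Z_][A-Za-z0-9_@]*"),
    ("float",      r"\d+\.\d+(?:[eE][+-]?\d+)?"),
    # Kept as raw text: 16#abc is never converted to 2748.
    ("integer",    r"\d+#[0-9A-Za-z]+|\d+"),
    # Fallback: any other single character, so nothing is ever dropped.
    ("punct",      r"."),
]

MASTER = re.compile(
    "|".join(f"(?P<{tag}>{pat})" for tag, pat in TOKEN_SPEC),
    re.DOTALL,
)

def tokenise(source):
    """Split source into (tag, text) pairs whose texts concatenate to source."""
    tokens = [(m.lastgroup, m.group()) for m in MASTER.finditer(source)]
    # Lossless check: S1 ++ S2 ++ ... = S
    assert "".join(text for _, text in tokens) == source
    return tokens
```

With this sketch, `16#abc` comes back as `("integer", "16#abc")`, comments
and whitespace are ordinary tokens, and a string with a missing end quote
is tagged `string` up to the end of its line. Because the last alternative
matches any single character, every input is covered, which is what makes
the concatenation property hold even for unparsable code.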