[erlang-questions] String parsing recommendations

Mon Dec 14 10:27:57 CET 2015

Mark Steele writes:
 > Hi all,
 > 
 > This is a pretty basic question, but I'm new so bear with me.
 > 
 > I've got a binary that might look something like this:
 > 
 > <<"(foo,bar(faz,fez,fur(foe)),fuz)">>
 > 
 > Parsed, it might look like:
 >   [
 >    [<<"foo">>],
 >    [<<"bar">>,<<"faz">>],
 >    [<<"bar">>,<<"fez">>],
 >    [<<"bar">>,<<"fur">>,<<"foe">>],
 >    [<<"fuz">>]
 >   ]
 > 
 > 
 > Any recommendations on the best approach to tackle this in Erlang?

The input language is clearly not regular, since parentheses can
nest to arbitrary depth, so you'd need something stronger than
regular expressions to parse it.  A context-free parser should
work here.

Personally I'd implement a separate scanner, which returns the
next token and the updated input, and a recursive descent parser,
which handles the grammar and produces the output.  Your desired
output is not a parse tree but rather an evaluation of one
(bar(faz,fez,...) expands into a different structure).  That
expansion looks trivial, so I'd let the parser do it directly;
otherwise an intermediate parse tree and a separate evaluator
will work.
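A minimal sketch of that approach, assuming the grammar is only
what the single example suggests: a parenthesised, comma-separated
list whose items are names, each optionally followed by a
parenthesised sub-list.  The module name nested_paths and all its
functions are hypothetical.  Whitespace is skipped, and malformed
input simply crashes with badmatch instead of returning a proper
error, which a real parser would want to do.

```erlang
%% Hypothetical module: hand-written scanner plus a recursive
%% descent parser that expands name(...) directly into paths.
%%
%% Assumed grammar (inferred from the single example):
%%   top   ::= '(' items ')'
%%   items ::= item (',' item)*
%%   item  ::= NAME [ '(' items ')' ]
-module(nested_paths).
-export([parse/1]).

%% Parse a whole input binary into a list of paths.
%% Malformed input crashes with badmatch.
parse(Bin) ->
    {'(', R0} = token(Bin),
    {Paths, R1} = items(R0),
    {')', R2} = token(R1),
    {eof, _} = token(R2),
    Paths.

%% items ::= item (',' item)*
items(R0) ->
    {Paths, R1} = item(R0),
    case token(R1) of
        {',', R2} ->
            {More, R3} = items(R2),
            {Paths ++ More, R3};
        _ ->
            %% Not a comma: leave that token unconsumed for the caller.
            {Paths, R1}
    end.

%% item ::= NAME [ '(' items ')' ]
%% A bare name yields one path; name(...) prefixes every sub-path.
item(R0) ->
    {{name, Name}, R1} = token(R0),
    case token(R1) of
        {'(', R2} ->
            {Sub, R3} = items(R2),
            {')', R4} = token(R3),
            {[[Name | Path] || Path <- Sub], R4};
        _ ->
            {[[Name]], R1}
    end.

%% Scanner: return {Token, RemainingInput}.
token(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t; C =:= $\n ->
    token(Rest);
token(<<$(, Rest/binary>>) -> {'(', Rest};
token(<<$), Rest/binary>>) -> {')', Rest};
token(<<$,, Rest/binary>>) -> {',', Rest};
token(<<>>) -> {eof, <<>>};
token(Bin) -> scan_name(Bin, <<>>).

%% Accumulate name characters up to the next delimiter or whitespace.
scan_name(<<C, Rest/binary>>, Acc)
  when C =/= $(, C =/= $), C =/= $,, C =/= $\s, C =/= $\t, C =/= $\n ->
    scan_name(Rest, <<Acc/binary, C>>);
scan_name(Rest, Acc) ->
    {{name, Acc}, Rest}.
```

With this, nested_paths:parse(<<"(foo,bar(faz,fez,fur(foe)),fuz)">>)
returns exactly the list shown in the question.  Doing the
expansion inside item/1 avoids building an intermediate tree; if
the language grows, returning {Name, Children} tuples from item/1
and expanding them in a separate pass is the easy way out.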

There are scanner and parser generator tools for Erlang (leex and
yecc, both in the parsetools application), but they would be
overkill for this simple language, unless you're new to language
processing, in which case they could help by providing examples
and imposing a sensible structure on the solution.  The re module
could be used in a hand-written scanner.
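One possible shape for an re-based scanner, as a sketch only (the
module re_scanner and its functions are made up): anchor a pattern
at the start of the input, capture both the whole match and the
token text, and split the whole match off to get the remaining
input.  It returns {Token, Rest} pairs, like the scanner described
above.

```erlang
%% Hypothetical module: a scanner built on the re module.
-module(re_scanner).
-export([token/1, tokens/1]).

%% Optional leading whitespace, then either a single delimiter
%% or a run of name characters.  The ^ anchors the match at the
%% start of the input.
-define(TOKEN_RE, "^\\s*([(),]|[^(),\\s]+)").

%% Return {Token, RemainingInput}, or {eof, <<>>} at end of input.
token(Bin) ->
    case re:run(Bin, ?TOKEN_RE, [{capture, all, binary}]) of
        {match, [Whole, Text]} ->
            %% The match starts at offset 0, so drop byte_size(Whole) bytes.
            Len = byte_size(Whole),
            <<_:Len/binary, Rest/binary>> = Bin,
            {to_token(Text), Rest};
        nomatch ->
            %% Only whitespace (or nothing) is left.
            {eof, <<>>}
    end.

%% Scan the whole input into a token list.
tokens(Bin) ->
    case token(Bin) of
        {eof, _} -> [];
        {Tok, Rest} -> [Tok | tokens(Rest)]
    end.

%% Map matched text to the token terms used by the parser.
to_token(<<"(">>) -> '(';
to_token(<<")">>) -> ')';
to_token(<<",">>) -> ',';
to_token(Name) -> {name, Name}.
```

Given a string pattern, re:run/3 compiles it itself; in a hot loop
you could compile it once with re:compile/1 and pass the result
instead.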

Alas, no Erlang magic here, just ordinary compiler frontend code.

/Mikael