[erlang-questions] Erlang and bbcode
Thu Jul 12 07:37:18 CEST 2012
On reading the slides about "Erlang sucks" I thought,
"what is bbcode and how hard can it be to write an
Erlang parser for it?"
Since having that thought, I've checked half a dozen
definitions of bbcode and looked at a parser or two
and am little the wiser.
BBcode strikes me as truly bizarre. What is the point
of entering something that looks pretty much like HTML
except for using square brackets instead of angle brackets?
But there is worse.
- I cannot discover whether any particular character set
or encoding is presumed and if so which one. (I'd
*guess* Unicode/UTF-8, but a guess is all it would be.)
- I cannot discover how you get a plain [ into text.
Could it be [[? Could it be [?
- I cannot discover exactly what is a well-formed tag and
what is not.
- I cannot discover whether [/*] is legal or not.
- I cannot discover whether markup is legal inside
a [url]...[/url] or not (it could be stripped out).
- Same for [email] and [img] and [youtube] and [gvideo].
- I cannot discover whether [size=n] takes n in points,
pixels, percentage of default, or anything else (it
seems that different systems do different things)
- I cannot discover whether [youtube] and [gvideo]
allow width/height like [img] or not.
- Some descriptions say that :-) is processed as a
smiley, and that other emoticons may be processed
too, but I cannot find a list; others say [:-)] is
a smiley; others say nothing about this.
- It is not clear how the author of [quote=author]...
should be rendered; I have a strong suspicion it
should be locale-dependent.
- It appears that different instances of bbcode support
different tag sets out of the box and most of them
allow some sort of customisation.
- It appears to be _expected_ that different bbcode
implementations will translate things differently
(so [b]xxx[/b] might yield <b> or <strong> or
<span style="font-weight: bolder;"> or something else),
which means that it would be hard to make a test suite.
Indeed, I can find no guarantee that [b], [i], and so on
won't just be stripped out.
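To make the lexical questions concrete, here is a minimal Erlang sketch
of a tokenizer that simply picks one answer to each of them. Every rule
in it is an assumption, not something any bbcode description guarantees:
the input is already a list of Unicode code points; a tag is [name],
[name=value] or [/name]; a name is ASCII letters or *, folded to lower
case (so [*] and [/*] are both accepted); and any [ that does not start
such a tag is plain text. The module and function names are hypothetical.

```erlang
-module(bbcode_lex).
-export([tokens/1]).

%% Hypothetical tokenizer. Each rule is an ASSUMED answer to a question
%% the bbcode descriptions leave open.
%% Input:  a list of Unicode code points.
%% Output: [{text, String} | {open, Name, Arg} | {close, Name}],
%%         where Arg is a string, or 'none' when there is no "=value".

tokens(Cs) -> tokens(Cs, [], []).

%% Text is the pending run of plain characters, in reverse order.
tokens([], Text, Acc) ->
    lists:reverse(flush(Text, Acc));
tokens([$[ | Cs], Text, Acc) ->
    case tag(Cs) of
        {ok, Tok, Rest} -> tokens(Rest, [], [Tok | flush(Text, Acc)]);
        error           -> tokens(Cs, [$[ | Text], Acc)  % assumed: stray [ is text
    end;
tokens([C | Cs], Text, Acc) ->
    tokens(Cs, [C | Text], Acc).

%% Emit the pending text run, if there is one.
flush([], Acc)   -> Acc;
flush(Text, Acc) -> [{text, lists:reverse(Text)} | Acc].

%% Try to read a tag; the leading [ has already been consumed.
tag([$/ | Cs]) ->
    case name(Cs, []) of
        {Name, [$] | Rest]} when Name =/= [] -> {ok, {close, Name}, Rest};
        _ -> error
    end;
tag(Cs) ->
    case name(Cs, []) of
        {[], _} -> error;
        {Name, [$] | Rest]} -> {ok, {open, Name, none}, Rest};
        {Name, [$= | Cs1]} ->
            %% assumed: the value is everything up to the next ]
            case lists:splitwith(fun(C) -> C =/= $] end, Cs1) of
                {Arg, [$] | Rest]} -> {ok, {open, Name, Arg}, Rest};
                _ -> error
            end;
        _ -> error
    end.

%% Collect a (possibly empty) tag name, folded to lower case.
name([C | Cs], Acc) when C >= $a, C =< $z; C >= $A, C =< $Z; C =:= $* ->
    name(Cs, [lower(C) | Acc]);
name(Cs, Acc) ->
    {lists:reverse(Acc), Cs}.

lower(C) when C >= $A, C =< $Z -> C + 32;
lower(C) -> C.
```

Under these assumptions bbcode_lex:tokens("[b]x[/b]") gives
[{open,"b",none},{text,"x"},{close,"b"}], and a string such as "a [ b"
comes back as a single text token. A different bbcode dialect would need
different rules, which is exactly the difficulty.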
If the lexical issues could be sorted out, one could easily
enough write a BBcode -> XML value tree parser, and an
XML -> XML translator to do things like
<url default=X>Y</url> -> <a href=X>Y</a>
<url>Y</url> -> <a href=string_value(Y)>Y</a>
and then use an existing XML -> text unparser.
A non-validating XML parser in Erlang took me 275 lines,
so I doubt bbcode would be much harder.
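The two [url] rewrite rules above can be sketched as a small tree
translator. This is only an illustration under stated assumptions: the
tree shape {Tag, Attrs, Children} with atom tags and strings as text
nodes is an assumption, as are the module and function names; only the
name string_value comes from the rules themselves, and here it is taken
to mean the concatenated text of all descendants.

```erlang
-module(bbcode_xlate).
-export([translate/1, string_value/1]).

%% ASSUMED tree shape (not taken from any bbcode definition):
%%   Node :: {Tag :: atom(), Attrs :: [{atom(), string()}], [Node]}
%%         | string()                      % a text node

%% <url default=X>Y</url> -> <a href=X>Y</a>
%% <url>Y</url>           -> <a href=string_value(Y)>Y</a>
translate({url, Attrs, Kids}) ->
    Href = case lists:keyfind(default, 1, Attrs) of
               {default, X} -> X;
               false        -> string_value(Kids)
           end,
    {a, [{href, Href}], translate_all(Kids)};
%% Any other element is kept, with its children translated.
translate({Tag, Attrs, Kids}) ->
    {Tag, Attrs, translate_all(Kids)};
%% Text nodes pass through unchanged.
translate(Text) when is_list(Text) ->
    Text.

translate_all(Kids) ->
    [translate(K) || K <- Kids].

%% Concatenated text of a list of nodes, ignoring all markup.
string_value(Kids) ->
    lists:append([node_text(K) || K <- Kids]).

node_text({_Tag, _Attrs, Kids}) -> string_value(Kids);
node_text(Text) when is_list(Text) -> Text.
```

For example, bbcode_xlate:translate({url, [], [{b, [], ["http://"]}, "x.org"]})
gives {a, [{href, "http://x.org"}], [{b, [], ["http://"]}, "x.org"]}.
This shape is close to xmerl's "simple form", so a function such as
xmerl:export_simple_content/2 might serve as the XML -> text unparser,
though that pairing is a guess rather than something tested here.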