[erlang-questions] Erlang and bbcode
Thu Jul 12 09:31:47 CEST 2012
I had an interesting experience when I wrote a markdown parser:
Markdown is widely used and fairly mature with a lot of different
Initially I started writing against the 'spec' but the spec is not so
Then I copied over a chunk of tests from the C# implementation so my
spec would be 'like them'.
Finally I realised that I was an idiot and that I had it all
backwards. In order to be useful it needed to play nicely with the
library was pretty mature, widely used and substantially the default.
So I then built a test case generator that took sample markdown and
generated tests that checked my hand-written parser returned the same
At that point the old 'HTML doesn't care about whitespace' bug kicked
in - some constructions in the JS library would return additional
spaces or carriage returns, and unpicking my parser to make it match
them exactly was too much effort.
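To make the whitespace problem concrete, here is a minimal Python sketch of
a looser comparison that such generated tests could use. It is not the
generator described above; the names normalise_html and same_output are
hypothetical, and it assumes the samples contain no <pre> blocks or other
places where whitespace is significant.

```python
import re


def normalise_html(html: str) -> str:
    """Collapse insignificant whitespace so two renderings can be compared.

    Assumption: whitespace is never significant in the samples
    (no <pre>, no inline tags separated only by a space).
    """
    # Turn every run of spaces, tabs, CRs and LFs into a single space.
    collapsed = re.sub(r"\s+", " ", html)
    # Drop whitespace that sits directly between two tags.
    collapsed = re.sub(r">\s+<", "><", collapsed)
    return collapsed.strip()


def same_output(expected: str, actual: str) -> bool:
    """True if both renderings are equal once whitespace is normalised."""
    return normalise_html(expected) == normalise_html(actual)
```

With this, "<p>hi</p>\n\n<p>there</p>\n" and "<p>hi</p><p>there</p>" compare
as equal, while a real difference in text or tags still fails the test.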
The key takeaway is that you can't/shouldn't write any of this sort of
stuff without cognisance of the wider environment that it will be
used with - particularly with respect to JS libraries.
We moved away from using markdown and went to an open source WYSIWYG
HTML editor in the end.
On 12 July 2012 06:37, Richard O'Keefe wrote:
> On reading the slides about "Erlang sucks" I thought,
> "what is bbcode and how hard can it be to write an
> Erlang parser for it?"
> Since having that thought, I've checked half a dozen
> definitions of bbcode and looked at a parser or two
> and am little the wiser.
> BBcode strikes me as truly bizarre. What is the point
> of entering something that looks pretty much like HTML
> except for using square brackets instead of angle brackets?
> But there is worse.
> - I cannot discover whether any particular character set
> or encoding is presumed and if so which one. (I'd
> *guess* Unicode/UTF-8, but a guess is all it would be.)
> - I cannot discover how you get a plain [ into text.
> Could it be [[? Could it be [?
> - I cannot discover exactly what is a well-formed tag and
> what is not.
> - I cannot discover whether [/*] is legal or not.
> - I cannot discover whether markup is legal inside
> a [url]...[/url] or not (it could be stripped out)
> - Same for [email] and [img] and [youtube] and [gvideo]
> - I cannot discover whether [size=n] takes n in points,
> pixels, percentage of default, or anything else (it
> seems that different systems do different things)
> - I cannot discover whether [youtube] and [gvideo]
> allow width/height like [img] or not.
> - Some descriptions say that :-) is processed as a
> smiley, and that other emoticons may be processed
> too, but I cannot find a list; others say [:-)] is
> a smiley; others say nothing about this.
> - It is not clear how the author of [quote-author]...
> should be rendered; I have a strong suspicion it
> should be locale-dependent.
> - It appears that different instances of bbcode support
> different tag sets out of the box and most of them
> allow some sort of customisation.
> - It appears to be _expected_ that different bbcode
> implementations will translate things differently
> (so [b]xxx[/b] might yield <b> or <strong> or
> <span style="font-weight: bolder;"> or something else),
> which means that it would be hard to make a test suite.
> Indeed, I can find no guarantee that [b] [i] and so on
> won't just be stripped out.
> If the lexical issues could be sorted out, one could easily
> enough write a BBcode -> XML value tree parser, and an
> XML -> XML translator to do things like
> <url default=X>Y</url> -> <a href=X>Y</a>
> <url>Y</url> -> <a href=string_value(Y)>Y</a>
> and then use an existing XML -> text unparser.
> A non-validating XML parser in Erlang took me 275 lines,
> so I doubt bbcode would be much harder.
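The pipeline outlined in the quoted message (BBcode to a value tree, a tree
translation for [url], then an unparser) could look roughly like the Python
sketch below. Everything the message lists as undiscoverable is filled in
by assumption here and labelled as such: only [b], [i] and [url] are
recognised, unknown or stray tags are kept as literal text, unclosed tags
are closed at end of input, and [b]/[i] become <strong>/<em>. The names
parse, render and to_html are hypothetical.

```python
import re
from html import escape

# Assumption: a tag is [/name], [name] or [name=value], with a letters-only name.
TAG = re.compile(r"\[(?:/([a-zA-Z]+)|([a-zA-Z]+)(?:=([^\]]*))?)\]")
# Assumption: the tag set and its HTML rendering are configurable per site.
RENDER_AS = {"b": "strong", "i": "em"}
KNOWN = set(RENDER_AS) | {"url"}


def parse(text):
    """Return a tree; a node is a str or a (name, default, children) tuple."""
    root = ("root", None, [])
    stack = [root]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1][2].append(text[pos:m.start()])
        close_name, open_name, default = m.groups()
        if open_name and open_name.lower() in KNOWN:
            node = (open_name.lower(), default, [])
            stack[-1][2].append(node)
            stack.append(node)
        elif close_name and len(stack) > 1 and stack[-1][0] == close_name.lower():
            stack.pop()
        else:
            # Unknown tag or stray closing tag: keep it as literal text.
            stack[-1][2].append(m.group(0))
        pos = m.end()
    if pos < len(text):
        stack[-1][2].append(text[pos:])
    return root  # any tags still open are implicitly closed here


def string_value(children):
    """Concatenate all text below a node, ignoring markup."""
    return "".join(c if isinstance(c, str) else string_value(c[2])
                   for c in children)


def render(children):
    """Translate the tree to HTML text, escaping all character data."""
    out = []
    for c in children:
        if isinstance(c, str):
            out.append(escape(c))
            continue
        name, default, kids = c
        if name == "url":
            # <url default=X>Y</url> -> <a href=X>Y</a>
            # <url>Y</url>           -> <a href=string_value(Y)>Y</a>
            href = default if default is not None else string_value(kids)
            out.append('<a href="%s">%s</a>' % (escape(href), render(kids)))
        else:
            tag = RENDER_AS[name]
            out.append("<%s>%s</%s>" % (tag, render(kids), tag))
    return "".join(out)


def to_html(text):
    return render(parse(text)[2])
```

For example, to_html("[url]http://example.org/[/url]") uses the string value
of the body as the href. The sketch does no URL scheme checking, so it is
not safe for untrusted input as it stands.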