[erlang-questions] Erlang and bbcode

Loïc Hoguin essen@REDACTED
Thu Jul 12 10:02:05 CEST 2012


Protip: BBCode was introduced by PHP developers who thought that entering 
HTML-like code and converting it with regexps was more secure than 
entering HTML directly. It isn't, and it actually brought a lot of 
security issues of its own. See phpBB for a more complete history.
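
To make that concrete, here is a minimal Python sketch of the kind of
regex-based conversion I mean. It is not taken from phpBB or any other
real forum software; the function name naive_bbcode_to_html and the
regex are made up for the example. It escapes HTML first, which looks
safe, and still lets a javascript: URL through:

```python
import html
import re

# Hypothetical naive converter, for illustration only.
# Matches [url=TARGET]LABEL[/url]; TARGET is anything up to the first "]".
URL_TAG = re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)

def naive_bbcode_to_html(text):
    # Step 1: escape HTML metacharacters, so "<script>" becomes harmless text.
    escaped = html.escape(text, quote=True)
    # Step 2: rewrite the BBCode tag with a regex. Nothing checks the URL scheme.
    return URL_TAG.sub(r'<a href="\1">\2</a>', escaped)

print(naive_bbcode_to_html("<script>"))
# prints: &lt;script&gt;
print(naive_bbcode_to_html("[url=javascript:alert(1)]click me[/url]"))
# prints: <a href="javascript:alert(1)">click me</a>
```

The square brackets buy nothing here: the converter still has to
validate every value it copies into the generated HTML, exactly as an
HTML filter would.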

You're of course right about all this. There's no point in implementing 
BBCode; HTML works just fine. We just need an HTML filtering library, 
perhaps something akin to HTML Purifier (http://htmlpurifier.org/).
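
As a rough idea of what such a filter does, here is a whitelist sketch
in Python built on the standard html.parser module. It is not how HTML
Purifier works and is nowhere near as thorough (it does not balance
tags, for one). The whitelists and the names WhitelistFilter, safe_href
and filter_html are assumptions made up for the example:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed example whitelists; a real filter would make these configurable.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}

def safe_href(value):
    """Accept only absolute URLs with a whitelisted scheme."""
    try:
        return urlsplit(value.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False

class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still emitted
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # unknown attribute (onclick, style, ...): drop it
            if name == "href" and not safe_href(value):
                continue  # javascript:, data:, relative links: drop them
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is re-escaped so it can never turn back into markup.
        self.out.append(escape(data, quote=False))

def filter_html(text):
    parser = WhitelistFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)

print(filter_html('<b onclick="x()">hi</b><script>alert(1)</script>'))
print(filter_html('<a href="javascript:alert(1)">x</a>'))
```

The point of the design is that everything is rejected unless it is
explicitly listed, rather than trying to enumerate what is dangerous.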

On 07/12/2012 07:37 AM, Richard O'Keefe wrote:
> On reading the slides about "Erlang sucks" I thought,
> "what is bbcode and how hard can it be to write an
> Erlang parser for it?"
>
> Since having that thought, I've checked half a dozen
> definitions of bbcode and looked at a parser or two
> and am little the wiser.
>
> BBcode strikes me as truly bizarre.  What is the point
> of entering something that looks pretty much like HTML
> except for using square brackets instead of angle brackets?
> But there is worse.
>
> - I cannot discover whether any particular character set
>    or encoding is presumed and if so which one.  (I'd
>    *guess* Unicode/UTF-8, but a guess is all it would be.)
> - I cannot discover how you get a plain [ into text.
>    Could it be [[?  Could it be [[]?
> - I cannot discover exactly what is a well-formed tag and
>    what is not.
> - I cannot discover whether [/*] is legal or not.
> - I cannot discover whether markup is legal inside
>    a [url]...[/url] or not (it could be stripped out)
> - Same for [email] and [img] and [youtube] and [gvideo]
> - I cannot discover whether [size=n] takes n in points,
>    pixels, percentage of default, or anything else (it
>    seems that different systems do different things)
> - I cannot discover whether [youtube] and [gvideo]
>    allow width/height like [img] or not.
> - Some descriptions say that :-) is processed as a
>    smiley, and that other emoticons may be processed
>    too, but I cannot find a list; others say [:-)] is
>    a smiley; others say nothing about this.
> - It is not clear how the author of [quote-author]...
>    should be rendered; I have a strong suspicion it
>    should be locale-dependent.
> - It appears that different instances of bbcode support
>    different tag sets out of the box and most of them
>    allow some sort of customisation.
> - It appears to be _expected_ that different bbcode
>    implementations will translate things differently
>    (so [b]xxx[/b] might yield <b> or <strong> or
>    <span style="font-weight: bolder;"> or something else),
>    which means that it would be hard to make a test suite.
>    Indeed, I can find no guarantee that [b] [i] and so on
>    won't just be stripped out.
>
> If the lexical issues could be sorted out, one could easily
> enough write a BBcode -> XML value tree parser, and an
> XML -> XML translator to do things like
> <url default=X>Y</url> -> <a href=X>Y</a>
> <url>Y</url> -> <a href=string_value(Y)>Y</a>
> and then use an existing XML -> text unparser.
> A non-validating XML parser in Erlang took me 275 lines,
> so I doubt bbcode would be much harder.
>
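
For what it's worth, the pipeline described in the quoted message
(parse to a tree, translate, unparse) can be sketched in a few dozen
lines once the lexical questions are answered by fiat. The Python
below is only an illustration under assumed rules, since no BBCode
definition settles them: a tag is [name], [name=value] or [/name] with
a letters-only name; only b, i and url are known; anything else,
including a close tag with no matching open tag, is literal text; and
tags left open are closed implicitly. All function names are made up
for the example, and it does no URL scheme checking, so it is not safe
for untrusted input as is.

```python
import re
from html import escape

# Assumed lexical rule: [/name] or [name] or [name=value], letters-only names.
TOKEN = re.compile(r"\[(?:/([a-z]+)|([a-z]+)(?:=([^\]]*))?)\]", re.IGNORECASE)
KNOWN = {"b", "i", "url"}  # assumed tag set

def parse(text):
    """Build a tree of (name, default, children) tuples with str leaves."""
    root = ("root", None, [])
    stack = [root]
    pos = 0
    for m in TOKEN.finditer(text):
        closing = m.group(1) is not None
        name = (m.group(1) or m.group(2)).lower()
        open_names = [node[0] for node in stack[1:]]
        if name not in KNOWN or (closing and name not in open_names):
            continue  # not markup under the assumed rules: stays literal text
        if m.start() > pos:
            stack[-1][2].append(text[pos:m.start()])
        pos = m.end()
        if closing:
            while stack.pop()[0] != name:
                pass  # implicitly close tags left open inside this one
        else:
            node = (name, m.group(3), [])
            stack[-1][2].append(node)
            stack.append(node)
    if pos < len(text):
        stack[-1][2].append(text[pos:])
    return root

def string_value(children):
    """Concatenated text of a subtree, like the string value of an XML node."""
    return "".join(c if isinstance(c, str) else string_value(c[2]) for c in children)

def render(children):
    out = []
    for c in children:
        if isinstance(c, str):
            out.append(escape(c, quote=False))
            continue
        name, default, kids = c
        if name == "url":
            # <url default=X>Y</url> -> <a href=X>Y</a>
            # <url>Y</url>           -> <a href=string_value(Y)>Y</a>
            href = default if default is not None else string_value(kids)
            out.append('<a href="%s">%s</a>' % (escape(href, quote=True), render(kids)))
        else:
            out.append("<%s>%s</%s>" % (name, render(kids), name))
    return "".join(out)

def bbcode_to_html(text):
    return render(parse(text)[2])

print(bbcode_to_html("[b]bold[/b] [url=http://example.com/]Y[/url]"))
# prints: <b>bold</b> <a href="http://example.com/">Y</a>
```

Every one of those assumed rules is a choice another implementation
could make differently, which is the real problem with BBCode.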


-- 
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu




