[erlang-questions] Wikipedia rendering engine

Vlad Skvortsov vss@REDACTED
Sat Jul 5 02:14:31 CEST 2008


Joe Armstrong wrote:
>>> We (collectively) promised to help Alexander - I promised to provide him with a
>>> rendering engine (in Erlang) for the Wikipedia markup language.
>>>
>>> Before I start hacking has anybody done this before?
>>>       
>> What exactly do you mean by a 'rendering engine'? Translating the
>> markup language (its name is MediaWiki, by the way) to something else?
>>     
>
> I want a number of functions
>
>      mediaWiki_to_rtf(bin()) -> rtf().
>      rtf_to_html(rtf()) -> html().
>      rtf_to_pdf(rtf()) -> pdf().
>
> etc. where rtf(), html(), and pdf() are abstract data types representing
> (abstracted) rich text, HTML, PDF, etc.
>
> The rendering engine is a wrapper around these routines to display the
> result in a browser or generate PDF etc.
>   

In my experience it is very hard to convert MediaWiki markup to an
AST. It's pretty easy to get 80% of it working, but the remaining 20%
is really tough. I wasn't even able to find a renderer that was fully
compatible with MediaWiki; most of them use simple regexp substitutions
which work OK "most of the time". That was a year ago, though; maybe
things have changed.
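Once an AST exists, the back half of Joe's pipeline (rtf() -> html()) is the easy part. Below is a minimal sketch of it, assuming a hypothetical rich-text representation: Joe's mail names rtf() but doesn't define it, so the paragraph/bold/italic shape and the module name rtf_sketch are made up for illustration. The html() result is represented here as an iolist.

```erlang
-module(rtf_sketch).
-export([rtf_to_html/1]).

%% Hypothetical abstract rich-text type: Joe's mail names rtf() but does
%% not define it, so this shape (paragraphs of nested bold/italic runs)
%% is an assumption made for illustration only.
-type inline() :: binary() | {bold, [inline()]} | {italic, [inline()]}.
-type rtf() :: [{para, [inline()]}].
-export_type([rtf/0]).

%% Render the abstract document as HTML, returned as an iolist
%% (standing in for the html() type from the mail).
-spec rtf_to_html(rtf()) -> iolist().
rtf_to_html(Doc) ->
    [["<p>", inlines(Is), "</p>\n"] || {para, Is} <- Doc].

inlines(Is) ->
    [inline(I) || I <- Is].

inline(Text) when is_binary(Text) -> escape(Text);
inline({bold, Is})   -> ["<b>", inlines(Is), "</b>"];
inline({italic, Is}) -> ["<i>", inlines(Is), "</i>"].

%% Escape the characters that are significant in HTML text content.
escape(Text) ->
    [case C of
         $< -> "&lt;";
         $> -> "&gt;";
         $& -> "&amp;";
         _  -> C
     end || <<C>> <= Text].
```

For example, iolist_to_binary(rtf_sketch:rtf_to_html([{para, [<<"plain ">>, {bold, [<<"bold">>]}]}])) gives <<"<p>plain <b>bold</b></p>\n">>. All of the difficulty lives in producing that AST from real MediaWiki text in the first place.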


>> It's not a trivial task you have set yourself. There are some elements
>> that are quite complex, for example the fact that '' is italics and
>> ''' is bold. Notice the difference between:
>>
>> '''this is bold'''
>>
>> '''this is italic, starting with a ' ''
>>
>> '''this is bold '' and this part italic as well '''''
>>
>>     
>
> This is almost trivial :-)
>   

It seems so until you get into the gory details of templates, tables,
math formulas and the like.

We ended up screen-scraping, with a few heuristics to handle the most
common cases. The requirements had to be relaxed significantly.
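To make the apostrophe examples above concrete, here is a minimal sketch of just the lexing step, assuming the run-length rules as I understand MediaWiki's parser: 2 apostrophes = italic, 3 = bold, 5 = bold+italic, 4 = one literal apostrophe then bold, more than 5 = extra literal apostrophes then bold+italic. The module and token names are hypothetical. It deliberately does NOT do the balancing pass MediaWiki applies afterwards (roughly: when a line has an odd number of both bold and italic markers, one ''' is reinterpreted as a literal apostrophe plus ''), and that pass is exactly where the "easy 80%" ends.

```erlang
-module(wiki_quotes).
-export([tokens/1]).

-type token() :: {text, binary()} | italic | bold | bold_italic.

%% Lex one line of MediaWiki text into text runs and quote markers.
%% Assumed run-length rules: 1 apostrophe is literal text, 2 = italic,
%% 3 = bold, 5 = bold+italic, 4 = a literal apostrophe followed by bold,
%% more than 5 = literal apostrophes followed by bold+italic.
%% Balancing unmatched markers is NOT done here.
-spec tokens(binary()) -> [token()].
tokens(Line) ->
    tokens(Line, <<>>, []).

tokens(<<>>, Acc, Out) ->
    lists:reverse(flush(Acc, Out));
tokens(<<$', _/binary>> = Bin, Acc, Out) ->
    {N, Rest} = count_quotes(Bin, 0),
    case classify(N) of
        {Literal, none} ->
            %% A lone apostrophe is ordinary text: keep accumulating.
            tokens(Rest, <<Acc/binary, Literal/binary>>, Out);
        {Literal, Marker} ->
            Out1 = flush(<<Acc/binary, Literal/binary>>, Out),
            tokens(Rest, <<>>, [Marker | Out1])
    end;
tokens(<<C, Rest/binary>>, Acc, Out) ->
    tokens(Rest, <<Acc/binary, C>>, Out).

%% Count a run of consecutive apostrophes.
count_quotes(<<$', Rest/binary>>, N) -> count_quotes(Rest, N + 1);
count_quotes(Rest, N) -> {N, Rest}.

%% Map a run length to {LiteralPrefix, Marker}.
classify(1) -> {<<"'">>, none};
classify(2) -> {<<>>, italic};
classify(3) -> {<<>>, bold};
classify(4) -> {<<"'">>, bold};
classify(5) -> {<<>>, bold_italic};
classify(N) -> {binary:copy(<<"'">>, N - 5), bold_italic}.

%% Push accumulated text (if any) onto the reversed output list.
flush(<<>>, Out) -> Out;
flush(Acc, Out) -> [{text, Acc} | Out].
```

On the second example, wiki_quotes:tokens(<<"'''this is italic, starting with a ' ''">>) returns [bold, {text, <<"this is italic, starting with a ' ">>}, italic]: a bold that opens and an italic that closes. A renderer that simply toggles state on each marker gets this wrong; only the extra heuristic pass recovers the intended reading.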

-- 
Vlad Skvortsov, vss@REDACTED, http://vss.73rus.com
