[erlang-questions] Rant: I hate parsing XML with Erlang

Thomas Lindgren thomasl_erlang@REDACTED
Tue Oct 23 16:27:19 CEST 2007


--- Joel Reymont <joelr1@REDACTED> wrote:

> On Oct 23, 2007, at 2:30 PM, Sean Hinde wrote:
>
> > Take a look at yaws_html.erl. That is quite a nice parser that
> > doesn't produce the same bloat as xmerl
>
> Are there any examples of using yaws_html as well as the output that
> it produces? Would be nice to include in this thread.

1> yaws_html:parse("ShowLetter.html").
{ehtml,[],
       [{html,[],
              [{head,[],
                     [{title,[],"Yahoo! Mail - thomasl_erlang@REDACTED"},
                      {script,[],
                              "\n<!-- \n\tif(typeof top.frames[\"wmailmain\"] != \"undefined\") window.open(\"http://mail.yahoo.com\", \"_top\");\n// -->\n"},
                      {link,[{rel,"stylesheet"},
                             {href,"ShowLetter_files/mail_blue_all.css"},
                             {type,"text/css"},
                             {media,"all"}]},
                      {script,[{src,"ShowLetter_files/mailcommonlib.js"}],[]},
...
and so on

However, note that yaws_html (version 1.68 in this case)
apparently isn't robust enough to handle unclosed tags
and perhaps other malformed markup. You get parse errors
instead, which might not be what you want in a real
HTML processor. Good luck.

(For extra credit, write an xmerl -> ehtml converter.)
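
One possible take on that exercise, as a minimal sketch rather than a
finished tool. It assumes the input is well-formed XML or XHTML (xmerl
will not parse tag soup), and that the target is the {Tag, Attrs, Body}
tuple form. The module name xmerl_to_ehtml and the functions
from_string/1 and convert/1 are made up for illustration; only
xmerl_scan:string/1 and the records from xmerl.hrl come from xmerl itself.

```erlang
%% Hypothetical module: the module and function names are illustrative only.
-module(xmerl_to_ehtml).
-export([from_string/1, convert/1]).

%% Record definitions (#xmlElement{}, #xmlAttribute{}, #xmlText{}) from xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse a well-formed XML/XHTML string with xmerl, then convert the root.
from_string(String) ->
    {Root, _Rest} = xmerl_scan:string(String),
    convert(Root).

%% An element becomes a {Tag, Attrs, Body} tuple, with attributes
%% flattened to {Key, Value} pairs.
convert(#xmlElement{name = Tag, attributes = Attrs, content = Content}) ->
    {Tag,
     [{Key, Value} || #xmlAttribute{name = Key, value = Value} <- Attrs],
     lists:flatmap(fun convert_child/1, Content)}.

%% Child elements recurse, text nodes become plain strings, and anything
%% else (comments, processing instructions) is dropped.
convert_child(#xmlElement{} = Element) -> [convert(Element)];
convert_child(#xmlText{value = Text})  -> [Text];
convert_child(_Other)                  -> [].
```

Calling xmerl_to_ehtml:from_string/1 on a small well-formed fragment
returns one nested tuple per element. The sketch does not add the outer
{ehtml,[],...} wrapper and keeps whitespace-only text nodes, so the
result will not match yaws_html output exactly.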

Best,
Thomas




