[erlang-questions] built-in xml parser

Dale Harvey <>
Sat Jun 27 10:21:16 CEST 2009


R13B01 was released with a new XML SAX parser module (xmerl_sax_parser),
supposedly at around the same performance level as erlsom.

http://erlang.org/doc/man/xmerl_sax_parser.html
http://erlang.org/download/otp_src_R13B01.readme
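
For anyone who wants to try it, here is a minimal sketch of driving the
module. It assumes the stream/2 entry point and the event_fun / event_state
options as described on the man page above. The module name sax_demo and
the function element_names/1 are made up for illustration, and I have not
benchmarked it against erlsom.

```erlang
-module(sax_demo).

-export([element_names/1]).

%% Hypothetical helper: collect the local names of all start tags,
%% in document order, using SAX events from xmerl_sax_parser.
element_names(Xml) ->
    %% The event fun is called as Fun(Event, Location, State) and
    %% must return the new state. We only care about startElement.
    EventFun =
        fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, Acc) ->
                [LocalName | Acc];
           (_Event, _Loc, Acc) ->
                Acc
        end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    %% On success the parser returns the final event state plus any
    %% unparsed input; anything else crashes here with a badmatch.
    {ok, Names, _Rest} = xmerl_sax_parser:stream(Xml, Options),
    lists:reverse(Names).
```

Calling sax_demo:element_names(<<"<a><b/><c>text</c></a>">>) should give
the element names as strings. The same event fun can be passed to
xmerl_sax_parser:file/2 to parse straight from a file.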

2009/6/26 Praveen Ray <>

> Jeffrey Friedl will be proud of you!
> On Thu, Jun 25, 2009 at 3:14 PM, Per Melin <> wrote:
>
> > Joel Reymont:
> > > I used to avoid regular expressions but then the new 're' module became
> > > part of OTP.
> > >
> > > I'm now using regular expressions with abandon!
> > >
> > > Is there a chance that future versions of OTP come with a built-in XML
> > > parser based on a C library, just like 're'?
> >
> > The solution is obvious; use 're' to parse XML.
> >
> > -module(regexml).
> >
> > -export([parse/1]).
> >
> > -define(XML_RE, "[^<]+|<(?:!(?:--(?:[^-]*-(?:[^-][^-]*-)*->?)?"
> >                "|\\[CDATA\\[(?:[^\\]]*](?:[^\\]]+])*]+"
> >                "(?:[^\\]>][^\\]]*](?:[^\\]]+])*]+)*>)?"
> >                "|DOCTYPE(?:[ \\n\\t\\r]+(?:[A-Za-z_:]"
> >                "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+(?:(?:[A-Za-z_:]"
> >                "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*|\"[^\"]*\""
> >                "|'[^']*'))*(?:[ \\n\\t\\r]+)?"
> >                "(?:\\[(?:<(?:!(?:--[^-]*-(?:[^-][^-]*-)*->"
> >                "|[^-](?:[^\\]\"'><]+|\"[^\"]*\"|'[^']*')*>)"
> >                "|\\?(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*(?:\\?>"
> >                "|[\\n\\r\\t ][^?]*\\?+(?:[^>?][^?]*\\?+)*>))"
> >                "|%(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*;|[ \\n\\t\\r]+)*](?:[ \\n\\t\\r]+)?)?>?)?)?"
> >                "|\\?(?:(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*(?:\\?>"
> >                "|[\\n\\r\\t ][^?]*\\?+(?:[^>?][^?]*\\?+)*>)?)?"
> >                "|/(?:(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+)?>?)?|(?:(?:[A-Za-z_:]"
> >                "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+(?:[A-Za-z_:]"
> >                "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> >                "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+)?="
> >                "(?:[ \\n\\t\\r]+)?(?:\"[^<\"]*\""
> >                "|'[^<']*'))*(?:[ \\n\\t\\r]+)?/?>?)?)").
> >
> > parse(String) ->
> >    re:run(String, ?XML_RE, [{capture, all, list}, global]).
> >
> > ---
> >
> > Adapted from http://www.cs.sfu.ca/~cameron/REX.html
> > ("XML Shallow Parsing with Regular Expressions").
> >
>
>
> --
> Yellowfish Technologies Inc
> http://www.yellowfish.biz
> 
> (888) 817 2969 x 233
> gtalk/skype: praveenray
>

