[erlang-questions] built-in xml parser
Praveen Ray
praveen.ray@REDACTED
Fri Jun 26 16:25:14 CEST 2009
Jeffrey Friedl will be proud of you!
On Thu, Jun 25, 2009 at 3:14 PM, Per Melin <per.melin@REDACTED> wrote:
> Joel Reymont:
> > I used to avoid regular expressions but then the new 're' module became
> > part of OTP.
> >
> > I'm now using regular expressions with abandon!
> >
> > Is there a chance that future versions of OTP will come with a built-in
> > XML parser based on a C library, just like 're'?
>
> The solution is obvious; use 're' to parse XML.
>
> -module(regexml).
>
> -export([parse/1]).
>
> -define(XML_RE, "[^<]+|<(?:!(?:--(?:[^-]*-(?:[^-][^-]*-)*->?)?"
> "|\\[CDATA\\[(?:[^\\]]*](?:[^\\]]+])*]+"
> "(?:[^\\]>][^\\]]*](?:[^\\]]+])*]+)*>)?"
> "|DOCTYPE(?:[ \\n\\t\\r]+(?:[A-Za-z_:]"
> "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+(?:(?:[A-Za-z_:]"
> "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*|\"[^\"]*\""
> "|'[^']*'))*(?:[ \\n\\t\\r]+)?"
> "(?:\\[(?:<(?:!(?:--[^-]*-(?:[^-][^-]*-)*->"
> "|[^-](?:[^\\]\"'><]+|\"[^\"]*\"|'[^']*')*>)"
> "|\\?(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*(?:\\?>"
> "|[\\n\\r\\t ][^?]*\\?+(?:[^>?][^?]*\\?+)*>))"
> "|%(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*;|[ \\n\\t\\r]+)*](?:[ \\n\\t\\r]+)?)?>?)?)?"
> "|\\?(?:(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*(?:\\?>"
> "|[\\n\\r\\t ][^?]*\\?+(?:[^>?][^?]*\\?+)*>)?)?"
> "|/(?:(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+)?>?)?|(?:(?:[A-Za-z_:]"
> "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+(?:[A-Za-z_:]"
> "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+)?="
> "(?:[ \\n\\t\\r]+)?(?:\"[^<\"]*\""
> "|'[^<']*'))*(?:[ \\n\\t\\r]+)?/?>?)?)").
>
> parse(String) ->
>     re:run(String, ?XML_RE, [{capture, all, list}, global]).
>
> ---
>
> Adapted from http://www.cs.sfu.ca/~cameron/REX.html ("XML Shallow
> Parsing with Regular Expressions").
>
>
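For anyone wondering what the quoted parse/1 call hands back: with the `global` and `{capture, all, list}` options, re:run/3 returns {match, Matches}, where each element of Matches is a list of captured strings, or nomatch if nothing matched. Here is a minimal sketch of flattening that into a token list. It assumes a much simpler pattern than the REX expression, and the module name regexml_sketch, the function tokens/1 and the macro SHALLOW_RE are made up for illustration only.

```erlang
-module(regexml_sketch).

-export([tokens/1]).

%% Hypothetical, deliberately simplified pattern (NOT the REX expression
%% quoted above): either a run of non-'<' characters, or everything from
%% a '<' up to the next '>'.
-define(SHALLOW_RE, "[^<]+|<[^>]*>").

%% Split String into markup and text tokens.
%% With 'global', re:run/3 returns one list of captures per match; the
%% pattern has no capture groups, so each of those lists has one element.
tokens(String) ->
    case re:run(String, ?SHALLOW_RE, [{capture, all, list}, global]) of
        {match, Matches} -> [Token || [Token] <- Matches];
        nomatch -> []
    end.
```

Calling regexml_sketch:tokens("<a>hi</a>") should give ["<a>","hi","</a>"]. Unlike the full expression, this simplified pattern will split in the wrong place if a '>' appears inside a comment, a CDATA section or an attribute value, so treat it as a demonstration of the re:run/3 return shape rather than a tokenizer to rely on.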
--
Yellowfish Technologies Inc
http://www.yellowfish.biz
praveen.ray@REDACTED
(888) 817 2969 x 233
gtalk/skype: praveenray