[erlang-questions] How to extract string between XML tags
Hugo Mills
hugo@REDACTED
Wed Sep 26 00:06:30 CEST 2018
On Tue, Sep 25, 2018 at 05:56:01PM -0400, lloyd@REDACTED wrote:
> Hello,
>
> By now I should know how to do this. But I've fumbled for more time than I have to find an elegant solution.
>
> Can anyone show a better way?
>
> Example string: "<th>Firstname</th>" % NOTE: could be any valid tag
>
> My kludge:
>
> extract_text(TaggedText) ->
>     Split = re:split(TaggedText, "<"),
>     Split2 = lists:nth(2, Split),
>     Split3 = binary_to_list(Split2),
>     Split4 = re:split(Split3, ">"),
>     Split5 = lists:nth(2, Split4),
>     binary_to_list(Split5).
>
> Surely there's a better way.

XML isn't a regular language, so (in the general case) you can't(*)
use regexes and simple string splitting to parse XML correctly. If
you've got a very constrained input, where you know that it's going to
conform to specific patterns that you can match on, then you might get
away with it, but if that's not the case, you're barking up the wrong
tree with any kind of regex.
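
For the constrained case only, a minimal sketch might look like the
following. It assumes the input is exactly one element whose content is
plain text (no nested tags, comments, CDATA or entities, and no ">"
inside attribute values). The module name tag_text and the {ok, Text} |
error return shape are illustrative choices, not anything from the
original question.

```erlang
-module(tag_text).
-export([extract_text/1]).

%% Sketch for tightly constrained input only: one element, plain text
%% content, no nesting. Anything else returns 'error'.
extract_text(TaggedText) ->
    %% Group 1 captures the tag name, group 2 the text. The \1
    %% backreference requires the closing tag to repeat the opening name.
    Pattern = "^<([A-Za-z][^\\s/>]*)(?:\\s[^>]*)?>([^<]*)</\\1\\s*>$",
    case re:run(TaggedText, Pattern, [{capture, [2], list}]) of
        {match, [Text]} -> {ok, Text};
        nomatch -> error
    end.
```

Input that falls outside the assumed pattern, such as
"<td><b>x</b></td>", is rejected with error instead of being split at
the wrong place.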

The solution is to swallow the pain and use a proper XML library.
I've had good results with erlsom in my own projects, but there are
several other Erlang XML libraries out there, with various benefits
and issues. I'm sure others will weigh in with their experiences with
those.
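
As one hedged illustration of the library route, here is a sketch using
xmerl, which ships with Erlang/OTP. It is used here in place of erlsom
only so that the example needs no external dependency. The module name
xml_text is hypothetical, and the function returns only the text
directly inside the root element.

```erlang
-module(xml_text).
-export([extract_text/1]).

%% Record definitions for #xmlElement{} and #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse the string as XML and return the text found directly inside
%% the root element. TaggedText must be a string (list of characters).
%% xmerl_scan:string/1 exits on malformed input; that is not caught here.
extract_text(TaggedText) ->
    {Root, _Rest} = xmerl_scan:string(TaggedText),
    lists:flatten([Value || #xmlText{value = Value} <- Root#xmlElement.content]).
```

With this sketch, xml_text:extract_text("<th>Firstname</th>") gives
"Firstname" for any well-formed tag name. Text inside nested child
elements is skipped, so collecting it would need a recursive walk over
the content list.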

   Hugo.

(*) By "can't", I don't mean "it's just too painful". I mean "it's
provably not possible to do it right in all cases".

--
Hugo Mills | "There's a Martian war machine outside -- they want
hugo@REDACTED carfax.org.uk | to talk to you about a cure for the common cold."
http://carfax.org.uk/ |
PGP: E2AB1DE4 | Stephen Franklin, Babylon 5