[erlang-questions] How to extract string between XML tags
PAILLEAU Eric
eric.pailleau@REDACTED
Sat Sep 29 17:59:34 CEST 2018
Hello,
if the goal is to be sure that the tags are correctly balanced, it is
better to use the xmerl parser, as Fred proposed.
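
As a rough sketch of the parser route (the module and function names below are only illustrative, not something from this thread), one could let xmerl_scan:string/2 do the balancing check and treat any failure as "not well formed". This assumes xmerl exits or throws on malformed input, so everything is caught:

```erlang
-module(xml_balance).
-export([well_formed/1]).

%% Hypothetical helper: true when xmerl can parse Str as one XML document.
%% xmerl_scan signals malformed input by exiting, so any class of failure
%% is caught and reported as false. {quiet, true} only silences its output.
well_formed(Str) ->
    try xmerl_scan:string(Str, [{quiet, true}]) of
        {_Doc, _Rest} -> true
    catch
        _:_ -> false
    end.
```

With this, xml_balance:well_formed("<th>title <b>bold</b></th>") should be true, while an unbalanced input such as "<th>title</b>" should give false.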
I see an issue in your regexp
"<\([^>]\+\)>\(.*\)</\([^>]\+\)>": the (.*) group is greedy and will catch
anything, including tags (I mean also "<").
Use instead
"<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>", i.e. the middle group matches anything
that is not a tag start.
However, it will not work on nested tags. For instance, only the inner
element is rewritten here:
1> re:replace("<th>title <b>bold</b></th>",
              "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
              "\\1 \\2 \\3", [global,{return, list}]).
"<th>title b bold b</th>"
Note that \\1 MUST BE equal to \\3 when the tags are balanced, so the
replacement can also be written with \\1 in place of \\3. Doing it both
ways and comparing the results gives a check:
2> A = re:replace("<th>title <b>bold</b></th>",
                  "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
                  "\\1 \\2 \\3", [global,{return, list}]).
"<th>title b bold b</th>"
3> B = re:replace("<th>title <b>bold</b></th>",
                  "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
                  "\\1 \\2 \\1", [global,{return, list}]).
"<th>title b bold b</th>"
4> A = B.
should be ok.
Example with a single tag:
43> A = re:replace("<th>title</th>",
                   "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
                   "\\1 \\2 \\3", [global,{return, list}]).
"th title th"
44> B = re:replace("<th>title</th>",
                   "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
                   "\\1 \\2 \\1", [global,{return, list}]).
"th title th"
45> A = B.
"th title th"
But with an unbalanced tag, the match fails:
48> A = re:replace("<th>title</b>",
                   "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
                   "\\1 \\2 \\3", [global,{return, list}]).
"th title b"
49> B = re:replace("<th>title</b>",
                   "<\([^>]\+\)>\([^<]+\)</\([^>]\+\)>",
                   "\\1 \\2 \\1", [global,{return, list}]).
"th title th"
50> A = B.
** exception error: no match of right hand side value "th title th"
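
The same "\\1 must equal \\3" idea can probably be checked more directly by capturing the groups and letting Erlang pattern matching compare them. This is only a sketch: the module name, the function name and the return values are invented here, and the pattern is anchored with ^ and $ so that it only accepts a single, non-nested element:

```erlang
-module(tag_check).
-export([balanced/1]).

%% Hypothetical helper: checks that Str looks like "<X>Y</X>".
%% Using the variable Open twice in the first clause makes the clause
%% match only when the opening and closing names are identical.
balanced(Str) ->
    Pattern = "^<([^>]+)>([^<]+)</([^>]+)>$",
    case re:run(Str, Pattern, [{capture, all_but_first, list}]) of
        {match, [Open, Text, Open]} -> {ok, Open, Text};
        {match, [Open, _Text, Close]} -> {error, {mismatch, Open, Close}};
        nomatch -> {error, nomatch}
    end.
```

tag_check:balanced("<th>title</th>") should give {ok,"th","title"}, and tag_check:balanced("<th>title</b>") should give {error,{mismatch,"th","b"}}. A nested input such as "<th>title <b>bold</b></th>" ends up as {error,nomatch}, which is again where a real parser is the better tool.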
Regards
On 29/09/2018 at 13:30, Eckard Brauer wrote:
> Hello,
>
> just another (a beginner's) question probably leading away from the
> initial point:
>
> If I use
>
> T = re:replace("<th>title <b>bold</b></th>",
> "<\([^>]\+\)>\(.*\)</\([^>]\+\)>",
> "\\1 \\2 \\3",
> [global,{return, list}]).
>
> how could I check that T is of the form "X Y X"?
>
>
>
> On Sat, 29 Sep 2018 11:18:14 +0200
> PAILLEAU Eric <eric.pailleau@REDACTED> wrote:
>
>> Hello,
>> BTW "</?[^>]{1,}>" works too, no need to escape / (Perl
>> reflex :) ...)
>>
>>
>> On 29/09/2018 at 11:13, PAILLEAU Eric wrote:
>> [...]
>> [...]
>> [...]
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions