[erlang-questions] parsing text
Wes James
comptekki@REDACTED
Fri Apr 30 19:10:43 CEST 2010
On Fri, Apr 30, 2010 at 10:51 AM, Anthony Molinaro
<anthonym@REDACTED> wrote:
<snip>
>
> Your string contains an HTML entity but that is not a valid xml
> entity (there are only 5 of those
> http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references)
>
> So if you tried
>
> 1> xmerl_scan:string("<B>Page Counter</B></TD><TD>4880</TD></TR>").
> {{xmlElement,'B','B',[],
> {xmlNamespace,[],[]},
> [],1,[],
> [{xmlText,[{'B',1}],1,[],"Page",text},
> {xmlText,[{'B',1}],2,[]," Counter",text}],
> [],"/tmp",undeclared},
> "</TD><TD>4880</TD></TR>"}
>
> You can see it does better, but still not what you want as it can only parse
> part of the structure (only <b>...</b> can be parsed, then you hit an end
> element without a start and the parsing stops).
>
> Your best bet might be to attempt to parse the entire file and not just part
> of it. But you'd still need a way to escape html entities so they can be
> parsed by an xml parser.
>
> -Anthony
Anthony,
Thx for this. It gives me an idea. Since the line will mostly be
the same, all I really need to do is chop off the chars before the 4
and then chop off the last 10 chars, which gives me the result I need.
Something like:
1> S = string:substr("<B>Page&nbsp;Counter</B></TD><TD>4880</TD></TR>", 34).
"4880</TD></TR>"
2> list_to_integer(string:substr(S, 1, string:len(S)-10)).
4880
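The fixed offsets only hold while the prefix and the number of digits stay the same. If that ever changes, a regex might be less brittle. Here is a minimal sketch using the re module instead of substr; the module and function names (page_counter, extract/1) are made up for illustration, and it assumes the count always sits alone in its own <TD>...</TD> cell:

```erlang
-module(page_counter).
-export([extract/1]).

%% Pull the first all-digit <TD>...</TD> cell out of a line and
%% return it as an integer. Assumes the counter is the only
%% purely numeric cell on the line (an assumption, not checked).
extract(Line) ->
    case re:run(Line, "<TD>(\\d+)</TD>", [{capture, all_but_first, list}]) of
        {match, [Digits]} -> {ok, list_to_integer(Digits)};
        nomatch -> error
    end.
```

Calling page_counter:extract/1 on the line above should give {ok,4880}, and error for a line with no numeric cell.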
Thx all!
-wes