[erlang-questions] parsing text

Fri Apr 30 19:10:43 CEST 2010

On Fri, Apr 30, 2010 at 10:51 AM, Anthony Molinaro
<anthonym@REDACTED> wrote:

<snip>

>
> Your string contains an HTML entity (&nbsp;) but that is not a valid xml
> entity (there are only 5 of those
> http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references)
>
> So if you tried
>
> 1> xmerl_scan:string("<B>Page&nbsp;Counter</B></TD><TD>4880</TD></TR>").
> {{xmlElement,'B','B',[],
>             {xmlNamespace,[],[]},
>             [],1,[],
>             [{xmlText,[{'B',1}],1,[],"Page",text},
>              {xmlText,[{'B',1}],2,[]," Counter",text}],
>             [],"/tmp",undeclared},
>  "</TD><TD>4880</TD></TR>"}
>
> You can see it does better, but still not what you want as it can only parse
> part of the structure (only <b>...</b> can be parsed, then you hit an end
> element without a start and the parsing stops).
>
> Your best bet might be to attempt to parse the entire file and not just part
> of it.  But you'd still need a way to escape html entities so they can be
> parsed by an xml parser.
>
> -Anthony

Anthony,

Thx for this.  This gives me an idea.  Since the line will mostly be
the same, all I really need to do is chop off the chars before the
4880 and then chop off the last 10 chars ("</TD></TR>").  That would
give me the result I need.

Something like:

1> S = string:substr("<B>Page&nbsp;Counter</B></TD><TD>4880</TD></TR>", 34).
"4880</TD></TR>"
2> list_to_integer(string:substr(S, 1, string:len(S)-10)).
4880
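
The fixed offsets above only work while the prefix stays exactly 33
chars long. If the label ever changes, a regex might hold up better.
This is only a rough sketch: the module name page_counter and the
function extract/1 are made up for illustration, and it assumes the
counter is the first all-digit <TD> cell on the line.

```erlang
-module(page_counter).
-export([extract/1]).

%% Hypothetical helper: return the first all-digit <TD>...</TD> cell
%% on the line as an integer, or 'error' if there is none.
extract(Line) ->
    %% all_but_first drops the whole-match group and keeps only the
    %% captured digits; 'list' returns them as a string.
    case re:run(Line, "<TD>([0-9]+)</TD>",
                [{capture, all_but_first, list}]) of
        {match, [Digits]} -> {ok, list_to_integer(Digits)};
        nomatch           -> error
    end.
```

Calling page_counter:extract/1 on the same line as above should give
{ok,4880} without counting chars by hand.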

Thx all!

-wes