[erlang-questions] parsing text

Wes James <>
Fri Apr 30 17:04:02 CEST 2010

On Thu, Apr 29, 2010 at 11:07 PM, Richard O'Keefe <> wrote:
> On Apr 30, 2010, at 6:39 AM, Wes James wrote:
>> I have a function grabbing a page and I'm pulling text out of the
>> result.  I can get the line:
>> lists:nth(424,B).
>> <<"<B>Page Counter</B></TD><TD>4880</TD></TR>">>
>> but 4880 will eventually get to 10000, etc.
> It's not clear exactly how much else about the data will
> vary.  My take on this is that you want the stuff between
> <TD> and </TD>.



Thanks for your input on this.  I tested it and it worked.  I also
experimented with xmerl_scan:string, but
"<B>Page Counter</B></TD><TD>4880</TD></TR>" doesn't seem to be
well-formed XML, and I kept getting errors.

xmerl_scan:string("<foo>" ++
                  "<myelement myattribute=\"red\">x</myelement>" ++
                  "<myelement myattribute=\"blue\">x</myelement>" ++
                  "<myelement myattribute=\"blue\">y</myelement>" ++
                  "</foo>").

works, but

xmerl_scan:string("<B>Page Counter</B></TD><TD>4880</TD></TR>").
2711- fatal: {unknown_entity_ref,nbsp}
2621- fatal: error_scanning_entity_ref
** exception exit: {fatal,{error_scanning_entity_ref,{file,file_name_unknown},
     in function  xmerl_scan:fatal/2
     in call from xmerl_scan:scan_content/11
     in call from xmerl_scan:scan_element/12
     in call from xmerl_scan:scan_document/2
     in call from xmerl_scan:string/2
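
For anyone reading this later: the approach suggested above (taking whatever sits between <TD> and </TD>) can be sketched with the re module, without parsing the fragment as XML at all. This is only a minimal sketch, not the exact code from the earlier reply. The module name page_counter and the function extract/1 are made up for illustration, and it assumes the cell contains nothing but digits and optional whitespace.

```erlang
-module(page_counter).
-export([extract/1]).

%% Hypothetical helper (not from the original thread).
%% Returns {ok, Integer} for the first <TD> cell in Line that holds
%% only digits, or the atom error when no such cell is found.
%% Line may be a binary or a string; re:run/3 accepts both.
extract(Line) ->
    Pattern = "<TD>\\s*([0-9]+)\\s*</TD>",
    case re:run(Line, Pattern, [{capture, all_but_first, list}]) of
        {match, [Digits]} -> {ok, list_to_integer(Digits)};
        nomatch           -> error
    end.
```

With the line from the page, page_counter:extract(<<"<B>Page Counter</B></TD><TD>4880</TD></TR>">>) returns {ok,4880}. Because the pattern matches a run of digits rather than a fixed width, it keeps working when the counter reaches 10000 and beyond.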


