[erlang-questions] parsing text
Wes James
comptekki@REDACTED
Fri Apr 30 17:04:02 CEST 2010
On Thu, Apr 29, 2010 at 11:07 PM, Richard O'Keefe <ok@REDACTED> wrote:
>
> On Apr 30, 2010, at 6:39 AM, Wes James wrote:
>
>> I have a function grabbing a page and I'm pulling text out of the
>> result. I can get the line:
>>
>> lists:nth(424,B).
>> <<"<B>Page Counter</B></TD><TD>4880</TD></TR>">>
>>
>>
>> but 4880 will eventually get to 10000, etc.
>
> It's not clear exactly how much else about the data will
> vary. My take on this is that you want the stuff between
> <TD> and </TD>.
<snip>
Richard,
Thanks for your input on this. I tested it and it worked. I also
experimented with xmerl_scan:string, but
"<B>Page Counter</B></TD><TD>4880</TD></TR>" doesn't seem to be
well-formed XML, and I kept getting errors. For example,
xmerl_scan:string("<foo>" ++
    "<myelement myattribute=\"red\">x</myelement>" ++
    "<myelement myattribute=\"blue\">x</myelement>" ++
    "<myelement myattribute=\"blue\">y</myelement>" ++
    "</foo>").
works, but
xmerl_scan:string("<B>Page Counter</B></TD><TD>4880</TD></TR>").
2711- fatal: {unknown_entity_ref,nbsp}
2621- fatal: error_scanning_entity_ref
** exception exit: {fatal,{error_scanning_entity_ref,{file,file_name_unknown},
                          {line,1},
                          {col,10}}}
     in function  xmerl_scan:fatal/2
     in call from xmerl_scan:scan_content/11
     in call from xmerl_scan:scan_element/12
     in call from xmerl_scan:scan_document/2
     in call from xmerl_scan:string/2
does not.
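
For reference, here is a minimal sketch of pulling out "the stuff between <TD> and </TD>" without an XML parser. It is not necessarily what Richard proposed (his code is snipped above); the module name td_extract and the function td_text/1 are made up for illustration. It assumes the counter sits in the first complete <TD>...</TD> pair on the line and contains no nested tags. The trace above complains about an nbsp entity and the fragment has a closing </TD> with no opening tag, so a regex over the raw binary sidesteps both problems.

```erlang
-module(td_extract).
-export([td_text/1]).

%% Hypothetical helper: return {ok, Text} for the text between the first
%% <TD> and the </TD> that follows it, or nomatch if there is no such pair.
%% Assumes the cell holds plain text (no nested tags).
td_text(Line) when is_binary(Line) ->
    Pattern = <<"<TD>([^<]*)</TD>">>,
    %% Capture only group 1 and return it as a binary; tag case is ignored.
    case re:run(Line, Pattern, [{capture, [1], binary}, caseless]) of
        {match, [Text]} -> {ok, Text};
        nomatch -> nomatch
    end.
```

With the line from the page, td_extract:td_text(<<"<B>Page Counter</B></TD><TD>4880</TD></TR>">>) returns {ok,<<"4880">>}, and the pattern does not care how many digits the counter grows to. Turning the result into a number is then list_to_integer(binary_to_list(Text)).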
-wes