get error when using xmerl to parse a html string
Marc Worrell
marc@REDACTED
Wed Apr 22 12:28:12 CEST 2020
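HTML is not XML. xmerl is an XML parser, and &nbsp; is an HTML entity that XML does not predefine, hence the unknown_entity_ref error.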
You can use the HTML parser (and sanitizer) in zotonic_stdlib:
https://github.com/zotonic/z_stdlib/tree/master/src
Check z_html.erl and z_html_parse.erl.
The parser is an adapted version of the parser in mochiweb.
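A minimal sketch of what this looks like with the original mochiweb parser, assuming mochiweb is available on the code path (for example as a rebar3 dependency). The module name html_parse_sketch is made up for illustration, and the z_stdlib variant may differ in function names and return shape, so check z_html_parse.erl before relying on it:

```erlang
-module(html_parse_sketch).
-export([parse/1]).

%% Hypothetical wrapper module, for illustration only.
%% Assumes the mochiweb application is on the code path.
%%
%% mochiweb_html:parse/1 takes HTML as a binary (or string) and returns
%% a tree of {Tag, Attributes, Children} tuples with binary text nodes.
%% Being an HTML parser, it knows HTML entities such as &nbsp; and does
%% not fail on them the way an XML scanner does.
parse(Html) when is_binary(Html) ->
    mochiweb_html:parse(Html).
```

Calling html_parse_sketch:parse(<<"<p>a&nbsp;b</p>">>) should then give back a parsed tree instead of exiting with a fatal scan error.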
Cheers,
Marc
> On 22 Apr 2020, at 11:06, L yrosgi <absente@REDACTED> wrote:
>
> I checked the error report and found that it cannot parse text which contains an escape sequence such as "&nbsp;". After googling for a while, I still cannot find any solution using the xmerl lib. Does that mean xmerl cannot do this kind of parsing, or is there another built-in solution to parse HTML?
>
> Thanks for any replies.
>
> Below are the error messages:
>
> =ERROR REPORT==== 22-Apr-2020::12:16:45.003000 ===
> 2868- fatal: {unknown_entity_ref,nbsp}
>
> =ERROR REPORT==== 22-Apr-2020::12:16:45.003000 ===
> 2778- fatal: error_scanning_entity_ref
>
> escript: exception exit: {fatal,
> {error_scanning_entity_ref,
> {file,file_name_unknown},
> {line,126},
> {col,60}}}
> in function xmerl_scan:fatal/2 (xmerl_scan.erl, line 4124)
> in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2579)
> in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
> in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
> in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
> in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
> in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
> in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)