Get error when using xmerl to parse an HTML string

Richard O'Keefe raoknz@REDACTED
Wed Apr 22 12:59:42 CEST 2020


"nbsp" is not predefined in XML; the only predefined entities are lt, gt, amp, apos, and quot.
Other XML parsers will also complain if you try to use an entity that is not
declared in the DTD.
HTML *does* declare a lot of character entities in its DTD.
You need an HTML parser.
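
To illustrate the point outside Erlang, here is a minimal sketch using only Python's standard library. It assumes the failing input looks roughly like the one-line snippet below, since the original file is not shown in the thread. The names SNIPPET, DOCTYPE, parse_as_xml, parse_as_html, and TextCollector are invented for this example. The sketch shows three things: an XML parser rejecting "&nbsp;", the same parser accepting it once the entity is declared in an internal DTD subset, and an HTML parser decoding it without any declaration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical input; the real document from the thread is not available.
SNIPPET = "<p>a&nbsp;b</p>"

# Internal DTD subset that declares nbsp as U+00A0 (no-break space).
DOCTYPE = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]>'


def parse_as_xml(text):
    """Return the root element's text, or the XML parser's error message."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as err:
        return "XML error: {}".format(err)


class TextCollector(HTMLParser):
    """Collects character data; HTML named entities are decoded for us."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(text):
    """Return the concatenated text content as seen by an HTML parser."""
    collector = TextCollector()
    collector.feed(text)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(repr(parse_as_xml(SNIPPET)))            # an "XML error: ..." string
    print(repr(parse_as_xml(DOCTYPE + SNIPPET)))  # text with U+00A0 in the middle
    print(repr(parse_as_html(SNIPPET)))           # text with U+00A0 in the middle
```

Whether xmerl accepts the same internal DTD declaration is not confirmed anywhere in this thread, so treat that part as something to try rather than a known fix. For real-world HTML, which is often not well-formed XML at all, a dedicated HTML parser remains the safer choice.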

On Wed, 22 Apr 2020 at 21:10, L yrosgi <absente@REDACTED> wrote:
>
> I checked the error report and found that xmerl cannot parse text containing an entity reference such as "&nbsp;". After googling for a while, I still cannot find a solution using the xmerl library. Does this mean that xmerl cannot do this kind of parsing, or is there another built-in way to parse HTML?
>
> Thanks for any replies.
>
> Below are the error messages:
>
> =ERROR REPORT==== 22-Apr-2020::12:16:45.003000 ===
> 2868- fatal: {unknown_entity_ref,nbsp}
>
> =ERROR REPORT==== 22-Apr-2020::12:16:45.003000 ===
> 2778- fatal: error_scanning_entity_ref
>
> escript: exception exit: {fatal,
>                     {error_scanning_entity_ref,
>                         {file,file_name_unknown},
>                         {line,126},
>                         {col,60}}}
>   in function  xmerl_scan:fatal/2 (xmerl_scan.erl, line 4124)
>   in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2579)
>   in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
>   in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
>   in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
>   in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
>   in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
>   in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
>

