Error when using xmerl to parse an HTML string

L yrosgi absente@REDACTED
Wed Apr 22 11:06:51 CEST 2020


I checked the error report and found that xmerl cannot parse text that contains an entity reference such as "&nbsp;". After googling for a while, I still cannot find a solution that uses the xmerl library. Does this mean xmerl cannot do this kind of parsing, or is there another built-in way to parse HTML?

Thanks for any replies.

Below are the error messages:

=ERROR REPORT==== 22-Apr-2020::12:16:45.003000 ===
2868- fatal: {unknown_entity_ref,nbsp}

=ERROR REPORT==== 22-Apr-2020::12:16:45.003000 ===
2778- fatal: error_scanning_entity_ref

escript: exception exit: {fatal,
                    {error_scanning_entity_ref,
                        {file,file_name_unknown},
                        {line,126},
                        {col,60}}}
  in function  xmerl_scan:fatal/2 (xmerl_scan.erl, line 4124)
  in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2579)
  in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
  in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
  in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
  in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
  in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2133)
  in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2605)
