[erlang-questions] beginner: Generating HTML with ">" from Erlang

Fri Feb 14 13:55:18 CET 2014

Greetings,

Sorry about the delay.
Have I misunderstood the suggestion that #xmlText{} would preserve a
literal "<" when exporting with xmerl:export_simple/2?

I tried this input:
{script, [], [#xmlText{value=Script}]}

where Script contains something like
if (i < 0) { ...

It ends up in the HTML as
if (i &lt; 0) { ...

bengt

On Fri, 2014-02-14 at 11:42 +0100, Richard Carlsson wrote:
> Thank you. So, to summarize, the interpretation of what's between the 
> <script> and </script> tags depends on whether you parse the text as XML 
> or HTML. Some more googling shows that the following is a common idiom 
> to allow the text to be parsed in both ways:
> 
>      <script type="text/javascript">
>      // <![CDATA[
>      ... if (i < 0) { ...
>      // ]]>
>      </script>
> 
> The // keeps Javascript from seeing the CDATA start/end markers (an HTML
> parser passes them through to the script untouched), while an XML parser
> removes them and passes on everything in between as is.
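A minimal sketch of building such a <script> element by hand, assuming the result is written to the output verbatim. If it were passed back through a serializer that escapes text, its "<" characters would be escaped again, which matches what Bengt reports. The script_cdata module and wrap/1 are hypothetical names, and how to splice the string into xmerl output is not shown.

```erlang
-module(script_cdata).
-export([wrap/1]).

%% Build a <script> element using the "// <![CDATA[ ... // ]]>" idiom.
%% Returns {ok, Html}, or {error, contains_cdata_end} when the script
%% contains "]]>", which would end the CDATA section early for an XML parser.
-spec wrap(string()) -> {ok, string()} | {error, contains_cdata_end}.
wrap(Script) ->
    case string:find(Script, "]]>") of
        nomatch ->
            Html = ["<script type=\"text/javascript\">\n",
                    "// <![CDATA[\n",
                    Script, "\n",
                    "// ]]>\n",
                    "</script>"],
            {ok, lists:flatten(Html)};
        _ ->
            {error, contains_cdata_end}
    end.
```

Rejecting "]]>" is deliberate. The usual XML trick of splitting it as "]]]]><![CDATA[>" would leave those markers inside the JavaScript when the page is read as HTML.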
> 
> I think the #xmlText{} record in xmerl (see xmerl.hrl) can be used here 
> to wrap the contents of the <script> element so it gets written verbatim 
> when you export the XML data.
> 
>      /Richard Carlsson
> 
> On 2014-02-14 03:02 , Richard A. O'Keefe wrote:
> >
> > On 14/02/2014, at 3:31 AM, Richard Carlsson wrote:
> >> Out of curiosity, if it had been < instead, which of the following would work?
> >>
> >>   if (i &lt; 0) {
> >
> > That should work in XHTML but not HTML.
> >>
> >>   if (i < 0) {
> >
> > That should work in HTML but not XHTML.
> >
> > XHTML is an application of XML.  It declares
> > 	<!ELEMENT script (#PCDATA)>
> > and we have
> > 	[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
> >
> > That is, a chunk of character data is any run of characters
> > not containing '<' or '&' or ']]>'.
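The production can be turned into a small predicate. This is a minimal sketch in a hypothetical char_data module. Note that "&lt;" is not CharData by itself. It is a reference, which #PCDATA admits separately.

```erlang
-module(char_data).
-export([is_char_data/1]).

%% True if Text matches XML production [14] CharData: no '<', no '&',
%% and no occurrence of the three-character string "]]>".
-spec is_char_data(string()) -> boolean().
is_char_data([$< | _]) -> false;
is_char_data([$& | _]) -> false;
is_char_data("]]>" ++ _) -> false;
is_char_data([_ | Rest]) -> is_char_data(Rest);
is_char_data([]) -> true.
```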
> >
> > 	The ampersand character (&) and the left angle
> > 	bracket (<) MUST NOT appear in their literal form,
> > 	except when used as markup delimiters, or within
> > 	a comment, a processing instruction, or a CDATA
> > 	section.  If they are needed elsewhere, they
> > 	MUST be escaped using either numeric character
> > 	references or the strings "&amp;" and "&lt;"
> > 	respectively. The right angle bracket (>) may be
> > 	represented using the string "&gt;", and MUST, for
> > 	compatibility, be escaped using either "&gt;" or a
> > 	character reference when it appears in the string
> > 	"]]>" in content, when that string is not marking
> > 	the end of a CDATA section.
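As a sketch of these rules, here is a hypothetical escape/1 that applies only the MUSTs quoted above. It is not xmerl's implementation. Many serializers also escape every ">" for simplicity.

```erlang
-module(xml_escape).
-export([escape/1]).

%% Escape text for XML character data per the quoted rules:
%% '&' -> "&amp;", '<' -> "&lt;", and the '>' of any "]]>" -> "&gt;".
%% A lone '>' is legal in content and is left alone.
-spec escape(string()) -> string().
escape("]]>" ++ Rest) -> "]]&gt;" ++ escape(Rest);
escape([$& | Rest]) -> "&amp;" ++ escape(Rest);
escape([$< | Rest]) -> "&lt;" ++ escape(Rest);
escape([C | Rest]) -> [C | escape(Rest)];
escape([]) -> [].
```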
> >
> > #PCDATA may also contain entity references (&lt;),
> > character references (&#60;), comments, processing
> > instructions, and CDATA sections.
> >
> >>
> >> If it is the first case, there is presumably a very specific rule for this,
> >
> > The legality of "i &lt; 0" in XHTML falls out of general rules
> > and the content model of the <script> element.
> >
> > As far as HTML is concerned, it's not illegal, but HTML
> > will pass the '&lt;' on verbatim to Javascript, which doesn't
> > like it.
> >
> >> If it's the second case, how is the script text really supposed to be handled by XML tools? As CDATA (then, how is it delimited?)
> >
> > XML has <![CDATA[...]]> *marked sections*, but it
> > does NOT have CDATA *content models*.
> >
> >> or as normal XML text (and then how can the < be accepted by the parser,
> >
> > In HTML, a "<" character followed by white space is perfectly legal;
> > in XML, it is not.
> >
> >> and why wasn't &gt; converted to > before the Javascript parser got hold of the text)?
> >
> > Possibly because the web browser got it wrong.
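To make the difference concrete: an XML parser replaces references in #PCDATA with the characters they stand for before the application sees the text. An HTML parser leaves the contents of <script> alone, so "&gt;" reaches the JavaScript engine. The following is a minimal sketch of the XML side in a hypothetical xml_unescape module. It handles the five predefined entities and decimal character references only, and leaves hexadecimal references such as &#x3C; untouched.

```erlang
-module(xml_unescape).
-export([unescape/1]).

%% Replace predefined entity references and decimal character
%% references with the characters they denote, as an XML parser
%% does for #PCDATA.
-spec unescape(string()) -> string().
unescape("&lt;" ++ Rest) -> [$< | unescape(Rest)];
unescape("&gt;" ++ Rest) -> [$> | unescape(Rest)];
unescape("&amp;" ++ Rest) -> [$& | unescape(Rest)];
unescape("&quot;" ++ Rest) -> [$\" | unescape(Rest)];
unescape("&apos;" ++ Rest) -> [$\' | unescape(Rest)];
unescape("&#" ++ Rest) -> char_ref(Rest, []);
unescape([C | Rest]) -> [C | unescape(Rest)];
unescape([]) -> [].

%% Collect decimal digits up to ';' and emit that code point.
char_ref([$; | Rest], Digits) when Digits =/= [] ->
    [list_to_integer(lists:reverse(Digits)) | unescape(Rest)];
char_ref([D | Rest], Digits) when D >= $0, D =< $9 ->
    char_ref(Rest, [D | Digits]);
char_ref(Rest, Digits) ->
    %% Not a well-formed decimal reference: keep the text unchanged.
    "&#" ++ lists:reverse(Digits) ++ unescape(Rest).
```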
> >
> > CDATA and RCDATA content models in SGML were broken by design.
> > Such an element beginning with <foo> should only have been
> > terminated by </foo>, but it is terminated by *any* '</'
> > followed by '>', '(' or a letter.
> > It had already been explained extremely clearly *before* the
> > <SCRIPT> element was added to HTML that the content model
> > should have been (#PCDATA) using <![CDATA[ sections for quoting.
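The termination rule described above can be sketched as a scanner that stops at the first "</" followed by '>', '(' or a letter, regardless of the element name. The split_content/1 helper is hypothetical and implements only this simplified rule. HTML5 parsers have their own script-data rules, which look for an end tag named script instead.

```erlang
-module(sgml_cdata).
-export([split_content/1]).

%% Split raw element text where an SGML-style parser would end CDATA
%% content: at the first "</" followed by '>', '(' or a letter,
%% whatever name follows. Returns {Content, Rest}.
-spec split_content(string()) -> {string(), string()}.
split_content(Text) -> split_content(Text, []).

split_content([$<, $/, C | _] = Rest, Acc)
  when C =:= $>; C =:= $(;
       (C >= $a andalso C =< $z);
       (C >= $A andalso C =< $Z) ->
    {lists:reverse(Acc), Rest};
split_content([C | Rest], Acc) ->
    split_content(Rest, [C | Acc]);
split_content([], Acc) ->
    {lists:reverse(Acc), []}.
```

For example, a script containing document.write('</b>') is cut off at "</b>", not at the intended "</script>".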