[erlang-questions] beginner: Generating HTML with ">" from Erlang

Fri Feb 14 03:02:37 CET 2014

On 14/02/2014, at 3:31 AM, Richard Carlsson wrote:
> Out of curiosity, if it had been < instead, which of the following would work?
> 
>  if (i &lt; 0) {

That should work in XHTML but not HTML.
> 
>  if (i < 0) {

That should work in HTML but not XHTML.

XHTML is an application of XML.  It declares
	<!ELEMENT script (#PCDATA)>
and we have
	[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

That is, a chunk of character data is any run of characters
not containing '<' or '&' or ']]>'.
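
As a rough illustration of production [14] (a Python sketch, not part
of the XML specification; the name is_char_data is made up here), the
rule amounts to a simple predicate:

```python
def is_char_data(text: str) -> bool:
    """Return True if text matches XML production [14] CharData.

    CharData is any run of characters with no '<', no '&',
    and no occurrence of the sequence ']]>'.
    """
    return "<" not in text and "&" not in text and "]]>" not in text


print(is_char_data("if (i > 0) {"))  # a lone '>' is allowed
print(is_char_data("if (i < 0) {"))  # a literal '<' is not
print(is_char_data("a[b[0]]> 1"))    # contains the sequence ']]>'
```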

	The ampersand character (&) and the left angle
	bracket (<) MUST NOT appear in their literal form,
	except when used as markup delimiters, or within
	a comment, a processing instruction, or a CDATA
	section.  If they are needed elsewhere, they
	MUST be escaped using either numeric character
	references or the strings "&amp;" and "&lt;"
	respectively. The right angle bracket (>) may be
	represented using the string "&gt;", and MUST, for
	compatibility, be escaped using either "&gt;" or a
	character reference when it appears in the string
	"]]>" in content, when that string is not marking
	the end of a CDATA section.
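
To make the escaping rule concrete, here is a small Python sketch
using the standard library's xml.sax.saxutils.escape, which replaces
'&', '<' and '>'.  The sample script text is invented for the
example; it is not from the original question.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical script text containing '<', '&' and the sequence ']]>'.
script_text = "if (i < 0 && a[b[0]]> 1) {"

# escape() turns '&' into &amp;, '<' into &lt; and '>' into &gt;
escaped = escape(script_text)
print(escaped)

# The escaped text is legal element content, and an XML parser
# hands back the original characters.
element = ET.fromstring("<script>" + escaped + "</script>")
print(element.text == script_text)
```

Escaping every '>' is more than the rule demands (only the one in
"]]>" has to be escaped), but it is the simplest way to be safe.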

#PCDATA may also contain entity references (&lt;),
character references (&#60;), comments,

> 
> If it is the first case, there is presumably a very specific rule for this, 

The legality of "i &lt; 0" in XHTML falls out of general rules
and the content model of the <script> element.

As far as HTML is concerned, it's not illegal, but HTML
will pass the '&lt;' on verbatim to Javascript, which doesn't
like it.
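
One way to watch this happen is with Python's html.parser, which
treats <script> content as raw text.  This is only a sketch: the
ScriptCollector class and script_text function are names invented
for the example, and a real browser is of course not Python.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the character data seen inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Data may arrive in several pieces, so collect and join later.
        if self.in_script:
            self.chunks.append(data)


def script_text(html_source: str) -> str:
    """Return the text an HTML parser would hand to the script engine."""
    parser = ScriptCollector()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.chunks)


# The entity reference is not decoded inside <script>.
print(script_text("<script>if (i &lt; 0) { }</script>"))
```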

> If it's the second case, how is the script text really supposed to be handled by XML tools? As CDATA (then, how is it delimited?)

XML has <![CDATA[...]]> *marked sections*, but it
does NOT have CDATA *content models*.
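
A sketch of quoting script text with a marked section, in Python.
The helper name cdata_section and the sample text are made up for
the example.  The one subtlety is that the text itself must not
contain "]]>", so any occurrence is split across two adjacent
sections.

```python
import xml.etree.ElementTree as ET


def cdata_section(text: str) -> str:
    """Wrap text in CDATA marked section(s).

    A CDATA section ends at the first ']]>', so any ']]>' in the
    text is split across two adjacent sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical script text containing '<', '&' and ']]>'.
script_text = "if (i < 0 && a[b[0]]> 1) {"
document = "<script>" + cdata_section(script_text) + "</script>"
print(document)

# An XML parser returns the original characters unchanged.
print(ET.fromstring(document).text == script_text)
```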

> or as normal XML text (and then how can the < be accepted by the parser,

In HTML, a "<" character followed by white space is perfectly legal;
in XML, it is not.
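
The XML half of that claim is easy to check with any conforming
parser.  A minimal Python sketch (is_well_formed is a name invented
here, and it only shows the XML side):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_source: str) -> bool:
    """Return True if an XML parser accepts the source."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


# A literal '<' followed by white space is rejected ...
print(is_well_formed("<script>if (i < 0) { }</script>"))
# ... while the escaped form is accepted.
print(is_well_formed("<script>if (i &lt; 0) { }</script>"))
```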

> and why wasn't &gt; converted to > before the Javascript parser got hold of the text)?

Possibly because the web browser got it wrong.

CDATA and RCDATA content models in SGML were broken by design.
Such an element beginning with <foo> should only have been
terminated by </foo>, but they're terminated by *any* '</'
followed by '>', '(', or a letter.
It had already been explained extremely clearly *before* the
<SCRIPT> element was added to HTML that the content model
should have been (#PCDATA) using <![CDATA[ sections for quoting.
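
To see why that termination rule hurts, here is a Python sketch that
takes the rule exactly as described above ('</' followed by '>', '('
or a letter) and applies it to some script text.  The regular
expression and the function name sgml_cdata_content are assumptions
made for the example, not text from the SGML standard.

```python
import re

# Assumed from the description above: '</' followed by '>', '(' or a letter.
END_TAG_OPEN = re.compile(r"</[A-Za-z>(]")


def sgml_cdata_content(text_after_start_tag: str) -> str:
    """Return the content of a CDATA element under the rule above.

    The content stops at the first '</' followed by '>', '(' or a
    letter, whether or not that is the matching end tag.
    """
    match = END_TAG_OPEN.search(text_after_start_tag)
    if match is None:
        return text_after_start_tag
    return text_after_start_tag[:match.start()]


# The '</b>' inside the string literal ends the element early.
source = 'document.write("<b>bold</b>");</script>'
print(sgml_cdata_content(source))
```

The script is cut off in the middle of a string literal, long before
the </script> the author intended as the end.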