[erlang-questions] beginner: Generating HTML with ">" from Erlang

Richard A. O'Keefe ok@REDACTED
Fri Feb 14 02:45:09 CET 2014


It's worth noting that the restriction on ]]> in text
makes no sense whatsoever in XML; it exists solely so
that XML documents can be parsed by SGML parsers (at
least ones that conform to TC2 + TC3).  And the
restriction on ]]> in text in SGML made sense only if
you assumed that an SGML parser writer would be dumb
enough to use the *same* tokeniser in too many places.


W3C document, differences between XHTML and HTML:

4.8. Script and Style elements

In XHTML, the script and style elements are declared as having #PCDATA content. As a result, < and & will be treated as the start of markup, and entities such as &lt; and &amp; will be recognized as entity references by the XML processor to < and & respectively. Wrapping the content of the script or style element within a CDATA marked section avoids the expansion of these entities.

<script type="text/javascript">
<![CDATA[
... unescaped script content ...
]]>
</script>

CDATA sections are recognized by the XML processor and appear as nodes in the Document Object Model, see Section 1.3 of the DOM Level 1 Recommendation [DOM].

An alternative is to use external script and style documents.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
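
For anyone who does want to emit the CDATA form from Erlang, here is a
minimal sketch. The module name cdata and the function wrap/1 are
made up for illustration, and it assumes Text is a flat string and
OTP 20 or later (for string:replace/4). Since ]]> cannot appear inside
a CDATA section, each occurrence is split across two adjacent sections.

```erlang
-module(cdata).
-export([wrap/1]).

%% Hypothetical helper: wrap a flat string in a CDATA section.
%% Every "]]>" in the input is split so that "]]" ends one section
%% and ">" starts the next; an XML parser then sees the original text.
wrap(Text) ->
    Safe = string:replace(Text, "]]>", "]]]]><![CDATA[>", all),
    lists:flatten(["<![CDATA[", Safe, "]]>"]).
```

With this, cdata:wrap("x]]>y") gives "<![CDATA[x]]]]><![CDATA[>y]]>",
which an XML processor reads back as x]]>y.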

Erik Naggum waxed vehement against the use of CDATA content
models, and boy was he right.  Joe English pointed out that

	<script>
	  document.write("<h1>", "Foo", "</h1>")
	</script>

is not legal in HTML, because it contains the '</'-followed-by-
a-letter sequence, which will terminate the <script> element
at that point, so you get
	<script>
	  document.write("<h1>", "Foo", "</script>
	</h1>")
	</script>
so you have to write
	  document.write("<h1>", "Foo", "<"+"/h1>")
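
If the JavaScript string is itself generated from Erlang, the same
effect can be had by escaping the slash: "<\/h1>" denotes the same
JavaScript string as "</h1>" but never shows '</' to the HTML
tokeniser. A rough sketch follows; the module js_literal and the
function quote/1 are hypothetical names, the input is assumed to be a
flat string, and this is not a complete JavaScript escaper (it handles
only backslash, double quote, CR and LF besides '</').

```erlang
-module(js_literal).
-export([quote/1]).

%% Hypothetical helper: turn a flat Erlang string into a double-quoted
%% JavaScript string literal in which "</" never appears literally.
quote(Text) ->
    lists:flatten([$", escape(Text), $"]).

escape([$<, $/ | Rest]) -> ["<\\/" | escape(Rest)];   % "</" becomes "<\/"
escape([$\\ | Rest])    -> ["\\\\" | escape(Rest)];   % backslash
escape([$" | Rest])     -> ["\\\"" | escape(Rest)];   % double quote
escape([$\n | Rest])    -> ["\\n" | escape(Rest)];    % line feed
escape([$\r | Rest])    -> ["\\r" | escape(Rest)];    % carriage return
escape([C | Rest])      -> [C | escape(Rest)];
escape([])              -> [].
```

So io:format("~s~n", [js_literal:quote("</h1>")]) prints "<\/h1>",
quotes included.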

Did I mention that the rules are subtly and importantly
different in HTML 4 and HTML 5, to the point where a document
can conform to HTML 5 but not HTML 4, not for structural but
for lexical reasons?

This whole area is a MESS and it is precisely the kind of
mess that XML was supposed to get us out of.

If people are going to generate HTML5, we need something
where you can write {raw_text_element,Name,Attributes,Text}
and have it *checked* that the "poison" sequences do not
occur.
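
To make that concrete, here is a sketch of what such a check might look
like. Everything in it is an assumption for illustration: the module
name raw_text, the function render/1, the result shapes, and the choice
to take Name as an atom, Attributes as {atom(), string()} pairs and
Text as a flat string. It follows the HTML5 rule only, and
conservatively: it rejects any case-insensitive "</" followed by the
element name, and for script also any "<!--", which is stricter than
HTML5 actually requires. It needs OTP 20 or later for
string:lowercase/1 and string:find/2.

```erlang
-module(raw_text).
-export([render/1]).

%% Hypothetical renderer for {raw_text_element,Name,Attributes,Text}.
%% Returns {ok, Html} or {error, {poison_sequence, Seq}}.
render({raw_text_element, Name, Attributes, Text})
  when Name =:= script; Name =:= style ->
    Tag = atom_to_list(Name),
    case find_poison(Tag, Text) of
        none ->
            {ok, lists:flatten(["<", Tag, [attr(A) || A <- Attributes], ">",
                                Text, "</", Tag, ">"])};
        {poison, Seq} ->
            {error, {poison_sequence, Seq}}
    end.

%% Conservative poison lists (an assumption, stricter than HTML5).
poison_sequences("script") -> ["</script", "<!--"];
poison_sequences(Tag)      -> ["</" ++ Tag].

%% Search case-insensitively by lowercasing the text first.
find_poison(Tag, Text) ->
    Lower = string:lowercase(Text),
    case [P || P <- poison_sequences(Tag),
               string:find(Lower, P) =/= nomatch] of
        []          -> none;
        [First | _] -> {poison, First}
    end.

%% Attribute values are ordinary text, so they are escaped as usual.
attr({Key, Value}) ->
    [" ", atom_to_list(Key), "=\"", escape_attr(Value), "\""].

escape_attr(Value) ->
    lists:flatmap(fun($&) -> "&amp;";
                     ($") -> "&quot;";
                     ($<) -> "&lt;";
                     (C)  -> [C]
                  end, Value).
```

Under these assumptions,
raw_text:render({raw_text_element, style, [], "p { color: red }"})
returns {ok,"<style>p { color: red }</style>"}, while a script whose
text contains </SCRIPT> comes back as
{error,{poison_sequence,"</script"}} instead of silently producing a
broken page.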



