[erlang-questions] beginner: Generating HTML with ">" from Erlang

Richard Carlsson carlsson.richard@REDACTED
Fri Feb 14 11:42:47 CET 2014


Thank you. So, to summarize, the interpretation of what's between the 
<script> and </script> tags depends on whether you parse the text as XML 
or HTML. Some more googling shows that the following is a common idiom 
to allow the text to be parsed in both ways:

     <script type="text/javascript">
     // <![CDATA[
     ... if (i < 0) { ...
     // ]]>
     </script>

The // keeps Javascript from seeing the CDATA start/end markers, HTML 
ignores them, and XML removes them and passes on everything in between 
as it is.
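
To see the idiom at work, here is a minimal sketch in Python rather than Erlang, since its standard library has both kinds of parser. It assumes that xml.etree.ElementTree stands in for an XML toolchain and html.parser for a browser's HTML tokenizer; the ScriptText class, the two helper functions and the sample script body are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample document using the idiom described above.
SCRIPT = """<script type="text/javascript">
// <![CDATA[
if (i < 0) { i = 0; }
// ]]>
</script>"""


class ScriptText(HTMLParser):
    """Collects the character data found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_text_as_xml(doc):
    # The XML parser drops the CDATA delimiters and keeps what is between them.
    return ET.fromstring(doc).text


def script_text_as_html(doc):
    # The HTML parser hands the raw script text through, markers included.
    parser = ScriptText()
    parser.feed(doc)
    parser.close()
    return "".join(parser.chunks)


as_xml = script_text_as_xml(SCRIPT)
as_html = script_text_as_html(SCRIPT)
print(repr(as_xml))
print(repr(as_html))
```

Either way the Javascript engine receives the same statement. Read as XML, the two marker lines are reduced to empty // comments; read as HTML, they are comments that still contain the marker text.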

I think the #xmlText{} record in xmerl (see xmerl.hrl) can be used here 
to wrap the contents of the <script> element so it gets written verbatim 
when you export the XML data.

     /Richard Carlsson

On 2014-02-14 03:02 , Richard A. O'Keefe wrote:
>
> On 14/02/2014, at 3:31 AM, Richard Carlsson wrote:
>> Out of curiosity, if it had been < instead, which of the following would work?
>>
>>   if (i &lt; 0) {
>
> That should work in XHTML but not HTML.
>>
>>   if (i < 0) {
>
> That should work in HTML but not XHTML.
>
> XHTML is an application of XML.  It declares
> 	<!ELEMENT script (#PCDATA)>
> and we have
> 	[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
>
> That is, a chunk of character data is any run of characters
> not containing '<' or '&' or ']]>'.
>
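
The CharData rule quoted above can be written down directly as a check. This is only a sketch in Python; the name is_char_data is made up, and it tests a string against production [14] alone, not against the rest of the XML grammar.

```python
import re

# Production [14] without the exclusion: any run free of '<' and '&'.
_NO_MARKUP = re.compile(r"[^<&]*")


def is_char_data(s):
    """True if s matches CharData: no '<', no '&', and no ']]>'."""
    return _NO_MARKUP.fullmatch(s) is not None and "]]>" not in s


for text in ["if (i > 0) {", "if (i < 0) {", "a && b", "a[b[0]]>c"]:
    print(repr(text), is_char_data(text))
```

The last sample shows why ']]>' needs its own rule: it is ordinary Javascript, yet it may not appear unescaped in XML character data.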
> 	The ampersand character (&) and the left angle
> 	bracket (<) MUST NOT appear in their literal form,
> 	except when used as markup delimiters, or within
> 	a comment, a processing instruction, or a CDATA
> 	section.  If they are needed elsewhere, they
> 	MUST be escaped using either numeric character
> 	references or the strings "&amp;" and "&lt;"
> 	respectively. The right angle bracket (>) may be
> 	represented using the string "&gt;", and MUST, for
> 	compatibility, be escaped using either "&gt;" or a
> 	character reference when it appears in the string
> 	"]]>" in content, when that string is not marking
> 	the end of a CDATA section.
>
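
As a rough illustration of the escaping rule just quoted, the following Python sketch escapes a script body with xml.sax.saxutils.escape, which replaces '&', '<' and '>' by default, and checks that an XML parser gives the original text back. The Javascript fragment is an invented example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Invented script body containing '<', '&' and the sequence ']]>'.
js = "if (i < 0 && a[b[0]]>c) { i = 0; }"

# Escaping every '>' also takes care of the ']]>' case.
escaped = escape(js)
recovered = ET.fromstring("<script>" + escaped + "</script>").text

# The unescaped text is not well-formed XML and is rejected by the parser.
try:
    ET.fromstring("<script>" + js + "</script>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(escaped)
print(recovered == js, raw_accepted)
```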
> #PCDATA may also contain entity references (&lt;),
> character references (&#60;), comments,
>
>>
>> If it is the first case, there is presumably a very specific rule for this,
>
> The legality of "i &lt; 0" in XHTML falls out of general rules
> and the content model of the <script> element.
>
> As far as HTML is concerned, it's not illegal, but HTML
> will pass the '&lt;' on verbatim to Javascript, which doesn't
> like it.
>
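
The verbatim pass-through on the HTML side can be observed with Python's html.parser, assuming it is an acceptable stand-in for a browser's tokenizer. In this sketch (the TextByTag class and text_by_tag function are made up) the same escaped text is decoded inside <p> but left untouched inside <script>.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Accumulates character data per enclosing element name."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            tag = self.stack[-1]
            self.text[tag] = self.text.get(tag, "") + data


def text_by_tag(doc):
    parser = TextByTag()
    parser.feed(doc)
    parser.close()
    return parser.text


result = text_by_tag("<p>if (i &lt; 0) {</p><script>if (i &lt; 0) {</script>")
print(result)
```

The text collected for <script> still contains the entity reference, which is exactly what a Javascript engine would then be asked to parse.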
>> If it's the second case, how is the script text really supposed to be handled by XML tools? As CDATA (then, how is it delimited?)
>
> XML has <![CDATA[...]]> *marked sections*, but it
> does NOT have CDATA *content models*.
>
>> or as normal XML text (and then how can the < be accepted by the parser,
>
> In HTML, a "<" character followed by white space is perfectly legal;
> in XML, it is not.
>
>> and why wasn't &gt; converted to > before the Javascript parser got hold of the text)?
>
> Possibly because the web browser got it wrong.
>
> CDATA and RCDATA content models in SGML were broken by design.
> Such an element beginning with <foo> should only have been
> terminated by </foo>, but they're terminated by *any* '</'
> followed by any of '>', '(' or a letter.
> It had already been explained extremely clearly *before* the
> <SCRIPT> element was added to HTML that the content model
> should have been (#PCDATA) using <![CDATA[ sections for quoting.
>
>



