[erlang-questions] beginner: Generating HTML with ">" from Erlang
Richard Carlsson
carlsson.richard@REDACTED
Fri Feb 14 11:42:47 CET 2014
Thank you. So, to summarize, the interpretation of what's between the
<script> and </script> tags depends on whether you parse the text as XML
or HTML. Some more googling shows that the following is a common idiom
to allow the text to be parsed in both ways:
<script type="text/javascript">
// <![CDATA[
... if (i < 0) { ...
// ]]>
</script>
The // keeps Javascript from seeing the CDATA start/end markers, HTML
ignores them, and XML removes them and passes on everything in between
as it is.
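As a rough illustration of the two readings, here is a minimal Python sketch (my own, not from the thread) that feeds the same fragment to an XML parser and to an HTML parser from the standard library. The sample fragment and the name ScriptCollector are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample fragment using the idiom quoted above.
doc = """<script type="text/javascript">
// <![CDATA[
if (i < 0) { i = 0; }
// ]]>
</script>"""

# XML view: the CDATA markers are consumed by the parser and the enclosed
# text, including the literal '<', arrives unchanged.
xml_text = ET.fromstring(doc).text


class ScriptCollector(HTMLParser):
    """Collects the character data reported by the HTML parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# HTML view: script content is raw text, so the markers are passed through
# and only the leading '//' hides them from the JavaScript engine.
collector = ScriptCollector()
collector.feed(doc)
collector.close()
html_text = "".join(collector.chunks)

print(repr(xml_text))
print(repr(html_text))
```

In the XML result the CDATA markers are gone and only the two empty // comments remain around the code; in the HTML result the markers are still present in the script text.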
I think the #xmlText{} record in xmerl (see xmerl.hrl) can be used here
to wrap the contents of the <script> element so it gets written verbatim
when you export the XML data.
/Richard Carlsson
On 2014-02-14 03:02 , Richard A. O'Keefe wrote:
>
> On 14/02/2014, at 3:31 AM, Richard Carlsson wrote:
>> Out of curiosity, if it had been < instead, which of the following would work?
>>
>> if (i &lt; 0) {
>
> That should work in XHTML but not HTML.
>>
>> if (i < 0) {
>
> That should work in HTML but not XHTML.
>
> XHTML is an application of XML. It declares
> <!ELEMENT script (#PCDATA)>
> and we have
> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
>
> That is, a chunk of character data is any run of characters
> not containing '<' or '&' or ']]>'.
>
> The ampersand character (&) and the left angle
> bracket (<) MUST NOT appear in their literal form,
> except when used as markup delimiters, or within
> a comment, a processing instruction, or a CDATA
> section. If they are needed elsewhere, they
> MUST be escaped using either numeric character
> references or the strings "&amp;" and "&lt;"
> respectively. The right angle bracket (>) may be
> represented using the string "&gt;", and MUST, for
> compatibility, be escaped using either "&gt;" or a
> character reference when it appears in the string
> "]]>" in content, when that string is not marking
> the end of a CDATA section.
>
> #PCDATA may also contain entity references (&lt;),
> character references (&#60;), comments,
>
>>
>> If it is the first case, there is presumably a very specific rule for this,
>
> The legality of "i &lt; 0" in XHTML falls out of general rules
> and the content model of the <script> element.
>
> As far as HTML is concerned, it's not illegal, but HTML
> will pass the '&lt;' on verbatim to Javascript, which doesn't
> like it.
>
>> If it's the second case, how is the script text really supposed to be handled by XML tools? As CDATA (then, how is it delimited?)
>
> XML has <![CDATA[...]]> *marked sections*, but it
> does NOT have CDATA *content models*.
>
>> or as normal XML text (and then how can the < be accepted by the parser,
>
> In HTML, a "<" character followed by white space is perfectly legal;
> in XML, it is not.
>
>> and why wasn't &gt; converted to > before the Javascript parser got hold of the text)?
>
> Possibly because the web browser got it wrong.
>
> CDATA and RCDATA content models in SGML were broken by design.
> Such an element beginning with <foo> should only have been
> terminated by </foo>, but they're terminated by *any* '</'
> followed by any of > ( or letter.
> It had already been explained extremely clearly *before* the
> <SCRIPT> element was added to HTML that the content model
> should have been (#PCDATA) using <![CDATA[ sections for quoting.
>
>
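The escaping rules quoted above can be checked with a small Python sketch (again my own illustration, not part of the thread). The sample JavaScript line and the helper name parse_script_body are assumptions made for the example; escape comes from the standard xml.sax.saxutils module.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical script body containing '<', '&' and '>'.
js = "if (i < 0 && j > 1) { i = 0; }"


def parse_script_body(body):
    """Return the text an XML parser sees, or None if it is not well-formed."""
    try:
        return ET.fromstring("<script>" + body + "</script>").text
    except ET.ParseError:
        return None


# Literal '<' and '&' are not allowed in character data, so this fails.
raw_result = parse_script_body(js)

# Escaped form: '&' -> '&amp;', '<' -> '&lt;', '>' -> '&gt;'.
escaped = escape(js)

# The parser expands the references again, giving back the original text.
escaped_result = parse_script_body(escaped)

print(raw_result)
print(escaped)
print(escaped_result)
```

This only shows the XML side; an HTML parser would hand the escaped form to the JavaScript engine unchanged, which is the mismatch discussed in the quoted reply.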