Announcing Dryverl: an Erlang-to-C binding compiler

Mon May 29 07:30:14 CEST 2006

Hi,

Richard A. O'Keefe wrote:
> Romain Lenglet <rlenglet@REDACTED> wrote:
> 	XSLT is a nice standard language for writing transformations.
>
> It is a language.  True.
> It is a language for writing transformations.
> It is not a de jure standard, but it's true that it is a de
> facto one. "Nice", however, doesn't really fit in that
> sentence. Not without some emphatic negatives attached rather
> closely.
>
> I described XML as absurdly bulky and not humanly readable.
> The same could be said of XSLT, if one wished to demonstrate
> the art of underemphasis.

I already agreed about the verbosity of XML.

> XSLT is an unholy mix of at least three different syntaxes,
> not just different in detail, but working on different
> principles.  It includes two different subsets of XPath, with
> not-quite-compatible semantics.
>
> I have half a dozen XSLT implementations on my machine.
> But I found that writing XML transformations in Scheme was
> SEVEN TIMES SHORTER.  Heck, even writing transformations in
> *C* was about five times shorter.  (Mind you, that's with my
> own dvm2 library, but still...) About the only thing that
> could make me write XSLT would be very large sums of money, or
> possibly a gun to my head.

I also know how to write nice transformation systems.  However, my
objective with Dryverl was not to demonstrate my skills.  My objective
was to develop something that anybody can use and modify, even if I
ever orphan this project.  People will never be locked into a dead
transformation system that is not, or cannot be, maintained.  That is
exactly what happened with EDTK: it relies on the GSLgen library, which
is no longer maintained, and this makes the maintenance of *EDTK*
difficult.  If I used your dvm2 library, how could I make sure that you
would maintain it for as long as the Dryverl project exists?

As you write yourself, there are many implementations of XSLT
1.0. That was one main reason why I chose it.

I agree that XSLT 1.0 has problems.  The biggest problem is the lack
of typing.  If you specify a wrong XPath expression in the match
attribute of a template, the template will simply never be evaluated,
silently.  This makes testing difficult.  There are other, typed
transformation systems for XML around:

- CDuce. CDuce transformations are strongly typed according to an XML
  Schema. However, CDuce no longer seems to be maintained, and seems
  appropriate only for fully structured XML documents, not documents
  with mostly mixed content (CDATA + elements mixed) such as Dryverl.

- erlsom. Not yet mature, and does not yet implement all the XML
  Schema features that are used in Dryverl.

- XSLT 2.0 specifies conformance rules for schema-aware processors,
  which solves the lack of typing of XSLT 1.0, and makes XML Schema's
  data model the only one used.  However, XSLT 2.0 is not yet as
  widespread as XSLT 1.0, although Free Software schema-aware
  processors exist (e.g. Saxon).

Therefore, I still claim that XSLT 1.0 is a nice *widespread* and
usable transformation language, for which several transformation
systems exist and are actively maintained.  People can use, read and
modify the Dryverl stylesheets (although they are verbose), *now* and at
any time in the future.
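
To make the typing problem above concrete, here is a minimal,
hypothetical stylesheet (not taken from the Dryverl sources; the
namespace URI is a placeholder and the output text is invented).  The
match pattern of the second template contains a typo, but it is still
a syntactically valid pattern, so an XSLT 1.0 processor reports
nothing and the template is simply never applied:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dryverl="urn:example:dryverl">
  <!-- The namespace URI above is a placeholder, not the real one. -->

  <xsl:output method="text"/>

  <!-- Correct pattern: applied to def-value-map elements. -->
  <xsl:template match="dryverl:def-value-map">
    <xsl:text>value map: </xsl:text>
    <xsl:value-of select="@name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Typo in the pattern ("dev" instead of "def").  The pattern is
       valid, so no error is raised; it just matches nothing. -->
  <xsl:template match="dryverl:dev-value-map">
    <xsl:text>never emitted&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The only symptom is missing output, which is why such mistakes have to
be caught by testing rather than by the processor.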

[...]
> When I said that Erlang had better ways than using XML, I
> wasn't confining myself to pure Erlang data structures.  The
> design of little languages is an art that should not be lost
> just because XML exists.  The great merit of little languages
> is that they can help us three major ways:
>
>     (1) They can make it much easier to write what we do want.
>         (1a) This requires them to be easy to WRITE, so that
>              there is very little there that is not necessary.
>         (1b) It also requires them to be easy to READ, so that
>              it is easy for people to see whether we have
>              written what we meant to write.
>     (2) They can make it much harder to write what we do NOT
>         want.
>         (2a) They can make large classes of mistakes simply
>              inexpressible.
>         (2b) They can have specially tailored polymorphic type
>              systems (possibly including effects or state in
>              the type system) which make it easy to check for
>              semantic errors.
>     (3) They can decouple semantics from processing.
>
> As far as I can see, doing this kind of stuff in XML buys you
> (3). *POSSIBLY*, if it were all done using markup in such a
> way that a Schema could check a lot of it, it could buy you
> some of (2a).

That is true for Dryverl.  Cf. the typing example below.

> But XML Schemas don't buy you any of (2b),

That is plainly WRONG. This is one big feature of XML Schema: it
provides typing of syntactic elements.  And it supports relational
constraints.

For instance, here is an extract of the Dryverl XML Schema:

<xsd:key name="keyValueMapNames">
  <xsd:selector xpath="./dryverl:def-value-map"/>
  <xsd:field xpath="@name"/>
</xsd:key>

<xsd:keyref name="keyrefValueMapNames" refer="dryverl:keyValueMapNames">
  <xsd:selector
    xpath=".//dryverl:value-map-lock|.//dryverl:value-map-unlock|.//dryverl:value-map-find-entry"/>
  <xsd:field xpath="@name"/>
</xsd:keyref>

This specifies that:

1- Value map names must be unique, i.e. the name="..." attributes in
<def-value-map> elements must be unique.

2- References to value maps must use names that are specified in the
name="..." attributes of <def-value-map> elements.  For instance, the
name="..." attribute in a <value-map-lock> element must be a name in a
<def-value-map> element.

*That* "makes it easy to check for semantic errors".
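
As a hypothetical illustration (the map names are invented, and the
enclosing element and all other content are omitted), a validating
parser that enforces these two constraints would treat the following
fragment like this:

```xml
<!-- Hypothetical fragment, namespace declarations omitted. -->
<def-value-map name="contexts">...</def-value-map>

<!-- Rejected by the key: "contexts" is already defined. -->
<def-value-map name="contexts">...</def-value-map>

<!-- Accepted by the keyref: "contexts" is a defined value map. -->
<value-map-lock name="contexts">...</value-map-lock>

<!-- Rejected by the keyref: no value map is named "credentials". -->
<value-map-lock name="credentials">...</value-map-lock>
```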

> and
> not only does XML not help with (1a) or (1b), it actively
> sabotages any gains we might have had there.

Wrong. The use of XML Schema provides (2b) and (3), and (2a) is true
for the Dryverl language.

What is left of your criticism is that the use of XML makes the
Dryverl language verbose.  And yes, I agree, that's true.

> 	This fragment of C-XML dialect code is a call to a C GSSAPI
> 	function. There is as much C code text as XML elements. This
> 	fits XML and XSLT very well.
>
> Well, no, because XSLT isn't particularly good at transforming
> text.

Yes, but XSLT is good at transforming mixed content (text + markup
mixed), using recursive templates and <apply-templates> elements.
*This* was my point.
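
A minimal sketch of that style, assuming a placeholder namespace URI
and invented output (this is not the actual Dryverl stylesheet): the
template for an embedded element emits some text, then recurses into
the element's mixed content with <xsl:apply-templates/>.  The
surrounding C text needs no template at all, because the built-in
rule for text nodes copies it to the output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dryverl="urn:example:dryverl">
  <!-- The namespace URI above is a placeholder, not the real one. -->

  <xsl:output method="text"/>

  <!-- Element embedded in C text: wrap whatever it contains.
       The emitted C comments are invented for illustration. -->
  <xsl:template match="dryverl:value-map-find-entry">
    <xsl:text>/* begin find-entry */</xsl:text>
    <!-- Recurse: nested elements get their own templates, and
         nested C text is copied by the built-in text rule. -->
    <xsl:apply-templates/>
    <xsl:text>/* end find-entry */</xsl:text>
  </xsl:template>

  <!-- A nested element handled by the same recursive scheme. -->
  <xsl:template match="dryverl:value-map-entry-id-ref">
    <xsl:text>/* id: */</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```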

> A good way to support little languages in Erlang is through
> Leex and Yecc.

Perhaps, but they are not appropriate *to the Dryverl language*, for
two reasons:

1- Dryverl is in fact composed of *six* "sub-"languages:

- One structured, declarative language for specifying value maps, etc.
  Yes, this simple language could be specified using Leex and Yecc.

- *Two* distinct "Erlang + markup" dialects.  Using Yecc, you would
  have to specify the complete Erlang grammar + extensions *twice*.

- *Three* distinct "C + markup" dialects.  Using Yecc, you would have
  to specify the complete C grammar (!) + extensions *three times*.

Therefore, Dryverl is no longer a "little language" once you use tools
such as Leex and Yecc.  Those tools are not appropriate for the
specification of the Dryverl language.

2- Yecc is limited to the specification of context-free grammars.  The
grammar of Dryverl is not context-free, because the specified
restrictions cannot be expressed using a context-free grammar, but can
be expressed using XML Schema's typing.

I am not aware of other specification languages that provide typing
like XML Schema and ASN.1.  This is a strong feature.  And it is not
provided by Yecc.

For instance, consider this simplified extract from the Dryverl
XML Schema:

<xsd:complexType name="DecodeInputCCode" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="decode-input-long-into"
                 type="dryverl:DecodeInputCCode" />
    <xsd:element name="value-map-find-entry">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value-map-entry-id-ref"
                       type="dryverl:DecodeInputCCode" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:choice>
</xsd:complexType>

<xsd:complexType name="EncodeOutputCCode" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="encode-output-long"
                 type="dryverl:EncodeOutputCCode" />
    <xsd:element name="value-map-find-entry">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value-map-entry-id-ref"
                       type="dryverl:EncodeOutputCCode" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:choice>
</xsd:complexType>

These are extracts from two of the C + XML dialects that are defined
in the Dryverl language.  Some elements have the same name in the two
dialects: note that <value-map-find-entry> and
<value-map-entry-id-ref> are each defined twice, in different
contexts, and with different types that define *different constraints
on their contents*.

These constraints specify that it is allowed to write:

<decode-input>
  <value-map-find-entry>
    <value-map-entry-id-ref>
      <!-- decoding statements are OK -->
      <decode-input-long-into>...</decode-input-long-into>
    </value-map-entry-id-ref>
  </value-map-find-entry>
</decode-input>

and it is allowed to write:

<encode-output>
  <value-map-find-entry>
    <value-map-entry-id-ref>
      <!-- encoding statements are OK -->
      <encode-output-long>...</encode-output-long>
    </value-map-entry-id-ref>
  </value-map-find-entry>
</encode-output>

but it is forbidden to write:

<decode-input>
  <value-map-find-entry>
    <value-map-entry-id-ref>
      <!-- encoding statements are forbidden in decode-input -->
      <encode-output-long>...</encode-output-long>
    </value-map-entry-id-ref>
  </value-map-find-entry>
</decode-input>

This cannot be specified in a context-free grammar such as Yecc's.

The two solutions that approach this, using context-free grammars,
are:

1- to lose semantics, by specifying one single non-terminal for C
code blocks and for <value-map-find-entry> and
<value-map-entry-id-ref>, which would be equivalent to specifying this
looser XML Schema:

<xsd:complexType name="CCode" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="decode-input-long-into"
                 type="dryverl:CCode" />
    <xsd:element name="encode-output-long"
                 type="dryverl:CCode" />
    <xsd:element name="value-map-find-entry">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value-map-entry-id-ref"
                       type="dryverl:CCode" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:choice>
</xsd:complexType>

Wrong constructs are allowed by such a schema, e.g.:

<decode-input>
  <value-map-find-entry>
    <value-map-entry-id-ref>
      <!-- encoding statements are semantically forbidden,
           but this cannot be checked by the parser -->
      <encode-output-long>...</encode-output-long>
    </value-map-entry-id-ref>
  </value-map-find-entry>
</decode-input>

You lose points (2a) and (2b).

2- to use different lexical tokens for the elements with different
types, which would be equivalent to always giving different names to
elements that have different types (e.g. adding "di-" and "eo-"
prefixes):

<xsd:complexType name="DecodeInputCCode" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="decode-input-long-into"
                 type="dryverl:DecodeInputCCode" />
    <xsd:element name="di-value-map-find-entry">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="di-value-map-entry-id-ref"
                       type="dryverl:DecodeInputCCode" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:choice>
</xsd:complexType>

<xsd:complexType name="EncodeOutputCCode" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="encode-output-long"
                 type="dryverl:EncodeOutputCCode" />
    <xsd:element name="eo-value-map-find-entry">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="eo-value-map-entry-id-ref"
                       type="dryverl:EncodeOutputCCode" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:choice>
</xsd:complexType>

These constraints specify that it is allowed to write:

<decode-input>
  <di-value-map-find-entry>
    <di-value-map-entry-id-ref>
      <decode-input-long-into>...</decode-input-long-into>
    </di-value-map-entry-id-ref>
  </di-value-map-find-entry>
</decode-input>

and it is allowed to write:

<encode-output>
  <eo-value-map-find-entry>
    <eo-value-map-entry-id-ref>
      <encode-output-long>...</encode-output-long>
    </eo-value-map-entry-id-ref>
  </eo-value-map-find-entry>
</encode-output>

and it is forbidden to write:

<decode-input>
  <di-value-map-find-entry>
    <di-value-map-entry-id-ref>
      <!-- encoding statements are forbidden in decode-input -->
      <encode-output-long>...</encode-output-long>
    </di-value-map-entry-id-ref>
  </di-value-map-find-entry>
</decode-input>

Problem: that is no longer the same language!  And it makes the
language artificially more complex, only because of limitations in the
implementation of your parser: developers have to remember the
different lexical variants of elements, depending on the context.

Regards,

-- 
Romain LENGLET