XML reading and writing

Fri Jul 21 18:00:22 CEST 2000

On 21 Jul 2000, Mickael Remond wrote:

>>>>>> "Luc" == Luc Taesch <ltaesch@REDACTED> writes:
>
>    > I'd like to parse (basic) XML files. Is there something usable, like
>    > a grammar for yecc, or even a library (so I can learn the way
>    > it's done in the Erlang spirit)?
>    > And some examples?
>
>To parse/read XML files, you have :
>
>www.bluetail.com/~joe/xml/xml.html 
>
>This is a good program (a validating parser). The only thing I need is
>to modify it so that it can fall back to non-validating parsing
>when no DTD is available (stand-alone XML files).
>
>    > And to write XML files?
>
>I have nothing special, but it is relatively simple to walk your data
>structure and write it out as an XML file.
>Look at the existing XML program and the data structure it returns.
>It should be easy to reverse the process and write the XML.
>(If I have time, I will try to write something to handle XML output.)
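
A minimal sketch of that "walk the structure and write it out" approach, written in Python rather than Erlang for brevity. The (tag, attributes, children) tuple shape, with plain strings as text nodes, is an assumption for illustration and not necessarily the structure the parser above returns; to_xml is a hypothetical name. The standard-library escape and quoteattr helpers handle escaping of &, < and > and the quoting of attribute values.

```python
from xml.sax.saxutils import escape, quoteattr

def to_xml(node):
    """Serialize a hypothetical (tag, attrs, children) tree to an XML string.

    Elements are 3-tuples (tag, dict_of_attributes, list_of_children);
    text nodes are plain str.
    """
    if isinstance(node, str):
        # Text node: escape &, < and >.
        return escape(node)
    tag, attrs, children = node
    # quoteattr adds the surrounding quotes and escapes the value as needed.
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    if not children:
        # Elements without children are written in the empty-element form.
        return f"<{tag}{attr_text}/>"
    inner = "".join(to_xml(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"
```

For example, to_xml(("doc", {"lang": "en"}, [("p", {}, ["a < b"]), ("br", {}, [])])) gives <doc lang="en"><p>a &lt; b</p><br/></doc>.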

No need to keep things secret until the conference. This is a huge
area, and I'll probably need all the help I can get.  ;)

I have a non-validating(*) XML parser, which seems to work fairly
well, but I'm reluctant to release it just yet, because I'm still
making changes off and on. I will need some guinea pigs in a week or
two. Any volunteers?

(*) It does cheat and impose some validity constraints. I will remove
those checks.

My XML parser adheres closely to the XML 1.0 spec, as far as I've been
able to tell -- it handles:

- Unicode characters (although transfer decoding is not integrated yet
  and proper Unicode string matching is not done)
- XML namespaces, with inheritance according to the spec (name
  expansion is not done yet)
- Language codes (also with proper inheritance)
- DTD parsing and conditional sections (I've yet to implement actual
  fetching of external DTDs, but I feel it should be an add-on, so
  I've implemented hooks for it.)
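
The namespace and language-code inheritance in the list above can be pictured with a small Python sketch (an illustration only, not the parser's code). It assumes a hypothetical (tag, attrs, children) tuple tree that keeps raw attribute names such as "xmlns:h" and "xml:lang". Each element copies its parent's prefix bindings, adds its own declarations, and inherits xml:lang unless it sets its own. The final prefix-to-URI lookup corresponds to the "name expansion" step that the parser does not do yet.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def annotate(node, ns_scope=None, lang=None):
    """Yield (namespace_uri, local_name, lang) for every element in document order.

    Assumes a hypothetical (tag, attrs, children) tuple tree; text nodes are str.
    """
    if isinstance(node, str):
        return  # text nodes carry no namespace or language of their own here
    if ns_scope is None:
        # The 'xml' prefix is always bound; the default namespace starts unset.
        ns_scope = {"xml": XML_NS, None: None}
    tag, attrs, children = node
    # Declarations on this element extend a copy of the parent's scope,
    # so siblings never see each other's bindings.
    scope = dict(ns_scope)
    for name, value in attrs.items():
        if name == "xmlns":
            scope[None] = value or None  # xmlns="" undeclares the default namespace
        elif name.startswith("xmlns:"):
            scope[name[len("xmlns:"):]] = value
    # xml:lang is inherited unless this element overrides it.
    lang = attrs.get("xml:lang", lang)
    prefix, _, local = tag.rpartition(":")
    uri = scope[prefix or None]  # a KeyError here means an undeclared prefix
    yield (uri, local, lang)
    for child in children:
        yield from annotate(child, scope, lang)
```

For instance, a root with xmlns="urn:a", xmlns:h="urn:h" and xml:lang="en" passes "urn:h" on to an h:td child, while a grandchild declaring xmlns="" drops back to no namespace but still inherits any xml:lang set above it.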

It is also designed to support both event-based and tree-based parsing,
and it can pause and wait for more data in the middle of a parse.
The default behaviour is a tree-based parse.
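
For comparison, the event-based/tree-based split and the ability to pause for more input can be illustrated with Python's standard expat binding (not the Erlang parser described here). The handlers receive events, a small builder turns those events into a tree, and Parse(chunk, False) can be called again whenever more data arrives. The TreeBuilder class and the tuple tree shape are hypothetical; installing your own handlers instead of the builder gives a purely event-based parse.

```python
import xml.parsers.expat

class TreeBuilder:
    """Builds (tag, attrs, children) tuples from expat's start/end/text events."""

    def __init__(self):
        # Sentinel node whose children list receives the document element.
        self.stack = [("#document", {}, [])]

    def start(self, tag, attrs):
        node = (tag, attrs, [])
        self.stack[-1][2].append(node)
        self.stack.append(node)

    def end(self, tag):
        self.stack.pop()

    def text(self, data):
        children = self.stack[-1][2]
        if children and isinstance(children[-1], str):
            children[-1] += data  # expat may split text across several callbacks
        else:
            children.append(data)

def parse_in_chunks(chunks):
    builder = TreeBuilder()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.text
    for chunk in chunks:
        # Feed whatever has arrived so far; expat keeps its state between calls.
        parser.Parse(chunk, False)
    parser.Parse("", True)  # signal end of input
    return builder.stack[0][2][0]
```

Splitting the input in the middle of a text node, e.g. ["<doc a='1'><p>hel", "lo</p>", "<br/></doc>"], still yields ("doc", {"a": "1"}, [("p", {}, ["hello"]), ("br", {}, [])]), because the builder merges adjacent text.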

I am also almost done with XPath, and I have an approach for
generating output, e.g. XML (easy) or, say, HTML, from an Erlang
structure. I've tried to make the parser extract enough information to
make the XPath search engine efficient (things like element position,
namespace prefixes, language codes, ancestry).
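
As a rough illustration of why recording element position and ancestry helps, here is a Python sketch (not the actual XPath engine). It indexes every element with its path of (tag, position) steps, where position is 1-based among same-named siblings as in XPath. It then answers a tiny XPath subset (absolute child steps with an optional [n] predicate) by comparing stored paths instead of re-walking the tree. The index and select names and the (tag, attrs, children) tuple tree are assumptions.

```python
import re

# One location step: a name test with an optional [n] positional predicate.
STEP = re.compile(r"([^/\[\]]+)(?:\[(\d+)\])?$")

def index(node, path=None):
    """Yield (path, element) pairs in document order.

    path is a tuple of (tag, position) steps from the root; position is
    1-based among siblings with the same tag, as in XPath.
    """
    tag, _attrs, children = node
    if path is None:
        path = ((tag, 1),)
    yield path, node
    counts = {}
    for child in children:
        if isinstance(child, str):
            continue  # text nodes are not indexed in this sketch
        counts[child[0]] = counts.get(child[0], 0) + 1
        yield from index(child, path + ((child[0], counts[child[0]]),))

def select(entries, xpath):
    """Evaluate absolute paths such as /doc/sec[2]/p against precomputed entries."""
    steps = []
    for part in xpath.strip("/").split("/"):
        m = STEP.match(part)
        if m is None:
            raise ValueError(f"unsupported step: {part!r}")
        steps.append((m.group(1), int(m.group(2)) if m.group(2) else None))
    return [
        node
        for path, node in entries
        if len(path) == len(steps)
        and all(tag == name and (want is None or pos == want)
                for (tag, pos), (name, want) in zip(path, steps))
    ]
```

The index is built once per document. After that, each query is a filter over the stored paths, which is the kind of saving the extra bookkeeping in the parser is meant to buy.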

I'm hoping to get to stylesheet support before I go on parental leave
in August. Perhaps someone would like to take a stab at e.g. DTD- or
namespace caching, or perhaps XPointer using my framework?

If any of you have concrete problems that would make good beta-test
cases, please let me know.

/Uffe
-- 
Ulf Wiger                                    tfn: +46  8 719 81 95
Network Architecture & Product Strategies    mob: +46 70 519 81 95
Ericsson Telecom AB,              Datacom Networks and IP Services
Varuvägen 9, Älvsjö,                    S-126 25 Stockholm, Sweden