[erlang-questions] 700% speedup

dda <>
Sat Jun 23 22:02:29 CEST 2007


That would be me.

I was a little nonplussed by your doubts and your [implied] conclusion
that if you failed to do it, then it was impossible. All you proved is
that fiddling with XML parsers in Erlang for a couple of hours failed.
Sure, been there, done that, 6 months ago. Then I switched to
other options.

I haven't explained in depth how I did it, since this is going into a
commercial application, and I am not at liberty to expose the innards
of the code, but I can tell you this much: I didn't use an XML parser
– existing or otherwise – for this [neither did I in the version used
currently by Dot-Tunes]. I wrote a specific parser for this file
format. Since the iTunes Library XML file is machine-produced, the
format is extremely regular, and an XML parser has way too much
overhead for this task, which is quite simple, really, albeit time
consuming.
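
To make the idea concrete without touching the real code, here is a
minimal sketch of what a format-specific scanner can look like. It is
not the parser described above: the module and function names are made
up, it uses the `binary` module found in current OTP releases, and it
assumes the layout iTunes writes for track entries, one
<key>...</key><type>...</type> pair per line. It works on the binary
directly and never converts the file to a list.

```erlang
-module(plist_sketch).
-export([pairs/1]).

%% Hypothetical helper, for illustration only.
%% Returns [{Key, Value}] for every line shaped like
%%   <key>Name</key><string>Some Song</string>
%% Lines without such a pair (<dict>, <true/>, headers) are skipped.
%% XML entities such as &#38; are left undecoded.
pairs(Bin) when is_binary(Bin) ->
    Lines = binary:split(Bin, <<"\n">>, [global]),
    lists:foldr(fun(Line, Acc) ->
                        case pair(Line) of
                            {ok, KV} -> [KV | Acc];
                            skip -> Acc
                        end
                end, [], Lines).

%% Cut the line at <key> and </key>; whatever follows is the value element.
pair(Line) ->
    case binary:split(Line, <<"<key>">>) of
        [_Indent, Rest] ->
            case binary:split(Rest, <<"</key>">>) of
                [Key, Tagged] -> value(Key, Tagged);
                _ -> skip
            end;
        _ -> skip
    end.

%% Tagged looks like <<"<string>Some Song</string>">>.
value(Key, <<"<", Tagged/binary>>) ->
    case binary:split(Tagged, <<">">>) of
        [_Type, Rest] ->
            case binary:split(Rest, <<"</">>) of
                [Val, _Closing] -> {ok, {Key, Val}};
                _ -> skip          % e.g. <true/> has no closing tag
            end;
        _ -> skip
    end;
value(_Key, _Other) ->
    skip.
```

Because the input is so regular, nothing here needs a tag stack,
namespace handling or validation, and that is where a general XML
parser spends much of its time.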

I was myself surprised not only by the speed improvements, but also by
the non-linearity of the performance. This is probably a sign that
there's room for improvement in my code, but my client deemed the
performance good enough for now on the 50,000-record file. And when the
client's happy, the coder's happy too.

Many months ago I had asked a question about Erlang and sqlite, and it
was related to this problem. Since sqlite is not suited to
multi-threaded tasks, I had to split the process into first producing
the SQL, and then dumping it into an sqlite db [Dot-Tunes uses sqlite as
a backend, so I had no choice in the matter].
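
The two-phase split can be sketched roughly as follows. Again, this is
an illustration under assumptions, not the production code: the module
name, the `tracks` table, its columns and the {Id, Name} tuple shape
are all invented. Worker processes build the SQL text concurrently, and
only the finished text ever reaches sqlite, in a single sequential pass.

```erlang
-module(sql_dump_sketch).
-export([to_sql/1, write/2]).

%% Tracks is a list of {Id, Name} tuples, Id an integer and Name a
%% string (hypothetical shape). One process per track builds its INSERT
%% statement; results are collected in the original order.
to_sql(Tracks) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, insert(T)} end),
                Ref
            end || T <- Tracks],
    %% Ref is already bound here, so each receive waits for that reply.
    Rows = [receive {Ref, Row} -> Row end || Ref <- Refs],
    ["BEGIN;\n", Rows, "COMMIT;\n"].

insert({Id, Name}) ->
    ["INSERT INTO tracks (id, name) VALUES (",
     integer_to_list(Id), ", '", escape(Name), "');\n"].

%% Double single quotes, as SQL string literals require.
escape(Str) ->
    lists:flatmap(fun($') -> "''"; (C) -> [C] end, Str).

%% Phase one ends with a plain text file on disk.
write(File, Tracks) ->
    file:write_file(File, to_sql(Tracks)).
```

Phase two is then a single-threaded load outside Erlang, for example
`sqlite3 library.db < tracks.sql` with the sqlite3 command-line shell.
Wrapping the statements in one transaction avoids a commit per row.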

I wish I could show more, but then again I care more about my client's
satisfaction than about grumbles emitted on a mailing list.

Cheers.

-- 
dda aka Didier


On 6/22/07, Willem de Jong <> wrote:
>
>
> It is a strange story. The author claims to have achieved very good results
> using Erlang to parse a very big (35Mbyte) XML file (an iTunes Music Library
> file). He suggests that he uses lots of processes to do this.
>
> It made me curious, and I decided to do some tests.  I used my 1.7 GHz
> laptop with 1GB of memory, running Windows XP.
>
> - Parsing an iTunes file of 4Mbyte takes about 4 seconds with the SAX parser
> that is the basis of Erlsom (if you let the callback function do something
> trivial).
>
> - Parsing the file with Erlsom (which validates it against an XSD and
> translates it to records) takes about 5 seconds.
>
> - Parsing the file with Xmerl takes about 8 seconds.
>
> I found an article on parsing the iTunes library using Mono
> (http://www.xml.com/pub/a/2004/11/03/itunes.html). On an
> 800MHz powerbook parsing a 2.5Mbyte file apparently took 9 seconds, so I
> would say that Erlang doesn't look bad.
>
> Surprisingly, loading the file into Microsoft Internet Explorer takes more
> than a minute...
>
> If things scaled linearly, parsing the 35Mbyte file should take about 40
> to 80 seconds, which is about twice as fast as what the author of the blog
> claims to have achieved (on another machine, obviously, so comparing these
> figures may not make a lot of sense).
>
> Unfortunately, these tests fail miserably - Erlang crashes. On my machine I
> cannot translate a file (binary) of this size to a list. I have to say that
> I was a bit disappointed... Is there a way to fix this?
>
> Willem.
>
>
> On 6/20/07, Brad Anderson <> wrote:
> > I came across this blog today...
> >
> > http://www.sungnyemun.org/wordpress/?p=323
> >
> > BA
> > _______________________________________________
> > erlang-questions mailing list
> > 
> > http://www.erlang.org/mailman/listinfo/erlang-questions
> >
>
>

