string performance

tmb@REDACTED
Wed Sep 22 08:39:40 CEST 1999


Hi,

thanks for the response.  I'm glad to see that binaries and deep lists
are available for speeding things up.  I like the lightweight, dynamic,
and distributed character of Erlang; string processing seems to me to
be its weakest point right now, particularly given the profusion of
text-based Internet protocols and standards (HTTP, XML, POP, SMTP,
etc.).

Maybe it's just because I don't have much experience with it, but the
problem I see with relying on deep lists of binaries is that it may
make code harder to maintain and debug.  It certainly makes it harder
to explain to a new Erlang programmer how to do string processing.
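
To illustrate the concern, here is a minimal sketch of what building
text as a deep list of binaries looks like.  The module and function
names are made up for illustration, and the binary literals assume a
release that has the bit syntax discussed below.

```erlang
-module(deep_list_sketch).
-export([header/2, demo/0]).

%% Build one protocol header line as a deep list: the pieces are nested
%% rather than copied into a single contiguous string.
header(Name, Value) ->
    [Name, <<": ">>, Value, <<"\r\n">>].

demo() ->
    %% Binaries and character lists are mixed freely and nest arbitrarily.
    Headers = [header(<<"Host">>, "example.org"),
               header(<<"Accept">>, [<<"text/">>, "html"])],
    %% Flatten only at the edge, when a contiguous binary is needed.
    %% list_to_binary/1 accepts deep lists of bytes and binaries.
    Flat = list_to_binary(Headers),
    {Headers, Flat}.
```

Printing Headers in the shell shows the nested structure rather than
readable text, which is the maintenance and debugging cost I have in
mind; only Flat looks like the header block one expects.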

> There is talk of a string syntax, but more importantly (I think) is the
> upcoming bit syntax, which will allow you to manipulate binaries directly,
> including pattern matching of binary data.

I think that kind of syntax would be great.  I don't think, however,
that using the same data type for binary data and strings is such a
good idea:  if the distinction between binary and text isn't made early
in the evolution of a language, it's very difficult to retrofit code
later, particularly when issues like Unicode support come up.  Also,
binary and
text should behave (print) differently in an interactive development
environment.  It seems to me that the syntax, implementation, and
functionality for "binary" and "text" could be almost the same; but
carrying around one extra bit of type information to distinguish
the two cases would seem very useful to me.
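
As a rough sketch of that "one extra bit" idea, emulated at the user
level with a tag rather than built into the language: the same binary
payload is wrapped as either text or raw bytes, and only printing
differs.  The module name, the tags, and show/1 are all hypothetical.

```erlang
-module(tagged_sketch).
-export([text/1, bytes/1, show/1]).

%% Same underlying representation, one extra tag of type information.
text(Bin) when is_binary(Bin) -> {text, Bin}.
bytes(Bin) when is_binary(Bin) -> {bytes, Bin}.

%% Text prints as a quoted string; raw bytes print as a list of integers.
show({text, Bin}) ->
    lists:flatten(io_lib:format("\"~s\"", [Bin]));
show({bytes, Bin}) ->
    lists:flatten(io_lib:format("~w", [binary_to_list(Bin)])).
```

A built-in distinction would of course be cheaper and would not need
the wrapping tuple; the sketch only shows the behaviour I would like
to see.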

In addition to the syntax, there is also the question of what a
good underlying representation of strings should be in a mostly
functional language.  Trees of chunks, lists of blocks, and
pointer pairs (start/end) into a buffer that's extensible at both
ends but otherwise unmodifiable are all possibilities.
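
To make the first of these concrete, here is a minimal sketch of a
tree of chunks (often called a rope).  It is only an illustration
under my own assumptions: the module name and the tuple layout are
invented, there is no rebalancing, and binary:at/2 assumes a release
that ships the binary module.

```erlang
-module(rope_sketch).
-export([leaf/1, concat/2, len/1, char_at/2, to_binary/1]).

%% A rope is either {leaf, Bin} or {node, Len, Left, Right},
%% where Len caches the total byte length of the subtree.
leaf(Bin) when is_binary(Bin) -> {leaf, Bin}.

len({leaf, Bin}) -> size(Bin);
len({node, Len, _Left, _Right}) -> Len.

%% Concatenation builds one new node and copies no text.
concat(Left, Right) ->
    {node, len(Left) + len(Right), Left, Right}.

%% Zero-based indexing walks down the tree using the cached lengths.
char_at({leaf, Bin}, I) ->
    binary:at(Bin, I);
char_at({node, _Len, Left, Right}, I) ->
    case len(Left) of
        N when I < N -> char_at(Left, I);
        N -> char_at(Right, I - N)
    end.

%% Flatten to a contiguous binary only when one is really needed.
to_binary({leaf, Bin}) ->
    Bin;
to_binary({node, _Len, Left, Right}) ->
    <<(to_binary(Left))/binary, (to_binary(Right))/binary>>.
```

Substituting a substring then amounts to building a few new nodes
around the unchanged chunks, at the price of slower indexing than a
flat buffer.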

Before the language changes, I'm wondering: for getting things done
right now, are there any efficient and powerful string packages around
that can handle large amounts of variable text and operations like
string substitution, ideally without converting back and forth between
the various list types and binaries?

Cheers,
Thomas.



More information about the erlang-questions mailing list