string performance

Ulf Wiger etxuwig@REDACTED
Mon Sep 20 16:26:50 CEST 1999


One thing you can do today is to use binaries and deep lists to speed up
processing in web applications.

This has to do with the fact that an Erlang port accepts anything that is a
mix of binaries and byte lists.

Basically, anything that works with erlang:list_to_binary([...]) will work
with a port, and with web applications, everything will eventually go
through a port.

Thus, you can append two strings by writing [String1,String2], which takes
constant time because nothing is copied, and you can also use
erlang:list_to_binary/1 on strings which do not need further manipulation.
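
As a minimal sketch of the deep-list idea (the module and function names
here are made up for illustration), a reply can be assembled from binaries
and strings without ever appending or flattening:

```erlang
-module(iolist_sketch).
-export([greeting/1, to_binary/1]).

%% Build a reply as a deep list. Nothing is copied here: the result
%% is just a new list cell structure pointing at the existing parts.
greeting(Name) ->
    Header = <<"Hello, ">>,        % fixed text, kept as a binary
    [Header, Name, [$!, "\r\n"]].  % Name may be a string or a binary

%% Collapse to one binary only if a flat copy is really needed.
%% A port would accept the deep list from greeting/1 as it is.
to_binary(DeepList) ->
    list_to_binary(DeepList).
```

The deep list can be handed straight to whatever eventually writes to the
port; to_binary/1 is only there to show that the structure is valid input
for list_to_binary/1.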

Also, binary_to_list(list_to_binary(DeepList)) can be significantly faster
than lists:flatten(DeepList).
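
A rough way to try this on your own data is sketched below (module and
function names are hypothetical). Note the assumption: the two calls only
give the same result when the deep list contains nothing but bytes
(integers 0..255). lists:flatten/1 leaves embedded binaries in place, and
list_to_binary/1 fails with badarg on larger integers.

```erlang
-module(flatten_sketch).
-export([flatten_bytes/1, compare/1]).

%% Flatten a deep list of bytes by going through a binary.
%% Assumes every element is a byte (0..255) or a binary.
flatten_bytes(DeepList) ->
    binary_to_list(list_to_binary(DeepList)).

%% Time both approaches on the same input, in microseconds.
%% Reusing Flat in the second match checks that the results agree,
%% so this crashes with badmatch if DeepList contains binaries.
compare(DeepList) ->
    {T1, Flat} = timer:tc(fun() -> flatten_bytes(DeepList) end),
    {T2, Flat} = timer:tc(lists, flatten, [DeepList]),
    {T1, T2}.
```

Calling flatten_sketch:compare(DeepList) returns {ViaBinary, ViaFlatten};
the numbers depend on the input and the emulator, so measure rather than
assume.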

There is talk of a string syntax, but more important (I think) is the
upcoming bit syntax, which will allow you to manipulate binaries directly,
including pattern matching on binary data.

The syntax is not set yet, but an early proposal suggested something like:

parse(<"GET ", What/binary, <"\r\n"> | Tail>) ->
    {Fields, Contents} = parse_tail(Tail, []),
    {What, Fields, Contents}.

parse_tail(<B/binary, <"\r\n\r\n"> | Cont>, A) ->
    {lists:reverse([B|A]), Cont};
parse_tail(<B/binary, <"\r\n"> | Tail>, A) ->
    parse_tail(Tail, [B|A]).

(Example of parsing the binary data of an HTTP request)

Take the syntax with a grain of salt. It has changed since then, and I
don't know the latest details.
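
For comparison, here is a rough sketch of the same parser in the bit
syntax as it later shipped in Erlang/OTP, which uses <<...>> delimiters.
The module name is made up. In the released syntax a binary segment
without a size must come last in a pattern, so the sketch finds the line
breaks with binary:split/2 from the standard library instead of matching
them inside the pattern. It assumes a well-formed request and crashes
with badmatch otherwise.

```erlang
-module(http_sketch).
-export([parse/1]).

%% Split a GET request into the rest of the request line,
%% the list of header lines, and the body.
parse(<<"GET ", Rest/binary>>) ->
    [What, Tail] = binary:split(Rest, <<"\r\n">>),
    {Fields, Contents} = parse_tail(Tail, []),
    {What, Fields, Contents}.

%% An empty line ends the headers; what follows is the body.
parse_tail(<<"\r\n", Cont/binary>>, Acc) ->
    {lists:reverse(Acc), Cont};
%% Otherwise take one header line and continue with the rest.
parse_tail(Bin, Acc) ->
    [Field, Tail] = binary:split(Bin, <<"\r\n">>),
    parse_tail(Tail, [Field | Acc]).
```

As in the proposal, the header fields come back as unparsed binaries;
splitting each one into a name and a value is left out.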

/Uffe


On Mon, 20 Sep 1999 tmb-erlang@REDACTED wrote:

tmb-er>In many ways, Erlang looks very good for building distributed
tmb-er>web applications, but its string performance is very poor in my
tmb-er>benchmarks: string append is orders of magnitude slower than
tmb-er>in Perl, and characters take many bytes to store (compared to one
tmb-er>byte per character in other languages).
tmb-er>
tmb-er>I'm curious whether there are any plans to address this.
tmb-er>One approach would be to transparently switch representations
tmb-er>between lists and strings, like Tcl does.  That would be 
tmb-er>completely backwards compatible.  An alternative would be to
tmb-er>define a separate string type, define new pattern matching
tmb-er>syntax for true strings or prohibit pattern matching on true
tmb-er>strings altogether; this would be backwards compatible, but less
tmb-er>interoperable between old and new code (I still prefer this latter
tmb-er>choice).
tmb-er>
tmb-er>So, what are the plans?  I know I can use byte arrays, but that
tmb-er>doesn't seem like it's quite the same as having a real string type.
tmb-er>
tmb-er>Thanks,
tmb-er>Thomas.
tmb-er>

Ulf Wiger, Chief Designer AXD 301      <ulf.wiger@REDACTED>
Ericsson Telecom AB                          tfn: +46  8 719 81 95
Varuvägen 9, Älvsjö                          mob: +46 70 519 81 95
S-126 25 Stockholm, Sweden                   fax: +46  8 719 43 44



