[erlang-questions] unexpected result of term_to_binary

Richard O'Keefe ok@REDACTED
Wed Jan 23 05:15:56 CET 2013


On 23/01/2013, at 8:18 AM, Steve Davis wrote:

> Let's just say that I'm now sorry I asked the question in the first place. I have a decent, unambiguous solution to the issue.
> 
> However, to name that byte flag STRING_EXT is simply dishonest and misleading.

More precisely, it DOES honestly reveal the reason for the byte's
existence, which is to compactly represent an instance of one of
the data structures Erlang uses for strings.  It might be an idea
to rename it to BYTE_LIST_EXT.
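
To make that concrete, here is a minimal sketch of how the tag can be seen from the shell side. The module name string_ext_demo and the helper tag/1 are made up for illustration; the tag values (107 for STRING_EXT, 108 for LIST_EXT, 106 for NIL_EXT) are the ones given in the external term format documentation.

```erlang
-module(string_ext_demo).
-export([tag/1, demo/0]).

%% Return the tag byte that follows the version byte (131)
%% in the external term format produced by term_to_binary/1.
tag(Term) ->
    <<131, Tag, _/binary>> = term_to_binary(Term),
    Tag.

%% Each match below crashes with badmatch if the encoding differs.
demo() ->
    107 = tag([1, 2, 3]),       % STRING_EXT: a list of bytes, not necessarily text
    107 = tag("abc"),           % the same tag, because "abc" is also a list of bytes
    108 = tag([1, 2, 300]),     % LIST_EXT: 300 does not fit in a byte
    106 = tag([]),              % NIL_EXT: the empty list
    <<131, 107, 0, 3, 1, 2, 3>> = term_to_binary([1, 2, 3]),
    [1, 2, 3] = binary_to_term(term_to_binary([1, 2, 3])),
    ok.
```

Calling string_ext_demo:demo() should return ok. The point is only that the tag describes the shape of the data (a short list of small integers), not any claim that the data is text.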

It's a common problem in naming that a name goes one way
(if you know what it means you understand the name)
but not the other (if you see the name you understand what it means).

In this case, like all cases really, you have to go by the
*documentation*, not just by the name.
> 
> And apart from erl_scan that Ulf mentioned, consider the meaning of this module for erlang: http://www.erlang.org/doc/man/string.html
> 
> Finally, if you search this list you'll see I have a history of promoting the revocation of the whole concept of a "string" in erlang. 

It's a bit like C, really.
C doesn't have strings.
It has nine different _representations_ for strings
( pointer to NUL-terminated array without counter,
| pointer to NUL-terminated array with upper bound,
| pointer to prefix of array with counter
) x
( one byte per character
| multiple possibly variable bytes per character
| fixed multiple bytes per character
)
with two different syntaxes for string literals.

Yet nobody seems to feel any qualms about talking about strings in C.

In any case, strings are wrong.

	The string is a stark data structure
	and everywhere it is passed
	there is much duplication of process.
	It is a perfect vehicle for hiding information.
		-- Alan Perlis.

My enlightenment came when a fellow masters student
wrote a batch version of the PRIMOS text editor in
PL/I to use on IBM mainframes.  (Neither of us had
heard of sed at that point.)  It took him about
150 pages of PL/I, and he was fighting the language's
strings every step of the way.  The following year,
when I learned C, I wrote my own implementation,
which took 20 pages.  Reason?  I did my own text-
handling functions, bypassing C's.  It also ran about
as fast on a PDP-11/60 as his editor did on a mainframe,
apart from I/O.  Again, at Quintus, our emulator was
written in a high level portable macro language (think
PL/360) called Progol.  The Progol compiler was rewritten
by someone who _didn't_ believe that strings were a bad
idea, and instead of running faster, which was one of the
reasons we wanted it rewritten, it ran about four times
slower.

I keep on watching out for this kind of problem.
Render-this-to-an-output-stream is [usually] good;
convert-this-to-a-string is [usually] bad,
except when it is unbelievably bad.
(Smalltalk got this right, and Java, with the Smalltalk
example before it and clearly explained, got it wrong.)
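
In Erlang terms the distinction might be sketched like this. The module render_demo and its three functions are hypothetical names for illustration only; io:format/3, io_lib:format/2 and lists:flatten/1 are the standard library calls.

```erlang
-module(render_demo).
-export([render/2, as_iodata/1, to_string/1]).

%% Render-this-to-an-output-stream: each pair goes straight to the
%% device, and no intermediate string for the whole result is built.
render(Device, Pairs) ->
    lists:foreach(
      fun({K, V}) -> io:format(Device, "~w=~w~n", [K, V]) end,
      Pairs).

%% A middle ground: build nested iodata and leave it unflattened,
%% so that an I/O function can write it out as it stands.
as_iodata(Pairs) ->
    [io_lib:format("~w=~w~n", [K, V]) || {K, V} <- Pairs].

%% Convert-this-to-a-string: the same text, but copied into one
%% flat list before anybody can use it.
to_string(Pairs) ->
    lists:flatten(as_iodata(Pairs)).
```

For example, render_demo:render(standard_io, [{a, 1}, {b, 2}]) prints the two lines directly, while render_demo:to_string/1 has to build the whole flat list first, and a caller that then writes it out copies it again.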



