<div>No, list_to_binary and iolist_to_binary are <strong>not</strong> considered harmful.<br></div><div><br></div><div>The catch is that both of them only accept integers in the range 0..255 inclusive, and implicitly treat each one as a single byte. This follows from the definition of iolists, which contain only bytes and binaries. list_to_binary accepts any iolist (a possibly deeply nested list of bytes (0..255) and binaries), and iolist_to_binary accepts both iolists and flat binaries.</div>
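<div><br></div><div>A minimal sketch of that behaviour, in case it helps. The module name bytes_demo and the function run/0 are made up for illustration; only the BIFs themselves are real.</div>
<pre>
-module(bytes_demo).
-export([run/0]).

%% Hypothetical demo module: each match below crashes if the claim is wrong.
run() -&gt;
    %% A nested iolist of bytes and binaries flattens into one binary.
    &lt;&lt;"abcdef"&gt;&gt; = list_to_binary([$a, [&lt;&lt;"bc"&gt;&gt;, [$d]], "ef"]),
    %% iolist_to_binary/1 also takes a flat binary; list_to_binary/1 does not.
    &lt;&lt;"abc"&gt;&gt; = iolist_to_binary(&lt;&lt;"abc"&gt;&gt;),
    {'EXIT', {badarg, _}} = (catch list_to_binary(&lt;&lt;"abc"&gt;&gt;)),
    %% Integers outside 0..255 are rejected, not encoded.
    {'EXIT', {badarg, _}} = (catch list_to_binary([16#20AC])),
    {'EXIT', {badarg, _}} = (catch iolist_to_binary([-1])),
    ok.
</pre>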
<div><br></div><div>They were apparently meant to convert between lists and binaries of bytes, not lists or binaries of arbitrarily large (or negative) numbers. Because we Erlang programmers have relied so heavily on the idea of lists as strings, and because ASCII and most Latin1 text fits in one byte per character, these conversions to binaries worked fine and nobody ever had a problem.</div>
<div><br></div><div>Unicode (and its UTF encodings) breaks the assumption that strings mostly contain only ASCII and Latin1 integers for their code points, since a code point can be larger than 255. This is why you need the unicode module's conversion functions there.</div>
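<div><br></div><div>As a rough illustration using the euro example from the quoted message below: the module name unicode_demo and the variable names are my own, and I'm assuming the default UTF-8 output of unicode:characters_to_binary/1.</div>
<pre>
-module(unicode_demo).
-export([run/0]).

%% Hypothetical demo module comparing the two conversions.
run() -&gt;
    %% "10" followed by the euro sign, as a list of code points.
    Euros = [$1, $0, 16#20AC],
    %% 16#20AC does not fit in a byte, so list_to_binary/1 fails.
    {'EXIT', {badarg, _}} = (catch list_to_binary(Euros)),
    %% The unicode module encodes it as UTF-8: U+20AC becomes E2 82 AC.
    &lt;&lt;$1, $0, 16#E2, 16#82, 16#AC&gt;&gt; = unicode:characters_to_binary(Euros),
    ok.
</pre>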
<div><br></div><div>The same is true of binary_to_list. The trouble there is that the binary representation (in bytes) of a Unicode string doesn't match the list representation of Unicode (one integer per code point) that ~ts printing and the like expect. binary_to_list only gives you the raw bytes, and that isn't good enough. Again, the unicode module is clever enough to handle that.</div>
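<div><br></div><div>Here is a small sketch of the difference going the other way. Again, the module name decode_demo is hypothetical, and the binary is assumed to hold UTF-8.</div>
<pre>
-module(decode_demo).
-export([run/0]).

%% Hypothetical demo module: raw bytes versus decoded code points.
run() -&gt;
    %% "10" followed by the euro sign, encoded as UTF-8.
    Utf8 = &lt;&lt;$1, $0, 16#E2, 16#82, 16#AC&gt;&gt;,
    %% binary_to_list/1 returns one integer per byte.
    [$1, $0, 16#E2, 16#82, 16#AC] = binary_to_list(Utf8),
    %% unicode:characters_to_list/1 decodes UTF-8 into code points,
    %% which is the form that ~ts expects for a list.
    [$1, $0, 16#20AC] = unicode:characters_to_list(Utf8),
    ok.
</pre>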
<div><br></div><div>iolist_to_binary, list_to_binary and binary_to_list are fine when you know they're meant to be used for bytes, not arbitrary data.</div><div><br></div><div>What's hurting Erlang more, I think, is the lack of Unicode algorithms as described by Michael Uvarov. If we have a Unicode string, we currently can't get its length (in terms of graphemes, or 'characters to the human mind') in any reliable way. We also can't specify locales, can't do case conversion, and so on. Ideally those would be the next step for Erlang's Unicode support, I think.</div>
<div><br></div><div class="gmail_quote">On Thu, Oct 20, 2011 at 4:23 AM, Joe Armstrong <span dir="ltr">&lt;<a href="mailto:erlang@gmail.com">erlang@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Interesting comment: this is almost where I could write an article with the<br>
title "list_to_binary considered harmful" - I guess if Erlang is<br>
serializing terms<br>
to be stored on disk etc. term_to_binary and its inverse should be used.<br>
list_to_binary seems to imply that you are going to send something to the<br>
outside world - and then you should stop and think hard, this is<br>
because there is<br>
no universal agreement in the outside world as to what an integer is<br>
(ie is it bounded or not)<br>
fixing a notion of an integer to something in the range 0..255 allows<br>
communication of<br>
integers, but requires a framing protocol (ie UTF8, or ASN.1) that<br>
tells how integers<br>
are encoded - but this is out of band.<br>
<br>
The problem is that I might write<br>
<br>
X1 = "10$" (10 dollars) or<br>
X2 = "10\x{20ac}" (10 euros)<br>
<br>
Now list_to_binary(X1) will succeed but list_to_binary(X2) will fail<br>
<br>
So maybe I should write<br>
<br>
X1 = {ansii, "10$"}<br>
X2 = {unicode,"10\x{20ac}"}<br>
<br>
If the libraries were written this way then life might be easier<br>
<font color="#888888"><br>
/Joe<br>
</font><div><div class="h5"><br></div></div></blockquote><div><br></div><div> </div></div>