[erlang-questions] strings vs binaries

zxq9 zxq9@REDACTED
Wed Aug 19 14:04:17 CEST 2015


On Wednesday, 19 August 2015 13:44:23 you wrote:
> On 08/19/2015 12:04 PM, zxq9 wrote:
> > Going out again, of course, iolists can accept binaries and strings
> 
> That's actually incorrect. iolist() does not include string() in the 
> types it allows. The type is:
> 
> iolist() :: maybe_improper_list(byte() | binary() | iolist(), binary() | [])
> 
> If you have a string() that only has 0..255 characters, then it will 
> work; but this is incidental. A Unicode string inside an iolist() will 
> not work.
> 
> So if you convert a binary to Unicode string to do manipulation, you 
> have to *convert it back*.
> 
> The fact that strings are faster is not very interesting considering the 
> amount of conversion you have to do and the extra memory you end up 
> using. That, and you can just send your binary to the i18n library and 
> let it handle things anyway:
> 
> i18n_string:from_utf8(Bin)
> 
> That's assuming you use the i18n library, but if you're going to do 
> anything with Unicode that's pretty much the only good choice.

Most of the time I'm either writing something out through a socket, calling io_lib:format or io:format, or converting between `unicode:characters_to_list(Data, utf8)` and `unicode:characters_to_binary(UTF8_data, utf8)` (the latter typically when dealing with files).
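
As a rough sketch of that round trip (the module name utf8_roundtrip and the helper upcase_ascii/1 are made up for illustration, and the ASCII-only upcasing just stands in for whatever string manipulation is needed):

```erlang
%% -*- coding: utf-8 -*-
-module(utf8_roundtrip).
-export([upcase_ascii/1]).

%% Hypothetical helper: decode a UTF-8 binary into a list of code points,
%% manipulate it as an ordinary string, then encode it back to UTF-8.
%% Assumes valid UTF-8 input; on invalid input characters_to_list/2
%% returns an error tuple and the comprehension below will crash.
upcase_ascii(Bin) when is_binary(Bin) ->
    Chars = unicode:characters_to_list(Bin, utf8),
    Upper = [upcase(C) || C <- Chars],
    unicode:characters_to_binary(Upper, utf8).

%% Upcase a-z only; every other code point passes through untouched.
upcase(C) when C >= $a, C =< $z -> C - 32;
upcase(C) -> C.
```

Calling utf8_roundtrip:upcase_ascii(<<"abc何か"/utf8>>) should give back the binary <<"ABC何か"/utf8>>, with the non-ASCII characters unchanged.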

So far I haven't run into any cases where having a UTF-8 binary or a Unicode string in a deep list has caused problems with these functions. Where will it break? (I really want to know, since I could easily run into something that will just suddenly break at some point!)

For example:

1> io:format("~ts~n", [[["何か","何々"],"So why is this working?",<<"何か"/utf8>>]]).
何か何々So why is this working?何か
ok

There are certainly functions that operate on strings that won't work on such a list, but I don't know that I've run into any places that say they accept iolist() and won't accept that. But I'm really curious to know what won't work.
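
One way to probe this is to hand the same deep list to a function documented to take Unicode chardata and to one documented to take a strict iolist(). The following is only a sketch (the module name chardata_probe is made up), and it checks just these two functions, not every API that claims to accept iolist():

```erlang
%% -*- coding: utf-8 -*-
-module(chardata_probe).
-export([probe/0]).

%% Feed one deep list to a chardata-aware function and to a strict
%% iolist() function, and report how each one handles it.
probe() ->
    Deep = [["何か", "何々"], "ascii", <<"何か"/utf8>>],
    %% unicode:characters_to_binary/2 accepts code points above 255.
    Chardata = unicode:characters_to_binary(Deep, utf8),
    %% iolist_to_binary/1 only accepts bytes (0..255) and binaries,
    %% so the code points in Deep make it raise badarg.
    Iolist = try iolist_to_binary(Deep)
             catch error:badarg -> badarg
             end,
    {is_binary(Chardata), Iolist}.
```

chardata_probe:probe() returns {true, badarg}: the chardata function encodes the list, while iolist_to_binary/1 rejects it because of the integers above 255. That suggests the places to watch are ones that treat their argument as raw bytes rather than as characters.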

-Craig

PS:
Luckily it remains the case that performance (memory, aside from leaks, and generally processing speed) simply has never been a real issue for us, so whatever overhead is being incurred is richly paid back in savings in coding time not panicking about every form of input. But we're not webscale coolguys -- just solving business problems at roughly the lowest scale of aggregation possible.


