[erlang-questions] Strings - deprecated functions

Fri Nov 24 16:50:20 CET 2017

On 11/24, zxq9 wrote:
>On Friday, November 24, 2017 08:13:49 Eric des Courtis wrote:
>On reflection, I actually think the new string functions should have
>been rolled into a "utf8" module. Or something. And the "string" module
>could either have had implementation adjustments that use the utf8
>utilities underneath or been left alone to deal with latin1 (but either
>way be amply documented).
>

This would be bad naming; utf8 is but one encoding of Unicode, as I'm 
sure you're aware. The current string module handles UTF-8 binaries 
*and* lists of codepoints (chardata()), and lists of codepoints are not 
tied to any single encoding.

'string' is sadly the most appropriate name for this module, and for 
the first time it also has the ability to really handle *anything* we 
consider to be a string: lists, binaries, and mixes of both.

The problem was really that the old 'string' module was not super great 
at being a 'string' module. It would have been better named 'cstring' 
or something.

Maybe it could have been 'str' instead, who knows. Then we'd get cool 
conference talks saying how shitty and confusing the stdlib is because
you don't know whether to use string or str in your code!

Please let's not get inspired by 'mysql_real_escape_string', which had 
to be implemented because 'mysql_escape_string' was not good enough but 
people kept relying on it. It's a laughing stock and even a security 
problem for everyone in that case.

>I'm the heaviest unicode string dealing guy I know. I'm SUPER happy that
>the idea of "string" has been advanced (finally!) to mean "unicode strings".
>But breakage is a thing, and Lloyd and Joe have a point.
>

I think we have to consider one thing, as someone else has mentioned: old 
code is not broken. Old code keeps running. Old code is fine. It just has 
a new compiler warning.

Old code could be wrong, because it may be getting Unicode data and 
mangling it instead of doing the right thing. Old code cannot even 
detect that. The whole world was passing old code by. Old code can't 
even work safely on Erlang modules or the content of your .app file 
anymore, because that content is now UTF-8.

We can't know. For all we know, old code is breaking because the world 
is passing it by and it's not keeping up. If old code must remain 
stable in a changing world, old code must be run in its old context: 
maybe stick it into the same old VM, or vendor it in along with its 
build tools, artifacts, and everything it needs (be careful, though: old 
R15 code can no longer fetch packages or dependencies safely, since 
R15's TLS is no longer safe in the real world).

Old code can't necessarily be recompiled with new tools, because new 
tools have to address a changing world. Sad thing for old code, but if 
you use old tools, maybe it will work.

Maybe one can just copy the string.erl module into cstring.erl, change 
the `-module` attribute, and then run a 'sed' call over their code.
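For anyone going that route, here is a rough POSIX sh sketch of the copy-and-sed step. It assumes you already know where OTP's string.erl lives on your machine and which .erl files to rewrite; the function names vendor_cstring and repoint_calls are made up for illustration and are not part of any OTP tooling.

```shell
# Sketch only: vendor the old string module as 'cstring' and repoint callers.
# Assumption (hypothetical): the caller supplies the path to OTP's string.erl,
# a destination directory, and the list of .erl files to rewrite.

# vendor_cstring SRC_STRING_ERL DEST_DIR
# Copies string.erl to DEST_DIR/cstring.erl, renaming the -module attribute.
vendor_cstring() {
    sed -e 's/^-module(string)\./-module(cstring)./' "$1" > "$2/cstring.erl"
}

# repoint_calls FILE...
# Rewrites remote calls "string:" to "cstring:" in each file, in place.
# The bracket expression requires a non-atom character before "string:",
# so longer atoms such as my_string: or substring: are left alone.
repoint_calls() {
    for f in "$@"; do
        sed -e 's/^string:/cstring:/' \
            -e 's/\([^A-Za-z0-9_@]\)string:/\1cstring:/g' \
            "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

This only catches literal `string:` remote calls; quoted 'string' atoms, `-import(string, ...)`, and `apply(string, ...)` style calls still need a manual look, so diff the result before committing it.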

If old code is expected to keep working with the existing API and no 
changes, but the API is no longer right for the current world, maybe 
old code should freeze its dependencies and environment.

In any case, current code is not yet old code. It's being told that 
eventually it will be old code. In the meantime, the people who need to 
worry are those who must support more than one version at once. Those 
people feel some pain for sure. The others who suffer are those on a 
tight budget with humongous codebases, whose life may become maintaining 
changing code all the time, because there's so much of it that they can 
hardly keep up with all the changes.

So uh, what are you gonna promise in terms of breaking? It sounds like 
the best policy is to decide what you promise in terms of backwards 
compatibility, and the OTP team has a very clear policy there: two major 
versions at least.

The risk of never changing or breaking anything, ever, is that old code 
can prevent new code from being written, if old code is so bad that new 
code can't make sense on top of it. Then you have nothing but old code. 
You've accrued enough technical debt that nobody who hasn't grown with 
the system can understand it anymore. Then your old code for your old 
system can only be used and maintained by old programmers, because the 
new programmers have gone somewhere else.