[erlang-questions] Strings - deprecated functions
Fred Hebert
mononcqc@REDACTED
Fri Nov 24 16:50:20 CET 2017
On 11/24, zxq9 wrote:
>On Friday, 24 November 2017 08:13:49 Eric des Courtis wrote:
>On reflection, I actually think the new string functions should have
>been rolled into a "utf8" module. Or something. And the "strings" module
>could either have had implementation adjustments that use the utf8
>utilities underneath or been left alone to deal with latin1 (but either
>way be amply documented).
>
This would be bad naming; utf8 is but one encoding of Unicode as I'm
sure you're aware. The current string module should be able to handle
utf8, utf16, utf32, *and* lists of codepoints (chardata()).
'string' is sadly the most appropriate name for this module, and for
the first time it also has the ability to really handle *anything* we
consider to be a string: lists, binaries, and mixes of both.
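To make that concrete, here is a minimal sketch (assuming OTP 20 or later, where string:length/1 and string:equal/2 accept chardata()). The module name chardata_demo and its function names are made up for illustration; only the string:* calls come from the stdlib.

```erlang
-module(chardata_demo).
-export([lengths/0, same_text/0]).

%% The text "hello" with an e-acute (U+00E9) in three chardata() shapes.
%% Escapes are used so the example does not depend on file encoding.
as_list()   -> [$h, 16#E9, $l, $l, $o].        % list of codepoints
as_binary() -> <<"h\x{E9}llo"/utf8>>.          % UTF-8 encoded binary
as_mixed()  -> [<<"h">>, 16#E9, <<"llo">>].    % binaries and codepoints mixed

%% string:length/1 counts grapheme clusters, whatever the shape.
lengths() ->
    [string:length(S) || S <- [as_list(), as_binary(), as_mixed()]].

%% string:equal/2 compares content, not representation.
same_text() ->
    string:equal(as_list(), as_binary()) andalso
        string:equal(as_binary(), as_mixed()).
```

Calling chardata_demo:lengths() should give [5,5,5] and chardata_demo:same_text() should give true, since all three values carry the same five characters.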
The problem was really that the old 'string' module was not super great
at being a 'string' module. It would have been better named as 'cstring'
or something.
Maybe it could have been 'str' instead, who knows. Then we'd get cool
conference talks saying how shitty and confusing the stdlib is because
you don't know whether to use string or str in your code!
Please let's not get inspired by 'mysql_real_escape_string', which had
to be implemented because 'mysql_escape_string' was not good enough but
people kept relying on it. It's a laughing stock and even a security
problem for everyone in that case.
>I'm the heaviest unicode string dealing guy I know. I'm SUPER happy that
>the idea of "string" has been advanced (finally!) to mean "unicode strings".
>But breakage is a thing, and Lloyd and Joe have a point.
>
I think we have to consider a thing, as someone else has mentioned: old
code is not broken. Old code keeps running. Old code is fine. It has a
new compiler warning.
Old code could be wrong, because it may be getting unicode data and
mangling it instead of doing the right thing. Old code could not even
detect that. Old code was getting passed by everyone in the world. Old
code can't even work safely on Erlang modules or the content of your
.app file anymore because that content is now UTF-8.
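As a rough illustration of that kind of silent mangling, here is a sketch comparing the old string:len/1 with the new string:length/1 (assuming OTP 20 or later, with the old functions still present). The module name old_vs_new and the map keys are hypothetical, picked only for this example.

```erlang
-module(old_vs_new).
%% string:len/1 is the old API; silence the deprecation warning if the
%% compiler in use emits one.
-compile(nowarn_deprecated_function).
-export([compare/0]).

compare() ->
    %% An e-acute written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    Decomposed = [$e, 16#301],
    %% The precomposed e-acute (U+00E9) as a UTF-8 binary, and the same
    %% data flattened to raw bytes, as old list-based code would see it.
    Utf8  = <<16#E9/utf8>>,
    Bytes = binary_to_list(Utf8),                     % [195,169]
    #{old_decomposed => string:len(Decomposed),       % 2: counts list elements
      new_decomposed => string:length(Decomposed),    % 1: one grapheme cluster
      old_bytes      => string:len(Bytes),            % 2: counts bytes
      new_utf8       => string:length(Utf8)}.         % 1: one character
```

Neither old call crashes or warns at runtime; it just returns a number that is wrong for what a user would call one character, which is the "could not even detect that" part.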
We can't know. For all we care, old code is going to break because the
world is passing it by and it's not keeping up. If old code must remain
stable in a changing world, old code must be run in its old context:
maybe stick it into the same old VM, or vendor it in along with its
build tools, artifacts, and everything it needs (be careful though, old
R15 code can no longer fetch packages or dependencies safely, since TLS
from R15 is no longer safe in the real world).
Old code can't necessarily be recompiled onto new tools, because new
tools have to address a changing world. Sad thing for old code, but if
you use old tools, maybe it will work.
Maybe one can just copy/paste the string.erl module into cstring.erl,
change the `-module` attribute, and then do a 'sed' call on their code.
If old code is expected to keep working against the existing API with no
changes, but that API is no longer right for the current world,
maybe old code should freeze its dependencies and environment.
In any case, current code is not yet old code. It's being told that
eventually it will be old code. In the meanwhile, people who need to
worry are those who need to support more than one version at once. Those
people feel some pain for sure. The other people to suffer are those on
a tight budget with a humongous codebase, whose life may consist of
maintaining changing code all the time because there's so much code they
can hardly keep up with all the changes.
So uh, what are you gonna promise in terms of breaking? It sounds like
the best policy is to pick a value of what you promise in terms of
backwards compatibility, and the OTP team has a very clear policy there.
Two major versions at least.
The risk of never changing nor breaking a thing forever is that old code
can prevent new code from being written if old code is so bad that new
code can't make sense. Then you have nothing but old code. You've
accrued enough technical debt that nobody who hasn't grown with the
system can now understand it. Then your old code for your old system
can only be used and maintained by old programmers, because the
new programmers have gone somewhere else.