[erlang-questions] Strings - deprecated functions

Anthony Ramine n.oxyde@REDACTED
Thu Nov 23 13:20:37 CET 2017


> On 22 Nov 2017, at 20:43, lloyd@REDACTED wrote:
> 
> Dear Gods of Erlang,
>  
> "This module has been reworked in Erlang/OTP 20 to handle unicode:chardata() and operate on grapheme clusters. The old functions that only work on Latin-1 lists as input are still available but should not be used. They will be deprecated in Erlang/OTP 21."
>  
> I'm sorry. I've brought up this issue before and got lots of push back.
>  
> But every time I look up tried and true and long-used string functions to find that they are deprecated and will be dropped in future Erlang releases my blood pressure soars. Both my wife and my doctor tell me that at my age this is a dangerous thing.
>  
> I do understand the importance and necessity of Unicode. And applaud the addition of Unicode functions.
>  
> But the deprecated string functions have a long history. The English language and Latin-1 characters are widely used around the world. 

You do know that Latin-1 cannot be used to represent all English words, right?

> Yes, it should be easy for programmers to translate code from one user language to another. But I'm not convinced that the Gods of Erlang have found the optimal solution by dropping all Latin-1 string functions.
>  
> My particular application is directed toward English speakers. So, until further notice, I have no use for Unicode.

Damn, I hope your users will never want to tell their friends how delicious the hors-d'œuvre they ate yesterday was.
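
To make that concrete, here is a minimal sketch, assuming OTP 20 or later. The module name latin1_demo and the function demo/0 are made up for illustration. It asks unicode:characters_to_binary/3 to encode two words as Latin-1: "café" fits, while the "œ" (U+0153) in "hors-d'œuvre" does not.

```erlang
-module(latin1_demo).
-export([demo/0]).

demo() ->
    %% \x{153} is U+0153, the "œ" ligature. It lies above 255, so it has
    %% no Latin-1 encoding and the conversion reports an error tuple.
    Word = "hors-d'\x{153}uvre",
    {error, _Encoded, _Rest} =
        unicode:characters_to_binary(Word, unicode, latin1),
    %% \x{E9} is "é" (U+00E9). It is below 256, so it fits in Latin-1
    %% and is encoded as the single byte 16#E9.
    <<"caf", 16#E9>> =
        unicode:characters_to_binary("caf\x{E9}", unicode, latin1),
    ok.
```

Calling latin1_demo:demo() should return ok if both matches hold.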

> I don't want to sound like a nationalist pig, but I think dropping the Latin-1 string functions from future Erlang releases is a BIG mistake.
>  
> I look up tokens/2, a function that I use fairly frequently, and I see that it's deprecated. I look up the suggested replacement and I see lexemes/2.
>  
> So I ask, what the ... is a lexeme? I look it up in Merriam-Webster and I see that a lexeme is  "a meaningful linguistic unit." 
>  
> Meaning what? I just want to turn "this and that" into "This And That."
>  
> I read further in the Erlang docs and I see "grapheme cluster." WHAT THE ... IS A GRAPHEME CLUSTER?
>  
> I look up "grapheme" in Merriam-Webster. Oh it is now all so clear: "a unit of a writing system."
>  
> Ah yes, grapheme is defined in the docs. But I have to read and re-read the definition to understand what the Gods of Erlang mean by a "grapheme cluster." And I'm still not sure I get it.
>  
> It sounds like someone took a linguistics class and is trying to show off.
>  
> But now I've spent 30 minutes--- time that I don't have to waste trying to figure out how to do a simple manipulation of "this and that." Recurse the next time I want to look up a string function in the Erlang docs.

IMO the functions should have been named according to "grapheme cluster", not "lexeme".
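
For what it's worth, the "this and that" case is short with the new functions. The following is only a sketch, assuming OTP 20 or later; the module name title_demo and the helper title_words/1 are made up for illustration and are not part of the string module. It also shows what a grapheme cluster means in practice: "e" followed by a combining accent is two code points, but string:length/1 counts it as one.

```erlang
-module(title_demo).
-export([title_words/1, demo/0]).

%% Split on spaces with string:lexemes/2, capitalise each word with
%% string:titlecase/1, and join the words back with single spaces.
title_words(Text) ->
    Words = string:lexemes(Text, " "),
    lists:flatten(lists:join(" ", [string:titlecase(W) || W <- Words])).

demo() ->
    "This And That" = title_words("this and that"),
    %% "e" followed by U+0301 (combining acute accent) is two code
    %% points but a single grapheme cluster.
    2 = length("e\x{301}"),
    1 = string:length("e\x{301}"),
    ok.
```

Calling title_demo:demo() should return ok. Note that title_words/1 collapses runs of spaces, because string:lexemes/2 drops empty lexemes.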

> SOLUTION
>  
> Keep the Latin-1 string functions. Put them in a separate library if necessary. Or put the new Unicode functions in a separate library. But don't arbitrarily drop them.
>  
> Some folks have suggested that I maintain my own library of the deprecated Latin1 functions. But why should I have to do that? How does that help other folks with the same issue?

The issue is that you want to keep using Latin-1 instead of switching to Unicode, which will benefit even your English users. (Which Latin-1, by the way? You do know there are at least two of them? Do you know which one Erlang uses? Beware, that's a tricky question.)

> Bottom line: please please please do not drop the existing Latin-1 string functions.
>  
> Please don't.
>  
> Best wishes,
>  
> LRP


