[erlang-questions] string:lexemes/2 - an old man's rant

Thu May 9 14:48:05 CEST 2019

Who isn't going to expect 'split', 'tokenize'/'tokens', 'clean', 'right',
'left', 'pad'?
A Ruby programmer will recognise 'split' from that list but nothing else.
An SML programmer will recognise 'tokens' but nothing else.
A Haskell programmer will wonder whether left/right correspond to
  justifyLeft/justifyRight and if so, which way around.  'split' might be OK,
  if only I knew what you expect it to do.
An F# or C# programmer will wonder whether left/right correspond to padLeft/
  padRight and if so, which way around.  As for 'split', which of the 10
  methods by that name did you have in mind?
A PL/I programmer will expect 'left' and 'right' to correspond to LEFT and
  RIGHT (or possibly the other way around, depending on whether the focus is
  where the *string* goes or where the *padding* goes).  The others will be a
  complete mystery.
A Simula programmer won't have a clue what any of these are and will be
  disappointed by strings that don't have a movable cursor.
An OCaml programmer will hope that 'split' is related to 'split_on_char' but
  will not have any idea what the other functions are.
A Python programmer may be surprised that 'split' is actually 'tokens'.

And so it goes.  What *does* 'clean' do?
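The naming clash is visible even inside Python: `str.split` behaves like a
"split" (every separator ends a field, empty fields are kept) when given a
separator, and like "tokens" (runs of whitespace collapse, empty fields are
dropped) when called with no argument. The minimal sketch below shows both
modes next to a `tokens` helper that is purely hypothetical, written only for
illustration. It is not Erlang's `string:tokens/2` or `string:lexemes/2`,
although, per the Erlang `string` documentation, those also treat adjacent
separators as one. Unlike `lexemes/2`, it works on code points, not grapheme
clusters.

```python
def tokens(text: str, separators: str) -> list[str]:
    # Hypothetical helper, not a library function: every character in
    # `separators` is a delimiter, and runs of delimiters count as one,
    # so no empty strings appear in the result.
    # Works on code points, not Unicode grapheme clusters.
    out: list[str] = []
    field = ""
    for ch in text:
        if ch in separators:
            if field:
                out.append(field)
            field = ""
        else:
            field += ch
    if field:
        out.append(field)
    return out


# str.split with an explicit separator: "split" semantics, empties kept.
print("a,,b".split(","))          # ['a', '', 'b']
# str.split with no argument: "tokens" semantics, whitespace runs collapse.
print("  a   b ".split())         # ['a', 'b']
# The hypothetical helper, with several separator characters.
print(tokens("a,,b; c", ",; "))   # ['a', 'b', 'c']
```

Same method name, two different contracts: the choice hinges entirely on
whether an argument is passed.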

On Thu, 9 May 2019 at 02:18, <zxq9@REDACTED> wrote:

> On Wednesday, 8 May 2019 10:53:25 JST, Richard O'Keefe wrote:
> > For what it's worth, in Unicode, Line Separator and Paragraph
> > Separator are the recommended characters, with CR, LF, CR+LF,
> > and arguably NEL (U+0085) being "legacy".
> >
> > Again for what it's worth, Unicode defines an algorithm for
> > breaking text into words (tokens).
>
> I don't really mind the term "lexeme", but I've wondered why the
> existing tokens/2 function wasn't simply updated to work the way
> lexemes/2 works.
>
> If we needed a new function, it seems the name "tokenize/2" might
> have been an easier mental adjustment.
>
> But anyway, naming things is hard and... meh. For me the unicode
> enhancements are a big enough deal that I could *almost* care less
> what they are called.
>
> That said, who isn't going to open a new language's string lib and
> expect to find things called "split" "tokenize"/"tokens", "clean",
> "right", "left", "pad", etc.?
>
> -Craig