<div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Words ending with the morpheme "-eme" generally come from linguistics.</div><div class="gmail_default" style="font-family:monospace,monospace">In particular, "grapheme" can only be defined with respect to a</div><div class="gmail_default" style="font-family:monospace,monospace">particular writing system.  "In <a href="https://en.wikipedia.org/wiki/Linguistics" title="Linguistics" target="_blank">linguistics</a>, a <b>grapheme</b> is the smallest unit of a <a href="https://en.wikipedia.org/wiki/Writing_system" title="Writing system" target="_blank">writing system</a> of any given language."  For example, in English,</div><div class="gmail_default" style="font-family:monospace,monospace">"ë" is two graphemes: an "e" grapheme and a "pronounce this vowel</div><div class="gmail_default" style="font-family:monospace,monospace">separately" grapheme.  In some other European languages, "e" and "ë" are</div><div class="gmail_default" style="font-family:monospace,monospace">quite separate letters.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">What Hugo Mills described is not a grapheme but a grapheme *cluster*.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">We have code unit, code point, glyph, character, grapheme, grapheme</div><div class="gmail_default" style="font-family:monospace,monospace">cluster, and a bunch of other terms that all mean pretty much the same thing</div><div class="gmail_default" style="font-family:monospace,monospace">in ASCII or ISO 8859, but when you make a serious attempt to encode</div><div class="gmail_default" style="font-family:monospace,monospace">all the scripts anyone wants to use on a computer, things get</div><div class="gmail_default" 
style="font-family:monospace,monospace">horribly complicated.  And they get complicated in *language-specific*</div><div class="gmail_default" style="font-family:monospace,monospace">ways.  (Case conversion is one example.  You can't really do case conversion in</div><div class="gmail_default" style="font-family:monospace,monospace">Unicode without knowing which language you are dealing with: Turkish, for</div><div class="gmail_default" style="font-family:monospace,monospace">instance, pairs dotted "i" with "İ" and dotless "ı" with "I".)  This</div><div class="gmail_default" style="font-family:monospace,monospace">always *was* complicated in the real world, but people in Western</div><div class="gmail_default" style="font-family:monospace,monospace">Europe and the Americas were mostly able to ignore it.  (Things got</div><div class="gmail_default" style="font-family:monospace,monospace">somewhat complicated in NZ, where the indigenous language, Māori, uses a</div><div class="gmail_default" style="font-family:monospace,monospace">Latin-based script with macrons and where "wh" and "ng" count as single</div><div class="gmail_default" style="font-family:monospace,monospace">letters.)</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">Curiously, in the Unicode 12 standard, "grapheme" is not in the index,</div><div class="gmail_default" style="font-family:monospace,monospace">but "grapheme base", "grapheme cluster", and "grapheme extender", for</div><div class="gmail_default" style="font-family:monospace,monospace">example, are.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">I suspect that the word "grapheme", precisely because it is a</div><div class="gmail_default" style="font-family:monospace,monospace">language-dependent technical term with some surprising twists,</div><div class="gmail_default" style="font-family:monospace,monospace">may not be a good word to use here.</div><div class="gmail_default" 
style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">"Lexeme" is, if anything, worse.  "A <b>lexeme</b> is a unit of <a href="https://en.wikipedia.org/wiki/Lexical_semantics" title="Lexical semantics" target="_blank">lexical</a> meaning that</div><div class="gmail_default" style="font-family:monospace,monospace">underlies a set of words that are related through inflection. It is a basic</div><div class="gmail_default" style="font-family:monospace,monospace">abstract unit of meaning, a <a href="https://en.wikipedia.org/wiki/Emic_unit" title="Emic unit" target="_blank">unit</a> of <a href="https://en.wikipedia.org/wiki/Morphology_(linguistics)" title="Morphology (linguistics)" target="_blank">morphological</a> <a href="https://en.wikipedia.org/wiki/Semantic_analysis_(linguistics)" title="Semantic analysis (linguistics)" target="_blank">analysis</a> in <a href="https://en.wikipedia.org/wiki/Linguistics" title="Linguistics" target="_blank">linguistics</a> that</div><div class="gmail_default" style="font-family:monospace,monospace">roughly corresponds to a set of forms taken by a single root <a href="https://en.wikipedia.org/wiki/Word" title="Word" target="_blank">word</a>."  That is</div><div class="gmail_default" style="font-family:monospace,monospace">NOT what it means here.  In computing, it basically means "token".  But what</div><div class="gmail_default" style="font-family:monospace,monospace">*does* it mean?  In "Now we see it, now we don't.", are there two lexemes</div><div class="gmail_default" style="font-family:monospace,monospace">spelled "we", or is there one "lexeme" with two occurrences?  (If you ever</div><div class="gmail_default" style="font-family:monospace,monospace">meet two linguists in a bar who don't know each other, try asking them what</div><div class="gmail_default" style="font-family:monospace,monospace">a "word" is.  
There are at least four different meanings.)</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">"Token" has the merit of coming from one half of the type/token distinction.</div><div class="gmail_default" style="font-family:monospace,monospace">In fact, that's *WHY* they are called tokens.  In "Now we see it, now we</div><div class="gmail_default" style="font-family:monospace,monospace">don't", there is ONE word type, "we", which has TWO tokens.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">So seriously, as someone who has been reading academic linguistics for</div><div class="gmail_default" style="font-family:monospace,monospace">several decades and has spent more time trying to understand Unicode than</div><div class="gmail_default" style="font-family:monospace,monospace">is compatible with sanity, I think the OP's objection carries weight.</div><div class="gmail_default" style="font-family:monospace,monospace">(I said I've been *reading* the stuff.  That's not always the same as</div><div class="gmail_default" style="font-family:monospace,monospace">*understanding* it, and I certainly couldn't *write* like a linguist.)</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 7 May 2019 at 19:56, Hugo Mills <<a href="mailto:hugo@carfax.org.uk" target="_blank">hugo@carfax.org.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, May 06, 2019 at 05:40:48PM -0400, <a href="mailto:lloyd@writersglen.com" target="_blank">lloyd@writersglen.com</a> wrote:<br>

> Hi,<br>

> <br>

> This has come up before with various work-arounds suggested. Apologies for this old man's rant, but every time I run across the impending death of string:tokens/2 to the glory of string:lexemes/2, my blood pressure rises.<br>

> <br>

> I HATE IT. I HATE IT. I HATE IT, not least because the terms lexeme and grapheme are ugly inside-baseball words. Reading the docs, I have to do a Google search to understand what these obscure terms are referring to -- precious time wasted. And in my waning years, I don't have time to waste.<br>

> <br>

> Even my spell-checker doesn't recognize them.<br>

> <br>

> I get the desirability of welcoming Unicode into Erlang. But can't we come up with friendlier nomenclature or, at least, revise the docs so they don't sound like they were copied and pasted out of an academic linguistics journal? <br>

<br>

<br>

   Most of the other words you might want to use are already in use<br>

for other things. Modern (computer) representation of writing systems<br>

is complicated, and there aren't enough words to go round the existing<br>

concepts, particularly words that don't already carry well-known definitions<br>

that are either misleading or overly narrow -- see my comment on "letters", below.<br>

<br>

   For the two particular words you're complaining of here, I think of<br>

them thus:<br>

<br>

   graphemes, like graphology(*), are to do with the way that<br>

   something's written on the page -- the shape and composition of the<br>

   symbols. It's essentially a letter plus all of its diacritics (but<br>

   it's not defined as such, because there are some graphemes that are<br>

   ligatures of two or more letters, and some languages where each<br>

   grapheme is a word in its own right).<br>

<br>

   lexemes, like a lexicon, are to do with words, and are therefore<br>

   groups of (certain kinds of) graphemes.<br>

<br>

   Hugo.<br>

<br>

(*) For all that it's unsubstantiated in its psychometric claims.<br>

<br>

-- <br>

Hugo Mills             | I can't foretell the future, I just work there.<br>

hugo@... <a href="http://carfax.org.uk" rel="noreferrer" target="_blank">carfax.org.uk</a> |<br>

<a href="http://carfax.org.uk/" rel="noreferrer" target="_blank">http://carfax.org.uk/</a>  |<br>

PGP: E2AB1DE4          |                                            The Doctor<br>

_______________________________________________<br>

erlang-questions mailing list<br>

<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

</blockquote></div>
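The reply above says that code unit, code point and grapheme cluster are "pretty much the same thing" in ASCII but come apart for scripts like the one containing "ë". A minimal Python sketch can make that concrete (Python rather than Erlang; nothing here is taken from Erlang's string module). The same visible "ë" can be stored as one code point (U+00EB) or as two (U+0065 followed by U+0308 COMBINING DIAERESIS); the two forms need different numbers of UTF-8 and UTF-16 code units and compare equal only after normalization. The names PRECOMPOSED, DECOMPOSED, unit_counts and approx_grapheme_clusters are hypothetical, and approx_grapheme_clusters is a deliberately crude stand-in: it only glues combining marks onto the preceding character, which falls well short of the extended grapheme cluster rules in Unicode Standard Annex #29.

```python
from __future__ import annotations

import unicodedata

# Two encodings of what a reader sees as the same "ë":
PRECOMPOSED = "\u00eb"  # U+00EB LATIN SMALL LETTER E WITH DIAERESIS
DECOMPOSED = "e\u0308"  # U+0065 LATIN SMALL LETTER E + U+0308 COMBINING DIAERESIS


def unit_counts(s: str) -> dict[str, int]:
    """Count code points and the code units needed in UTF-8 and UTF-16."""
    return {
        "code_points": len(s),                                # a Python str is a sequence of code points
        "utf8_code_units": len(s.encode("utf-8")),            # one code unit per byte
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,  # two bytes per unit, no BOM
    }


def approx_grapheme_clusters(s: str) -> list[str]:
    """Hypothetical, crude clustering: attach each combining mark (Unicode
    category M*) to the preceding character. This is NOT the UAX #29
    algorithm; it ignores CR LF, Hangul jamo sequences, emoji ZWJ
    sequences, prepended characters and more."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    for name, s in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(name, unit_counts(s), len(approx_grapheme_clusters(s)))
    print(PRECOMPOSED == DECOMPOSED)                                # False
    print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)  # True
```

Both spellings come out as a single cluster here even though they differ in code points and code units, which is why a string library that counts "characters" has to say which of these units it means. A real program should use a proper UAX #29 segmenter rather than this shortcut.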
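The type/token distinction the reply leans on ("ONE word type "we" which has TWO tokens") is easy to demonstrate. What follows is a minimal Python sketch under stated assumptions: word_tokens and word_types are hypothetical helpers, and the tokenizer is deliberately naive. It treats a run of letters, optionally with one internal apostrophe as in "don't", as a word, and case-folds it so that "Now" and "now" count as the same type. The list of matches gives the tokens; the distinct entries give the types.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, naive word pattern for illustration only: [^\W\d_] means
# "word character that is not a digit or underscore", which in a Python 3
# str pattern is roughly "a letter". One internal apostrophe is allowed.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def word_tokens(text: str) -> list[str]:
    """Return every word occurrence (token), case-folded so "Now" == "now"."""
    return [m.group(0).casefold() for m in WORD.finditer(text)]


def word_types(text: str) -> Counter[str]:
    """Map each distinct word (type) to the number of its tokens."""
    return Counter(word_tokens(text))


if __name__ == "__main__":
    sentence = "Now we see it, now we don't."
    tokens = word_tokens(sentence)
    types = word_types(sentence)
    print(len(tokens))  # 7 tokens
    print(len(types))   # 5 types: now, we, see, it, don't
    print(types["we"])  # the one type "we" has 2 tokens
```

A function that splits a string on separators returns occurrences, so "we" shows up twice in its result; in type/token terms, what it hands back is a list of tokens.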