[erlang-questions] Strings as Lists

Masklinn masklinn@REDACTED
Fri Feb 15 12:06:12 CET 2008


On 15 Feb 2008, at 11:27 , Richard Carlsson wrote:

> Dmitrii 'Mamut' Dimandt wrote:
>> Richard Carlsson wrote:
>>> Strings as lists is simple and flexible (i.e., if you already have lists,
>>> you don't need to add another data type). Functions that work on lists,
>>> such as append, reverse, etc., can be used directly on strings; you
>>> don't need to program in different styles if you're traversing a list
>>> or a string; etc.
>> This is only true for ASCII text ;) Non-ASCII gets screwed up badly:
>>
>> lists:reverse("text") %% gives you "txet"
>> lists:reverse("текст") %% Russian for text becomes
>> [130,209,129,209,186,208,181,208,130,209] which is clearly not what I
>> wanted :)
>
> That's because the second line is currently not a legal Erlang program.
> The tokenizer will assume that your source code is encoded using Latin-1,
> and since you are giving the compiler garbage input, it gives you garbage
> output. Basically, the compiler thinks that you wrote
> "Ñ\202ÐµÐºÑ\201Ñ\202", not "текст", and the reverse of that is indeed
> "\202Ñ\201ÑºÐµÐ\202Ñ", which is what you got (regardless of what you
> _wanted_).
>
> What Erlang needs to support non-Latin-1 languages is filters for decoding
> input and encoding output.

Yep. How extensive would the changes need to be to make the tokenizer
configurable? Something like Python, where you can declare the encoding
of your source code if you want something other than the default (which,
in Python, is ASCII)?
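
To make the byte-versus-character point concrete, here is a minimal sketch of what a decoding/encoding filter buys you. It assumes the `unicode` module that ships with later Erlang/OTP releases (it was not available when this thread started), and the module and function names (`utf8_reverse_sketch`, `reverse_bytes/1`, `reverse_chars/1`) are hypothetical, chosen only for illustration. The input is given as explicit UTF-8 bytes so the example does not depend on how the source file itself is encoded.

```erlang
-module(utf8_reverse_sketch).
-export([reverse_bytes/1, reverse_chars/1]).

%% Hypothetical helper: reverse the raw UTF-8 bytes one by one.
%% This is effectively what lists:reverse/1 did in the example above,
%% because each byte was read as a separate Latin-1 character.
reverse_bytes(Utf8) when is_binary(Utf8) ->
    lists:reverse(binary_to_list(Utf8)).

%% Hypothetical helper: decode the UTF-8 bytes into a list of code
%% points, reverse that list, and encode the result back to UTF-8.
reverse_chars(Utf8) when is_binary(Utf8) ->
    CodePoints = unicode:characters_to_list(Utf8, utf8),
    unicode:characters_to_binary(lists:reverse(CodePoints), unicode, utf8).
```

With the UTF-8 bytes of "текст", `<<209,130,208,181,208,186,209,129,209,130>>`, `reverse_bytes/1` returns the list Dmitrii saw, `[130,209,129,209,186,208,181,208,130,209]`, while `reverse_chars/1` returns `<<209,130,209,129,208,186,208,181,209,130>>`, the UTF-8 encoding of "тскет". Once the input is decoded, the plain list functions work unchanged, which is Richard's point about strings as lists.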



