[erlang-questions] Charset conversion / stylistic question.

Erik Stenman <>
Thu Apr 19 09:20:43 CEST 2007


Tim Becker wrote:
>> map_1(_F, <<>>) -> [];
>> map_1(F, Bin) ->
>>   {C, Rest} = F(Bin),
>>   [C | map_1(F, Rest)].
>>     
>
> Just to make sure I understand. The above example isn't tail recursive
> either, correct? The [|] list construction gets executed after the
> recursive call to map_1 ...
>
> So what I've changed is:
>
> the binary map just converts the binary to a list which uses lists:map/2:
>
>   map (Function, <<B/binary>>) -> lists:map(Function, binary_to_list(B));
>   map (_Function, <<>>) -> [].
>
>
> I've moved all the conversion functions into a separate module, so I'd
> have one module per charset later. Still:
>   ...
>   cp037_to_iso_8859_1 (129) -> 97; % a
>   cp037_to_iso_8859_1 (130) -> 98; % b
>   cp037_to_iso_8859_1 (131) -> 99; % c
>   ...
> seems sort of inefficient, especially considering the size of the
> source to convert UCS :)

Depending on what the complete code looks like, this should be turned
into a jump table (at least with native compilation), so it should be
quite efficient. In the worst case it will become a binary search,
which is still probably fast enough.
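
As a minimal sketch of the clause-per-character style discussed above (the
module name cp037_sketch and the function name to_latin1 are made up for
illustration, and only the three entries quoted in this thread are filled
in), the table is simply one function clause per integer literal:

```erlang
-module(cp037_sketch).
-export([to_latin1/1]).

%% Hypothetical three-entry excerpt; a real module would have one
%% clause per byte value. The clauses match on integer literals only,
%% which is the shape the compiler can dispatch on as a table or a
%% search rather than testing the clauses one by one.
to_latin1(129) -> $a;
to_latin1(130) -> $b;
to_latin1(131) -> $c.
```

Calling cp037_sketch:to_latin1(129) returns 97. Any value without a clause
raises function_clause, so a complete table needs all 256 entries or a
catch-all clause. How the dispatch is actually implemented depends on the
compiler and runtime version; the generated code can be inspected with
erlc -S.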

The general advice still holds:
 First write a beautiful solution, then worry about performance (if
necessary).

Note that trying to optimize the code by doing something like:
 cp037_to_iso_8859_1 (Char) -> element (Char, {..., 97,...}).
will be very inefficient on old systems, since the tuple would be created
dynamically on every call. (I have heard rumors that BEAM now supports
statically allocated constants, but I am not "in the loop" any more, so
I'm not sure whether this is true. HiPE does have support for such
constants, so with HiPE compilation this could be a valid optimization.)
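
For comparison, a sketch of the tuple-lookup variant (the module name, the
TABLE macro, and the four-entry table are all hypothetical stand-ins for a
full 256-entry table). Note that element/2 is 1-based, so a 0-based
character code needs an offset of one:

```erlang
-module(tuple_lookup_sketch).
-export([lookup/1]).

%% Hypothetical 4-entry table standing in for the full 256-entry one.
-define(TABLE, {$a, $b, $c, $d}).

%% Code is a 0-based index; element/2 counts from 1, hence Code + 1.
%% The guard rejects anything outside the table instead of letting
%% element/2 fail with badarg.
lookup(Code) when is_integer(Code), Code >= 0, Code < tuple_size(?TABLE) ->
    element(Code + 1, ?TABLE).
```

Whether this is cheap depends on the point raised above: it only pays off
when the tuple is stored once as a constant rather than rebuilt on each
call, so it is worth measuring on the target system before preferring it
over plain function clauses.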


For another solution to character conversion you might want to look at iconv
in jungerl.

-- Happi
> Finally, in the main module, I implement the `convert` function like this:
>
> convert({ebcdic, iso_8859_1}, <<Binary/binary>>) ->
> 	map(fun cp037:cp037_to_iso_8859_1/1, Binary);
>
> convert({iso_8859_1, ebcdic}, String) ->
> 	L = lists:map(fun cp037:iso_8859_1_to_cp037/1, String),
> 	list_to_binary(L);
>
> convert({To, From}, _) -> {unsupported_encodings, To, From}.
>
>
> thinking that a non-native (non iso-8859-1) string would usually show
> up as a binary and 8859-1 strings are most useful as lists. Anyway,
> I'm just sort of thinking out loud. I should start writing the actual
> code to see how the interface to `convert` should look.
>
> Thanks for all the help,
>    -tim
>
>> where e.g. F could call:
>>
>> cp037_to_iso8859_1 (<<129, Rest/binary>>) -> {97,Rest}; % a
>> ...
>>
>>  Then, your conversion fun could select as many bytes
>> as it needs to.
>>
>> BR,
>> Ulf W