[erlang-questions] Charset conversion / stylistic question.

Wed Apr 18 23:19:18 CEST 2007

> map_1(_F, <<>>) -> [];
> map_1(F, Bin) ->
>   {C, Rest} = F(Bin),
>   [C | map_1(F, Rest)].

Just to make sure I understand. The above example isn't tail recursive
either, correct? The [|] list construction gets executed after the
recursive call to map_1 ...
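
For comparison, here is a rough sketch of what a tail-recursive variant might look like. It assumes F returns {Char, Rest} as in the quoted map_1; the module name map_acc and the helper utf8_char/1 are made up for illustration. The result is accumulated in reverse and flipped once at the end with lists:reverse/1.

```erlang
-module(map_acc).
-export([map/2, utf8_char/1]).

%% Tail-recursive map over a binary. F takes the remaining binary and
%% returns {Char, Rest}, so it may consume one or more bytes per call.
map(F, Bin) when is_function(F, 1), is_binary(Bin) ->
    map(F, Bin, []).

%% The empty-binary clause comes first so the recursion terminates.
map(_F, <<>>, Acc) ->
    lists:reverse(Acc);
map(F, Bin, Acc) ->
    {C, Rest} = F(Bin),
    map(F, Rest, [C | Acc]).   % last call: nothing is left to do afterwards

%% Hypothetical example fun that consumes a variable number of bytes
%% (1 to 4) per call by decoding one UTF-8 code point.
utf8_char(<<C/utf8, Rest/binary>>) ->
    {C, Rest}.
```

Whether this is actually faster than the body-recursive version is something I would measure rather than assume.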

So here is what I've changed.

The binary map now just converts the binary to a list and hands it to lists:map/2:

  %% A single clause suffices: <<B/binary>> also matches <<>>, which
  %% made the separate empty-binary clause unreachable.
  map (Function, B) when is_binary(B) ->
      lists:map(Function, binary_to_list(B)).
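
Another way to write the same map, sketched under the assumption that the Erlang/OTP release in use supports binary generators in list comprehensions (the module name bin_map is hypothetical):

```erlang
-module(bin_map).
-export([map/2]).

%% Apply Function to every byte of Bin without first building the
%% intermediate list that binary_to_list/1 would create.
map(Function, Bin) when is_function(Function, 1), is_binary(Bin) ->
    [Function(Byte) || <<Byte>> <= Bin].
```

For an empty binary the comprehension simply yields [], so no special case is needed.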

I've moved all the conversion functions into a separate module, so that
I can have one module per charset later. Still, one clause per byte value:
  ...
  cp037_to_iso_8859_1 (129) -> 97; % a
  cp037_to_iso_8859_1 (130) -> 98; % b
  cp037_to_iso_8859_1 (131) -> 99; % c
  ...

seems sort of inefficient, especially considering the size of the
source to convert UCS :)
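
One possible alternative to a clause per byte is a 256-byte lookup table. The sketch below is only an illustration: the module name table_sketch and both function names are made up, and the identity default for unlisted bytes is an assumption of the sketch, not a property of cp037.

```erlang
-module(table_sketch).
-export([table/1, lookup/2]).

%% Build a 256-byte lookup table from a list of {From, To} pairs.
%% Bytes with no entry map to themselves; that default is an
%% assumption of this sketch, not something a real charset guarantees.
table(Pairs) ->
    list_to_binary([proplists:get_value(B, Pairs, B)
                    || B <- lists:seq(0, 255)]).

%% Translate one byte by indexing into the table (positions are zero-based).
lookup(Table, Byte) when byte_size(Table) =:= 256,
                         is_integer(Byte), Byte >= 0, Byte =< 255 ->
    binary:at(Table, Byte).
```

With only the three pairs shown above, table_sketch:table([{129,97},{130,98},{131,99}]) gives a table in which lookup(Table, 129) is 97. A full charset would need all 256 entries filled in from a real mapping.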

Finally, in the main module, I implement the `convert` function like this:

convert({ebcdic, iso_8859_1}, <<Binary/binary>>) ->
	map(fun cp037:cp037_to_iso_8859_1/1, Binary);

convert({iso_8859_1, ebcdic}, String) ->
	L = lists:map(fun cp037:iso_8859_1_to_cp037/1, String),
	list_to_binary(L);

convert({From, To}, _) -> {unsupported_encodings, From, To}.

thinking that a non-native (non iso-8859-1) string would usually show
up as a binary and 8859-1 strings are most useful as lists. Anyway,
I'm just sort of thinking out loud. I should start writing the actual
code to see how the interface to `convert` should look.
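
One shape the interface could take, purely as a sketch: every clause returns a tagged tuple, so a caller can tell a result apart from an unsupported pair. The module name convert_sketch, the {ok, _} / {error, _} tagging, and the local stand-in functions are all assumptions for illustration; only the three letters shown earlier are mapped.

```erlang
-module(convert_sketch).
-export([convert/2]).

%% Hypothetical interface: a {From, To} pair selects the conversion and
%% every clause returns a tagged tuple.
convert({ebcdic, iso_8859_1}, Bin) when is_binary(Bin) ->
    {ok, [to_latin1(B) || B <- binary_to_list(Bin)]};
convert({iso_8859_1, ebcdic}, String) when is_list(String) ->
    {ok, list_to_binary([to_cp037(C) || C <- String])};
convert({From, To}, _) ->
    {error, {unsupported_encodings, From, To}}.

%% Stand-ins for the cp037 module; only three letters are covered, so
%% any other byte raises function_clause.
to_latin1(129) -> 97;  % a
to_latin1(130) -> 98;  % b
to_latin1(131) -> 99.  % c

to_cp037(97) -> 129;   % a
to_cp037(98) -> 130;   % b
to_cp037(99) -> 131.   % c
```

A caller would then match on the tag, e.g. {ok, Str} = convert_sketch:convert({ebcdic, iso_8859_1}, <<129,130,131>>).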

Thanks for all the help,
   -tim

>
> where e.g. F could call:
>
> cp037_to_iso8859_1 (<<129, Rest/binary>>) -> {97,Rest}; % a
> ...
>
>  Then, your conversion fun could select as many bytes
> as it needs to.
>
> BR,
> Ulf W
>
>