[erlang-questions] Charset conversion / stylistic question.

Ulf Wiger <>
Tue Apr 17 22:38:24 CEST 2007


Hi Tim,

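There is a problem in your map/2 function.
The recursive call sits inside [Function(A)|...] and a call to
list_to_binary/1, so it is not a tail call and every iteration
needs its own stack frame.
Also, you will get many unnecessary calls to list_to_binary()
(one per iteration), and each of them copies the already
converted rest of the binary, so the total amount of copying
grows quadratically with the size of the input.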

One could instead write it like this:

%% Build the whole list first, then convert it to a binary once.
map(F, Bin) ->
  list_to_binary(map_1(F, Bin)).

map_1(F, <<A:8, Rest/binary>>) -> [F(A)|map_1(F, Rest)];
map_1(_, <<>>) -> [].
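For comparison, here is a minimal tail-recursive sketch of the same idea. The module name bin_map_acc and the helper map_acc/3 are hypothetical. It collects the converted bytes in reverse order in an accumulator, then reverses once and calls list_to_binary/1 once at the end. Whether it is actually faster than the body-recursive map_1/2 above depends on the input size and is best measured rather than assumed.

```erlang
-module(bin_map_acc).
-export([map/2]).

%% Entry point: start with an empty accumulator.
map(F, Bin) when is_function(F, 1), is_binary(Bin) ->
    map_acc(F, Bin, []).

%% Each clause either returns or ends in a call to map_acc/3 itself,
%% so the recursion runs in constant stack space.
map_acc(F, <<A:8, Rest/binary>>, Acc) ->
    map_acc(F, Rest, [F(A) | Acc]);
map_acc(_F, <<>>, Acc) ->
    %% Acc holds the results last-first; restore order, convert once.
    list_to_binary(lists:reverse(Acc)).
```

For example, bin_map_acc:map(fun(C) -> C - 32 end, <<"abc">>) returns <<"ABC">>.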

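In your map, the terminating case returns <<>>. That is the
right result for a call to map(F, <<>>), but at the end of a
longer iteration it turns the list you build into an improper
list, e.g. [$c|<<>>] at the innermost level. This is tolerated
by list_to_binary(), since an iolist may end in a binary, but
you should be aware that it is happening, because it can bite
you in other situations: length/1 and most functions in the
lists module fail on an improper list.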
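A small sketch of the difference (the module improper_demo and its functions are hypothetical, written only to illustrate the point): the same improper list is accepted by list_to_binary/1 but makes length/1 raise badarg.

```erlang
-module(improper_demo).
-export([run/0, check/1]).

%% Mimics the innermost level of the original map/2: a character
%% followed by a binary tail (<<>>) instead of the empty list [].
run() ->
    check([$c | <<>>]).

check(L) ->
    %% list_to_binary/1 accepts iolists, whose tail may be a binary.
    Bin = list_to_binary(L),
    %% length/1 needs a proper list, so it raises badarg for [$c | <<>>].
    Len = try length(L) of
              N -> {ok, N}
          catch
              error:badarg -> {error, badarg}
          end,
    {Bin, Len}.
```

Here improper_demo:run() gives {<<"c">>, {error, badarg}}, while a proper list such as "ab" gives {<<"ab">>, {ok, 2}}.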

In OTP R11B-4, there is support for binary comprehensions,
even though it's still experimental. With it, your function
could be written in this fashion (shown here in the syntax that
became official in R12B; the experimental R11B-4 form may
differ slightly). Note that each element is a whole byte, so
the segments need size 8, not 1:

map(F, Bin) ->
    << <<(F(C)):8>> || <<C:8>> <= Bin >>.
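Binary comprehensions also make it easy for the conversion function to return more than one byte per input byte, which is what a multibyte target such as UTF-8 needs. A minimal sketch, assuming OTP R12B or later; the module latin1_utf8 and the helpers to_bin/1 and char_to_utf8/1 are hypothetical. Here F may return either a single byte or a binary, and ISO-8859-1 is encoded to UTF-8 by hand.

```erlang
-module(latin1_utf8).
-export([map/2, latin1_to_utf8/1]).

%% Comprehension-based map/2: F may return a byte (0..255) or a binary.
%% Wrapping each result as a binary segment lets one input byte
%% produce several output bytes.
map(F, Bin) ->
    << <<(to_bin(F(C)))/binary>> || <<C:8>> <= Bin >>.

to_bin(B) when is_binary(B) -> B;
to_bin(I) when is_integer(I), I >= 0, I =< 255 -> <<I>>.

%% Hand-written UTF-8 encoding of one ISO-8859-1 code point.
char_to_utf8(C) when C < 16#80 ->
    <<C>>;                                   % ASCII stays one byte
char_to_utf8(C) ->
    %% Two-byte form: 110xxxxx 10xxxxxx.
    <<(16#C0 bor (C bsr 6)), (16#80 bor (C band 16#3F))>>.

latin1_to_utf8(Bin) ->
    map(fun char_to_utf8/1, Bin).
```

For instance, latin1_to_utf8(<<233>>) (Latin-1 "é") yields <<195,169>>. Later OTP releases also provide unicode:characters_to_binary/3, which performs this particular conversion directly.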

BR,
Ulf W

2007/4/17, Tim Becker <>:
>
> Hi all,
>
> I'm just starting out and still a little bit intimidated about how to
> do things correctly in erlang and would like some advice. (In other
> words, please bear with me.) I'd like to write some routines to recode
> charsets and I'm not sure what would be the best way to go about it.
> What I've come up with so far is:
>
>   map (Function, <<A:8, Rest/binary>>) -> list_to_binary([Function(A)|map (Function, Rest)]);
>   map (_Function, <<>>) -> <<>>.
>
>   convert({ebcdic, iso_8859_1}, <<Binary/binary>>) -> map(fun cp037_to_iso8859_1/1, Binary);
>   (...)
>   convert({From, To}, <<_Binary/binary>>) -> {unsupported_encoding, From, To}.
>
>   cp037_to_iso8859_1 (0) -> 0;
>   (...)
>   cp037_to_iso8859_1 (129) -> 97; % a
>   cp037_to_iso8859_1 (130) -> 98; % b
>   cp037_to_iso8859_1 (131) -> 99; % c
>   cp037_to_iso8859_1 (132) -> 100; % d
>   cp037_to_iso8859_1 (133) -> 101; % e
>   (...)
>   cp037_to_iso8859_1 (255) -> 159.
>
>
> which works, so I'm relieved. But I'm not sure about the most
> appropriate way to write the actual conversion functions. Using
> pattern matching for each conversion like above, or just having one
> function body with a bunch of if's in it or a case statement...
>
> From the languages I'm used to, I'd probably use a bunch of arrays and
> have the value of the `from` charset be an index into the conversion
> array. This would be possible as well, though it wouldn't be possible
> to convert multibyte charsets like UTF-8...
>
> Thanks in advance and sorry about the newbie question,
>    -tim
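Tim's array idea maps naturally onto an Erlang tuple: build a 256-element tuple once and index it with element/2, which gives constant-time lookup. A minimal sketch under stated assumptions: the module table_convert and its functions are hypothetical, only the a–c pairs from the message are used, and any byte without an entry maps to itself, which is a placeholder choice (a real CP037 table would list all 256 values).

```erlang
-module(table_convert).
-export([make_table/1, convert/2]).

%% Build a 256-entry lookup tuple from a list of {From, To} byte pairs.
%% Bytes without an entry map to themselves (placeholder behaviour).
make_table(Pairs) ->
    list_to_tuple([lookup(B, Pairs) || B <- lists:seq(0, 255)]).

lookup(B, Pairs) ->
    case lists:keyfind(B, 1, Pairs) of
        {B, To} -> To;
        false   -> B
    end.

%% element/2 is 1-based, hence the + 1 on the byte value.
convert(Table, Bin) ->
    << <<(element(C + 1, Table))>> || <<C:8>> <= Bin >>.
```

Usage: T = table_convert:make_table([{129,97},{130,98},{131,99}]) followed by table_convert:convert(T, <<129,130,131>>) returns <<"abc">>. If the table entries were binaries instead of integers, the same approach could emit multibyte output, as in the comprehension-based sketch for UTF-8.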

