[erlang-questions] file:read_file an UTF-8 encoded file

Ivan Uemlianin ivan@REDACTED
Thu Jun 26 18:20:26 CEST 2014


Could you use

   {ok, Pid} = file:open(F,[read, {encoding, utf8}]).

Pid is an IoDevice: a handle on the open file that acts like a cursor.  Then

   L = io:get_line(Pid, x).

L is an ordinary Erlang "string", i.e., a list of integers (here, Unicode code points).
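
To read the whole file this way, you would call io:get_line/2 repeatedly until it returns eof. A minimal sketch of that loop follows; the module name utf8_lines and the functions read_lines/1 and collect/2 are made up for illustration, and error handling is kept deliberately crude:

```erlang
-module(utf8_lines).
-export([read_lines/1]).

%% Return every line of a UTF-8 encoded file as a list of strings
%% (each string is a list of Unicode code points).
read_lines(Path) ->
    {ok, Dev} = file:open(Path, [read, {encoding, utf8}]),
    try
        collect(Dev, [])
    after
        %% Close the device even if reading fails.
        file:close(Dev)
    end.

%% Accumulate lines until io:get_line/2 reports end of file.
collect(Dev, Acc) ->
    case io:get_line(Dev, "") of
        eof             -> lists:reverse(Acc);
        {error, Reason} -> error(Reason);
        Line            -> collect(Dev, [Line | Acc])
    end.
```

Each returned line keeps its trailing newline, and a leading BOM (if the file has one) still shows up as the first code point of the first line.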

You could have a maybe_strip_bom/1 function, e.g. for UTF-8 data (65279 is U+FEFF, the BOM once decoded to a code point):

   maybe_strip_bom([65279|T]) -> T;
   maybe_strip_bom(T)         -> T.
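
If you would rather keep using file:read_file/1, the same idea works on the raw binary, where the UTF-8 BOM is the three bytes EF BB BF. This is only a sketch under that assumption; the module name utf8_file and the functions read_text/1 and strip_bom/1 are hypothetical names, not part of OTP:

```erlang
-module(utf8_file).
-export([read_text/1, strip_bom/1]).

%% Read a whole file, drop a leading UTF-8 BOM if present,
%% and decode the rest into a list of Unicode code points.
read_text(Path) ->
    {ok, Bin} = file:read_file(Path),
    case unicode:characters_to_list(strip_bom(Bin), utf8) of
        Chars when is_list(Chars) -> {ok, Chars};
        {error, _, _}             -> {error, invalid_utf8};
        {incomplete, _, _}        -> {error, incomplete_utf8}
    end.

%% The UTF-8 encoding of U+FEFF is the byte sequence EF BB BF.
strip_bom(<<16#EF, 16#BB, 16#BF, Rest/binary>>) -> Rest;
strip_bom(Bin) when is_binary(Bin)              -> Bin.
```

For example, utf8_file:strip_bom(<<16#EF,16#BB,16#BF,"abc">>) gives <<"abc">>, and a binary without a BOM is returned unchanged.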

Ivan


On 26/06/2014 16:58, Camille Troillard wrote:
> Hi list,
>
> This is a simple question, yet I haven’t found the right answer.
> Using Erlang/OTP 16B03-2:
>
> I read a file using file:read_file("my_utf8_file.txt").
>
> The result binary contains the 3 BOM bytes. I was not expecting that. Since this is such a high-level call, isn’t file:read_file/1 supposed to get rid of the byte order mark?
>
> So, how do you professional Erlang users read the contents of a UTF-8 encoded file on Erlang 16B03?
>
>
> All the best,
> Cam
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>

-- 
============================================================
Ivan A. Uemlianin PhD
Llaisdy
Speech Technology Research and Development

                     ivan@REDACTED
                      www.llaisdy.com
                          llaisdy.wordpress.com
               github.com/llaisdy
                      www.linkedin.com/in/ivanuemlianin

                         festina lente
============================================================


