[erlang-questions] file:read_file an UTF-8 encoded file

Ivan Uemlianin ivan@REDACTED
Thu Jun 26 18:20:26 CEST 2014

Could you use

   {ok, Pid} = file:open(F, [read, {encoding, utf8}]).

Pid is an IoDevice, which acts like a cursor over the file. Then

   L = io:get_line(Pid, x).

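L is an ordinary Erlang "string", i.e., a list of integers. Because the file was opened with {encoding, utf8}, those integers are Unicode code points rather than raw bytes.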
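As a rough sketch of the approach above (not a definitive implementation), the following module reads every line of a UTF-8 file with io:get_line/2 until eof. The module name utf8_lines and the function read_lines/1 are hypothetical names chosen for illustration. Note that a BOM, if the file has one, is not removed by the io server: it stays at the start of the first line as the code point 65279.

```erlang
-module(utf8_lines).
-export([read_lines/1]).

%% Hypothetical helper: read all lines of a UTF-8 encoded file.
%% Each line is a list of Unicode code points, including the trailing "\n".
read_lines(FileName) ->
    {ok, Dev} = file:open(FileName, [read, {encoding, utf8}]),
    try
        collect(Dev, [])
    after
        %% Always close the IoDevice, even if reading fails.
        ok = file:close(Dev)
    end.

%% Keep calling io:get_line/2 until end of file; the prompt is ignored for files.
collect(Dev, Acc) ->
    case io:get_line(Dev, '') of
        eof             -> lists:reverse(Acc);
        {error, Reason} -> erlang:error(Reason);
        Line            -> collect(Dev, [Line | Acc])
    end.
```

For a file containing a BOM followed by "hé" and "b" on two lines, read_lines/1 would return [[65279, $h, 233, $\n], "b\n"], so the BOM still needs to be dropped separately.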

You could have a maybe_strip_bom/1 function, e.g., for data in UTF-8, where the BOM decodes to the single code point 65279 (U+FEFF):

   maybe_strip_bom([65279|T]) -> T;
   maybe_strip_bom(T)         -> T.
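Since the original question used file:read_file/1, which returns raw bytes, the BOM can also be stripped at the binary level before decoding. This is a minimal sketch under that assumption; the module utf8_file and the functions strip_utf8_bom/1 and read_utf8_file/1 are hypothetical. It matches the three BOM bytes (EF BB BF) in a binary pattern and then decodes the rest with unicode:characters_to_list/2.

```erlang
-module(utf8_file).
-export([strip_utf8_bom/1, read_utf8_file/1]).

%% Hypothetical helper: drop a leading UTF-8 BOM (bytes EF BB BF), if present.
strip_utf8_bom(<<16#EF, 16#BB, 16#BF, Rest/binary>>) -> Rest;
strip_utf8_bom(Bin) when is_binary(Bin)            -> Bin.

%% Read the whole file with file:read_file/1, remove any BOM, and decode
%% the UTF-8 bytes into a list of Unicode code points.
read_utf8_file(FileName) ->
    {ok, Bin} = file:read_file(FileName),
    case unicode:characters_to_list(strip_utf8_bom(Bin), utf8) of
        Chars when is_list(Chars) -> Chars;
        {error, _, _} = Err       -> erlang:error({bad_utf8, Err});
        {incomplete, _, _} = Inc  -> erlang:error({bad_utf8, Inc})
    end.
```

Working on the binary keeps the file:read_file/1 call the questioner was already using; if the contents should stay a binary, the result of strip_utf8_bom/1 can be used directly without the decoding step.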


On 26/06/2014 16:58, Camille Troillard wrote:
> Hi list,
> This is a simple question, yet I haven’t found the right answer.
> Using Erlang/OTP 16B03-2:
> I read a file using file:read_file("my_utf8_file.txt").
> The resulting binary contains the 3 BOM bytes. I was not expecting that. Since this is such a high-level call, isn’t file:read_file/1 supposed to get rid of the byte order mark?
> So, how do you professional Erlang users read the contents of a UTF-8 encoded file on Erlang 16B03?
> All the best,
> Cam
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

Ivan A. Uemlianin PhD
Speech Technology Research and Development


                         festina lente
