[erlang-questions] file:read_file an UTF-8 encoded file
Ivan Uemlianin
ivan@REDACTED
Thu Jun 26 18:20:26 CEST 2014
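Could you use file:open/2 with the encoding option?
{ok, Pid} = file:open(F, [read, {encoding, utf8}]).
Pid is an IoDevice, which behaves like a cursor into the file. Then
L = io:get_line(Pid, x).
L is an ordinary Erlang "string", i.e., a list of integers (here, Unicode code points).
If the file starts with a BOM, it shows up as the first code point, 65279 (U+FEFF). You could have a maybe_strip_bom/1 function, e.g. for data in UTF-8:
maybe_strip_bom([65279|T]) -> T;
maybe_strip_bom(T) -> T.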
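Putting the pieces above together, here is a minimal sketch, not a definitive implementation. The module name bom_sketch and the functions read_first_line/1 and strip_bom_binary/1 are made up for illustration. The binary variant assumes you want to keep using file:read_file/1 and simply drop the three UTF-8 BOM bytes (EF BB BF) yourself.

```erlang
-module(bom_sketch).
-export([read_first_line/1, maybe_strip_bom/1, strip_bom_binary/1]).

%% Open File as UTF-8 and return its first line as a list of code points,
%% with a leading BOM removed. Passes through eof or {error, Reason}.
read_first_line(File) ->
    {ok, Dev} = file:open(File, [read, {encoding, utf8}]),
    try io:get_line(Dev, '') of
        Line when is_list(Line) -> maybe_strip_bom(Line);
        Other -> Other
    after
        file:close(Dev)
    end.

%% Drop a leading U+FEFF (65279) from a list of code points.
maybe_strip_bom([65279 | T]) -> T;
maybe_strip_bom(T) -> T.

%% Drop the three UTF-8 BOM bytes from a binary, e.g. the one
%% returned by file:read_file/1.
strip_bom_binary(<<16#EF, 16#BB, 16#BF, Rest/binary>>) -> Rest;
strip_bom_binary(Bin) when is_binary(Bin) -> Bin.
```

For example, {ok, Bin} = file:read_file(F), Text = bom_sketch:strip_bom_binary(Bin) leaves Text as the UTF-8 bytes without the mark.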
Ivan
On 26/06/2014 16:58, Camille Troillard wrote:
> Hi list,
>
> This is a simple question, yet I haven’t found the right answer.
> Using Erlang/OTP 16B03-2:
>
> I read a file using file:read_file("my_utf8_file.txt").
>
> The result binary contains the 3 BOM bytes. I was not expecting that. Since this is such a high-level call, isn’t file:read_file/1 supposed to get rid of the byte order mark?
>
> So, how do you professional Erlang users read the contents of a UTF-8 encoded file on Erlang 16B03?
>
>
> All the best,
> Cam
--
============================================================
Ivan A. Uemlianin PhD
Llaisdy
Speech Technology Research and Development
ivan@REDACTED
www.llaisdy.com
llaisdy.wordpress.com
github.com/llaisdy
www.linkedin.com/in/ivanuemlianin
festina lente
============================================================