[erlang-questions] file:read_file an UTF-8 encoded file

Camille Troillard lists@REDACTED
Thu Jun 26 21:46:19 CEST 2014


Here is what I came up with:

% Read a file as a binary, given that its contents are UTF-8 encoded text.
% Any leading byte order mark (BOM) is dropped from the result.
read_utf8_file(Name) ->
    {ok, Binary} = file:read_file(Name),
    {_, Skip} = unicode:bom_to_encoding(Binary),
    <<_:Skip/unit:8, Contents/binary>> = Binary,
    Contents.
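
For illustration, here is a minimal sketch of the same BOM-stripping step applied to an in-memory binary, followed by decoding the remaining bytes into code points. The module name utf8_demo and the functions strip_bom/1 and demo/0 are made up for this example; only unicode:bom_to_encoding/1 and unicode:characters_to_list/2 come from OTP.

```erlang
-module(utf8_demo).
-export([strip_bom/1, demo/0]).

%% Drop a leading byte order mark, if present, from a binary.
%% unicode:bom_to_encoding/1 returns {Encoding, BomLength}; the length
%% is 0 when the binary does not start with a BOM.
strip_bom(Binary) ->
    {_Encoding, Skip} = unicode:bom_to_encoding(Binary),
    <<_:Skip/binary, Contents/binary>> = Binary,
    Contents.

%% Hypothetical sample data: the UTF-8 BOM (EF BB BF) followed by the
%% UTF-8 bytes of "café".
demo() ->
    WithBom = <<16#EF, 16#BB, 16#BF, "caf", 16#C3, 16#A9>>,
    Utf8 = strip_bom(WithBom),
    %% Decode the UTF-8 bytes into a list of Unicode code points.
    unicode:characters_to_list(Utf8, utf8).
```

Calling utf8_demo:demo() returns [99,97,102,233], the code points of "café". A binary without a BOM passes through strip_bom/1 unchanged.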

Thanks to all for your advice.



On 26 Jun 2014, at 18:23, Loïc Hoguin <essen@REDACTED> wrote:

> file:read_file/1 reads a file as a sequence of bytes. It doesn't know what kind of file it is, or how to interpret it. It's not file:read_text_file/1 or similar, it's just read_file. It's not high level at all, it's just a convenient shortcut.
> 
> On 06/26/2014 05:58 PM, Camille Troillard wrote:
>> Hi list,
>> 
>> This is a simple question, yet I haven’t found the right answer.
>> Using Erlang/OTP 16B03-2:
>> 
>> I read a file using file:read_file("my_utf8_file.txt").
>> 
>> The result binary contains the 3 BOM bytes. I was not expecting that. Since this is such a high-level call, isn’t file:read_file/1 supposed to get rid of the byte order mark?
>> 
>> So, how do you professional Erlang users read the contents of a UTF-8 encoded file on Erlang 16B03?
>> 
>> 
>> All the best,
>> Cam
>> 
>> 
> 
> -- 
> Loïc Hoguin
> http://ninenines.eu



