[erlang-questions] source file encoding

Justus mapandfold@REDACTED
Fri Dec 30 03:47:42 CET 2011


IMHO, encoding is an intrinsic property of a file. Ideally, a reader
would detect the encoding on its own. In practice, though, it's better
to give the reader some hints, either inside or outside the file.

I prefer the inside one, because the source file is then fully
self-describing.
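A minimal sketch of what an inside hint could look like, assuming a hypothetical Emacs-style first-line comment such as "%% -*- coding: utf-8 -*-". The module coding_decl and both of its functions are made up for illustration: find/1 only recognises the names listed in normalize/1, and open_source/1 falls back to latin1 when no UTF-8 declaration is found (an assumption, not a rule). It reads the first line as raw bytes, rewinds, and then switches the io server with io:setopts/2. It assumes OTP 20 or later for string:lowercase/1.

```erlang
%% coding_decl.erl: hypothetical module, a sketch only.
-module(coding_decl).
-export([find/1, open_source/1]).

%% Look for a declaration such as "%% -*- coding: utf-8 -*-" in one
%% line given as a binary. Only comment lines count; the encoding name
%% is made of letters, digits, '_', '-' and '.'.
-spec find(binary()) -> utf8 | latin1 | {unknown, binary()} | none.
find(Line) when is_binary(Line) ->
    Pattern = "^\\s*%.*coding\\s*[:=]\\s*([-\\w.]+)",
    case re:run(Line, Pattern, [caseless, {capture, all_but_first, binary}]) of
        {match, [Name]} -> normalize(string:lowercase(Name));
        nomatch -> none
    end.

%% The accepted spellings are assumptions made for this sketch.
normalize(<<"utf-8">>) -> utf8;
normalize(<<"utf8">>) -> utf8;
normalize(<<"latin-1">>) -> latin1;
normalize(<<"latin1">>) -> latin1;
normalize(<<"iso-8859-1">>) -> latin1;
normalize(Other) -> {unknown, Other}.

%% Open a source file, read its first line as raw bytes, then rewind
%% and switch the io server to the declared encoding. Reading that line
%% before the encoding is known is safe as long as the declaration
%% itself is plain ASCII. Anything other than a UTF-8 declaration
%% falls back to latin1 here.
-spec open_source(file:name_all()) -> {ok, file:io_device(), utf8 | latin1}.
open_source(Name) ->
    {ok, Fd} = file:open(Name, [read, binary]),
    First = case io:get_line(Fd, '') of
                Line when is_binary(Line) -> Line;
                _EofOrError -> <<>>
            end,
    Enc = case find(First) of
              utf8 -> utf8;
              _ -> latin1
          end,
    %% Rewind so the first line is read again with the chosen encoding.
    {ok, 0} = file:position(Fd, bof),
    ok = io:setopts(Fd, [{encoding, Enc}]),
    {ok, Fd, Enc}.
```

Later OTP releases ship epp:read_encoding/1, which reads a similar first-line comment; its documentation is the place to check which forms it actually accepts.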

The OUTSIDE way also has the following problem, which you already know:
"All included files will be read and parsed using the same encoding,
Encoding, which may produce unexpected results if they are in fact
encoded using different encodings."
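To make those "unexpected results" concrete, here is a small sketch (the module enc_mismatch is hypothetical) that decodes the same bytes the way a single global encoding would. Pure ASCII decodes identically as UTF-8 or latin1, but a latin1 byte such as 16#E9 ('é') is not valid UTF-8 on its own, and a UTF-8 'é' (16#C3 16#A9) read as latin1 turns into two characters.

```erlang
%% enc_mismatch.erl: hypothetical module, a sketch only.
-module(enc_mismatch).
-export([decode_as_utf8/1, decode_as_latin1/1]).

%% Decode raw file bytes as UTF-8, as a reader that uses one global
%% encoding for every included file would.
-spec decode_as_utf8(binary()) -> {ok, string()} | {error, term()}.
decode_as_utf8(Bytes) when is_binary(Bytes) ->
    case unicode:characters_to_list(Bytes, utf8) of
        Chars when is_list(Chars) -> {ok, Chars};
        {error, Decoded, Rest} -> {error, {invalid, Decoded, Rest}};
        {incomplete, Decoded, Rest} -> {error, {incomplete, Decoded, Rest}}
    end.

%% Decoding as latin1 never fails: every byte becomes one character,
%% so a wrong guess shows up as garbled text rather than an error.
-spec decode_as_latin1(binary()) -> string().
decode_as_latin1(Bytes) when is_binary(Bytes) ->
    unicode:characters_to_list(Bytes, latin1).
```

For example, decode_as_utf8(<<"caf", 16#E9, ".">>) returns an error tuple instead of a string, while decode_as_latin1(<<"caf", 16#C3, 16#A9>>) returns five characters instead of four.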

On Thu, Dec 29, 2011 at 6:15 PM, Witold Baryluk
<baryluk@REDACTED> wrote:
>
> I believe this is a bad solution. Encoding should be defined OUTSIDE of
> the file itself; otherwise it creates a chicken-and-egg problem: how do
> you open and read a file whose encoding is not yet known? HTML has the
> same problem when the charset is declared in the head section. It is much
> better to define it in the HTTP header, using the charset= parameter of
> Content-Type. It is also much faster (no buffering or other tricks needed).
>
> It may work in the case of UTF-8, because it should be relatively safe
> to process ASCII files as UTF-8 files, and then switch to UTF-8 after
> such a declaration.
>
> A charset should be defined outside the file, either manually or via
> some form of file-system metadata (extended attributes), or else with a
> byte order mark (BOM).
>
> http://en.wikipedia.org/wiki/Byte_order_mark
>
> My solution is just a SINGLE-line change in epp.erl, around line 252:
>
> server(Pid, Name, Path, Pdm, Encoding0) ->
>     process_flag(trap_exit, true),
> -    case file:open(Name, [read]) of
> +    case file:open(Name, [read, {encoding, utf8}]) of
>
>
> The rest of the patch is just configurability noise. This change is
> fairly safe, because most existing Erlang files are written in the
> ASCII subset of latin1, and so can also be read safely as UTF-8.
>
>
> Regards,
> Witek
>
> --
> Witold Baryluk
> JID: witold.baryluk // jabster.pl
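Witold's BOM suggestion could be checked with a few binary patterns. This is a sketch in a hypothetical module bom_sniff; the {latin1, 0} fallback for input without a BOM is an assumption, chosen to mirror what unicode:bom_to_encoding/1 returns in that case. The detected encoding is a valid argument for unicode:characters_to_list/2, so decode/1 simply skips the BOM bytes and decodes the rest.

```erlang
%% bom_sniff.erl: hypothetical module, a sketch only.
-module(bom_sniff).
-export([detect/1, decode/1]).

%% Return {Encoding, BomLength} for the start of a file's contents.
%% The UTF-32 little-endian check must come before the UTF-16 one,
%% because both start with 16#FF 16#FE.
-spec detect(binary()) -> {unicode:encoding(), 0..4}.
detect(<<16#EF, 16#BB, 16#BF, _/binary>>) -> {utf8, 3};
detect(<<16#00, 16#00, 16#FE, 16#FF, _/binary>>) -> {{utf32, big}, 4};
detect(<<16#FF, 16#FE, 16#00, 16#00, _/binary>>) -> {{utf32, little}, 4};
detect(<<16#FE, 16#FF, _/binary>>) -> {{utf16, big}, 2};
detect(<<16#FF, 16#FE, _/binary>>) -> {{utf16, little}, 2};
detect(Bin) when is_binary(Bin) -> {latin1, 0}.

%% Skip the BOM (if any) and decode the rest with the detected encoding.
decode(Bin) when is_binary(Bin) ->
    {Enc, Len} = detect(Bin),
    <<_:Len/binary, Body/binary>> = Bin,
    unicode:characters_to_list(Body, Enc).
```

Usage would be along the lines of {ok, Bin} = file:read_file(Name), bom_sniff:decode(Bin).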



-- 
Best Regards,
Justus


