[erlang-questions] source file encoding

Witold Baryluk baryluk@REDACTED
Thu Dec 29 11:15:17 CET 2011


On 12-29 16:17, Justus wrote:
> Great.
> 
> But have you ever considered the Python way?
> 
> http://docs.python.org/tutorial/interpreter.html#source-code-encoding

I believe this is a bad solution. Encoding should be defined OUTSIDE of the
file itself, because otherwise it creates a chicken-and-egg problem: how do you
open and read a file if its encoding is not yet known? The same problem exists
in HTML, where the charset can be defined in the head section. It is much better
to just define it in the HTTP header, using the charset= parameter of Content-Type.
It is also much faster (it doesn't need any buffering or other tricks).

It may work in the case of UTF-8, because it should be relatively safe
to start reading the file as ASCII, and then switch to UTF-8
after such a declaration.
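
To make the "read as ASCII first, then switch" idea concrete, here is a
minimal sketch. The module name, the function name and the comment syntax
it searches for are assumptions borrowed from the Python convention, not
something epp.erl does. It only looks at raw bytes, so it can only work
for encodings that keep ASCII bytes unchanged in the first lines (UTF-8,
latin1), not for UTF-16.

```erlang
-module(coding_sniff).
-export([declared_encoding/1]).

%% Hypothetical helper: look for a "coding: NAME" (or "coding=NAME")
%% marker in the first two lines of a source file, on raw bytes only.
declared_encoding(Bin) when is_binary(Bin) ->
    %% Nothing is decoded here; we only split the bytes on newlines
    %% and keep at most the first two lines.
    Lines = lists:sublist(binary:split(Bin, <<"\n">>, [global]), 2),
    first_match(Lines).

first_match([]) ->
    none;
first_match([Line | Rest]) ->
    %% Same pattern shape as the Python convention (an assumption here).
    case re:run(Line, "coding[:=]\\s*([-\\w.]+)",
                [{capture, all_but_first, binary}]) of
        {match, [Name]} -> {ok, Name};
        nomatch -> first_match(Rest)
    end.
```

A caller would still have to reopen or re-decode the file once the name
is known, which is exactly the buffering trick mentioned above.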

A charset should be defined outside: manually, using some form of
file-system metadata (extended attributes), or using a BOM.

http://en.wikipedia.org/wiki/Byte_order_mark
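
For the BOM variant, a minimal sketch could look like the one below. It
relies on unicode:bom_to_encoding/1 from the standard library; the module
and function names are hypothetical, and this is not part of the patch.
Note that a file without a BOM is reported as latin1, so the BOM alone
cannot tell plain latin1 and BOM-less UTF-8 apart.

```erlang
-module(bom_sniff).
-export([encoding_of/1]).

%% Hypothetical helper: return the encoding suggested by a leading BOM,
%% together with the data that follows the BOM.
%% The documentation describes the input as a binary of at least
%% 4 bytes, so pass the start of the file, not a single byte.
encoding_of(Bin) when is_binary(Bin) ->
    %% Returns {latin1, 0} when no BOM is present.
    {Encoding, BomLen} = unicode:bom_to_encoding(Bin),
    <<_Bom:BomLen/binary, Rest/binary>> = Bin,
    {Encoding, Rest}.
```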

My solution is just a SINGLE-line change in epp.erl, around line 252:

server(Pid, Name, Path, Pdm, Encoding0) ->
     process_flag(trap_exit, true),
-    case file:open(Name, [read]) of
+    case file:open(Name, [read, {encoding, utf8}]) of


The rest of the patch is really just configurability noise. It is pretty safe
to make this change, because most existing Erlang files are written using
the ASCII subset of the latin1 encoding, so they are safe to read as UTF-8 as well.
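
The safety argument can be checked with a small sketch like the one below
(module and function names are hypothetical). It compares how the same
bytes decode as latin1 and as UTF-8. Pure ASCIIs decode identically; a
file that really uses latin1 bytes above 127 does not, so such files
would need converting first.

```erlang
-module(enc_check).
-export([same_in_both/1]).

%% Hypothetical helper: true if the bytes decode to the same characters
%% whether they are treated as latin1 or as UTF-8.
same_in_both(Bin) when is_binary(Bin) ->
    AsLatin1 = unicode:characters_to_list(Bin, latin1),
    %% The UTF-8 decode may return a list, or an {error, ...} /
    %% {incomplete, ...} tuple for invalid input; only an identical
    %% list counts as "same".
    case unicode:characters_to_list(Bin, utf8) of
        AsLatin1 -> true;
        _ -> false
    end.
```

For example, same_in_both(<<"-module(x).\n">>) holds, while a latin1
"café" (<<"caf", 233>>) is not valid UTF-8 and is rejected.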


Regards,
Witek

-- 
Witold Baryluk
JID: witold.baryluk // jabster.pl


