sext library - new encoding

Ulf Wiger ulf.wiger@REDACTED
Mon Feb 21 20:01:56 CET 2011


I added a variant of base32 encoding to the sext sortable serialization library.

http://github.com/esl/sext

The reason was to have an encoding that can be used in file names without great difficulty.

Example:

Eshell V5.8.1  (abort with ^G)
1> sext:encode_sb32(dict:new()).
<<"200000091IP5KR3N8040K0000000K00000G0K00000G0K0000080K00002G0K00001G100000081200H008G04802401200H008G04802401200H008G"...>>
2> sext:decode_sb32(v(1)).
{dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}

Obviously, the sorting properties are preserved. To achieve this, I had to change the alphabet a bit, so it is not *actually* base32.
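
To illustrate why the alphabet matters, here is a minimal sketch. It is not the sext implementation: the module name sb32_sketch and its functions are made up for this example, and the "sortable" alphabet shown (digits followed by uppercase letters, as in RFC 4648 base32hex) is an assumption that merely looks consistent with the sample output above. The standard base32 alphabet assigns the low values to letters and the high values to digits, which is the reverse of ASCII order, so comparing encoded strings does not match comparing the original bytes. An alphabet in ascending ASCII order avoids that.

```erlang
-module(sb32_sketch).
-export([encode/2, standard/0, sortable/0]).

%% RFC 4648 base32 alphabet. 'A'..'Z' stand for 0..25 and '2'..'7' for
%% 26..31, but digits sort before letters in ASCII, so order is lost.
standard() -> <<"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567">>.

%% Alphabet in ascending ASCII order (assumed here; see the sext source
%% for the alphabet it actually uses).
sortable() -> <<"0123456789ABCDEFGHIJKLMNOPQRSTUV">>.

%% Encode Bin five bits at a time using Alphabet. The last partial group
%% is filled with zero bits; no pad characters are emitted in this sketch.
encode(Bin, Alphabet) when is_binary(Bin), byte_size(Alphabet) =:= 32 ->
    Pad = (5 - (bit_size(Bin) rem 5)) rem 5,
    Padded = <<Bin/binary, 0:Pad>>,
    << <<(binary:at(Alphabet, I))>> || <<I:5>> <= Padded >>.
```

With this sketch, <<0>> and <<255>> encode to <<"AA">> and <<"74">> under the standard alphabet, so the encoded order is reversed. Under the ascending alphabet they encode to <<"00">> and <<"VS">>, which compare the same way as the input bytes.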

There is some blowup:

For X = dict:new():

  term_to_binary(X):     60 bytes
  sext:encode(X):       121 bytes
  sext:encode_sb32(X):  200 bytes

OTOH, if used for encoding "key"-style terms, sizes should still be manageable.
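
As a rough check on the sext:encode to sext:encode_sb32 step: any base32-style encoding turns every 5 input bytes into 8 output characters. The sketch below (module and function names are hypothetical, not part of sext) computes the expected size with and without padding the last group to a full 8 characters. Whether sext pads in exactly this way is an assumption; the padded figure simply agrees with the 200 bytes reported above.

```erlang
-module(sb32_size).
-export([padded/1, unpadded/1]).

%% Every started group of 5 input bytes yields a full 8 characters.
padded(Bytes) when is_integer(Bytes), Bytes >= 0 ->
    ((Bytes + 4) div 5) * 8.

%% Without padding, ceil(Bytes * 8 / 5) characters are enough.
unpadded(Bytes) when is_integer(Bytes), Bytes >= 0 ->
    (Bytes * 8 + 4) div 5.
```

For the 121-byte sext:encode/1 result, padded(121) gives 200 and unpadded(121) gives 194, so the overhead relative to the binary form is about 1.6x in both cases.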

BR,
Ulf W

PS Why not base64 instead? Because I ran into some trouble selecting good edge and pad symbols while still being file system friendly. I don't see why it couldn't be added later if it's deemed important, and I have more time. :)

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com




