String representation in erlang

ke.han ke.han@REDACTED
Tue Sep 13 19:46:53 CEST 2005


Yes, 8 bytes for a single character is far too much.  You can 
encode any character on the planet with only 4 bytes.  You can use a 
variable-byte encoding to further reduce the overall length for complex 
languages.  I would like to see these things corrected one day.  Until 
then, I'll settle for understanding binary strings better. ;-)

I understand the basic form: <<"abcde">>.  What I don't understand is 
whether this works with multi-byte (in particular variable-byte) encoding 
schemes (UTF-8, UTF-16, etc.).  Does the binary string only work for 8-bit 
characters?  Are there any examples of other uses?
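
A binary is just a sequence of bytes, so it can hold text in any encoding, 
including a variable-byte one such as UTF-8.  The following is a minimal 
sketch, assuming an Erlang/OTP release recent enough to ship the `unicode` 
module.  The module name `utf8_bin_demo` and its function names are 
made up for illustration.

```erlang
-module(utf8_bin_demo).
-export([encode/1, decode/1, sizes/1]).

%% Encode a list of Unicode code points as a UTF-8 binary.
%% Assumes the unicode module is available in the running OTP release.
encode(CodePoints) ->
    unicode:characters_to_binary(CodePoints, unicode, utf8).

%% Decode a UTF-8 binary back into a list of code points.
decode(Bin) when is_binary(Bin) ->
    unicode:characters_to_list(Bin, utf8).

%% Return {CharacterCount, ByteCount} for a UTF-8 binary.
sizes(Bin) when is_binary(Bin) ->
    {length(decode(Bin)), byte_size(Bin)}.
```

For example, `utf8_bin_demo:encode([20013, 25991, $a, $b])` (two Chinese 
characters followed by "ab") gives an 8-byte binary: three bytes for each 
Chinese character and one for each ASCII character.  `sizes/1` on that 
binary returns `{4, 8}`.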

I think the kind of code example that is missing is something with Chinese 
(or pick your complex language) and English characters in the same "string": 
let a user enter the string into a yaws web form, store the "string" in 
Erlang, and then redisplay it.  For binary strings, I don't even know where 
to start with this simple task.
thanks, ke han
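
One possible shape for the store-and-redisplay task is sketched below.  It 
assumes the form value arrives as a UTF-8 encoded binary, that the page is 
served with a UTF-8 charset, and that the `unicode` module is available.  It 
leaves out the yaws side entirely.  The module name `form_roundtrip`, the 
ETS key `message`, and the function names are hypothetical.

```erlang
-module(form_roundtrip).
-export([store/2, render/1, escape/1]).

%% Store the submitted bytes unchanged, after checking they are valid UTF-8.
%% Table is an ETS table id; Bytes is the raw form value as a binary.
store(Table, Bytes) when is_binary(Bytes) ->
    case unicode:characters_to_list(Bytes, utf8) of
        Chars when is_list(Chars) ->
            true = ets:insert(Table, {message, Bytes}),
            ok;
        _Invalid ->
            %% {error, ...} or {incomplete, ...} from the unicode module
            {error, invalid_utf8}
    end.

%% Build an iolist that redisplays the stored text inside a paragraph.
render(Table) ->
    [{message, Bytes}] = ets:lookup(Table, message),
    [<<"<p>">>, escape(Bytes), <<"</p>">>].

%% Escape HTML metacharacters byte by byte.  This is safe for UTF-8 because
%% ASCII byte values never occur inside a multi-byte sequence.
escape(Bin) when is_binary(Bin) ->
    << <<(escape_byte(B))/binary>> || <<B>> <= Bin >>.

escape_byte($<) -> <<"&lt;">>;
escape_byte($>) -> <<"&gt;">>;
escape_byte($&) -> <<"&amp;">>;
escape_byte($") -> <<"&quot;">>;
escape_byte(B)  -> <<B>>.
```

The text never has to be turned into a list of integers for storage: the 
bytes go into the table as they arrived and come back out the same way, 
with only the HTML metacharacters rewritten.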

hinus Pollard wrote:
> Hi there
> 
> According to the Erlang efficiency guide a string is internally represented as 
> a list of integers, thus consuming 2 words (8 bytes on a 32bit platform) of 
> memory *per* character.
> 
> The attached code is an attempt at reducing the memory footprint of strings in 
> erlang (passing between functions etc etc).
> 
> The basic idea is to pack a string into n-byte integers and unpack 
> them on the other side. The text file called compare.txt also shows the 
> memory needed to represent strings as normal erlang strings and with this 
> string packing.
> 
> Normal erlang strings are 2 words/character. The packed representation uses 1 
> word of memory per list element plus n bytes/wordsize per integer element, 
> where every integer element contains n characters.
> 
> Deficiencies:
> If the string length is not divisible by n, space is wasted (the string gets 
> padded with zeros). 
> 
> Usage:
> Pick your integer representation length.
> packstring/1 takes a string and returns a list of n-byte integers.
> unpackstring/1 takes an integer representation and returns a string.
> 
> There is a simple test suite in test/0.
> 
> If anyone can improve upon this code, please do. If this was an exercise in 
> futility, please let me know, I've only been programming erlang for 2 weeks 
> and still need to learn all the gotchas ;)
> 
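
The attachment is not reproduced here, so the following is only a sketch of 
the packing idea described above, not the original code.  It assumes 
8-bit characters, and the names `pack_sketch`, `pack/2` and `unpack/2` are 
invented for illustration (the original exposes packstring/1 and 
unpackstring/1).

```erlang
-module(pack_sketch).
-export([pack/2, unpack/2]).

%% Pack a string of 8-bit characters into a list of N-byte integers.
%% The last group is padded with zero bytes when the length is not a
%% multiple of N.
pack(String, N) when is_integer(N), N > 0 ->
    Bin = list_to_binary(String),
    PadBits = ((N - byte_size(Bin) rem N) rem N) * 8,
    Padded = <<Bin/binary, 0:PadBits>>,
    Bits = N * 8,
    [Int || <<Int:Bits>> <= Padded].

%% Unpack a list of N-byte integers back into a string.
%% Zero bytes are treated as padding and dropped, so a string that
%% itself contains the character 0 does not survive the round trip.
unpack(Ints, N) when is_integer(N), N > 0 ->
    Bits = N * 8,
    Bin = << <<Int:Bits>> || Int <- Ints >>,
    [C || <<C>> <= Bin, C =/= 0].
```

With N = 4, `pack_sketch:pack("abcde", 4)` yields two integers, the second 
carrying three bytes of padding, and `unpack/2` returns "abcde" again.  Note 
that `list_to_binary/1` on its own already stores the same string at one 
byte per character with no padding.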