String representation in erlang

Joe Armstrong (AL/EAB) joe.armstrong@REDACTED
Tue Sep 13 16:17:29 CEST 2005




> erl
Eshell V5.3  (abort with ^G)
1> S="abcdefghijklmnopqrstuvwxyz".
"abcdefghijklmnopqrstuvwxyz"
2> size(term_to_binary(packer:packstring(S))).
42
3> size(term_to_binary(S)).
30
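
The numbers above are sizes in the external term format. To make the "Measure" step concrete for in-memory sizes as well, here is a minimal sketch. The module name strsize and the function report/1 are made up for illustration, and erts_debug:flat_size/1 is an undocumented debugging helper, so treat its use as an assumption rather than a supported API. list_to_binary/1 only accepts characters in the range 0..255.

```erlang
-module(strsize).
-export([report/1]).

%% Report the size of a string in several representations.
%% erts_debug:flat_size/1 returns heap words and is undocumented.
report(S) ->
    Bin = list_to_binary(S),
    [{list_heap_words, erts_debug:flat_size(S)},
     {binary_payload_bytes, byte_size(Bin)},
     {external_list_bytes, byte_size(term_to_binary(S))},
     {external_binary_bytes, byte_size(term_to_binary(Bin))}].
```

For the 26-letter string above, the list should come out at 2 heap words per character, while the binary carries one byte per character plus a small fixed header.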

Anyway, if you work like this:

	1) Write as clearly as possible
	2) Measure
	3) Optimise if necessary

You will probably never need to get to step 3 and compress your strings
to save space. You might need to compress them on disk to save space,
but then you need real compression, like LZSS...
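
As a rough illustration of "real compression", the sketch below uses the zlib module that ships with Erlang/OTP. Note that zlib implements deflate, not LZSS itself; it is swapped in here only because it is in the standard distribution. The module name strzip and the functions pack/1 and unpack/1 are hypothetical names chosen for this example.

```erlang
-module(strzip).
-export([pack/1, unpack/1]).

%% Compress a string (characters 0..255) into a zlib-compressed binary.
pack(S) ->
    zlib:compress(list_to_binary(S)).

%% Inverse of pack/1: decompress and return an ordinary string.
unpack(Z) ->
    binary_to_list(zlib:uncompress(Z)).
```

Whether this saves anything depends on the data: very short strings can grow because of the compression header, which is one more reason to measure first.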

Anyway, you might like to ask "what is a big string?" - for me,
big starts at about 1 Meg characters - below this, optimisations aren't
worth bothering about.

I have written many programs that manipulate book length texts as strings
and had no space worries.

<< aside: a string-to-big-int function and its inverse *are* useful,
   since for moderately small files (say < 1 K) you can convert them to a
   big int N and then RSA encode them with N ^ A mod B :-) - this works
   beautifully but is a wee bit slow - I am not joking here - I have done this >>
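
A minimal sketch of the aside, under stated assumptions: the module name strint and its function names are invented for this example, and the conversion simply reads the string's bytes as one big-endian unsigned integer. mod_pow/3 computes N ^ A mod B by square-and-multiply. The integer must be smaller than the modulus B for the round trip to work, and leading zero bytes in the string are lost by the conversion.

```erlang
-module(strint).
-export([to_int/1, from_int/1, mod_pow/3]).

%% Interpret the bytes of a string as one big-endian unsigned integer.
to_int(S) ->
    binary:decode_unsigned(list_to_binary(S)).

%% Inverse of to_int/1 (leading zero bytes cannot be recovered).
from_int(N) ->
    binary_to_list(binary:encode_unsigned(N)).

%% N ^ A mod B by square-and-multiply, reducing after every step
%% so intermediate values stay small.
mod_pow(_N, 0, B) ->
    1 rem B;
mod_pow(N, A, B) when A rem 2 =:= 0 ->
    H = mod_pow(N, A div 2, B),
    (H * H) rem B;
mod_pow(N, A, B) ->
    (N * mod_pow(N, A - 1, B)) rem B.
```

With the well-known toy key (modulus 3233, public exponent 17, private exponent 2753), strint:mod_pow(strint:to_int("A"), 17, 3233) gives 2790, and strint:from_int(strint:mod_pow(2790, 2753, 3233)) gives back "A". A real key would use a modulus large enough to hold the whole file as one integer.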


/Joe




> -----Original Message-----
> From: owner-erlang-questions@REDACTED
> [mailto:owner-erlang-questions@REDACTED]On Behalf Of Thinus Pollard
> Sent: den 13 september 2005 12:11
> To: erlang-questions@REDACTED; danie@REDACTED
> Subject: String representation in erlang
> 
> 
> Hi there
> 
> According to the Erlang efficiency guide, a string is internally
> represented as a list of integers, thus consuming 2 words (8 bytes on
> a 32-bit platform) of memory *per* character.
> 
> The attached code is an attempt at reducing the memory footprint of
> strings in Erlang (passing between functions etc. etc.).
> 
> The basic idea is to pack a string into n-byte integers and unpack
> them on the other side. The text file called compare.txt also shows
> the memory needed to represent strings as normal Erlang strings and
> with this string packing.
> 
> Normal Erlang strings are 2 words/character. The packed representation
> uses 1 word of memory per list element plus n bytes/wordsize per
> integer element, where every integer element contains n characters.
> 
> Deficiencies:
> If the string length is not divisible by n, space is wasted (the
> string gets padded with zeros).
> 
> Usage:
> Pick your integer representation length.
> packstring/1 takes a string and returns a list of n-byte integers.
> unpackstring/1 takes an integer representation and returns a string.
> 
> There is a simple test suite in test/0.
> 
> If anyone can improve upon this code, please do. If this was an
> exercise in futility, please let me know; I've only been programming
> Erlang for 2 weeks and still need to learn all the gotchas ;)
> 
> -- 
> 
> Thinus Pollard
> 
> Mobile: +27 72 075 2751
> 
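
The packing scheme described in the quoted message can be sketched roughly as follows. The attached packer module is not reproduced in this mail, so this is not the original code: the module name packsketch is invented, and the chunk size N is passed as a second argument here (an assumption) instead of being fixed inside packstring/1 and unpackstring/1.

```erlang
-module(packsketch).
-export([packstring/2, unpackstring/2]).

%% Pack a string (characters 0..255) into a list of integers,
%% N characters per integer, zero-padding the last chunk.
packstring(S, N) when is_integer(N), N > 0 ->
    Pad = (N - length(S) rem N) rem N,
    Bin = list_to_binary([S, lists:duplicate(Pad, 0)]),
    Bits = N * 8,
    [I || <<I:Bits>> <= Bin].

%% Unpack a list produced by packstring/2 with the same N.
%% Trailing zero bytes are treated as padding and dropped, so a
%% string that really ends in zero bytes does not round-trip.
unpackstring(Ints, N) when is_integer(N), N > 0 ->
    Bits = N * 8,
    Bytes = binary_to_list(<< <<I:Bits>> || I <- Ints >>),
    lists:reverse(
      lists:dropwhile(fun(C) -> C =:= 0 end, lists:reverse(Bytes))).
```

For example, packsketch:packstring("abcde", 4) yields two integers, the second one padded with three zero bytes, and packsketch:unpackstring/2 with the same N returns "abcde".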


