String representation in erlang

Per Gustafsson per.gustafsson@REDACTED
Tue Sep 13 16:18:20 CEST 2005


Hi

I think that there is a simpler solution to this problem:

packstring(String) ->
    list_to_binary(String).

unpackstring(Binary) ->
    binary_to_list(Binary).

this will change the list representation of a string into binary format 
which will require 2 * WordSize + number of chars of String bytes if the 
string is less than 64 chars and 5 * Wordsize + number of chars of 
string bytes if the string is longer than 64 chars.

In addition to this the larger strings will only be passed as a 
reference if it is sent in a message.


Per Gustafsson

Thinus Pollard wrote:

>Hi there
>
>According to the Erlang efficiency guide a string is internally represented as 
>a list of integers, thus consuming 2 words (8 bytes on a 32bit platform) of 
>memory *per* character.
>
>The attached code is an attempt at reducing the memory footprint of strings in 
>erlang (passing between functions etc etc).
>
>The basic idea is to pack a string into n byte sized integers and unpacking 
>them on the other side. The text file called compare.txt also shows the 
>memory needed to represent strings in normal erlang strings and this string 
>packing.
>
>Normal erlang strings are 2 words/character. The packed representation uses 1 
>word of memory per list element plus n bytes/wordsize per integer element, 
>where every integer element contain n characters.
>
>Deficiencies:
>If the string length is not divisible by n, space is wasted (the string gets 
>padded with zeros). 
>
>Usage:
>Pick your the integer representation length.
>packstring/1 takes a string returns a list of n byte integers
>unpackstring/1 takes an integer representation and returns a string.
>
>There is a simple test suite in test/0.
>
>If anyone can improve upon this code, please do. If this was an exercise in 
>futility, please let my know, I've only been programming erlang for 2 weeks 
>and still need to learn all the gotchas ;)
>
>  
>
>------------------------------------------------------------------------
>
>Comparison of erlang strings vs packed strings. Left hand column is the
>string length, second column is the bytes erlang uses to represent that
>string, third - ninth column is the bytes needed to represent the packed
>string. pack[n] refers to a packed string using n byte integers to store
>the string.
>
>Chars	erlang	pack4	pack8	pack12	pack16	pack20	pack24	pack32
>	bytes	bytes	bytes	bytes	bytes	bytes	bytes	bytes
>0	0	0	0	0	0	0	0	0
>1	8	8	12	16	20	24	28	36
>2	16	8	12	16	20	24	28	36
>3	24	8	12	16	20	24	28	36
>4	32	8	12	16	20	24	28	36
>5	40	16	12	16	20	24	28	36
>6	48	16	12	16	20	24	28	36
>7	56	16	12	16	20	24	28	36
>8	64	16	12	16	20	24	28	36
>9	72	24	24	16	20	24	28	36
>10	80	24	24	16	20	24	28	36
>11	88	24	24	16	20	24	28	36
>12	96	24	24	16	20	24	28	36
>13	104	32	24	32	20	24	28	36
>14	112	32	24	32	20	24	28	36
>15	120	32	24	32	20	24	28	36
>16	128	32	24	32	20	24	28	36
>17	136	40	36	32	40	24	28	36
>18	144	40	36	32	40	24	28	36
>19	152	40	36	32	40	24	28	36
>20	160	40	36	32	40	24	28	36
>21	168	48	36	32	40	48	28	36
>22	176	48	36	32	40	48	28	36
>23	184	48	36	32	40	48	28	36
>24	192	48	36	32	40	48	28	36
>25	200	56	48	48	40	48	56	36
>26	208	56	48	48	40	48	56	36
>27	216	56	48	48	40	48	56	36
>28	224	56	48	48	40	48	56	36
>29	232	64	48	48	40	48	56	36
>30	240	64	48	48	40	48	56	36
>31	248	64	48	48	40	48	56	36
>32	256	64	48	48	40	48	56	36
>33	264	72	60	48	60	48	56	72
>34	272	72	60	48	60	48	56	72
>35	280	72	60	48	60	48	56	72
>36	288	72	60	48	60	48	56	72
>37	296	80	60	64	60	48	56	72
>38	304	80	60	64	60	48	56	72
>39	312	80	60	64	60	48	56	72
>40	320	80	60	64	60	48	56	72
>41	328	88	72	64	60	72	56	72
>42	336	88	72	64	60	72	56	72
>43	344	88	72	64	60	72	56	72
>44	352	88	72	64	60	72	56	72
>45	360	96	72	64	60	72	56	72
>46	368	96	72	64	60	72	56	72
>47	376	96	72	64	60	72	56	72
>48	384	96	72	64	60	72	56	72
>49	392	104	84	80	80	72	84	72
>50	400	104	84	80	80	72	84	72
>51	408	104	84	80	80	72	84	72
>52	416	104	84	80	80	72	84	72
>53	424	112	84	80	80	72	84	72
>54	432	112	84	80	80	72	84	72
>55	440	112	84	80	80	72	84	72
>56	448	112	84	80	80	72	84	72
>57	456	120	96	80	80	72	84	72
>58	464	120	96	80	80	72	84	72
>59	472	120	96	80	80	72	84	72
>60	480	120	96	80	80	72	84	72
>61	488	128	96	96	80	96	84	72
>62	496	128	96	96	80	96	84	72
>63	504	128	96	96	80	96	84	72
>64	512	128	96	96	80	96	84	72
>65	520	136	108	96	100	96	84	108
>66	528	136	108	96	100	96	84	108
>67	536	136	108	96	100	96	84	108
>68	544	136	108	96	100	96	84	108
>69	552	144	108	96	100	96	84	108
>70	560	144	108	96	100	96	84	108
>71	568	144	108	96	100	96	84	108
>72	576	144	108	96	100	96	84	108
>73	584	152	120	112	100	96	112	108
>74	592	152	120	112	100	96	112	108
>75	600	152	120	112	100	96	112	108
>76	608	152	120	112	100	96	112	108
>77	616	160	120	112	100	96	112	108
>78	624	160	120	112	100	96	112	108
>79	632	160	120	112	100	96	112	108
>80	640	160	120	112	100	96	112	108
>81	648	168	132	112	120	120	112	108
>82	656	168	132	112	120	120	112	108
>83	664	168	132	112	120	120	112	108
>84	672	168	132	112	120	120	112	108
>85	680	176	132	128	120	120	112	108
>86	688	176	132	128	120	120	112	108
>87	696	176	132	128	120	120	112	108
>88	704	176	132	128	120	120	112	108
>89	712	184	144	128	120	120	112	108
>90	720	184	144	128	120	120	112	108
>91	728	184	144	128	120	120	112	108
>92	736	184	144	128	120	120	112	108
>93	744	192	144	128	120	120	112	108
>94	752	192	144	128	120	120	112	108
>95	760	192	144	128	120	120	112	108
>96	768	192	144	128	120	120	112	108
>97	776	200	156	144	140	120	140	144
>98	784	200	156	144	140	120	140	144
>99	792	200	156	144	140	120	140	144
>100	800	200	156	144	140	120	140	144
>  
>
>------------------------------------------------------------------------
>
>%%%-------------------------------------------------------------------
>%%% File    : packer.erl
>%%% Author  : Thinus Pollard <thinus@REDACTED>
>%%% Description : Pack erlang strings (8 bytes/char according to the
>%%%               erlang efficiency guide) into a list of integers
>%%%               (1 byte / char).
>%%%
>%%% Created : 12 Sep 2005 by Thinus Pollard <thinus@REDACTED>
>%%%-------------------------------------------------------------------
>-module(packer).
>
>%% define the size of the to use in bytes
>-define(BYTES, 32).
>
>-define(BITS, (?BYTES * 8)).
>
>%% API
>-export([packstring/1,unpackstring/1]).
>-export([test/0]).
>
>%%====================================================================
>%% API
>%%====================================================================
>%%--------------------------------------------------------------------
>%% Function: packstring/1
>%% Description: Takes Erlang String and returns a list of integers of
>%%              size BYTES containing this String.
>%%--------------------------------------------------------------------
>packstring(String) ->
>    packstring(String, []).
>
>%%--------------------------------------------------------------------
>%% Function: unpackstring/1
>%% Description: Takes List of BYTES sized integers and returns the
>%%              repesented string.
>%%--------------------------------------------------------------------
>unpackstring(List) ->
>    unpackstring(List, []).
>
>%%--------------------------------------------------------------------
>%% Function: test/0
>%% Description: Takes List of BYTES sized integers and returns the
>%%              repesented string.
>%%--------------------------------------------------------------------
>test() ->
>    test("", "Empty string"),
>    test("T", "Single character string"),
>    test("This is a String to be packed", "'Normal' sized string containing spaces"),
>    test("0123456789abcdefghijklmnopqrstuvwxyz", "'Normal' sized string without spaces"),
>    test("000000000011111111112222222222333333333344444444445555555555"
>         "6666666666777777777788888888889999999999aaaaaaaaaabbbbbbbbbb"
>         "ccccccccccddddddddddeeeeeeeeeeffffffffffgggggggggghhhhhhhhhh"
>         "iiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllllllmmmmmmmmmmnnnnnnnnnn"
>         "ooooooooooppppppppppqqqqqqqqqqrrrrrrrrrrsssssssssstttttttttt"
>         "uuuuuuuuuuvvvvvvvvvvwwwwwwwwwwxxxxxxxxxxyyyyyyyyyyzzzzzzzzzz", "Longish string without spaces").
>    
>%%====================================================================
>%% Internal functions
>%%====================================================================
>
>%%--------------------------------------------------------------------
>%% Function: packstring/2
>%% Description: Takes Erlang String and returns a list of integers of
>%%              size BYTES containing this String.
>%%--------------------------------------------------------------------
>packstring([], Res)->
>    lists:reverse(Res);
>packstring(String, Res) ->
>    case string:len(String) > ?BYTES - 1 of
>	 true -> %% at least BYTES characters left
>	    Working = string:substr(String, 1, ?BYTES),
>	    WC = list_to_binary(Working),
>	    <<WB:?BITS>> = WC,
>	    packstring(string:substr(String, ?BYTES + 1), [WB|Res]);
>	false -> %% we need to zero pad the remaining string to BYTES characters
>	    String2 = lists:append(String, lists:duplicate(?BYTES - string:len(String), 0)),
>	    packstring(String2, Res)
>    end.
>
>%%--------------------------------------------------------------------
>%% Function: unpackstring/2
>%% Description: Takes List of BYTES sized integers and returns the
>%%              repesented string.
>%%--------------------------------------------------------------------
>unpackstring([], Res) ->
>    %% drop the padded zeros (if any)
>    Fun1 = fun (X) ->
>		   X /= 0
>	   end,
>    Res1 = lists:filter(Fun1, Res),
>    lists:reverse(Res1);
>unpackstring([H|T], Res) ->
>    %% take integers 1 by 1 and decode
>    R = buildBin(H, ?BITS, []),
>    unpackstring(T, R ++ Res).
>
>%%--------------------------------------------------------------------
>%% Function: buildBin/3
>%% Description: Takes a binary, number of bits representing that binary
>%%              and a result list. Returns a list containing the binary
>%%              broken into 8bit chunks
>%%--------------------------------------------------------------------
>buildBin(_Bin, 0, Res) ->
>    lists:reverse(Res);
>buildBin(Bin, Bits, Res) ->
>    Bits2 = Bits - 8,
>    <<A:8, B:Bits2>> = <<Bin:Bits>>,
>    Res2 = Res ++ binary_to_list(<<A>>),
>    buildBin(B, Bits - 8, Res2).
>
>%%--------------------------------------------------------------------
>%% Function: test/2
>%% Description: Test suite: encodes a string, decodes it and compares the
>%%              original string with the decoded string.
>%%--------------------------------------------------------------------
>test(String, Desc) ->
>    R = packstring(String),
>    Size = (length(R) * 4) + (?BYTES * length(R)),
>    StringL = string:len(String),
>    S = unpackstring(R, []),
>    error_logger:info_msg("Test string description: ~p~n"
>			  "Original string: ~p~n"
>			  "Packing string into list of ~p byte sized integers~n"
>			  "Packed string: ~p~n"
>			  "Unpacked string: ~p~n"
>			  "Strings match: ~p~n"
>			  "-----~n"
>			  "Stats~n"
>			  "-----~n"
>			  "String length: ~p~n"
>			  "Size of erlang string (bytes): ~p~n"
>			  "Size of packed string (bytes): ~p~n", [Desc, String, ?BYTES, R, S, String == S, StringL, StringL * 8, Size]).
>  
>




More information about the erlang-questions mailing list