Programming components

Joe Armstrong (AL/EAB) <>
Mon Aug 29 10:37:17 CEST 2005


David Hopwood wrote

> Sent: den 25 augusti 2005 18:13
> To: 
> Subject: Re: Programming components
> 
> 
> Joe Armstrong (AL/EAB) wrote:
> > Erlang has no strings - ie there is NOT a string data type.
> > 
> > "abc" is NOT a string but shorthand for [97,98,99]
> > 
> > Now suppose I see [97, 98, 99] on some I/O channel
> > what does this mean? is it to be interpreted as a list of three integers
> > or as a string? - nobody knows.
> > 
> > Most Erlang programmers choose to represent strings as "lists of integers" and
> > write "abc" in their programs when they wish to create a string.
> > 
> > I am suggesting they should use <<"abc">> (a binary)
> > 
> > This all has to do with making a simple type system that maps Erlang terms onto
> > XML data structures in an isomorphic manner.
> 
> How does using binaries solve this problem? A binary is an octet sequence;
> "strings" are character sequences (usually represented as code unit sequences).
> 

This interpretation takes place *only* at the component interface.

An example will (hopefully) clarify this point.

Suppose I want to make an "FTP" server with exactly two commands,
which are specified like this:
	
	Type result() = {"ok", str()} | {"error", "enofile"}

	{"get", str()} => result()
	{"mget", [str()]} => [result()]

	A => B means "if you send me an instance of type A, I'll reply with an
	instance of type B".

get gets one file.
mget gets multiple files.

1) The server receives (say)

	Content-Length: 1234
	Content-Type: text/erlBinary
	[1234 bytes]

Because the content type is erlBinary, the server knows that these 1234 bytes
are an Erlang term encoded with term_to_binary.
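
Step 1 could be sketched roughly as follows. This is only an illustration:
the module name frame_sketch, the CRLF line endings and the blank line
between the headers and the body are assumptions of the sketch, not part of
the interface described above.

```erlang
-module(frame_sketch).
-export([encode/1, decode/1]).

%% Wrap an Erlang term in the two headers shown above.
%% ASSUMPTION: CRLF line endings and a blank line before the body.
encode(Term) ->
    Body = term_to_binary(Term),
    Len = integer_to_binary(byte_size(Body)),
    <<"Content-Length: ", Len/binary, "\r\n",
      "Content-Type: text/erlBinary\r\n",
      "\r\n",
      Body/binary>>.

%% Undo encode/1: check the content type and the length, then rebuild
%% the term. Crashes (badmatch) if the frame is not well formed.
decode(Frame) ->
    [Head, Body] = binary:split(Frame, <<"\r\n\r\n">>),
    Headers = [header(Line)
               || Line <- binary:split(Head, <<"\r\n">>, [global])],
    {_, <<"text/erlBinary">>} = lists:keyfind(<<"Content-Type">>, 1, Headers),
    {_, LenBin} = lists:keyfind(<<"Content-Length">>, 1, Headers),
    Len = binary_to_integer(LenBin),
    Len = byte_size(Body),
    %% [safe] refuses to create new atoms from untrusted input.
    binary_to_term(Body, [safe]).

%% Split one "Name: Value" header line into a {Name, Value} pair.
header(Line) ->
    [Name, Value] = binary:split(Line, <<": ">>),
    {Name, Value}.
```

With this sketch, frame_sketch:decode(frame_sketch:encode(T)) gives back T
for any term T that contains no atoms unknown to the receiver.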

2) The server does binary_to_term on the data it has received

    The reconstructed term is
 
	{<<"get">>, <<"somefilename">>}

    or
	{<<"mget">>, [<<"file1">>, <<"file2">>]}

    The server checks that these are instances of {"get", str()} or
{"mget", [str()]}.

    Anything else is illegal. In the reconstructed term, lists mean lists
and NOT strings, and binaries are always interpreted as strings.

    These rules apply only at the interface.
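
The check in step 2 might look something like this. It is a minimal sketch;
the module name ftp_check and the function names are made up for the
illustration.

```erlang
-module(ftp_check).
-export([valid_request/1]).

%% true if Term is an instance of {"get", str()} or {"mget", [str()]},
%% where a str() must arrive as a binary; false for anything else.
valid_request({<<"get">>, Name}) -> is_binary(Name);
valid_request({<<"mget">>, Names}) -> all_strs(Names);
valid_request(_) -> false.

%% true only for a proper list whose elements are all binaries.
all_strs([H | T]) when is_binary(H) -> all_strs(T);
all_strs([]) -> true;
all_strs(_) -> false.
```

Note that {"get", "somefilename"} is rejected by this check: at the
interface "get" is the list [103,101,116], and a list is not a str().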

3) The server MUST reply with

	{<<"ok">>, <<"file contents">>} or {<<"error">>, <<"enofile">>}

   to the first request, or, with something like

	[{<<"ok">>,<<"contents of file2">>}, {<<"error">>, ...
    
   in the second case.
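
A server body that produces replies of this shape could be sketched as
below, writing every str() as a binary. The module name ftp_reply and the
Fetch argument are assumptions made for the sketch: Fetch is any fun that
takes a file name and returns {ok, Contents} on success. Every failure is
mapped to "enofile", since that is the only error the specification allows.

```erlang
-module(ftp_reply).
-export([handle/2, valid_result/1]).

%% Answer a request that has already passed the interface check.
%% Fetch is a caller-supplied fun(Name) -> {ok, binary()} | Other.
handle({<<"get">>, Name}, Fetch) ->
    result(Fetch(Name));
handle({<<"mget">>, Names}, Fetch) ->
    [result(Fetch(Name)) || Name <- Names].

%% Map the outcome of Fetch onto the result() type.
result({ok, Contents}) when is_binary(Contents) -> {<<"ok">>, Contents};
result(_) -> {<<"error">>, <<"enofile">>}.

%% true if Term is an instance of result() as seen at the interface.
valid_result({<<"ok">>, Contents}) -> is_binary(Contents);
valid_result({<<"error">>, <<"enofile">>}) -> true;
valid_result(_) -> false.
```

For example, ftp_reply:handle({<<"get">>, <<"somefilename">>},
fun file:read_file/1) reads the file from disk, and the reply can be passed
through valid_result/1 before it is sent.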

    All of this can be checked at the interface: given the specification, we can
check these forms to see if they are allowed by the interface description.

    Saying that "strings are represented as binaries" means "an instance of the
str() type is represented as a binary in the interface".

	<aside> - with an XML interpretation the story goes like this:

	we receive

	Content-Type: text/myXML
	Content-Length: 1234
	<tuple>
	  <str>get</str>
	  <str>filename</str>
	</tuple>

      This must be parsed as: {<<"get">>,<<"filename">>}
 
	So after input processing we get the same term as if we had performed a
binary data transfer and done binary_to_term on the data.
    
	Then we just proceed as before

	[[ Note we have to be careful with <str>XXXX</str>
	   The XXXX should be encoded to remove any "<" characters ...]]

      </aside>
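
The encoding mentioned in the note could be done along these lines when
going from a term to the XML form. This is a sketch only: the module name
xml_str is made up, only tuples and strings are handled (the aside shows no
element for lists), and escaping "&" as well as "<" is an addition that any
entity-based escaping needs.

```erlang
-module(xml_str).
-export([escape/1, to_xml/1]).

%% Replace "&" first, so the "&" in "&lt;" is not escaped again.
escape(Bin) when is_binary(Bin) ->
    Amp = binary:replace(Bin, <<"&">>, <<"&amp;">>, [global]),
    binary:replace(Amp, <<"<">>, <<"&lt;">>, [global]).

%% Render a term made of tuples and binaries as an iolist of XML.
to_xml(Bin) when is_binary(Bin) ->
    [<<"<str>">>, escape(Bin), <<"</str>">>];
to_xml(Tuple) when is_tuple(Tuple) ->
    [<<"<tuple>">>,
     [to_xml(E) || E <- tuple_to_list(Tuple)],
     <<"</tuple>">>].
```

The result is an iolist, so it can be written to a socket directly or
flattened with iolist_to_binary/1.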




> A binary can be used to *represent* a string in a charset with 8-bit code
> units, but you also need some way to distinguish such a thing from "raw"
> octet sequences, and possibly to distinguish representations using different
> charsets if more than one charset is supported.
> 

This is a different problem :-)

> (I'm using the W3C character encoding model here, which is the model that
> XML uses. See <http://www.w3.org/TR/charmod/>.)
> 
> -- 
> David Hopwood <>
> 


	/Joe


