records generated from UBF

Erik Pearson erik@REDACTED
Mon Apr 21 19:50:30 CEST 2003


Hi Shawn,

Thanks for the explanations!

I think we are coming at the problem from different sides, though. I'm 
trying to see this from the "Erlang talking to the outside world" point 
of view, which in turn implies "the outside world talking to the 
outside world". In this approach, Erlang conventions don't really 
matter. What matters is the internal consistency of UBF, and its 
ability to convey any type of information. At least that is what _I_ 
want out of it :)

From this point of view, my comments are below:

On Monday, April 21, 2003, at 08:52 AM, Shawn Pearce wrote:

> Erik Pearson <erik@REDACTED> wrote:
>> Thanks, I didn't notice the comma sneaking in there as whitespace.
>> However, I am still somewhat concerned:
>>
>> - how do you disambiguate {"Hi" 'greeting', "world" 'planet'}? It seems
>> that this could be either {Obj1 Obj2 Obj3 Obj4} or {Obj1 Obj3 Obj4} or
>> {Obj1 Obj2}.
>
> It's clearly {Obj1 Obj2 Obj3 Obj4}.  ' ' (space) and ',' (comma) are both
> whitespace characters to UBF.  Thus it's {"Hi"'greeting'"world"'planet'},
> which is a 4 object tuple.  Now UBF(B) may define that the atom
> 'greeting' represents a type which is called greeting, however a
> type can only be made up of simple types.  Thus in order to create
> a UBF(B) type of 'greeting' it would be necessary to encode as
>
> {"Hi" {'greeting', "world"}, 'planet'}
>
> which if you look at Erlang records is exactly how a greeting record
> would be encoded.  (Tuple holding the tuple name as the first term
> and the values as the rest.)  So clearly the example you give cannot
> even be a UBF(B) type.

The problem is that the object

   "Hi" 'greeting' $

is a perfectly valid UBF value -- a semantically tagged string. I 
believe that any base object (string, constant, integer, binary) can be 
extended by adding a semantic tag after it. Thus anywhere that an 
object can appear, you should be able to add a constant value after it 
and have it recognized as a semantic tag. My problem is that within a 
structure this doesn't seem to work unless you have a specific 
byte-code for separating objects (perhaps better thought of as a way of 
terminating an object).

That was the point of the example. I hope!
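
To make the ambiguity concrete, here is a minimal Python sketch of a tokenizer that treats space and comma as whitespace, as Shawn describes. The name `tokenize` is hypothetical, and the sketch only handles quoted strings, quoted constants, braces and `$`; it is not a full UBF(A) reader.

```python
def tokenize(text):
    """Split a UBF(A)-like fragment into tokens.

    Space and comma are both skipped as whitespace, so neither one
    can act as a separator between objects.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " ,\t\r\n":
            i += 1
        elif ch in "\"'":
            # Quoted string or constant; escape sequences are ignored in this sketch.
            end = text.index(ch, i + 1)
            tokens.append(text[i:end + 1])
            i = end + 1
        elif ch in "{}$":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


tokens = tokenize("""{"Hi" 'greeting', "world" 'planet'}""")
inner = tokens[1:-1]  # the objects between the braces
```

Here `inner` holds four tokens in a row. Nothing in the token stream says whether 'greeting' is an object of its own or a tag on "Hi", which is why I think some kind of object terminator is needed inside structures.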

>
>> I must be missing something crucial here.
>>
>> I have a new concern too -- how to represent null (i.e. missing, blank)
>> values?
>
> Use an atom.  In Erlang (which UBF has borrowed a lot from), null
> is generally defined (by convention) to be the atom 'undefined'.
> Thus if you want a null string or a null integer, the UBF(B) must
> allow either an Int or the atom 'undefined'.  Otherwise you cannot
> define a null Int.  (Same for the other types.)

Yeah, I guess one can say that typeless nulls (à la Lisp nil or Java
null) can be represented, but that typed nulls would be in the realm of
UBF(B). Either you'd need to use more structure (e.g. 'undefined' 
'int32' $), or complex types in order for UBF(B) to know what you are 
talking about. I guess that when you need this level of interpretation, 
you should be dealing with UBF(B) anyway.

However, I do think it is a relatively important distinction that
UBF(A) has no way to encode a null value that also carries a type.
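
As a rough illustration of the typeless/typed distinction, here is a small Python sketch of how a decoder might model it. The `Atom` and `Tagged` classes and the helper names are hypothetical, and using 'undefined' as the null marker is the Erlang convention Shawn mentions, not something UBF(A) itself prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    name: str


# Erlang's conventional null marker; an assumption here, not a UBF(A) rule.
UNDEFINED = Atom("undefined")


@dataclass(frozen=True)
class Tagged:
    """A decoded value plus an optional semantic tag."""
    value: Union[int, str, bytes, Atom]
    tag: Optional[str] = None


def is_null(obj: Tagged) -> bool:
    return obj.value == UNDEFINED


def null_type(obj: Tagged) -> Optional[str]:
    """Return the type carried by a null, or None for a typeless null."""
    if not is_null(obj):
        raise ValueError("not a null value")
    return obj.tag


typeless = Tagged(UNDEFINED)         # corresponds to: 'undefined' $
typed = Tagged(UNDEFINED, "int32")   # corresponds to: 'undefined' 'int32' $
```

The typed case only works because the tag rides along with a real object (the atom); there is no way to hang a tag on "nothing".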

>
>> For strings it would be
>> "" $
>>
>> for constants
>> '' $
>>
>> for binary
>> 0 ~~ $
>>
>> (or
>> ~0~~ $
>> if the tilde prefix is preferable),
>>
>> but for integers it would be
>>
>>  $
>>
>> .. that is, a blank. But this raises a few issues:
>>
>> 1. How do you carry the type information for a blank integer?
>>
>> 2. Is there a valid concept of typeless null? That is, not only is the
>> value blank, but it also carries no type information?
>
> Yes, the atom 'undefined' does this by convention, but this of course
> means you cannot use the atom 'undefined' to mean a non-typeless-null
> value.  :)  It sounds ugly, but in practice it works better than say
> the typeless SQL null.

Actually, I think that just an empty object in UBF(A) is a typeless
null. UBF(A) doesn't mention it, but it seems like a reasonable
interpretation that an empty object is a null object.

  $
is a null object -- no data, no type.

However, there can't be any tagged null because
  'tag' $

would be interpreted as the constant 'tag'.

What do the empty structure and list mean?
{ } $
# & $


>
>> 3. It seems impossible to represent a null tagged integer -- it would
>> be indistinguishable from a plain constant.
>
> Erlang has no concept of a null integer, it uses 'undefined' (by
> convention).  You could define the atom 'null' or 'nullint' or 'nan'
> to mean the same thing(s), but it's your code that must define the
> meaning of these atoms.  I much prefer the Erlang system of doing it
> by convention, rather than forcing it on you.

Again, I'm just grappling with how to encode objects at the A level -- 
what can and can't be done -- using just the UBF spec as the guideline.

>
>> Do you or does anyone else have experience with confronting these
>> issues?
>>
>> I can see that UBF(B) might help with some of these issues, since the
>> usage of values within a particular context would imply their type.
>> However, at this point I'm just concerned with getting plain old UBF(A).
>>
>> Also, one of the things I'm trying to keep an eye on is places where
>> the spec introduces more work for the parser-builder. In my case, I
>> would like to use this with multi-vendor data and message exchange. In
>> these cases, having a clear, easy-to-implement spec is very important.
>> People need to implement this stuff in all sorts of weird languages and
>> often in a hurry!
>
> I'm not quite sure what you mean here.  Almost every language/system
> I have worked in thus far has a concept of a linked list, a concept
> of an array, of a constant (atom), and numerics/strings/binaries.  Thus
> you should be able to quickly read the UBF(A) spec and see the mapping
> to your language environment, and just map it.  Perhaps the UBF(A) spec
> just isn't clear on how one would handle null values?

The problem has more to do with making sure that the mapping is as 
straightforward as possible.  Also, I want as few of these questions as 
possible coming my way!

Some issues that I've come across so far are:

- encoding for binaries is made a little more complex than (I think) 
necessary, since there is no initial bytecode to signal the type. The 
parser has to keep going past the integer to determine if the object is 
an integer or a binary.

- encoding of structures seems ambiguous with regard to separating 
objects

- encoding of lists is non-intuitive in two senses --
   - for those who don't deal with functional languages, the idea of a 
list in reverse order may not make a lot of sense
   - there is no explicit list terminator, unlike for structures.
   (both of these seem to not fare well in the "human readable" goal 
stated in the spec.)
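
The binary issue can be sketched in Python as follows. This assumes a binary is written as a byte count followed by the bytes between two tildes (as in the `0 ~~ $` example earlier in this message); the function name `read_int_or_binary` is hypothetical, and negative integers are left out.

```python
def read_int_or_binary(data: bytes, pos: int = 0):
    """Return (kind, value, next_pos) for an integer or a length-prefixed binary.

    The parser cannot know which one it has until it has consumed the
    digits and looked past any whitespace for a '~'.
    """
    start = pos
    while pos < len(data) and data[pos:pos + 1].isdigit():
        pos += 1
    if pos == start:
        raise ValueError("expected a digit")
    number = int(data[start:pos])

    # Lookahead: skip whitespace (space and comma) to see whether '~' follows.
    look = pos
    while look < len(data) and data[look:look + 1] in (b" ", b","):
        look += 1
    if data[look:look + 1] != b"~":
        return ("integer", number, pos)

    # It was a binary: the number was the byte count, not a value.
    body_start = look + 1
    body_end = body_start + number
    if data[body_end:body_end + 1] != b"~":
        raise ValueError("binary not terminated by '~'")
    return ("binary", data[body_start:body_end], body_end + 1)
```

With a leading type byte the lookahead loop would disappear, since the parser would know up front that a length is coming.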

Now, I'm not saying that these are insurmountable problems; perhaps
they are just issues that I'd like to understand better.
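
For the list encoding, here is a minimal Python sketch of how I read it. It assumes `#` pushes an empty list onto a stack and `&` pops an item and prepends it to the list beneath it; that is my interpretation of the UBF(A) description, not a quotation from it. The name `eval_list` is hypothetical and elements are limited to integers.

```python
def eval_list(tokens):
    """Evaluate a token sequence of '#', '&' and integers on a stack."""
    stack = []
    for tok in tokens:
        if tok == "#":
            stack.append([])              # start with an empty list
        elif tok == "&":
            item = stack.pop()            # most recently read object
            tail = stack.pop()            # list built so far
            stack.append([item] + tail)   # prepend, like a cons cell
        else:
            stack.append(int(tok))
    return stack.pop()


result = eval_list("# 3 & 2 & 1 &".split())
```

Under that reading the elements have to be written last-to-first to come out in order, and the list simply ends wherever the `&` operators stop; there is no closing token comparable to `}`.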


>
> One thing I like about UBF(A) is that it should only take a few days
> to write a parser, while XML is estimated to take about 3 weeks.  Thus
> most developers should be able to bring a UBF(A) parser online in a very
> short time, perhaps about the same amount of effort required to just
> write a simple SAX or DOM parser (which uses an existing parser).  :)

Yes, I really love the idea and implementation of UBF(*). I wrote most
of a UBF(A) parser as a passenger on the drive to and from a mini-vacation
(I conveniently left my wallet at home and couldn't drive...). I just
rewrote the parser last night to try a different approach. Even the
harder stuff doesn't take _that_ long to do.

Erik.


>
> -- 
> Shawn.
>
>   When smashing monuments, save the pedestals -- they always come in handy.
>   		-- Stanislaw J. Lec, "Unkempt Thoughts"
>
Erik Pearson
Adaptations
desk +1 510 527 5437
cell +1 510 517 3122



