records generated from UBF

Erik Pearson erik@REDACTED
Tue Apr 22 18:55:08 CEST 2003


Hi Joe,

Thanks for the clarification. I think the reason it was an issue for me 
was the approach I had taken to write the parser, which was more akin 
to:

- start reading a stream, a byte at a time.
- feed bytes to whitespaceconsumer()
- on the first non-whitespace bytecode, you know what the upcoming 
object is
- now feed bytes to a function which knows how to collect that object, 
getint(), getstring(), etc.
- that function: collects the bytes, when it sees the object 
termination condition, turns them into native object, returns that 
object and the next function to use.
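
As a rough illustration of the approach listed above, here is a minimal
Python sketch (not the Common Lisp original). All names are hypothetical,
only integers and double-quoted strings are handled, string escapes are
ignored, and instead of returning "the next function to use" each
collector returns the byte that terminated it, which the driver then
dispatches on.

```python
import io

WHITESPACE = b" \t\r\n"

def skip_whitespace(stream):
    """Consume whitespace; return the first non-whitespace byte (b"" at EOF)."""
    while True:
        b = stream.read(1)
        if b == b"" or b not in WHITESPACE:
            return b

def get_int(first, stream):
    """Collect digits; return (value, the byte that terminated the integer)."""
    digits = first
    while True:
        b = stream.read(1)
        if b == b"" or not b.isdigit():
            return int(digits.decode("ascii")), b
        digits += b

def get_string(stream):
    """Collect bytes up to the closing double quote (escapes are ignored)."""
    chars = b""
    while True:
        b = stream.read(1)
        if b == b"":
            raise ValueError("unterminated string")
        if b == b'"':
            return chars.decode("latin-1"), b""
        chars += b

def parse_objects(data):
    """Dispatch on the first byte of each object, as in the list above."""
    stream = io.BytesIO(data)
    objects = []
    pending = b""  # byte already read by the previous collector, if any
    while True:
        if pending and pending not in WHITESPACE:
            first = pending
        else:
            first = skip_whitespace(stream)
        pending = b""
        if first == b"":
            return objects
        if first.isdigit() or first == b"-":
            obj, pending = get_int(first, stream)
        elif first == b'"':
            obj, pending = get_string(stream)
        else:
            raise ValueError("unhandled bytecode: %r" % first)
        objects.append(obj)
```

With this shape, get_int() has already committed to "integer" by the time
it returns, which is why a following ~ (a binary) felt awkward to handle.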

I've since rewritten the parser to rely on generalized stream-consuming 
functions which respond to different types of byte collection and 
detection requests (collect-until, collect-while, etc.). It now uses 
something of a little stack (actually an associative list that stores 
facts about each object as they are discovered: value, tag, 
termination bytecode). It is implemented in Common Lisp, so it is a 
little weird.
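
For readers who want something concrete, here is a minimal Python sketch
of what collect-while and collect-until style helpers might look like.
It is an assumption-laden illustration, not a port of the Lisp code: the
class and function names are made up, a dict stands in for the
associative list of facts, and only integers and unescaped strings are
recognized.

```python
import io

class ByteStream:
    """A byte buffer with one byte of lookahead (hypothetical helper)."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def peek(self):
        """Return the next byte without consuming it (b"" at EOF)."""
        pos = self._buf.tell()
        b = self._buf.read(1)
        self._buf.seek(pos)
        return b

    def take(self):
        """Consume and return the next byte."""
        return self._buf.read(1)

    def collect_while(self, pred):
        """Collect bytes for as long as pred(byte) holds."""
        out = bytearray()
        while True:
            b = self.peek()
            if b == b"" or not pred(b):
                return bytes(out)
            out += self.take()

    def collect_until(self, terminator):
        """Collect bytes up to terminator; the terminator is consumed."""
        out = bytearray()
        while True:
            b = self.take()
            if b == b"":
                raise ValueError("unterminated object")
            if b == terminator:
                return bytes(out)
            out += b

def read_object(stream):
    """Return a dict of facts about the next object, or None at EOF."""
    stream.collect_while(lambda b: b in b" \t\r\n")
    first = stream.peek()
    if first == b"":
        return None
    facts = {"value": None, "tag": None, "terminator": None}
    if first == b'"':
        stream.take()  # opening quote
        facts["value"] = stream.collect_until(b'"').decode("latin-1")
        facts["terminator"] = b'"'
    elif first.isdigit():
        facts["value"] = int(stream.collect_while(bytes.isdigit))
    else:
        raise ValueError("unhandled bytecode: %r" % first)
    return facts
```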

Anyway, not that you wanted to know all of _that_ ...

On Tuesday, April 22, 2003, at 01:25 AM, Joe Armstrong wrote:

> On Sat, 19 Apr 2003, Erik Pearson wrote:
>
>> After a little more fiddling around, I have a practical question about
>> UBF(A):
>>
>> The representation of a "binary" type is:
>>
>>    Int ~blahblah~
>>
>> Where Int is an integer, and ~ is the delimiter for the binary data,
>> and blahblah is a stream/array of bytes Int long (and the space is 0
>> or more spaces)
>>
>> In order to resolve the ambiguity with a regular Int, the otherwise
>> simple parsing becomes more complicated. It seems to me, unnecessarily
>> complex. That is, the parser first determines that there is an int,
>> and then it has to keep going to determine whether it really is an int
>> or whether it will become a binary.
>>
>> What if the binary representation was something like
>>
>>    ~ Int ~blahblah~
>>
>> Then the initial ~ would be a nice and simple flag for the parser
>> that a binary value was coming up. This would be consistent with all of
>> the other simple types, for which the initial byte serves as a flag
>> for the type (" for string, ' for constants, 0-9 or - for number (oops,
>> or binary)...)
>>
>
> IMHO this would be more difficult to parse :-)

I guess simplicity is in the eye of the beholder! Seriously, though, I 
guess there are multiple issues to be juggled here...

- merit of the design (elegance, extensibility, etc.)
- efficiency (size of objects, ease of parsing)
- simplicity (amount of code to implement, number of words in the spec.)
- ease of implementation in various languages (since this is a "glue" 
between disparate systems, this is important.)

Of course, what really grabbed me about UBF was that you (Joe) attacked 
some of these issues head-on, without the usual XML b.s. (pardon me).

I think these issues become less relevant to the end user if they don't 
have to implement their own clients and servers -- if there are free 
clients (and possibly servers) for multiple platforms (java, tcl, c, 
python, perl, vb, c#, com, scheme, lisp :) etc.) as hinted in the UBF 
materials, then implementation issues are much lower in importance. 
With free clients on your platform and language of choice, and a free 
(and free) Erlang server, I guess no-one could complain about anything!



Erik.

>
>   You should think of UBF(A) as a byte code program designed to be
> executed by a pushdown automaton.  As soon as you recognize *anything*
> you just push it onto the stack.
>
> So to parse:
>
> 	123 ~.....~
>
> The parser proceeds as follows:
>
>
> 1) Recognize that "1" is the start of an integer
> 2) go get the integer
> 3) Blank terminates the integer so push the recognized integer
>    "123" onto the stack
> 4) Recognize ~ - this means "here comes a memory buffer"
>    the length will be on the top of the stack
>    Pop the stack
> 5) collect 123 bytes into a memory buffer
> 6) push the memory buffer onto the stack
> 7) collect a ~
>
> Step 7 is (of course) not necessary.
>
> UBF(A) has a few "features" to make it not only easy to parse
> but easy to read and write - Thus the trailing ~ is just there to make
> it easier to read.
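
The steps Joe lists above can be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the function name
is hypothetical, only non-negative integers, whitespace and ~ binaries
are handled, and the closing ~ is checked even though it carries no
extra information.

```python
def run_ubfa_fragment(data):
    """Run a tiny stack machine over bytes; return the final stack."""
    stack = []
    i = 0
    n = len(data)
    while i < n:
        c = data[i:i + 1]
        if c.isdigit():
            # Steps 1-3: collect the integer and push it.
            j = i
            while j < n and data[j:j + 1].isdigit():
                j += 1
            stack.append(int(data[i:j]))
            i = j
        elif c == b"~":
            # Step 4: the length is on top of the stack, so pop it.
            if not stack or not isinstance(stack[-1], int):
                raise ValueError("~ without a preceding length")
            length = stack.pop()
            start = i + 1
            end = start + length
            # Steps 5-6: collect exactly `length` bytes and push them.
            buf = data[start:end]
            if len(buf) != length:
                raise ValueError("binary is shorter than its length")
            stack.append(buf)
            # Step 7: collect the closing ~.
            if data[end:end + 1] != b"~":
                raise ValueError("missing closing ~")
            i = end + 1
        elif c in b" \t\r\n":
            i += 1
        else:
            raise ValueError("unhandled bytecode: %r" % c)
    return stack
```

Because the length is simply popped from the stack, the parser never has
to decide in advance whether an integer is "really" the start of a
binary; a ~ inside the payload is also harmless, since the bytes are
counted rather than scanned.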
>
> <aside>
>
> In UBF(A)
>
> 	Opcode 123 means start a new struct
> 	       125 means end of struct
> 	       44  means element separator within a struct
>
> So if you hit a byte 123 on input you start collecting a struct etc.
>
> Coincidentally 123, 125 and 44 are the ASCII byte codes for "{", "}"
> and ","
>
> (but then again perhaps this was by design rather than accident :-)
>
>
> </aside>
>
>> Finally, just a quick correction to the spec at
>>
>> http://www.sics.se/~joe/ubf/site/ubfa.html
>>
>> I believe the structure should be defined with comma separators rather
>> than just whitespace
>>
>>      { Obj1, Obj2, ..., Objn }
>>
>> rather than
>>
>>      { Obj1 Obj2 ... Objn }
>>
>> The examples show the comma separators.
>
> Again this is just to make it easier to read - commas and white space
> mean the same.

(But above, didn't you say that comma is the bytecode for element 
separation? If it is also the same as whitespace, how can it separate? 
Perhaps I need to enlarge the font on my monitor...)

I don't know if you saw my other notes, but the problem I was having is 
the separation of objects within the structure. At that time I did not 
realize that the bytecode for delimiting tags was the backquote -- I 
thought it was single quote and thus the tag was just a constant. In 
that case, one would have a problem keeping track of objects without an 
explicit separator. With that confusion removed, that particular issue 
goes away.

Another issue is that without an object separator (or the ? "null" 
bytecode you suggested) you can't include missing/null objects because 
it will just look like more whitespace (although you can include blank 
strings, constants, etc. -- just not untagged integers)

Erik.


>
> /Joe
>
>
>
>> Thanks,
>>
>> Erik.
>>
>>
>> On Wednesday, April 16, 2003, at 09:18 AM, Erik Pearson wrote:
>>
>>> Thanks Ulf,
>>>
>>> I hadn't actually looked for the java stuff in the download, since
>>> the java section of the site says that it is not done yet! Well,
>>> there it is!
>>>
>>> It works fine so far.
>>>
>>> FWIW, this is probably how I'll attempt to use it first -- What
>>> I'm doing is integrating it into my web server scripting language (a
>>> very close offshoot of Tcl, from the Jacl tcl implementation, that
>>> is, Tcl implemented in Java). I'm starting with just UBF(A), since for
>>> my purposes getting two or more disparate network services (web or
>>> other) to exchange information is the first goal. The first bit of
>>> work will be to implement a pleasant java/tcl interface so that
>>> UBF(A) can be easily scripted on the web server. Since UBF(A) is
>>> straightforward, that part should be pretty easy, and is mostly
>>> working now. After that, UBF(B)...
>>>
>>> On the other side of the equation are several other related network
>>> services or applications which  are variously implemented (currently)
>>> in Tcl and CL. Now, the discovery of UBF has prompted me to rethink
>>> the usage of Erlang for these same services. (I had been using 
>>> Erlang,
>>> but issues with mnesia caused me to put it aside for a while...)
>>>
>>> BTW, the very practical and straightforward nature of UBF fits right
>>> into why Erlang has always been so appealing (to me.)
>>>
>>> I'll check back in later (after my very short vacation) if I get some
>>> interesting results.
>>>
>>> Thanks,
>>>
>>> Erik.
>>>
>>>
>>> On Tuesday, April 15, 2003, at 01:43 AM, Ulf Wiger wrote:
>>>
>>>> On Tue, 15 Apr 2003, Erik Pearson wrote:
>>>>
>>>>> I'd like to give UBF a try. It looks really great, and
>>>>> thanks for your addition to it.
>>>>
>>>> Good. (:
>>>>
>>>> Joe is, I believe, in Iceland this week, undoubtedly
>>>> spending quality time trying out all the hot baths there.
>>>> Let's see how he wants to play it when he gets back.
>>>> Otherwise, you can certainly download whatever he has made
>>>> available on his website, and I can send you my modified
>>>> version, with absolutely no guarantees. ;-)
>>>>
>>>>
>>>>> What would be great, though, is if someone can share a Java
>>>>> implementation (or any other implementation out there, e.g.
>>>>> Tcl). I need it to glue together a Java servlet to Erlang
>>>>> (or whatever).
>>>>>
>>>>> From the UBF paper, it appears that a Java implementation
>>>>> was largely completed. It would be great to either use that
>>>>> code, or start with it and complete the implementation far
>>>>> enough to get something working.
>>>>
>>>> There are two Java implementations that I know of. Luke's
>>>> Java client that's included in Joe's care package, and Jon
>>>> Åkergården's Athletics client prototype. Jon's Java code can
>>>> be downloaded from http://www.it.kth.se/~e98_jak/
>>>> However, Jon is actively working on the code right now, so
>>>> the downloadable version is almost certainly outdated.
>>>>
>>>> From a quick glance (and be warned: I've yet to write even
>>>> Hello World in Java), Luke's UBF decoder seems to lack
>>>> caching support(*), and Jon's decoder lacks support for
>>>> binaries; I'm also unsure about whether Jon's decoder
>>>> handles escaping properly, but Luke's does.
>>>>
>>>> If you take the union of the two, you probably have a
>>>> complete decoder, but they don't seem totally compatible to
>>>> me.  ;)  Someone else with more knowledge of Java will
>>>> perhaps see things differently.
>>>>
>>>> /Uffe
>>>>
>>>> (*) This is only a problem if the other side uses caching.
>>>> Joe's Erlang-based UBF encoder will take advantage of
>>>> caching, so a corresponding decoder will of course have to
>>>> as soon as there is something cachable in the messages.
>>>>
>>>> -- 
>>>> Ulf Wiger, Senior Specialist,
>>>>    / / /   Architecture & Design of Carrier-Class Software
>>>>   / / /    Strategic Product & System Management
>>>>  / / /     Ericsson AB, Connectivity and Control Nodes
>>>>
>>>>
>>> Erik Pearson
>>> Adaptations
>>> desk +1 510 527 5437
>>> cell +1 510 517 3122
>>>
>>>
>> Erik Pearson
>> Adaptations
>> desk +1 510 527 5437
>> cell +1 510 517 3122
>>
>
>
Erik Pearson
Adaptations
desk +1 510 527 5437
cell +1 510 517 3122



