On 14/02/2008, <b class="gmail_sendername">Christian S</b> <<a href="mailto:chsu79@gmail.com">chsu79@gmail.com</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br> Parsing a quoted string as a token from leex is difficult if you know<br> that the end-quote might not be included in the chunk you just fed<br> into leex, but the next chunk read from the tcp stream.</blockquote><div>
<br>No, you're wrong here; leex was designed to handle just this case. See the next bit below.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
With fully recursive grammars I can see how one wants to let yecc<br> handle it, but a quoted string is not really recursive: You cant have<br> a quoted string inside a quoted string the same way you can have, say,<br> an if-expression inside an if-expression inside an if-expression etc<br>
in a programming language.<br> <br> Leex is a tool I would use for when I know I have some file of finite<br> length and I could do a two-pass parsing with yecc as the second<br> stage, I would not use it for tokenizing SMTP/IRC/NNTP...</blockquote>
<div><br>This is where you are wrong and have missed how leex works; the same goes for your point above.<br><br>The i/o system was designed to handle just this case, where you receive your data in chunks and need to collect the data from those chunks into the correct units in a re-entrant fashion. In this case the units are tokens, but they could equally be lines, or records, or all the tokens in an Erlang form, or ... . A process which is an IoDevice for the io module functions can do just this; it is what makes it an IoDevice. Unfortunately there is no good write-up describing the i/o system, and the interface needed for this functionality is not properly defined in the io module. There was a description in the old book but not in the released sections. Someday, if I get time, I will fix it.<br>
<br>Now leex was designed to fit into the i/o system so it can handle getting data in chunks in a re-entrant fashion. It depends on which functions in the generated file you call. That file has the same interface as the erl_scan module. The string/2/3 functions take a complete string and return the tokens in it. This is a one-shot deal.<br>
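<br>As a rough sketch of the one-shot style, here is erl_scan used as a stand-in, since the generated file shares its interface. The module name oneshot_demo and the function scan/1 are made up for illustration, and erl_scan:string/1 is used rather than any particular leex-generated module.<br>

```erlang
-module(oneshot_demo).
-export([scan/1]).

%% One-shot scan: the whole input must already be in a single string.
%% erl_scan stands in here for a scanner with the same interface.
scan(Chars) ->
    case erl_scan:string(Chars) of
        {ok, Tokens, _EndLoc} ->
            {ok, Tokens};
        {error, ErrorInfo, _ErrLoc} ->
            %% A string cut off mid-token (for example before its
            %% closing quote) ends up here: string/1 cannot ask for more.
            {error, ErrorInfo}
    end.
```

<br>With this, oneshot_demo:scan("foo(\"hel") should report an error, because the one-shot call has no way to wait for the rest of the quoted string.<br>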
<br>However, the functions token/3 and tokens/3 are re-entrant. token/3 reads one token, while tokens/3 reads all the tokens up to a token which was declared as {end_token, ... }, like dot ". " in Erlang. You first call them with a continuation of []. If there are enough characters, the call returns {done,Result,LeftOverChars}; otherwise, if there weren't enough characters, it returns {more,Continuation}. You then call the function again with Continuation and more characters, and so on until you get what you need or your characters run out. Running out of characters is signalled by calling with 'eof' instead of characters; an empty list does not have this effect. Check the documentation for erl_scan (though it doesn't have the token function).<br>
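<br>The call-again-with-the-continuation loop described above might look roughly like this. It is only a sketch: the module chunk_scan and the function feed/2 are hypothetical names, the chunks are given as a list rather than read from a socket, and Mod is assumed to be any module exporting tokens/3 with the erl_scan-style contract (erl_scan itself is used to try it out).<br>

```erlang
-module(chunk_scan).
-export([feed/2]).

%% feed(Mod, Chunks) ->
%%     {done, Result, LeftOverChars, UnusedChunks} | {more, Continuation}
%% Each chunk is a string, or the atom eof to signal end of input.
feed(Mod, Chunks) ->
    feed(Mod, [], Chunks).          % first call uses [] as continuation

feed(_Mod, Cont, []) ->
    %% Out of chunks but no eof seen: hand the continuation back so the
    %% caller can resume when the next chunk arrives.
    {more, Cont};
feed(Mod, Cont, [Chunk | Rest]) ->
    case Mod:tokens(Cont, Chunk, 1) of
        {more, Cont1} ->
            %% Not enough characters yet; carry on with the next chunk.
            feed(Mod, Cont1, Rest);
        {done, Result, LeftOverChars} ->
            %% LeftOverChars belong to whatever follows the end token.
            {done, Result, LeftOverChars, Rest}
    end.
```

<br>For example, chunk_scan:feed(erl_scan, ["foo(\"hel", "lo\"). "]) should come back as a done tuple whose tokens contain the complete string "hello", even though the closing quote only arrived in the second chunk. With only the first chunk it returns {more, Continuation}.<br>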
<br>So leex can do exactly what you want.<br><br>Unfortunately yecc doesn't have the same type of interface, so it is not re-entrant and you have to give it all the tokens in one go. Even more unfortunately, it could have been written that way, and it could still be rewritten. :-(<br>
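<br>In practice that means collecting the tokens re-entrantly and only then calling the parser once. A minimal sketch, using erl_scan and erl_parse (which is itself generated by yecc) as stand-ins; the module scan_then_parse and the function form/1 are hypothetical names:<br>

```erlang
-module(scan_then_parse).
-export([form/1]).

%% Scan a list of chunks re-entrantly up to the end token (dot), then
%% give the complete token list to the parser in a single call.
form(Chunks) ->
    form([], Chunks).

form(_Cont, []) ->
    %% Chunks ran out before a full form was seen.
    {error, incomplete};
form(Cont, [Chunk | Rest]) ->
    case erl_scan:tokens(Cont, Chunk, 1) of
        {more, Cont1} ->
            form(Cont1, Rest);
        {done, {ok, Tokens, _EndLoc}, _LeftOver} ->
            %% The yecc-generated parser wants all the tokens at once.
            erl_parse:parse_form(Tokens);
        {done, Other, _LeftOver} ->
            {error, Other}
    end.
```

<br>So the chunk handling stays entirely in the scanner stage, and the parser never sees a partial form.<br>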
<br>Hope this helps. I will try to find some examples of code. Otherwise check in the file modules.<br><br>Robert<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I'm looking for a better tool (as in quicker and easier code to<br> maintain/extend) than writing protocol parsing "by hand".<br> <br> PS.<br> I reserve the right to be completly mistaken about everything.<br>
</blockquote></div><br>