<p dir="ltr">I need to parse an nginx-like config.<br>

In my current implementation I have described almost every keyword in the PEG grammar, so any unknown keyword stops parsing.</p>

<p dir="ltr">If I have understood Sean correctly, it would be better to keep the grammar generic and validate after parsing.</p>
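<p dir="ltr">A minimal sketch of what validating after parsing could look like, assuming a generic parser that returns a list of {Keyword, Values, Block} items. The module name conf_validate, the schema and the keywords in it are hypothetical and only illustrate the idea; they are not taken from the real config or from neotoma.</p>

```erlang
-module(conf_validate).
-export([validate/1]).

%% Hypothetical schema: keyword => {number of values, block allowed?}.
%% The keywords are made up for illustration, not taken from the real config.
schema() ->
    #{"http"   => {0, true},
      "listen" => {1, false},
      "stream" => {1, true},
      "url"    => {1, false}}.

%% A generic parser is assumed to return a list of {Keyword, Values, Block}
%% items, where Block is a (possibly empty) list of nested items.
validate(Items) ->
    case lists:flatmap(fun check/1, Items) of
        []     -> ok;
        Errors -> {error, Errors}
    end.

check({Keyword, Values, Block}) ->
    case maps:find(Keyword, schema()) of
        error ->
            %% Unknown keywords are reported here, after parsing has finished.
            [{unknown_keyword, Keyword}];
        {ok, {Arity, BlockAllowed}} ->
            ArityErrors = case length(Values) of
                              Arity -> [];
                              _     -> [{wrong_value_count, Keyword}]
                          end,
            BlockErrors = case {BlockAllowed, Block} of
                              {false, [_ | _]} -> [{unexpected_block, Keyword}];
                              _                -> lists:flatmap(fun check/1, Block)
                          end,
            ArityErrors ++ BlockErrors
    end.
```

<p dir="ltr">With this split, an unknown keyword no longer stops the parser: it comes back as an {unknown_keyword, Keyword} entry, which the caller can reject or ignore.</p>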

<p dir="ltr">First I need to step back and read up on the difference between LALR and LL.</p>

<div class="gmail_quote">On Nov 14, 2015 9:44 PM, "Robert Virding" <<a href="mailto:rvirding@gmail.com">rvirding@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>If you specifically want an LL(1) parser generator I have written one for Erlang and LFE, spell1, <a href="https://github.com/rvirding/spell1" target="_blank">https://github.com/rvirding/spell1</a>. You need a tokeniser for it, for example one generated with leex. The LALR(1) grammars of yecc are more general than LL(1) but spell1 does have a few benefits over yecc:<br><br></div>- It is reentrant in that you don't need to pass in all the tokens in one go but can keep adding them until it has gotten enough.<br></div>- It can handle passing in too many tokens and just returns the left-overs.<br><br></div>Yecc crashes in both of these cases. These aren't really a problem for Erlang as you have the '.' which marks the end, but I needed it for LFE.<br><br></div>Robert<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On 14 November 2015 at 18:33, Max Lapshin <span dir="ltr"><<a href="mailto:max.lapshin@gmail.com" target="_blank">max.lapshin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Got it.</p>

<p dir="ltr">So, I am simply using it the wrong way.</p>

<p dir="ltr">I decided to put the semantics and most of the logic inside the config to make validation happen as early as possible. It seems that this is a misuse.</p>

<p dir="ltr">OK, I will think about what to do, including the "ignore and relax" option, because only one user suffers from this problem.</p>

<p dir="ltr">Thanks for the hints and explanation!<br>

 </p><div><div>

<div class="gmail_quote">On Nov 14, 2015 8:20 PM, "Sean Cribbs" <<a href="mailto:seancribbs@gmail.com" target="_blank">seancribbs@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Max,</div><div><br></div>My general impression of your grammar is that it conflates syntax and semantics. Your configuration lines seem to be structured roughly like KEYWORD VALUES* ( "{" BLOCK "}" )? ";", but you create a single rule for every possible configuration item. This means at the top-level you have to create ordered choices between completely unrelated extents. Every config item you parse is going to go through the same backtracking among ~30 alternatives, with nothing able to be memoized (because the stem of each one is unique). This is a quintessential worst-case for the types of parsers that neotoma currently generates.<div><div><br></div><div>I suggest you rethink the parser, focus on what are the shared structures and try to extract those. Use fewer rules, try not to treat each config item as a special case, and then validate the structure after parsing it. Alternatively, you could pick up and use cuttlefish which has a much simpler configuration language and built-in semantic validation and transformation and command-line tools.</div><div><br></div><div>As you say earlier in the thread, you could also use leex and yecc. I understand the desire not to, but their performance characteristics are better known and more predictable. They might be worth it.</div><div><br></div><div>I feel pretty responsible for the limitations you encountered. Ford's thesis clearly outlines optimizations that neotoma does not do, including more efficient matching of terminals via "switch", cost calculation to determine what to memoize and inline, unrolling recognition logic instead of using parser-combinators, and more. I'm working on a rewrite, but it's not ready yet. 
</div><div><br></div><div>I'm sorry. I hope the suggestions above help you focus the grammar into something more usable.</div></div><div><br></div><div>Sean</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Nov 14, 2015 at 10:27 AM, Max Lapshin <span dir="ltr"><<a href="mailto:max.lapshin@gmail.com" target="_blank">max.lapshin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I've sent an example from production to Sean.<div><br></div><div>


<p><span>(flussonic@127.0.0.1)2> timer:tc(fun() -> config2_format:parse(element(2,file:read_file("big.conf"))), ok end).</span></p>

<p><span>{29727926,ok}</span></p></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Nov 14, 2015 at 4:05 PM, French, Michael <span dir="ltr"><<a href="mailto:michael.french@cgi.com" target="_blank">michael.french@cgi.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The concept of 'token' is fluid in PEGs. The terminal/non-terminal distinction might not work. For example, the definition of 'alphanumeric' might appear in several different 'tokens', rather than repeating char classes, which means the token rules become non-terminals.<br>

<br>

Maybe use a hint in the grammar, like using an upper-case name for rule LHS to indicate a 'token' (which is similar to a convention in Antlr). Then always memo-ize the token rules, but not (necessarily) the others that have lower-case rule names.<br>

<br>

BR<br>

Mike<br>

<br>

________________________________________<br>

From: <a href="mailto:erlang-questions-bounces@erlang.org" target="_blank">erlang-questions-bounces@erlang.org</a> [<a href="mailto:erlang-questions-bounces@erlang.org" target="_blank">erlang-questions-bounces@erlang.org</a>] on behalf of Joe Armstrong [<a href="mailto:erlang@gmail.com" target="_blank">erlang@gmail.com</a>]<br>

Sent: Friday, November 13, 2015 11:25 PM<br>

To: Sean Cribbs<br>

<span>Cc: Erlang-Questions Questions<br>

Subject: Re: [erlang-questions] speed of neotoma parser<br>

<br>

</span><div><div>PEG parsers are notoriously inefficient. How about having a separate<br>

tokenization pass, and parsing tokens instead of characters? At a guess<br>

this would be far faster since you'd backtrack over completed tokens<br>

rather than characters.<br>

<br>

/Joe<br>

<br>

On Fri, Nov 13, 2015 at 3:02 PM, Sean Cribbs <<a href="mailto:seancribbs@gmail.com" target="_blank">seancribbs@gmail.com</a>> wrote:<br>

> Max,<br>

><br>

> Do you have a link to your grammar? I can probably poke at it and give you<br>

> some tips.<br>

><br>

> However, I am well aware of performance problems with neotoma -- with large<br>

> grammars or large inputs it drags. Yes, there are general problems for PEGs<br>

> in Erlang, but its current implementation is particularly naive and<br>

> wasteful. I'm working on a rewrite, but it's a complete overhaul (and more<br>

> faithful to the thesis and reference implementation "Pappy"). Since it's not<br>

> core to my day-job, I've only been able to work on the rewrite occasionally<br>

> in my free time.<br>

><br>

> On Thu, Nov 12, 2015 at 12:07 PM, Max Lapshin <<a href="mailto:max.lapshin@gmail.com" target="_blank">max.lapshin@gmail.com</a>> wrote:<br>

>><br>

>> Yes, Louis, I also think that there may be a simple way of speeding it up.<br>

>><br>

>> I'm only afraid that I will have to open my university book and remember<br>

>> what LL-1 means and how it differs from LALR =)<br>

>><br>

>> Ok, will try to profile it first.<br>

>><br>

>> _______________________________________________<br>

>> erlang-questions mailing list<br>

>> <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>

>> <a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

>><br>

><br>

><br>

><br>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</blockquote></div>

</div></div><br>

<br></blockquote></div><br></div>

</blockquote></div>