[erlang-questions] speed of neotoma parser

Max Lapshin max.lapshin@REDACTED
Sat Nov 14 18:33:45 CET 2015


Got it.

So, it is just a bad way of usage.

I decided to put semantics and most of logic inside config to make
validation as early as possible. And it seems that it is a misusage.

OK, I will think what to do including option "ignore and relax" because
only one user suffers from this problem.

Thanks for hints and explanation!
On Nov 14, 2015 8:20 PM, "Sean Cribbs" <seancribbs@REDACTED> wrote:

> Hi Max,
>
> My general impression of your grammar is that it conflates syntax and
> semantics. Your configuration lines seem to be structured roughly like
> KEYWORD VALUES* ( "{" BLOCK "}" )? ";", but you create a single rule for
> every possible configuration item. This means at the top-level you have to
> create ordered choices between completely unrelated extents. Every config
> item you parse is going to go through the same backtracking among ~30
> alternatives, with nothing able to be memoized (because the stem of each
> one is unique). This is a quintessential worst-case for the types of
> parsers that neotoma currently generates.
>
> I suggest you rethink the parser, focus on what are the shared structures
> and try to extract those. Use fewer rules, try not to treat each config
> item as a special case, and then validate the structure after parsing it.
> Alternatively, you could pick up and use cuttlefish which has a much
> simpler configuration language and built-in semantic validation and
> transformation and command-line tools.
>
> As you say earlier in the thread, you could also use leex and yecc. I
> understand the desire not to, but their performance characteristics are
> better known and more predictable. They might be worth it.
>
> I feel pretty responsible for the limitations you encountered. Ford's
> thesis clearly outlines optimizations that neotoma does not do, including
> more efficient matching of terminals via "switch", cost calculation to
> determine what to memoize and inline, unrolling recognition logic instead
> of using parser-combinators, and more. I'm working on a rewrite, but it's
> not ready yet.
>
> I'm sorry. I hope the suggestions above help you focus the grammar into
> something more usable.
>
> Sean
>
> On Sat, Nov 14, 2015 at 10:27 AM, Max Lapshin <max.lapshin@REDACTED>
> wrote:
>
>> I've sent example from production to Sean.
>>
>> (flussonic@REDACTED)2> timer:tc(fun() ->
>> config2_format:parse(element(2,file:read_file("big.conf"))), ok end).
>>
>> {29727926,ok}
>>
>> On Sat, Nov 14, 2015 at 4:05 PM, French, Michael <michael.french@REDACTED>
>> wrote:
>>
>>> The concept of 'token' is fluid in PEGs. The terminal/non-terminal
>>> distinction might not work. For example, the definition of 'alphanumeric'
>>> might appear in several different 'tokens', rather than repeating char
>>> classes, which means the token rules become non-terminals.
>>>
>>> Maybe use a hint in the grammar, like using an upper-case name for rule
>>> LHS to indicate a 'token' (which is similar to a convention in Antlr). Then
>>> always memo-ize the token rules, but not (necessarily) the others that have
>>> lower-case rule names.
>>>
>>> BR
>>> Mike
>>>
>>> ________________________________________
>>> From: erlang-questions-bounces@REDACTED [
>>> erlang-questions-bounces@REDACTED] on behalf of Joe Armstrong [
>>> erlang@REDACTED]
>>> Sent: Friday, November 13, 2015 11:25 PM
>>> To: Sean Cribbs
>>> Cc: Erlang-Questions Questions
>>> Subject: Re: [erlang-questions] speed of neotoma parser
>>>
>>> PEG parsers are notoriously inefficient. How about having a separate
>>> tokenization pass, and parse token instead of characters. At a guess
>>> this would be far faster since you'd backtrack over completed tokens
>>> rather than characters.
>>>
>>> /Joe
>>>
>>> On Fri, Nov 13, 2015 at 3:02 PM, Sean Cribbs <seancribbs@REDACTED>
>>> wrote:
>>> > Max,
>>> >
>>> > Do you have a link to your grammar? I can probably poke at it and give
>>> you
>>> > some tips.
>>> >
>>> > However, I am well aware of performance problems with neotoma -- with
>>> large
>>> > grammars or large inputs it drags. Yes, there are general problems for
>>> PEGs
>>> > in Erlang, but its current implementation is particularly naive and
>>> > wasteful. I'm working on a rewrite, but it's a complete overhaul (and
>>> more
>>> > faithful to the thesis and reference implementation "Pappy"). Since
>>> it's not
>>> > core to my day-job, I've only been able to work on the rewrite
>>> occasionally
>>> > in my free time.
>>> >
>>> > On Thu, Nov 12, 2015 at 12:07 PM, Max Lapshin <max.lapshin@REDACTED>
>>> wrote:
>>> >>
>>> >> Yes, Louis, I also think that there may be a simple way of speeding
>>> it up.
>>> >>
>>> >> I'm only afraid that I will have to open my university book and
>>> remember
>>> >> what LL-1 means and how it differs from LALR =)
>>> >>
>>> >> Ok, will try to profile it first.
>>> >>
>>> >> _______________________________________________
>>> >> erlang-questions mailing list
>>> >> erlang-questions@REDACTED
>>> >> http://erlang.org/mailman/listinfo/erlang-questions
>>> >>
>>> >
>>> >
>>> > _______________________________________________
>>> > erlang-questions mailing list
>>> > erlang-questions@REDACTED
>>> > http://erlang.org/mailman/listinfo/erlang-questions
>>> >
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20151114/915d07b8/attachment.htm>


More information about the erlang-questions mailing list