Proposal: Binary operator precedence in pest grammar

Currently the pest grammar defines a binary expression as
expression = { expression_term ~ (operation_binary ~ expression_term)* } link.

Precedence climbing is performed afterward using the pest PrecClimber link. This is used in the Leo compiler link.

@acoglio has proposed modifying the pest grammar to perform precedence climbing - therefore removing the need to use the PrecClimber in the compiler. Each binary operator would be modeled similar to the Java language specification link.

Binary operators will be grouped by precedence.

Boolean operators: ||, &&
Relational operators: ==, >=, <= , >, <
Additive operators: -, +
Multiplicative operators: *, /, **

Defining precedence operator rules in the pest grammar will strengthen the Leo language since there will be a clearly defined way of forming the abstract syntax tree when parsing a Leo file. It will also give a clear path for defining the operational semantics of the language.

This proposal is a work in progress. A github issue formalizing implementation will be linked at a later date.

3 Likes

To exemplify, let us consider (generic) grammar rules

<primary> ::= <identifier> |  "(" <expression> ")" | ...
<multiplication> ::= <primary> |  <multiplication> "*" <primary>
<additition> ::= <multiplication> | <addition> "+" <multiplication>

These rules implicitly define the relative precedence of addition and multiplication, and the fact that they are left-associative.

For instance, according to the rules, the expression

x + y * z

can only be parsed as

tree 1:
  +
 / \
x   *
   / \
  y   z

and not as

tree 2:
    *
   / \
  +   z
 / \
x   y

because for tree 2 the expression x + y * z would have to be a <multiplication>, but then x + y cannot be a <primary>. Instead, for tree 1 the expression x + y * z is an <addition>.

Similar situation for x * y + z.

Also similarly, the rules force x + y + x to be parsed as (x + y) + z and x * y * z as (x * y) * z – i.e. to be left-associative.

The above rules are left-recursive though, so they need to be modified for Pest. Techniques to eliminate left recursion may change the desired associativity (requiring some post-parsing adaptation), but may still preserve the desired precedence. This page https://en.wikipedia.org/wiki/Left_recursion explains and exemplifies. So we’ll need to think this though a bit more, and ponder the tradeoffs. (The current parser + precedence climber work fine, so this does not seem urgent.)

The left-recursive rules may be used as is for user-oriented documentation – as done in the Java and C specifications, for example. We could also consider formally proving the equivalence of the left-recursive user-oriented grammar to the Pest grammar.

3 Likes

@acoglio intresting post