8.4. Conflict Tips

Conflicts arise from ambiguities in the grammar. That is, some input sequences may possess more than one parse. Shift/reduce conflicts are benign in the sense that they are easily resolved (Happy automatically selects the shift action, as this is usually the intended one). Reduce/reduce conflicts are more serious. A reduce/reduce conflict implies that a certain sequence of tokens on the input can represent more than one non-terminal, and the parser is uncertain as to which reduction rule to use. It will select the reduction rule uppermost in the grammar file, so if you really must have a reduce/reduce conflict you can select which rule will be used by putting it first in your grammar file.

It is usually possible to remove conflicts from the grammar, but sometimes this is at the expense of clarity and simplicity. Here is a cut-down example from the grammar of Haskell (1.2):

exp     : exp op exp0
        | exp0

exp0    : if exp then exp else exp
        ...
        | atom

atom    : var
        | integer
        | '(' exp ')'
        ...

This grammar has a shift/reduce conflict, due to the following ambiguity. In an input such as

if 1 then 2 else 3 + 4

the grammar doesn't specify whether the parse should be

if 1 then 2 else (3 + 4)

or

(if 1 then 2 else 3) + 4

and the ambiguity shows up as a shift/reduce conflict on reading the 'op' symbol. In this case, the first parse is the intended one (the 'longest parse' rule), which corresponds to the shift action. Removing this conflict relies on noticing that the expression on the left-hand side of an infix operator can't be an exp0 (the grammar previously said otherwise, but since the conflict was resolved as shift, this parse was not allowed). We can reformulate the exp rule as:

exp     : atom op exp
        | exp0

and this removes the conflict, but at the expense of some stack space while parsing (we turned a left-recursion into a right-recursion). There are alternatives using left-recursion, but they all involve adding extra states to the parser, so most programmers will prefer to keep the conflict in favour of a clearer and more efficient parser.

8.4.1. LALR(1) parsers

There are three basic ways to build a shift-reduce parser. Full LR(1) (the `L' is the direction in which the input is scanned, the `R' is the way in which the parse is built, and the `1' is the number of tokens of lookahead) generates a parser with many states, and is therefore large and slow. SLR(1) (simple LR(1)) is a cut-down version of LR(1) which generates parsers with roughly one-tenth as many states, but lacks the power to parse many grammars (it finds conflicts in grammars which have none under LR(1)).

LALR(1) (look-ahead LR(1)), the method used by Happy and yacc, is tradeoff between the two. An LALR(1) parser has the same number of states as an SLR(1) parser, but it uses a more complex method to calculate the lookahead tokens that are valid at each point, and resolves many of the conflicts that SLR(1) finds. However, there may still be conflicts in an LALR(1) parser that wouldn't be there with full LR(1).