Going back to our earlier expression-parsing example,
wouldn't it be nicer if we didn't have to explicitly separate
the expressions into terms and factors, merely to make it
clear that '*'
and '/'
operators bind more tightly than '+'
and
'-'
?
We could just change the grammar as follows (making the appropriate changes to the expression datatype too):
Exp : let var '=' Exp in Exp { Let $2 $4 $6 } | Exp '+' Exp { Plus $1 $3 } | Exp '-' Exp { Minus $1 $3 } | Exp '*' Exp { Times $1 $3 } | Exp '/' Exp { Div $1 $3 } | '(' Exp ')' { Brack $2 } | int { Int $1 } | var { Var $1 }
but now Happy will complain that there are shift/reduce
conflicts because the grammar is ambiguous - we haven't
specified whether e.g. 1 + 2 * 3
is to be
parsed as 1 + (2 * 3)
or (1 + 2) *
3
. Happy allows these ambiguities to be resolved by
specifying the precedences of the
operators involved using directives in the
header[2]:
... %right in %left '+' '-' %left '*' '/' %% ...
The %left
or %right
directive is followed by a list of terminals, and declares all
these tokens to be left or right-associative respectively. The
precedence of these tokens with respect to other tokens is
established by the order of the %left
and
%right
directives: earlier means lower
precedence. A higher precedence causes an operator to bind more
tightly; in our example above, because '*'
has a higher precedence than '+'
, the
expression 1 + 2 * 3
will parse as 1
+ (2 * 3)
.
What happens when two operators have the same precedence?
This is when the associativity comes into
play. Operators specified as left associative will cause
expressions like 1 + 2 - 3
to parse as
(1 + 2) - 3
, whereas right-associative
operators would parse as 1 + (2 - 3)
. There
is also a %nonassoc
directive which indicates
that the specified operators may not be used together. For
example, if we add the comparison operators
'>'
and '<'
to our
grammar, then we would probably give their precedence as:
... %right in %nonassoc '>' '<' %left '+' '-' %left '*' '/' %% ...
which indicates that '>'
and
'<'
bind less tightly than the other
operators, and the non-associativity causes expressions such as
1 > 2 > 3
to be disallowed.
The precedence directives, %left
,
%right
and %nonassoc
,
assign precedence levels to the tokens in the declaration. A
rule in the grammar may also have a precedence: if the last
terminal in the left hand side of the rule has a precedence,
then this is the precedence of the whole rule.
The precedences are used to resolve ambiguities in the grammar. If there is a shift/reduce conflict, then the precedence of the rule and the lookahead token are examined in order to resolve the conflict:
If the precedence of the rule is higher, then the conflict is resolved as a reduce.
If the precedence of the lookahead token is higher, then the conflict is resolved as a shift.
If the precedences are equal, then
If the token is left-associative, then reduce
If the token is right-associative, then shift
If the token is non-associative, then fail
If either the rule or the token has no precedence, then the default is to shift (these conflicts are reported by Happy, whereas ones that are automatically resolved by the precedence rules are not).
The precedence of an individual rule can be overriden, using context precedence. This is useful when, for example, a particular token has a different precedence depending on the context. A common example is the minus sign: it has high precedence when used as prefix negation, but a lower precedence when used as binary subtraction.
We can implement this in Happy as follows:
%right in %nonassoc '>' '<' %left '+' '-' %left '*' '/' %left NEG %% Exp : let var '=' Exp in Exp { Let $2 $4 $6 } | Exp '+' Exp { Plus $1 $3 } | Exp '-' Exp { Minus $1 $3 } | Exp '*' Exp { Times $1 $3 } | Exp '/' Exp { Div $1 $3 } | '(' Exp ')' { Brack $2 } | '-' Exp %prec NEG { Negate $2 } | int { Int $1 } | var { Var $1 }
We invent a new token NEG
as a
placeholder for the precedence of our prefix negation rule.
The NEG
token doesn't need to appear in
a %token
directive. The prefix negation
rule has a %prec NEG
directive attached,
which overrides the default precedence for the rule (which
would normally be the precedence of '-') with the precedence
of NEG
.