Parsing / Context-Free Grammar
A context-free grammar (CFG) is a grammar in which every production (or rewrite) rule has the form:
$$
A \rightarrow \alpha
$$
Here $A$ is a single nonterminal symbol and $\alpha$ is a string of terminals and/or nonterminals; $\alpha$ may also be empty.
A terminal (or token) is a symbol that does not appear on the left side of the arrow of any production rule.
In every production rule, the nonterminal on the left side of the arrow can always be replaced by the string on the right side, regardless of the symbols that surround it. This independence from the surrounding symbols is what makes the grammar context-free.
For example, the following grammar defines an integer in BNF syntax, using the EBNF extensions `[ ]` for an optional part and `{ }` for zero or more repetitions:
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<integer> ::= ['-'] <digit> {<digit>}
In this grammar, `<digit>` and `<integer>` are nonterminals, and the symbols (-, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9) are the terminals.
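As a rough illustration, the `<integer>` rule can be turned into a small hand-written recognizer, where each part of the rule becomes one step of the function. The name `is_integer` and the `DIGITS` constant are made up for this sketch and are not part of the grammar itself.

```python
DIGITS = set("0123456789")  # the terminals produced by <digit>


def is_integer(text: str) -> bool:
    """Return True if text can be derived from <integer>."""
    pos = 0
    # ['-'] : an optional leading minus sign
    if pos < len(text) and text[pos] == "-":
        pos += 1
    # <digit> : one mandatory digit
    if pos >= len(text) or text[pos] not in DIGITS:
        return False
    pos += 1
    # {<digit>} : zero or more further digits
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    # the whole input must have been consumed
    return pos == len(text)
```

Under this sketch, `is_integer("-7")` and `is_integer("007")` are accepted, while `is_integer("-")` and `is_integer("4-2")` are rejected, since the grammar requires at least one digit and allows a minus sign only at the front.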
Formally, a Context-Free Grammar is defined by a 4-tuple:
$$
G = (V, \Sigma, R, S)
$$
Where:
- $V$ is the finite set of nonterminal symbols of the grammar.
- $\Sigma$ is the finite set of terminals, disjoint from $V$, which make up the content of the sentences.
- $R$ is the set of production rules of the grammar, sometimes denoted $P$.
- $S \in V$ is the start symbol of the grammar.
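One possible way to hold such a 4-tuple in code is sketched below. The `Grammar` class, its field names and the `is_well_formed` check are assumptions made for illustration, not a standard API. The example instance is the integer grammar from earlier, with the optional and repeated parts expanded into plain rules through a hypothetical helper nonterminal `digits`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # V
    terminals: frozenset     # Sigma
    rules: tuple             # R, as (head, body) pairs; body is a tuple of symbols
    start: str               # S

    def is_well_formed(self) -> bool:
        """Check the structural constraints of the 4-tuple."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and self.nonterminals.isdisjoint(self.terminals)
            # context-free: every head is one single nonterminal
            and all(head in self.nonterminals for head, _ in self.rules)
            # every body is a (possibly empty) string over V and Sigma
            and all(sym in symbols for _, body in self.rules for sym in body)
        )


digits = "0123456789"
integer_grammar = Grammar(
    nonterminals=frozenset({"integer", "digits", "digit"}),
    terminals=frozenset(digits) | {"-"},
    rules=(
        ("integer", ("-", "digits")),
        ("integer", ("digits",)),
        ("digits", ("digit",)),
        ("digits", ("digit", "digits")),
    ) + tuple(("digit", (d,)) for d in digits),
    start="integer",
)
```

Storing each rule body as a tuple of symbols keeps the empty string representable as `()`.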
For example, to parse an algebraic expression with the variables $x$, $y$ and $z$, like this:
$$
(x + y) * x - z * y / (x + x)
$$
we can use the grammar:
$$
G = (\{ S \}, \{ x, y, z, +, -, *, /, (, ) \}, R, S)
$$
With the following production rules:
S → x | y | z
S → S + S
S → S - S
S → S * S
S → S / S
S → (S)
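To see how these rules generate a sentence, here is one possible derivation of the shorter expression $(x + y) * x$. Each step rewrites the leftmost remaining $S$ using one of the rules above:

$$
\begin{aligned}
S &\Rightarrow S * S \\
  &\Rightarrow (S) * S \\
  &\Rightarrow (S + S) * S \\
  &\Rightarrow (x + S) * S \\
  &\Rightarrow (x + y) * S \\
  &\Rightarrow (x + y) * x
\end{aligned}
$$

The derivation ends when no nonterminal is left, so the result consists of terminals only.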
As another example, the following grammar $G$ generates all palindromes over the alphabet $\{ a, b \}$, such as $aa$, $aabaa$, $aabbaa$, $bab$, …
$$
G = (\{ S \}, \{ a, b \}, R, S)
$$
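S → aSa
S → bSb
S → a | b
S → ε
The rule S → a | b covers odd-length palindromes such as $bab$, whose middle character has no matching partner. The last rule is called an ε-production, which means $S$ can be rewritten as the empty string.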
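The palindrome rules map almost directly onto a recursive check, sketched below. This version assumes the grammar includes the single-character rules S → a and S → b, so that odd-length palindromes such as $bab$ are accepted. The function name `derives_palindrome` is made up for this illustration.

```python
def derives_palindrome(text: str) -> bool:
    """Return True if S can derive text using the palindrome rules."""
    # S → ε
    if text == "":
        return True
    # S → a | b (assumed single-character rules)
    if text in ("a", "b"):
        return True
    # S → aSa and S → bSb: strip matching outer symbols, then recurse on the middle
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "ab":
        return derives_palindrome(text[1:-1])
    return False
```

Each branch corresponds to one production, so a string is accepted exactly when some sequence of these rules derives it from $S$.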