How Parsers Recover From Syntax Errors: A Case Study of the Zen Compiler — Part 1

Samuel Rowe
Apr 6, 2020

There are two types of people in this world: intellects who love the semicolon and “ultra” intellects who despise the day this dreaded symbol came into existence. Many programmers belong to the latter category. If you are new to programming or have never written a single line of code in your life, you may be wondering what makes me think so. Ask a programmer fluent in a C-like programming language, and they will tell you that a missing semicolon is one of the most common syntax errors programmers make, and this includes programmers who live and breathe code. Therefore, a well-designed compiler should be smart enough to recover from syntax errors.

It has been two years since I wrote the Zen parser. Until today, the parser generated duplicate errors whenever it encountered a syntax error. As I come closer to finishing the alpha version of the compiler, I decided that it was time to implement an error recovery strategy for the parser. In this article, I describe the basic error recovery strategies, drawing on what I learned while implementing one.

Zen is a general-purpose programming language designed to build simple, reliable and efficient programs. You can find the source code of the compiler here. In this two-part article, I will show you the panic mode error recovery strategy that I implemented in the Zen compiler. You will learn what the panic mode error recovery strategy is shortly.

Before we begin, a parser is the component of a compiler that groups tokens together to recognize the constructs a programmer writes. The grammar of the programming language being compiled determines how the tokens are grouped together.

What are Syntax Errors?

From the perspective of a parser, a syntax error is a situation where the parser cannot proceed further because it has no viable alternative to follow. In other words, a syntax error occurs when you write constructs that are invalid according to the grammar of the programming language.
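
To make "no viable alternative" concrete, here is a minimal sketch in Python. The toy grammar, the ParseError class and the parse_statement function are all invented for illustration and have nothing to do with the Zen grammar. The parser picks an alternative by looking at the next token, and it has nowhere to go when that token starts none of them.

```python
class ParseError(Exception):
    """Raised when the parser cannot continue (hypothetical name)."""


def parse_statement(tokens, pos):
    """Parse one statement of a toy grammar and return the next position.

    statement := 'print' NAME ';'
               | 'let' NAME '=' NAME ';'
    """
    # Each alternative is selected by its first token.
    alternatives = {
        "print": ["print", "NAME", ";"],
        "let": ["let", "NAME", "=", "NAME", ";"],
    }
    lookahead = tokens[pos] if pos < len(tokens) else "<eof>"
    if lookahead not in alternatives:
        # No alternative starts with this token: the parser is stuck.
        raise ParseError(f"no viable alternative at {lookahead!r}")
    for expected in alternatives[lookahead]:
        actual = tokens[pos] if pos < len(tokens) else "<eof>"
        # Simplification: any identifier-like token counts as a NAME.
        if expected == "NAME":
            ok = actual.isidentifier()
        else:
            ok = actual == expected
        if not ok:
            raise ParseError(f"expected {expected!r} but found {actual!r}")
        pos += 1
    return pos
```

Calling parse_statement(["=", "x", ";"], 0) raises ParseError, because neither alternative can begin with "=". What the parser does after that point is exactly what an error recovery strategy decides.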

Here is a little brain teaser for you. If a parser has no viable alternative to follow when a syntax error occurs, doesn’t that mean the parser is basically stuck? If you think yes, you are partially right. However, you do not see your compiler crashing when you make syntax errors. This is because the parser implements an intuitive mechanism known as the error recovery strategy.

Error Recovery Strategy

An error recovery strategy is a mechanism that the parser triggers when a syntax error is encountered. It is the responsibility of the error recovery strategy to prevent generation of duplicate error messages and resynchronize the parser to a state where the input can be parsed further to recognize other syntax errors, if any.

Here is a list of the most basic error recovery strategies. Please note that the names of the strategies may vary in other compiler design resources.

  1. Abort Strategy
  2. Token Insertion/Deletion Strategy
  3. Panic Mode Strategy

The Abort Strategy: When All Hell Breaks Loose, Say Goodbye

When a syntax error occurs, the parser, being the good Samaritan, reports the syntax error to you with an informative message so that you can correct it. Informative error messages help you find your mistakes and correct them. Once you fix all the syntax errors, the parser succeeds, allowing the compiler to proceed with the phases that follow syntax analysis.

You must remember that the manner in which the parser responds to ungrammatical input is important. Imagine a parser that reports the first syntax error it encounters and then causes the compiler to terminate. Such a parser is said to implement the abort strategy. This is the simplest error recovery strategy. It is counterproductive during development because there may be more than one syntax error in your source code, and you only learn about them one compilation at a time.

The parser in the Python interpreter is an example that implements this strategy.
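
As a rough illustration, the abort strategy can be sketched in a few lines of Python. The toy grammar (statements of the form NAME '=' NUMBER ';') and the names matches and parse_abort are assumptions made for this example only.

```python
def matches(kind, token):
    """Return True if the token belongs to the given token kind."""
    if kind == "NAME":
        return token.isidentifier()
    if kind == "NUMBER":
        return token.isdigit()
    return token == kind  # punctuation such as '=' and ';'


def parse_abort(tokens):
    """Check a sequence of statements; stop at the first syntax error.

    statement := NAME '=' NUMBER ';'
    """
    pos = 0
    while pos < len(tokens):
        for kind in ("NAME", "=", "NUMBER", ";"):
            token = tokens[pos] if pos < len(tokens) else "<eof>"
            if not matches(kind, token):
                # Abort: report this one error and give up on the rest.
                return [f"token {pos}: expected {kind}, found {token!r}"]
            pos += 1
    return []


# Two of the three statements are broken, but only one error is reported.
errors = parse_abort("x = ; y 2 ; z = 3 ;".split())
print(errors)
```

The input contains two faulty statements, yet the returned list has a single entry for the missing number in the first one. The missing "=" in the second statement stays hidden until the first error is fixed and the program is compiled again.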

The Token Insertion/Deletion Strategy: “Let There BE a Token, or NOT”

In this strategy, the parser tries to insert an imaginary token that may resynchronize the state of the parser. If the insertion of the imaginary token causes another syntax error, the parser promptly removes the imaginary token. The parser then tries to delete a token from the input, assuming that the deletion may cause the parser to resynchronize. Please note that the order in which the insertion/deletion occurs may vary from implementation to implementation.

The main problem with this error recovery strategy is that the parser may enter an infinite loop. In my opinion, this makes the insertion/deletion strategy fairly complicated to implement. Therefore, the compiler developer must be careful when implementing it.
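
Here is a simplified sketch of the idea in Python, again over a toy grammar of NAME '=' NUMBER ';' statements. All names are hypothetical. This version tries deletion first, using one token of extra lookahead, and falls back to insertion. It does not do the insert, test and roll back sequence described above. Note the guard at the end of the loop: insertion consumes no input, so without it a token such as "+" would make the parser insert imaginary tokens forever.

```python
def matches(kind, token):
    """Return True if the token belongs to the given token kind."""
    if kind == "NAME":
        return token.isidentifier()
    if kind == "NUMBER":
        return token.isdigit()
    return token == kind  # punctuation such as '=' and ';'


def parse_with_repair(tokens):
    """Check statements, repairing errors by deleting or inserting one token.

    statement := NAME '=' NUMBER ';'
    """
    errors, pos = [], 0
    while pos < len(tokens):
        start = pos
        for kind in ("NAME", "=", "NUMBER", ";"):
            token = tokens[pos] if pos < len(tokens) else "<eof>"
            after = tokens[pos + 1] if pos + 1 < len(tokens) else "<eof>"
            if matches(kind, token):
                pos += 1
            elif matches(kind, after):
                # Deletion: drop the stray token, consume the expected one.
                errors.append(f"token {pos}: extraneous {token!r}")
                pos += 2
            else:
                # Insertion: pretend the expected token was there.
                # Nothing is consumed from the input.
                errors.append(f"token {pos}: missing {kind}")
        if pos == start:
            # Guard against an infinite loop: a whole statement was
            # "parsed" from imaginary tokens only, so force progress.
            errors.append(f"token {pos}: unexpected {tokens[pos]!r}")
            pos += 1
    return errors
```

With this sketch, parse_with_repair("x = 1 y = 2 ;".split()) reports a single missing ";" and still checks the second statement, while "x = = 1 ;" is reported as one extraneous "=".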

Panic Mode Strategy: There’s No Problem Eating Cannot Fix

When the parser encounters invalid input, the current rule cannot continue, so the parser recovers by skipping tokens until it reaches a state from which it can resynchronize. Control is then returned to the calling rule. This technique is known as the panic mode strategy.

The trick here is to discard tokens only until the lookahead token is something that the parent rule of the current rule expects. For example, in Java, if there is a syntax error in a throw statement, the parser discards tokens until a semicolon token or other relevant token is encountered. Here the semicolon token is known as a terminator.
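
The following Python sketch shows the shape of the idea on a toy grammar of NAME '=' NUMBER ';' statements. It is not the Zen implementation, and the names are made up for this example. As a simplification, it always uses ";" as the only terminator and consumes it before returning to the statement loop, instead of computing what the parent rule expects.

```python
def matches(kind, token):
    """Return True if the token belongs to the given token kind."""
    if kind == "NAME":
        return token.isidentifier()
    if kind == "NUMBER":
        return token.isdigit()
    return token == kind  # punctuation such as '=' and ';'


def parse_panic_mode(tokens):
    """Check statements; on an error, skip ahead to the next ';'.

    statement := NAME '=' NUMBER ';'
    """
    errors, pos = [], 0
    while pos < len(tokens):
        for kind in ("NAME", "=", "NUMBER", ";"):
            token = tokens[pos] if pos < len(tokens) else "<eof>"
            if matches(kind, token):
                pos += 1
                continue
            # Report the error once for this statement.
            errors.append(f"token {pos}: expected {kind}, found {token!r}")
            # Panic: discard tokens until the terminator is the lookahead.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1  # consume the ';' (or step past the end of input)
            break  # abandon this statement and return to the outer loop
    return errors


errors = parse_panic_mode("x = ; y 2 ; z = 3 ;".split())
print(errors)
```

For this input the parser reports exactly two errors, one per faulty statement, and the valid third statement is parsed normally. Each faulty statement produces one message instead of a cascade of duplicates.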

The Zen parser implements the panic mode strategy. In my opinion, this is the most effective strategy among the techniques I have discussed in this article because it works remarkably well. Additionally, it is very easy to implement.

Conclusion

The error recovery strategy is an important mechanism that provides the parser with flexibility over how error messages are generated, the content of the error messages, and most importantly how the parser recovers from syntax errors.

In the second part of this article, we will see how to implement the panic mode strategy.
