The Import/Export Mechanism Implemented by the Zen Compiler

Samuel Rowe
Apr 4, 2020

As new features are implemented, the source code of a project grows in complexity. It becomes hard to maintain the source code when everything is crammed into a single source file. Therefore, an intuitive feature of a programming language is to allow programmers to split their source code into multiple files. The symbols defined in one source file, or compilation unit, can then be exported from it and imported into another as needed.

In Zen, the import statement allows a compilation unit to refer to external entities such as functions and classes. Without the use of the import statement, the only way to refer to an entity outside the current compilation unit is to use a fully qualified name. Further, all top-level entities declared in Zen are exported by default. As of this writing, there is no way to override this behavior.

Zen is a general purpose programming language designed to build simple, reliable and efficient programs. I have been developing Zen for the last three years. You can find the source code of the compiler here and the virtual machine here.

Although the import/export mechanism is complicated in most languages, it is an interesting topic. Further, the mechanism varies greatly from language to language. For example, in C and C++ the preprocessor and the linker play important roles in importing and exporting symbols at build time, whereas in languages such as Python, symbols are imported and exported at runtime. This variation is probably one of the main reasons why this important topic is left out of most compiler design courses and books.

This article describes the import/export mechanism employed by the Zen compiler behind the scenes.

Entity Forms

A compilation unit can import entities that are either compiled or non-compiled. (Note that the design assumes that the form of an entity is transparent to the end user, i.e., the programmer.)

1. Entities whose equivalent “.feb” files exist on the filesystem in the entity lookup directories specified to the compiler. These entities are referred to as compiled entities.

2. Entities that are part of the current compilation batch. These entities are referred to as non-compiled entities.
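
To make the notion of a compiled entity concrete, here is a minimal Python sketch of how a “.feb” file might be located in the entity lookup directories. It is an illustration only, not the Zen compiler's code: the function name find_compiled_entity and the mapping of a qualified name such as zoo.Sparrow to the relative path zoo/Sparrow.feb are assumptions made for this example.

```python
from pathlib import Path


def find_compiled_entity(qualified_name, lookup_directories):
    """Return the path of the ".feb" file for an entity, or None if absent.

    Assumption (not confirmed by the article): a qualified name such as
    "zoo.Sparrow" corresponds to the relative path "zoo/Sparrow.feb".
    """
    relative_path = Path(*qualified_name.split(".")).with_suffix(".feb")
    # Search the directories in the order they were given to the compiler.
    for directory in lookup_directories:
        candidate = Path(directory) / relative_path
        if candidate.is_file():
            return candidate
    return None
```

Under these assumptions, an entity for which this function returns a path would be treated as a compiled entity, while an entity declared in one of the source files being compiled would be a non-compiled entity.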

The Global Symbol Cache

Given that we are dealing with two forms of entities (compiled and non-compiled), the design should provide an abstraction that lets the layers above the import/export mechanism work seamlessly, without having to worry about an entity's form. To implement such an abstraction, the compiler maintains a central repository known as the global symbol cache.

The global symbol cache holds symbols which represent external entities. Here, an external entity refers to any entity that is defined outside a compilation unit. Internally, the global symbol cache uses a hash map to keep track of the registered symbols. The various phases of the compiler can request the global symbol cache to acquire symbols corresponding to entities. However, symbols exported from compilation units that are part of the current compilation batch are not available until the definition phase of the compiler is over. In other words, non-compiled entities are not available until the definition phase is complete.

The global symbol cache satisfies a symbol request in one of the following ways:

1. If the requested symbol is found in the internal hash map, it is immediately returned.

2. If the requested symbol is not found in the internal hash map, the cache searches the entity lookup directories to find a binary entity. If found, it loads the binary entity using the embedded binary entity loader, and a corresponding symbol is created, inserted in the internal hash map, and returned. Otherwise, the request fails and null is returned. This allows the compiler to load external entities from their binary counterparts without the requesting party having to deal with binary entities manually.
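
The two cases above can be sketched in a few lines of Python. This is a simplified model rather than the actual implementation: the class and method names are hypothetical, the binary entity loader is represented by a callback supplied by the caller, and the mapping from a qualified name to a “.feb” path is an assumption.

```python
from pathlib import Path


class GlobalSymbolCache:
    """Hypothetical model of the global symbol cache described above."""

    def __init__(self, lookup_directories, load_binary_entity):
        self.lookup_directories = [Path(d) for d in lookup_directories]
        # Stand-in for the embedded binary entity loader: a callable that
        # takes (qualified_name, path) and returns a symbol object.
        self.load_binary_entity = load_binary_entity
        self.symbols = {}  # the internal hash map

    def request(self, qualified_name):
        # Case 1: the symbol is already registered, so return it immediately.
        symbol = self.symbols.get(qualified_name)
        if symbol is not None:
            return symbol
        # Case 2: search the lookup directories for a binary entity.
        # Assumption: "zoo.Sparrow" corresponds to "zoo/Sparrow.feb".
        relative_path = Path(*qualified_name.split(".")).with_suffix(".feb")
        for directory in self.lookup_directories:
            candidate = directory / relative_path
            if candidate.is_file():
                symbol = self.load_binary_entity(qualified_name, candidate)
                self.symbols[qualified_name] = symbol
                return symbol
        # The request fails; None plays the role of null.
        return None
```

Because a loaded symbol is inserted into the hash map, a second request for the same entity is served from memory and the loader is not invoked again.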

What about non-compiled external entities? We know that such entities are part of the current compilation batch. The compiler can take advantage of this fact. During the definition phase, the compiler registers any symbol that is considered an external symbol with the global symbol cache. This allows compilation units in the current compilation batch to reference symbols declared in another compilation unit. For example, consider two compilation units BirdWatch.zen and Sparrow.zen in the current compilation batch which declare the BirdWatch and Sparrow classes, respectively. Further, assume that BirdWatch.zen imports Sparrow. During the definition phase corresponding to Sparrow.zen, a class symbol for Sparrow is registered in the global symbol cache. This allows BirdWatch to refer to the external entity Sparrow, which is part of the current compilation batch.
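
The BirdWatch and Sparrow example can be modeled as follows. This is a rough Python sketch, not Zen's implementation: a plain dictionary stands in for the global symbol cache, and the names ClassSymbol, definition_phase, and resolve_import are hypothetical.

```python
class ClassSymbol:
    """Hypothetical symbol for a class exported by a compilation unit."""

    def __init__(self, name, unit):
        self.name = name
        self.unit = unit  # the source file that declares the class


def definition_phase(unit, declared_classes, global_symbols):
    # Every top-level entity is exported by default, so each declared
    # class is registered as an external symbol.
    for class_name in declared_classes:
        global_symbols[class_name] = ClassSymbol(class_name, unit)


def resolve_import(name, global_symbols):
    # Returns the registered symbol, or None if nothing was registered.
    return global_symbols.get(name)


global_symbols = {}  # stand-in for the global symbol cache
definition_phase("BirdWatch.zen", ["BirdWatch"], global_symbols)
definition_phase("Sparrow.zen", ["Sparrow"], global_symbols)

# BirdWatch.zen imports Sparrow; the symbol is now available to it.
sparrow = resolve_import("Sparrow", global_symbols)
```

Here the import succeeds only because the definition phase of Sparrow.zen has already run by the time BirdWatch.zen asks for the symbol.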

Phases of the Compiler

When given multiple input source files, a compiler conventionally takes one source file at a time and subjects it to the various phases in sequence: lexical analysis, syntax analysis, semantic analysis (which is divided into the definition phase and the resolution phase), optimization, and code generation. However, this flow is not suitable for the mechanism described here. Why? If you think about it, it does not give the compiler an opportunity to register non-compiled entities before other compilation units ask for them. In other words, the phases are run separately for each compilation unit, which prevents the exchange of symbols between compilation units.

Therefore, in order to accommodate the mechanism described here, the flow of the compiler has to be altered. In the older flow, a source file was not processed until the previous source file was completely processed, i.e., subjected to all the phases from lexical analysis to code generation. In the new flow, all the input source files are subjected to a single phase before moving on to the next phase. This allows the compiler to mediate the exchange of symbols between compilation units.
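
The difference between the two flows can be illustrated with a small Python sketch. Only the definition and resolution phases are modeled, compilation units are represented as plain dictionaries, and all function names are hypothetical. It is meant to show the ordering problem, not to mirror the Zen compiler's code.

```python
def define(unit, global_symbols):
    # Definition phase: register the symbols exported by this unit.
    for name in unit["declares"]:
        global_symbols[name] = unit["file"]


def resolve(unit, global_symbols):
    # Resolution phase: report the imports that cannot be found.
    return [(unit["file"], name) for name in unit["imports"]
            if name not in global_symbols]


def compile_file_by_file(units):
    """Older flow: each unit passes through all phases before the next."""
    global_symbols, unresolved = {}, []
    for unit in units:
        define(unit, global_symbols)
        unresolved += resolve(unit, global_symbols)
    return unresolved


def compile_phase_by_phase(units):
    """New flow: every unit completes a phase before the next phase starts."""
    global_symbols, unresolved = {}, []
    for unit in units:
        define(unit, global_symbols)
    for unit in units:
        unresolved += resolve(unit, global_symbols)
    return unresolved


batch = [
    {"file": "BirdWatch.zen", "declares": ["BirdWatch"], "imports": ["Sparrow"]},
    {"file": "Sparrow.zen", "declares": ["Sparrow"], "imports": []},
]
old_flow_errors = compile_file_by_file(batch)
new_flow_errors = compile_phase_by_phase(batch)
```

With this batch, the older flow reports Sparrow as unresolved in BirdWatch.zen because Sparrow.zen has not been defined yet, whereas the new flow resolves every import regardless of the order in which the files are listed.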

Conclusion

What started out as internal documentation for the import/export mechanism employed in the Zen compiler turned out to be an article which I enjoyed writing.

The aim of this article is to provide programmers who are designing their own programming languages with a possible design for a simple import/export mechanism. I am by no means an expert in compiler design, and I may be wrong about a few things that I mentioned in this article. In my defense, I do not have formal training in compiler design. Please feel free to express your thoughts in the comments section.

Further, if you are interested in contributing to Zen, please feel free to contact me at samuelrowe1999@gmail.com.

