Pretty Printers using Box Models - mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework GitHub Wiki
This page presents the principles and concepts of pretty-printing. In particular, it examines box models in detail; box models offer a way to express the settings of a pretty printer declaratively.
Pretty Printer
A pretty printer provides code formatting and syntax highlighting. A parser for a specific language transforms code written in that language into an AST, which carries no formatting information. From the perspective of parsing, the pretty printer does the opposite: it transforms the AST back into code and enriches it with formatting characters such as spaces, line breaks, and indentation. Figure 1 illustrates the main purpose of the pretty printer.
Figure 1
A diagram depicting the transformation cycle of code when a pretty printer is used.
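To make the inverse relationship concrete, here is a minimal Python sketch. It assumes a hypothetical two-node AST (`Assign` and `If`) for a tiny C-like language. The AST stores only structure; the hypothetical function `pretty` decides where spaces, line breaks, braces, and indentation go.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical AST node types for a tiny C-like language (illustration only).
@dataclass
class Assign:
    name: str
    value: int


@dataclass
class If:
    cond: str
    body: List["Stmt"]


Stmt = Union[Assign, If]


def pretty(stmt: Stmt, indent: int = 0) -> str:
    """Turn an AST node back into source text, adding the layout the AST lacks."""
    pad = "    " * indent
    if isinstance(stmt, Assign):
        return f"{pad}{stmt.name} = {stmt.value};"
    lines = [f"{pad}if ({stmt.cond}) {{"]
    lines += [pretty(s, indent + 1) for s in stmt.body]
    lines.append(f"{pad}}}")
    return "\n".join(lines)


print(pretty(If("x > 0", [Assign("y", 1)])))
```

Whatever spacing the original source had, the printed result is always `if (x > 0) {`, an indented `y = 1;`, and a closing brace, because that information is produced by the printer and not read from the AST.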
Ad-hoc Pretty Printer
This kind of pretty printer is the best known and the most widespread. Such pretty printers are built into most current code editors and development environments intended for particular imperative languages such as C, C++, Java, and Pascal. Each of them can format only the language it is dedicated to, and the result of pretty-printing can be influenced only through limited configuration. The settings usually expose just a few options with a restricted range of values, such as the character sequence used for indentation, or whether the left brace opening a method body is placed on a new line or preceded by a single space.
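An ad-hoc pretty printer can be sketched as layout logic hard-wired for one construct of one language, plus a small settings object exposing the two kinds of options mentioned above. The names `FormatterSettings`, `Method`, and `format_method` are hypothetical and are not taken from any particular editor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormatterSettings:
    # The only knobs this hypothetical ad-hoc printer exposes.
    indent: str = "    "            # character sequence for one indentation level
    brace_on_new_line: bool = False  # put "{" of a method body on its own line?


@dataclass
class Method:
    signature: str
    statements: List[str] = field(default_factory=list)


def format_method(m: Method, s: FormatterSettings) -> str:
    # The layout is hard-coded for Java-like methods; only the two
    # settings above can change the result.
    if s.brace_on_new_line:
        head = [m.signature, "{"]
    else:
        head = [m.signature + " {"]
    body = [s.indent + stmt for stmt in m.statements]
    return "\n".join(head + body + ["}"])


m = Method("void run()", ["work();"])
print(format_method(m, FormatterSettings()))
print(format_method(m, FormatterSettings(indent="\t", brace_on_new_line=True)))
```

Everything not covered by the two settings, such as the position of the closing brace, is fixed in code, which is the limited configurability described above.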
Generic Pretty Printer
A generic pretty printer is the opposite of an ad-hoc one. A proper generic pretty printer should be able to format an arbitrary number of languages, and it should offer a wide range of options for configuring the formatting of each language. Commercial projects that use this type of pretty printer are currently hard to find; the concept belongs mostly to the theoretical sphere, and its implementations usually appear in research projects as byproducts.
For a generic pretty printer to handle several languages that differ in more than just details, the formatting rules that determine the appearance of code in a given language have to be linked to a specification of that language, namely its grammar. Generic pretty printers establish this link through pretty-print tables, which contain formatting rules attached to grammar rules. Compared with ad-hoc pretty printers, this considerably widens the possibilities for setting up the appearance of code, because formatting rules can easily be changed, deleted, or added. The pretty-print tables, together with the AST of the code, form the input of the generic pretty printer (see the [Pretty Printer for Every Occasion](http://reference.kfupm.edu.sa/content/p/r/a_pretty_printer_for_every_occasion__188340.pdf) paper for details). The formatting rules may be written manually or generated from annotated grammar rules with the help of heuristics (see the [Pretty Printer for Every Occasion](http://reference.kfupm.edu.sa/content/p/r/a_pretty_printer_for_every_occasion__188340.pdf) paper for details). The following listing outlines what a pretty-print table can look like.
Listing
A sample of a pretty-print table published in [Pretty Printer for Every Occasion](http://reference.kfupm.edu.sa/content/p/r/a_pretty_printer_for_every_occasion__188340.pdf). The table maps grammar rules to the corresponding formatting rules: grammar rules written in the [Syntax Definition Formalism (SDF)](http://pdf.aminer.org/001/067/569/the_syntax_definition_formalism_sdf_reference_manual.pdf) appear to the left of the dash, and the formatting rules appear to the right.
"package" Name ";" → PackagedDeclaration − H [KW ["package"] H hs=0 [1 ";"]],
"import" Name ";" → ImportDeclaration − H [KW ["import"] H hs=0 [1 ";"]],
"import" Name "." "*" ";" → ImportDeclaration − H [KW ["import"] H hs=0 [1 "." "*" ";"]]
A generic pretty printer offers highly configurable formatting and the ability to format multiple languages. Some generic pretty printers can also render code into several output formats with the same appearance, for example plain text, LaTeX, or HTML. This feature is usually realized by dividing the pretty printer into a front-end and a back-end. The front-end transforms the pretty-print tables and the AST into an intermediate language that expresses the formatting of the code. The back-end then transforms this intermediate language into a particular output format that specifies the appearance of the code. Since a back-end is not generic in itself, a separate back-end has to exist for each format. The schema in Figure 2 summarizes this paragraph.
Figure 2
A schema of a generic pretty-printer with three back-ends.
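The table-driven front-end and the per-format back-ends can be sketched together in Python. This is a deliberately simplified sketch under several assumptions: the AST is a tree of hypothetical `(rule name, children)` tuples, each formatting rule is a flat list of tokens and child indices rather than a box expression like those in the Listing, and the intermediate language is just a list of `(text, category)` tokens. The names `front_end`, `text_backend`, `html_backend`, and `java_table` are all hypothetical.

```python
import html
from typing import Dict, List, Tuple, Union

# AST node: (grammar rule name, children); leaves are plain strings.
Node = Union[str, Tuple[str, tuple]]
# Intermediate form: (text, category) tokens; category "kw" marks keywords.
Token = Tuple[str, str]
# Hypothetical pretty-print table: grammar rule -> formatting rule, where a
# formatting rule lists literal tokens and indices of children to insert.
Table = Dict[str, List[Union[int, Token]]]


def front_end(node: Node, table: Table) -> List[Token]:
    """Language-independent: all language knowledge comes from the table."""
    if isinstance(node, str):
        return [(node, "text")]
    rule, children = node
    tokens: List[Token] = []
    for item in table[rule]:
        if isinstance(item, int):
            tokens += front_end(children[item], table)
        else:
            tokens.append(item)
    return tokens


def text_backend(tokens: List[Token]) -> str:
    """Plain-text back-end: categories are ignored."""
    return "".join(text for text, _ in tokens)


def html_backend(tokens: List[Token]) -> str:
    """HTML back-end: same layout, keywords wrapped in <b> tags."""
    parts = [f"<b>{html.escape(t)}</b>" if c == "kw" else html.escape(t)
             for t, c in tokens]
    return "<pre>" + "".join(parts) + "</pre>"


java_table: Table = {
    "CompilationUnit": [0, ("\n", "text"), 1],
    "PackageDeclaration": [("package", "kw"), (" ", "text"), 0, (";", "text")],
    "ImportDeclaration": [("import", "kw"), (" ", "text"), 0, (";", "text")],
}

ast = ("CompilationUnit", (
    ("PackageDeclaration", ("com.example",)),
    ("ImportDeclaration", ("java.util.List",)),
))
tokens = front_end(ast, java_table)
print(text_backend(tokens))
print(html_backend(tokens))
```

Changing the appearance of Java code only requires editing `java_table`, and supporting another output format only requires another back-end; `front_end` itself never changes.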
Box Representation
The previous paragraph mentioned an intermediate language. This intermediate language is usually the box representation, a data structure composed of elements called boxes.
Box
A box is the building block of the box representation. A box is either a string token corresponding to a terminal rule of the grammar, or a composite box that groups other boxes and defines their relative horizontal or vertical positions or their indentation, as shown in Figure 3. Because composite boxes can nest, the box representation is essentially a tree of boxes, and the box representation as a whole is itself a composite box.
Figure 3
An example of the box representation defining the appearance of the if statement from a C-based language.
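A box tree can be modelled with a few Python classes. This sketch assumes three composite box types named after common box operators, `H` (horizontal), `V` (vertical), and `I` (indentation), with plain strings acting as string-token boxes; the classes and the `layout` function are illustrative and not taken from this project. For simplicity, horizontal composition joins each child's first line onto the current last line with one space.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class H:
    """Horizontal composite box: children placed side by side."""
    children: List["Box"]


@dataclass
class V:
    """Vertical composite box: children stacked top to bottom."""
    children: List["Box"]


@dataclass
class I:
    """Indentation composite box: the child is shifted to the right."""
    child: "Box"


Box = Union[str, H, V, I]  # a plain str is a string-token box


def layout(box: Box) -> List[str]:
    """Render a box tree into a list of text lines."""
    if isinstance(box, str):
        return [box]
    if isinstance(box, V):
        return [line for child in box.children for line in layout(child)]
    if isinstance(box, I):
        return ["    " + line for line in layout(box.child)]
    # H: join each child's first line to the current last line with one
    # space; any further lines of a multi-line child follow unchanged.
    lines: List[str] = []
    for child in box.children:
        part = layout(child)
        if not part:
            continue
        if lines:
            lines[-1] += " " + part[0]
            lines.extend(part[1:])
        else:
            lines = list(part)
    return lines


# An if statement, in the spirit of Figure 3.
if_stmt = V([
    H(["if", "(x > 0)", "{"]),
    I(V(["y = 1;", "z = 2;"])),
    "}",
])
print("\n".join(layout(if_stmt)))
```

The nesting of `V`, `H`, and `I` objects is exactly the tree structure described above, and the outermost `V` is itself a composite box.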
Box Language
Even though the box representation can define the appearance of code written in some language, it is still necessary to define which types of boxes are used and how they are assembled. The box language serves this purpose. It consists of operators that create composite boxes, and each operator corresponds to a particular composite box type, such as a horizontal box, a vertical box, or an indentation box. Operators can be further configured with parameters that are carried over to the corresponding composite boxes, for example to change the spacing between inner boxes or the spacing character. Operators are used in the formatting rules of pretty-print tables, where each operator usage encloses keywords, references to grammar rules, and further operator usages. Nested operator usages therefore form a tree structure, just like boxes in the box representation. Operator usages can be seen on the right-hand sides of the formatting rules in the Listing.
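The following sketch shows how operator parameters carry over to composite boxes. The classes `HBox`, `VBox`, and `IBox`, the helper `import_rule`, and the parameters `hs` (spaces between children), `vs` (blank lines between children), and `is_` (indentation width, spelled with an underscore because `is` is a Python keyword) are assumptions modelled loosely on the notation in the Listing; the default values are chosen for this example only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class HBox:
    children: List["Box"]
    hs: int = 1  # horizontal spacing: spaces between children


@dataclass
class VBox:
    children: List["Box"]
    vs: int = 0  # vertical spacing: blank lines between children


@dataclass
class IBox:
    child: "Box"
    is_: int = 2  # indentation width in spaces


Box = Union[str, HBox, VBox, IBox]


def render(b: Box) -> List[str]:
    """Render a box into lines, honouring each operator's parameters."""
    if isinstance(b, str):
        return [b]
    if isinstance(b, IBox):
        return [" " * b.is_ + line for line in render(b.child)]
    if isinstance(b, VBox):
        out: List[str] = []
        for k, child in enumerate(b.children):
            if k:
                out.extend([""] * b.vs)
            out.extend(render(child))
        return out
    lines: List[str] = []
    for child in b.children:
        part = render(child)
        if not part:
            continue
        if lines:
            lines[-1] += " " * b.hs + part[0]
            lines.extend(part[1:])
        else:
            lines = list(part)
    return lines


def import_rule(name: str) -> Box:
    # Mirrors the Listing rule H [KW ["import"] H hs=0 [1 ";"]] (KW markup omitted).
    return HBox(["import", HBox([name, ";"], hs=0)])


print("\n".join(render(VBox([import_rule("java.util.List"),
                             import_rule("java.io.File")], vs=1))))
```

The inner `hs=0` is what glues the semicolon to the name, while the outer box keeps one space after the keyword; the `vs=1` parameter separates the two imports with a blank line.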
Because operators and their usages are mentioned often in the rest of the text, the following terminology is introduced.
Box Model
Since the operator usages of a box language serve as a template for the resulting box representation, the collection of formatting rules attached to the grammar rules of a particular language will be called a box model.
Box Meta-model
Since there can be many kinds of operators and parameters, and also many box languages, the set of operators and associated parameters available for use will be called a box meta-model. In other words, a box language will be referred to as a box meta-model.
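The relationship between a box meta-model and a box model can be pictured as a schema and a document checked against it. In this minimal sketch, the hypothetical meta-model `META` maps each operator to the parameters it accepts, operator usages are `(operator, parameters, children)` tuples, and the hypothetical function `check` reports any operator or parameter that the meta-model does not define.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical box meta-model: operator name -> parameters it accepts.
MetaModel = Dict[str, Set[str]]
META: MetaModel = {"H": {"hs"}, "V": {"vs"}, "I": {"is"}}

# An operator usage in a box model: (operator, parameters, children);
# plain strings stand for keywords or references to grammar rules.
Usage = Union[str, Tuple[str, Dict[str, int], list]]


def check(usage: Usage, meta: MetaModel) -> List[str]:
    """Report operators or parameters the meta-model does not define."""
    if isinstance(usage, str):
        return []
    op, params, children = usage
    errors: List[str] = []
    if op not in meta:
        errors.append(f"unknown operator {op}")
    else:
        errors += [f"{op} has no parameter {p}" for p in params if p not in meta[op]]
    for child in children:
        errors += check(child, meta)
    return errors


# A formatting rule shaped like the import rule in the Listing.
rule = ("H", {}, ["import", ("H", {"hs": 0}, ["Name", ";"])])
print(check(rule, META))
```

A different box meta-model, with other operators or parameters, would accept a different set of box models, which is why several meta-models can coexist.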
Existing Box Meta-models
This section describes box meta-models that have been published in papers or implemented in research projects. The names of these papers and projects, with references, are given below. Each described meta-model is named after the corresponding project or after the surnames of the paper's authors.
The text above introduced and described five existing box meta-models. This provides enough information to design a box meta-model that meets all requirements for code formatting. This ideal box meta-model is implemented in this project, with one exception.