Integrating the Box Model into Xtext Framework

So far, the resources for defining box models have been introduced, but it has not yet been explained how to connect box models with the pretty-printing concepts of the Xtext framework (see code formatting and syntax highlighting). The purpose of this chapter is to design a solution to this problem. In other words, behavioral implementations of operators have to be designed so that they can cooperate with the Xtext code that realizes code formatting and syntax highlighting.

Syntax Highlighting

This concept essentially associates defined text styles with individual parts of code. The association can be realized lexically or semantically. These two ways correspond to usages of highlight operators in terminal formatting rules (the lexical way) and in parser formatting rules (the semantic way).

Behavioral Implementation of Operators

The definition of a highlight operator written in the corresponding DSL consists of parameters whose current values together specify a particular text style (see the [Basic Operators](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/MetaModLang-(Language-for-Deﬁning-Box--Meta-models\)) section). In the Xtext framework, text styles are realized by the TextStyle class (see Listing 1 on this page). Thus the behavioral implementation of an operator should translate a text style expressed by parameter values into an instance of the TextStyle class. Moreover, a single implementation should be enough, because highlight operators and their usages differ only in the values of parameters.
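The translation described above can be sketched as follows. This is only a minimal illustration under stated assumptions: `SketchTextStyle` is a hypothetical stand-in for Xtext's TextStyle class, `HighlightBehavior` is a hypothetical name, and only the color parameter `c` (as used in listings of this wiki, for example `<F c="#00ff00">`) is handled. Other parameters would be translated in the same manner.

```java
import java.util.Map;

/** Hypothetical stand-in for Xtext's TextStyle; only a foreground color is modeled. */
final class SketchTextStyle {
    final int red;
    final int green;
    final int blue;

    SketchTextStyle(int red, int green, int blue) {
        this.red = red;
        this.green = green;
        this.blue = blue;
    }
}

/** A single behavior implementation serves every usage of a highlight operator. */
final class HighlightBehavior {
    private final Map<String, String> parameters;

    HighlightBehavior(Map<String, String> parameters) {
        this.parameters = parameters;
    }

    /** Translates the parameter values of one usage into a text style. */
    SketchTextStyle toTextStyle() {
        // "c" carries a color such as "#00ff00"; black is an assumed fallback.
        String color = parameters.getOrDefault("c", "#000000");
        int rgb = Integer.parseInt(color.substring(1), 16);
        return new SketchTextStyle((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF);
    }
}
```

Because the usages differ only in parameter values, the same class can be instantiated once per usage with a different parameter map.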

Highlighting Configuration

Before text styles can be associated with parts of code, they have to be registered (see the Text Styles section). The text styles specified by usages of highlight operators can be registered by creating a new implementation of the IHighlightingConfiguration interface (see Listing 2 on this page) that has access to a box model describing the formatting configuration for a given language. The implementation would override the configure method so that it collects all usages of highlight operators from all formatting rules and instantiates a behavior implementation for each usage. Each behavior implementation translates the values of its usage into a TextStyle instance, and the method registers it under a specific identifier.

Choosing a suitable identifier for text styles related to usages in formatting rules dedicated to terminals is simple. If the formatting rule is the default one, the identifier is the string "default". If the formatting rule references a terminal rule, the identifier is the name of that terminal rule. The remaining case is a formatting rule that specifies the appearance of keywords matching a certain regular expression. Since identifying text styles by regular expressions is confusing, such a formatting rule will be enriched with a name that serves as the identifier of the text style, as depicted in Listing 2 on [this page](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/ModelLang-(Language-for-Defining-Box-Models\)).

Identifying text styles defined by usages contained in formatting rules that reference parser rules is not so easy, because a usage has no unique identifier and this kind of formatting rule can contain several usages of operators. Thus a new concept for identifying these text styles has to be created. One possible solution is to build identifiers from the name of the parser rule and a suffix expressing the hierarchical position of a usage among the other usages contained in the formatting rule. The following listing illustrates the format of identifiers for these usages of operators.

Listing 1

The listing shows what the identifiers for the usages of operators presented in Listing 1 on [this page](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/ModelLang-(Language-for-Defining-Box-Models\)) look like.

<H> : Greetings.1
<F c="#00ff00"> : Greetings.1.1
<F c="#ff0000"> : Greetings.1.2
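The identifier format shown in the listing could be computed by a simple recursive walk over the usages of a formatting rule. The sketch below is an assumption-laden illustration: `OperatorUsage` and `UsageIds` are hypothetical names, the root usage is assumed to get the suffix `1`, and every nested usage is assumed to be numbered by its 1-based position among its siblings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one usage of an operator inside a formatting rule. */
final class OperatorUsage {
    final String label;
    final List<OperatorUsage> children = new ArrayList<>();

    OperatorUsage(String label, OperatorUsage... nested) {
        this.label = label;
        for (OperatorUsage usage : nested) {
            children.add(usage);
        }
    }
}

final class UsageIds {
    /** Returns a map from usage label to identifier; labels are assumed to be unique. */
    static Map<String, String> assign(String parserRuleName, OperatorUsage root) {
        Map<String, String> ids = new LinkedHashMap<>();
        visit(root, parserRuleName + ".1", ids);
        return ids;
    }

    private static void visit(OperatorUsage usage, String id, Map<String, String> ids) {
        ids.put(usage.label, id);
        for (int i = 0; i < usage.children.size(); i++) {
            // The suffix is the 1-based position among sibling usages.
            visit(usage.children.get(i), id + "." + (i + 1), ids);
        }
    }
}
```

For a root `<H>` usage of the rule Greetings with two nested `<F>` usages, this walk yields the three identifiers from the listing.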

Lexical Highlighting

The Lexical Highlighting section discusses that associating text styles with parts of code lexically can be realized in the Xtext framework by extending the AbstractAntlrTokenToAttributeIdMapper class and overriding its calculateId method. Since lexical highlighting is only a matter of formatting rules dedicated to terminals, the calculateId method of a new extension that has access to the box model will work as follows. If the value of the tokenName parameter has the "RULE_" prefix, the token was parsed by a terminal rule, and the box model is searched for a formatting rule corresponding to the token name without the prefix. If such a formatting rule exists and contains a usage of a highlight operator, the identifier of the corresponding text style is returned. If the value of the token name is enclosed in apostrophes or quotation marks, the token is a keyword, and it is tested whether the keyword matches the pattern of any keyword formatting rule. If such a formatting rule exists and contains a usage of a highlight operator, the identifier of the corresponding text style is returned. If no identifier is returned based on the previous conditions, the identifier of the default text style is returned.
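The decision procedure of calculateId can be sketched independently of Xtext. In the following illustration, `LexicalIdSketch` is a hypothetical class that does not extend the real mapper, and the box model is reduced to two assumed lookup structures: the set of terminal rules whose formatting rule contains a highlight usage, and a map from the names of keyword formatting rules to their patterns.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Sketch of the calculateId decision logic; not the real Xtext mapper class. */
final class LexicalIdSketch {
    static final String DEFAULT_ID = "default";

    // Terminal rules whose formatting rule contains a highlight usage.
    private final Set<String> highlightedTerminals;
    // Name of a keyword formatting rule -> pattern of the keywords it covers.
    private final Map<String, Pattern> keywordRules;

    LexicalIdSketch(Set<String> highlightedTerminals, Map<String, Pattern> keywordRules) {
        this.highlightedTerminals = highlightedTerminals;
        this.keywordRules = keywordRules;
    }

    String calculateId(String tokenName) {
        if (tokenName.startsWith("RULE_")) {
            // Token produced by a terminal rule: strip the prefix and look it up.
            String terminal = tokenName.substring("RULE_".length());
            if (highlightedTerminals.contains(terminal)) {
                return terminal;
            }
        } else if (isQuoted(tokenName)) {
            // Keyword token: try every keyword formatting rule in turn.
            String keyword = tokenName.substring(1, tokenName.length() - 1);
            for (Map.Entry<String, Pattern> rule : keywordRules.entrySet()) {
                if (rule.getValue().matcher(keyword).matches()) {
                    return rule.getKey();
                }
            }
        }
        return DEFAULT_ID;
    }

    private static boolean isQuoted(String name) {
        if (name.length() < 2) {
            return false;
        }
        char first = name.charAt(0);
        return (first == '\'' || first == '"') && name.charAt(name.length() - 1) == first;
    }
}
```

The returned strings follow the identifier choices of the Highlighting Configuration section: the terminal rule name, the name of the keyword formatting rule, or "default".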

Semantic Highlighting

The Semantic Highlighting section discusses that associating text styles with parts of code semantically can be realized in the Xtext framework by implementing the ISemanticHighlightingCalculator interface and overriding its provideHighlightingFor method. The method should contain calls of the addPosition method on its second parameter, each of which associates a segment of code with a text style. The bounds of individual segments reflecting semantics can be obtained from the first parameter, because it gives access to a node model, which is an AST-based structure whose elements carry information about the bounds of segments. Although an element of the node model is linked to the corresponding defining element of a grammar rule, it is not yet clear how to get the corresponding text style defined by a usage of an operator. Each defining element has a counterpart composite element or reference in a formatting rule and shares the same qualified name with that element of the formatting rule, as shown in Listing 3 on [this page](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/ModelLang-(Language-for-Defining-Box-Models\)). However, it is still unknown which text style belongs to the element of the formatting rule.

A solution to the whole problem could be to add an initialization step to a new implementation of the interface. The initialization would involve a traversal of the formatting rules that reference parser rules. The traversal determines which usage of a highlight operator is the closest ancestor of each element of a formatting rule. The result of the traversal is a map associating qualified names of elements of formatting rules with identifiers of the text styles related to the usages of highlight operators that are the closest ancestors of those elements. Since a qualified name contained in the map also belongs to a defining element of a grammar rule, the provideHighlightingFor method of the new implementation will use the map to obtain the identifier of the text style for a given segment of code. It may happen that an element of a formatting rule has no ancestor that is a usage of a highlight operator, and thus its qualified name is not in the map. In this case, the method will use the identifier of the default text style. Moreover, the map does not have to contain identifiers of composite defining elements, because elements of the node model reference only essential defining elements.

Listing 2

The listing depicts what the map should look like for a grammar that contains only the rule presented in Listing 1 on [this page](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/ModelLang-(Language-for-Defining-Box-Models\)). Identifiers of defining elements are taken from Listing 3 on [this page](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/ModelLang-(Language-for-Defining-Box-Models\)) and identifiers of text styles from Listing 1.

Greetings.|.0. .0.good -> Greetings.1.1 // <F c="#00ff00">
Greetings.|.0. .1.morning -> Greetings.1.1 // <F c="#00ff00">
Greetings.|.1.hello -> Greetings.1.2 // <F c="#ff0000">
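The traversal that produces such a map can be sketched as a recursion that carries the identifier of the closest enclosing highlight usage downwards. `RuleNode` and `StyleMapBuilder` are hypothetical names, and the shape of the formatting rule tree (a root operator containing two highlight usages) is an assumption inferred from the listings. Elements without a highlighting ancestor are left out, so the caller can fall back to the default text style.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node of a parser formatting rule: an operator usage or an element. */
final class RuleNode {
    final String qualifiedName; // set for elements, null for operator usages
    final String styleId;       // set for highlight usages, null otherwise
    final List<RuleNode> children = new ArrayList<>();

    private RuleNode(String qualifiedName, String styleId, RuleNode[] nested) {
        this.qualifiedName = qualifiedName;
        this.styleId = styleId;
        for (RuleNode node : nested) {
            children.add(node);
        }
    }

    static RuleNode element(String qualifiedName) {
        return new RuleNode(qualifiedName, null, new RuleNode[0]);
    }

    static RuleNode highlight(String styleId, RuleNode... nested) {
        return new RuleNode(null, styleId, nested);
    }

    static RuleNode operator(RuleNode... nested) {
        return new RuleNode(null, null, nested);
    }
}

final class StyleMapBuilder {
    /** Maps qualified names of elements to the style id of their closest highlight ancestor. */
    static Map<String, String> build(RuleNode root) {
        Map<String, String> map = new LinkedHashMap<>();
        collect(root, null, map);
        return map;
    }

    private static void collect(RuleNode node, String closestStyle, Map<String, String> map) {
        // A highlight usage replaces the style inherited from further up the tree.
        String style = node.styleId != null ? node.styleId : closestStyle;
        if (node.qualifiedName != null && style != null) {
            map.put(node.qualifiedName, style);
        }
        for (RuleNode child : node.children) {
            collect(child, style, map);
        }
    }
}
```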

It might seem that conflicts arise when associating text styles with a concrete segment of code, because a node with children covers the same code as its children together. But if the method traverses the node model so that a parent node is visited first and its children afterwards, there is no conflict, because the Xtext framework allows a developer to redefine the text style of an arbitrary segment of code.
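The effect of the parent-first visiting order can be illustrated with a small simulation. It does not use Xtext's API: `Segment` and `ParentFirstPainter` are hypothetical names, and an array holding one style identifier per character stands in for the highlighting state that later assignments are allowed to redefine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical node with the bounds of its code segment and an optional style. */
final class Segment {
    final int offset;
    final int length;
    final String styleId; // null when no style is defined for this node
    final List<Segment> children = new ArrayList<>();

    Segment(int offset, int length, String styleId, Segment... nested) {
        this.offset = offset;
        this.length = length;
        this.styleId = styleId;
        for (Segment child : nested) {
            children.add(child);
        }
    }
}

final class ParentFirstPainter {
    /** Returns the style id that ends up on every character of the text. */
    static String[] paint(Segment root, int textLength) {
        String[] styles = new String[textLength];
        Arrays.fill(styles, "default");
        visit(root, styles);
        return styles;
    }

    private static void visit(Segment node, String[] styles) {
        // The parent paints its whole range first ...
        if (node.styleId != null) {
            Arrays.fill(styles, node.offset, node.offset + node.length, node.styleId);
        }
        // ... and its children then redefine the sub-ranges they cover.
        for (Segment child : node.children) {
            visit(child, styles);
        }
    }
}
```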

Code Formatting

The Code Formatting section discusses that code formatting can be realized in the Xtext framework by extending the AbstractDeclarativeFormatter class and overriding the configureFormatting method, which has only one parameter. Calling methods on this parameter creates a formatting configuration for a particular grammar. Since the rules of this formatter only allow a developer to statically define mutual positions of tokens by referring to defining elements of grammar rules and have no information about the formatted tokens, the behavior of horizontal-vertical operators, which format tokens on the basis of their total length, cannot be expressed in this way. Thus code formatting specified by a box model cannot be realized through the original method of specification in the way syntax highlighting is.

After thoroughly reading through the code of the Xtext framework, one finds that the higher layers of Xtext's code use an implementation of the INodeModelFormatter interface to format code. Any implementation of this interface has to override the format method. The method has three parameters: the first is the node model of the code to be formatted, and the other two are the offset and length of the code to be formatted. The method has to return an instance of the IFormattedRegion interface containing the string of already formatted code, an offset and a length. The default implementation of the INodeModelFormatter interface uses the OneWhitespaceFormatter class or an extension of the AbstractDeclarativeFormatter class to know how code should be formatted. Thus the interconnection between Xtext's code dealing with code formatting and a box model can be realized by creating a new implementation of the INodeModelFormatter interface that uses a box model in the same way as the default implementation uses the mentioned formatters. Further, it would be better if the implementation did not implement the interface directly but extended the AbstractNodeModelFormatter class, which contains the default implementation of the IFormattedRegion interface as an inner class.

Positional Operators

Since behavior implementations of positional operators represent inner nodes of the formatting tree structure, they should contain references to their children. Some implementations have to know how long the code formatted by their children will be in order to select a variant of behavior. This is the case for behavior implementations of horizontal-vertical operators. The behavior of a parent in the formatting tree structure can also affect the behavior of its children: for example, an indenting operator indents only if its parent formats vertically. Thus it has to be calculated first how the parent should work. Since obtaining length information from the formatted code of children, which may be recalculated several times, is slow, implementations should contain dedicated methods for obtaining the length of the first row, the length of the last row, the length of the longest row and the number of rows of the potential formatted code.
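The four dedicated methods and their use by a horizontal-vertical operator can be sketched as follows. All names (`Box`, `TokenBox`, `HvBox`) are hypothetical, and the layout rule is a simplifying assumption: children are placed on one row separated by single spaces when every child is a single row and the total fits into a given width, otherwise each child starts on its own row.

```java
import java.util.Arrays;
import java.util.List;

/** Metrics of the potential formatted code, available without building any string. */
interface Box {
    int firstRowLength();
    int lastRowLength();
    int widestRowLength();
    int rowCount();
}

/** Leaf: a single token on one row. */
final class TokenBox implements Box {
    private final String text;

    TokenBox(String text) { this.text = text; }

    public int firstRowLength() { return text.length(); }
    public int lastRowLength() { return text.length(); }
    public int widestRowLength() { return text.length(); }
    public int rowCount() { return 1; }
}

/** Horizontal-vertical operator; expects at least one child. */
final class HvBox implements Box {
    private final int maxWidth;
    private final List<Box> children;

    HvBox(int maxWidth, Box... children) {
        this.maxWidth = maxWidth;
        this.children = Arrays.asList(children);
    }

    private int horizontalWidth() {
        int width = children.size() - 1; // single spaces between children
        for (Box child : children) {
            width += child.widestRowLength();
        }
        return width;
    }

    /** Chooses the variant of behavior from the metrics of the children only. */
    boolean isHorizontal() {
        for (Box child : children) {
            if (child.rowCount() > 1) {
                return false;
            }
        }
        return horizontalWidth() <= maxWidth;
    }

    public int firstRowLength() {
        return isHorizontal() ? horizontalWidth() : children.get(0).firstRowLength();
    }

    public int lastRowLength() {
        return isHorizontal() ? horizontalWidth()
                : children.get(children.size() - 1).lastRowLength();
    }

    public int widestRowLength() {
        if (isHorizontal()) {
            return horizontalWidth();
        }
        int widest = 0;
        for (Box child : children) {
            widest = Math.max(widest, child.widestRowLength());
        }
        return widest;
    }

    public int rowCount() {
        if (isHorizontal()) {
            return 1;
        }
        int rows = 0;
        for (Box child : children) {
            rows += child.rowCount();
        }
        return rows;
    }
}
```

A parent can thus decide between its variants by asking its children for numbers, and the formatted strings are assembled only once the decisions are settled.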

Behavior implementations of most positional operators will assemble the formatted code of their children, just as an inner node of a node model expresses the whole code represented by its children. However, there are some positional operators whose behavior implementation will not assemble the formatted code of their children but will add spacing characters to the formatted code of each child and leave the assembling to its parent. A good example is the indenting operator. Behavior implementations of this kind will therefore be cloned as many times as needed so that each child has its own behavior implementation of this kind as a parent, which keeps behavior implementations consistent.

Implementing Concept

The next step is to describe how the format method of the new implementation of the INodeModelFormatter interface should work. The method can use the node model of the code to be formatted and a box model expressing how the code should be formatted. Since leaves of the node model represent particular tokens and character sequences of the original code formatting, and other nodes are related to rule calls, the solution could be to transform the node model into another tree structure expressing the code as formatted by the box model. The new tree structure should also contain tokens as leaves; if a token corresponds to a terminal rule for which a usage of a transforming operator exists, the token is replaced by the behavior implementation of the transforming operator, which encapsulates the token. The other nodes of the tree structure would be behavior implementations of positional operators, which ensure the relative positioning of tokens.

The transformation of the node model into the new tree structure should be carried out so that leaves are transformed first and the other nodes are subsequently transformed from lower to higher layers. Since an inner node of the node model represents a call of a parser rule, the inner node will be transformed into a tree structure of behavior implementations that correspond to the usages of positional operators in the formatting rule dedicated to the given parser rule. For this concept to make sense, behavior implementations have to be initialized with the values of the usages of positional operators. Moreover, the formatting rule related to a parser rule has to contain a usage of a positional operator as the root of its definition, and each parser rule of a given grammar has to have one formatting rule in the box model.
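The bottom-up idea can be illustrated with a strongly simplified sketch. It collapses the two steps (building the formatting tree and serializing it) into one recursion, and it reduces the box model to two assumed lookups: the separator that the root positional operator of a parser rule puts between its children, and an optional transforming operator per terminal rule. `ModelNode` and `TreeFormatter` are hypothetical stand-ins, not Xtext classes, and dropping the whitespace nodes of the original formatting is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a node of the node model. */
final class ModelNode {
    final String rule;    // name of the parser or terminal rule
    final String text;    // token text for leaves, null for rule calls
    final boolean hidden; // true for whitespace of the original formatting
    final List<ModelNode> children = new ArrayList<>();

    private ModelNode(String rule, String text, boolean hidden) {
        this.rule = rule;
        this.text = text;
        this.hidden = hidden;
    }

    static ModelNode token(String terminalRule, String text) {
        return new ModelNode(terminalRule, text, false);
    }

    static ModelNode whitespace(String text) {
        return new ModelNode("WS", text, true);
    }

    static ModelNode call(String parserRule, ModelNode... nested) {
        ModelNode node = new ModelNode(parserRule, null, false);
        for (ModelNode child : nested) {
            node.children.add(child);
        }
        return node;
    }
}

final class TreeFormatter {
    private final Map<String, String> separators;                // parser rule -> separator
    private final Map<String, UnaryOperator<String>> transforms; // terminal rule -> transforming operator

    TreeFormatter(Map<String, String> separators,
                  Map<String, UnaryOperator<String>> transforms) {
        this.separators = separators;
        this.transforms = transforms;
    }

    String format(ModelNode node) {
        if (node.text != null) {
            // Leaf: apply the transforming operator of its terminal rule, if any.
            UnaryOperator<String> transform = transforms.get(node.rule);
            return transform == null ? node.text : transform.apply(node.text);
        }
        // Every parser rule must have a formatting rule in the box model.
        String separator = separators.get(node.rule);
        if (separator == null) {
            throw new IllegalStateException("No formatting rule for parser rule " + node.rule);
        }
        List<String> parts = new ArrayList<>();
        for (ModelNode child : node.children) {
            if (!child.hidden) { // original whitespace is discarded
                parts.add(format(child)); // children are formatted before their parent
            }
        }
        return String.join(separator, parts);
    }
}
```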

Figure 1

The schema depicts a partial node model of some code; some grammar rules of its language are presented in Listing 2 on [this page](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/HeurLang-(Language-for-Defining-Heuristics-of-the-Initial-Box-Model\)). The gray nodes express sequences of blank characters from the original formatting. The node model is transformed into a formatting tree structure of behavior implementations, which follows the formatting rules from the same listing.

Formatting of the Node Model

The transformation of a node model into formatting tree structure might seem straightforward. But a node model differs from the classical AST in some cases. These distinctions are a consequence of integrating actions defined in a grammar into the node model and they have to be eliminated during the transformation.

Figure 2

The schema depicts a segment of a node model of some code; some grammar rules of its language are presented in the [listing](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/The-Grammar-Language). It shows how the node model might differ from a classical AST when the grammar contains rules with actions.

Action Distinction

The next step of the formatting procedure is to serialize the new tree structure into text: the behavior implementations recursively apply their behavior from the leaves to the root.

Behavioral Implementation of Operators

It has already been roughly outlined what behavior implementations of transforming and positional operators should look like. Now they will be considered in greater detail.

Since behavior implementations of both kinds of operators have to work according to the values of usages of operators, each behavior implementation should contain a method that initializes it with the values of a usage of an operator.

Transforming Operators

Since behavior implementations of transforming operators serve as leaves of the formatting tree structure, they should be able to store the original tokens. They should also be able to transform these tokens into a required format by a dedicated method.
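A behavior implementation of a transforming operator might look roughly like the sketch below. The concrete operator is purely illustrative: the class name `CaseTransformBehavior` and the `case` parameter are hypothetical and do not come from the box meta-model. The sketch only shows the three responsibilities named above: initialization from the values of a usage, storing the original token, and a dedicated transformation method.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical transforming operator that can upper-case a token. */
final class CaseTransformBehavior {
    private boolean upper;
    private String token;

    /** Initializes the behavior with the parameter values of one operator usage. */
    void init(Map<String, String> usageValues) {
        // "case" is a hypothetical parameter name used only for this sketch.
        upper = "upper".equals(usageValues.get("case"));
    }

    /** Stores the original token taken from a leaf of the node model. */
    void setToken(String token) {
        this.token = token;
    }

    String originalToken() {
        return token;
    }

    /** Returns the token in the required format; the original stays untouched. */
    String transformedToken() {
        return upper ? token.toUpperCase(Locale.ROOT) : token;
    }
}
```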

Workflow

The previous text designed languages that allow a user to manage a generic pretty printer, as well as concepts realizing a generic pretty printer that build on the original code of the Xtext framework. One problem remains: how to propagate these innovations into the Xtext framework so that a language developer can use the generic pretty printer designed in this way without manually registering all partial changes of the implementation of syntax highlighting and code formatting for the developed language in a Google Guice configuration. Moreover, it has to be solved how to trigger the generation of an initial box model from heuristic rules.

The Configuration section discusses that the behavior of the Xtext framework can be customized: a workflow configuration file dedicated to a developed language allows for choosing which concepts of the framework (see the Workflow section) will be used for the language and how. The workflow configuration file consists of declarations of the components used. These components serve to erase old generated code and perform other useful tasks, but the most important component is called Generator. It generates a meta-model from a grammar, generates the code of the framework's concepts for the developed language and registers it in a Google Guice configuration. This component delegates its tasks to subcomponents called fragments. A fragment is an arbitrary class extending the AbstractGeneratorFragment class. A developer of a new fragment can override the method for checking the fragment's parameters defined in the workflow configuration file, the methods for adding binding rules to a Google Guice configuration and, not least, the methods that generate code into the directories dedicated to the framework's concepts and have access to the grammar of the developed language.

The previous paragraph indicates that the workflow configuration file is a good way for the language developer to set the designed generic pretty printer as the pretty printer for a developed language. Using this concept would require creating a fragment that adds bindings for the new implementation of syntax highlighting, a fragment that adds bindings for the new implementation of code formatting, a fragment that obtains a box model from a corresponding DSL file and mediates it to the implementations of pretty-printing concepts, and also a fragment that starts the generation of an initial box model from heuristic rules. The realization of the first two fragments is straightforward and follows from the previous text, so the following text discusses only the realization of the last two.

Starting Generation of the Initial Box Model

The main purpose of this fragment would be to generate a DSL file containing an initial box model. Therefore a model of heuristic rules has to be obtained from a corresponding DSL file and transformed into a box model, which is subsequently stored in textual form. By default, the Xtext framework generates a model provider that obtains models from a DSL file together with the generation of a meta-model. Further, because the generating methods of fragments have access to the grammar, the transformation can be done according to the design of [HeurLang](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/HeurLang-(Language-for-Defining-Heuristics-of-the-Initial-Box-Model\)). The [Model Transformation into a Text Representation](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/HeurLang-(Language-for-Defining-Heuristics-of-the-Initial-Box-Model\)) section discusses that obtaining an initial box model from heuristic rules and serializing it into DSL code should be realized in the Xtend2 language. But the Generator component contains a concept of code generation dedicated to its fragments that is realized with Xpand templates. This problem can be solved by using extensions of templates written in the Xtend1 language, which is completely different from Xtend2. Since extensions allow for calling Java code, the design of [HeurLang](https://github.com/mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework/wiki/HeurLang-(Language-for-Defining-Heuristics-of-the-Initial-Box-Model\)) can be used. All these facts allow a user to start the generation of an initial box model whenever Xtext's workflow is run.

Mediation of a Box Model

This fragment should mediate a box model to the concepts of the generic pretty printer, which will subsequently use it. A box model can be obtained by a corresponding provider, as mentioned in the previous section. Since the Highlighting Configuration section designs qualified names for usages of operators, the box model should be post-processed and these names calculated in advance, so that qualified names do not have to be calculated from scratch at each request. Therefore one solution for mediating a box model in a reasonable form could be that this fragment creates a provider that obtains a model from a DSL file, post-processes it and offers it to the concepts of the generic pretty printer. Another solution could be that the fragment obtains a box model from a DSL file, post-processes it, stores it in an [XMI](http://www.omg.org/spec/XMI/2.4.1/PDF/) file and creates a provider that loads the box model from the XMI file and offers it to the concepts of the generic pretty printer. This solution moves the time needed for calculating qualified names from the run time of the pretty printer to the run time of the code generation, where speed is not so important.
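The first solution, a provider that loads and post-processes the box model once and then hands out the same instance, can be sketched generically. `CachingModelProvider` is a hypothetical name. The loader and the post-processor are passed in as functions because the actual loading from a DSL file and the calculation of qualified names are outside the scope of this sketch.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Loads a model lazily, post-processes it once and caches the result. */
final class CachingModelProvider<M> implements Supplier<M> {
    private final Supplier<M> loader;
    private final UnaryOperator<M> postProcessor;
    private M cached;

    CachingModelProvider(Supplier<M> loader, UnaryOperator<M> postProcessor) {
        this.loader = loader;
        this.postProcessor = postProcessor;
    }

    @Override
    public synchronized M get() {
        if (cached == null) {
            // Qualified names would be calculated here, on the first request only.
            cached = postProcessor.apply(loader.get());
        }
        return cached;
    }
}
```

The second solution would keep the same provider shape but make the loader read the already post-processed model from the XMI file, so the post-processing step becomes a no-op at run time.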