The Grammar Language - mn-mikke/Model-driven-Pretty-Printer-for-Xtext-Framework GitHub Wiki

The Grammar Language is a language for defining the grammars of new languages. It is self-describing, which means that the grammar of the Grammar Language is itself written in the Grammar Language.

Listing

A grammar of a simplified object-oriented language that allows defining a class, inheritance, inner classes, method torsos (methods without bodies), and package and import namespaces.

// 1 - Declaration of a grammar
grammar cz.gpp.Example with org.eclipse.xtext.common.Terminals hidden(WS, ML_COMMENT, SL_COMMENT) // Hidden terminals.

// 2 - Declaration of an output meta-model
generate example "http://www.gpp.cz/Example"

// 3 - Definition of input meta-models
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

// 6 - Root parser rule
Model:
   package = Package // 10 - Ordered group
   ((imports += Import)* & class = Class) // 11 - Unordered group
;
// 4 - Terminal rule
terminal ID: '^'?('A'..'Z'|'_') ('A'..'Z'|'_'|'0'..'9')*;

// 5 - Enum
enum Modifier : public | protected | private;

// Other parser rules

// 7 - Usage of input meta-model elements
QualifiedName returns ecore::EString:
   ID ('.' ID)* // 14 - Frequency of occurrence
;

Package: 'package' name=QualifiedName; // 13 - Assignment

Import: 'import' className=QualifiedName;

Class:
   (abstract?='abstract')? 'class' name=ID
   ('extends' superClass=[Class|ID])?
   '{'
   (methods += Method | internalClasses += Class)*  // 12 - Alternatives
   '}'
;

Method:
   visibility=Modifier returnValue=[Class|ID] name=ID
   '(' (parameters+=Parameter (',' parameters+=Parameter)*)? ')'
   '{' body=INT* '}'
;

Parameter: SpecificParameter name=ID;

SpecificParameter returns Parameter: IntParameter | StringParameter | ObjectParameter;

// 8 - Cross reference
ObjectParameter: {ObjectParameter} type=[Class|ID];

// 9 - Action
IntParameter: {IntParameter} 'int';

StringParameter: {StringParameter} 'string';

Common Declarations

Declaration of the Grammar

The declaration (see the Listing's comment n.1) should contain the qualified name of the grammar. Rules can then be imported from another grammar by stating that grammar's name after the "with" keyword. Although the grammar of the Grammar Language allows importing rules from several grammars, this possibility is explicitly disabled. This part of the grammar can also define hidden rules, whose parsed result is not intended to be a part of the final model. This feature serves primarily for the definition of comments and terminal separators, such as tabs and white space.
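
As a minimal sketch of such a declaration, the following grammar reuses the rules of org.eclipse.xtext.common.Terminals and hides white space and comments. The grammar name cz.gpp.Sketch, the meta-model URI and the Greeting rule are hypothetical and serve only as an illustration.

```xtext
// Hypothetical grammar; only org.eclipse.xtext.common.Terminals comes from the Listing.
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals
   hidden(WS, ML_COMMENT, SL_COMMENT)

generate sketch "http://www.gpp.cz/Sketch"

// ID and STRING are not declared here; they are inherited from the Terminals grammar.
Greeting: 'greeting' name=ID message=STRING;
```

Under these assumptions, an input such as `greeting /* note */ World "hi"` should be accepted, because the spaces and the comment are consumed by hidden terminals and leave no trace in the model.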

Output Meta-model

When the meta-model of the final model is not available and the language developer wants to generate it from the grammar, the generated meta-model has to be declared. The "generate" keyword is followed by the meta-model's name and the [URI](http://en.wikipedia.org/wiki/Uniform_resource_identifier) with the HTTP scheme under which the meta-model will be registered for later import from another grammar specification (see the Listing's comment n.2).

Input Meta-models

When the resulting meta-model has to contain elements of other meta-models, those meta-models have to be imported by the following declaration (see the Listing's comment n.3). The "import" keyword is followed by the URI (with the HTTP scheme) of the imported meta-model. The "as" keyword is then followed by an alias of the meta-model, which is used for referencing the meta-model's elements from the grammar.
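
The role of the alias can be sketched as follows. The grammar name, the URI of the generated meta-model and both rules are hypothetical; only the Ecore import is taken from the Listing.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

// Output meta-model, generated from this grammar.
generate sketch "http://www.gpp.cz/Sketch"

// Input meta-model, referenced below through the alias "ecore".
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Release: 'release' version=Version;

// The alias qualifies a type of the imported meta-model.
Version returns ecore::EString: INT '.' INT;
```

Without the import, the reference ecore::EString in the Version rule could not be resolved.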

Grammar Rules

For code to be converted to a model, a template of the code has to be defined, just as the template of the model is represented by the meta-model. The grammar plays the role of the code template. As is well known, the definition of a grammar consists of a number of rules defining nonterminals in terms of terminals and other nonterminals. Although the Xtext framework follows this concept, the Grammar Language is designed with respect to the relationship between the grammar and the generated model.

Terminal Rules

As is common in the world of parsers, a terminal has to be defined before it can be used in a grammar rule. The definitions of terminal rules represent the lexical analysis of the parser, which is usually realized by regular expressions. The terminal rules of the Grammar Language (see the Listing's comment n.4), which serve to define terminals, are no exception. The "terminal" keyword is followed by the terminal's name and the regular expression, separated by a colon. If the "returns" keyword does not follow the terminal's name, the text matched by the regular expression is represented by ecore::EString in the final model. Otherwise, the name of a type from the Ecore meta-model should follow the "returns" keyword.
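
A minimal sketch of a terminal rule without a "returns" clause follows. The grammar name, the Color rule and the HEX terminal are hypothetical. The INT declaration quoted in the final comment is the one shipped in org.eclipse.xtext.common.Terminals.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

generate sketch "http://www.gpp.cz/Sketch"

Color: 'color' name=ID '=' code=HEX ('alpha' alpha=INT)?;

// No "returns" clause: the matched text is stored as ecore::EString.
terminal HEX: '#' ('0'..'9'|'a'..'f'|'A'..'F')+;

// For comparison, the inherited Terminals grammar declares INT with a "returns" clause:
//    terminal INT returns ecore::EInt: ('0'..'9')+;
```

In this sketch the feature "code" would hold a string such as "#ff00aa", while "alpha" would hold an integer.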

Enums

Even though regular expressions make it possible to define textual enumerations, the Xtext framework offers special enumeration rules (see the Listing's comment n.5). The "enum" keyword is followed by the name of the rule and the textual enumeration. The situation regarding the types in the generated model is similar to that of the terminal rules. When the developer wants to use a type from the Ecore model other than ecore::EString, the "returns" keyword with the name of the type has to be placed after the rule's name.
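
The following sketch shows an enumeration rule in use. All names are hypothetical. The second rule relies on the optional form in which a literal is given an explicit keyword, which the text above does not cover.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

generate sketch "http://www.gpp.cz/Sketch"

Model: (fields+=Field | delays+=Delay)*;

Field: visibility=Visibility name=ID ';';

Delay: 'delay' amount=INT unit=Unit ';';

// Each literal is written in the code exactly as it is named here.
enum Visibility: public | protected | private;

// Optional form: literal name on the left, keyword expected in the code on the right.
enum Unit: SECONDS='sec' | MINUTES='min';
```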

Parser Rules

Parser rules are essentially grammar rules specifying nonterminals. They fulfill the role of the syntax analysis of the parser. Each rule begins with the nonterminal's name (see the Listing's comment n.6). If necessary, the name is followed by the "returns" keyword and the name of a type, which either comes from an imported meta-model or is not specified anywhere else (see the Listing's comment n.7). In the latter case, the type subsequently becomes a part of the generated meta-model. The rule then contains a colon followed by the definition of the rule, whose details are described later. As is well known, a grammar has to contain one initial rule that specifies the root of the Abstract Syntax Tree (AST). The Grammar Language denotes this rule by its first position among all rules (see the Listing's comment n.6).
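
The two uses of the "returns" keyword and the position of the root rule can be sketched as follows. The grammar name and the Library, Book and Item names are hypothetical.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

generate sketch "http://www.gpp.cz/Sketch"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

// First rule of the grammar: it specifies the root of the AST.
Library: 'library' name=QualifiedName (items+=Book)*;

// The returned type comes from an imported meta-model.
QualifiedName returns ecore::EString: ID ('.' ID)*;

// The returned type Item is declared nowhere else,
// so it becomes a part of the generated meta-model.
Book returns Item: 'book' name=ID;
```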

Defining Elements of the Parser Rule

As mentioned earlier, the definition of a nonterminal depends on the usage of other terminals and nonterminals. The following text describes how to use defined terminals and nonterminals in the Grammar Language.

Essential Elements

These grammar elements are the essential, indivisible building blocks for the definition of a nonterminal. The set of possible elements includes the following.

Keyword - In terms of the grammar definition, a keyword is an arbitrary string enclosed in quotation marks or apostrophes. This means that the code has to contain the string of the keyword at the given position.
Rule Call - A rule call is essentially any usage of a terminal, enumeration or parser rule. To use a rule, the language developer types the name of the rule.
Cross Reference - If the Grammar Language contained only rule calls, the final model would always be a tree structure. Cross references bring the opportunity to introduce cycles into the final model. As a parser rule produces a model element that corresponds to some part of the parsed code, a cross reference makes it possible to refer to such a model element (see the Listing's comment n.8). From the model's point of view, the element corresponding to a parser rule that contains a cross reference will include a reference to the element specified by the cross reference. On the other hand, code with cross references has to meet certain requirements. A cross reference is a string enclosed in square brackets, where the string is the type of the referenced element. Moreover, the referenced element has to contain the "name" feature, whose value is matched against the token belonging to the cross reference.
Action - Although the Xtext framework does not support the actions that are responsible for semantic analysis and that are well known from common languages for compiler development, the Grammar Language does contain a certain sort of actions (see the Listing's comment n.9). Actions are written in curly brackets and are important for two reasons. The first is the creation of model elements. Consider a parser rule that has no defining elements, or that contains only defining elements that do not cause an instantiation of the element, such as keywords. In that case, the name of the instantiated type enclosed in curly brackets represents the action that instantiates the element. The second is that actions can be used for the assignment of a final model element to a collection owned by another model element.
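
The four essential elements can be seen together in the following sketch. The grammar name and the Model, Node and Marker rules are hypothetical.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

generate sketch "http://www.gpp.cz/Sketch"

// Rule calls to the parser rules Node and Marker.
Model: (nodes+=Node)* (markers+=Marker)*;

Node:
   'node' name=ID           // keyword 'node', rule call to the terminal ID
   ('->' target=[Node])?    // cross reference to a Node, found by its "name" feature
   ';'                      // keyword ';'
;

// Action: the rule consumes only a keyword, so {Marker} instantiates the element.
Marker: {Marker} 'marker';
```

With this sketch, the input `node a -> b; node b -> a;` would yield two Node elements that refer to each other, which a pure tree structure could not express. The short form [Node] is equivalent to [Node|ID], the form used in the Listing.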

Composite Elements

This kind of defining element of the parser rule assembles essential or other composite elements together. The types of composite elements differ mainly in the relation among their individual sub elements.

Ordered Group - This assembly of defining elements is the most natural one. The sub elements are separated by white space (see the Listing's comment n.10), and the parts of code corresponding to the sub elements have to appear in the order of the sub elements. For example, the code matched by a rule call has to follow a certain keyword at some position in the code.
Unordered Group - This assembly takes the opposite approach to code ordering. When two or more defining elements are the sub elements of an unordered group, the parts of code corresponding to the sub elements can appear in an arbitrary order. The sub elements are separated by an ampersand (see the Listing's comment n.11).
Alternative - Alternatives (see the Listing's comment n.12) allow the parts of code corresponding to different sub elements to occur in the same place. In other words, a part of the code has to correspond to one of the sub elements of the alternative. The sub elements are separated by a vertical bar.
Assignment - Assignments (see the Listing's comment n.13) are special composite elements encapsulating only one sub element. To encapsulate more sub elements, an alternative is used as the sub element of the assignment. Assignments associate the results of the defining sub elements with the features of the final model element of the parent parser rule. The definition of an assignment has the following format: the feature's name is followed by the assignment operator and then by a keyword, a rule call, a cross reference or an alternative. The assignment operator has three variants. The first is "=", which stores the result of the sub element in the given feature. The second is "+=", which adds the result of the sub element to the given feature, which has to be a collection. The last is "?=", which transforms an occurrence of the sub element into a Boolean flag that is stored in the given feature.
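
A single hypothetical rule is enough to sketch all four composite elements and the three assignment operators. The grammar name, the Field rule and its feature names are assumptions made for this illustration.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

generate sketch "http://www.gpp.cz/Sketch"

Field:
   // Unordered group: the two flags may appear in either order.
   // "?=" stores true when the keyword occurs.
   (visible?='visible'? & constant?='constant'?)
   // Ordered group: the type has to precede the name.
   // "=" stores the keyword matched by the alternative.
   type=('int' | 'string') name=ID
   // "+=" adds every parsed value to the collection "values".
   ('=' values+=INT (',' values+=INT)*)?
   ';'
;
```

Under these assumptions, both `visible constant int x = 1, 2;` and `constant visible int x = 1, 2;` should be accepted, whereas swapping the type and the name should be rejected.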

Definition of Occurrence

Although it has been described how to define the elements of a parser rule, it has not yet been said how to define the occurrence multiplicity of a defining element. The authors of the Xtext framework were inspired by common regular expressions (see the Listing's comment n.14). The Grammar Language uses the question mark for an optional occurrence (zero or one), the plus character for a mandatory occurrence that can be repeated (one or more), and the asterisk for an optional occurrence that can be repeated (zero or more). Exactly one mandatory occurrence is expressed by no character, which is the default behavior.
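
The four multiplicities can be sketched in one hypothetical rule. The grammar name, the Playlist rule and its keywords are assumptions made for this illustration.

```xtext
grammar cz.gpp.Sketch with org.eclipse.xtext.common.Terminals

generate sketch "http://www.gpp.cz/Sketch"

Playlist:
   'playlist' name=ID               // no operator: exactly one occurrence
   ('about' description=STRING)?    // ?  zero or one occurrence
   ('tag' tags+=ID)*                // *  zero or more occurrences
   ('track' tracks+=STRING)+        // +  one or more occurrences
;
```

The leading keywords 'about', 'tag' and 'track' are there only to keep the optional and repeated parts easy to tell apart.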