ArchitectureGlossary - TypeCobolTeam/TypeCobol GitHub Wiki
Some definitions useful to understand the main concepts of TypeCobol:
Concept | Definition |
---|---|
Token | Elementary piece of source text. A token cannot be subdivided. Tokens are produced during the Scanning step |
SyntaxProperty | Associates a Token with a value. It is especially useful for codegen and the language server |
SymbolDefinition | One or more tokens used to define something. It can be a variable, a program name, a file name, ... |
SymbolReference | One or more tokens used to reference something. It can be a variable, a program name, a file name, ... |
URI | Deprecated. It's a duplicate of SymbolReference/SymbolDefinition. It'll be deleted |
StorageArea | Represents a memory zone associated with a reference. |
Variable | Can be a StorageArea or a constant value (like a literal) |
CodeElement | A CodeElement is an object which represents a Cobol statement, e.g. MoveStatement, InitializeStatement. CodeElements cannot be linked together: an IfStatement cannot contain another statement. You have to use a Node for that |
Node | A Node is usually associated with a CodeElement. Nodes can be linked together in a tree structure, so you can have an IfNode which contains other Nodes |
Example:
01 Var1 pic X.
01 Var2 pic X.
move Var1 to Var2
On the first line, `01`, `Var1`, `pic`, `X` and `.` are tokens. The Move statement is made of 4 significant tokens, `move`, `Var1`, `to` and `Var2`, all of which are separated by space-separator tokens.
See classes: `TypeCobol.Compiler.Scanner.Token` and `TypeCobol.Compiler.Scanner.Scanner`
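To make the token breakdown above concrete, here is a minimal Python sketch of a scanner for these lines. It is not the TypeCobol implementation (which is written in C#): the `Token` fields, the `scan` function and the three patterns are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Illustrative token: a kind, the matched text and its start column."""
    kind: str
    text: str
    column: int


# Hypothetical, heavily simplified patterns; a real Cobol scanner
# handles literals, comments, continuation lines and much more.
TOKEN_PATTERN = re.compile(
    r"(?P<space>\s+)"
    r"|(?P<period>\.)"
    r"|(?P<word>[A-Za-z0-9][A-Za-z0-9-]*)"
)


def scan(line: str) -> List[Token]:
    """Split one source line into tokens, keeping space separators."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_PATTERN.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}")
        tokens.append(Token(match.lastgroup, match.group(), pos))
        pos = match.end()
    return tokens


tokens = scan("move Var1 to Var2")
# Drop the space separators to keep only the significant tokens.
significant = [t.text for t in tokens if t.kind != "space"]
print(significant)
```

In this sketch the space separators are kept as tokens of their own, so the full token list for the Move statement is longer than its list of significant tokens.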
Example:
01 Var1 PIC X(02) GLOBAL.
The `GLOBAL` keyword will be turned into a `SyntaxProperty<bool>` which associates the `GLOBAL` keyword with the logical value `true`. It materializes the presence of the GLOBAL modifier for this data definition.
See classes: `TypeCobol.Compiler.CodeElements.SyntaxProperty` and `TypeCobol.Compiler.Parser.CodeElementBuilder`
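The idea of pairing a value with the token it came from can be sketched as follows. This is a Python approximation under stated assumptions, not the C# class itself: the field names `value` and `token` and the simplified `Token` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Token:
    """Simplified token: only the text and its start column (assumed fields)."""
    text: str
    column: int


@dataclass(frozen=True)
class SyntaxProperty(Generic[T]):
    """A value together with the token that expresses it in the source."""
    value: T
    token: Token


source = "01 Var1 PIC X(02) GLOBAL."
global_token = Token("GLOBAL", source.index("GLOBAL"))

# The presence of the GLOBAL keyword is recorded as the boolean value True.
is_global = SyntaxProperty(True, global_token)

# A tool can go from the logical value back to the exact source position.
start = is_global.token.column
print(is_global.value, source[start:start + len(is_global.token.text)])
```

Keeping the token next to the value is what makes the property useful to consumers such as a code generator or a language server, which need to know where in the source a given fact was stated.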
Example:
IDENTIFICATION DIVISION.
PROGRAM-ID. MyPgm1.
The token `MyPgm1` will be turned into a `SymbolDefinition` with a type of `ProgramName`. The `SymbolDefinition` class associates a name (materialized by a token) with basic type information indicating the nature of the definition.
See classes: `TypeCobol.Compiler.CodeElements.SymbolDefinition` and `TypeCobol.Compiler.Parser.CobolWordsBuilder`
Example:
PERFORM para-1 THRU para-4
Both `para-1` and `para-4` tokens will be turned into `SymbolReference` instances. Just like `SymbolDefinition`, a type is associated with the reference to describe the nature of the name used. Here the names are ambiguous because `para-1` and `para-4` may refer to paragraphs or sections.
See classes: `TypeCobol.Compiler.CodeElements.SymbolReference` and `TypeCobol.Compiler.Parser.CobolWordsBuilder`
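One way to picture how a typed definition and an ambiguous reference relate is the Python sketch below. Everything in it is an assumption for illustration: the `SymbolType` members, the `candidate_types` field and the `resolve` helper are hypothetical and do not describe how TypeCobol actually resolves names.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple


class SymbolType(Enum):
    PROGRAM_NAME = auto()
    PARAGRAPH_NAME = auto()
    SECTION_NAME = auto()


@dataclass(frozen=True)
class SymbolDefinition:
    name: str
    type: SymbolType


@dataclass(frozen=True)
class SymbolReference:
    name: str
    # An ambiguous reference lists every type the name may designate.
    candidate_types: Tuple[SymbolType, ...]


def resolve(reference: SymbolReference,
            definitions: List[SymbolDefinition]) -> List[SymbolDefinition]:
    """Return definitions matching the name (case-insensitively) and a candidate type."""
    return [
        d for d in definitions
        if d.name.lower() == reference.name.lower()
        and d.type in reference.candidate_types
    ]


definitions = [
    SymbolDefinition("MyPgm1", SymbolType.PROGRAM_NAME),
    SymbolDefinition("para-1", SymbolType.PARAGRAPH_NAME),
    SymbolDefinition("para-4", SymbolType.PARAGRAPH_NAME),
]

# PERFORM para-1 THRU para-4: each name may be a paragraph or a section.
ambiguous = (SymbolType.PARAGRAPH_NAME, SymbolType.SECTION_NAME)
matches = resolve(SymbolReference("PARA-1", ambiguous), definitions)
print(matches)
```

The reference only narrows the search to paragraphs and sections. Which of the two it really is can only be decided once the definitions are known.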
Example:
01 var1 PIC X.
01 var2 PIC X.
MOVE var1 TO var2
Both `var1` and `var2` are `Variable` objects using a `StorageArea`. A `StorageArea` object is defined primarily by a `SymbolReference` and two booleans, `IsReadFrom` and `IsWrittenTo`, indicating how the data is used in the current context. So here `var1` is read and `var2` is written.
See classes: `TypeCobol.Compiler.CodeElements.StorageArea` and `TypeCobol.Compiler.Parser.CobolExpressionsBuilder`
Example:
MOVE 'A' TO var1
Both the literal `'A'` and `var1` are represented as `Variable` objects for this statement. `Variable` objects are abstractions for all items manipulated inside a given Cobol statement. Here `'A'` will be defined with an `AlphanumericValue` and the `IsLiteral` property will be true, whereas `var1` will get a `StorageArea` pointing to the `var1` symbol.
See classes: `TypeCobol.Compiler.CodeElements.Variable` and `TypeCobol.Compiler.Parser.CobolExpressionsBuilder`
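The relationship between `Variable` and `StorageArea` described in the two examples above can be approximated in Python as follows. This is a sketch under assumptions: the snake_case fields mirror the property names mentioned in the text, but the `mark_move` helper and the use of a plain string for the symbol reference are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageArea:
    symbol_reference: str          # simplified: just the referenced name
    is_read_from: bool = False
    is_written_to: bool = False


@dataclass
class Variable:
    """Either a storage area or a literal value, never both in this sketch."""
    storage_area: Optional[StorageArea] = None
    alphanumeric_value: Optional[str] = None

    @property
    def is_literal(self) -> bool:
        return self.storage_area is None and self.alphanumeric_value is not None


def mark_move(sending: Variable, receiving: Variable) -> None:
    """Record how a MOVE statement uses its operands (hypothetical helper)."""
    if receiving.storage_area is None:
        raise ValueError("a MOVE target must be a storage area, not a literal")
    if sending.storage_area is not None:
        sending.storage_area.is_read_from = True
    receiving.storage_area.is_written_to = True


# MOVE 'A' TO var1
literal_a = Variable(alphanumeric_value="A")
target1 = Variable(storage_area=StorageArea("var1"))
mark_move(literal_a, target1)

# MOVE var1 TO var2: a new StorageArea for var1, used differently here.
source2 = Variable(storage_area=StorageArea("var1"))
target2 = Variable(storage_area=StorageArea("var2"))
mark_move(source2, target2)

print(literal_a.is_literal, target1.storage_area, source2.storage_area)
```

Each statement gets its own `StorageArea` objects in this sketch, so the same symbol `var1` is flagged as written in the first statement and as read in the second, matching the idea that the flags describe usage in the current context.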
Example:
IF var1 = 'C' THEN
DISPLAY "OK"
ELSE
DISPLAY "KO"
END-IF
`IF var1 = 'C' THEN`, `DISPLAY "OK"`, `ELSE`, `DISPLAY "KO"` and `END-IF` are all distinct `CodeElement` instances. Usually a `CodeElement` spans a single line of code. It is an intermediate structured representation between Tokens and Nodes. Unlike a `Node`, a `CodeElement` has no parent-children relationships, but it already captures the nature of statements or declarations. Here the two `DISPLAY` lines are described as `DisplayStatement` code elements, both using a literal variable.
CodeElements are produced during the SyntaxCheck step.
See classes: `TypeCobol.Compiler.CodeElements.CodeElement` and its derived classes, `TypeCobol.Compiler.Parser.CobolStatementsBuilder`, `TypeCobol.Compiler.Parser.CodeElementsParserStep`
Example:
IF var1 = 'C' THEN
DISPLAY "OK"
ELSE
DISPLAY "KO"
END-IF
Using the same example as above, the whole code is turned into a single `Node`: the `If` node produced here has 3 children, a `Then` node, an `Else` node and an `End` node. The `Then` and `Else` nodes each contain one child, their respective display statement. The condition is located on the root `If` node.
Nodes are the final structured representation of a Cobol program. In fact, the whole source file is represented as a single Node object: this root node is the AST (Abstract Syntax Tree) of the supplied source code.
Nodes are produced during the SemanticCheck step.
See classes: `TypeCobol.Compiler.Nodes.Node` and its derived classes, `TypeCobol.Compiler.CupParser.NodeBuilder.ProgramClassBuilder`, `TypeCobol.Compiler.Parser.ProgramClassParserStep`
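The step from a flat list of code elements to the `If` tree described above can be sketched in Python as follows. This is only an illustration: the `kind` strings, the `build_if_node` function and the absence of nested `IF` support are assumptions, and the actual tree construction in TypeCobol is done by the classes listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodeElement:
    kind: str   # illustrative labels: "If", "Display", "Else", "EndIf"
    text: str


@dataclass
class Node:
    name: str
    code_element: Optional[CodeElement] = None
    children: List["Node"] = field(default_factory=list)


def build_if_node(code_elements: List[CodeElement]) -> Node:
    """Group a flat IF ... ELSE ... END-IF sequence into a tree (no nesting)."""
    first, *rest = code_elements
    if first.kind != "If":
        raise ValueError("sequence must start with an IF code element")
    if_node = Node("If", first)          # the condition stays on the root
    current = Node("Then")
    if_node.children.append(current)
    for element in rest:
        if element.kind == "Else":
            current = Node("Else", element)
            if_node.children.append(current)
        elif element.kind == "EndIf":
            if_node.children.append(Node("End", element))
            return if_node
        else:
            # Any other statement becomes a child of the current branch.
            current.children.append(Node(element.kind, element))
    raise ValueError("missing END-IF")


flat = [
    CodeElement("If", "IF var1 = 'C' THEN"),
    CodeElement("Display", 'DISPLAY "OK"'),
    CodeElement("Else", "ELSE"),
    CodeElement("Display", 'DISPLAY "KO"'),
    CodeElement("EndIf", "END-IF"),
]
tree = build_if_node(flat)
print([child.name for child in tree.children])
```

The input list carries no structure at all, while the result has the shape described above: an `If` root holding the condition, with `Then`, `Else` and `End` children, and one display statement under each branch.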