TypeCobolMainPhases - TypeCobolTeam/TypeCobol GitHub Wiki

TypeCobol project - Code analysis steps

Compilation pipeline : Compiler/

Input : libraryName, textName, sourceFileProvider, compilerOptions

Output : TextDocument, TokensDocument, ProcessedTokensDocument, SyntaxDocument, SemanticsDocument

Namespace : TypeCobol.Compiler

Class | Description
---|---
CompilationProject | Collection of Cobol files compiled together, maintains a shared cache of preprocessed files.
CompilationDocument | Partial compilation pipeline, from file loading to preprocessor step, used for COPY files.
CompilationUnit | Complete compilation pipeline, from file loading to semantic analysis step, used for PROGRAM files.

Overview of the documents providing the results of the successive compilation steps :

Namespace | Class | Description
---|---|---
TypeCobol.Compiler.File | CobolFile | Cobol source text file.
TypeCobol.Compiler.Text | TextDocument | Lines of unicode characters.
TypeCobol.Compiler.Scanner | TokensDocument | Lines of lexical tokens (before preprocessing).
TypeCobol.Compiler.Preprocessor | ProcessedTokensDocument | Iterator on processed tokens, after compiler directives parsing, COPY and REPLACE directive implementation.
TypeCobol.Compiler.Parser | SyntaxDocument | Lines of code elements (set of tokens matched by a grammar production) and code model with unresolved symbol references.
TypeCobol.Compiler.TypeChecker | SemanticsDocument | Code model with resolved symbol references, data model, data & control flow analysis.

Overview of the events sent between these documents to implement the incremental compilation pipeline :

Class | Change event
---|---
CobolFile | CobolFileChangedEvent
TextDocument | TextChangedEvent
TokensDocument | TokensChangedEvent
ProcessedTokensDocument | TokensChangedEvent
SyntaxDocument | CodeElementChangedEvent
SemanticsDocument | CodeModelChangedEvent
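
The chaining of change events listed above can be pictured with a small Python sketch. The project itself targets .NET, so this is not its code: the `Stage` class and its `subscribe`, `notify` and `on_upstream_change` methods are hypothetical names chosen for illustration, and the sketch assumes each document simply listens to the one before it and re-emits its own event.

```python
from typing import Callable, List, Tuple

Listener = Callable[[str, int], None]

class Stage:
    """Hypothetical pipeline document: logs upstream changes, then notifies downstream."""

    def __init__(self, name: str, event_name: str) -> None:
        self.name = name
        self.event_name = event_name
        self.listeners: List[Listener] = []
        self.log: List[str] = []

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def on_upstream_change(self, event_name: str, line_index: int) -> None:
        # A real stage would recompute only the affected lines here.
        self.log.append(f"{self.name} received {event_name} for line {line_index}")
        self.notify(line_index)

    def notify(self, line_index: int) -> None:
        for listener in self.listeners:
            listener(self.event_name, line_index)

# (document class, change event it sends), in pipeline order, as in the table above.
PIPELINE: List[Tuple[str, str]] = [
    ("TextDocument", "TextChangedEvent"),
    ("TokensDocument", "TokensChangedEvent"),
    ("ProcessedTokensDocument", "TokensChangedEvent"),
    ("SyntaxDocument", "CodeElementChangedEvent"),
    ("SemanticsDocument", "CodeModelChangedEvent"),
]

def build_pipeline() -> List[Stage]:
    stages = [Stage(name, event) for name, event in PIPELINE]
    # Each document listens to the change events of the previous one.
    for upstream, downstream in zip(stages, stages[1:]):
        upstream.subscribe(downstream.on_upstream_change)
    return stages
```

Calling `build_pipeline()[0].notify(12)` simulates an edit on line 12 of the text document; each downstream stage then records the event it received from its predecessor.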

Step 1 : Compiler/File - File loading, Characters decoding, Line endings

Step 1.1 : File loading

Input : libraryName, textName

Output : binary Stream

Namespace : TypeCobol.Compiler.File

Class | Description
---|---
SourceFileProvider | Enables the compiler to find Cobol source files referenced by name in the Cobol syntax (collection of Cobol text libraries filtered by libraryName).
ICobolLibrary | Common interface for Cobol text libraries (dictionary of files indexed by textName), could be implemented as a remote dataset on the mainframe, a repository in a version control system, or a simple directory on the local machine.
CobolFile | Abstract class representing a Cobol text file, with OpenInputStream and OpenOutputStream methods.

Implementation for files found in Windows directories on the local machine :

Class | Description
---|---
LocalDirectoryLibrary | Implementation of an ICobolLibrary as a local directory containing Cobol text files.
LocalCobolFile | Implementation of a CobolFile as a text file in the local filesystem.

Step 1.2 : Characters decoding & Line endings

Input : binary Stream, ibmCCSID (IBM Coded Character Set ID), fixedLineLength or endOfLineDelimiter

Output : IEnumerable<char> (stream of Unicode chars with normalized Cr/Lf line endings)

Namespace : TypeCobol.Compiler.File

Class / Method | Description
---|---
IBMCodePages | Gets the .Net Encoding equivalent for an IBM Coded Character Set ID.
CobolFile.ReadChars | Reads the characters of the source file as Unicode characters. Inserts additional Cr/Lf characters at end of line for fixed length lines.
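
As a rough illustration of this step, the sketch below decodes a byte stream with the codec matching a CCSID and emits characters with Cr/Lf line endings. It is a Python approximation, not the project's .NET code: `CCSID_TO_CODEC` and `read_chars` are hypothetical names, the mapping covers only a few CCSIDs for which Python ships a codec, and the normalization of lone Cr or Lf delimiters is an assumption about what "normalized" means here.

```python
from typing import Iterator, Optional

# Illustrative subset only: IBM CCSID -> Python codec name.
CCSID_TO_CODEC = {
    37: "cp037",     # EBCDIC US/Canada
    500: "cp500",    # EBCDIC International
    1140: "cp1140",  # EBCDIC US/Canada with euro sign
    1208: "utf-8",   # UTF-8
}

def read_chars(data: bytes, ccsid: int,
               fixed_line_length: Optional[int] = None) -> Iterator[str]:
    """Yield Unicode characters with Cr/Lf line endings."""
    text = data.decode(CCSID_TO_CODEC[ccsid])
    if fixed_line_length is not None:
        # Fixed-length records carry no delimiter: add Cr/Lf after each record.
        for start in range(0, len(text), fixed_line_length):
            yield from text[start:start + fixed_line_length]
            yield from "\r\n"
    else:
        # Assumed normalization: any of Cr/Lf, Cr, Lf becomes Cr/Lf.
        unified = text.replace("\r\n", "\n").replace("\r", "\n")
        yield from unified.replace("\n", "\r\n")
```

For example, `"".join(read_chars("ABCDEF".encode("cp037"), 37, fixed_line_length=3))` splits the six decoded characters into two three-character lines, each terminated by Cr/Lf.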

Step 2 : Compiler/Text - Text lines, Source text areas

Step 2.1 : Text lines

Input : IEnumerable<char> (stream of Unicode chars with normalized Cr/Lf line endings)

Output : ITextDocument, a list of ITextLines

Namespace : TypeCobol.Compiler.Text

Class | Description
---|---
ITextDocument | Interface enabling the integration of the Cobol compiler with any kind of text editor. A document is both : an array of characters which can be accessed by offset from the beginning of the document, and a list of text lines, which can be accessed by their index in the list. A document sends notifications each time one of its lines is changed.
ITextLine | Interface enabling the integration of the Cobol compiler with any kind of text editor. Each line has an index to describe its position in the document.

Implementation for a read-only text document in memory :

Class | Description
---|---
TextDocument | Immutable Cobol text document for batch compilation. Document loaded once from a file and never modified.
TextLine | Immutable Cobol line for batch compilation. Text line loaded once from a file and never modified.

Implementation for an interactive text editor in TypeCobolStudio :

Class | Description
---|---
AvalonEditTextDocument | Adapter used to implement the TypeCobol.Compiler.Text.ITextDocument interface on top of an AvalonEdit.TextDocument instance.

Step 2.2 : Source text areas (columns reference format)

Input : ITextLine, ColumnsLayout

Output : TextLineMap, a list of source text areas (SequenceNumber, Indicator, Source, Comment)

Namespace : TypeCobol.Compiler.Text

Class | Description
---|---
ColumnsLayout | CobolReferenceFormat : fixed-form reference format / Columns 1-6 = Sequence number / Column 7 = Indicator / Columns 8-72 = Text Area A and Area B / Columns 73+ = Comment. FreeTextFormat : there is no limit on the size of a source line / the first seven characters of each line are considered part of the normal source line and may contain COBOL source code / column 1 takes the role of the indicator area / there is no fixed right margin, but floating comment indicators : *>.
TextAreaType | Enumeration of the standard text areas : SequenceNumber, Indicator, Source, Comment.
TextArea | Portion of a text line with a specific meaning.
TextLineMap | Partition of a COBOL source line into reference format areas (also detects a list of compiler directives keywords which can be encountered before column 8 even in Cobol reference format).
TextLineType | Line types defined in the Cobol reference format : Source, Debug, Comment, Continuation, Invalid, Blank.
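
The column boundaries of the fixed-form reference format translate directly into string slices. The following Python sketch is only an illustration of that partition: `TextLineAreas`, `split_reference_format` and `line_type` are hypothetical names, free format is not handled, and the indicator-to-line-type mapping is a simplification based on the standard Cobol indicator characters (it ignores the compiler directive keywords that TextLineMap also detects before column 8).

```python
from typing import NamedTuple

class TextLineAreas(NamedTuple):
    sequence_number: str  # columns 1-6
    indicator: str        # column 7
    source: str           # columns 8-72 (Area A and Area B)
    comment: str          # columns 73 and beyond

def split_reference_format(line: str) -> TextLineAreas:
    """Partition one fixed-form line; short lines simply yield empty areas."""
    return TextLineAreas(line[0:6], line[6:7], line[7:72], line[72:])

def line_type(areas: TextLineAreas) -> str:
    """Simplified classification from the indicator character."""
    indicator = areas.indicator
    if indicator in ("*", "/"):
        return "Comment"
    if indicator == "-":
        return "Continuation"
    if indicator in ("D", "d"):
        return "Debug"
    if indicator in (" ", ""):
        return "Source" if areas.source.strip() else "Blank"
    return "Invalid"
```

For instance, a line starting with `000100*` is split into sequence number `000100` and indicator `*`, and is classified as a comment line.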

Step 3 : Compiler/Scanner - Lexical analysis, Line continuations

Step 3.1 : Lexical analysis

Input : textLine, previousTokensLine, compilerOptions

Output : IList<Token>, IList<Diagnostic>, initial & final scan states

Namespace : TypeCobol.Compiler.Scanner

Class | Description
---|---
Scanner | Partitions a Cobol source text line into lexical tokens (Cobol words).
Token | Substring of the source text corresponding to a character string or a separator. A character-string is a character or a sequence of contiguous characters that forms a COBOL word, a literal, a PICTURE character-string, or a comment-entry. A separator is a string of contiguous characters used to delimit character strings.
TokenType | Enumeration of the 454 different types of tokens defined by the Cobol syntax, arranged in token families (see TokenFamily enum).

View of the source text file as a tokens document :

Class | Description
---|---
TokensLine | List of tokens and diagnostics found by scanning one line of text.
TokensDocument | View of a source document after the lexical analysis stage as lines of tokens.
TokensLineIterator | Iterator over tokens stored in TokensLines and originating from a single document (with token type filtering).

Implementation of context-sensitive lexical analysis :

Class | Description
---|---
MultilineScanState | Internal Scanner state propagated from one line to the other when compiling a complete source file.

Step 3.2 : Line continuations

Namespace : TypeCobol.Compiler.Scanner

Class / Method | Description
---|---
Scanner.ScanTokensLine | Handle continuation from the previous line : Any sentence, entry, clause, or phrase that requires more than one line can be continued in Area B of the next line that is neither a comment line nor a blank line. The line being continued is a continued line; the succeeding lines are continuation lines.
ContinuationToken | Class used for tokens which are a continuation of other tokens starting on previous lines.

Step 4 : Compiler/Preprocessor - Compiler directives, COPY & REPLACE

Step 4.1 : Compiler directives parsing

Input : IList<TokensLine> where Cobol statement tokens are mixed with compiler directives tokens

Output : IList<ProcessedTokensLine> where the tokens forming one compiler directive are grouped into one specific single token

Namespace : TypeCobol.Compiler.Preprocessor

Class | Description
---|---
CobolDirectivesParser | Parser generated from the Antlr4 grammar file CobolDirectives.g4 (in project TypeCobol.Grammar), used to match compiler directive tokens when specific compiler directive starting keywords are encountered.
CompilerDirectiveBuilder | Builds a CompilerDirective object (see below) while visiting its parse tree.
ProcessedTokensLine | Different list of tokens where the tokens forming one compiler directive have been grouped into one specific single token (see below).
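
The grouping performed in this step can be sketched as follows, under strong simplifying assumptions: tokens are plain strings, only the COPY directive is recognized, and a directive is taken to run from its starting keyword to the next period. The function name `group_copy_directives` is hypothetical; the actual step relies on the Antlr4-generated parser rather than on a hand-written loop like this one.

```python
from typing import List, Tuple, Union

Token = str
TokenGroup = Tuple[str, ...]  # stands in for a single grouped directive token

def group_copy_directives(tokens: List[Token]) -> List[Union[Token, TokenGroup]]:
    """Replace each 'COPY ... .' run of tokens by one grouped element."""
    result: List[Union[Token, TokenGroup]] = []
    i = 0
    while i < len(tokens):
        if tokens[i].upper() == "COPY":
            # Look for the period closing the directive (or the end of input).
            end = i
            while end < len(tokens) and tokens[end] != ".":
                end += 1
            result.append(tuple(tokens[i:end + 1]))
            i = end + 1
        else:
            # Ordinary Cobol statement tokens pass through unchanged.
            result.append(tokens[i])
            i += 1
    return result
```

With the input `["MOVE", "A", "TO", "B", ".", "COPY", "CPY1", ".", "STOP", "RUN", "."]`, the three tokens of the COPY directive come back as one tuple while the other tokens are left untouched.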

Namespace : TypeCobol.Compiler.Scanner

Class | Description
---|---
TokensGroup | Replaces a list of tokens all located on the same source line by a single token. Used to mark the limits of compiler directives and replaced token groups.
CompilerDirectiveToken | After the text preprocessing phase, this single token replaces all the tokens contributing to a compiler directive. If the compiler directive spans several text lines, one single CompilerDirectiveToken will be created on the first line, and generic ContinuationTokenGroups will be created on the following lines and will reference this first token. This token also holds a reference to a corresponding CompilerDirective object.

Namespace : TypeCobol.Compiler.Directives

Class | Description
---|---
CompilerDirective (14 derived classes) | Object representing a compiler-directing statement : a statement that causes the compiler to take a specific action during compilation. You can use compiler-directing statements for the following purposes : - Extended source library control (BASIS, DELETE, and INSERT statements) - Source text manipulation (COPY and REPLACE statements) - Controlling compiler listing (*CONTROL, *CBL, EJECT, TITLE, SKIP1, SKIP2, and SKIP3 statements) - Specifying compiler options (CBL and PROCESS statements).
IBMCompilerOptions | You can direct and control your compilation by using compiler options or by using compiler-directing statements (compiler directives). CBL and PROCESS compiler directives can change the compilation options on the fly.

Step 4.2 : COPY directive implementation

Input : IList<ProcessedTokensLine>

Output : IDictionary<CopyDirective, ImportedTokensDocument>, CopyTokensLinesIterator

Namespace : TypeCobol.Compiler.Preprocessor

Class | Description
---|---
(Directives.)CopyDirective | Object describing all the attributes of the COPY compiler directive (including the REPLACING clause).
IProcessedTokensDocumentProvider | The project system is free to implement the most efficient strategy to build and cache ProcessedTokensDocuments for all the files imported by COPY directives. This interface is implemented by the class CompilationProject.
ImportedTokensDocument | Local view of a ProcessedTokensDocument imported by a COPY directive in another ProcessedTokensDocument. Handles a nested replace iterator to implement the REPLACING clause on top of a tokens line iterator on the imported document.
CopyTokensLinesIterator | Iterator over tokens stored in a ProcessedTokensDocument. This iterator handles COPY directives : it returns the tokens from the main document AND all tokens imported from secondary documents. This iterator does not handle REPLACE directives : it simply returns REPLACE CompilerDirectiveTokens which will be handled at another level by a ReplaceTokensLinesIterator.
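
The behavior described for CopyTokensLinesIterator can be approximated by a short Python generator. This is a sketch, not the project's iterator: tokens are modeled as strings, a string of the form `COPY:<textName>` stands in for a CompilerDirectiveToken holding a CopyDirective, `REPLACE:` strings stand in for REPLACE directive tokens, and the `imported` dictionary plays the role of the already built imported documents. Recursing into COPY directives found inside imported documents is an assumption of this sketch.

```python
from typing import Dict, Iterator, List

COPY_PREFIX = "COPY:"  # hypothetical marker for a grouped COPY directive token

def copy_tokens_iterator(main_tokens: List[str],
                         imported: Dict[str, List[str]]) -> Iterator[str]:
    """Yield tokens of the main document, expanding COPY directives in place."""
    for token in main_tokens:
        if token.startswith(COPY_PREFIX):
            text_name = token[len(COPY_PREFIX):]
            # Tokens of the imported document replace the directive token.
            yield from copy_tokens_iterator(imported[text_name], imported)
        else:
            # Everything else, including REPLACE directive tokens, is
            # returned as is and left to a later stage.
            yield token
```

Because the function is a generator, imported tokens are produced lazily, in source order, and REPLACE directive tokens reach the next stage untouched.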

Step 4.3 : REPLACE (and COPY REPLACING) directive implementation

Input : ITokensLinesIterator

Output : ReplaceTokensLinesIterator

Namespace : TypeCobol.Compiler.Preprocessor

Class | Description
---|---
(Directives.)ReplaceDirective | Object describing all the attributes of the REPLACE compiler directive.
(Directives.)ReplaceOperation | SingleTokenReplaceOperation : one comparison token => zero or one replacement token. PartialWordReplaceOperation : one pure partial word => one replacement token. SingleToMultipleTokensReplaceOperation : one comparison token => more than one replacement token. MultipleTokensReplaceOperation : one first + several following comparison tokens => zero to many replacement tokens.
ReplacedToken | ReplacedToken : Token placeholder used to implement the REPLACE and COPY REPLACING compiler directives in the most common case when a single source token is replaced by a single replacement token. ReplacedPartialCobolWord : Token placeholder used to implement the REPLACE and COPY REPLACING compiler directives when the variable part of a partial Cobol word is replaced by a prefix or suffix. ReplacedTokenGroup : Token placeholder used to implement the REPLACE and COPY REPLACING compiler directives in the less common case when a list of source tokens is replaced by a list of replacement tokens.
ReplaceTokensLinesIterator | Implements the REPLACE directives on top of an underlying tokens iterator. Returns ReplacedToken objects each time a ReplaceOperation is performed.
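
The layering of a replace iterator on top of another tokens iterator can be sketched in Python as below. Only the simplest case is modeled, one comparison token replaced by exactly one replacement token; partial words and multi-token operations are left out. The `ReplacedToken` dataclass and the `replace_tokens_iterator` function are simplified stand-ins with hypothetical signatures, not the project's classes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Union

@dataclass(frozen=True)
class ReplacedToken:
    """Placeholder keeping both the original and the replacement token."""
    original: str
    replacement: str

def replace_tokens_iterator(
    tokens: Iterable[str], replacements: Dict[str, str]
) -> Iterator[Union[str, ReplacedToken]]:
    """Wrap an underlying token iterator and apply one-to-one replacements."""
    for token in tokens:
        if token in replacements:
            # Emit a placeholder instead of silently swapping the text, so
            # later stages can still trace the token back to the source.
            yield ReplacedToken(original=token, replacement=replacements[token])
        else:
            yield token
```

Since the input is any iterable of tokens, the same function can be stacked on top of an iterator that expands COPY directives, which mirrors the two-level design described in the tables above.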

Step 5 : Compiler/Parser - Code elements parsing, Code model

Step 5.1 : Code elements parsing

Input : ITokensLinesIterator

Output : IList<CodeElement>

Namespace : TypeCobol.Compiler.Parser

Class | Description
---|---
CobolParser | Parser generated from the Antlr4 grammar file Cobol.g4 (in project TypeCobol.Grammar), used to match Cobol statements after all preprocessor steps have been applied to the source tokens. The starting rule called to match code elements and get a linearized view of the Cobol syntax (useful for incremental parsing) is CobolParser.codeElement().
CodeElementBuilder | Builds a CodeElement object (see below) while visiting its parse tree.
TracingCobolParser | Utility class used to attach the parsed CodeElements to line numbers in the main source document (necessary for incremental parsing - Not yet implemented).

Namespace : TypeCobol.Compiler.CodeElements

Class | Description
---|---
CodeElement (117 derived classes) | The Cobol syntax can be decomposed in 117 elementary code elements : entries, headers, identifications, paragraphs, statements, statement conditions, statement ends. Each derived class collects all the properties of one of these code elements. At this step, the nested structure of the Cobol syntax is not represented, and the symbol references are not resolved.
Symbol | Symbols defined in the Cobol syntax.
SymbolReference | Reference to a symbol defined in the Cobol syntax.

Step 5.2 : Code model

Input : IList<CodeElement>

Output : Program or Class

Namespace : TypeCobol.Compiler.Parser

Class | Description
---|---
ProgramClassBuilder | Builds a Program or Class object (see below) while visiting its parse tree.

Not yet implemented.

Namespace : TypeCobol.Compiler.CodeModel

Class | Description
---|---
Program | Object graph (tree) representing a Cobol program.
Class | Object graph (tree) representing a Cobol class.

Step 6 : Compiler/TypeChecker - Semantic analysis, Type checking

Input: TODO

Output: TODO

Namespace: TypeCobol.Compiler.CodeModel

Class | Description
---|---
SymbolTable | Symbols declared in a given scope.

Namespace: TypeCobol.Compiler.CodeElements.Expressions

Class | Description
---|---
QualifiedName | Fully qualified symbol reference.

Step 7 : Compiler/Generator - Cobol source code generation from TypeCobol extended syntax

To be described.
