Lexeme Investigations - DevOps-MBSE/AaC GitHub Wiki
YAML tokens are parsed in `_cache`, specifically in `_get_file_content_cache_entry()`, `scan_string()`, and `scan_file()`.
These tokens are consumed in `_parse_source.parse_str()`, where they are sorted into several token lists.
The `value_tokens` list is then passed to `get_lexemes_for_definition()`, which returns a list of `Lexeme` objects, one for each token.
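As a rough illustration of the kind of token stream involved (assuming PyYAML, not the actual AaC implementation), `yaml.scan()` produces tokens that each carry start/end marks with line, column, and offset information — exactly the ingredients a lexeme's source location can be built from:

```python
import yaml  # assumes PyYAML is available

# Hypothetical sketch: scan a YAML string into tokens and keep only the
# scalar (value-bearing) tokens, similar in spirit to how parse_str()
# sorts the token stream into lists such as value_tokens.
source = "name: example\nversion: 1"
tokens = list(yaml.scan(source))
value_tokens = [token for token in tokens if isinstance(token, yaml.ScalarToken)]

for token in value_tokens:
    # Each token's start_mark records where it came from in the source,
    # which is the information a Lexeme's SourceLocation would capture.
    print(token.value, token.start_mark.line, token.start_mark.column)
```

Note that the scanner keeps every scalar as a string (`"1"` here, not the integer `1`); type resolution happens later in the YAML pipeline.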
The lexeme list is returned to the `parse()` function, which is called from `LanguageContext.parse_and_load()`.
`LanguageContext.parse_and_load()` -> `_parse_source.parse()` -> `_parse_source._parse_str()` -> `_cache.scan_string()`
In `definition_parser`, lexemes are primarily used in the function `get_location_str()`, which is called several times throughout the class when raising `LanguageError` exceptions. In `populate_sub_fields()`, a list of lexemes specific to each field, called `sub_lexemes`, is created.
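A minimal sketch of what that per-field filtering could look like (the names and structure here are hypothetical, not the actual `definition_parser` code):

```python
from collections import namedtuple

# Minimal stand-in for the real Lexeme class; field names are hypothetical.
Lexeme = namedtuple("Lexeme", ["value", "source", "line"])

def sub_lexemes_for_field(field_name, all_field_names, lexemes):
    """Collect the lexemes belonging to one field: everything after the
    field-name lexeme, up to (but not including) the next field name."""
    values = [lexeme.value for lexeme in lexemes]
    start = values.index(field_name) + 1
    end = next(
        (i for i in range(start, len(lexemes)) if values[i] in all_field_names),
        len(lexemes),
    )
    return lexemes[start:end]

lexemes = [
    Lexeme("name", "model.yaml", 1),
    Lexeme("Alarm", "model.yaml", 1),
    Lexeme("description", "model.yaml", 2),
    Lexeme("Raises an alert", "model.yaml", 2),
]
sub_lexemes = sub_lexemes_for_field("name", {"name", "description"}, lexemes)
print([lexeme.value for lexeme in sub_lexemes])  # ['Alarm']
```

The useful property is that each sub-list still carries the original source locations, so error messages about one field can point back to the exact spot in the YAML file.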
This chapter is from a course taught at UC Irvine. It covers tokens and lexical structure as they are used in the Python interpreter. https://ics.uci.edu/~pattis/ICS-31/lectures/tokens.pdf
This article covers lexical analysis from the perspective of creating a new programming language. https://hackernoon.com/lexical-analysis-861b8bfe4cb0
This article describes how lexemes are used in compilers. https://pgrandinetti.github.io/compilers/page/where-compilers-use-regular-expressions/
In general, the use of lexemes in AaC is relatively simple compared to how they are used in compilers and interpreters. In the Python interpreter, lexemes represent the smallest pieces of written code and are divided into categories such as operators (e.g. `+`, `-`, `==`), identifiers (e.g. variable and function names), and comments, among others. In AaC, the `Lexeme` class contains a `SourceLocation` for the parsed item, the file it came from, and its value as a string. Lexemes are used only to identify where a specific item came from in a YAML file, usually for the purpose of error reporting.
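Putting that together, a stripped-down model of the class might look like the following (the exact field names are a guess based on the description above, not copied from the AaC source):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceLocation:
    line: int      # line within the source file
    column: int    # column within that line
    position: int  # character offset into the file
    span: int      # length of the lexeme's text

@dataclass(frozen=True)
class Lexeme:
    location: SourceLocation
    source: str  # path of the yaml file the value came from
    value: str   # the parsed item, as a string

def location_str(lexeme: Lexeme) -> str:
    """Format a location for error reporting, in the spirit of
    get_location_str()."""
    return f"{lexeme.source}, line {lexeme.location.line}"

lexeme = Lexeme(SourceLocation(3, 2, 41, 5), "model.yaml", "Alarm")
print(location_str(lexeme))  # model.yaml, line 3
```

This is the whole job of the class: no operator/identifier/comment categories, just a value plus enough location data to produce a readable error message.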