Roslyn Structured Trivia - KirillOsenkov/Bliki GitHub Wiki
(from an internal explanation by Cyrus Najmabadi)
The compiler has:
- SyntaxNodes. A reference type. Composed of other SyntaxNodes, and...
- SyntaxTokens. A value type. Composed of leading and trailing...
- SyntaxTrivia. A value type. Which is normally a single atom. But which can be 'structured' and is then composed of a... SyntaxNode
So a SyntaxTrivia itself is not a node. It is just trivia. But in a few cases it can wrap a node. Let's look at those examples.
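The three layers above can be walked directly. What follows is a minimal sketch, assuming the `Microsoft.CodeAnalysis.CSharp` NuGet package is referenced; the class name `SyntaxLayers` and the method `Describe` are hypothetical names introduced only for this illustration.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class SyntaxLayers
{
    // Hypothetical helper: lists every token, the node that owns it,
    // and the kinds of the trivia attached to that token.
    public static List<string> Describe(string source)
    {
        var lines = new List<string>();
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        // Nodes are composed of nodes and tokens; tokens carry the trivia.
        foreach (SyntaxToken token in root.DescendantTokens())
        {
            lines.Add($"{token.Kind()} (parent node: {token.Parent?.Kind()})");
            foreach (SyntaxTrivia trivia in token.LeadingTrivia)
                lines.Add($"  leading:  {trivia.Kind()}");
            foreach (SyntaxTrivia trivia in token.TrailingTrivia)
                lines.Add($"  trailing: {trivia.Kind()}");
        }
        return lines;
    }
}
```

Calling `SyntaxLayers.Describe("class C { }")` should list each token followed by the whitespace trivia attached to it. `typeof(SyntaxNode).IsClass` and `typeof(SyntaxToken).IsValueType` can be used to check the reference-type and value-type split described above.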
- Directives. When you have
#region foo
this will be a SyntaxTrivia on the token that follows. But that trivia can be decomposed to get the richly structured node corresponding to it, a RegionDirectiveTriviaSyntax.
- Doc comments. When you have
/// <summary>
... this is a SyntaxTrivia on the token that follows. But that trivia can be decomposed to get the richly structured DocumentationCommentTriviaSyntax below it (which you can grab the XML pieces out of).
- SkippedTokens. One of the lesser-known types. This, like all other trivia, is trivia on the token before it or the token after it (depending on whether it is trailing or leading trivia). It can be decomposed to get the rich node it wraps. That rich node itself then contains nothing but a sequence of tokens that were skipped by the parser.
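The first two cases can be sketched as follows, assuming the `Microsoft.CodeAnalysis.CSharp` package and default parse options (under which doc comments are parsed). The concrete Roslyn node types are `RegionDirectiveTriviaSyntax` and `DocumentationCommentTriviaSyntax`; `StructuredTriviaDemo` and `Describe` are hypothetical names for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class StructuredTriviaDemo
{
    // Hypothetical helper: reports region directives and doc-comment XML
    // elements, together with the token each trivia is attached to.
    public static List<string> Describe(string source)
    {
        var found = new List<string>();
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        foreach (SyntaxTrivia trivia in root.DescendantTrivia())
        {
            // GetStructure() returns null for ordinary (unstructured) trivia.
            switch (trivia.GetStructure())
            {
                case RegionDirectiveTriviaSyntax _:
                    found.Add($"region directive on token '{trivia.Token.Text}'");
                    break;
                case DocumentationCommentTriviaSyntax doc:
                    foreach (XmlElementSyntax element in doc.Content.OfType<XmlElementSyntax>())
                        found.Add($"doc element <{element.StartTag.Name.LocalName.Text}> on token '{trivia.Token.Text}'");
                    break;
            }
        }
        return found;
    }
}
```

For a source such as `"#region foo\n/// <summary>Hi</summary>\nclass C { }\n#endregion\n"`, both the directive and the doc comment should be reported as trivia on the `class` token that follows them.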
The reason for the third case, SkippedTokens, is that Roslyn REQUIRES that every character of the original source be in the final tree, and that the original source be exactly reconstructible from the tree (in math terms, the mapping is one-to-one and 'onto', which is also called a bijection).
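The round-trip requirement is easy to check for yourself. A minimal sketch, assuming the `Microsoft.CodeAnalysis.CSharp` package; `Fidelity.RoundTrips` is a hypothetical helper name, while `ToFullString()` is the Roslyn call that renders a node including all of its trivia.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class Fidelity
{
    // Hypothetical helper: true when the parsed tree reproduces
    // the input text character for character.
    public static bool RoundTrips(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        return root.ToFullString() == source;
    }
}
```

This should hold even for text that is not valid C#, for example `Fidelity.RoundTrips("class C { int x = @ 0; }")`, because tokens the parser cannot place are kept in the tree as skipped-token trivia rather than dropped.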
So let's look at the third case in practice:
Here we have this errant @ token that the compiler has no clue what to do with.
So how is this represented? Well, up through the 'equals token' everything was going great. But at that point we go "off the rails" (up until we resync back on the semicolon). In this case, the compiler synthesizes a missing 'empty identifier' (since an EqualsValueClause, the '= value' node, requires a non-null value). But it also attaches 3 pieces of trivia to it.
Specifically, one trivia for each of the elements in:
@ 0
There is first a SkippedTokensTrivia for the @, then a normal space trivia for the <space> that follows it, then a SkippedTokensTrivia for the '0'. Importantly, both the first and last trivia are just trivia (SyntaxTrivia structs), not nodes.
But, like all "Structured trivia", you can dive into their structure.
And that looks like this:
There's the IdentifierName, which is empty, with 3 trivia. The first and last trivia are SkippedTokensTrivia. And you can get to the tokens underneath. In this case, just a single BadToken for the '@' and a single NumericLiteralToken for the 0.
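To explore this yourself, the sketch below collects the tokens wrapped by every SkippedTokensTrivia in a tree. It assumes the `Microsoft.CodeAnalysis.CSharp` package; `SkippedTokens.Find` is a hypothetical helper name. Which token the skipped trivia ends up attached to, and how the skipped tokens are grouped, can differ between compiler versions, so no particular output is claimed here.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SkippedTokens
{
    // Hypothetical helper: returns "Kind 'text'" for each token that the
    // parser skipped and wrapped in a SkippedTokensTriviaSyntax node.
    public static List<string> Find(string source)
    {
        var skipped = new List<string>();
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        foreach (SyntaxTrivia trivia in root.DescendantTrivia())
        {
            // Dive from the trivia struct into the node it wraps, if any.
            if (trivia.GetStructure() is SkippedTokensTriviaSyntax node)
            {
                foreach (SyntaxToken token in node.Tokens)
                    skipped.Add($"{token.Kind()} '{token.Text}'");
            }
        }
        return skipped;
    }
}
```

A call such as `SkippedTokens.Find("class C { int x = @ 0; }")` uses an assumed input in the spirit of the example above, not the original snippet. For source without syntax errors the list is empty.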
So, going back to your questions:
Is SkippedTokensTriviaSyntax structured trivia? It is stored as nodes? But SyntaxTrivia isn't nodes? I'm so confused!
Yes, it is structured trivia. You can see that here:
public sealed partial class SkippedTokensTriviaSyntax : StructuredTriviaSyntax
So it is a node.
"But SyntaxTrivia isn't nodes?" Correct. Trivia itself is a simple SyntaxTrivia struct, which shows up in the leading-trivia and trailing-trivia lists on tokens.
In the case of structured trivia, you have the SyntaxTrivia struct on the token. That trivia will return 'true' for:
And you can get from the SyntaxTrivia instance to the StructuredTriviaSyntax with this helper:
This is how we bridge Node->Token->Trivia->BackToNodeInTheCaseOfStructuredTrivia
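The property and helper referred to above are not reproduced in this text. In the public Roslyn API the likely candidates are `SyntaxTrivia.HasStructure` and `SyntaxTrivia.GetStructure()`, which is an assumption about what the original showed. A minimal sketch of the whole bridge under that assumption, with `TriviaBridge.StructureKinds` as a hypothetical helper name:

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TriviaBridge
{
    // Hypothetical helper: returns the kind of every node that is
    // reachable through a structured trivia in the given source.
    public static List<string> StructureKinds(string source)
    {
        var kinds = new List<string>();
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        foreach (SyntaxToken token in root.DescendantTokens())       // Node -> Token
        {
            foreach (SyntaxTrivia trivia in token.GetAllTrivia())    // Token -> Trivia
            {
                if (!trivia.HasStructure) continue;                  // plain atom, nothing to unwrap

                var structure = trivia.GetStructure();               // Trivia -> back to a node
                if (structure is StructuredTriviaSyntax && structure.IsStructuredTrivia)
                    kinds.Add(structure.Kind().ToString());
            }
        }
        return kinds;
    }
}
```

Going the other way, `trivia.Token` returns the token a trivia is attached to, so the chain can be followed in both directions.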