GraphQL Lexer Technical Documentation - slashmo/graphql-swift-repl GitHub Wiki

On this Wiki page, you can learn how I've implemented my GraphQL Lexer.

Lexing

Lexing is the process of turning a sequence of characters, better known as a String, into a sequence of tokens. A token represents a meaningful substring of the original String with a start and an end position, and an optional value.

Overall structure

I split the Lexer into two objects: Lexer and Lexer.Token, both of which are structs.

Lexer.Token

Because there are multiple types of tokens, I started by implementing a Lexer.Token.Kind enum with a case for each type. The Token struct uses this enum as one of its properties. The other properties hold the position and an optional value, which is only set for these kinds: .string, .comment, .int, .float, and .name. I decided to name the types Kinds, because naming a type Type sounds like a bad thing to do 😁
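As a rough illustration, a Token along these lines could look like the sketch below. The case list is only a subset, and the property names (kind, start, end, value) and their types are assumptions made for this example, not the repository's actual declarations.

```swift
// Hypothetical sketch of Lexer.Token; names and types are assumptions.
struct Lexer {
    struct Token: Equatable {
        enum Kind: Equatable {
            case braceLeft, braceRight                // punctuators (illustrative subset)
            case name, int, float, string, comment    // kinds that carry a value
        }

        let kind: Kind
        let start: Int      // offset of the token's first character
        let end: Int        // offset just past the token's last character
        let value: String?  // nil for punctuators
    }
}

// The two non-whitespace pieces at the start of "{ hello }":
let brace = Lexer.Token(kind: .braceLeft, start: 0, end: 1, value: nil)
let name = Lexer.Token(kind: .name, start: 2, end: 7, value: "hello")
print(brace.kind, name.value ?? "nil")
```

Nesting Kind inside Token keeps the call sites short (.name instead of a free-standing enum) while avoiding a type that is literally called Type.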

Lexer

The Lexer struct is itself split into a private and a public API.

Private API

Cont

The internals use another struct called Cont, which holds a single property called run of type (inout Substring) throws -> (Token, Cont)?. The implementation of this run property is up to the creator of the instance. It describes how to lex a single token and which Cont instance should be used to lex the next one. That's why the return value is a tuple of both Token & Cont. A call to run can also return nil to indicate that no more tokens are to be lexed.

Besides holding a mutable instance of a Cont, the Lexer holds a mutable instance of the remaining substring.
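To make the continuation idea more concrete, here is a minimal sketch. Only the shape of Cont (its run property and that property's type) comes from the description above. The simplified Token, the wordState function, and the driver loop are hypothetical stand-ins that split a string into space-separated words.

```swift
// Simplified stand-in for Lexer.Token (assumption for this sketch).
struct Token: Equatable {
    let text: String
}

// Same shape as described above: lex one token, then name the next state.
struct Cont {
    let run: (inout Substring) throws -> (Token, Cont)?
}

// Hypothetical state: lexes one space-separated word per call to run.
func wordState() -> Cont {
    return Cont { rest in
        rest = rest.drop(while: { $0 == " " })      // skip leading spaces
        guard !rest.isEmpty else { return nil }      // nothing left to lex
        let word = rest.prefix(while: { $0 != " " })
        rest = rest.dropFirst(word.count)            // consume the word
        return (Token(text: String(word)), wordState())
    }
}

// Driving the chain by hand: keep the remaining input and the current Cont.
var remaining: Substring = "query  hello world"
var cont = wordState()
var words: [String] = []
do {
    while let (token, next) = try cont.run(&remaining) {
        words.append(token.text)
        cont = next
    }
} catch {
    print("lexing failed: \(error)")
}
print(words)
```

Because each run call returns the Cont to use next, a state can hand over to a different state without the driver knowing anything about the grammar.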

startState & consumeToken

The startState method provides the initial value of the cont property. It returns a Cont whose run property removes the first character from the remaining substring and returns the result of running the Cont returned by consume(_:startingAt:). From then on, it's basically just calling other private methods that return Conts based on the character's UTF-8 code point.
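The dispatch described above might be sketched roughly as follows. Only the names startState and consume(_:startingAt:) and the shape of Cont are taken from this page. The offset parameter, the whitespace skipping, the punctuatorState and nameState helpers, the UnexpectedCharacter error, and the tiny token set are all assumptions made to keep the example self-contained.

```swift
struct Token: Equatable {
    enum Kind: Equatable { case braceLeft, braceRight, name }
    let kind: Kind
    let start: Int
    let end: Int
    let value: String?
}

struct Cont {
    let run: (inout Substring) throws -> (Token, Cont)?
}

// Hypothetical error type for this sketch only.
struct UnexpectedCharacter: Error { let offset: Int }

// Skips whitespace, removes the first character, and runs the Cont chosen for it.
func startState(at offset: Int) -> Cont {
    return Cont { rest in
        var position = offset
        while let first = rest.first, first.isWhitespace {
            rest = rest.dropFirst()
            position += 1
        }
        guard let character = rest.popFirst() else { return nil }
        return try consume(character, startingAt: position).run(&rest)
    }
}

// Picks the next Cont based on the character's (single-byte) UTF-8 value.
func consume(_ character: Character, startingAt start: Int) -> Cont {
    guard let byte = character.asciiValue else {
        return Cont { _ in throw UnexpectedCharacter(offset: start) }
    }
    switch byte {
    case 0x7B: return punctuatorState(.braceLeft, at: start)    // "{"
    case 0x7D: return punctuatorState(.braceRight, at: start)   // "}"
    case 0x41...0x5A, 0x5F, 0x61...0x7A:                        // A-Z, "_", a-z
        return nameState(startingWith: character, at: start)
    default:
        return Cont { _ in throw UnexpectedCharacter(offset: start) }
    }
}

func punctuatorState(_ kind: Token.Kind, at start: Int) -> Cont {
    return Cont { _ in
        return (Token(kind: kind, start: start, end: start + 1, value: nil),
                startState(at: start + 1))
    }
}

func nameState(startingWith first: Character, at start: Int) -> Cont {
    return Cont { rest in
        // Consume the rest of the name: ASCII letters, digits, and "_".
        let tail = rest.prefix(while: { $0.isASCII && ($0.isLetter || $0.isNumber || $0 == "_") })
        rest = rest.dropFirst(tail.count)
        let end = start + 1 + tail.count
        let token = Token(kind: .name, start: start, end: end, value: String(first) + String(tail))
        return (token, startState(at: end))
    }
}

// Small driver to exercise the states.
func lexAll(_ source: String) throws -> [Token] {
    var rest = Substring(source)
    var cont = startState(at: 0)
    var tokens: [Token] = []
    while let (token, next) = try cont.run(&rest) {
        tokens.append(token)
        cont = next
    }
    return tokens
}

let tokens = (try? lexAll("{ hello }")) ?? []
print(tokens.map { $0.kind })
```

Offsets here count Characters from the start of the input, which is a simplification and not necessarily how the real lexer tracks positions.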

Public API

The public API allows users to initialize new Lexers and to get the next lexed token by calling the advance() method. More information on these two can be found in their respective doc-blocks. Internally, the advance() method calls cont's run property to lex the next token. If it finds one, it assigns the Cont contained in run's return value to cont and returns the token.
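A stripped-down sketch of how advance() could wrap the continuation is shown below. The property names cont and remaining follow the text above, but the simplified Token, the characterState stand-in (which turns every non-space character into its own token), and the non-throwing initializer are assumptions for this example only.

```swift
// Simplified stand-in for Lexer.Token (assumption for this sketch).
struct Token: Equatable {
    let value: String
}

struct Cont {
    let run: (inout Substring) throws -> (Token, Cont)?
}

// Hypothetical stand-in for the real start state.
func characterState() -> Cont {
    return Cont { rest in
        rest = rest.drop(while: { $0 == " " })
        guard let character = rest.popFirst() else { return nil }
        return (Token(value: String(character)), characterState())
    }
}

struct Lexer {
    private var cont: Cont            // how to lex the next token
    private var remaining: Substring  // input that has not been lexed yet

    init(lexing source: String) {
        remaining = Substring(source)
        cont = characterState()
    }

    // Returns the next token, or nil once the input is exhausted.
    mutating func advance() throws -> Token? {
        guard let (token, next) = try cont.run(&remaining) else { return nil }
        cont = next
        return token
    }
}

var lexer = Lexer(lexing: "{ a }")
var values: [String] = []
do {
    while let token = try lexer.advance() {
        values.append(token.value)
    }
} catch {
    print(error)
}
print(values)
```

Keeping cont and remaining private means callers only ever see advance(), so the set of internal states can change without affecting the public API.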

Error handling

Errors are handled by throwing instances of GraphQLError. These errors are exposed to the user through the advance() method and hold the failure's location and a human-readable message.
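For illustration, an error type with this shape could look like the sketch below. The start and message property names match the example further down this page, but the Int type of start and the expectDigit helper are assumptions, not the actual GraphQLError definition.

```swift
// Hypothetical shape of the error type; the real declaration may differ.
struct GraphQLError: Error, Equatable {
    let start: Int       // where in the source the failure occurred (assumed to be an offset)
    let message: String  // human-readable description
}

// Hypothetical helper showing how a lexing step might throw.
func expectDigit(_ character: Character, at offset: Int) throws -> Int {
    guard let digit = character.wholeNumberValue, character.isASCII else {
        throw GraphQLError(start: offset, message: "Expected a digit, found \"\(character)\".")
    }
    return digit
}

do {
    _ = try expectDigit("x", at: 3)
} catch let error as GraphQLError {
    print("\(error.start): \(error.message)")
} catch {
    print(error)
}
```

Since run is declared as throws, an error thrown while lexing a token propagates through advance() to the caller unchanged.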

Example

Let's say you want to lex a very simple query like this:

{
  hello
}

You'd start by constructing a new Lexer, passing in the query, and calling advance() as long as its return value is not nil:

do {
  var lexer = try Lexer(lexing: "{ hello }")
  while let token = try lexer.advance() {
    print(token)
  }
} catch let error as GraphQLError {
  print("\(error.start): \(error.message)")
} catch {
  print(error)
}