ParsingCaseStudies - oils-for-unix/oils GitHub Wiki

What algorithms and tools do production-quality languages use for parsing?

  • lexing
  • parsing, including operator precedence parsing
  • AST representation

TODO: Add links

POSIX Shells

  • bash: multiple parsers.
    • yacc is used for the command language (judged to be a mistake in retrospect).
    • recursive descent (Wirth-style) is used for arithmetic, i.e. $(( ))
    • recursive descent for boolean language (in parse.y)
    • ad hoc parser for variable/word language
  • dash: hand-written lexer, recursive descent parser, generated AST nodes
  • mksh: stateful lexer
  • zsh:

Alternative Shells

See ExternalResources

PowerShell?

Build Tools

  • Ninja: re2c for lexer (for speed, used to be hand-coded), recursive descent for parser.
  • Bazel: hand-coded lexer and recursive descent parser

Domain-Specific Languages

  • sqlite: uses bespoke Lemon parser generator (bottom up).
  • protobuf compiler: hand-written lexer and recursive descent parser. (What about other schema languages?)

Other Programming Languages

  • Python
    • hand-coded lexer with indentation stack
    • parser generated with bespoke pgen.c
    • generated AST, with Zephyr ASDL
  • Ruby
    • big yacc grammar
    • JRuby: Jay, yacc for Java
  • Perl
  • PHP
  • Lua: hand-coded lexer and recursive descent parser in C. There is no AST because it generates code while parsing!
  • Wren: hand-coded lexer, recursive descent with Pratt parsing.
  • Dart
  • CoffeeScript: hand-coded lexer with regexes, token "fixups", JISON bottom up parser
  • Java
  • C#
  • Clang -- hand-written parser, enormous hand-written AST with C++ classes
  • Go: hand-written C that was automatically converted to Go
  • Rust
  • Swift
  • OCaml
  • Haskell
  • Awk: sort of a poster child for yacc.
  • Scientific languages
    • Julia: lexer and parser are hand-written in femtolisp! Enables Julia macros.
    • R: src/main/gram.y is 3500 lines
    • Mathematica?
  • JavaScript
    • v8
    • duktape
    • mujs
    • narcissus (JS )

Tiny Languages

  • TCC
  • tinypy -- Pratt parsing in Python.

Parser Generator Success Stories

As counter point:

  • R uses yacc
  • Ruby uses yacc. JRuby uses it too.

(bash uses yacc, but it's not a success story.)