OSH Parser - oils-for-unix/oils GitHub Wiki
These facts are useful for the parsing contest.
Facts
- 15 lexer modes (lexical state)
- 233 IDs (token types / node types) in 23 kinds (core/id_kind_test.py shows this)
- 3 recursive descent parsers (command, word, [[)
- 1 Pratt parser (arithmetic)
- [ fallback reuses osh/bool_parser
- TODO: modify asdl.py to show these stats?
  - X product types
  - X sum types with X alternatives
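For readers who have not seen the technique, here is a minimal sketch of a Pratt parser for arithmetic. It is not the OSH arithmetic parser: the token grammar, the binding powers, and all names (`tokenize`, `Parser`, `evaluate`) are assumptions made for illustration, and it evaluates directly instead of building ASDL nodes.

```python
import operator
import re

# Hypothetical token grammar: integers, + - * ** and parentheses.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\*\*|[-+*()]))")

# Infix operators: (binding power, right associative?, function).
# The numeric powers are illustrative and not taken from OSH.
INFIX = {
    "+": (10, False, operator.add),
    "-": (10, False, operator.sub),
    "*": (20, False, operator.mul),
    "**": (30, True, operator.pow),
}
PREFIX_BP = 25  # binding power of unary minus (an assumption)


def tokenize(text):
    """Turn the input into ('num', int) / ('op', str) pairs plus an EOF marker."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError("unexpected character at offset %d" % pos)
        number, op = m.groups()
        tokens.append(("num", int(number)) if number is not None else ("op", op))
        pos = m.end()
    tokens.append(("eof", None))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse_expr(self, min_bp=0):
        # Prefix position: a number, unary minus, or a parenthesized group.
        kind, val = self.next()
        if kind == "num":
            left = val
        elif val == "-":
            left = -self.parse_expr(PREFIX_BP)
        elif val == "(":
            left = self.parse_expr(0)
            if self.next() != ("op", ")"):
                raise SyntaxError("expected )")
        else:
            raise SyntaxError("unexpected token %r" % (val,))

        # Infix position: keep going while the next operator binds tighter
        # than the caller's minimum binding power.
        while True:
            kind, val = self.peek()
            if kind != "op" or val not in INFIX:
                break
            bp, right_assoc, func = INFIX[val]
            if bp <= min_bp:
                break
            self.next()
            # Right-associative operators let an equal-power operator
            # be consumed by the recursive call.
            right = self.parse_expr(bp - 1 if right_assoc else bp)
            left = func(left, right)
        return left


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.parse_expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return result


print(evaluate("1 + 2 * 3"))    # 7
print(evaluate("2 ** 3 ** 2"))  # 512
```

The whole precedence table lives in `INFIX`, which is the main reason to prefer this style over one recursive descent function per precedence level.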
Facts Requiring Dynamic Instrumentation
- What CPython opcodes does it use?
- How many lines of code does it use in CPython? (Compare with execution.)
- What is the distribution of ASDL string and array lengths per node type?
  - Note: there are several uses of string, not just token. Is this a good or bad optimization?
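One possible way to start answering the first two questions is `sys.settrace`. The sketch below is an assumption about how such instrumentation could look, not something that exists in the repo. It is written for Python 3, the names `profile` and `sum_below` are hypothetical, and the opcode histogram is a static over-approximation: it disassembles every code object that was entered, so it also counts opcodes on branches that never ran.

```python
import collections
import dis
import sys


def profile(func, *args):
    """Run func(*args) and return (result, opcode_counts, lines_hit)."""
    lines_hit = set()    # distinct (filename, lineno) pairs that executed
    codes_seen = set()   # code objects of every Python function entered

    def tracer(frame, event, arg):
        if event == "call":
            codes_seen.add(frame.f_code)
        elif event == "line":
            lines_hit.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer  # keep tracing inside this frame

    old_tracer = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(old_tracer)

    # Static histogram over the code objects that were reached dynamically.
    opcode_counts = collections.Counter(
        instr.opname
        for code in codes_seen
        for instr in dis.get_instructions(code)
    )
    return result, opcode_counts, lines_hit


def sum_below(n):
    # Stand-in for "parse this file"; any callable works.
    total = 0
    for i in range(n):
        total += i
    return total


result, opcode_counts, lines_hit = profile(sum_below, 4)
print(result, len(lines_hit), opcode_counts.most_common(3))
```

Running the same wrapper around a parse-only entry point and around a full execution would give the comparison asked for above.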
Other Parser Components That Do String Manipulation
- Brace detection -- this is a separate metaprogramming pass (doesn't depend on input). This is a recursive parser, although it operates entirely on token types and not chars/strings?
- Per-Word Algorithms
  - core/glob_.py
    - LooksLikeGlob
    - GlobEscape
    - GlobUnescape (in case of no matches, may not be necessary)
  - regex escape, for passing to regcomp() (not done yet)
  - checking validity of names:
    - for invalid-var in a b; do ...
    - readonly invalid-var
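To make the per-word algorithms above concrete, here is a rough sketch of what glob detection, glob escaping, and name checking could look like. These are not the functions in core/glob_.py. The lowercase names, the set of metacharacters, and the exact rules (for example, that an unclosed `[` is not a glob) are assumptions.

```python
import re


def looks_like_glob(s):
    """True if s contains an unescaped *, ?, or a [ followed by a ]."""
    i = 0
    bracket_open = False
    while i < len(s):
        c = s[i]
        if c == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if c in "*?":
            return True
        if c == "[":
            bracket_open = True
        elif c == "]" and bracket_open:
            return True
        i += 1
    return False


def glob_escape(s):
    """Backslash-escape glob metacharacters so s only matches itself."""
    return re.sub(r"([*?\[\]\\])", r"\\\1", s)


def glob_unescape(s):
    """Inverse of glob_escape: drop one level of backslashes."""
    return re.sub(r"\\(.)", r"\1", s, flags=re.DOTALL)


def is_valid_var_name(s):
    """Shell-style name: letters, digits, underscore, not starting with a digit."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", s) is not None


print(looks_like_glob("*.py"))            # True
print(looks_like_glob(glob_escape("*.py")))  # False
print(is_valid_var_name("invalid-var"))   # False
```

All three operate on one word at a time and never recurse, which is what separates them from the parsers listed under Facts.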
Runtime String Manipulation
- core/word_eval.py -- after evaluating VarOp arguments, we compile globs to Python regexes, e.g. for ${x%foo*}
- IFS splitting (this is quite slow and needs to be sped up!)
- core/args.py -- this is not a recursive parser
- echo -e -- backslash escapes (and printf if it turns out we need it as a builtin)
- read without -r -- backslash escapes are parsed
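As an illustration of the glob-to-regex step mentioned for ${x%foo*}, the sketch below translates a small glob subset to a Python regex and uses it to strip a suffix. It is not the code in core/word_eval.py. The function names are hypothetical, bracket expressions like [a-z] are deliberately left out, and the shortest/longest search is done by trying each start offset, which is simple but quadratic in the worst case.

```python
import re


def glob_to_regex(pat):
    """Translate a glob with * and ? (no bracket expressions) to a regex string."""
    out = []
    i = 0
    while i < len(pat):
        c = pat[i]
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "\\" and i + 1 < len(pat):
            i += 1
            out.append(re.escape(pat[i]))  # escaped character is literal
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def strip_suffix(s, pat, longest=False):
    """Like ${s%pat} (shortest match) or ${s%%pat} when longest=True."""
    regex = re.compile(glob_to_regex(pat), re.DOTALL)
    # Shortest suffix: try start offsets from the end of the string backward.
    # Longest suffix: try them from the beginning forward.
    if longest:
        starts = range(len(s) + 1)
    else:
        starts = range(len(s), -1, -1)
    for i in starts:
        if regex.fullmatch(s, i):
            return s[:i]
    return s  # no suffix matched, so the value is unchanged


x = "a-foo-b-foo-c"
print(strip_suffix(x, "foo*"))                # a-foo-b-
print(strip_suffix(x, "foo*", longest=True))  # a-
```

Compiling the regex on every expansion is part of the runtime cost, so a real implementation would probably want to cache compiled patterns.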
Other Notes on Porting to C++
- Polymorphism:
  - Reader
    - FileLineReader: file system e.g. source, stdin
    - StringLineReader: eval, -c and PS4
    - VirtualLineReader: here docs.
  - BoolParser can take test_builtin._StringWordEmitter or WordParser
  - Arena instances? Not sure that requires polymorphism, since there is one type right now. We might have different policies for tools vs. the runtime though.
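The Reader case is the clearest example of polymorphism that a C++ port would need to express, most likely as an abstract base class with a virtual method. The Python sketch below shows the shape of that interface. The class names come from the list above, but the `get_line` method, its return convention (None at end of input), and the `count_lines` helper are assumptions, not the actual OSH API.

```python
import io


class LineReader:
    """Abstract interface: return the next line, or None at end of input."""

    def get_line(self):
        raise NotImplementedError


class FileLineReader(LineReader):
    """Reads from a file object, e.g. a sourced file or stdin."""

    def __init__(self, f):
        self.f = f

    def get_line(self):
        line = self.f.readline()
        return line if line else None


class StringLineReader(FileLineReader):
    """Reads from an in-memory string, e.g. the argument to eval or -c."""

    def __init__(self, s):
        super().__init__(io.StringIO(s))


class VirtualLineReader(LineReader):
    """Replays lines that were already read, e.g. the body of a here doc."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.pos = 0

    def get_line(self):
        if self.pos >= len(self.lines):
            return None
        line = self.lines[self.pos]
        self.pos += 1
        return line


def count_lines(reader):
    """A consumer that only depends on the LineReader interface."""
    n = 0
    while reader.get_line() is not None:
        n += 1
    return n


print(count_lines(StringLineReader("echo hi\necho bye\n")))  # 2
print(count_lines(VirtualLineReader(["one\n", "two\n"])))    # 2
```

The consumer never checks which concrete reader it was given, and a C++ version would need to preserve that property.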