OSH Optimization Log - oilshell/oil GitHub Wiki

OSH Parser

Log

This section is a record of how OSH was optimized.

  • 0.2.alpha1 -- fake release to measure parsing speed
  • 0.2
    • here doc optimization
  • 0.3
    • Optimization
      • generate C code with core/lexer_gen.py and re2c (double code generation)
      • generate Python code with asdl/gen_python.py (rather than Python metaprogramming)
      • TODO:
        • Id equality optimization
    • Benchmarks
      • added virtual memory benchmarks. NOTE: this caught a regression caused by creating new Id instances!

(release notes: my somewhat risky style of metaprogramming feels like it's working. I'm trading off this risk for the risk of having "a huge pile of C code that nobody can modify")
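The "generate Python code rather than metaprogramming" step above amounts to emitting class definitions as plain source text at build time, instead of building classes dynamically at import time. Here's a minimal sketch of that idea -- the schema format and names are invented for illustration, not Oil's actual ASDL:

```python
# Sketch of build-time code generation in the spirit of asdl/gen_python.py.
# The {name: [fields]} schema format here is hypothetical.

def gen_classes(schema):
    """Emit Python source for record classes from a {name: [fields]} schema."""
    lines = []
    for name, fields in schema.items():
        lines.append('class %s(object):' % name)
        # __slots__ avoids a per-instance __dict__, which saves memory
        lines.append('  __slots__ = %r' % (list(fields),))
        lines.append('  def __init__(self, %s):' % ', '.join(fields))
        for f in fields:
            lines.append('    self.%s = %s' % (f, f))
        lines.append('')
    return '\n'.join(lines)

# Demo: generate a class and use it.  A real build would write the source
# to a .py file and import it like any other module.
source = gen_classes({'token': ['id', 'val', 'span_id']})
ns = {}
exec(source, ns)
t = ns['token'](1, 'echo', 7)
print(t.val)  # -> echo
```

Because the output is ordinary Python source, there's no metaclass machinery left at runtime, and a bytecode compiler like OPy can treat the generated module like any other.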

Performance Regressions

  • For some reason, the stock CPython on Ubuntu is faster than my OVM build, even though I'm compiling with -O3. Is Ubuntu's CPython built with profile-guided optimization?
  • Likely future regression: switching to OPy bytecode, since the OPy compiler doesn't generate exactly the same bytecode as CPython.

Discarded Ideas

Because these are FAQs:

  • PyPy (measured)
  • Cython (not measured, but discarded in principle)

Optimization Ideas

  • ASDL could be a C extension instead of Python. Inline small integers rather than storing PyObject pointers to them.
  • Coalesce tokens more aggressively -- this will save space.
  • Combine the span and token structs.
  • Later: rewrite LineLexer/Lexer so Read() isn't so hot?
  • OHeap size optimization ideas:
    • empty array can be encoded as Ref(0) rather than a Ref to something with length 0.
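The Ref(0) idea above can be sketched like this -- a hypothetical encoder where a Ref is just a heap index and 0 is reserved to mean "empty array", so empty arrays cost no heap cells at all. This is not OHeap's real format, just the shape of the trick:

```python
# Hypothetical sketch: reserve Ref(0) as a sentinel for the empty array,
# instead of making every empty array a Ref to a length-0 object.

EMPTY_REF = 0

class Heap:
    def __init__(self):
        self.cells = [None]  # index 0 reserved for the empty-array sentinel

    def encode_array(self, items):
        if not items:
            return EMPTY_REF  # no heap cell allocated at all
        self.cells.append(list(items))
        return len(self.cells) - 1  # a Ref is just an index

    def decode_array(self, ref):
        if ref == EMPTY_REF:
            return []
        return self.cells[ref]

h = Heap()
refs = [h.encode_array([]) for _ in range(1000)]
assert all(r == EMPTY_REF for r in refs)
assert len(h.cells) == 1  # 1000 empty arrays, zero extra heap cells
```

Since ASTs tend to have many nodes with empty child lists, this kind of sentinel can be a noticeable size win.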

Benchmark Ideas

  • Size benchmarks for parser / LST
    • Python heap stats (revive "pry")
    • More oheap benchmarks. Including compressed files.
    • Compressed address space!!!
  • Runtime Benchmarks
    • abuild -h -- easy
    • find a configure script that executes in like 10 seconds? dash?
      • OCaml's hand-written configure: 3.5 seconds with OSH, 2.7 seconds with bash, and about the same with dash.
      • OK, so OSH is sort of usable.
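A minimal harness for runtime benchmarks like the ones above might look like this. The shell list and script are placeholders -- it times /bin/sh on a no-op so the sketch runs anywhere:

```python
# Sketch of a tiny harness for "time a configure script under several shells".
import subprocess
import time

def time_cmd(argv):
    """Run argv once and return wall-clock seconds.  A real harness would
    take the best of N runs and also record max RSS."""
    start = time.monotonic()
    subprocess.check_call(argv)
    return time.monotonic() - start

# Placeholder: on a real run this would be ['osh', 'bash', 'dash'] on a
# configure script, e.g. ['bash', './configure'].
for shell in ['sh']:
    elapsed = time_cmd([shell, '-c', ':'])
    print('%s: %.3f seconds' % (shell, elapsed))
```

Taking the best of several runs matters here, since a cold filesystem cache can easily dominate a 3-second configure run.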