Tokenizer - lyriarte/Cm7b5 GitHub Wiki

String tokenizer

A String tokenizer is the part of a compiler that performs lexical analysis.

  • Match regular expressions in a text.
  • Return text content associated with the regexp.

Bare-bones parser

git checkout tokenizer
cat tokenizer.l

Lex-based number tokenizer

git checkout tokenizer
git clean -f
make
echo "123 * + 456 -" | ./tokenizer

The parser outputs an error for the first non-recognised character.

echo "123 hello" | ./tokenizer