Parsing UTF-8 input
It’s not difficult to parse UTF-8 input in Treetop.
If you’re running Ruby 1.8 or older, you can just require 'active_support' and pass input.mb_chars to the parser. String#mb_chars creates a multibyte-safe proxy for string methods that would normally choke on multibyte characters. It’s not free, of course: expect your parser to be about 10% slower.
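As a rough sketch of that approach, the snippet below wraps the input in mb_chars when ActiveSupport is available and otherwise passes the string through unchanged. The helper name parser_input and the parser class mentioned in the closing comment are hypothetical, and the exact require path for the multibyte core extension is an assumption that may differ between ActiveSupport versions.

```ruby
# Load ActiveSupport if it is installed. The second require is an assumption:
# newer ActiveSupport versions only define String#mb_chars once this core
# extension is loaded, while older ones load it with 'active_support' alone.
begin
  require 'active_support'
  require 'active_support/core_ext/string/multibyte'
rescue LoadError
  # ActiveSupport is not available; plain strings are used instead.
end

# Hypothetical helper: returns a multibyte-safe view of the string when
# String#mb_chars exists, and the string itself otherwise.
def parser_input(string)
  string.respond_to?(:mb_chars) ? string.mb_chars : string
end

# "h\u00E9llo" spells "héllo"; the é takes two bytes in UTF-8.
wrapped = parser_input("h\u00E9llo")

# With a Treetop-generated parser (class name hypothetical), the wrapped
# input would then be passed along, for example:
#   MyGrammarParser.new.parse(wrapped)
```

Keeping the wrapping in one helper means the rest of the code does not need to know which Ruby version or which libraries are present.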
If you have Ruby 1.9, you don’t have to do anything special. Strings in 1.9 are (mostly) encoding aware. If you do require active_support, String#mb_chars just returns self. Thus, requiring active_support and always passing input.mb_chars is the easy way to run one version of your parser on multiple Ruby versions.
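To illustrate what “encoding aware” means in practice, here is a small sketch using only core String methods on Ruby 1.9 or later. The variable names are made up for the example. The last part covers one case behind the “mostly”: a string that arrives tagged as binary (for instance from a socket) is treated byte by byte until it is retagged, so it may need force_encoding before it is handed to a parser.

```ruby
# "na\u00EFve caf\u00E9" spells "naïve café". The \u escapes make the
# literal UTF-8 regardless of the source file's encoding.
text = "na\u00EFve caf\u00E9"

text.encoding   # Encoding::UTF_8
text.length     # 10 characters
text.bytesize   # 12 bytes, because ï and é take two bytes each
text[2]         # "ï", indexing works on characters, not bytes

# The same bytes tagged as binary are no longer treated as characters.
raw = text.dup.force_encoding("ASCII-8BIT")
raw.length      # 12, one "character" per byte

# Retagging the bytes as UTF-8 restores character-wise behavior.
fixed = raw.dup.force_encoding("UTF-8")
fixed.length    # 10
```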