CSTR Proposal - oils-for-unix/oils GitHub Wiki

Related: TSV2 Proposal

Update: This is done, and it's now called QSN: Quoted String Notation. See the qsn/ directory.


Issue 582 tracks the implementation of CSTR.

Intro

Rationale: ls --escape and stat print filenames with 0xFF bytes differently! We want to document and formalize this small format.

Naming

CSTR doesn't stand for anything; it's basically short for "C String". It's spelled a bit like JSON.

Rough Sketch

It's basically a single-quoted string with \ escapes that can express any byte string. We use single rather than double quotes to reduce confusion with JSON.

These are valid strings in the CSTR format:

  • '' - empty string
  • 'foo'
  • '\t\n'
  • 'foo \xFF'
  • 'nul bytes \0 ok \0'

Diff from JSON

It could be easier to describe CSTR as a "diff" from the JSON string format.

  • Take a JSON string "foo bar\n"
  • Change double quotes to single quotes: 'foo bar\n'
    • This also means that JSON's "\"" becomes '"'
    • Conversely, CSTR's '\'' is "'" in JSON.
  • And add the ability to express bytes: 'foo bar \xFF \n'. We should probably keep the ability to express code points like \u00FF.
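The first two steps of the diff can be sketched mechanically. This is only an illustration, not part of the proposal: `json_to_cstr` is a hypothetical helper, and it assumes that JSON escapes other than `\"` (for example `\n` or `\u00FF`) can be carried over unchanged, which the proposal does not spell out.

```python
import json
import re


def json_to_cstr(json_str: str) -> str:
    """Rewrite a JSON string literal as a CSTR literal (hypothetical helper)."""
    # Validate the input first; json.loads raises ValueError on malformed JSON.
    if not isinstance(json.loads(json_str), str):
        raise ValueError("expected a JSON string literal")
    body = json_str.strip()[1:-1]  # drop the surrounding double quotes

    def fix(match):
        tok = match.group(0)
        if tok == '\\"':
            return '"'    # JSON's \" needs no escape inside single quotes
        if tok == "'":
            return "\\'"  # a bare ' must be escaped in CSTR
        return tok        # assumption: every other escape is kept as-is

    # Walk the body one escape sequence at a time so that an escaped
    # backslash followed by an escaped quote is not misread.
    return "'" + re.sub(r"\\.|'", fix, body) + "'"


print(json_to_cstr('"foo bar\\n"'))
```

Byte escapes such as \xFF have no JSON counterpart, so they only ever appear on the CSTR side.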

Parser

It can be implemented in any number of ways, but it's a regular language so Oil's common style with re2c should work very well.

http://www.oilshell.org/blog/2019/12/22.html#appendix-a-oils-lexer-uses-two-stages-of-code-generation
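Because the format is regular, a decoder can be driven by a single regular expression. The sketch below uses Python's `re` module rather than re2c, purely for illustration. The name `cstr_decode`, the exact escape set, and the choice to decode \uHHHH to UTF-8 are all assumptions, since the proposal leaves those details open.

```python
import re

# One alternative per token kind. Assumption: this escape set is
# illustrative; the proposal does not fix the full list.
_TOKEN = re.compile(
    rb"\\x(?P<byte>[0-9a-fA-F]{2})"   # byte escape, two hex digits
    rb"|\\u(?P<code>[0-9a-fA-F]{4})"  # code point escape, four hex digits
    rb"|\\(?P<char>[tnr0'\\])"        # single-character escape
    rb"|(?P<lit>[^\\']+)"             # run of literal bytes
)

_SIMPLE = {
    b"t": b"\t", b"n": b"\n", b"r": b"\r",
    b"0": b"\0", b"'": b"'", b"\\": b"\\",
}


def cstr_decode(s: bytes) -> bytes:
    """Decode a single-quoted CSTR literal into raw bytes (hypothetical helper)."""
    if len(s) < 2 or s[:1] != b"'" or s[-1:] != b"'":
        raise ValueError("expected a single-quoted string")
    body = s[1:-1]
    out = bytearray()
    pos = 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            # Unescaped quote, dangling backslash, or unknown escape.
            raise ValueError("invalid CSTR at offset %d" % (pos + 1))
        if m.group("byte") is not None:
            out.append(int(m.group("byte"), 16))
        elif m.group("code") is not None:
            # Assumption: code points are emitted as UTF-8.
            out += chr(int(m.group("code"), 16)).encode("utf-8")
        elif m.group("char") is not None:
            out += _SIMPLE[m.group("char")]
        else:
            out += m.group("lit")
        pos = m.end()
    return bytes(out)


print(cstr_decode(rb"'foo \xFF'"))
```

Each loop iteration consumes exactly one token, which mirrors how a generated lexer would step through the input.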

Printer

  • If it doesn't know the encoding, it will always print byte escapes of the form \xHH (e.g. \x00) for non-printable characters.
    • Common special cases: \t \r \n \'. Not sure about \0.
  • If it does know the encoding, it can print code points like \u1234.
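The unknown-encoding case can be sketched as a byte-by-byte printer. `cstr_encode` is a hypothetical name, and two choices here are assumptions rather than part of the proposal: the backslash itself is escaped as `\\` so that the output can be read back unambiguously, and NUL is printed as \x00 because the \0 form is still undecided.

```python
# Special cases named in the proposal, plus the backslash (an assumption).
_SPECIAL = {
    0x09: "\\t",
    0x0D: "\\r",
    0x0A: "\\n",
    0x27: "\\'",
    0x5C: "\\\\",
}


def cstr_encode(data: bytes) -> str:
    """Print bytes as a CSTR literal without assuming any encoding (hypothetical helper)."""
    parts = ["'"]
    for b in data:
        if b in _SPECIAL:
            parts.append(_SPECIAL[b])
        elif 0x20 <= b < 0x7F:
            parts.append(chr(b))          # printable ASCII passes through
        else:
            parts.append("\\x%02X" % b)   # everything else becomes a byte escape
    parts.append("'")
    return "".join(parts)


print(cstr_encode(b"foo \xff"))
```

A printer that does know the encoding could decode the input first and emit code point escapes instead, but that variant is not shown here.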

Relation to TSV2

CSTR is a subset of TSV2. TSV2 might not be implemented in Oil v1, but CSTR is necessary for basic shell functionality like displaying filenames and argv arrays.

Unquoted variant. bob is valid without quotes because it doesn't contain TABs.

name   age
bob    10

The same row with the name quoted:

name   age
'bob'  10

Relation to Python's repr()

  • I think the main difference is that in Python, "'" is valid. In CSTR it has to be '\''.

https://docs.python.org/2/library/codecs.html#python-specific-encodings
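The quoting difference is easy to observe in a Python 3 interpreter. The variable names below are only for illustration.

```python
# repr() picks whichever quote avoids escaping, so a lone apostrophe
# comes back wrapped in double quotes, which CSTR does not allow.
lone_quote = repr("'")

# When both quote kinds are present, repr() falls back to single quotes
# and escapes the apostrophe, which is the only form CSTR would use.
mixed = repr("'\"")

print(lone_quote)
print(mixed)
```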

Related

Unix Tools lists tools like find which understand backslash escapes.