Summary of selected references and their input processing for the paper "Syntax Errors" - sagr4019/ResearchProject GitHub Wiki

This content will be copied into the Syntax Errors article once that article has been created.

References


  • Fixed 27% of programs completely and 19% partially using a multi-layered sequence-to-sequence neural network with attention
  • Encoder RNN for inputs
  • Decoder RNN for outputs
  • Generated a fixed-sized pool of names
  • Mapped each identifier (variable or function name) to a name from the pool
  • Mapped each literal to a special token (integers -> NUM, strings -> STR)
  • Used a special token at the end of a token sequence
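
The input processing steps listed above can be illustrated with a minimal Python sketch. It is not the referenced authors' implementation: the token pattern, the keyword set, the pool size, the `ID_n` naming scheme, and the `<eos>` marker are all assumptions chosen for illustration, and multi-character operators such as `==` are split into single characters for simplicity.

```python
import re

# Hypothetical token pattern for a small C-like language (an assumption).
TOKEN_RE = re.compile(r'''
    (?P<STR>"(?:\\.|[^"\\])*")   # string literal
  | (?P<NUM>\d+)                 # integer literal
  | (?P<ID>[A-Za-z_]\w*)         # identifier or keyword
  | (?P<OP>[^\s\w])              # any other single non-space character
''', re.VERBOSE)

# Assumed keyword set; keywords are kept as-is, not renamed.
KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while"}
POOL_SIZE = 4      # fixed pool size, picked arbitrarily for the example
EOS = "<eos>"      # assumed spelling of the end-of-sequence token


def normalize(source, pool_size=POOL_SIZE):
    """Tokenize `source` and replace identifiers and literals with pool names
    and special tokens, then append the end-of-sequence token."""
    pool = [f"ID_{i}" for i in range(pool_size)]
    mapping = {}   # original identifier -> name from the pool
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "STR":
            tokens.append("STR")
        elif kind == "NUM":
            tokens.append("NUM")
        elif kind == "ID" and text not in KEYWORDS:
            if text not in mapping:
                if len(mapping) >= len(pool):
                    raise ValueError("identifier pool exhausted")
                mapping[text] = pool[len(mapping)]
            tokens.append(mapping[text])
        else:
            tokens.append(text)   # keywords and punctuation pass through
    tokens.append(EOS)
    return tokens, mapping
```

For example, `normalize('int x = 42; y = x; puts("hi");')` maps `x`, `y`, and `puts` to `ID_0`, `ID_1`, and `ID_2`, replaces `42` with `NUM` and `"hi"` with `STR`, and ends the sequence with `<eos>`. Keeping the mapping allows the original names to be restored after a fix is predicted.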

Predicting a long target sequence is difficult. To keep the output short, they encoded line numbers in the program representation: a statement S at a line L is represented by (l, s), where l and s are tokenizations of L and S. A program of k lines is represented as (l1, s1) ... (lk, sk), with l1, ..., lk the line numbers and s1, ..., sk the token sequences. A single output fix consists of a line number li and an associated statement s'i that fixes the statement si.

This results in a much smaller output than the entire program sequence, which might be easier to predict.
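
The line-numbered representation and a single-line fix can be sketched in Python as follows. This is only an illustration under assumptions: the function names `encode_program`, `flatten`, and `apply_fix` are hypothetical, statements are tokenized by whitespace, and each line number is kept as a single string token, which may differ from the tokenization used in the referenced work.

```python
def encode_program(lines):
    """Represent a k-line program as [(l1, s1), ..., (lk, sk)].

    Each l is the line number as a string token and each s is the
    whitespace-split token sequence of that line (both are assumptions).
    """
    return [(str(number), line.split())
            for number, line in enumerate(lines, start=1)]


def flatten(pairs):
    """Flatten the (l, s) pairs into one input token sequence."""
    sequence = []
    for line_token, statement in pairs:
        sequence.append(line_token)
        sequence.extend(statement)
    return sequence


def apply_fix(pairs, fix):
    """Apply a single fix (li, s'i): replace the statement at line li."""
    line_token, new_statement = fix
    return [(l, new_statement if l == line_token else s) for l, s in pairs]


program = encode_program([
    "int ID_0 = NUM",      # missing semicolon
    "return ID_0 ;",
])
fix = ("1", ["int", "ID_0", "=", "NUM", ";"])   # hand-written example fix
fixed_program = apply_fix(program, fix)
```

Here the fix consists of one line token plus five statement tokens, while the flattened program already has nine tokens for only two lines. The gap grows with program length, which is the size advantage described above.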
