Data processing - JetBrains-Research/task-tracker-post-processing GitHub Wiki

Requirements for the source data

  1. The source data has to be in the .csv format.
  2. Activity-tracker files have the prefix ide-events. We use the activity-tracker plugin.
  3. TaskTracker files can have any name, as long as it starts with the key of the task whose data the file contains. We use the TaskTracker plugin together with the activity-tracker plugin.
  4. Columns for the activity-tracker files can be found in the const file (the ACTIVITY_TRACKER_COLUMN const).
  5. Columns for the task-tracker files can be found in the const file (the TASK_TRACKER_COLUMN const).
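
The naming rules above can be checked before any processing starts. The following is a minimal sketch, not the repository's own code: the function name `classify_source_file` and the task keys used in the usage note are hypothetical and serve only as an illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

# Prefix of activity-tracker files, as stated in the requirements above.
ACTIVITY_TRACKER_PREFIX = "ide-events"


def classify_source_file(path: str, task_keys: Iterable[str]) -> Optional[str]:
    """Return "activity-tracker", "task-tracker", or None for an unsuitable file.

    `task_keys` is a hypothetical collection of task keys; the real keys
    are defined by the project and are not listed on this page.
    """
    file = Path(path)
    # Requirement 1: the source data has to be in the .csv format.
    if file.suffix.lower() != ".csv":
        return None
    # Requirement 2: activity-tracker files start with ide-events.
    if file.name.startswith(ACTIVITY_TRACKER_PREFIX):
        return "activity-tracker"
    # Requirement 3: task-tracker files start with the key of their task.
    if any(file.name.startswith(key) for key in task_keys):
        return "task-tracker"
    return None
```

For example, with the made-up task keys `["pies", "max_3"]`, a file named `ide-events_1.csv` is classified as an activity-tracker file, `pies_1.csv` as a task-tracker file, and `pies_1.txt` is rejected because it is not a .csv file.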

Processing

The correct order for data preprocessing is:

  1. Primary data processing. See documentation.
  2. Merge activity-tracker and task-tracker files. See documentation.
  3. Find test results for the tasks. See documentation.
  4. Reorganize the file structure. See documentation.
  5. [Optional] Remove intermediate diffs. See documentation.
  6. [Optional, only for Python language] Remove inefficient statements. See documentation.
  7. [Optional] Add int experience column. See documentation.

Note: you can run the steps independently, but the data for the Nth step must already have passed through all the steps before it.
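
The ordering rule can be expressed as a small check that runs before a step starts. This is only a sketch under stated assumptions: the names `Step`, `STEPS`, and `missing_prerequisites` are hypothetical, and the code assumes that steps marked [Optional] may be skipped, so only the non-optional earlier steps count as prerequisites.

```python
from typing import List, NamedTuple, Set


class Step(NamedTuple):
    number: int
    name: str
    optional: bool


# The processing steps in the order listed above.
STEPS = [
    Step(1, "Primary data processing", False),
    Step(2, "Merge activity-tracker and task-tracker files", False),
    Step(3, "Find test results for the tasks", False),
    Step(4, "Reorganize the file structure", False),
    Step(5, "Remove intermediate diffs", True),
    Step(6, "Remove inefficient statements (Python only)", True),
    Step(7, "Add int experience column", True),
]


def missing_prerequisites(step_number: int, completed: Set[int]) -> List[int]:
    """List the earlier non-optional steps the data has not passed yet.

    Assumption: optional steps may be skipped, so they are never
    reported as missing.
    """
    if not 1 <= step_number <= len(STEPS):
        raise ValueError(f"Unknown step number: {step_number}")
    return [
        step.number
        for step in STEPS
        if step.number < step_number
        and not step.optional
        and step.number not in completed
    ]
```

With this sketch, `missing_prerequisites(3, {1})` reports step 2 as missing, while `missing_prerequisites(6, {1, 2, 3, 4})` returns an empty list, so step 6 can run even if step 5 was skipped.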

Available languages

  • C++
  • Java
  • Kotlin
  • Python