Data processing - JetBrains-Research/task-tracker-post-processing GitHub Wiki
Requirements for the source data
- The source data has to be in the .csv format.
- Activity-tracker files have the prefix ide-events. They are produced by the activity-tracker plugin.
- TaskTracker files can have any name, as long as it starts with the key of the task whose data the file contains. They are produced by the TaskTracker plugin, which we run at the same time as the activity-tracker plugin.
- Columns for the activity-tracker files can be found in the const file (the ACTIVITY_TRACKER_COLUMN const).
- Columns for the task-tracker files can be found in the const file (the TASK_TRACKER_COLUMN const).
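
The naming requirements above can be checked before any processing starts. The following is a minimal sketch, not the repository's own code: the function name `classify_source_file`, the returned labels, and the example task keys are hypothetical, and treating the `.csv` extension case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Iterable, Optional

# Prefix stated in the requirements above.
ACTIVITY_TRACKER_PREFIX = "ide-events"


def classify_source_file(path: str, task_keys: Iterable[str]) -> Optional[str]:
    """Return 'activity-tracker', 'task-tracker', or None if the file does not qualify.

    `task_keys` is a hypothetical collection of task keys; the real keys
    are defined by the project, not by this sketch.
    """
    file = Path(path)
    # Assumption: the extension check is case-insensitive.
    if file.suffix.lower() != ".csv":
        return None
    if file.name.startswith(ACTIVITY_TRACKER_PREFIX):
        return "activity-tracker"
    if any(file.name.startswith(key) for key in task_keys):
        return "task-tracker"
    return None
```

For example, with the hypothetical task key `pies`, `classify_source_file("data/pies_1234.csv", ["pies"])` would be classified as a task-tracker file, while a `.txt` file would be rejected regardless of its name.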
Processing
The correct order for data preprocessing is:
- Primary data processing. See documentation.
- Merge activity-tracker and task-tracker files. See documentation.
- Find test results for the tasks. See documentation.
- Reorganize the file structure. See documentation.
- [Optional] Remove intermediate diffs. See documentation.
- [Optional, Python only] Remove inefficient statements. See documentation.
- [Optional] Add int experience column. See documentation.
Note: you can run the actions independently, but the data for the Nth step must already have passed all the steps before it.
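
The ordering rule in the note can be sketched as a small prerequisite check. This is an illustration only: the step identifiers and the function `missing_prerequisites` are hypothetical names, not part of the project. The source does not say whether optional steps count as prerequisites, so this sketch assumes that only the four mandatory steps do.

```python
from typing import List, NamedTuple, Set


class Step(NamedTuple):
    name: str       # hypothetical identifier for the step
    optional: bool  # True for the steps marked [Optional] above


# Steps in the order listed above.
PIPELINE = [
    Step("primary_processing", False),
    Step("merge_trackers", False),
    Step("find_test_results", False),
    Step("reorganize_files", False),
    Step("remove_intermediate_diffs", True),
    Step("remove_inefficient_statements", True),  # Python data only
    Step("add_int_experience", True),
]


def missing_prerequisites(target: str, completed: Set[str]) -> List[str]:
    """List the mandatory steps before `target` that the data has not passed yet."""
    names = [step.name for step in PIPELINE]
    if target not in names:
        raise ValueError(f"Unknown step: {target}")
    index = names.index(target)
    # Assumption: optional steps are never required by later steps.
    return [
        step.name
        for step in PIPELINE[:index]
        if not step.optional and step.name not in completed
    ]
```

An empty result means the data is ready for the target step. For instance, data that has only gone through primary processing is not ready for the test-results step, because the merge step is still missing.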
Available languages
- C++
- Java
- Kotlin
- Python