Data Pipeline - mlopatka/CANOSP2020 GitHub Wiki
Data
Data comes from 2 main places:
- CANOSP2020_ROOT_FOLDER/CANOSP_FIREFOX_SUPPORT_QUESTIONS_TRAINING_DATA/Tagged Tickets on Google Drive contains our tagged tickets. This needs to be updated whenever we manually tag new tickets. Then it can be exported as a CSV file to be processed.
- CANOSP2020_ROOT_FOLDER/CANOSP_FIREFOX_SUPPORT_QUESTIONS_TRAINING_DATA/SUMO-data-dump-raw/tickets.json contains tickets pulled from SUMO.
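
As a rough illustration of how the two sources relate, the sketch below reads an exported tagged-tickets CSV and attaches its tags to tickets loaded from JSON. It is not taken from the project scripts: the column names `id` and `tag`, the ticket keys `id` and `title`, and the helper names `load_tagged_tickets` and `attach_tags` are all assumptions made for the example, and the real files may be laid out differently.

```python
import csv
import io
import json


def load_tagged_tickets(csv_text):
    """Parse exported tagged-tickets CSV text into {ticket_id: tag}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # "id" and "tag" are assumed column names, not the real export schema.
    return {row["id"]: row["tag"] for row in reader}


def attach_tags(tickets, tags_by_id):
    """Return copies of the tickets with a "tag" key (None if untagged)."""
    return [{**t, "tag": tags_by_id.get(str(t["id"]))} for t in tickets]


# Tiny inline stand-ins for the exported CSV and tickets.json.
tagged_csv = "id,tag\n101,sync\n102,crash\n"
tickets_json = '[{"id": 101, "title": "Sync fails"}, {"id": 103, "title": "Slow startup"}]'

tags = load_tagged_tickets(tagged_csv)
merged = attach_tags(json.loads(tickets_json), tags)
print(merged)
```

Tickets with no entry in the tagged CSV keep a `tag` of `None`, so untagged tickets remain easy to pick out for later manual tagging.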
Processing
- fetch_ticket.py can either pull tickets from Mozilla Support using the Kitsune API, or combine our tagged tickets with the tickets pulled from SUMO.
- ticket_to_csv.py converts a ticket JSON with this format into a CSV file with this format.
- json_to_crowdtruth_csv.py converts a ticket JSON with this format into a CSV file with a format compatible with the CrowdTruth library.
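
The JSON-to-CSV step can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the code in ticket_to_csv.py or json_to_crowdtruth_csv.py: the function name `tickets_to_csv` and the fields `id`, `title`, and `content` are hypothetical, and the columns CrowdTruth actually expects are not reproduced here.

```python
import csv
import io
import json


def tickets_to_csv(tickets, fieldnames):
    """Serialize a list of ticket dicts to CSV text with the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for ticket in tickets:
        # Keep only the requested columns; missing keys become empty cells.
        writer.writerow({name: ticket.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Hypothetical ticket; the field names are assumptions for this example.
tickets = json.loads('[{"id": 1, "title": "Sync fails", "content": "Bookmarks, tabs"}]')
csv_text = tickets_to_csv(tickets, ["id", "title", "content"])
print(csv_text)
```

Going through `csv.DictWriter` rather than joining strings by hand means that ticket text containing commas, quotes, or newlines is quoted correctly, which matters for free-form support questions.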