Dataset - Sablayrolles/debates GitHub Wiki

Debates wiki -- Dataset

Dataset

Scripts

The following scripts help you manage the dataset:

  • directory_creator.sh: creates the directories automatically
	echo "Usage : directory_creator.sh -[h|c|y|d] (name) (year)"
	echo "-h : show the help and exit"
	echo "-c : create the country 'name'"
	echo "-y : add the year 'year' for the country 'name'"
	echo "-d : add a debate in the country 'name' for the year 'year'"
  • create_aam_as_files.py: creates the .aam and .as files (run directory_creator.sh first). See below for information about the structure of infos.xml.

Composition

The dataset is grouped by country and by debate year, following this tree:

dataset/
	country_name/
		year/
			num_of_debate/
				annotated/
					ac-aa/
					debate.aam
					debate.as
				brut/
					x.txt
					...
				full/
					full.txt
				reactions/
					x.txt
					...
				segmented/
					x.txt
					...
				infos.xml
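
As an illustration of the tree above, here is a minimal Python sketch that builds the path of one debate and creates its empty sub-directories. It is not the repository's directory_creator.sh; the names `DEBATE_SUBDIRS`, `debate_dir` and `create_debate_tree` are hypothetical and only mirror the layout shown above.

```python
from pathlib import Path

# Sub-directories of one debate, copied from the tree above.
DEBATE_SUBDIRS = ("annotated/ac-aa", "brut", "full", "reactions", "segmented")


def debate_dir(root, country, year, debate_num):
    """Return root/country_name/year/num_of_debate as a Path."""
    return Path(root) / country / str(year) / str(debate_num)


def create_debate_tree(root, country, year, debate_num):
    """Create the empty directory layout for one debate (hypothetical helper)."""
    base = debate_dir(root, country, year, debate_num)
    for sub in DEBATE_SUBDIRS:
        # parents=True also creates the country and year levels if missing.
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base
```

For example, `create_debate_tree("dataset", "usa", 2016, 1)` would create `dataset/usa/2016/1/` with its sub-directories; the files themselves (full.txt, infos.xml, and so on) still have to be added.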

Descriptions

  • full.txt contains the full transcript of the debate
  • brut/ contains the raw transcript split by topic (question), in files numbered from 1 to x
  • segmented/ contains the segmented transcript split by topic (question), in files numbered from 1 to x, with '&' as the split character (an auto-generation method still needs to be added)
  • reactions/ uses the same format as brut/ but also contains people's reactions (the format will change in the future to make use and annotation easier [Glozz])
  • annotated/ contains the annotated files for Glozz and the parameters file (an auto-generation method still needs to be added)
  • infos.xml contains data about the debate

(TODO: add a script that auto-generates the Glozz files and the segmented files, and that makes annotating reactions easier)
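
To show how the numbered topic files and the '&' split character could be consumed, here is a small Python sketch. It assumes that topic files are named `1.txt`, `2.txt`, and so on, and that '&' never appears as ordinary text inside a segment; `topic_files` and `split_segments` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def topic_files(directory):
    """Return the numbered topic files (1.txt, 2.txt, ...) in numeric order."""
    paths = [p for p in Path(directory).glob("*.txt") if p.stem.isdigit()]
    # Sort on the integer value so that 10.txt comes after 2.txt.
    return sorted(paths, key=lambda p: int(p.stem))


def split_segments(text, sep="&"):
    """Split a segmented transcript on the split character, dropping empty pieces."""
    return [piece.strip() for piece in text.split(sep) if piece.strip()]
```

Sorting on the integer value of the file name avoids the lexicographic order (`1`, `10`, `2`) that a plain `sorted()` on the paths would give.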

Infos.xml

The file must strictly follow this format:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<informations>
<debateNum>1</debateNum>
<country>usa</country>
<language>en</language>
<numberQuestion>9</numberQuestion>
<date>09/26/16</date>
<participant>
<presentator>HOLT</presentator>
<candidate>CLINTON</candidate>
<candidate>TRUMP</candidate>
</participant>
</informations>
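
A file in this format can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is only an illustration: `parse_infos` and the keys of the returned dictionary are hypothetical, and the date format `MM/DD/YY` is an assumption inferred from the sample value `09/26/16`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The sample infos.xml from above, kept as bytes so the XML declaration is honoured.
SAMPLE_INFOS = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<informations>
<debateNum>1</debateNum>
<country>usa</country>
<language>en</language>
<numberQuestion>9</numberQuestion>
<date>09/26/16</date>
<participant>
<presentator>HOLT</presentator>
<candidate>CLINTON</candidate>
<candidate>TRUMP</candidate>
</participant>
</informations>"""


def parse_infos(xml_data):
    """Parse the content of an infos.xml file into a plain dictionary."""
    root = ET.fromstring(xml_data)
    participant = root.find("participant")
    return {
        "debate_num": int(root.findtext("debateNum")),
        "country": root.findtext("country"),
        "language": root.findtext("language"),
        "number_question": int(root.findtext("numberQuestion")),
        # Assumption: dates are written MM/DD/YY, as in the sample.
        "date": datetime.strptime(root.findtext("date"), "%m/%d/%y").date(),
        "presentators": [e.text for e in participant.findall("presentator")],
        "candidates": [e.text for e in participant.findall("candidate")],
    }


infos = parse_infos(SAMPLE_INFOS)
print(infos["country"], infos["date"], infos["candidates"])
```

To read a real file, the same function can be fed the result of `Path(".../infos.xml").read_bytes()`.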

Format

Each transcript must be formatted as follows:

NAME: text

NAME: text
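
The `NAME: text` format can be split into speaker turns with a short Python sketch like the one below. It assumes that speaker names are written in upper case at the start of a line (as in the infos.xml sample) and that a line without such a label continues the previous turn; `TURN_RE` and `parse_turns` are hypothetical names, and the sentences in the usage example are made up.

```python
import re

# Assumption: a speaker label is an upper-case name at the start of a line,
# followed by a colon.
TURN_RE = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.*)$")


def parse_turns(text):
    """Return a list of (speaker, text) tuples from a 'NAME: text' transcript."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate turns
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # No speaker label: treat the line as a continuation of the last turn.
            speaker, said = turns[-1]
            turns[-1] = (speaker, (said + " " + line).strip())
    return turns


example = "HOLT: Good evening.\n\nCLINTON: Thank you.\nIt is great to be here.\n"
for speaker, said in parse_turns(example):
    print(speaker, "->", said)
```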

