Specifications - Goodly/text-thresher-chrome-extension GitHub Wiki

# USER-FLOW Specifications for Text-Thresher Browser Plug-in, TT2

BACKGROUND: The Text-Thresher browser plug-in (TT2) differs from the web interface created for the Deciding Force project (TT1) in one major respect. The Deciding Force (DF) project has already identified TUAs and saved them as text files, which TT1 displays in its web interface. TT2, by contrast, will require a researcher-trained user group to identify TUAs on the web before volunteers and crowd workers are recruited to classify and extract variables from those TUAs. (Researchers may also need to iterate on the ontology/schema, reading through hundreds of documents and proceeding by trial and error, before setting the schema in stone.)

I envision researchers using Text-Thresher with a very simple schema in which TUAs are listed as ‘topics’, and the ‘toggle schema’ pane (which usually shows a question ontology) is replaced by definitions of each TUA type and instructions. Three other things. First, TUA identification requires a capacity to differentiate distinct TUAs of the same type (e.g. two separate protester-initiated events reported in one news article). In practice, it is sometimes easiest for users to highlight all the text about protester-initiated events and label it as such, and THEN go back through and identify which text refers to Tuesday’s march and which refers to Wednesday’s rally. (I’ve made this possible, below.) Second, keeping TUAs straight while reading through sometimes long documents is greatly aided by color-coding. Tracking colors takes little cognitive effort, so we want to exploit that (as I do below). Third, since text sometimes relates to more than one TUA, we need a pop-up asking users to specify whether a new highlight that overlaps existing highlighting should overwrite it or coexist with it.
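
The third point, the overwrite-versus-overlap choice, can be made concrete with a minimal sketch. It assumes highlights are stored as half-open character offsets into the document text; the `Highlight` record, the `apply_highlight` function, and the two mode names are hypothetical and do not come from the existing TT1 or TT2 code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Highlight:
    """One highlighted span (hypothetical record, not a TT2 type)."""
    start: int       # inclusive character offset
    end: int         # exclusive character offset
    tua_type: str    # e.g. "protester-initiated event"
    tua_number: int  # which TUA of that type, 1-based


def apply_highlight(existing: List[Highlight], new: Highlight, mode: str) -> List[Highlight]:
    """Add `new` to `existing` according to the user's pop-up answer.

    mode == "overlap":   keep every existing highlight untouched.
    mode == "overwrite": trim existing highlights wherever `new` covers them.
    """
    if mode not in ("overwrite", "overlap"):
        raise ValueError(f"unknown mode: {mode!r}")

    if mode == "overlap":
        return sorted(existing + [new], key=lambda h: (h.start, h.end))

    result: List[Highlight] = []
    for h in existing:
        if h.end <= new.start or h.start >= new.end:
            result.append(h)  # no intersection with the new highlight
            continue
        if h.start < new.start:
            # keep the part of the old highlight to the left of the new one
            result.append(Highlight(h.start, new.start, h.tua_type, h.tua_number))
        if h.end > new.end:
            # keep the part of the old highlight to the right of the new one
            result.append(Highlight(new.end, h.end, h.tua_type, h.tua_number))
    result.append(new)
    return sorted(result, key=lambda h: (h.start, h.end))
```

In this sketch, overwriting the middle of an existing highlight splits it into a left and a right remainder that keep their original TUA number, so the earlier labelling is not lost outside the overwritten span.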

SPECIFICATIONS

Research-team UoA definition, schema development, and initial TUA identification

a. prior to even booting a computer, researchers need to begin schema creation by imagining the Units of Analysis (UoAs) that are important to their study

b. researchers install Text-Thresher plug-in

c. lead researcher establishes a Text-Thresher ‘team’ or ‘group’

d. lead researcher invites members to join the ‘team’ or ‘group’ (which allows team-members to work on/from the same schema)

e. lead researcher chooses to input a ‘TUA-identification schema’ (instead of choosing to input a ‘Full Schema’)

f. lead researcher inputs a list of TUA-types into Text-Thresher’s (TUA-identification) schema input system. These will appear as ‘TUA-type tags’ in the way that ‘topics’ currently appear in TT1

g. team discusses these UoAs/TUAs, ensuring that everyone agrees that the list is both comprehensive and mutually exclusive. Team edits TUA-identification schema as needed.

h. team writes, edits, and iteratively revises its TUA-identification schema, which includes definitions of each UoA and a hierarchical list of variables and attributes they want to extract from each TUA

i. team inputs these definitions, variables and attributes into TT2’s schema input system, either through the GUI or by uploading a csv file (wherein col1=name of UoA, col2=definition of UoA, col3=varName, col4=varDefinition, col5=subvar/attributeName, col6=subvar/attribute definition, col7=subvar/attributeName, col8=subvar/attribute definition, …etc…), together with a UoA color (each UoA/TUA-type gets a color)

j. TT2 displays TUA-type tags in an Annotator.js box when user highlights text

k. user selects TUA-type (TT2 assumes this is the first/only TUA of that type in the document). Text takes on the color assigned to that TUA-type.

l. TT2 instructs user to highlight all other text related to the TUA (as in TT1)

m. user continues reading document, searching for TUAs. If and only if she highlights text and selects a TUA-type she has already identified in that document, TT2 asks “Is this a new TUA?” User selects ‘Yes’ or ‘No, this text is part of [TUA-type] #1, [TUA-type] #2, … [TUA-type] #N’. If ‘No, etc.’, [TUA-type] #1 through [TUA-type] #N are labelled with their numbers, using small grey superscript circles containing ‘1’, ‘2’, … ‘n’ in white font

n. option: hovering over text that has already been highlighted brings up the Annotator box with “Revise highlight” options (‘Add to this TUA’, ‘Reassign TUA #’, ‘Delete Highlighting’). ‘Add to this TUA’ automatically assigns subsequent highlights to the TUA until user clicks ‘Done’ in the Annotator box.

o. bottom right of screen features a small box reading “All TUAs identified.” If user clicks the box, TT2 displays “Are you sure all TUAs are identified?” If user clicks ‘Yes, I’m sure’, the data (including webpage URL, date-time, all document text, and all unique TUA offsets within the article text box) are saved and submitted to the database.
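
The csv upload in the schema-input step above could be read along the following lines. This is only a sketch under stated assumptions: one row per variable, no header row, the UoA name and definition repeated on each row for the same UoA, and attribute name/definition pairs running from col5 to the end of the row. The UoA color is left out because the spec does not pin down which column holds it. The function name `parse_schema_csv` and the nested-dictionary layout are hypothetical.

```python
import csv
import io
from typing import Dict


def parse_schema_csv(text: str) -> Dict[str, dict]:
    """Parse schema rows into {UoA name: {"definition", "variables"}}.

    Assumed layout per row: UoA name, UoA definition, variable name,
    variable definition, then zero or more (attribute name, attribute
    definition) pairs.
    """
    schema: Dict[str, dict] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if len(cells) < 2 or not cells[0]:
            continue  # skip blank or malformed rows
        uoa = schema.setdefault(cells[0], {"definition": cells[1], "variables": []})
        if len(cells) >= 4 and cells[2]:
            pairs = cells[4:]
            attributes = [
                {
                    "name": pairs[i],
                    "definition": pairs[i + 1] if i + 1 < len(pairs) else "",
                }
                for i in range(0, len(pairs), 2)
                if pairs[i]
            ]
            uoa["variables"].append(
                {"name": cells[2], "definition": cells[3], "attributes": attributes}
            )
    return schema
```

A row that carries only a UoA name and definition registers the UoA with no variables, which matches the simple TUA-identification schema where only TUA-types and their definitions are needed.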
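
The identification flow above (first highlight of a type creates TUA #1, later highlights of the same type trigger the “Is this a new TUA?” question, and the final confirmation submits everything) can be sketched as a small state holder. The class name `DocumentSession`, its methods, and the payload field names are assumptions made for illustration; the actual database format is not specified here.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class DocumentSession:
    """Tracks TUAs identified in one document (hypothetical sketch)."""

    def __init__(self, url: str, text: str) -> None:
        self.url = url
        self.text = text
        # TUA-type -> list of TUAs; each TUA is a list of (start, end) offsets
        self.tuas: Dict[str, List[List[Tuple[int, int]]]] = {}

    def needs_new_tua_prompt(self, tua_type: str) -> bool:
        """True if this type was already identified, so TT2 must ask."""
        return bool(self.tuas.get(tua_type))

    def add_highlight(self, tua_type: str, start: int, end: int,
                      existing_number: Optional[int] = None) -> int:
        """Record a highlight and return the TUA number it belongs to.

        existing_number=None means "Yes, this is a new TUA"; otherwise
        the highlight is added to that existing TUA (1-based).
        """
        if not 0 <= start < end <= len(self.text):
            raise ValueError("offsets fall outside the document text")
        instances = self.tuas.setdefault(tua_type, [])
        if existing_number is None:
            instances.append([(start, end)])
            return len(instances)
        if not 1 <= existing_number <= len(instances):
            raise ValueError(f"no {tua_type} #{existing_number} in this document")
        instances[existing_number - 1].append((start, end))
        return existing_number

    def to_payload(self) -> str:
        """Serialize URL, date-time, document text and TUA offsets as JSON."""
        return json.dumps({
            "url": self.url,
            "saved_at": datetime.now(timezone.utc).isoformat(),
            "text": self.text,
            "tuas": [
                {"type": tua_type, "number": number,
                 "offsets": [list(span) for span in sorted(spans)]}
                for tua_type, instances in self.tuas.items()
                for number, spans in enumerate(instances, start=1)
            ],
        })
```

The number returned by `add_highlight` is the value that would be shown in the small grey superscript circle next to the highlighted text.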

Expanded community TUA identification

[additional training or more narrowly constructed tasks]

Team Threshing — well-specified (and/or working already) elsewhere
Expanded community Threshing — well-specified (and/or working already) elsewhere