Manual - Peder2911/Diverse_Folio_Isle GitHub Wiki

Manual

This is the deScry user guide. This guide will be updated as new features are added.

Step-by-step

(Saner names will be added to the menus; for now, everything is named construct*.)

1 Select data source

First, you are prompted with

Select data source

The options are:

  • constructCsvDat: Read data from a CSV file. If selected, you must specify a file path to read from.
  • constructDbDat: Read data from a .db file. Similarly, you must specify a path if you select this option.
  • constructQueryDat: Read data from the internet, through the Montanus API.

Using the Montanus scraper:

When constructQueryDat is selected, you must enter the following data:

  • Please select source: Select the website to query.
  • Please enter query: Specify the search pattern to use when querying the website.

Queries are currently formatted like this:

COUNTRY_STARTYEAR_ENDYEAR

For example:

colombia_1989_2018

The query tells the scraper what to request from the selected source. Queries spanning large timeframes will take quite a long time, as the Montanus scraper is very polite and waits a sizeable delay between requests.
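The query format above can be sketched in a few lines. This is a minimal illustration of how such a string is built and taken apart; the function names are hypothetical and not part of deScry.

```python
# Hypothetical sketch of the COUNTRY_STARTYEAR_ENDYEAR query format.
# build_query and parse_query are illustrative names, not deScry functions.

def build_query(country, start_year, end_year):
    """Join the three fields with underscores, as the scraper expects."""
    return f"{country}_{start_year}_{end_year}"

def parse_query(query):
    """Split a query string back into (country, start_year, end_year)."""
    country, start, end = query.rsplit("_", 2)
    return country, int(start), int(end)

print(build_query("colombia", 1989, 2018))  # colombia_1989_2018
print(parse_query("colombia_1989_2018"))    # ('colombia', 1989, 2018)
```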

2 Select data pre-treatment

The next prompt asks you to

Select data pre-treatment

Classifying text often requires, or at least benefits from, various transformations of the data. DFI is currently equipped with a couple of transformers through the Able Glooming Pasture module, including a simple NER script.

A useful transformation when performing pattern-matching is to transform the data with the option

constructNlSep

This option splits each article into sentence entries, meaning that analysis can be performed per sentence instead of on the entire article text.

3 Select analysis method

Next, you are asked to

`Select analysis method`

deScry is equipped with several analysis methods, with different requirements and dependencies. The different methods are not comparable, and will give you quite different results. Methods can be combined through subsequent analysis of output data, for example through classifying or clustering text that has been filtered with pattern-matching. As mentioned in the README, models for vectorizing and classifying text are not included with the code.

The methods are:

  • constructClassVecs: Classify the data using a trained model in .rds format and a vectorizer.
  • constructClusterVecs: Cluster the data using a trained vectorizer.
  • constructPatternSearch: Return matches using a regex-based pattern.

A standard workflow:

To search for occurrences of the words Cease-fire / ceasefire / cease-fire / Truce / truce and so on, respond to the prompts as follows:

  • Select data source <- `constructQueryDat`
  • Please select source <- choose a source
  • Please enter query <- country_startyear_endyear
  • Select data pre-treatment <- constructNlSep
  • Separate sentences? (yes/no) <- yes
  • Select analysis method <- constructPatternSearch
  • Select field to search <- body
  • Select search engine <- regex
  • Enter search pattern (regex): <- [Cc]ease-?fire or [Tt]ruce or [Aa]rmistice
  • Enter outfile <- specify a file to write
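The pattern-matching step of this workflow can be approximated in plain Python. Note one assumption here: the manual writes the pattern with "or" between alternatives, which standard regex expresses with `|`; whether deScry's regex engine accepts "or" literally is not something this sketch verifies.

```python
import re

# Sketch of the ceasefire/truce/armistice search using standard regex
# alternation (|) in place of the manual's "or"-separated pattern.
PATTERN = re.compile(r"[Cc]ease-?fire|[Tt]ruce|[Aa]rmistice")

sentences = [
    "The ceasefire held through the winter.",
    "Negotiators announced a Truce on Monday.",
    "Fighting continued in the south.",
]

# Keep only the sentence entries that contain a match.
matches = [s for s in sentences if PATTERN.search(s)]
print(matches)  # the first two sentences match
```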

Overview

Text analysis with deScry has three stages:

  • data input
  • data preparation
  • data analysis

There are currently multiple options to choose from at each of these stages. The program uses a simple CLI (command line interface) menu system to receive user input.
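The three stages compose into a simple pipeline: the output of each stage feeds the next. The sketch below illustrates that shape only; the function names and the toy transformations are made up for this example and are not deScry internals.

```python
# Toy illustration of the three-stage structure: input -> preparation
# -> analysis. All names and behaviors here are illustrative.

def read_source(rows):
    """Stage 1, data input: here it just materializes the rows."""
    return list(rows)

def pretreat(rows):
    """Stage 2, data preparation: here it lowercases each entry."""
    return [r.lower() for r in rows]

def analyse(rows, keyword):
    """Stage 3, data analysis: here it keeps rows containing keyword."""
    return [r for r in rows if keyword in r]

data = read_source(["A Truce was signed", "Markets rallied"])
print(analyse(pretreat(data), "truce"))  # ['a truce was signed']
```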