Manual - Peder2911/Diverse_Folio_Isle GitHub Wiki
Manual
This is the deScry user guide. This guide will be updated as new features are added.
Step-by-step
(Saner menu names will be added later; for now, every option is named construct*.)
1 Select data source
First, you are prompted with
Select data source
The options are:
constructCsvDat
: Read data from a csv file. If selected, you must specify a file path to read from.
constructDbDat
: Read from a .db file. Similarly, you must specify a path if you select this option.
constructQueryDat
: Read data from the internet, through the Montanus API.
Using the Montanus scraper:
When constructQueryDat is selected, you must enter the following data:
Please select source
: Select the website to query.
Please enter query
: Specify the search pattern to use when querying the website.
Queries are currently formatted like this:
COUNTRY_STARTYEAR_ENDYEAR
For example:
colombia_1989_2018
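As a rough illustration of the query format, the sketch below splits such a string into its three parts. It is not taken from the deScry or Montanus code: the names `Query` and `parse_query` are hypothetical, and splitting from the right (so that a country name containing underscores stays intact) is an assumption about how multi-word countries might be written.

```python
from typing import NamedTuple


class Query(NamedTuple):
    """Hypothetical container for the three parts of a query string."""
    country: str
    start_year: int
    end_year: int


def parse_query(query: str) -> Query:
    """Split a COUNTRY_STARTYEAR_ENDYEAR string into country, start and end year."""
    # Split from the right so that only the last two underscores are used.
    # (Assumption: a country name may itself contain underscores.)
    parts = query.rsplit("_", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"expected COUNTRY_STARTYEAR_ENDYEAR, got {query!r}")
    country, start, end = parts
    # int() raises ValueError if a year is not a whole number.
    start_year, end_year = int(start), int(end)
    if start_year > end_year:
        raise ValueError("start year must not be later than end year")
    return Query(country, start_year, end_year)


print(parse_query("colombia_1989_2018"))
# Query(country='colombia', start_year=1989, end_year=2018)
```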
The query tells the scraper what to request from the selected source. Queries that span many years can take a long time, because the Montanus scraper is deliberately polite and leaves a sizeable delay between requests.
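To give a feel for why long timeframes are slow, here is a minimal sketch of a throttled request loop. It does not reproduce the Montanus scraper: `polite_map`, `estimated_wait`, the five-second delay, and the idea of one request per year are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def polite_map(
    fetch: Callable[[T], R],
    items: Iterable[T],
    delay_seconds: float = 5.0,  # assumed value, not the scraper's real delay
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call fetch on each item, pausing between consecutive calls."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        yield fetch(item)


def estimated_wait(n_requests: int, delay_seconds: float) -> float:
    """Total time spent waiting between n_requests consecutive requests."""
    return max(n_requests - 1, 0) * delay_seconds


# Assumption: one request per year, so 1989-2018 means 30 requests.
years = range(1989, 2019)
print(estimated_wait(len(years), 5.0))  # 145.0 seconds of pure waiting
```

The `sleep` parameter is injectable only so that the loop can be exercised without actually waiting.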
2 Select data pre-treatment
The next prompt asks you to
Select data pre-treatment
Classifying text requires some kinds of transformation, and others can simply help the analysis. DFI is currently equipped with a couple of transformers through the Able Glooming Pasture module, including a simple NER script.
A useful transformation when performing pattern-matching is to transform the data with the option
constructNlSep
This option splits each article into one entry per sentence, so that analysis can be performed sentence by sentence instead of on the entire article text.
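The effect of this kind of transformation can be sketched as follows. This is not the constructNlSep implementation: the function name, the naive punctuation-based boundary rule, and the added `article_index` and `sentence_index` fields are assumptions; only the `body` field name is borrowed from the workflow in this guide.

```python
import re
from typing import Any, Dict, List

# Naive boundary: sentence-ending punctuation followed by whitespace.
# Abbreviations such as "Dr. Smith" will be split incorrectly.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(
    articles: List[Dict[str, Any]], field: str = "body"
) -> List[Dict[str, Any]]:
    """Turn each article into one row per sentence, keeping the other fields."""
    rows: List[Dict[str, Any]] = []
    for article_index, article in enumerate(articles):
        text = article[field].strip()
        sentences = [s for s in _SENTENCE_BOUNDARY.split(text) if s]
        for sentence_index, sentence in enumerate(sentences):
            row = dict(article)  # copy so the input article is not modified
            row[field] = sentence
            row["article_index"] = article_index
            row["sentence_index"] = sentence_index
            rows.append(row)
    return rows


articles = [{"title": "Talks", "body": "A truce was signed. Fighting resumed later!"}]
for row in split_into_sentences(articles):
    print(row["sentence_index"], row["body"])
```

Each resulting row can then be matched or classified on its own, which is what makes sentence-level pattern-matching possible.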
3 Select analysis method
Next, you are asked to
Select analysis method
deScry is equipped with several analysis methods, with different requirements and dependencies. The different methods are not comparable, and will give you quite different results. Methods can be combined through subsequent analysis of output data, for example through classifying or clustering text that has been filtered with pattern-matching. As mentioned in the README, models for vectorizing and classifying text are not included with the code.
The methods are:
constructClassVecs
: Classify the data using a trained model in .rds format and a vectorizer.
constructClusterVecs
: Cluster the data using a trained vectorizer.
constructPatternSearch
: Return matches using a regex-based pattern.
A standard workflow:
To perform a search for occurrences of the words Cease-fire / ceasefire / cease-fire / Truce / truce and so on, enter the following commands:
Select data source
<- constructQueryDat
Please select source
<- choose a source
Please enter query
<- country_startyear_endyear
Select data pre-treatment
<- constructNlSep
Separate sentences? (yes/no)
<- yes
Select analysis method
<- constructPatternSearch
Select field to search
<- body
Select search engine
<- regex
Enter search pattern (regex):
<- [Cc]ease-?fire or [Tt]ruce or [Aa]rmistice
Enter outfile
<- Specify a file to write
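To show what the search pattern in this workflow matches, the sketch below applies it to a few sentences with Python's `re` module. It assumes the three alternatives are meant as a single regular expression joined with `|`; the function `pattern_search` and the sample sentences are hypothetical, and deScry's own regex engine may behave differently.

```python
import re
from typing import Iterable, List

# The three alternatives from the workflow, joined with "|" (an assumption).
PATTERN = re.compile(r"[Cc]ease-?fire|[Tt]ruce|[Aa]rmistice")


def pattern_search(sentences: Iterable[str], pattern: re.Pattern = PATTERN) -> List[str]:
    """Return the sentences that contain at least one match for the pattern."""
    return [s for s in sentences if pattern.search(s)]


sample = [
    "The ceasefire held through the weekend.",
    "A Truce was agreed on Monday.",
    "Talks continued in the capital.",
    "Cease-fire violations were reported.",
]
for sentence in pattern_search(sample):
    print(sentence)
```

Note that the optional hyphen covers "ceasefire" and "cease-fire" but not "cease fire" written with a space.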
Overview
Text analysis with deScry has three stages:
- data input
- data preparation
- data analysis
There are currently multiple options to choose from at each of these stages. The program uses a simple CLI (command line interface) menu system to receive user input.