About
Mission Statement & Target Audience
In an age of Big Data, the NLP Suite takes a different route. It gives humanists and social scientists a wide range of computational tools for analyzing and visualizing smaller datasets, which are the kind they more typically work with (e.g., the works of a single Nobel Prize winner, a handful of in-depth interviews, a few thousand newspaper articles).
Furthermore, the NLP Suite is designed for non-specialists, that is, for scholars with little or no knowledge of Natural Language Processing.
The NLP Suite was developed by Roberto Franzosi at Emory University with the help of many current and past Emory undergraduate students. Visit the NLP Suite Team page for more information.
What the NLP Suite does
The NLP Suite provides an easy-to-use, one-stop shop for many Natural Language Processing (NLP) tasks. It relies on three free, state-of-the-art parsers and annotators (spaCy, Stanford CoreNLP, and Stanza) to carry out many of these tasks, in particular:
- sentence splitting
- tokenizing
- lemmatizing
- Part-of-Speech (POS) tagging
- dependency relations (DepRel)
- Named Entity Recognition (NER)
- parsing (via the different parsing algorithms of spaCy, Stanford CoreNLP, and Stanza)
- specialized annotators (e.g., gender, dialogue, normalized time references, coreference, sentiment analysis)
Beyond the spaCy, Stanford CoreNLP, and Stanza tools, the NLP Suite provides a broad range of options for automatic textual analysis, including the cutting-edge BERT model, such as:
- sentiment analysis (and the related calculation and visualization of the shape of stories)
- knowledge-base annotators (DBpedia, YAGO, WordNet)
- Subject-Verb-Object (SVO) extractor (based on the Stanford CoreNLP enhanced++ dependencies and OpenIE annotators, spaCy, and Stanza)
- topic modelling (via Mallet and Gensim)
- word embeddings (via Gensim Word2Vec and BERT)
- geocoding and mapping, automatically going from texts to maps (via the Stanford CoreNLP NER annotator, the Nominatim or Google geocoders, and Google Earth Pro and Google Maps)
- n-grams and co-occurrences, with a related viewer
- word clouds
- text readability and sentence complexity measures
- nominalization
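To give a rough idea of what n-gram and co-occurrence counting involves, here is a minimal Python sketch that uses only the standard library. It is not the NLP Suite's own implementation. The crude regex tokenizer (used in place of spaCy, Stanford CoreNLP, or Stanza), the function names `tokenize`, `ngrams`, `ngram_counts`, and `cooccurrences`, and the choice of the sentence as the co-occurrence window are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Hypothetical, crude regex tokenizer; a real pipeline would use
    spaCy, Stanford CoreNLP, or Stanza instead.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentences, n):
    """Count n-grams sentence by sentence, so no n-gram spans two sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(tokenize(sentence), n))
    return counts


def cooccurrences(sentences, targets):
    """Count how often each pair of target words appears in the same sentence.

    Assumption: the co-occurrence window is the whole sentence.
    """
    targets = set(targets)
    counts = Counter()
    for sentence in sentences:
        # Sort so that each pair is always stored in the same order.
        present = sorted(targets & set(tokenize(sentence)))
        counts.update(combinations(present, 2))
    return counts


sentences = [
    "The strike began in Turin.",
    "The strike spread from Turin to Milan.",
    "Workers in Milan joined the strike.",
]
print(ngram_counts(sentences, 2).most_common(3))
print(cooccurrences(sentences, ["strike", "turin", "milan"]))
```

Because the counts are accumulated over whatever units are passed in as `sentences`, the same functions could be applied to a single sentence, to the sentences of one document, or to every sentence in a corpus.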
The NLP Suite provides a wide range of data visualization tools, including geographic maps, word clouds, network graphs, charts of all kinds, Sankey charts, sunburst charts, boxplots, treemaps, and color maps.
The NLP Suite also provides a wide range of pre-processing tools, such as:
- file type converters (from PDF/DOCX to TXT)
- file mergers and splitters
- file checkers and cleaners
- UTF-8 compliance checker
- spelling checkers
- document similarity measures
- data-handling scripts (e.g., database and SQL)
The tools in the NLP Suite operate at three levels: the corpus (i.e., a set of documents), the individual document, and the sentence.
Unique Selling Proposition (USP)
The Unique Selling Proposition (USP) of the NLP Suite is its ease of use. The NLP Suite interacts with the user through a set of user-friendly Graphical User Interfaces (GUIs), over 50 at present. Each GUI comes with hover-over help on most widgets, HELP buttons, ReadMe buttons, reminder messages that the user can turn on and off, videos, and TIPS files that explain the algorithms behind the GUIs in detail (over 150 TIPS files across all GUIs at present).
For more information, see the NLP Suite architecture page.
License
The NLP Suite is licensed under a GNU License Agreement Version 1.0, January 2020.
How to Cite the NLP Suite
Franzosi, Roberto. 2020. NLP Suite: A Collection of Natural Language Processing and Visualization Tools. GitHub: https://github.com/NLP-Suite/NLP-Suite/wiki.
The following papers are based on the NLP Suite tools.
Published papers:
- Franzosi, Roberto. 2020. "What’s in a Text? Bridging the Gap Between Quality and Quantity in the Digital Era." Quality & Quantity. DOI: https://doi.org/10.1007/s11135-020-01067-6
- Franzosi, Roberto, Wenqin Dong, Yilin Dong. 2021. "Qualitative and Quantitative Research in the Humanities and Social Sciences: How Natural Language Processing (NLP) Can Help." Quality & Quantity. DOI: https://doi.org/10.1007/s11135-021-01235-2
- Franzosi, Roberto. 2021. "Of Narrative Time and Space: Geography Meets History in the Digital Era via Linguistics." Digital Scholarship in the Humanities. DOI: https://doi.org/10.1093/llc/fqab090
Unpublished papers:
- Franzosi, Roberto, Wenqin Dong, Ziyang Hu, Wei Dai, Rafael Piloto, Gabriel Wang. 2020. "Automatic Information Extraction and Visualization of the Narrative Elements Who, What, When, and Where." Unpublished manuscript.
- Franzosi, Roberto, Wenqin Dong, Alberto Purpura. 2020. "The Shape of Stories." Unpublished manuscript.