1.a. Installation and Running

Installation

PaleoDeepDive requires installation of DeepDive and a database instance.

  1. Open Terminal and run the following command: bash <(curl -fsSL git.io/getdeepdive). This brings up a list of options; select the deepdive option. Alternatively, you can run bash <(curl -fsSL git.io/getdeepdive) deepdive directly.

  2. To set up Postgres (the database instance), select postgres from the list of options or run the following command: bash <(curl -fsSL git.io/getdeepdive) postgres.

  3. To ensure that DeepDive has installed properly, you can run its test suite by selecting the run_deepdive_tests option or with bash <(curl -fsSL git.io/getdeepdive) run_deepdive_tests. (All three commands are collected in the session sketch after this list.)
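
Taken together, the commands above make up the following shell session:

```bash
# Install DeepDive (equivalent to picking "deepdive" from the menu)
bash <(curl -fsSL git.io/getdeepdive) deepdive

# Set up a local Postgres instance for DeepDive to use
bash <(curl -fsSL git.io/getdeepdive) postgres

# Optional: verify the installation by running DeepDive's test suite
bash <(curl -fsSL git.io/getdeepdive) run_deepdive_tests
```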

The full installation guide for DeepDive is available here.

To install PaleoDeepDive, simply download the pdd repository and run deepdive compile inside it.
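
For example, assuming you use git to fetch the repository (the clone URL below is inferred from this wiki's location and may differ from where you obtained pdd):

```bash
# Download the pdd repository (URL assumed; adjust to your copy of the repo)
git clone https://github.com/Mcirino/pdd.git
cd pdd

# Compile the PaleoDeepDive application
deepdive compile
```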

DeepDive Dataflow

DeepDive applications typically follow a certain dataflow:

  1. Loading input documents, the "dark data" to be mined. These can be academic articles, treatise volumes, or other literature.

  2. Natural Language Processing (NLP) markup. During this process, sentences are split into words, and each word is tagged as a potential feature. For example, "Eugenia" might be tagged as a genus name.

  3. Feature mapping. Using the features identified during markup, candidate features are mapped to one another according to predetermined relationships. (The runnable form of these three stages is sketched after this list.)
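
Each of these stages corresponds to a compiled target executed with deepdive do. Using the target names from the spouse example described below:

```bash
deepdive do articles          # 1. load the input documents into the database
deepdive do sentences         # 2. NLP markup: tokenize and tag each sentence
deepdive do spouse_candidate  # 3. feature mapping: pair up candidate mentions
```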

Running the Spouse Example Application

DeepDive offers an example application that runs on a dataset small enough for a laptop. Running this example is recommended if you want to verify that DeepDive installed correctly, or to get a sense of how a DeepDive application works. A more detailed guide can be found here.

  1. First, download the example by running bash <(curl -fsSL git.io/getdeepdive) spouse_example. This will create a folder for the example application.

  2. Check that the example application has everything it needs. All DeepDive applications require deepdive.conf, db.url, and schema.sql files. The example application should also contain app.ddlog, input, labeling, mindbender, and udf; this last component is a folder that contains the application's extractors. Once you have checked that everything is there, run deepdive compile inside the example application folder.

  3. The example includes a small dataset of 1000 sampled articles. Select it with ln -s articles-1000.tsv.bz2 input/articles.tsv.bz2 and then run deepdive do articles, which loads the article input into the database. A text editor will open with a list of commands; simply save and close the editor.

  4. You can check that the input has been loaded with deepdive query '?- articles("5beb863f-26b1-4c2f-ba64-0c3e93e72162", content).' format=csv | grep -v '^$' | tail -n +16 | head. You should see lines of article text.

  5. Next, run the markup process (by which DeepDive tags candidate features) with deepdive do sentences. Save and close the editor as before.

  6. Again, you can check that the process has been successful by running deepdive query '?- sentences("5beb863f-26b1-4c2f-ba64-0c3e93e72162", _, _, tokens, _, _, ner_tags, _, _, _).' format=csv | grep PERSON | tail. You should see lists of words matched to feature tags.

  7. Finally, map candidate features together with deepdive do spouse_candidate.

  8. The app can also provide its expectation (probability) of every match being true if you run deepdive do probabilities. You can also run Mindtagger to identify errors in DeepDive's output. (The full session is collected after this list.)
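
For reference, the whole walkthrough condenses into the session below. The Mindtagger invocation at the end is a sketch based on the standard DeepDive tutorial; it assumes the example ships labeling tasks under labeling/, and the folder name in the first line may also differ on your machine.

```bash
cd spouse                     # the folder created in step 1 (name may differ)
deepdive compile

# Steps 3-7: load articles, run NLP markup, map candidate features
ln -s articles-1000.tsv.bz2 input/articles.tsv.bz2
deepdive do articles
deepdive do sentences
deepdive do spouse_candidate

# Step 8: compute the expectation (probability) of each match being true
deepdive do probabilities

# Assumed invocation: inspect errors with Mindtagger; the exact .conf path
# depends on the labeling tasks shipped with the example
mindbender tagger labeling/*/mindtagger.conf
```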

Running PaleoDeepDive

  1. Navigate to the PaleoDeepDive folder and run deepdive compile.
  2. Load the input documents: deepdive do documents
  3. Run NLP markup: deepdive do sentences
  4. Map taxon candidates to documents: deepdive do taxon_per_doc
  5. Compute the probability of each candidate: deepdive do probabilities (see the consolidated session below)
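
As a single session, with an optional sanity check at the end. The column layout in the final query is an assumption made for illustration; consult app.ddlog in the pdd repository for the actual schema of taxon_per_doc.

```bash
cd pdd                       # the PaleoDeepDive folder
deepdive compile

deepdive do documents        # load the input documents
deepdive do sentences        # NLP markup
deepdive do taxon_per_doc    # map taxon candidates to documents
deepdive do probabilities    # compute expectations for each candidate

# Optional sanity check, by analogy with the spouse example.
# ASSUMPTION: taxon_per_doc(docid, taxon); verify against app.ddlog.
deepdive query '?- taxon_per_doc(docid, taxon).' format=csv | head
```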