Cleaning, preprocessing and loading the data - quintelligence-health/medline-dashboard GitHub Wiki

This section describes the scripts we provide to clean, preprocess and load the MEDLINE dataset into Elasticsearch. Before running them, download the latest MEDLINE dump and update the dump location inside the scripts. The scripts also assume that the most recent versions of Elasticsearch and Kibana are already installed on your server.
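Because the scripts depend on both the dump location and a running Elasticsearch instance, a quick pre-flight check can save a failed run. The sketch below is not part of the repository. It assumes a hypothetical dump directory `MEDLINE_DUMP_DIR` containing the CSV files at its top level, and an Elasticsearch node answering plain HTTP on the default port 9200 without authentication. If security is enabled, as it is by default in recent releases, you would need to add HTTPS and credentials.

```python
import os
import urllib.error
import urllib.request

# Hypothetical location of the unpacked MEDLINE dump; use the same path
# you configured inside the loading scripts.
MEDLINE_DUMP_DIR = "/data/medline"
# Assumption: Elasticsearch answers plain HTTP on its default port with
# security (TLS/authentication) disabled.
ES_URL = "http://localhost:9200"


def dump_has_csv(path):
    """Return True if `path` is a directory holding at least one .csv file.

    Only the top level of the directory is inspected.
    """
    if not os.path.isdir(path):
        return False
    return any(name.lower().endswith(".csv") for name in os.listdir(path))


def elasticsearch_reachable(url, timeout=5.0):
    """Return True if a GET on the Elasticsearch root URL returns HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False


def preflight(dump_dir=MEDLINE_DUMP_DIR, es_url=ES_URL):
    """Collect human-readable problems; an empty list means ready to load."""
    problems = []
    if not dump_has_csv(dump_dir):
        problems.append(f"no CSV files found in {dump_dir}")
    if not elasticsearch_reachable(es_url):
        problems.append(f"Elasticsearch not reachable at {es_url}")
    return problems
```

Running `for problem in preflight(): print(problem)` before starting the loader prints nothing when both checks pass.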

On a Unix or Linux system, run the loader with

./medline_load.sh

On a Windows system, run

medline_load.bat

Either script launches do_all_bulk.py, which converts the CSV files in the dump to JSON according to the mapping in medline_mapping.sh, producing a file of load definitions in the format Elasticsearch expects.
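To illustrate what a file of load definitions for Elasticsearch typically looks like, here is a minimal sketch of a CSV-to-bulk conversion. It is not the project's `do_all_bulk.py`. It assumes the target is the Elasticsearch bulk API's newline-delimited JSON format, in which each document is preceded by an action line. The index name `medline` and the ID column `pmid` are hypothetical placeholders. All CSV values stay strings here; field types would come from the index mapping.

```python
import csv
import json

# Hypothetical index name and ID column; the real values are defined by
# the project's own scripts and medline_mapping.sh.
INDEX_NAME = "medline"
ID_FIELD = "pmid"


def rows_to_bulk_lines(rows, index=INDEX_NAME, id_field=ID_FIELD):
    """Yield bulk-API NDJSON lines (without trailing newlines) for each row.

    Every row produces an action line naming the target index, plus the
    document _id when the row has a non-empty `id_field`, followed by the
    row itself serialised as a JSON document.
    """
    for row in rows:
        action = {"_index": index}
        if row.get(id_field):
            action["_id"] = row[id_field]
        yield json.dumps({"index": action})
        yield json.dumps(row, ensure_ascii=False)


def convert_file(csv_path, out_path, index=INDEX_NAME, id_field=ID_FIELD):
    """Stream one CSV file into a newline-delimited bulk file."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in rows_to_bulk_lines(csv.DictReader(src), index, id_field):
            # The bulk API requires every line, including the last, to end in "\n".
            dst.write(line + "\n")
```

A file produced this way can be posted with `curl -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/_bulk" --data-binary @out.ndjson`. For a dataset the size of MEDLINE, the output would normally be split into several smaller bulk requests rather than sent as one.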

Once the scripts have finished, start Kibana and enable the new index from the Management menu.
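Before setting up the index in Kibana, it can help to confirm that documents actually reached Elasticsearch. The sketch below queries the `_count` endpoint of an index. As with the other examples, it assumes plain HTTP on the default port without authentication, and `medline` is a hypothetical index name. Newly indexed documents become visible to `_count` only after the next index refresh, which by default happens about once per second.

```python
import json
import urllib.request

# Assumptions: plain HTTP on the default port, no authentication, and a
# hypothetical index name; substitute the index your scripts created.
ES_URL = "http://localhost:9200"
INDEX_NAME = "medline"


def parse_count(body):
    """Extract the document total from a _count API response body."""
    return int(json.loads(body)["count"])


def index_doc_count(es_url=ES_URL, index=INDEX_NAME, timeout=10.0):
    """Ask Elasticsearch how many documents `index` currently holds."""
    url = f"{es_url.rstrip('/')}/{index}/_count"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

A count of zero, or an HTTP 404 error for a missing index, suggests the load did not complete and is worth investigating before moving on to Kibana.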
