Make Import Details - matthewcornell/wikitest GitHub Wiki
This page summarizes the details of importing from the CSV files.
Import runs in two phases: first load the CSV files' data, then create the unique_case table. Each phase is
described below in order of execution.
import_data.R
(See note in Servers-Databases-And-Code-Repos#data_dir-variable re: the $DATA_DIR variable.)
- 1_original_to_sql.R: Uses the $DATA_DIR/dengue_case_metadata.txt file to identify the CSV files to load, and then loads each into its own table in the original_data schema. Loads each file's corresponding /Users/cornell/IdeaProjects/dengue-data/file_specific_code entry. See Dengue-Cases-Database#original_data for the original_data schema's details.
  Note: Due to the variation we saw early on in data types and formats, this database imports all data from .csv files originally as text. We avoid R type conversions by using the safe_read_csv function from our own integrator package, which among other things calls read.csv as: read.csv(file = file, check.names = FALSE, colClasses = "character", stringsAsFactors = FALSE)
- 2_standardize_pipeline_code.R: Defines record access functions.
- 3_standardize_data.R: Standardizes each table loaded in step 1, as documented in the standardized_data schema's details at Dengue-Cases-Database#standardized_data.
- 4_merge_into_case_data.R: Creates the single, merged standardized_data.case_data table from individual case data tables, adding UID data as well. (See Dengue-Cases-Database#standardized_data for details.)
- 5_old_counts_new_names.R: Loads the data in the original_data.dengue_counts_1968_to_2005 table into dengue_counts_1968_to_2005, normalizing the data by moving province information into a single province_name column, normalizing dates, and dropping unused columns.
- 1_old_counts_new_smooth.R: Modifies the data in the dengue_counts_1968_to_2005 table to include weekly, biweekly, and monthly counts, then uses a spline curve to interpolate in-between data, and finally saves results to the weekly_dengue_counts_1968_to_2005 table.
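The Note under 1_original_to_sql.R says all CSV data is first read as text to avoid R type conversions. The following is a minimal sketch of why that matters, not the safe_read_csv implementation itself: it calls base read.csv directly with the arguments quoted in the Note. The sample rows and column names (province_code, onset_date, count) are made up for illustration.

```r
# Hypothetical sample data: zero-padded codes and mixed date formats.
csv_text <- c(
  "province_code,onset_date,count",
  "01,2005-03-01,7",
  "02,01/04/2005,3",
  "10,2005-05-01,12"
)
csv_file <- tempfile(fileext = ".csv")
writeLines(csv_text, csv_file)

# With default settings, read.csv guesses column types, so the
# zero-padded code "01" is converted to the integer 1.
guessed <- read.csv(file = csv_file)

# With the arguments quoted in the Note, every column stays character,
# so the values arrive exactly as written in the file.
as_text <- read.csv(file = csv_file, check.names = FALSE,
                    colClasses = "character", stringsAsFactors = FALSE)

str(guessed)
str(as_text)
```

Reading everything as text defers type decisions to the later standardization step, where they can be made explicitly for each table.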
create the unique_case table
This phase runs a sequence of SQL files that creates the final unique_case table based on the case tables built by the
previous phase, and also creates functions and additional tables, which are documented in Dengue-Cases-Database.
Here are the tables and functions in order of creation:
- 1_create_first_delivery_tb.sql:
  - most_recent_deliveries() function
  - first_delivery table
- 2_create_standardized_data.case_reports_with_delivery_information_tb.sql:
  - standardized_data.case_reports_with_delivery_information table
- 3_create_unique_case_data_tb.sql:
  - unique_case_data() function
  - unique_case_data table
- 4_create_first_reports_tb.sql:
  - first_reports table
- 5_create_aggregate_table_fn.sql:
  - aggregate_table() function
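The SQL files listed above carry numeric prefixes that match their order of creation. This page does not say how the import code picks that order, so the following is only a sketch under the assumption that the prefix alone determines it. The helper order_by_prefix is hypothetical and is not part of the project's code.

```r
# File names as listed on this page, deliberately shuffled.
sql_files <- c(
  "3_create_unique_case_data_tb.sql",
  "1_create_first_delivery_tb.sql",
  "5_create_aggregate_table_fn.sql",
  "2_create_standardized_data.case_reports_with_delivery_information_tb.sql",
  "4_create_first_reports_tb.sql"
)

# Hypothetical helper: sort file names by their leading integer prefix.
order_by_prefix <- function(files) {
  # Extract the digits before the first underscore and compare numerically.
  prefix <- as.integer(sub("^([0-9]+)_.*$", "\\1", basename(files)))
  files[order(prefix)]
}

ordered <- order_by_prefix(sql_files)
print(ordered)
```

Comparing the prefixes as integers rather than as strings keeps the order correct if a tenth file is ever added, since a plain character sort would place "10_..." before "2_...".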