Make Import Details - matthewcornell/wikitest GitHub Wiki
This page summarizes the details of importing from the CSV files.
Import runs in two phases: Load the CSV files' data and then create the unique_case
table. The details of each
phase are detailed below in order of execution.
import_data.R
(See note in Servers-Databases-And-Code-Repos#data_dir-variable re: the $DATA_DIR
variable.)
-
1_original_to_sql.R
: Uses the$DATA_DIR/dengue_case_metadata.txt
file to identify the CSV files to load, and then loads each into its own table in theoriginal_data
schema. Loads each file's corresponding/Users/cornell/IdeaProjects/dengue-data/file_specific_code
entry. See Dengue-Cases-Database#original_data for theoriginal_data
schema's details.Note: Due to the variation we saw early on in data types and formats, this database imports all data from .csv files originally as text. We avoid R type conversions by using the
safe_read_csv
function from our ownintegrator
package which among other things calls read.csv as:read.csv(file = file, check.names = FALSE, colClasses = "character", stringsAsFactors = FALSE)
-
2_standardize_pipeline_code.R
: defines record access functions -
3_standardize_data.R
: Standardizes each table loaded in step 1, as documented in thestandardized_data
schema's details at Dengue-Cases-Database#standardized_data. -
4_merge_into_case_data.R
: Creates the single, mergedstandardized_data.case_data
table from individual case data tables, adding UID data as well. (See Dengue-Cases-Database#standardized_data for details.) -
5_old_counts_new_names.R
: Loads the data in theoriginal_data.dengue_counts_1968_to_2005
table intodengue_counts_1968_to_2005
, normalizing the data by moving province information into a single province_name column, normalizing dates, and dropping unused columns. -
1_old_counts_new_smooth.R
: Modifies the data in thedengue_counts_1968_to_2005
table to include weekly, biweekly, and monthly counts, then uses a spline curve to interpolate in-between data, and finally saves results to theweekly_dengue_counts_1968_to_2005
table.
unique_case
table
create the This phase runs a sequence of SQL files that creates the final unique_case
table based on the case tables built by the
previous phase, and also creates functions and additional tables, which are documented in Dengue-Cases-Database.
Here are the tables and functions in order of creation:
1_create_first_delivery_tb.sql
:most_recent_deliveries()
functionfirst_delivery
table
2_create_standardized_data.case_reports_with_delivery_information_tb.sql
:standardized_data.case_reports_with_delivery_information
table
3_create_unique_case_data_tb.sql
:unique_case_data()
functionunique_case_data
table
4_create_first_reports_tb.sql
:first_reports
table
5_create_aggregate_table_fn.sql
:aggregate_table()
function