Make Import Details - matthewcornell/wikitest GitHub Wiki

This page summarizes the details of importing from the CSV files.

Import runs in two phases: Load the CSV files' data and then create the unique_case table. The details of each phase are detailed below in order of execution.

import_data.R

(See note in Servers-Databases-And-Code-Repos#data_dir-variable re: the $DATA_DIR variable.)

1_original_to_sql.R: Uses the $DATA_DIR/dengue_case_metadata.txt file to identify the CSV files to load, and then loads each into its own table in the original_data schema. Loads each file's corresponding /Users/cornell/IdeaProjects/dengue-data/file_specific_code entry. See Dengue-Cases-Database#original_data for the original_data schema's details.

Note: Due to the variation we saw early on in data types and formats, this database imports all data from .csv files originally as text. We avoid R type conversions by using the safe_read_csv function from our own integrator package which among other things calls read.csv as:
```
read.csv(file = file, check.names = FALSE, colClasses = "character", stringsAsFactors = FALSE)
```
2_standardize_pipeline_code.R: defines record access functions
3_standardize_data.R: Standardizes each table loaded in step 1, as documented in the standardized_data schema's details at Dengue-Cases-Database#standardized_data.
4_merge_into_case_data.R: Creates the single, merged standardized_data.case_data table from individual case data tables, adding UID data as well. (See Dengue-Cases-Database#standardized_data for details.)
5_old_counts_new_names.R: Loads the data in the original_data.dengue_counts_1968_to_2005 table into dengue_counts_1968_to_2005, normalizing the data by moving province information into a single province_name column, normalizing dates, and dropping unused columns.
1_old_counts_new_smooth.R: Modifies the data in the dengue_counts_1968_to_2005 table to include weekly, biweekly, and monthly counts, then uses a spline curve to interpolate in-between data, and finally saves results to the weekly_dengue_counts_1968_to_2005 table.

create the `unique_case` table

This phase runs a sequence of SQL files that creates the final unique_case table based on the case tables built by the previous phase, and also creates functions and additional tables, which are documented in Dengue-Cases-Database. Here are the tables and functions in order of creation:

1_create_first_delivery_tb.sql:
- most_recent_deliveries() function
- first_delivery table
2_create_standardized_data.case_reports_with_delivery_information_tb.sql:
- standardized_data.case_reports_with_delivery_information table
3_create_unique_case_data_tb.sql:
- unique_case_data() function
- unique_case_data table
4_create_first_reports_tb.sql:
- first_reports table
5_create_aggregate_table_fn.sql:
- aggregate_table() function

Make Import Details - matthewcornell/wikitest GitHub Wiki

import_data.R

create the unique_case table

create the `unique_case` table