Data Preparation - sewardlee337/industry-cluster-explorer GitHub Wiki

Data to be processed and visualized by Industry Cluster Explorer are not hosted on this GitHub repository. Users must supply the datasets that they intend to visualize with the dashboard.

The dashboard expects every non-JSON dataset provided by the user to be in tidy format: data come in tables, with each row representing an observation and each column representing a variable.

Required Data

To have a working dashboard for industry cluster analysis, the user needs to supply the following datasets:

2 CSV files containing panel data from the time period of interest (one dataset for the beginning of the time period, and one for the end).
1 CSV file with industry clusters labeled as "traded" or "local."
1 TopoJSON file of geospatial data for each geographic/administrative region to be mapped and analyzed.

Industry Cluster Datasets

The dashboard is currently designed to manage two sets of panel data representing industry cluster information from the beginning and end of your time period of interest. They should be in CSV format.

There is no limit to the number of columns in the two datasets, but both should at minimum contain the following columns:

cluster - Name of the cluster (e.g. "Aerospace Vehicles and Defense," "Agricultural Inputs and Services," etc.)
region - Geographic/administrative region for each observation.
employee_count - Number of employees.
enterprise_count - Number of enterprises.
revenue - Total revenues.
wages - Total wages paid to employees.

For example:

cluster	region	employee_count	enterprise_count	revenue	wages
Aerospace Vehicles and Defense	Changhua County	0	0	0	0
Aerospace Vehicles and Defense	Chiayi City	0	0	0	0
Aerospace Vehicles and Defense	Chiayi County	0	0	0	0
Aerospace Vehicles and Defense	Hsinchu City	0	0	0	0
Aerospace Vehicles and Defense	Hsinchu County	0	0	0	0
...	...	...	...	...	...

Please ensure that column headers are labeled exactly as above (i.e. with no variation in capitalization and spacing, and with no extra characters).
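A quick header check before loading the files can catch capitalization and spacing problems early. The following is a minimal sketch, not part of the dashboard itself: the function name `missing_columns` is hypothetical, and it assumes the files are comma-delimited with the header on the first row.

```python
import csv
import io

# Column names required by the wiki for both industry cluster datasets.
REQUIRED_COLUMNS = [
    "cluster", "region", "employee_count",
    "enterprise_count", "revenue", "wages",
]

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header row.

    The comparison is exact, so "Cluster" or "wages " (trailing space)
    counts as missing.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> empty header
    return [name for name in required if name not in header]

# In-memory stand-ins for real files (assumption: comma-delimited CSV).
good = io.StringIO(
    "cluster,region,employee_count,enterprise_count,revenue,wages\n"
    "Aerospace Vehicles and Defense,Changhua County,0,0,0,0\n"
)
bad = io.StringIO(
    "Cluster,region,employee_count ,enterprise_count,revenue,wages\n"
)

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['cluster', 'employee_count']
```

With a real file, the same function can be called as `missing_columns(open(path, newline=""))`, where `path` points to one of the two panel datasets.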

Cluster Types Dataset

Prepare a CSV file with two columns. One column should be labeled cluster, and contain names of the industry clusters to be analyzed. The other column should be labeled type, and should indicate whether each industry cluster is "traded" or "local."

For example:

cluster	type
Aerospace Vehicles and Defense	traded
Agricultural Inputs and Services	traded
Apparel	traded
Automotive	traded
Biopharmaceuticals	traded
Business Services	traded
...	...

Please ensure that column headers cluster and type are labeled exactly as above (i.e. with no variation in capitalization or spacing, and with no extra characters). The values under the type column, "traded" and "local," should also be entirely lowercase.
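The rule on type values can be checked in the same way. This sketch is illustrative only: `invalid_type_rows` is a hypothetical helper, the file is assumed to be comma-delimited, and the sample rows (including the placeholder local cluster) are made up for the demonstration.

```python
import csv
import io

# The only two values the wiki allows in the type column.
ALLOWED_TYPES = {"traded", "local"}

def invalid_type_rows(csv_file):
    """Return (cluster, type) pairs whose type is not exactly 'traded' or 'local'."""
    reader = csv.DictReader(csv_file)
    return [
        (row.get("cluster"), row.get("type"))
        for row in reader
        if row.get("type") not in ALLOWED_TYPES
    ]

# Made-up sample: the second row is capitalized on purpose, and the
# last cluster name is a placeholder rather than a real cluster.
sample = io.StringIO(
    "cluster,type\n"
    "Apparel,traded\n"
    "Automotive,Traded\n"
    "Example Local Cluster,local\n"
)

print(invalid_type_rows(sample))  # [('Automotive', 'Traded')]
```

If the type header itself is mislabeled, every row is reported with a type of `None`, which points back to the header rule above.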

TopoJSON Data

Each unique region listed under the region column of the industry cluster CSV files should have an associated TopoJSON file containing geospatial data for the respective region. The necessary TopoJSON files may be available through a government website as part of an Open Data initiative.

See the TopoJSON Wiki for more information about this data format.
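Files downloaded from open data portals are sometimes GeoJSON rather than TopoJSON, so a shape check can save debugging time. The sketch below only inspects the top-level members that a TopoJSON topology carries (a type of "Topology", an objects member, and an arcs member); it does not validate the geometry, and `looks_like_topojson` is a hypothetical name rather than part of the dashboard.

```python
import json

def looks_like_topojson(text):
    """Return True if text parses as JSON with the top-level shape of a TopoJSON topology."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and data.get("type") == "Topology"
        and isinstance(data.get("objects"), dict)
        and isinstance(data.get("arcs"), list)
    )

# Minimal stand-ins for file contents (not real geographic data).
minimal_topology = '{"type": "Topology", "objects": {}, "arcs": []}'
geojson_collection = '{"type": "FeatureCollection", "features": []}'

print(looks_like_topojson(minimal_topology))    # True
print(looks_like_topojson(geojson_collection))  # False
```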