Data Preparation - sewardlee337/industry-cluster-explorer GitHub Wiki

Data to be processed and visualized by Industry Cluster Explorer are not hosted on this GitHub repository. Users must supply the datasets that they intend to visualize with the dashboard.

The dashboard expects every non-JSON dataset supplied by the user to be in tidy format: each dataset is a table in which rows represent observations and columns represent variables.

Required Data

To have a working dashboard for industry cluster analysis, the user needs to supply the following datasets:

  • 2 CSV files containing panel data from the time period of interest (one dataset for the beginning of the time period, and one for the end).
  • 1 CSV file with industry clusters labeled as "traded" or "local."
  • A separate TopoJSON file containing geospatial data for each geographic/administrative region to be mapped and analyzed.

Industry Cluster Datasets

The dashboard is currently designed to manage two sets of panel data representing industry cluster information from the beginning and end of your time period of interest. They should be in CSV format.

There is no limit to the number of columns in the two datasets, but both should at minimum contain the following columns:

  • cluster - Name of the cluster (e.g. "Aerospace Vehicles and Defense," "Agricultural Inputs and Services," etc.)
  • region - Geographic/administrative region for each observation.
  • employee_count - Number of employees.
  • enterprise_count - Number of enterprises.
  • revenue - Total revenues.
  • wages - Total wages paid to employees.

For example:

| cluster | region | employee_count | enterprise_count | revenue | wages |
| --- | --- | --- | --- | --- | --- |
| Aerospace Vehicles and Defense | Changhua County | 0 | 0 | 0 | 0 |
| Aerospace Vehicles and Defense | Chiayi City | 0 | 0 | 0 | 0 |
| Aerospace Vehicles and Defense | Chiayi County | 0 | 0 | 0 | 0 |
| Aerospace Vehicles and Defense | Hsinchu City | 0 | 0 | 0 | 0 |
| Aerospace Vehicles and Defense | Hsinchu County | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |

Please ensure that column headers are labeled exactly as above (i.e. with no variation in capitalization and spacing, and with no extra characters).
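
A quick way to catch header problems before loading a file into the dashboard is to compare its header row against the required column names. The following is a minimal sketch using only the Python standard library; the function name `missing_columns` and the in-memory sample are illustrative assumptions and are not part of Industry Cluster Explorer.

```python
import csv
import io

# Column names required in both industry cluster datasets (from the list above).
REQUIRED_COLUMNS = [
    "cluster", "region", "employee_count",
    "enterprise_count", "revenue", "wages",
]

def missing_columns(csv_file):
    """Return the required column names that are absent from the header row.

    The comparison is exact, so "Region" or " wages" counts as missing.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]

# Hypothetical in-memory sample; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "cluster,region,employee_count,enterprise_count,revenue,wages\n"
    "Aerospace Vehicles and Defense,Changhua County,0,0,0,0\n"
)
print(missing_columns(sample))  # prints []
```

An empty list means all six required headers are present; extra columns are ignored, since the datasets may contain any number of additional columns.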

Cluster Types Dataset

Prepare a CSV file with two columns. One column should be labeled cluster, and contain names of the industry clusters to be analyzed. The other column should be labeled type, and should indicate whether each industry cluster is "traded" or "local."

For example:

| cluster | type |
| --- | --- |
| Aerospace Vehicles and Defense | traded |
| Agricultural Inputs and Services | traded |
| Apparel | traded |
| Automotive | traded |
| Biopharmaceuticals | traded |
| Business Services | traded |
| ... | ... |

Please ensure that the column headers cluster and type are labeled exactly as above (i.e. with no variation in capitalization or spacing, and with no extra characters). The values in the type column, "traded" and "local," must also be entirely lowercase.
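
The header and value rules for this file can be checked the same way. The sketch below is an assumption-laden illustration rather than part of the dashboard: the function name `check_cluster_types` and the sample rows are hypothetical, and line numbers in the messages assume one record per line.

```python
import csv
import io

# The only values permitted in the type column, in lowercase.
ALLOWED_TYPES = {"traded", "local"}

def check_cluster_types(csv_file):
    """Return a list of human-readable problems found in a cluster types CSV."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    problems = [
        f"missing column: {name}"
        for name in ("cluster", "type")
        if name not in header
    ]
    if problems:
        return problems  # rows cannot be checked without both columns
    # Data rows start on line 2, right after the header row.
    for line_number, row in enumerate(reader, start=2):
        if row["type"] not in ALLOWED_TYPES:
            problems.append(f"line {line_number}: unexpected type {row['type']!r}")
    return problems

# Hypothetical sample with one capitalized value, which should be flagged.
sample = io.StringIO("cluster,type\nApparel,traded\nAutomotive,Traded\n")
print(check_cluster_types(sample))  # prints ["line 3: unexpected type 'Traded'"]
```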

TopoJSON Data

Each unique region listed under the region column of the industry cluster CSV files should have an associated TopoJSON file containing geospatial data for the respective region. The necessary TopoJSON files may be available through a government website as part of an Open Data initiative.

See the TopoJSON Wiki for more information about this data format.
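
Files downloaded from open data portals are sometimes plain GeoJSON rather than TopoJSON. As a rough sanity check, a TopoJSON topology is a JSON object whose type is "Topology" and which carries an objects member and an arcs array. The sketch below tests only that top-level shape; the function name `looks_like_topojson` is an illustrative assumption, and passing this check does not guarantee that the dashboard will accept the file.

```python
import json

def looks_like_topojson(text):
    """Return True if text parses as JSON with the top-level shape of a TopoJSON topology."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and data.get("type") == "Topology"
        and isinstance(data.get("objects"), dict)
        and isinstance(data.get("arcs"), list)
    )

# An empty topology passes; a GeoJSON FeatureCollection does not.
print(looks_like_topojson('{"type": "Topology", "objects": {}, "arcs": []}'))  # prints True
print(looks_like_topojson('{"type": "FeatureCollection", "features": []}'))    # prints False
```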