3.3.2.Accessing different data sources - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

From external source to a spreadsheet

When you are working in spreadsheets, there are a few different ways you can import data:

Uploading data from a CSV file in Google Sheets/Microsoft Excel

(skipped)

Importing online data using a URL in Google Sheets

In a lot of cases, a data analyst will have an existing spreadsheet and want to add additional data from another file or resource to it. One of the ways they can do this is by using the files URL or web address with the IMPORTDATA function. This function lets you import public datasets you find online to a new or existing spreadsheet. Here is an example of how you import online data:

Step 1: Open a new or existing spreadsheet.

Step 2: Copy the URL of the data source that you want to import data from.

Step 3: Input the =IMPORTDATA function and paste the URL into the parentheses. Remember to put the URL in quotes!

Note: The URL shown above is for illustrative purposes only. The census data is now available as downloadable CSV files.

You can also choose to import a certain range of data from an online source; in that case, you will use the IMPORTRANGE function instead.

Importing online data using a URL in Microsoft Excel

You can also import online data in Microsoft Excel using the From Web option:

Step 1: Open a new or existing spreadsheet.

Step 2: Click Data in the main menu and select the From Web option.

Step 3: Enter the URL and click OK.

Exploring public datasets

Open data helps create a lot of public datasets that you can access to make data-driven decisions. Here are some resources you can use to start searching for public datasets on your own:

  • The Google Cloud Public Datasets allow data analysts access to high-demand public datasets, and make it easy to uncover insights in the cloud.
  • The Dataset Search can help you find available datasets online with keyword searches.
  • Kaggle has an Open Data search function that can help you find datasets to practice with.
  • Finally, BigQuery hosts 150+ public datasets you can access and use.

Public health datasets

  1. Global Health Observatory data: You can search for datasets from this page or explore featured data collections from the World Health Organization.
  2. The Cancer Imaging Archive (TCIA) dataset: Just like the earlier dataset, this data is hosted by the Google Cloud Public Datasets and can be uploaded to BigQuery.
  3. 1000 Genomes: This is another dataset from the Google Cloud Public resources that can be uploaded to BigQuery.

Public climate datasets

  1. National Climatic Data Center: The NCDC Quick Links page has a selection of datasets you can explore.
  2. NOAA Public Dataset Gallery: The NOAA Public Dataset Gallery contains a searchable collection of public datasets.

Public social-political datasets

  1. UNICEF State of the World’s Children: This dataset from UNICEF includes a collection of tables that can be downloaded.
  2. CPS Labor Force Statistics: This page contains links to several available datasets that you can explore.
  3. The Stanford Open Policing Project: This dataset can be downloaded as a .CSV file for your own use.

Test your knowledge on accessing data sources

TOTAL POINTS 3

Question 1

A CSV file saves data in a table format. What does CSV stand for?

  • Comma-separated values
  • Calculated spreadsheet values
  • Compatible scientific variables
  • Cell-structured variables

Correct. CSV stands for comma-separated values.

Question 2

A data analyst wants to bring data from a CSV file into a spreadsheet. This is an example of what process?

  • Normalizing data
  • Importing data
  • Filing data
  • Editing data

Correct. A data analyst bringing data from a CSV file into a spreadsheet is an example of importing data.

Question 3

A CSV file makes it easier for data analysts to complete which tasks? Select all that apply.

  • Examine a small subset of a large dataset
  • Distinguish values from one another
  • Import data to a new spreadsheet
  • Manage multiple tabs within a worksheet

Correct. A CSV file makes it easier for data analysts to examine a small part of a large dataset, import data to a new spreadsheet, and distinguish values from one another.