3.1.3. Explore data types, fields, and values - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Wide data

  • Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject

Long data

  • Data in which each row is one time point per subject, so each subject will have data in multiple rows
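To make the two layouts concrete, here is a minimal Python sketch that stores the same data in wide form and then reshapes it into long form. The bank names, years, and rates are invented for illustration, and `wide_to_long` is a hypothetical helper, not part of any library.

```python
# Hypothetical wide data: one row per subject (bank), one column per year.
wide_rows = [
    {"bank": "Bank A", "2019": 1.5, "2020": 1.2},
    {"bank": "Bank B", "2019": 2.0, "2020": 1.8},
]


def wide_to_long(rows, id_field, var_name, value_name):
    """Return one row per (subject, variable) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier is repeated on every long row instead
            long_rows.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "bank", "year", "rate")
for row in long_rows:
    print(row)
```

Each bank occupies one row in `wide_rows` but two rows in `long_rows`, one per year.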

Transforming data

What is data transformation?

Data transformation is the process of changing the data’s format, structure, or values. As a data analyst, there is a good chance you will need to transform data at some point to make it easier to analyze.

Data transformation usually involves:

  • Adding, copying, or replicating data
  • Deleting fields or records
  • Standardizing the names of variables
  • Renaming, moving, or combining columns in a database
  • Joining one set of data with another
  • Saving a file in a different format. For example, saving a spreadsheet as a comma-separated values (CSV) file.
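A few of the operations listed above can be sketched in Python with only the standard library. The records, field names, and the `transform` helper are assumptions made up for this example. The sketch standardizes field names, deletes a field, and saves the result as CSV text.

```python
import csv
import io

# Hypothetical records with inconsistent field names and an unneeded field.
records = [
    {"Cust Name": "Ana", "ZIP": "94105", "notes": "n/a"},
    {"Cust Name": "Ben", "ZIP": "10001", "notes": "n/a"},
]

RENAMES = {"Cust Name": "customer_name", "ZIP": "zip_code"}  # standardized names
DROP = {"notes"}  # fields to delete


def transform(record):
    """Rename fields to their standard names and drop unwanted fields."""
    return {RENAMES.get(k, k): v for k, v in record.items() if k not in DROP}


cleaned = [transform(r) for r in records]

# Save in a different format: write the cleaned records as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["customer_name", "zip_code"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(cleaned)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory `io.StringIO` buffer keeps the sketch self-contained. Passing a file opened with `open(path, "w", newline="")` to `csv.DictWriter` would write an actual CSV file instead.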

Why transform data?

Goals for data transformation might be:

  • Data organization: better organized data is easier to use
  • Data compatibility: different applications or systems can then use the same data
  • Data migration: data with matching formats can be moved from one system to another
  • Data merging: data with the same organization can be merged together
  • Data enhancement: data can be displayed with more detailed fields
  • Data comparison: apples-to-apples comparisons of the data can then be made

Data transformation example: data merging

Mario is a plumber who owns a plumbing company. After years in the business, he buys another plumbing company. Mario wants to merge the customer information from his newly acquired company with his own, but the other company uses a different database. So, Mario needs to make the data compatible. To do this, he has to transform the format of the acquired company’s data. Then, he must remove duplicate rows for customers the two companies had in common. When the data is compatible and combined, Mario’s plumbing company will have a complete, merged customer database.
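Mario's two steps, transforming the format and then removing duplicates, might look like the following Python sketch. All customer records, the field names `FullName` and `Tel`, and both helper functions are hypothetical. The sketch also assumes that a matching name and phone number identify the same customer, which real data may not guarantee.

```python
# Hypothetical customer records; field names and values are invented.
mario_customers = [
    {"name": "Dana Lee", "phone": "555-0101"},
    {"name": "Omar Diaz", "phone": "555-0102"},
]
acquired_customers = [
    {"FullName": "Omar Diaz", "Tel": "555-0102"},  # also a customer of Mario's
    {"FullName": "Priya Shah", "Tel": "555-0103"},
]


def to_mario_format(record):
    """Transform an acquired-company record into Mario's field layout."""
    return {"name": record["FullName"], "phone": record["Tel"]}


def merge_customers(own, acquired):
    """Combine both lists, keeping the first copy of each (name, phone) pair."""
    merged, seen = [], set()
    for record in own + [to_mario_format(r) for r in acquired]:
        key = (record["name"], record["phone"])
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged


merged = merge_customers(mario_customers, acquired_customers)
for customer in merged:
    print(customer)
```

The shared customer appears only once in `merged`, so the combined list holds three records rather than four.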

Wide data is easier to read and understand. That is why data analysts typically transform long data to wide data more often than they transform wide data to long data. The following table summarizes when each format is preferred:

| Wide data is preferred when | Long data is preferred when |
| --- | --- |
| Creating tables and charts with a few variables about each subject | Storing a lot of variables about each subject. For example, 60 years’ worth of interest rates for each bank |
| Comparing straightforward line graphs | Performing advanced statistical analysis or graphing |
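As a rough sketch of the long-to-wide direction, the Python snippet below collapses long rows into one row per subject. The interest-rate values are invented, and `long_to_wide` is a hypothetical helper written for this example.

```python
# Hypothetical long data: one row per bank per year.
long_rows = [
    {"bank": "Bank A", "year": "2019", "rate": 1.5},
    {"bank": "Bank A", "year": "2020", "rate": 1.2},
    {"bank": "Bank B", "year": "2019", "rate": 2.0},
    {"bank": "Bank B", "year": "2020", "rate": 1.8},
]


def long_to_wide(rows, id_field, var_field, value_field):
    """Collapse long rows into one row per subject, one column per variable."""
    by_subject = {}  # dicts keep insertion order in Python 3.7+
    for row in rows:
        # Start a wide row for a subject the first time it is seen.
        wide_row = by_subject.setdefault(row[id_field], {id_field: row[id_field]})
        wide_row[row[var_field]] = row[value_field]
    return list(by_subject.values())


wide_rows = long_to_wide(long_rows, "bank", "year", "rate")
for row in wide_rows:
    print(row)
```

With only two years per bank the wide form stays compact. With 60 years per bank it would need 60 columns, which is why long data suits that case better.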

Test your knowledge on exploring data types, fields, and values


Question 1

Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.

  • AND
  • NOT
  • OR
  • WITH

Correct. The Boolean operator OR expands the number of results when used in a keyword search.
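The effect of each operator on a keyword search can be illustrated with a small Python sketch. The titles and the `search` function are made up for this example and do not represent how any real search engine works.

```python
# Hypothetical document titles to search.
titles = [
    "running shoes for women",
    "trail running tips",
    "leather shoes care",
    "marathon training plan",
]


def search(term_a, term_b, operator):
    """Return titles matching two keywords combined with AND, OR, or NOT."""
    results = []
    for title in titles:
        words = title.split()
        has_a, has_b = term_a in words, term_b in words
        if operator == "AND":
            match = has_a and has_b      # both keywords required
        elif operator == "OR":
            match = has_a or has_b       # either keyword is enough
        elif operator == "NOT":
            match = has_a and not has_b  # first keyword without the second
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if match:
            results.append(title)
    return results


and_hits = search("running", "shoes", "AND")
or_hits = search("running", "shoes", "OR")
not_hits = search("running", "shoes", "NOT")
print(len(and_hits), len(or_hits), len(not_hits))  # prints: 1 3 1
```

OR returns every title that contains either keyword, so it yields the most results. AND and NOT both narrow the search.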

Question 2

Which of the following statements accurately describes a key difference between wide and long data?

  • Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns.
  • Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
  • Every wide data subject has multiple columns. Every long data subject has data in a single column.
  • Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns.

Correct. Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.

Question 3

What does data transformation enable data analysts to accomplish?

  • Retrieve the data faster
  • Change the structure of the data
  • Restore the data after it has been lost
  • Inspect the data for accuracy

Correct. Data transformation enables data analysts to change the structure of data.