5.2.1. Convert and format data

Converting data in spreadsheets

In this reading, you will learn about converting data from one format to another. One of the ways to help ensure that you have an accurate analysis of your data is by putting all of it in the correct format. This is true even if you have already cleaned and processed your data. As a part of getting your data ready for analysis, you will need to convert and format your data early on in the process.

As a data analyst, there are lots of scenarios when you might need to convert data in a spreadsheet:

String to date

  • How to convert text to date in Excel: Transforming a series of numbers into dates is a common scenario you will encounter. This resource will help you learn how to use Excel functions to convert text and numbers to dates, and how to turn text strings into dates without a formula.

  • Google Sheets: Change date format: If you are working with Google Sheets, this resource will demonstrate how to convert your text strings to dates and how to apply the different date formats available in Google Sheets.

String to numbers

  • How to convert text to number in Excel: Even though you will have values in your spreadsheet that resemble numbers, they may not actually be numbers. This conversion is important because it will allow your numbers to add up and be used in formulas without errors in Excel.
  • How to convert text to numbers in Google Sheets: This resource is useful if you are working in Google Sheets; it will demonstrate how to convert text strings to numbers in Google Sheets. It also includes multiple formulas you can apply to your own sheets, so you can find the method that works best for you.

Combining columns

  • Convert text from two or more cells: Sometimes you may need to merge text from two or more cells. This Microsoft Support page guides you through two distinct ways you can accomplish this task without losing or altering your data. It also includes a step-by-step video tutorial to help guide you through the process.
  • How to split or combine cells in Google Sheets: This guide demonstrates how to split or combine cells in Google Sheets specifically, and is a useful reference if you need to combine cells there. It includes an example using real data.

Number to percentage

  • Format numbers as percentages: Formatting numbers as percentages is a useful skill to have on any project. This Microsoft Support page will provide several techniques and tips for how to display your numbers as percentages.
  • TO_PERCENT: This Google Sheets support page demonstrates how to use the TO_PERCENT formula to convert numbers to percentages. It also includes links to other formulas that can help you convert strings.

Pro tip: Keep in mind that you may have lots of columns of data that require different formats. Consistency is key, and best practice is to make sure an entire column has the same format.

Additional resources

If you find yourself needing to convert other types of data, you can find resources on Microsoft Support for Excel or Google Docs Editors Help for Google Sheets.

Converting data is quick and easy, and the same functions can be used again and again. You can also keep these links bookmarked for future use, so you will always have them ready in case any of these issues arise. Now that you know how to convert data, you are on your way to becoming a successful data analyst.

Hands-On Activity: Combine multiple pieces of data

Question 1

Activity overview

In previous activities, you gained experience using spreadsheet functions for manipulating and cleaning data. In this activity, you’ll use the CONCAT and CONCATENATE functions to help you quickly and efficiently combine multiple pieces of raw data into new data.

By the time you complete this activity, you will be able to use these functions to combine data. This will enable you to simplify and condense data, which is important for processing and cleaning data in your career as a data analyst.

What you will need

To get started, you will need the CONCAT function exercise spreadsheet.

To use the template for the spreadsheet, click the link below and select “Use Template.”

Link to template: CONCAT Function Exercise Spreadsheet

OR

If you don’t have a Google account, you can download the spreadsheet directly from the attachment below:

Dataset for Project_CONCAT function

Using the CONCAT and CONCATENATE functions

Occasionally, you will encounter a dataset with data values in separate cells that you want to combine as a single value in a single cell. This is common when dealing with names and dates. The dataset may have separate columns for first names and last names, but you may want a column with the full names.

City/state and month/year combinations are also often desirable to have together, as they are likely to be recorded together.

The CONCAT function in spreadsheets can combine these kinds of data.

Combining data from two cells

First, using the spreadsheet you downloaded, you’ll combine the two sets of names in columns First Name and Last Name in a new column called Full Name.

To do this, follow these steps:

  1. Click on cell F2. This is where you start the data for the new column. After you click on the cell, type =CONCAT(A2,B2) into the function bar and hit Enter (Windows) or Return (Mac).

Once you press Enter, the combined value appears in cell F2.

You have merged or, technically, concatenated the two data values from cells A2 and B2. Because you listed A2 first in the CONCAT function argument, it comes first in the final result.

Notice that the two names were combined without a space between them.

If you want to put the space in between, you need to use the full CONCATENATE function, which allows you to combine multiple strings.

  2. Click again on cell F2. Change the function to CONCATENATE and place a space in quotes between A2 and B2, separated by commas: =CONCATENATE(A2," ",B2)

Once you press Enter or Return, a space appears between the first name and the last name.

Next, repeat this process for all the remaining cells in Column F. Of course, you don't want to do this manually for each cell. (Especially if the dataset were larger, it would be laborious to do this cell-by-cell.) Luckily, you can fill out the data in the column by using your mouse.

  3. Click on cell F2. Locate the small square in the lower-right corner of the highlighted boundary of the cell.

  4. Click on this square, drag your mouse to the bottom of the column, and release. All the cells in the column should populate with the full name of the appropriate president.

  • Note: While it does not happen in this dataset, you may have extra spaces in your result after you CONCAT. If you notice you have extra spaces, you can use the TRIM function to remove them.
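
For readers who want to see the same logic outside a spreadsheet, the joins above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `concat`, `concatenate`, and `trim` are hypothetical stand-ins for the spreadsheet functions, and the sample values are made up rather than taken from the exercise dataset.

```python
def concat(a, b):
    """Join exactly two values with nothing in between, like =CONCAT(A2,B2)."""
    return str(a) + str(b)


def concatenate(*parts):
    """Join any number of values in order, like =CONCATENATE(A2," ",B2)."""
    return "".join(str(p) for p in parts)


def trim(text):
    """Approximate TRIM: drop leading/trailing spaces and collapse repeats."""
    return " ".join(text.split())


# Made-up stand-ins for the contents of cells A2 and B2.
first_name, last_name = "Ada", "Lovelace"

no_space = concat(first_name, last_name)
with_space = concatenate(first_name, " ", last_name)
trimmed = trim(concatenate(" Ada", "  ", "Lovelace "))

print(no_space)    # AdaLovelace
print(with_space)  # Ada Lovelace
print(trimmed)     # Ada Lovelace
```

The separator is just another argument, which is why the space has to be supplied explicitly as a quoted string.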

Combining data from three cells

The procedure for combining three pieces of data from different cells is almost identical to what you just did. The only difference is that you include a third cell in the full CONCATENATE argument.

Now, combine the month, day, and year into a single data value: Date. This will occupy column G.

  1. Click on the cell where you would like the new data to start. Here, this is cell G2.
  2. Enter the function as =CONCATENATE(C2," ",D2,", ",E2)

Pay particular attention to the extra strings you added between the month and the day, and between the day and the year. This is how you get the spaces and comma in your final result.

  3. Fill out the rest of the column using the same click-and-drag technique as before.

Congratulations! You’ve combined data in spreadsheets using the CONCAT and CONCATENATE functions.
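
Dragging the fill handle applies the same formula to every row. A rough Python equivalent of that idea is shown below; the `rows` data and the `concatenate` helper are hypothetical and do not come from the exercise spreadsheet.

```python
def concatenate(*parts):
    """Join any number of values in order, like the CONCATENATE function."""
    return "".join(str(p) for p in parts)


# Made-up rows laid out like columns A to E:
# first name, last name, month, day, year.
rows = [
    ("Ada", "Lovelace", "December", 10, 1815),
    ("Alan", "Turing", "June", 23, 1912),
]

# Column F: the same two-value formula applied to every row.
full_names = [concatenate(first, " ", last) for first, last, _, _, _ in rows]

# Column G: three values with " " and ", " as the separator strings.
dates = [
    concatenate(month, " ", day, ", ", year)
    for _, _, month, day, year in rows
]

print(full_names)  # ['Ada Lovelace', 'Alan Turing']
print(dates)       # ['December 10, 1815', 'June 23, 1912']
```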

Confirmation and reflections

In an empty cell in the CONCAT function exercise spreadsheet, type the function =CONCATENATE(A7, " ", B7). What does this function return?

A. JohnQuincyAdams

B. John Quincy Adams

C. John QuincyAdams

D. #N/A

The correct answer is B. John Quincy Adams. Explanation: The function =CONCATENATE(A7, " ", B7) returns "John Quincy Adams." CONCATENATE is the correct function here because it can combine more than two strings. Using CONCAT with these three arguments would work in Microsoft Excel but return an error in Google Sheets. Going forward, you can use this distinction to write proper spreadsheet functions.

Identify conditional formatting use cases

Recently, you've been learning about uses for conditional formatting, such as indicating a task’s status by highlighting it red, yellow, or green. Now it’s your turn to think of examples of when you could use conditional formatting in the real world. You may even discover how your life and work can become easier with this practical and useful tool.

Write two or more paragraphs (100-150 words) describing three times when you could use conditional formatting. Then, visit the discussion forum to read what other learners have written, and engage in at least two discussions about their posts.

Transforming data in SQL

Data analysts usually need to convert data from one format to another to complete an analysis. But what if you are using SQL rather than a spreadsheet? Just like spreadsheets, SQL uses standard rules to convert one type of data to another. If you are wondering why data transformation is an important skill to have as a data analyst, think of it like being a driver who is able to change a flat tire. Being able to convert data to the right format speeds you along in your analysis. You don’t have to wait for someone else to convert the data for you.

In this reading, you will go over the conversions that can be done using the CAST function. BigQuery can also convert some data types implicitly (known as coercion), and it offers more specialized functions such as UNIX_DATE to work with dates. UNIX_DATE returns the number of days that have passed since January 1, 1970 and is used to compare and work with dates across multiple time zones. You will likely use CAST most often.
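
To make the "days since January 1, 1970" idea concrete, here is a minimal Python sketch of the same arithmetic. The function name `unix_date` is borrowed for illustration only; this is not BigQuery code.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def unix_date(d):
    """Return the number of whole days between 1970-01-01 and d."""
    return (d - EPOCH).days


print(unix_date(date(1970, 1, 1)))  # 0
print(unix_date(date(1970, 1, 2)))  # 1
print(unix_date(date(2000, 1, 1)))  # 10957
```

Because the result is a plain integer, two dates can be compared or subtracted like any other numbers.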

Common conversions

The following table summarizes some of the more common conversions made with the CAST function. Refer to Conversion Rules in Standard SQL for a full list of functions and associated rules.

Starting with | CAST function can convert to
Numeric (number) | Integer, Numeric (number), Big number, Floating number, String
String | Boolean, Integer, Numeric (number), Big number, Floating number, String, Bytes, Date, Date time, Time, Timestamp
Date | String, Date, Date time, Time, Timestamp

The CAST function (syntax and examples)

CAST is an American National Standards Institute (ANSI) standard SQL function that is available in many SQL dialects, including BigQuery. This section provides the BigQuery syntax and examples of converting the data types in the first column of the previous table. The syntax for the CAST function is as follows:

CAST(expression AS typename)

Where expression is the data to be converted and typename is the data type to be returned.

Converting a number to a string

The following CAST statement returns a string from a numeric value stored in the column MyCount of the table called MyTable.

SELECT CAST(MyCount AS STRING) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • STRING indicates that you are converting the data to a string
  • FROM indicates which table you are selecting the data from
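
If you do not have BigQuery at hand, a similar statement can be tried locally with Python's built-in sqlite3 module. Treat this as a sketch under assumptions: SQLite is a different SQL dialect that spells the string type TEXT rather than STRING, the sample values are made up, and the ORDER BY clause is added only to keep the output order predictable.

```python
import sqlite3

# Build a throwaway in-memory table that mirrors the example names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyCount INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(3,), (42,)])

# SQLite dialect: TEXT plays the role that STRING plays in BigQuery.
query = "SELECT CAST(MyCount AS TEXT) FROM MyTable ORDER BY MyCount"
values = [row[0] for row in conn.execute(query)]
conn.close()

print(values)  # ['3', '42']
```

The quotes in the output show that the numbers came back as strings.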

Converting a string to a number

The following CAST statement returns an integer from a string stored in the column MyVarcharCol of the table called MyTable. (An integer is any whole number.)

SELECT CAST(MyVarcharCol AS INT) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • INT indicates that you are converting the data to an integer
  • FROM indicates which table you are selecting the data from
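
The string-to-integer cast can also be tried locally with Python's sqlite3 module. This is an illustrative sketch, not BigQuery: the sample values are made up, and SQLite is more lenient than BigQuery (it returns 0 for text that is not a number instead of raising an error).

```python
import sqlite3

# Build a throwaway in-memory table that stores numbers as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyVarcharCol TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [("7",), ("120",)])

query = "SELECT CAST(MyVarcharCol AS INT) FROM MyTable ORDER BY rowid"
numbers = [row[0] for row in conn.execute(query)]
conn.close()

print(numbers)       # [7, 120]
print(sum(numbers))  # 127
```

Once the values are integers, arithmetic such as a sum behaves as expected.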

Converting a date to a string

The following CAST statement returns a string from a date stored in the column MyDate of the table called MyTable.

SELECT CAST(MyDate AS STRING) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • STRING indicates that you are converting the data to a string
  • FROM indicates which table you are selecting the data from
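
As a loose analogy in Python, a date object can be turned into text in the YYYY-MM-DD form. The variable name `my_date` and its value are hypothetical, and this illustrates the idea of the conversion rather than BigQuery's implementation.

```python
from datetime import date

my_date = date(2021, 3, 5)  # made-up sample value

# isoformat() produces the YYYY-MM-DD text form of the date.
as_string = my_date.isoformat()

print(as_string)              # 2021-03-05
print("Due on " + as_string)  # Due on 2021-03-05
```

After the conversion the value can be joined with other text, which is not possible with the date object itself.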

Converting a date to a datetime

Datetime values have the format YYYY-MM-DD hh:mm:ss, so date and time are retained together. The following CAST statement returns a datetime value from a date.

SELECT CAST(MyDate AS DATETIME) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • DATETIME indicates that you are converting the data to a datetime value
  • FROM indicates which table you are selecting the data from
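
A date has no time component, so converting it to a datetime has to supply one; midnight is the usual choice. The Python sketch below shows that idea with a made-up value and is an analogy, not BigQuery code.

```python
from datetime import date, datetime, time

my_date = date(2021, 3, 5)  # made-up sample value

# Pair the date with midnight (time.min is 00:00:00) to get a datetime.
as_datetime = datetime.combine(my_date, time.min)

print(as_datetime)  # 2021-03-05 00:00:00
```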

The SAFE_CAST function

If a CAST conversion fails, the query returns an error in BigQuery. To avoid this, use the SAFE_CAST function instead. SAFE_CAST returns NULL instead of raising an error when a conversion fails.

The syntax for SAFE_CAST is the same as for CAST. Simply substitute the function directly in your queries. The following SAFE_CAST statement returns a string from a date.

SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable
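
The difference between the two functions is easiest to see with a value that cannot be converted. The following Python sketch mimics the behavior with a hypothetical helper named `safe_cast`; Python's None stands in for SQL's NULL.

```python
def safe_cast(value, to_type):
    """Convert value with to_type, returning None if the conversion fails."""
    try:
        return to_type(value)
    except (TypeError, ValueError):
        return None


print(safe_cast("42", int))     # 42
print(safe_cast("forty", int))  # None

# A plain conversion, like CAST, raises an error on the same input.
try:
    int("forty")
    plain_cast_failed = False
except ValueError:
    plain_cast_failed = True

print(plain_cast_failed)  # True
```

Returning an empty value keeps the rest of the rows flowing, at the cost of having to check for missing results afterward.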

More information

Browse these resources for more information about data conversion using other SQL dialects (instead of BigQuery):

Test your knowledge on converting and formatting data

Question 1

A spreadsheet cell contains the coldest temperature ever recorded in New Zealand: -22 °Celsius. What function will display that temperature in Fahrenheit?

A. =CONVERT(-22, "C", "F")

B. =CONVERT(-22, C, F)

C. =CONVERT(-22, "F", "C")

D. =CONVERT(-22, F, C)

The correct answer is A. =CONVERT(-22, "C", "F"). Explanation: =CONVERT(-22, "C", "F") will display -22 °C in Fahrenheit.
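
The arithmetic behind this conversion is the standard formula F = C × 9/5 + 32. A small Python check is sketched below; the function name `celsius_to_fahrenheit` is hypothetical and is not a spreadsheet function.

```python
def celsius_to_fahrenheit(celsius):
    """Apply F = C * 9 / 5 + 32."""
    return celsius * 9 / 5 + 32


result = celsius_to_fahrenheit(-22)

# Round to one decimal place to hide floating-point noise.
print(round(result, 1))  # -7.6
```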

Question 2

A data analyst wants to ensure spreadsheet formulas continue to run correctly, even if someone enters the wrong data by mistake. Which data-validation menu option should they select to flag data entry errors?

A. Remove Validation

B. Reject Invalid Inputs

C. Forbid Entry

D. Deny Help Text

The correct answer is B. Reject Invalid Inputs. Explanation: To ensure spreadsheet formulas continue to run correctly, even if someone enters the wrong data by mistake, select Reject Invalid Inputs to flag that data as invalid.

Question 3

A data analyst clicks on the Format Cells If drop-down menu and selects the option Text Is Exactly November. This changes the color of all the cells that contain the word November. What spreadsheet tool is the analyst using?

A. Data validation

B. CONVERT

C. Filtering

D. Conditional formatting

The correct answer is D. Conditional formatting. Explanation: The data analyst is using conditional formatting. Conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions.