5.2.2.Combine multiple datasets - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Optional: Prepare to use the bike sharing dataset in BigQuery

The next video demonstrates how to use CONCAT in a SQL query to return data from two columns in a single column.

If you would like to follow along with the instructor, you will need to log in to your BigQuery account to use the open (public) dataset called new_york_citibike. If you need a refresher, the reading Using BigQuery in the Prepare Data for Exploration course explains how to set up a BigQuery account.

Prepare for the next video

Step 1: In the BigQuery Explorer, enter citibike in the search bar to locate the new_york_citibike dataset under bigquery-public-data.

xXfe-9dRQm6Ssfp6zu0sbg_7e731e42df9547fe9db054b02f3f90f1_1 citibike

Step 2: Click the citibike_trips table, then click the Preview tab to view the data in the table. You may notice the first 50 observations are null across each column, but note the total amount of observations equal 58,937,715. Click on the arrow button that will take you to the end of the list, and you will see data populate each column. At this point you are beginning to understand why BigQuery is used to search this table instead of downloading ~59M rows of data!

w_0omp8EQOa9KJqfBFDmag_ab7c517470fe4be39e57ef81b38e4af1_preview_citibkie_trips

What to expect from the query

You will be using CONCAT to combine the data in the start_station_name column with the data in the end_station_name column to create the route information in another column; for example, the route from Station 509 to Station 442 in the first row of the table above would be 9 Ave & W 22 St to W 27 St & 7 Ave, a combination of the start and end station names.

Manipulating strings in SQL

Knowing how to convert and manipulate your data for an accurate analysis is an important part of a data analyst’s job. In this reading, you will learn about different SQL functions and their usage, especially regarding string combinations.

A string is a set of characters that helps to declare the texts in programming languages such as SQL. SQL string functions are used to obtain various information about the characters, or in this case, manipulate them. One such function, CONCAT, is commonly used. Review the table below to learn more about the CONCAT function and its variations.

Function Usage Example
CONCAT A function that adds strings together to create new text strings that can be used as unique keys CONCAT('Google','.com');
CONCAT_WS A function that adds two or more strings together with a separator CONCAT_WS('.','www','google','com')

* The separator (being the period) gets input before and after Google when you run the SQL function
CONCAT with + Adds two or more strings together using the + operator 'Google'+'.com'

CONCAT at work

When adding two strings together such as ‘Data’ and ‘analysis’, it will be input like this:

  • SELECT CONCAT (‘Data’, ‘analysis’);

The result will be:

  • Dataanalysis

Sometimes, depending on the strings, you will need to add a space character, so your function should actually be:

  • SELECT CONCAT (‘Data’, ‘ ‘, ‘analysis’);

And the result will be:

  • Data analysis

The same rule applies when combining three strings together. For example,

  • SELECT CONCAT (‘Data’,’ ‘, ‘analysis’, ‘ ‘, ‘is’, ‘ ‘, ‘awesome!’);

And the result will be

  • Data analysis is awesome!

Practice makes perfect

W3 Schools is an excellent resource for interactive SQL learning, and the following links will guide you through transforming your data using SQL:

  • SQL functions : This is a comprehensive list of functions to get you started. Click on each function, where you will learn about the definition, usage, examples, and even be able to create and run your own query for practice. Try it out for yourself!
  • SQL Keywords : This is a helpful SQL keywords reference to bookmark as you increase your knowledge of SQL. This list of keywords are reserved words that you will use as your need to perform different operations in the database grows.
  • While this reading went through the basics of each of these functions, there is still more to learn, and you can even combine your own strings.
  1. Practice using CONCAT
  2. Practice using CONCAT WS
  3. Practice using CONCAT with +

Pro tip: The functions presented in the resources above may be applied in slightly different ways depending on the database that you are using (e.g. mySQL versus SQL Server). But, the general description provided for each function will prepare you to customize how you use these functions as needed.

Ungraded Plugin: SQL Syntax

Do you know your SQL queries?

Match each definition with the correct query. First, select a definition. Then, select a query to find out if it’s a match.

image

image

Learning Log: A data analysis checklist

Overview

Z65IW3QCSOmuSFt0Aijp8w_914270a3d2e84027b46e404f7a52007f_line-y

By now, you’ve already learned a lot about the steps involved in analyzing data. As you begin your career as a data analyst, you’ll need to analyze a lot of it! In this learning log entry, you’ll start developing a data analysis checklist. This will help you stay organized during your analysis, even when you have a lot of data to deal with in the future.

Create your checklist

Z65IW3QCSOmuSFt0Aijp8w_914270a3d2e84027b46e404f7a52007f_line-y

In the learning log template linked below, start a checklist of tasks you need to complete during analysis using the steps you have already learned: Ask, Prepare, Process, Analyze, Share and Act. This way, you can always remember what you need to do and in what order. As your projects get more complicated, you can add to and adapt your checklist to fit any situation.

For now, your checklist can be a high-level overview of the steps you need to take to complete your analysis; for example, you might include “clean the data” as a high-level task under the Process stage. You will return to your checklist again later in this course to add more detail and finish creating your checklist. You can also continue adding to this checklist whenever you have new ideas for it!

Access your learning log To use the template for this course item, click the link below and select “Use Template.”

Link to learning log template: Start a data analysis checklist

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Learning Log Template_ Start a data analysis checklist

Reflection

Z65IW3QCSOmuSFt0Aijp8w_914270a3d2e84027b46e404f7a52007f_line-y

When you’re finished, think about the process of creating your checklist. Write 3-5 sentences (60-100 words) in your learning log answering the following questions:

  • Did making a checklist help you better understand the data analysis process?
  • How can you adapt your checklist to fit each new project?

When you’ve finished your entry in the learning log template, make sure to save the document so your response is somewhere accessible. This will help you continue applying data analysis to your everyday life. You will also be able to track your progress and growth as a data analyst.

Test your knowledge on combining multiple datasets

Question 1

Fill in the blank: In SQL, _____ can be used to combine strings from multiple tables in order to create a new string.

A. CONCATENATE

B. CONCAT

C. CONNECT

D. COMBINE

The correct answer is B. CONCAT. Explain: In SQL, CONCAT can be used to combine strings from multiple tables in order to create a new string.

Question 2

You are working with a database table that contains data about playlists for different types of digital media. You are only interested in the first 4 playlists.

You write the SQL query below. Add a LIMIT clause that will return only the first 4 playlists.

SELECT 
*
FROM 
playlist
LIMIT 4

Output:

+-------------+------------+
| playlist_id | name       |
+-------------+------------+
|           1 | Music      |
|           2 | Movies     |
|           3 | TV Shows   |
|           4 | Audiobooks |
+-------------+------------+

What playlist appears in row 2 of your query result?

A. TV Shows

B. Audiobooks

C. Music

D. Movies

The correct answer is D. Movies. Explain: The clause LIMIT 4 will return only the first 4 playlists. The complete query is SELECT * FROM playlist LIMIT 4. The LIMIT clause sets a limit on the number of rows your query returns.

The Movies playlist appears in row 2 of your query result.

Question 3

What function can be used to return the number of characters in cell B8 so you can confirm that it contains exactly 20 characters?

A. =LEN(20, B8)

B. =LEN(B8)

C. =LEN(B8, 20)

D. =LEN(20)

The correct answer is B. =LEN(B8). Explain: The function =LEN(B8) will display the number of characters in cell B8. The LEN function returns the length of a string of text by counting the number of characters it contains.

⚠️ **GitHub.com Fallback** ⚠️