5.3.1. VLOOKUP for data aggregation

VLOOKUP core concepts

Functions can be used to quickly find information and perform calculations using specific values. In this reading, you will learn about the importance of one such function, VLOOKUP, or Vertical Lookup, which searches for a certain value in a spreadsheet column and returns a corresponding piece of information from the row in which the searched value is found.

When do you need to use VLOOKUP?

Two common reasons to use VLOOKUP are:

  • Populating data in a spreadsheet
  • Merging data from one spreadsheet with data in another

VLOOKUP syntax

The VLOOKUP function is available in both Microsoft Excel and Google Sheets. This reading introduces the general syntax in Google Sheets. (You can refer to the resources at the end of this reading for more information about VLOOKUP in Microsoft Excel.)

VLOOKUP(10003, A2:B26, 2, FALSE)

Here is the syntax:

VLOOKUP(search_key, range, index, [is_sorted])

search_key

  • The value to search for.
  • For example, 42, "Cats", or I24.

range

  • The range to consider for the search.
  • The first column in the range is searched to locate data matching the value specified by search_key.

index

  • The column index of the value to be returned, where the first column in range is numbered 1.
  • If index is not between 1 and the number of columns in range, #VALUE! is returned.

is_sorted

  • Indicates whether the column to be searched (the first column of the specified range) is sorted. TRUE by default.
  • It’s recommended to set is_sorted to FALSE. If set to FALSE, an exact match is returned. If there are multiple matching values, the content of the cell corresponding to the first value found is returned, and #N/A is returned if no such value is found.
  • If is_sorted is TRUE or omitted, the nearest match (less than or equal to the search key) is returned. If all values in the search column are greater than the search key, #N/A is returned.
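The matching rules above can be modeled outside a spreadsheet. The following Python sketch is only an approximation of the behavior described in this reading, not the spreadsheet's actual implementation: the function name vlookup, the NA marker, and the sample table are all made up for illustration, and details such as how text is compared are simplified.

```python
from bisect import bisect_right

NA = "#N/A"  # stand-in for the spreadsheet error value


def vlookup(search_key, table, index, is_sorted=True):
    """Rough model of VLOOKUP over a list of rows (each row is a list)."""
    if not 1 <= index <= len(table[0]):
        return "#VALUE!"
    if not is_sorted:
        # Exact match: return from the first row whose key equals search_key.
        for row in table:
            if row[0] == search_key:
                return row[index - 1]
        return NA
    # Approximate match: nearest key <= search_key (assumes ascending order).
    first_column = [row[0] for row in table]
    position = bisect_right(first_column, search_key)
    if position == 0:
        return NA  # every key is greater than search_key
    return table[position - 1][index - 1]


table = [[10001, "Ana"], [10003, "Ben"], [10007, "Cy"]]  # made-up data
print(vlookup(10003, table, 2, False))  # exact match found
print(vlookup(10005, table, 2, False))  # no exact match
print(vlookup(10005, table, 2, True))   # falls back to the row for 10003
```

The last two calls show why FALSE is the safer default: with TRUE, a key that does not exist still returns a value from a neighboring row.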

What if you get #N/A?

As you have just read, #N/A indicates that a matching value can't be returned as a result of the VLOOKUP. The error doesn’t mean that anything is actually wrong with the data, but people might have questions if they see the error in a report. You can use the IFNA function to replace the #N/A error with something more descriptive, like “Does not exist.”

IFNA(#N/A, "Does not exist")

Here is the syntax:

IFNA(value, value_if_na)

value

  • This is a required value.
  • The value to check, typically a formula such as a VLOOKUP that may return #N/A.

value_if_na

  • This is a required value.
  • The value the function returns when the first argument evaluates to #N/A; otherwise, the first argument is returned unchanged.
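In practice IFNA usually wraps the lookup itself, in the pattern IFNA(VLOOKUP(...), "Does not exist"). The Python sketch below mirrors that pattern under simplifying assumptions: the ifna helper, the NA marker, and the sample names and hours are hypothetical and stand in for spreadsheet cells.

```python
NA = "#N/A"  # stand-in for the spreadsheet error value


def ifna(value, value_if_na):
    """Return value_if_na when value is the #N/A marker, else value."""
    return value_if_na if value == NA else value


hours_by_name = {"Chan, Daniel": 8}  # made-up lookup table

# dict.get plays the role of an exact-match lookup that may fail.
found = ifna(hours_by_name.get("Chan, Daniel", NA), "Does not exist")
missing = ifna(hours_by_name.get("Lee, Sam", NA), "Does not exist")
print(found)
print(missing)
```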

Helpful VLOOKUP reminders

  • TRUE means an approximate match, FALSE means an exact match on the search key. If the data used for the search key is sorted, TRUE can be used.
  • You want the column that matches the search key in a VLOOKUP formula to be on the left side of the data. VLOOKUP only looks at data to the right after a match is found. In other words, the index for VLOOKUP indicates columns to the right only. This may require you to move columns around before you use VLOOKUP.
  • After you have populated data with the VLOOKUP formula, you may copy and paste the data as values only to remove the formulas so you can manipulate the data again.

VLOOKUP resources for Microsoft Excel

VLOOKUP may slightly differ in Microsoft Excel, but the overall concepts can still be generally applied. Refer to the following resources if you are working with Excel.

Hands-On Activity: Using VLOOKUP

Question 1

Activity overview

Earlier, you learned about VLOOKUP, a function that uses vertical lookup to find specific values in a spreadsheet. In this activity, you will practice using VLOOKUP to consolidate information between two spreadsheets, clean data, and create a summary table from a query.

By the time you complete this activity, you will be able to use VLOOKUP to complete a variety of tasks in spreadsheets. This will enable you to clean and analyze data more efficiently, which is important for working with large datasets in your career as a data analyst.

What you will need

To get started, first access the VLOOKUP Practice Worksheet.

Click the link to the worksheet to create a copy. If you don’t have a Google account, you may download the VLOOKUP Practice Worksheet directly from the attachments below.

Link to the worksheet: VLOOKUP Practice Worksheet

OR

Download VLOOKUP Practice Worksheet: VLOOKUP Practice Sheet

Search with VLOOKUP

Although you would usually clean your data prior to using VLOOKUP, this first step will illustrate why it’s important to clean data first.

Imagine your research requires you to know how many hours an employee worked on a specific date. This is easy to do manually on a small spreadsheet and becomes harder as the amount of information grows or is spread across multiple spreadsheets. The VLOOKUP function provides a way to have the spreadsheet gather the information for you.

Assume you need to figure out how many hours the employee Daniel Chan worked on January 3, 2020. In the spreadsheet you downloaded, it is easy to notice which row contains Daniel’s name. But imagine if you had thousands of employees in your spreadsheet. It might not be easy to find his name without searching each cell. In this step, you are going to use Daniel Chan’s name as the lookup_value, sometimes known as a search key, in VLOOKUP.

The syntax for the VLOOKUP function is =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]), where range_lookup is either true or false.

Search for the number of hours Daniel Chan worked on January 3, 2020.

  1. In B11 enter Chan, Daniel.
  2. In B12 enter =VLOOKUP(B11, B2:E6, 4, false).

As a refresher, this syntax means that the lookup value is contained in cell B11, the table array contains cells B2 through E6, you want to return the value from column 4 of this array, and you want an exact match. Remember that the column number is counted within the array, which represents the limits of your query, not within the whole sheet.

  3. Press Enter (Windows) or Return (Mac). The cell will now contain an error, #N/A.

Notice that the entry for Daniel Chan has extra spaces after the comma. Because B11 does not contain those extra spaces, the search comes back with an error.

One option to fix this is to adjust the number of spaces until you get an exact match. However, this is not very efficient, and if you could already identify the name, you would probably just use its cell reference for your query.

The best way to handle this is to trim any extra spaces in the data. This is why it’s important to clean your data prior to using VLOOKUP.
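The failed lookup comes down to a plain string comparison. The Python sketch below shows the same effect with a hypothetical cell value; the trim helper only approximates the spreadsheet TRIM function (it drops leading and trailing spaces and collapses repeated spaces) and is not the spreadsheet's own code.

```python
import re


def trim(text):
    """Approximate TRIM: drop leading/trailing spaces, collapse repeated ones."""
    return re.sub(r" {2,}", " ", text).strip(" ")


raw_name = "Chan,   Daniel"   # hypothetical cell with extra spaces after the comma
search_key = "Chan, Daniel"   # what was typed into the lookup cell

print(raw_name == search_key)        # exact match fails on the raw value
print(trim(raw_name) == search_key)  # and succeeds once the spaces are trimmed
```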

Prepare the data

Now you will prepare the data to help you more easily figure out how many hours employees worked. You first need to clean and label the data. Then, you can combine data from two spreadsheets using the trusty VLOOKUP function.

Clean and label the data

To trim the data, follow these steps:

  1. In cell B15 type =trim(B2).
  2. Click and drag down the bottom-right corner of the cell until you reach B19. The rest of the names will populate.

For this exercise, you are not writing the trimmed data back into the original table. There are many cases where you need to clean the data for your own use, but you do not want to change the dataset you are working with.

  1. Scroll below the original data. In cell C15 type =value(C2).
  2. Click on the bottom-right corner of the cell and drag the cell down to populate the hours for the other employees.
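The =value(...) step converts hours stored as text into numbers so that they can be summed later. A minimal Python sketch of the same idea follows; the value helper handles only plain numerals, and the sample cells are made up rather than taken from the worksheet.

```python
def value(text):
    """Approximate VALUE for plain numerals: parse text into a number."""
    number = float(text)
    return int(number) if number.is_integer() else number


hours_as_text = ["8", "7.5", "0"]  # made-up cells stored as text
hours = [value(cell) for cell in hours_as_text]
print(hours)
print(sum(hours))
```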

It’s also helpful to label the different columns for the data. Working with data gets messy quickly, and it is important to keep track of your value references.

Enter in the following labels:

  • B14: Names
  • C14 – H14: (Enter in the dates 1/1/2020 through 1/6/2020)
  • I14: Hours
  • J14: Pay Rate
  • K14: Total Pay

Populate and sum the remaining hours

Use cells already populated in C15 through C19 to populate the remaining hours needed for each employee.

  1. Click and drag the corner of C15 to H15 to populate the remaining hours for Daniel Chan.
  2. Repeat this process for the remaining employees.

Now, fill in the Hours column for the employees.

  1. In cell I15 type =sum(C15:H15).
  2. Click and drag down the bottom corner of cell I15 to populate the sums for the remaining employees.

Import pay rate data

You might have noticed that pay information is missing from the spreadsheet. Data analysis often requires importing information from different data sources. In this case, the data required is on Sheet2. To import this data:

  1. Click on Sheet2, which you can find at the bottom of the spreadsheet. Employee ID, date of hire (DOH), status, and pay rate are the data found on this sheet.

Now, use VLOOKUP to import pay rate data.

  2. In J15 (of Sheet1) type: =VLOOKUP(A2, Sheet2!$A$2:$D$6, 4, false). Consider the syntax for this VLOOKUP function:
  • A2 refers to cell A2 in Sheet1, which holds the employee ID. In Sheet2 the rate of pay, and related fields, are referenced by ID instead of employee name, so you need to use the employee ID to import the pay rate from Sheet2.
  • Sheet2! refers to the sheet from which you want to access the data.
  • $A$2:$D$6 refers to the cells that make up the table array. The $ placed in front of the column letters and row numbers locks the range so that it stays fixed when the formula is copied by dragging down cell J15 to import the pay rate for the other employees.
  • 4 refers to the column from which the returned value will come. 4 means that the returned value will come from the 4th column in the selected array.
  • false signifies that you want an exact match to the lookup value. If you put true instead, VLOOKUP would return an approximate match (the closest value less than or equal to the lookup value). This is not used very often in real-world situations.
  3. Populate the pay rate for the remaining employees by dragging down the corner of the cell to copy the formula.

Now, calculate total pay.

  1. In K15 type =product(I15, J15).
  2. Drag cell K15 down to populate the total pay for the remaining employees.
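The import-and-multiply steps above amount to joining two tables on employee ID and then computing hours times rate. The Python sketch below illustrates that logic with dictionaries; the IDs, hours, and pay rates are invented for the example and are not the worksheet's values.

```python
# Made-up rows; the real worksheet values are not reproduced here.
hours_by_id = {101: 40, 102: 32}         # Sheet1: employee ID -> total hours
pay_rate_by_id = {101: 25.0, 102: 30.0}  # Sheet2: employee ID -> pay rate

total_pay = {}
for employee_id, hours in hours_by_id.items():
    # Exact match on the ID, like VLOOKUP with false as the last argument.
    rate = pay_rate_by_id.get(employee_id)
    if rate is None:
        total_pay[employee_id] = "#N/A"  # no matching ID on the second sheet
    else:
        total_pay[employee_id] = hours * rate  # same as =product(hours, rate)

print(total_pay)
```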

Create a summary table

Now that the data is clean and includes pay rate information, you can create a summary table, or pivot table. The following section demonstrates how to create a pivot table in Google Sheets. If you are using Excel, please follow the documentation for how to manually create a Pivot Table in Excel.

In Google Sheets, create a table for data in cells (B14:K19) using the following steps:

  1. Select the data in cells (B14:K19).
  2. Click on the Insert tab, then select Pivot Table.
  3. A pop-up window will display. Click on New Sheet, then click the Create button.

On the side of the new sheet, the Pivot table editor will display. The pivot table you are creating will contain each employee’s name, pay rate, and total pay. Follow these steps to create the pivot table:

  1. Click the Add button for Rows. Select Names.
  2. Click the Add button for Values. Select Pay Rate.
  3. Click the Add button for Values again. Select Total Pay.

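The resulting pivot table should list each employee’s name in the rows, with one column for the sum of Pay Rate and one for the sum of Total Pay.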

Next, convert the cells in the Sum of Total Pay column to currency.

  1. Select the cells in the column Sum of Total Pay.
  2. Click on the $ symbol on the toolbar.

Alternatively, you can also click on the Format tab, select Number, then select Currency.
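Conceptually, the pivot table groups the labeled rows by Names and sums the Pay Rate and Total Pay columns. The Python sketch below reproduces that grouping and the currency formatting; the rows are hypothetical, and the dictionary-based approach is just one way to illustrate the idea, not how Google Sheets computes it.

```python
from collections import defaultdict

# Hypothetical labeled rows: (Names, Pay Rate, Total Pay)
rows = [
    ("Chan, Daniel", 25.0, 1000.0),
    ("Patel, Anika", 30.0, 960.0),
]

# Rows -> Names; Values -> sum of Pay Rate and sum of Total Pay.
summary = defaultdict(lambda: {"pay_rate": 0.0, "total_pay": 0.0})
for name, pay_rate, total in rows:
    summary[name]["pay_rate"] += pay_rate
    summary[name]["total_pay"] += total

for name in sorted(summary):
    totals = summary[name]
    # Format Total Pay as currency, like clicking the $ toolbar button.
    print(f"{name}: {totals['pay_rate']:.2f}, ${totals['total_pay']:,.2f}")
```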

Congratulations! You have now used VLOOKUP and created a pivot table, two essential tools for analyzing data in spreadsheets.

Confirmation and reflection

Imagine the employee Anika Patel asks you to confirm her pay rate. Without using the pivot table, which VLOOKUP function would return her pay rate based on the imported data on Sheet1?

A. =VLOOKUP(B19, B15:J19, 9, false)

B. =VLOOKUP(B20, B15:J20, 8, false)

C. =VLOOKUP(B19, B15-J19, 9, false)

D. =VLOOKUP(B19, B15:J19, 8, false)

The correct VLOOKUP function is A. =VLOOKUP(B19, B15:J19, 9, false). Explanation: Pay Rate (column J) is the 9th column of the range B15:J19, so this formula returns the value 3000, which is Anika Patel’s pay rate. You can use VLOOKUP to find problems or inconsistencies like this in your data. Going forward, you can apply your knowledge of VLOOKUP to find, trim, and convert important information in large datasets.
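The index in the correct answer is counted from the first column of the range, not from column A. A small Python sketch of that counting follows; the column_index helper is hypothetical and handles single-letter columns only.

```python
def column_index(first_column, target_column):
    """1-based position of target_column within a range starting at first_column.

    Handles single-letter columns (A-Z) only.
    """
    return ord(target_column) - ord(first_column) + 1


print(column_index("B", "J"))  # Pay Rate (J) within B15:J19
print(column_index("B", "I"))  # Hours (I) within the same range
```

With the range starting at B, column J is position 9 and column I (Hours) is position 8, which is why the options that use 8 would return hours rather than the pay rate.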

Question 2

In this activity, you used the VLOOKUP function to find values within a spreadsheet. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:

  • Why is it important to clean and label data when using VLOOKUP and pivot tables?
  • What are some other ways that you might use VLOOKUP with the data in this table?

Explanation: Congratulations on completing this hands-on activity! A good response would include how VLOOKUP can be used to query and trim data, convert text data to numeric data, and create a summary table from queried information.

If you are analyzing hours worked, you could use VLOOKUP to query your clean and well-labeled dataset to assess employee status and determine who is full-time, part-time, or on leave. You could also use it to analyze the relationships between pay rates and hire date or status. VLOOKUP is an extremely helpful tool that you can use to analyze data in many situations.