3.1.4.Explore data types, fields, and values - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Understanding Boolean logic

In this reading, you will explore the basics of Boolean logic and learn how to use multiple conditions in a Boolean statement. These conditions are created with Boolean operators, including AND, OR, and NOT. These operators are similar to mathematical operators and can be used to create logical statements that filter your results. Data analysts use Boolean statements to do a wide range of data analysis tasks, such as creating queries for searches and checking for conditions when writing programming code.

qhCTUshlRO6Qk1LIZdTupg_5f70c065e99d419a9d5452843c0fc8fe_Screen-Shot-2021-03-04-at-8 19 16-PM

Boolean logic example

Imagine you are shopping for shoes, and are considering certain preferences:

  • You will buy the shoes only if they are pink and grey
  • You will buy the shoes if they are entirely pink or entirely grey, or if they are pink and grey
  • You will buy the shoes if they are grey, but not if they have any pink

Below are Venn diagrams that illustrate these preferences. AND is the center of the Venn diagram, where two conditions overlap. OR includes either condition. NOT includes only the part of the Venn diagram that doesn't contain the exception.

oFNB3TU3TkOTQd01N75DlQ_b5bddb0c244a487db6cc5a586645106d_DA-C3M1L4R1

The AND operator

Your condition is “If the color of the shoe has any combination of grey and pink, you will buy them.” The Boolean statement would break down the logic of that statement to filter your results by both colors. It would say “IF (Color=”Grey”) AND (Color=”Pink”) then buy them.” The AND operator lets you stack multiple conditions.

Below is a simple truth table that outlines the Boolean logic at work in this statement. In the Color is Grey column, there are two pairs of shoes that meet the color condition. And in the Color is Pink column, there are two pairs that meet that condition. But in the If Grey AND Pink column, there is only one pair of shoes that meets both conditions. So, according to the Boolean logic of the statement, there is only one pair marked true. In other words, there is one pair of shoes that you can buy.

Color is grey Color is Pink If Grey AND Pink, then Buy Boolean Logic
Grey/True Pink/True True/Buy True AND True = True
Grey/True Black/False False/Don't buy True AND False = False
Red/False Pink/True False/Don't buy False AND True = False
Red/False Green/False False/Don't buy False AND Flase = False

The OR operator

The OR operator lets you move forward if either one of your two conditions is met. Your condition is “If the shoes are grey or pink, you will buy them.” The Boolean statement would be “IF (Color=”Grey”) OR (Color=”Pink”) then buy them.” Notice that any shoe that meets either the Color is Grey or the Color is Pink condition is marked as true by the Boolean logic. According to the truth table below, there are three pairs of shoes that you can buy.

Color is grey Color is Pink If Grey OR Pink, then Buy Boolean Logic
Red/False Black/False False/Don't buy False OR False = False
Black/False Pink/True True/Buy False OR True = True
Grey/True Green/False True/Buy True OR False = True
Grey/True Pink/True True/Buy True OR True = True

The NOT operator

Finally, the NOT operator lets you filter by subtracting specific conditions from the results. Your condition is "You will buy any grey shoe except for those with any traces of pink in them." Your Boolean statement would be “IF (Color="Grey") AND (Color=NOT “Pink”) then buy them.” Now, all of the grey shoes that aren't pink are marked true by the Boolean logic for the NOT Pink condition. The pink shoes are marked false by the Boolean logic for the NOT Pink condition. Only one pair of shoes is excluded in the truth table below.

Color is Grey Color is Pink Boolean Logic for NOT Pink If Grey AND (NOT Pink), then Buy Boolean Logic
Grey/True Red/False Not False = True True/Buy True AND True = True
Grey/True Black/False Not False = True True/Buy True AND True = True
Grey/True Green/False Not False = True True/Buy True AND True = True
Grey/True Pink/True Not True = False False/Don't buy True AND False = False

The power of multiple conditions

For data analysts, the real power of Boolean logic comes from being able to combine multiple conditions in a single statement. For example, if you wanted to filter for shoes that were grey or pink, and waterproof, you could construct a Boolean statement such as: “IF ((Color = “Grey”) OR (Color = “Pink”)) AND (Waterproof=“True”).” Notice that you can use parentheses to group your conditions together.

Whether you are doing a search for new shoes or applying this logic to your database queries, Boolean logic lets you create multiple conditions to filter your results. And now that you know a little more about how Boolean logic is used, you can start using it!

Additional Reading/Resources

Hands-On Activity: Applying a function

Question 1

Activity overview

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

In previous lessons, you got familiar with spreadsheets and data structures. In this activity, you will write functions in spreadsheets.

As a reminder, a function is a preset command that automatically performs a specified process or task using the data in a spreadsheet.

By the time you complete this activity, you will be able to apply the SUM function in spreadsheet software such as Google Sheets and Microsoft Excel. This will enable you to create dynamic spreadsheets, which are important for organizing and understanding data in your career as a data analyst.

What you will need

To get started, first access the example spreadsheet of someone tracking their entertainment expenses.

To use the spreadsheet for this course item, click the link below and select Use Template.

Link to example spreadsheet: Entertainment Expenses

OR

If you don’t have a Google account, you can download the spreadsheet directly from the attachment below.

Apply the SUM function

First, open the example spreadsheet. You will find the table below that contains data on the monthly entertainment expenses for January and February:

HHfYKHakQ8232Ch2pKPNwQ_ba71e3dedcf0438a83dc5024765d4ce4_DAC3M1L4HO1_ss1

Now, working in the example spreadsheet, you will write functions in cells to carry out certain tasks.

Your first goal is to fill in the cells B7 and C7. Each of these cells is supposed to be the sum of the numbers in the cells above it. For example, B7 should be the sum total of the numbers in the cells B2 to B6. To achieve this result:

  1. Click on cell B7. The cell should have its border highlighted.

  2. With that cell selected, type =SUM(B2:B6) like in the figure below.

Notice that this function both shows up in the cell and the field above the table. This field is called the formula bar. Once you’ve clicked on a cell, typing in the formula bar is the same thing as typing directly into the cell.

The argument of the SUM function is the expression B2:B6. This expression represents a range of values starting from the first cell in the range (B2) to the last cell in the range (B6). The word SUM instructs the spreadsheet to add up the values in that range of cells. This works similarly if you wish to add across the rows instead.

  1. Press Enter (Windows) or Return (Mac OS). The result below is what you should get.

You will find that the SUM function in B7 is replaced by the numerical value (311.96) that is the sum of the numbers in cells B2 through B6. If the value in cell B7 is not equal to 311.96, check the function to ensure you have the correct range. The formula bar, however, still contains the SUM function. This is to inform people reading the spreadsheet how the value in cell B7 was determined.

Find errors in functions

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Syntax is very important for making proper functions in spreadsheets. Next, you can explore what happens when you leave out a character or make an error.

  1. Click on cell C7. Enter the SUM function SUM(C2:C6) either in the cell itself or the formula bar. Do NOT include the =. Press Enter or Return. It should display the following:

The equal sign in the SUM command is not optional. Without it, the spreadsheet will interpret the input as a string. A string is text data. For the formula to work, it needs numeric data. This is why the command is uncalculated in C7. When the equal sign is included, the spreadsheet knows to carry out the sum calculation and return the result in the cell.

  1. Spreadsheets handle string data quite differently than numerical data. Column A of this table is populated entirely by string data—the labels for each row. Try to input the SUM function on this column. In cell A8, type =SUM(A2:A6) and press Enter or Return.

You will find the spreadsheet calculated zero for the sum. This is because the program was asked to sum strings. When a given cell contains a string, the program considers the numerical value of the cell as zero.

That's how the SUM function in Excel works. There are many other functions available to you beyond SUM. If you know them, you can enter them just like how you entered the SUM function. There are many different spreadsheet programs, and they all have functionality similar to, if not exactly, like this.

Confirmation and reflection

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Question 1

How would you write a function to calculate February’s entertainment expenses for Cable TV, Video Streaming, and Movies in the example spreadsheet?

A. SUM(C2:C6)

B. =SUM(B2:C4)

C. SUM(B2:C6)

D. =SUM(C2:C4)

Explain: The correct way to write a SUM function that calculates February’s entertainment expenses for Cable TV, Video Streaming, and Movies is =SUM(C2:C4). To write this function, you took the relevant range of cells and put them in the proper SUM function syntax. Going forward, you can use this knowledge of functions to interact with spreadsheet data and make dynamic sheets that will aid you in the future.

Question 2

During this activity, you explored spreadsheet functions and practiced writing them. In the text box below, write a 2-3 sentence (40-60 words) response to each of the following questions:

  • When you wrote incorrect functions, what did you learn about spreadsheet data?
  • How do you think this knowledge of the SUM spreadsheet function will help you write other kinds of functions?

Explain: Congratulations on completing this hands-on activity! A good response would include how spreadsheet functions can build your data skillset. Beyond that, consider the following:

Spreadsheets are powerful tools, allowing data analysts to make sense of large amounts of data with simple functions. The more comfortable you get with the different functions and features available in spreadsheets, the more quickly and effectively you’ll be able to clean, explore, process, and visualize data on the job!

Transforming data

What is data transformation?

jtwtFPydRZWcLRT8ndWVjA_977547953ee4493fb24a750ac653c0e5_Screen-Shot-2021-03-04-at-9 01 41-PM

In this reading, you will explore how data is transformed and the differences between wide and long data. Data transformation is the process of changing the data’s format, structure, or values. As a data analyst, there is a good chance you will need to transform data at some point to make it easier for you to analyze it.

Data transformation usually involves:

  • Adding, copying, or replicating data
  • Deleting fields or records
  • Standardizing the names of variables
  • Renaming, moving, or combining columns in a database
  • Joining one set of data with another
  • Saving a file in a different format. For example, saving a spreadsheet as a comma separated values (CSV) file.

Why transform data?

Goals for data transformation might be:

  • Data organization: better organized data is easier to use
  • Data compatibility: different applications or systems can then use the same data
  • Data migration: data with matching formats can be moved from one system to another
  • Data merging: data with the same organization can be merged together
  • Data enhancement: data can be displayed with more detailed fields
  • Data comparison: apples-to-apples comparisons of the data can then be made

Data transformation example: data merging

Mario is a plumber who owns a plumbing company. After years in the business, he buys another plumbing company. Mario wants to merge the customer information from his newly acquired company with his own, but the other company uses a different database. So, Mario needs to make the data compatible. To do this, he has to transform the format of the acquired company’s data. Then, he must remove duplicate rows for customers they had in common. When the data is compatible and together, Mario’s plumbing company will have a complete and merged customer database.

Data transformation example: data organization (long to wide)

To make it easier to create charts, you may also need to transform long data to wide data. Consider the following example of transforming stock prices (collected as long data) to wide data.

Long data is data where each row contains a single data point for a particular item. In the long data example below, individual stock prices (data points) have been collected for Apple (AAPL), Amazon (AMZN), and Google (GOOGL) (particular items) on the given dates.

Long data example: Stock prices

dq8jjvDiQgmvI47w4hIJsA_ffb38f9200364ce7862d8848c8147a19_Screen-Shot-2020-11-24-at-5 42 01-PM

Wide data is data where each row contains multiple data points for the particular items identified in the columns.

Wide data example: Stock prices

With data transformed to wide data, you can create a chart comparing how each company's stock changed over the same period of time.

You might notice that all the data included in the long format is also in the wide format. But wide data is easier to read and understand. That is why data analysts typically transform long data to wide data more often than they transform wide data to long data. The following table summarizes when each format is preferred:

Wide data is preferred when Long data is preferred when
Creating tables and charts with a few variables about each subject Storing a lot of variables about each subject. For example, 60 years worth of interest rates for each bank
Comparing straightforward line graphs Performing advanced statistical analysis or graphing

Hands-on Activity: Introduction to Kaggle

Question 1

Activity overview

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

By now, you’ve learned a lot about different data types and data structures. In this activity, you will work with datasets from Kaggle, an online community of people passionate about data. To start this activity, you’ll create a Kaggle account, set up a profile, and explore Kaggle notebooks.

Every data analyst has a data community that they rely on for help, support, and inspiration. Kaggle can help you build your own data community.

Kaggle has millions of users in all stages of their data career, from beginners to data scientists with decades of experience. The Kaggle community brings people together to develop their data analysis skills, share datasets and interactive notebooks, and collaborate on solving real-life data problems.

Check out this brief introductory video to learn more about Kaggle.

By the time you complete this activity, you will be able to use many of Kaggle’s key features. This will enable you to create notebooks and browse data, which is important for completing and sharing data projects in your career as a data analyst.

Create a Kaggle account

BZoPGCbqS32aDxgm6mt9kg_bfbeb57eeb1743938a5041c2251e2ff0_line-y

To get started, follow these steps to create a Kaggle account.

  • Note: Kaggle frequently updates its user interface. The latest changes may not be reflected in the screenshots, but the principles in this activity remain the same. Adapting to changes in software updates is an essential skill for data analysts, and we encourage you to practice troubleshooting. You can also reach out to your community of learners on the discussion forum for help.
  1. Go to kaggle.com

  2. Click on the Register button at the top-right of the Kaggle homepage. You can register with your Google credentials or your personal email address.

w2EheDLAQ8WhIXgywPPFGA_70229c6c1b9c4561ae80b325cba19c28_Screen-Shot-2021-03-02-at-4 55 46-PM

  1. Once you’re registered and logged in to Kaggle, click on the Account icon at the top-right of your screen. In the menu that opens, click the Your Profile button.

yoE3_0m6RxaBN_9JugcW6Q_ccf31c944ce0467da201f617eda39c3f_Screen-Shot-2021-03-18-at-9 10 13-PM

  1. On your profile page, click on the Edit Profile button. Enter any information you’d like to share with the Kaggle community. Your profile will be public, so only enter the information you’re comfortable sharing.
  1. If you want some inspiration, check out the profile of Kaggle’s Community Advocate, Jesse Mostipak!

Explore Kaggle notebooks

Now that you’ve created an account and set up your profile, you can check out some notebooks on Kaggle. Kagglers use notebooks to share datasets and data analyses.

Step 1: Go to the Code home page

First, go to the Navigation bar on the left side of your screen. Then, click on the Code icon. This takes you to the Code home page.

QwbDWDGjSTqGw1gxoyk6NA_41ad837dd7a647ec80934df24edd0f2d_edit3

Step 2: Review Kaggler contributions

On the Code home page, you’ll notice links to notebooks created by other Kagglers.

To begin, feel free to scroll through the list and click on notebooks that interest you. As you explore, you may come across unfamiliar terms and new information: That’s fine! Kagglers come from diverse backgrounds and focus on different areas of data analysis, data science, machine learning, and deep learning.

Step 3: Narrow your search

Once you’re familiar with the Code home page, you can narrow your search results by typing a word in the search bar or by using the filter feature.

For example, type Beginner in the search bar to show notebooks tagged as beginner-friendly. Or, click on the Filter icon, the triangle shape on the right side of the search bar. You can filter results by tags, programming language, output, and other options. Filter to Datasets to show notebooks that use one of the tens of thousands of public datasets available on Kaggle.

Step 4: Review suggested notebooks

If you’re looking for specific suggestions, check out the following notebooks:

Spend some time checking out a couple of notebooks to get an idea of the work that Kagglers share online—and that you’ll be able to create by the time you’ve finished this course!

Edit a notebook

BZoPGCbqS32aDxgm6mt9kg_bfbeb57eeb1743938a5041c2251e2ff0_line-y

Now, take a look at a specific notebook: Dive into dplyr (tutorial #1) by Jesse Mostipak. Follow these steps to learn how to edit notebooks:

  1. Click on the link to open up the notebook. It contains the dataset you’ll work with later on.
  2. Click on the Copy and Edit button at the top-right to make a copy of the notebook in your account. Now, the notebook appears in Edit mode. Edit mode lets you make changes to the notebook if you want.

r48UAOcTR2-PFADnEwdvNQ_ff1a1e016cab44faa47bb5baa472d093_C3M1L4-Visual-5

This notebook is private. If you want to share your work, you can choose to make it public. When you copy and edit another Kaggler’s work, always make meaningful changes to the notebook before publishing it. That way, you’re not misrepresenting someone else’s work as your own.

  1. Take a moment to explore the Edit mode of the notebook.

Some of this may seem unfamiliar—and that’s just fine. By the end of this course, you’ll know how to create a notebook like this from scratch!

Working with datasets in notebooks

BZoPGCbqS32aDxgm6mt9kg_bfbeb57eeb1743938a5041c2251e2ff0_line-y

Now, you can check out the data!

In this notebook, you’ll find the data in a box labeled Data at the top-right of your screen. In the box, there’s an input folder with the title: palmer-archipelago-antarctica-penguin-data. Follow these instructions to explore the datasets and learn more about the data within them:

  1. Click on this title. Two .csv files appear: penguins_lter.csv and penguins_size.csv. Click on one of them. At the bottom of the notebook, you’ll now find an interactive data table with all the information from the dataset.

YikfzC3hQESpH8wt4RBEjw_98b6d2c13fa446d1979e0a29df37dd37_C3M1L4-Visual-6

  1. Click on the other .csv file. This opens a second tab with the second dataset.
  2. Take a moment to check out each dataset.
  3. Change the range of the quantitative data in the rows by clicking on the horizontal bars to the right of the column name and changing the position of the sliders.
  4. Click on the button that says 10 of 17 columns to change the columns that are visible in the table.

In the dropdown menu, there’s a checkmark next to the name of each column that appears in the table. Checking or unchecking one of these boxes will change what data is presented.

Congratulations! You’ve explored several ways to interact with the dataset. This will help you get familiar with the Kaggle interface. You can save the notebook you worked in for future reference. Coming up, you’ll learn more about other ways you can use Kaggle.

Confirmation and reflection

Question 1

Which statements are true about the two penguin datasets in the Dive into dplyr (tutorial #1) notebook? Select all that apply.

  • penguins_size.csv has 7 columns.
  • In both datasets, the number of columns is the same.
  • In penguins_lter.csv, the highest value in the column Sample Number is 152.
  • In penguins_lter.csv, the column Individual ID cannot be sorted.

Question 2

In this activity, you’ve learned a lot about data types and data structures. Using what you’ve learned so far, consider your experience with datasets and the two penguins datasets. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:

  • Using all of the information you learned while exploring in Kaggle, how would you thoroughly describe these datasets to someone else?
  • How do you think sharing interactive notebooks online can help you develop your data analysis skills?

Explain: Congratulations on completing this hands-on activity! You worked with notebooks and used different datasets in Kaggle. A strong response would include that online resources like Kaggle help data analysts accomplish many important tasks. Beyond that, consider the following:

Data analysts use a variety of resources to complete data analysis projects. For instance, an analyst could use Kaggle notebooks to host projects in a portfolio. This is important for practicing and demonstrating your skills, as well as getting feedback from more experienced data analysts on your work.

Test your knowledge on data types, fields, and values

Question 1

Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.

A. OR

B. AND

C. NOT

D. WITH

Explain: The Boolean operator OR expands the number of results when used in a keyword search.

Question 2

Which of the following statements accurately describes a key difference between wide and long data?

A. Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns.

B. Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns.

C. Every wide data subject has multiple columns. Every long data subject has data in a single column.

D. Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.

The correct answer is D. Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes. Explain: Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.

Question 3

What does data transformation enable data analysts to accomplish?

A. Retrieve the data faster

B. Inspect the data for accuracy

C. Restore the data after it has been lost

D. Change the structure of the data

The correct answer is D. Change the structure of the data. Explain: Data transformation enables data analysts to change the structure of data.