3.1.1. Data exploration - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Course syllabus

  1. Foundations: Data, Data, Everywhere

  2. Ask Questions to Make Data-Driven Decisions

  3. Prepare Data for Exploration (this course)

  4. Process Data from Dirty to Clean

  5. Analyze Data to Answer Questions

  6. Share Data Through the Art of Visualization

  7. Data Analysis with R Programming

  8. Google Data Analytics Capstone: Complete a Case Study

Welcome to the third course in the Google Data Analytics Certificate! So far, you have been introduced to the field of data analytics and discovered how data analysts can use their skills to answer business questions.

As a data analyst, you need to be an expert at structuring and extracting data, and at making sure the data you are working with is reliable. To do this, it is always best to develop a general idea of how data is generated and collected, since every organization structures data differently. Then, no matter what data structure you are faced with in your new role, you will feel confident working with it.

You will soon discover that when data is extracted, it isn’t perfect. It might be biased instead of credible, or dirty instead of clean. Your goal is to learn how to analyze data for bias and credibility and to understand what clean data means. You will also get up close and personal with databases and even get to extract your own data from a database using spreadsheets and SQL. The last topics covered are the basics of data organization and the process of protecting your data.

You will also learn how to identify different types of data that can be used to understand and respond to a business problem. In this part of the program, you will explore different types of data and data structures. Best of all, you will keep adding to your data analyst toolbox! From extracting and using data to organizing and protecting it, these key skills will come in handy no matter what you are doing in your career as a data analyst.

Course content

Course 3 – Prepare Data for Exploration

  1. Understanding data types and structures: We all generate lots of data in our daily lives. In this part of the course, you will check out how we generate data and how analysts decide which data to collect for analysis. You’ll also learn about structured and unstructured data, data types, and data formats as you start thinking about how to prepare your data for exploration.
  2. Understanding bias, credibility, privacy, ethics, and access: When data analysts work with data, they always check that the data is unbiased and credible. In this part of the course, you will learn how to identify different types of bias in data and how to ensure credibility in your data. You will also explore open data and the relationship between and importance of data ethics and data privacy.
  3. Databases: Where data lives: When you are analyzing data, you will access much of the data from a database. It’s where data lives. In this part of the course, you will learn all about databases, including how to access them and extract, filter, and sort the data they contain. You will also check out metadata to discover the different types and how analysts use them.
  4. Organizing and protecting your data: Good organization skills are a big part of most types of work, and data analytics is no different. In this part of the course, you will learn the best practices for organizing data and keeping it secure. You will also learn how analysts use file naming conventions to help them keep their work organized.
  5. Engaging in the data community (optional): Having a strong online presence can be a big help for job seekers of all kinds. In this part of the course, you will explore how to manage your online presence. You will also discover the benefits of networking with other data analytics professionals.
  6. Completing the Course Challenge: At the end of this course, you will be able to apply what you have learned in the Course Challenge. The Course Challenge will ask you questions about the key concepts and then will give you an opportunity to put them into practice as you go through two scenarios.

What to expect

This part of the program is designed to get you familiar with different data structures and show you how to collect, apply, organize, and protect data. All of these skills will be part of your daily tasks as an entry-level data analyst. You will work on a wide range of activities that are similar to real-life tasks that data analysts come across on a daily basis.

This course has five modules or weeks, and each has several lessons included. Within each lesson, you will find content such as:

  • Videos of instructors teaching new concepts and demonstrating the use of tools
  • In-video questions that pop up during or at the end of a video to check your learning
  • Readings to introduce new ideas and build on the concepts from the videos
  • Discussion forums to discuss, explore, and reinforce new ideas for better learning
  • Discussion prompts to promote thinking and engagement in the discussion forums
  • Hands-on activities to introduce real-world, on-the-job situations, and the tools and tasks to complete assignments
  • Practice quizzes to prepare you for graded quizzes
  • Graded quizzes to measure your progress and give you valuable feedback

Hands-on activities provide additional opportunities to build your skills, so try to get as much out of them as possible. Assessments follow the course's approach of offering a wide variety of learning materials and activities that reinforce important skills. Graded and ungraded quizzes will help the content sink in. Ungraded practice quizzes are a chance for you to prepare for the graded quizzes. Both types of quizzes can be taken more than once.

As a quick reminder, this course is designed for all types of learners, with no degree or prior experience required. Everyone learns differently, so the Google Data Analytics Certificate has been designed with that in mind. Personalized deadlines are just a guide, so feel free to work at your own pace. There is no penalty for late assignments. If you prefer, you can extend your deadlines by returning to Overview in the navigation pane and clicking Switch Sessions. If you already missed previous deadlines, click Reset my deadlines instead.

If you would like to review previous content or get a sneak peek of upcoming content, you can use the navigation links at the top of this page to go to another course in the program. When you pass all required assignments, you will be on track to earn your certificate.

Optional speed track for those experienced in data analytics

The Google Data Analytics Certificate provides instruction and feedback for learners hoping to earn a position as an entry-level data analyst. While many learners will be brand new to the world of data analytics, others may be familiar with the field and simply want to brush up on certain skills.

If you believe this course will be primarily a refresher for you, we recommend taking the practice diagnostic quiz offered this week. It will enable you to determine if you should follow the speed track, which is an opportunity to proceed to Course 4 after having taken each of the Course 3 Weekly Challenges and the overall Course Challenge. Learners who earn 100% on the diagnostic quiz can treat Course 3 videos, readings, and activities as optional. Learners following the speed track are still able to earn the certificate.

Tips

  • Do your best to complete all items in order. All new information builds on earlier learning.
  • Treat every task as if it is real-world experience. Have a mindset that you are working at a company or in an organization as a data analyst. This will help you apply what you learn in this program to the real world.
  • Even though they aren’t graded, it is important to complete all practice items. They will help you build a strong foundation as a data analyst and better prepare you for the graded assessments.
  • Take advantage of all additional resources provided.
  • When you encounter useful links in the course, remember to bookmark them so you can refer to the information later for study or review.

Meet and greet

While solving a mystery, a detective sometimes asks a big, critical question at the beginning of their investigation, then follows up with smaller questions. Other times, the detective starts with smaller questions, which lead to a big, critical question at the end. Either way, the mystery is solved!

For this discussion, consider the following questions:

  • What kind of data detective are you?
  • Do you tend to come up with a big question first?
  • Do you prefer to ask small questions and let them lead you to the big question?

Please write a short paragraph (50-100 words) describing your thoughts about being a data detective. In your response, include your preferred style of questioning. Then, visit the discussion forum to read what other learners have written, and engage in discussion about at least two posts.

Deciding if you should take the speed track

This reading provides an overview of a speed track we offer to those familiar with data analytics.

If you are brand new to data analytics, you can skip the diagnostic quiz after this reading, and move directly to the next activity: Data collection in our world.

The Google Data Analytics Certificate is a program for anyone. A background in data analysis isn’t required. But you might be someone who has some experience already. If you are this type of learner, we have designed a speed track for this course. Learners who opt for the speed track can refresh on the basic topics and take each of the weekly challenges and the Course Challenge at a faster pace.

To help you decide if you’re a good match for the speed track for this course:

  1. Take the optional diagnostic quiz.
  2. Refer to the scoring guide to determine if you’re a good fit for the speed track. A score of 90% or higher is the target goal for someone on the speed track.
  3. Based on your individual score, follow the recommendations in the scoring guide for your next steps.

Important reminder: If you’re eligible for the speed track, you’re still responsible for completing all graded activities. In order to earn your certificate, you will need an overall score of 80% or higher on all graded materials in the program.

Optional: Familiar with data analytics? Take our diagnostic quiz

Question 1

Get ready to take the next step in your data analytics journey with the question below!

A data analyst at a construction company is working on a report for a quickly approaching deadline. Why might they choose to analyze only historical data?

A. The data is difficult to predict.

B. The data is constantly changing.

C. They enjoy historical references.

D. The project has a very short time frame.

The correct answer is D. The project has a very short time frame. Explain: They would analyze only historical data because the project has a very short time frame.

Question 2

What are the benefits of data modeling? Select all that apply.

  • Secure data for future use
  • Keep data consistent
  • Make data easier to understand
  • Provide a map of how data is organized

Explain: Data modeling keeps data consistent, provides a map of how data is organized, and makes data easier to understand. Data modeling is the process of creating a model that is used for organizing data elements and how they relate to one another.

Question 3

A group of high school students take a survey that asks, "Are you on an athletic team? Please reply yes or no." What kind of data is being collected?

A. Boolean

B. Visual

C. Number

D. String

The correct answer is A. Boolean. Explain: Boolean data would be collected. Boolean data has only two possible values, such as yes or no.
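
As a small illustration of the Boolean data type, the sketch below converts yes/no survey replies into Python `bool` values. The sample replies and the helper name `to_boolean` are hypothetical and are not part of the course material.

```python
def to_boolean(reply: str) -> bool:
    """Convert a yes/no survey reply into a Boolean value."""
    normalized = reply.strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    # Anything else is not valid Boolean survey data
    raise ValueError(f"Expected 'yes' or 'no', got {reply!r}")


# Hypothetical replies to "Are you on an athletic team?"
replies = ["yes", "No", " YES ", "no"]
on_team = [to_boolean(r) for r in replies]
print(on_team)  # [True, False, True, False]
```

Because each reply can take only one of two values, it can be stored as a Boolean rather than as free text.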

Question 4

A data analyst is evaluating data to determine whether it is good or bad. Which qualities characterize good data? Select all that apply.

  • Comprehensive
  • Current
  • Cited
  • Consequential

Explain: Good data is comprehensive, current, and cited.

Question 5

Imagine that a company uses your personal data as part of a financial transaction. Before it occurs, you are not made aware of the nature and scale of this transaction. What concept of data ethics does this violate?

A. Consent

B. Currency

C. Transaction transparency

D. Openness

The correct answer is B. Currency. Explain: This situation violates the concept of currency. The currency concept of data ethics states that individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.

Question 6

Which of the following are protections afforded by data privacy? Select all that apply.

  • Preserving a data subject’s information and activity for all data transactions
  • Applying standards of right and wrong to the management and usage of data
  • Providing users the right to free access, usage, and sharing of data
  • Providing users the right to inspect, update, or correct their own data

Explain: The protections of data privacy include preserving a data subject’s information and activity for all data transactions. They also include providing users the right to inspect, update, and correct their own data.

Question 7

Which of the following are uses of relational databases? Select all that apply.

  • Contain and describe a series of tables that can be connected to form relationships
  • Keep data consistent regardless of where it’s accessed
  • Organize numerical data based on relative scale
  • Present the same information to each collaborator

Explain: Relational databases are used to contain and describe a series of tables that can be connected to form relationships. They also present the same information to each collaborator by keeping data consistent regardless of where it’s accessed.

Question 8

Which statements define primary keys and foreign keys and describe their relationship? Select all that apply.

  • A primary key is an identifier that references a column in which each value is unique.
  • A foreign key is a field within a table that’s a primary key in another table.
  • Primary and foreign keys are two connected identifiers within separate tables in a relational database.
  • A primary key is a table containing observational data, and a foreign key is a table that contains the results of the primary key’s analysis.

Explain: A primary key is an identifier that references a column in which each value is unique. A foreign key is a field within a table that’s a primary key in another table. Primary and foreign keys are two connected identifiers within separate tables in a relational database.
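
The relationship between primary and foreign keys can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the sample rows are assumptions made for illustration only; they do not come from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is enabled
conn.execute("PRAGMA foreign_keys = ON")

# customer_id is the primary key of customers: each value is unique
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# orders.customer_id is a foreign key: it refers to the primary key of customers
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " item TEXT)"
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "lumber"), (11, 2, "nails"), (12, 1, "paint")],
)

# The two keys connect the tables, so each order can be matched to its customer
rows = conn.execute(
    "SELECT orders.order_id, customers.name, orders.item"
    " FROM orders JOIN customers ON orders.customer_id = customers.customer_id"
    " ORDER BY orders.order_id"
).fetchall()
print(rows)

# An order that points at a customer that does not exist is rejected
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 'glue')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The join works because every `customer_id` value in `orders` matches exactly one row in `customers`.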

Question 9

What tasks can data analysts accomplish using metadata? Select all that apply.

  • Interpret the contents of a database
  • Evaluate the quality of data
  • Combine data from more than one source
  • Perform data analyses

Explain: Data analysts use metadata to combine data, evaluate data, and interpret a database. Metadata is data about data; in database management, it helps data analysts understand the contents of the data within a database.

Question 10

A data analyst reviews a spreadsheet of boat auction sales to find the last five sailboats sold in Kentucky. What steps would they take in order to narrow the scope? Select all that apply.

  • Filter out sales outside of Kentucky
  • Sort by date in descending order
  • Filter out sales in Kentucky
  • Sort by date in ascending order

Explain: The analyst can filter out sales outside of Kentucky and sort by date in descending order.
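
The filter-then-sort steps can be sketched in Python as follows. The sales records, field names, and IDs are invented for illustration, and the extra check on boat type assumes the spreadsheet also lists boats other than sailboats.

```python
from datetime import date

# Hypothetical auction records; every value here is made up for the example
sales = [
    {"id": 1, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 1, 5)},
    {"id": 2, "boat_type": "sailboat", "state": "Ohio", "sale_date": date(2021, 6, 30)},
    {"id": 3, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 2, 11)},
    {"id": 4, "boat_type": "motorboat", "state": "Kentucky", "sale_date": date(2021, 7, 1)},
    {"id": 5, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 3, 20)},
    {"id": 6, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 4, 2)},
    {"id": 7, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 5, 17)},
    {"id": 8, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 6, 9)},
    {"id": 9, "boat_type": "sailboat", "state": "Tennessee", "sale_date": date(2021, 6, 15)},
    {"id": 10, "boat_type": "sailboat", "state": "Kentucky", "sale_date": date(2021, 6, 21)},
]

# Step 1: filter out sales outside Kentucky (and, by assumption, non-sailboats)
kentucky_sailboats = [
    s for s in sales if s["state"] == "Kentucky" and s["boat_type"] == "sailboat"
]

# Step 2: sort by date in descending order so the most recent sales come first
kentucky_sailboats.sort(key=lambda s: s["sale_date"], reverse=True)

# The first five rows are now the last five sailboats sold in Kentucky
last_five_ids = [s["id"] for s in kentucky_sailboats[:5]]
print(last_five_ids)  # [10, 8, 7, 6, 5]
```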

Question 11

You are writing a SQL query to filter data from a database that describes trees in Omaha, Nebraska. You want to only display entries for trees that have a diameter of 30 inches. The name of the table you’re using is Nebraska_trees and the name of the column that shows the diameters of the trees is trunk_diameter. What is the correct query syntax that will retrieve and filter data from this table?

A. SELECT Nebraska_trees WHERE trunk_diameter = 30

B. SELECT trunk_diameter = 30 FROM Nebraska_trees

C. SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30

D. SELECT * FROM trunk_diameter WHERE Nebraska_trees = 30

The correct query is C. SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30
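
To try the correct query, one option is to run it against a small in-memory SQLite database from Python. The table and column names `Nebraska_trees` and `trunk_diameter` come from the question; the `tree_id` and `species` columns and all sample rows are assumptions added so the example has data to filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout; only trunk_diameter is named in the question
conn.execute(
    "CREATE TABLE Nebraska_trees ("
    " tree_id INTEGER PRIMARY KEY,"
    " species TEXT,"
    " trunk_diameter INTEGER)"
)
conn.executemany(
    "INSERT INTO Nebraska_trees VALUES (?, ?, ?)",
    [(1, "oak", 30), (2, "maple", 18), (3, "elm", 30), (4, "pine", 42)],
)

# SELECT * returns every column, FROM names the table, WHERE filters the rows
rows = conn.execute(
    "SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30"
).fetchall()
print(rows)
```

Only the rows whose `trunk_diameter` equals 30 are returned. The other options fail because they omit `FROM`, place the condition in the `SELECT` list, or swap the table and column names.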

Question 12

Consistent naming conventions describe which properties of a file? Select all that apply.

  • Content
  • Version
  • File location
  • Creation date

Explain: Consistent naming conventions describe the content, creation date, and version of a file.
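
One way to keep file names consistent is to build them from the same three properties every time. The sketch below assumes a `content_YYYYMMDD_vNN` pattern; the pattern, the helper name `build_file_name`, and the sample values are illustrative choices rather than a convention prescribed here.

```python
from datetime import date


def build_file_name(content: str, created: date, version: int, extension: str = "csv") -> str:
    """Combine content, creation date, and version into one file name."""
    # Replace spaces so the name is safe to use in most file systems
    slug = "_".join(content.split())
    # YYYYMMDD dates sort chronologically; zero-padded versions sort numerically
    return f"{slug}_{created:%Y%m%d}_v{version:02d}.{extension}"


file_name = build_file_name("Sales Report", date(2021, 6, 21), 2)
print(file_name)  # Sales_Report_20210621_v02.csv
```

Note that the file location is not encoded in the name, which matches the explanation above.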

Optional: Your diagnostic quiz score and what it means

Use your score to help you determine whether you should take the speed track. The speed track allows you to skip over the lesson material and go straight to the weekly challenges and the course challenge, which lead to your final course score. In order to earn your certificate, you will need an overall score of 80% or higher on all graded materials in this program. Read on to figure out your next steps based on your quiz score:

If you scored 100% on the diagnostic quiz:

  • You’re probably very familiar with types of data and data structures and can take the speed track to move on to Course 4.
  • You must take each of the weekly challenges and the course challenge, which will count toward the 80% overall score needed to earn the certificate. To help you find these items more quickly, we’ve identified them with asterisks in the course materials (for example: *course challenge*).
  • After you complete the weekly challenges and course challenge, proceed to Course 4.
  • You’re welcome to review videos, readings, and activities throughout the course based on your interests.

If you scored between 90% and 99% on the diagnostic quiz:

  • You’re probably familiar with the types of data and data structures and might consider taking the speed track to move on to Course 4.
  • However, we still recommend that you go through the Course 3 lesson materials to review areas where you might have some gaps before proceeding to Course 4.
  • You must take each of the weekly challenges and the course challenge, which will count toward the 80% overall score needed to earn the certificate. To help you find these items more quickly, we’ve identified them with asterisks in the course materials (for example: *course challenge*).
  • After you complete the weekly challenges and course challenge, proceed to Course 4.
  • You’re welcome to review videos, readings, and activities throughout the course based on your interests.

If you scored between 80% and 89% on the diagnostic quiz:

  • You likely have some background knowledge on types of data and data structures.
  • However, we recommend that you go through the Course 3 lesson materials to review areas where you might have some gaps before proceeding to Course 4.
  • You must take the weekly challenges and the course challenge, which will count toward the 80% overall score needed to earn the certificate. To help you find these items more quickly, we’ve identified them with asterisks in the course materials (for example: *course challenge*).

If you scored less than 80% on the diagnostic quiz:

  • No problem — this course was made for you!
  • We strongly recommend that you go through all of the Course 3 videos, readings, and activities, as the concepts taught are building blocks that will set you up for success on your learning path.
  • You must take the weekly challenges and the course challenge, which will count toward the 80% overall score needed to earn the certificate.

Regardless of your score, the course material can help you supplement or identify gaps in your knowledge. Whether you take the speed track or complete the certificate at the recommended pace, good luck on your data endeavors!