231115‐16 - Forestreee/Data-Analytics GitHub Wiki

Google Data Analytics Professional

Ask Questions to Make Data-Driven Decisions

WEEK3 - More spreadsheet basics

Spreadsheets are a very important data analytics tool. In this part of the course, you will learn about how data analysts use spreadsheets in their work every day. You will also explore why structured thinking helps analysts better understand problems and come up with solutions.

Learning Objectives

  • Discuss the data analyst’s use of spreadsheets with reference to roles and responsibilities
  • Demonstrate the use of spreadsheets to complete basic tasks of the data analyst including entering and organizing data
  • Demonstrate an understanding of the use of formulas in spreadsheets including a definition and specific examples
  • Compare formulas and functions with reference to similarities and differences
  • Describe the key ideas associated with structured thinking including the problem domain, scope of work, and context

Working with spreadsheets

The amazing spreadsheet

There's a good chance a spreadsheet will be the first tool you reach for when trying to answer data-driven questions. After you've defined what you need to do with the data, you'll turn to spreadsheets to help build evidence that you can then visualize, and use to support your findings. Spreadsheets are often the unsung heroes of the data world. They don't always get the appreciation they deserve, but as a data detective, you'll definitely want them in your evidence collection kit.

I know spreadsheets have saved the day for me more than once. I've added data for purchase orders into a sheet, set up formulas in one tab, and had the same formulas do the work for me in other tabs. This frees up time for me to work on other things during the day.

I couldn't imagine not using spreadsheets. Math is a core part of every data analyst's job, but not every analyst enjoys it. Luckily, spreadsheets can make calculations more enjoyable, and by that, I mean easier.

Spreadsheets can do both basic and complex calculations automatically. Not only does this help you work more efficiently, but it also lets you see the results and understand how you got them.

Get to work with spreadsheets

Imagine you work for a construction company. Your company needs your spreadsheet skills to analyze some data about their expenses, so you access the appropriate data and add it to your spreadsheet.

Spreadsheet tasks

What do you do with the data now that it's in your spreadsheet? Again, this will be different for each job, but you might start by organizing your data with the task you've been given. For example, you might put your data in a pivot table.

Think of them as well-organized and very useful tables.

Next, you might filter the data in the pivot table. Sorting and filtering data is a common part of most jobs.

This lets you focus only on the data you'll need for your analysis. In our example, maybe you only need the expenses for a certain time frame, like the last three months.

After you filtered your data, you could perform some calculations to learn more about it. Maybe you need to find out which construction projects ended up costing the most money. This is where formulas and functions are really handy.

Formulas and functions are great for doing some quick math, especially once you run out of fingers and toes to count on. Now you've seen some of the ways data analysts are using spreadsheets in their day to day work for a lot of different tasks, including organizing their data and making calculations.

Spreadsheets and the data life cycle

Resources Spreadsheet shortcuts can help you become more efficient with spreadsheets. If you’d like to learn more, you can explore the collection of Google Sheets shortcuts, or visit the Microsoft Excel shortcuts page if you are using Excel. Both of these resources contain a list of spreadsheet shortcuts you can save and reference as you work more with spreadsheets on your own.

Hands-On Activity: Introduction to Google Sheets

click sheets practice

Basic spreadsheet tasks

I'm going to demonstrate some basic tasks we know data analysts use spreadsheets for, including entering and organizing data. We'll start with a step-by-step process to show you some tools to organize your data in a spreadsheet. Consider these steps the basics. You won't always have to use them when working with a data set, but if your data is a bit messy when you get it, these steps can help you get it ready for analysis.

Depending on the job, you might use data from an open source, you might be given data to work with or you might be asked to find your own data. You'll experience all of these later in the program. There are a lot of open data sources online, where data is made available to the public. For example, we'll use data from worldbank.org, that's already in the spreadsheet. The data shows the population of Latin American and Caribbean countries from 2010-2019.

Link to practice template: Population of Latin and Caribbean Countries

Learn more about spreadsheet basics

Microsoft Excel

  • Office Quick Starts: Scroll down to the Downloadable guides section to download the Excel Quick Start Guide: This PDF guide begins with a labeled map of Excel that can guide you through the basic tasks you can accomplish in Excel. For tips on starting and opening Excel, this Microsoft Support page will show you how to begin a new workbook.

  • Excel video training: This is a collection of step-by-step videos to use all sorts of Excel features, including adding and working within rows, columns, and cells; formatting; using formulas and functions; and adding charts and pivot tables.

  • Sort data in a range or table: This page guides you through all of the steps you will need to sort data by number, text, and color. You’ll also have the option to sort by custom list so that you can customize exactly what you want to sort.

  • Filter data in a range or table: This article has step-by-step instructions on how to filter an Excel spreadsheet to show only the data you want to see. You can also use built-in comparison operators, such as “greater than” and “top 10” to reveal only the most relevant data.

  • Format a worksheet: The guide will help you select and format your Excel spreadsheet, then change the borders, shading, colors, and text. This can help improve your spreadsheet’s readability.

Pro tip: If you’re searching for information about using customizable options, check out Microsoft’s Guidelines for organizing and formatting data on a worksheet. This article provides clear methods for creating easy-to-read spreadsheets.

Google Sheets

  • Google Sheets cheat sheet: The cheat sheet puts all the basics of Sheets on a single page for easy reference. Here, you can learn about customizing your spreadsheet and the data inside; working with rows, columns, and cells; sharing your spreadsheet with others; creating different versions and copies of a spreadsheet; and more.

  • Get started with Sheets: Create and import files: This guide is a step-by-step guide for working with Sheets. You start by learning how to open a spreadsheet, then move on to adding data.

  • Sort and filter your data: This resource can help you organize data in Sheets. Use this guide to sort part or all of a spreadsheet. You can sort by text, number, and color. Then, learn how to create filters to show only certain data while hiding the rest. Finally, the article includes information on creating, saving, and removing a filter view.

  • Edit and format a spreadsheet: This will help you make easy-to-read spreadsheets. You will learn how to assign a color, customize borders around cells, and change the appearance of text. If you’d like to give your spreadsheet a theme, you can scroll to the bottom of the page and find how to apply it to parts of your spreadsheet.

Tip: Microsoft Excel and Google Sheets are very similar in terms of calculations, formulas, functions, and many other features. But there are some differences, which can make it tricky to switch from one to the other. If you are moving between Excel and Google Sheets, find a quick list of the differences between the two kinds of spreadsheet applications in Overview: Differences between Sheets and Excel.

Formulas in spreadsheets

Formulas for success

You may need to calculate everything from sums to averages, to finding minimum and maximum amounts. You'll use calculations for a lot of different kinds of tasks. We'll focus on learning the basics and then do a little math with some sales data to practice.

Basically, formulas can do the math for you. Now, they don't only do math, they can do a lot more.

Formulas are built on operators which are symbols that name the type of operation or calculation to be performed.

For example, a plus sign is a common operator. The formulas you use as a data analyst will usually include at least one operator.

Now, let's talk about math expressions or equations. 3 minus 1, 15 plus 8 divided by 2, 846 times 513. Well, back in math class, you most likely learned to complete an expression by including an equal sign and the solution. These are all examples of expressions. It's slightly different with spreadsheets.

When you create a formula using an expression in a spreadsheet, you start the formula with an equal sign.

For example, if we want to subtract, we type an equal sign followed by the rest of the expression without any spaces in the formula.

Now let's try an expression that's a bit more challenging. We'll type 31982, then a hyphen for a minus sign, then 17795. To calculate, we press "Enter." You'll most likely use formulas this way when dealing with large numbers or expressions with multiple steps.

If you already have data in your spreadsheet, you can use cell references in your formulas instead.

Cell references contain the letter of the column and the number of the row where the data is.

A range can include cells from the same row or column, or from different columns and rows collected together.

Now let's apply what we just learned to some sales data.

Quick reference: Formulas in spreadsheets

Formulas

Absolute reference & Relative reference

Hands-on Guided Project: Create a Personal Spending Budget Using Google Sheets

Summary of the key takeaways from this project

After completing this project, you can reference a summary of the key takeaways from this document.

Please refer to the Coursera Help Center for more information.

Know what differs from relative reference to absolute reference

Spreadsheet error fixes

Spreadsheet Errors and Fixes Demo Sheets

Recently we've been learning about formulas. Sometimes data analysts encounter a problem with our formulas and we get an error. But there are solutions, that's what we're going to explore.

#DIV/0! (The DIV error)

In this spreadsheet, the percentage Complete values in column C are calculated by dividing the values in the Tasks Completed column by the values in the Required Tasks column. Notice that column C is already formatted as a percentage. The DIV error is in cell C4 because we're dividing by zero the value in cell A4.

To avoid this problem, we can have this spreadsheet automatically enter not applicable whenever a cell in column A contains a zero that would cause the error. To do this, we'll use the IFERROR function. If it encounters a DIV error caused by a cell that contains the zero, the phrase "Not applicable" will be inserted.

#ERROR! (in Google Sheets only)

This is also known as a parsing error.

Say we want to tally the number of total tasks in a column B and C, we use the SUM function, but the formula equal sum B2 to B6, C2 to C6 causes an error. Examining it more closely, we see that a comma is missing between the cell ranges B2 to B6 and C2 to C6.

We can fix this by inserting a comma between the cell ranges to indicate the end of each data item. This is called a delimiter, which you will learn more about soon. Now, the formula can correctly calculate the total number of tasks as 25.

#N/A error

The N/A error tells you that the data in your formula can't be found by the spreadsheet. Generally, this means the data doesn't exist. This error most often occurs when using functions such as VLOOKUP, which searches for a certain value in a column to return a corresponding piece of information.

Here, we see a master list of nuts and their prices. Using VLOOKUP, the spreadsheet finds prices in the list and then calculates the prices for each store using the assigned markup.

But we have a N/A error in cells B49 and C49. The VLOOKUP formula is correct, so what's going on?

Well, if we look carefully at the name of the nut, "almond" has no match in the lookup table, the lookup table uses the plural "almonds" instead. So if we change almond to almonds, and with that typo fixed, the right prices are filled in.

#NAME? error

Speaking of typos, sometimes a typo can cause a NAME error. A NAME error can happen when a formula's name isn't recognized or understood.

Suppose we see a NAME error in the nut prices spreadsheet.

If we look carefully, the VLOOKUP function in cell B21 is spelled incorrectly, it has one extra O; this causes a NAME error for both the price and the resulting markup calculation for the store.

To fix this error, we can delete the extra O in VLOOKUP.

#NUM! error

Sometimes an error is caused by inconsistent or wrong data. For instance, the NUM error tells us that a formula's calculation can't be performed as specified by the data. The data doesn't make sense for that calculation.

Suppose we're working on a large construction project using a spreadsheet to track how many months it takes to reach key milestones.

We can use the DATEDIF function to calculate the number of months between start and end dates. The function requires the start date to be in the first cell referenced and the end date to be in the second cell referenced.

In our case, cells B2 and C2 respectively. The M represents months, as we want this spreadsheet to calculate the number of months between our start and end dates. But we get a NUM error in cell D6. We notice that the end date comes before the start date, so the DATEDIF function can't calculate the number of months between.

It's likely the start and end dates were interchanged by accident. We can request verification of the data to make sure. In the meantime, let's reverse the order of the cells in the formula to temporarily get around the error.

Now, the result is nine months.

#VALUE! error

What if the client's name was accidentally inserted into the start date in the spreadsheet? You guessed it, we get an error. The VALUE error can indicate a problem with a formula or referenced cells. It's often not clear right away what the problem is, so this error might take a little more effort to fix.

In this case, John Welty was input as the start date, making the calculation impossible for the DATEDIF function in the cell D6.

We just replace the text, John Welty, with the correct start date of September 1st, 2016.

#REF! error

Last is the REF error, which often comes up when cells being referenced in a formula have been deleted, thus making the formula unable to perform the calculation.

Here's a spreadsheet used to calculate the number of seats available for a company lunch.

Let's say the company decided not to run the second floor, so we delete row 4. This results in a REF error when calculating the total seats available in cell B5.

To fix this, we can change the formula to add the values in cells B2 and B3.

Also, in this case, we could have prevented the REF error by using the SUM function and a range of cells instead of adding the cell value by direct reference. Now, if we delete row 10, the SUM function calculates the total seats available.

Troubleshooting is a big part of data analysis, so being able to find solutions is a key skill for data analysts.

More about spreadsheet errors and fixes

DAC2 Spreadsheet Errors and Fixes.pdf (very good summary of previous lesson)

Tips

If you are working with Microsoft Excel, an interactive page, How to correct a #VALUE! error, can help you narrow down the cause of this error. You can select a specific function from a drop-down list to display a link to tips to fix the error when using that function.

Conditional formatting

Step-by-step in spreadsheets

Spreadsheet error resources To learn more and read about additional examples of errors and solutions, explore these resources:

Microsoft Formulas and Functions: This resource describes how to avoid broken formulas and how to correct errors in Microsoft Excel. This is a useful reference to have saved in case you run into a specific error and need to find solutions quickly while working in Excel.

When Your Formula Doesn’t Work: Formula Parse Errors in Google Sheets: This resource is a guide to finding and fixing some common errors in Google Sheets. If you are working with Google Sheets, you can use this as a quick reference for solving problems you might encounter working on your own.

With some practice and investigative determination, you will become much more comfortable handling errors in spreadsheets. Each error you catch and fix will make your data clearer, cleaner, and more useful.

Formulas are a great way to become more efficient when using spreadsheets, especially when you add shortcuts like copying and pasting, into the mix. As you progress as a data analyst, you'll most likely learn more shortcuts to help your process.

Functions in spreadsheets

Functions 101

Think of functions as the most useful of the shortcuts. The good news is a lot of spreadsheet functions have names that tell you what they do. There are tons of functions out there. As you continue to work with spreadsheets, you'll find that you use certain ones a lot, and others, rarely or not at all.

For now, let's take a look at some of the functions that we can apply to our sales data from the previous lesson.

practice template: Monthly Sales - Function 101

=SUM(B2:E2)

=AVERAGE(B2:E2)

=(E2-D2)/D2 (percentage)

=MIN(B2:E4)

**=MAX(B2:E4)

Quick reference: Functions in spreadsheets

Popular functions A lot of people don’t realize that keyboard shortcuts like cut, save, and find are actually functions. These functions are built into an application and are amazing time-savers. Using shortcuts lets you do more with less effort. They can make you more efficient and productive because you are not constantly reaching for the mouse and navigating menus. Use these links to discover the most popular shortcuts, for Chromebook, PC, and Mac.

Auto-filling

Relative, absolute, and mixed references

Date ranges

For more information, refer to:

Keyboard shortcuts DAC2 Keyboard functions1.pdf

DAC2 Keyboard functions2.pdf

Hands-On Activity: Create a Custom Data Table

Practice template: spreadsheet: 2019_data_analyst_job

select all the data in the spreadsheet


Create a data table

  • What was the total number of applications received per month in 2019?

  • Which months had the least and greatest number of total applications received?

  • What was the average number of applications received per month?

autofill

Go back to the raw data sheet,

autofill relevant months by =TEXT(B2,"mmmm") with double click function.

Go back to the summary data sheet type =COUNTIF('raw data'!G:G,A2) and autofill with doubleclick.

The COUNTIF function quickly counts how many items in a range of cells meet a given criterion. In cell B2, type =COUNTIF('raw data'!G:G,A2). The first entry ('raw data'!G:G) refers to the range where you are counting the data. The range is located on your raw data sheet ('raw data'!) and includes all column G (G:G). This column contains the data for months. The second entry (A2) refers to the criterion you want to count. In this case, it’s “January,” the value in cell A2 of your summary data sheet. The function will tell you how many times January (the criterion) appears in the Date column (the range).

Total(=SUM(B2:B13)), MIN(=min(B2:B13)), MAX(=max(B2:B13)), Avg(=AVERAGE(B2:B13))

Save time with structured thinking

Before solving a problem, understand it

A lot of times, teams jump right into data analysis before realizing a few months later that they are either solving the wrong problem or they don't have the right data. We will learn how to develop a structured approach to defining the problem domain. This is important because if you define the problem clearly from the start, it'll be easier to solve, which saves a lot of time, money, and resources.

Before we can do anything else, we need to understand the problem domain and all of its parts and relationships so that we can discover the whole story. Actually calling it the first piece makes me think of a jigsaw puzzle. Say you have a puzzle. Let's think of that puzzle as our problem domain. You have all 500 pieces but you lost the box. So you don't know what image the puzzle will reveal. Will it be an animal? A waterfall? A bowl of oranges? Whatever it is, it's going to be tough trying to put it together without an image you can refer to. Even the greatest puzzler in the galaxy would need a new process and lots of time to complete that puzzle.

Data analysts face the same kinds of challenges too. You might remember that data analysts aren't always given the complete picture at the start of a project. A big part of their job is to develop a structured approach and use critical thinking to find the best solution. That starts with understanding the problem domain. This is where structured thinking comes into play. To successfully solve a problem as a data analyst, you need to train your brain to think structurally.

Scope of work and structured thinking

Structured thinking

It's having a clear list of what you are expected to deliver, a timeline for major tasks and activities, and checkpoints so the team knows you're making progress.

Problem domain

The starting place for structured thinking is the problem domain. Once you know the specific area of analysis, you can set your base and lay out all your requirements and hypotheses before you start investigating. With a solid base in place, you'll be ready to deal with any obstacles that come up. What kind of obstacles?

Well, let's say you're asked to predict the future value of an apartment building based on a given dataset. You have hundreds of variables and every one is crucial to your analysis. But what if one variable accidentally gets left out, like square footage, for example? You'd have to go back and redo all your hard work. That's because missing variables can lead to inaccurate conclusions.

Scope of work

Another way that you can practice structured thinking and avoid mistakes is by using a scope of work. For many businesses, this includes things like work details, schedules, and reports that the client can expect. Now, as a data analyst, your scope of work will be a bit more technical and include those basic items we just mentioned, but you'll also focus on things like data preparation, validation, analysis of quantitative and qualitative datasets, initial results, and maybe even some visuals to really get the point across.

Let's bring a scope of work to life with a simple example. Say a couple has hired a wedding planner. We'll focus on just one task, the wedding invitations.
Here's what might be in the scope of work: deliverables, timeline, milestones, and reports.

Let's break down just one of these, deliverables. The wedding planner and couple will need to decide on the invitation, make a list of people to invite, collect their addresses, print the invitations, address the envelopes, stamp them, and mail them out.

Now let's check out the timelines. You'll notice the dates and the milestones which keep us on track.

Finally, we have the reports, which give our couple some peace of mind by telling them when each step is complete.

A scope of work can be a simple but powerful tool. With a solid scope of work, you'll be able to address any confusion, contradictions, or questions about the data up-front and make sure these sneaky setbacks don't stand in your way.

Creating a scope of work

Deliverables

Deliverables are items or tasks you will complete before you can finish the project.

Timelines

Timelines include due dates for when deliverables, milestones, and/or reports are due.

Milestones

Milestones are significant tasks you will confirm along your timeline to help everyone know the project is on track.

Reports

Reports notify everyone as you finalize deliverables and meet milestones.

Hands-On Activity: Create a scope of work

Why do you need an SOW? The point of data analysis projects is to complete business tasks that are useful to the stakeholders. Creating an SOW helps to make sure that everyone involved, from analysts and engineers to managers and stakeholders, shares the understanding of what those business goals are, and the plan for accomplishing them.

Clarifying requirements and setting expectations are two of the most important parts of a project. Recall the first phase of the Data Analysis Process—asking questions.

As you ask more and more questions to clarify requirements, goals, data sources, stakeholders, and any other relevant info, an SOW helps you formalize it all by recording all the answers and details. In this context, the word “ask” means two things. Preparing to write an SOW is about asking questions to learn the necessary information about the project, but it’s also about clarifying and defining what you’re being asked to accomplish, and what the limits or boundaries of the “ask” are. After all, if you can’t make a distinction between the business questions you are and aren’t responsible for answering, then it’s hard to know what success means!

What is a good SOW?

Question 1

Activity overview

You have been learning about the role of a data analyst and how to manage, analyze, and visualize data. Now, you will consider a valuable tool to help you practice structured thinking and avoid mistakes: a scope-of-work (SOW).

In this activity, you’ll get practical experience developing an SOW document with the help of a handy template. You will then complete an example SOW for an imaginary project of your choosing and learn how analysts outline the work they are going to perform. By the time you complete this activity, you will be familiar with an essential, industry-standard tool, and gain comfort asking the right questions to develop an SOW.

Before you get started, take a minute to think about the main ideas, goals, and target audiences of SOW documents.

Scope of work: What you need to know

As a data analyst, it’s hard to overstate the importance of an SOW document. A well-defined SOW keeps you, your team, and everyone involved with a project on the same page. It ensures that all contributors, sponsors, and stakeholders share the same understanding of the relevant details.

Why do you need an SOW?

The point of data analysis projects is to complete business tasks that are useful to the stakeholders. Creating an SOW helps to make sure that everyone involved, from analysts and engineers to managers and stakeholders, shares the understanding of what those business goals are, and the plan for accomplishing them.

Clarifying requirements and setting expectations are two of the most important parts of a project. Recall the first phase of the Data Analysis Process—asking questions.

As you ask more and more questions to clarify requirements, goals, data sources, stakeholders, and any other relevant info, an SOW helps you formalize it all by recording all the answers and details. In this context, the word “ask” means two things. Preparing to write an SOW is about asking questions to learn the necessary information about the project, but it’s also about clarifying and defining what you’re being asked to accomplish, and what the limits or boundaries of the “ask” are. After all, if you can’t make a distinction between the business questions you are and aren’t responsible for answering, then it’s hard to know what success means!

What is a good SOW?

There’s no standard format for an SOW. They may differ significantly from one organization to another, or from project to project. However, they all have a few foundational pieces of content in common.

  • Deliverables: What work is being done, and what things are being created as a result of this project? When the project is complete, what are you expected to deliver to the stakeholders? Be specific here. Will you collect data for this project? How much, or for how long? Avoid vague statements. For example, “fixing traffic problems” doesn’t specify the scope. This could mean anything from filling in a few potholes to building a new overpass. Be specific! Use numbers and aim for hard, measurable goals and objectives. For example: “Identify top 10 issues with traffic patterns within the city limits, and identify the top 3 solutions that are most cost-effective for reducing traffic congestion.”
  • Milestones: This is closely related to your timeline. What are the major milestones for progress in your project? How do you know when a given part of the project is considered complete? Milestones can be identified by you, by stakeholders, or by other team members such as the Project Manager. Smaller examples might include incremental steps in a larger project like “Collect and process 50% of required data (100 survey responses)”, but may also be larger examples like ”complete initial data analysis report” or “deliver completed dashboard visualizations and analysis reports to stakeholders”.
  • Timeline: Your timeline will be closely tied to the milestones you create for your project. The timeline is a way of mapping expectations for how long each step of the process should take. The timeline should be specific enough to help all involved decide if a project is on schedule. When will the deliverables be completed? How long do you expect the project will take to complete? If all goes as planned, how long do you expect each component of the project will take? When can we expect to reach each milestone?
  • Reports: Good SOWs also set boundaries for how and when you’ll give status updates to stakeholders. How will you communicate progress with stakeholders and sponsors, and how often? Will progress be reported weekly? Monthly? When milestones are completed? What information will status reports contain?

At a minimum, any SOW should answer all the relevant questions in the above areas. Note that these areas may differ depending on the project. But at their core, the SOW document should always serve the same purpose by containing information that is specific, relevant, and accurate. If something changes in the project, your SOW should reflect those changes.

What is in and out of scope?

Template: Data Analysis Project Scope-Of-Work (SOW) Template

Fill the template in for an imaginary project

  • Spend a few minutes thinking about a plausible data analysis project. Check out 5 Data Analytics Projects for Beginners if you need help coming up with ideas.

  • Come up with a problem domain, and then make up the relevant details to help you fill out the template.

  • Take some time to fill out the template. Treat this exercise as if you were writing your first SOW in your new career as a data analyst. Try to be thorough, specific, and concise!

  • The specifics here aren’t important. The goal is to get comfortable identifying and formalizing requirements and using those requirements in a professional manner by creating SOWs.

Link to the strong example: Data Analysis Project Scope-of-Work (SOW) Strong Example

Reflection

  • How did you identify and formalize the project’s requirements?

  • What questions did you ask in order to define the foundational boundaries and pieces of your analytical content, which may include descriptions of your deliverables, timelines, milestones, and reports? (needs to review)

Staying objective

We'll explore the importance of contextualizing data, and recognizing data bias. Let's get started.

Data doesn't live in a vacuum, it needs context. Earlier, we learned that context is the condition in which something exists or happens. Actions can be appropriate in some contexts, but inappropriate in others.

For example, yelling move, is rude one context if your friend is standing in front of the TV, but it's entirely appropriate in another if that friend is about to get hit by a kid on a tricycle.

In the world of data, numbers don't mean much without context. As we have more and more data available to us. We can leverage that data in increasingly sophisticated ways, and generate more powerful insights from it.

We use data at many different levels. Sometimes our data is descriptive, answering questions like, how much did we spend on travel last month? Data becomes more valuable, as we generate diagnostic and predictive insights, like understanding why travel spending increased last month. Data is most valuable, however, when we can generate prescriptive insights.

For example, how can we leverage data to incentivize more efficient travel?

As a data analyst, a big part of your job, is putting data into context. It's also up to you, to remain objective and recognize all sides of an argument, before drawing conclusions. The thing about context, is that it's very personal. If two people curate the same data set, and follow the same directions, there's a chance they will end up with different results. Why? Because there is no universal set of contextual interpretations. Everyone approaches it in their own way. Even if the data collection process is correct, the analysis can still be misinterpreted. Conclusions can be influenced by your own conscious and subconscious biases, which are based on cultural, social and market norms.

For example, if you ask a Boston resident, which baseball team is the best, chances are, they're going to say Boston Red Sox. Which brings us to a major limitation of data analytics. If the analysis is not objective, the conclusions can be misleading.

To really understand what the data is about, you have to think through who, what, where, when, how and why. It's good to ask yourself questions like,

  • who collected the data? And what is it about?
  • What does the data represent in the world,
  • and how does it relate to other data?
  • When, was the data collected?

Data collected awhile ago may have certain limitations, given the present day situation. For example, if we collected phone numbers over the past century, at some point, mobile phones would have been introduced, leading to the need for an additional phone number field. You should also think about,

  • where, was the data collected? A lot can change across cities, states and countries,
  • and how was it collected.

A survey might not be as effective as an in-person interview, for example.

Of course, there's the, why. The why can have a particularly strong relationship with bias.

Why? Because sometimes, data is collected, or even made up, to serve an agenda.

The best thing you can do for the fairness and accuracy of your data is to make sure you start with an accurate representation of the population and collect the data in the most appropriate, and objective way. Then, you'll have the facts so you can pass them on to your team.

The importance of context

Contextis the condition in which something exists or happens. Context is important in data analytics because it helps you sift through huge amounts of disorganized data and turn it into something meaningful. The fact is, data has little value if it is not paired with context.

Context can turn raw data into meaningful information. It is very important for data analysts to contextualize their data. This means giving the data perspective by defining it. To do this, you need to identify:

  • Who: The person or organization that created, collected, and/or funded the data collection
  • What: The things in the world that data could have an impact on
  • Where: The origin of the data
  • When: The time when the data was created or collected
  • Why: The motivation behind the creation or collection
  • How: The method used to create or collect it

Learning Log: Define problems and ask questions with data

template: Define problems and ask questions with data


Course 2 Module 3 Glossary