Course 5‐2 - Forestreee/Data-Analytics GitHub Wiki
Google Data Analytics Professional
Analyze Data to Answer Questions
WEEK2 - Formatting and adjusting data
As you move closer to analyzing your data, you’ll want to have the data formatted and ready to go. In this part of the course, you’ll learn all about converting and formatting data, including how SQL queries can help you combine data. You’ll also find out the value of feedback and support from your colleagues and how it can lead to new learning that you can apply to your work.
Learning Objectives
- Demonstrate an understanding of what is involved in the conversion and formatting of data
- Demonstrate an understanding of the use of spreadsheets and SQL queries to combine multiple pieces of data
- Discuss the importance of seeking feedback and support from others
Convert and format data
Getting started with data formatting
Hey, it's great to have you back! You've learned so much already, and now you're ready to start analyzing data. Coming up, we'll cover some final things you'll need to do for your analysis to make sure your data is formatted and adjusted correctly.
We'll start converting and formatting your data; using data validation in spreadsheets; and conditional formatting. You'll also learn how to combine multiple pieces of data. And finally, we'll talk about how to get support during your analysis and find resources whenever you're stuck.
These skills will help make sure that your data analysis process is as smooth as possible. And even when it isn't, you'll know how to tackle any problems that might come up.
A big piece of being an analyst is troubleshooting and problem-solving. You're as good of an analyst as your ability to ask the right questions, which is why we'll spend some time learning about problem-solving strategies you can use during analysis. So whenever you're ready to start learning about data formatting and solving problems, head to the next video, and we'll get started.
From on type to another
Hey there! So far, we've learned about typecasting data with SQL as a way of converting data from one type to another in databases. Now I want to check out another way to format data types within spreadsheets. In this video, we'll talk more about why making sure your data is formatted properly is so important and how to format numbers and convert units of measurement in your spreadsheets. Let's get started.
Sometimes, you need to convert data when you're working with spreadsheets. That might mean changing numbers into dates, strings, percentages, or even currency. It's important to double check that all of your data is in the right format for your analysis. Sometimes even after cleaning and processing data, it still might not be in the right format you need.
Let's think back to the table with movie data from before.
Would you like to follow along with the instructor using the same spreadsheet? To use the template for this spreadsheet, click the link below Movie Data Starter Project
There were a lot of different data types that included numbers, such as dates, budgets, and text strings, like actors' names.
These are distinct values, but the spreadsheet doesn't always automatically know that.
Here's an example. Let's say you wanted to sort the movies in this spreadsheet by most recent. If the spreadsheet cast them as strings instead of dates, it might sort them alphabetically. Until you change the data type, you won't be able to sort them the way you want. It's also possible that your datasets contain inconsistent units of measurement that you'll need to convert. Like say, a table that includes both US dollars and English pounds. That's why it's important to check those data types again, so you don't run into any problems during the actual analysis. Think about the incorrectly cast dates in our movie table. If your boss needed a list of the 20 most recent movies, but your spreadsheet was organized alphabetically instead of by the most recent, you wouldn't be giving her the list of movies she needed.
Incorrectly formatted data can lead to time-consuming mistakes in your analysis and might end up affecting your stakeholders' decision-making.
But taking the time early on to convert and format your data can help you avoid that. And now that you know why you'll need to convert data types while working in spreadsheets, let's find out how.
First, let me show you a really useful menu for specifying data types in spreadsheets.
Here's the movie data table we use before, but now the money columns aren't typed as currency. On the toolbar at the top of the sheet, you'll find a menu that can help you convert these numbers into specific data types. It gives you a lot of choices just from the drop-down menu, such as number, currency, date, percentage.... And if you click to open the full menu, there's even more options, including one for a custom number format. We know that we want these columns to be in currency format, so let's do that.
All I have to do is select this column and then hit the currency shortcut.
And now it's all typed correctly. But it doesn't stop there.
You can go even further and convert the unit of measurement you're using.
Would you like to follow along with the instructor using the same spreadsheet? To use the template for this spreadsheet, click the link below Weather Table - Data for CONVERT
For this example, let's check out a different table.
Imagine that you're working with a weather channel to gather data about daily temperatures. You have a table with some data about daily observations on the temperature, wind speed, and precipitation in this area.
Right now, the temperatures are in Fahrenheit, but for your analysis you need them to be in Celsius. No problem. All you need to do is use the CONVERT function to change the unit of measurement.
We'll use this empty column here. Here's the first temperature in the table. We'll input the CONVERT function in our new column to change it to Celsius. Then we need to put what cell we want converted. And finally, we're going to convert.
And presto! Now this cell has the right unit of measurement for your analysis.
You can simply apply it to the rest of this column. Now this temperature data is all in Celsius, and your unit of measurement is consistent across the table.
And here's another tip. When adding data to tables using a formula, go back and paste the data in as values afterwards. That way they're locked in. Otherwise the cell stays as a formula and could get confusing when you start working with the data. So let's do that now. We'll copy the values and then right click in a new column. There's an option for "Paste special." And there's an option to "Paste values only." And now we have the static values in this column.
Making sure your data is in the right format before you start analysis is so important. Do this, and your analysis will return the kinds of answers you're really searching for.
For details on the correct syntax, refer to the Google Help Center documentation for CONVERT. If you're following along, do you want more practice? Try converting the wind speed in Column D from miles per hour (mph) to meters per second (m/s) using CONVERT. In cell H2, enter: =CONVERT(D2, "mph", "m/s") You can check if your conversion is correct by entering 8.5248 in a metric conversion tool, metric-conversions.org/speed/miles-per-hour-to-meters-per-second.htm. After filling the rest of Column H, your spreadsheet will appear like below. You can also copy and paste values only into Column I (not shown) as you did in Column G for the converted temperatures.
And now you know some ways to typecast numbers and convert units of measurement in spreadsheets. You can feel confident your data is formatted the right way.
Next up, we'll talk more about adjusting your data for analysis and data validation. See you soon.
Converting data in spreadsheets
As a data analyst, there are lots of scenarios when you might need to convert data in a spreadsheet:
String to date
- How to convert text to date in Excel : Transforming a series of numbers into dates is a common scenario you will encounter. This resource will help you learn how to use Excel functions to convert text and numbers to dates, and how to turn text strings into dates without a formula.
- Google Sheets: Change date format: If you are working with Google Sheets, this resource will demonstrate how to convert your text strings to dates and how to apply the different date formats available in Google Sheets.
String to numbers
- How to convert text to number in Excel: Even though you will have values in your spreadsheet that resemble numbers, they may not actually be numbers. This conversion is important because it will allow your numbers to add up and be used in formulas without errors in Excel.
- How to convert text to numbers in Google Sheets: This resource is useful if you are working in Google Sheets; it will demonstrate how to convert text strings to numbers in Google Sheets. It also includes multiple formulas you can apply to your own sheets, so you can find the method that works best for you.
Combining columns
- Convert text from two or more cells: Sometimes you may need to merge text from two or more cells. This Microsoft Support page guides you through two distinct ways you can accomplish this task without losing or altering your data. It also includes a step-by-step video tutorial to help guide you through the process.
- How to split or combine cells in Google Sheets: This guide will demonstrate how to to split or combine cells using Google Sheets specifically. If you are using Google Sheets, this is a useful resource to reference if you need to combine cells. It includes an example using real data.
Number to percentage
- Format numbers as percentages: Formatting numbers as percentages is a useful skill to have on any project. This Microsoft Support page will provide several techniques and tips for how to display your numbers as percentages.
- TO_PERCENT: This Google Sheets support page demonstrates how to use the TO_PERCENT formula to convert numbers to percentages. It also includes links to other formulas that can help you convert strings.
Pro tip: Keep in mind that you may have lots of columns of data that require different formats. Consistency is key, and best practice is to make sure an entire column has the same format.
Additional resources If you find yourself needing to convert other types of data, you can find resources on Microsoft Support for Excel or Google Docs Editor Help for Google Sheets.
Converting data is quick and easy, and the same functions can be used again and again. You can also keep these links bookmarked for future use, so you will always have them ready in case any of these issues arise. Now that you know how to convert data, you are on your way to becoming a successful data analyst.
Hands-On Activity: Combine multiple pieces of data
Data validation
Welcome back! While we're learning about formatting data, I want to talk to you about another spreadsheet feature: data validation. In this video, I'll teach you a little bit about data validation and show you how to use it. For now, when I say data validation, I'm talking about the function, which is different from the data validation process. We'll get into that later on.
But first, let's talk about what data validation does in spreadsheets. Basically, it allows you to control what can and can't be entered in your worksheet.
Usually, data validation is used to add drop-down lists to cells with predetermined options for users to choose from. If you have a spreadsheet with a lot of collaborators, this can make it easier for them to interact with your table. You can think of it like a multiple-choice question on a quiz. Since you control what's being entered into the worksheet, it cuts down on how much data cleaning you have to do later on.
Let's figure out how we might do that. For this example, we'll work on a project with a lot of milestones and deadlines to keep track of.
Would you like to follow along with the instructor using the same spreadsheet? To use the spreadsheet template, click the link below Project Spreadsheet - for Data Validation
Let's say our team has a spreadsheet that tracks everyone's progress. But instead of making everyone write in where they are in their task individually, we can provide a drop-down menu with multiple options, like "Not Yet Started," "In Progress," and "Ready."
So we'll select the column that we want to add the drop-down menus to, in this case, the "Status" column. Then we'll go to the Data pull-down menu here at the top and click "Data validation."
This brings up a pop-up menu with options for data validation. In this case, we know that we want to add a list of items for other users to choose from. So we'll select the "list of items" option from the possible criteria and type in the selections we want to create.
Then hit Save, and now all of those cells have drop-down menus that we can use to easily mark progress for each task.
But there's other things that you can do with data validation and spreadsheets, too, like creating custom check boxes. To do this, let's select the cells under the "Review" column to make a checkbox that will let us know if tasks have been approved or not.
We will be using column E to create checkboxes. Follow along as we create them!
We'll go back to the data validation menu. But instead of choosing "List from a range," we'll choose "Checkbox." There's an option to use custom cell values. Let's choose that and put in "Approved" and "Not approved."
Now these tasks can be checked off by whoever's reviewing them, like a project manager, for example.
Another way we can use data validation is to protect structured data and formulas. The more people that are working together in a spreadsheet, the more likely someone can accidentally break a formula.
But good news: the data validation menu has an option to reject invalid inputs, which helps make sure our custom tools will continue to run correctly, even if someone puts the wrong data in by mistake.
All right, now you know three uses for data validation in your spreadsheets: adding drop-down lists, creating custom checkboxes, and protecting structured data and formulas.
Data validation can help your team track progress, protect your tables from breaking when working in big teams, and help you customize tables to your needs.
Coming up, we'll learn more about conditional formatting and some ways you can use conditional formatting and data validation together. See you soon!
Conditional formatting
Hi again.
In this video, we'll take that even further by combining conditional formatting and data validation to create custom tools for our spreadsheets. So far, we've used conditional formatting to highlight empty cells that still needed data so that we could quickly pinpoint what information our table was missing and add it in. Now, let's build on that by using it to make our scheduling table easier to read at a glance.
If you followed along with the instructor in the Data Validation video, click Continue and use the same spreadsheet to continue following along. If you didn't previously follow along, or hopped around videos, click the link below Project Spreadsheet - for Conditional Formatting
Now all of the upcoming review dates have an easy-to-see color code, so anyone using this table can quickly reference these deadlines. You'll find that some spreadsheet programs, like Excel, have built-in color codes that you can use, too.
And there you go. Now you know how to use data validation and conditional formatting to create custom tools and visual cues that make your information easy to understand. There's a lot of different ways to use these tools, so feel free to experiment with them in your own spreadsheets.
Coming up, we'll keep learning about new tools for spreadsheets and SQL. Bye for now.
Identify conditional formatting use cases
Recently, you've been learning about uses for conditional formatting, such as indicating a task’s status by highlighting it red, yellow, or green. Now it’s your turn to think of examples of when you could use conditional formatting in the real world. You may even discover how your life and work can become easier with this practical and useful tool.
Write two or more paragraphs (100-150 words) describing three times when you could use conditional formatting. Then, visit the discussion forum to read what other learners have written, and engage in at least two discussions about their posts.
Transforming data in SQL
hashtags: CAST, COERCION, UNIX_DATE
Common conversions
The CAST function (syntax and examples)
Converting a number to a string
Converting a string to a number
Converting a date to a string
Converting a date to a datetime
The SAFE_CAST function
More information Browse these resources for more information about data conversion using other SQL dialects (instead of BigQuery):
- CAST and CONVERT: SQL Server reference documentation
- MySQL CAST Functions and Operators: MySQL reference documentation
- How to: SQL Type Casting: Blog about type casting that has links to other SQL short guides
Practice Quiz: converting and formatting data
Combine multiple datasets
Optional: Prepare to use the bike sharing dataset in BigQuery
Merging and multiple sources
Great to see you back.
In this video, we'll build on what we've learned about concatenate and import range by exploring a new SQL query concat.
You might remember that concatenate is a function that joins together two or more text strings and as a quick reminder a text strings.
As a quick reminder a text string is a group of characters within a cell, most often composed of letters.
You've seen how that works within a single spreadsheet.
But there's a similar function in Sequel that allows you to join multiple text strings for multiple sources, concat. Let's use concat to combine strings from multiple tables to create new strings. For this example, we'll use open data from City Bike which is a public bicycle-sharing system in New York. As you've learned earlier open data initiatives have created a ton of data for analysts to use.
Openness or open data is free access usage and sharing of data. It's a great resource if you want to practice or experiment with the data analysis tools you've been learning here. You have open access to the New York City bike-sharing data which has information about the use of shared bikes across the city.
Now we can use concat to pull and concatenate data from different columns stored here. The first thing we need to do is figure out which columns we need, that way we can tell SQL where the strings we want are.
For example, the bike sharing company has two different kinds of customers one-time paying customers and subscribers.
Let's say we want to find out what routes are most popular with different user types. To do that, we need to create strings of recognizable route names that we can count in sort. We know that the information we need is in the stations and trips table, so we'll start building our query from there.
First we'll input SELECT user type to let SQL know that we want the user type as a column. Then we'll use CONCAT to combine the names of the beginning and ending stations for each trip in a new column. This will create one column based on the routes people take. We also need to input a title for this new column, we'll type in AS route to name the route column using those beginning and ending station names we combined with CONCAT. This will make these route names easy for us to read and understand.
After that, we want SQL to count the number of trips, so we'll input COUNT to do that, we can use an asterisk to tell it's a count of the number of rows in the data we're selecting. In this case, each row represents a trip which is why we can just count all of the rows we've selected. We'll name this output as n u m _ trips.
Now let's also get the average trip duration for each route. In this case, we don't need the exact average, so we can use the round function to round up. We'll put that first, and then in the parentheses, use average(AVG) to get the average trip duration. We'll also want this data to be an integer form for this calculation, so input cast as int 64. Bigquery stores numbers and a 64-bit memory system, which is why there's a 64 after integer in this case.
The instructor is about to say "divide by the number of rows." This is incorrect. Instead, this query divides by 60, which is the number of seconds in a minute. Dividing by 60 returns the output "duration" in minutes instead of seconds.
Next will divide it by the number of rows until it how far we want it to round, two decimal places. We'll name this output AS duration. Next, we'll need to tell SQL where this information is stored, so we'll use from and the location we're pulling it from.
Since we're using count and average functions and our select clause, we have to use GROUP BY to group together summary rows. Let's GROUP BY the start station, the end station, and the user type for this query.
Finally, we'll use ORDER BY to tell it how we want to organize this data. For this, we want to figure out the most common trips, so we can input the number of trips column (num_trips) and use DESC to put it in descending order.
Finally, we only want the top 10. So let's add limit 10.
And now thanks to concat, we can easily read these route names and trace them back to real places, and we can see which kind of customers are taking which route, which could help the bike-sharing company understand their user base and different parts of the city and where to keep more bikes for people to rent.
Being able to combine multiple pieces of data can give you new ways to organize and analyze data and there's a lot of different tools to help you do that. Now you've seen concat in action and later, you will come across another similar query join.
But up next we'll talk more about working with strings, see you soon!
Strings in spreadsheets
Hey, welcome back. So far we've worked with strings in both SQL and spreadsheets before, and we've learned that they usually have similar functions.
If these directions do not work for the version of Excel that you have. Visit this free online training center, Microsoft Excel for Windows Training, you will find everything you need to know, all in one place.
You've come across these functions used in SQL, but now you'll find out how they work in spreadsheets.
Would you like to follow along with the instructor using the same spreadsheet? To use the template for the spreadsheet, click the link below (NOTE: this is a large dataset and loading times may take longer than normal) Citi Bike Trip Data
This is all useful data, but chances are we'll only need part of the strings to make a formula or answer a question. For example, these strings contain multiple data points, like date and time. But if we're trying to find the average time between start times, we won't need the date. We can actually use LEN, LEFT and RIGHT, and FIND to split the timestamps into separate columns if we want.
Using Excel? Visit this free online training center, Microsoft Excel for Windows Training, you will find everything you need to know, all in one place.
Let's build a simple formula to separate the dates in these strings. We know that LEN tells us the length of a string. Let's check how long these datetime strings are now. To start, we'll input the first part of the formula.
And then we'll just select one of the cells with the datetime string in it. These strings are 19 characters long.
We can use the FIND function to locate specific characters in a string. Keep in mind, this is case-sensitive. So if you're using FIND to pull a substring, make sure that you've input the substring correctly. We notice that all of the datetime strings have a space separating the date and the timestamp. So we can actually use FIND to figure out where the date ends.
Okay, seems like the space is the 11th character in this string. So the timestamp substring will start at character 12.
We can use the LEFT and RIGHT functions to select which parts of the string we want to isolate in a new column.
As a data analyst, being able to work with strings is a key skill, especially when you find yourself working with data from outside sources. Hopefully, you're a little bit more comfortable applying LEN, RIGHT, LEFT and FIND functions in both SQL and spreadsheets.
Later on, we'll use these functions to perform even more complicated formulas, so feel free to try them out on some data yourself, maybe even some open data like we've been using today. See you later.
Manipulating strings in SQL
CONCAT at work
Practice makes perfect W3 Schools is an excellent resource for interactive SQL learning, and the following links will guide you through transforming your data using SQL:
-
SQL functions: This is a comprehensive list of functions to get you started. Click on each function, where you will learn about the definition, usage, examples, and even be able to create and run your own query for practice. Try it out for yourself!
-
SQL Keywords: This is a helpful SQL keywords reference to bookmark as you increase your knowledge of SQL. This list of keywords are reserved words that you will use as your need to perform different operations in the database grows.
-
While this reading went through the basics of each of these functions, there is still more to learn, and you can even combine your own strings.
- Practice using CONCAT
- Practice using CONCAT WS
- Practice using CONCAT with +
Pro tip: The functions presented in the resources above may be applied in slightly different ways depending on the database that you are using (e.g. mySQL versus SQL Server). But, the general description provided for each function will prepare you to customize how you use these functions as needed.
SQL Syntax
Learning Log: A data analysis checklist
Reflection When you’re finished, think about the process of creating your checklist. Write 3-5 sentences (60-100 words) in your learning log answering the following questions:
- Did making a checklist help you better understand the data analysis process?
- How can you adapt your checklist to fit each new project? When you’ve finished your entry in the learning log template, make sure to save the document so your response is somewhere accessible. This will help you continue applying data analysis to your everyday life. You will also be able to track your progress and growth as a data analyst.
Test your knowledge on combining multiple datasets
Get support during analysis
What to do when you get stuck
Hi there. Data analysts spend a lot of time problem-solving, and that means there's going to be times when you get stuck, but the trick is knowing what to do when that happens.
Asking other people about a problem you're having can help you find new solutions that move a project forward. It's always a good idea to reach out to your peers and mentors, especially if they're working with you on that project. Your team members have valuable knowledge and insight that can help you find the solution you need to get unstuck. Sometimes we spend a lot of time spinning our wheels saying, "I can do this myself," but we can be way more productive if we engage with other people, find new resources to lean on, and try to get as many voices as we can involved.
Note: If the units of measurement in the timestamp columns are expressed in hours, adding 1 (day) to the formula would incorrectly calculate the results as a negative value, thus causing an error in the cell. Update: Since 1 day = 24 hours, you could substitute the number "24" to satisfy the following statement: Add 24 minus the start time to the formula that's being used for the multi-day trip; then try to apply to another trip that happened in the same day.
You need to find a way to build a conditional formula, but you aren't sure how. You decide to check in with other analysts working on your team to see if they have any ideas.
They suggest using an IF formula like this. This basically says that, "if the end time is larger than the start time, replace the standard end time minus start time formula with one minus start time plus end time."
Updated Formula: As a result of the different units of measurement as described in the last pop-up, the following formula would calculate the correct solution: =IF(end>start, end-start, 24+end-start)
Now it's also possible that your team members don't have an answer; that's okay too. There's definitely someone else with the same problem asking the same questions online. Knowing how to find solutions online is an incredibly valuable problem-solving tool for data analysis.
There's also all kinds of forums where spreadsheet users can ask questions, and you never know what you can turn up with just a basic search.
Whether you're asking someone you know or searching the internet for answers, reaching out for help can give you some really interesting solutions and new ways to solve problems for future analysis.
Coming up, we'll learn even more about searching for solutions online. See you soon.
Advanced spreadsheet tips and tricks
Google Sheets
-
Keyboard shortcuts for Google Sheets: This is a great resource for quickly learning a range of keyboard shortcuts that can make regular tasks quicker and easier, like navigating your spreadsheet or accessing formulas and functions. This list contains shortcuts for the desktop and mobile versions of Google Sheets so that you can apply them to your work no matter what device you are using.
-
List of Google Sheets Functions: This is a comprehensive list of the Google Sheets functions and syntax. Each function is listed with a link to learn more.
-
23 Must-Know Google Sheet Formulas: This blog article from Layer summarizes and describes 20 of the most useful Google Sheets formulas.
-
18 Google Sheets Formula Tips and Techniques: These are tips for using Google Sheets shortcuts when working with formulas.
Excel
-
Keyboard shortcuts in Excel: Earlier in this list, you were provided with a resource for keyboard shortcuts in Google Sheets. Similarly, this resource provides a list of keyboard shortcuts in Excel that will make performing regular spreadsheet tasks more efficient. This includes keyboard shortcuts for both desktop and mobile versions of Excel, so you can apply them no matter what platform you are working on.
-
222 Excel shortcuts: A compilation of shortcuts includes links to more detailed explanations about how to use them. This is a great way to quickly reference keyboard shortcuts. The list has been organized by functionality, so you can go directly to the sections that are most useful to you.
-
List of spreadsheet functions: This is a comprehensive list of Excel spreadsheet functions with links to more detailed explanations. This is a useful resource to save so that you can reference it often; that way, you’ll have access to functions and examples that you can apply to your work.
-
List of spreadsheet formulas: Similar to the previous resource, this comprehensive list of Excel spreadsheet formulas with links to more detailed explanations and can be saved and referenced any time you need to check out a formula for your analysis.
-
Essential Excel Skills for Analyzing Data: This blog post includes more advanced functionalities of some spreadsheet tools that you have previously learned about, like pivot tables and conditional formatting. These skills have been identified as particularly useful for data analysis. Each section includes a how-to video that will take you through the process of using these functions step-by-step, so that you can apply them to your own analysis.
-
Advanced Spreadsheet Skills: Mark Jhon C. Oxillo’s presentation starts with a basic overview of spreadsheet but also includes advanced functions and exercises to help you apply formulas to actual data in Excel. This is a great way to review some basic concepts and practice the skills you have been learning so far.
Layla: All about the analyze phase
Hi. My name is Layla, and I'm an analytical lead at Google. An analytical lead is someone who helps advertisers understand the value of their advertising dollars. We also help them understand if they were to spend another dollar on ads, where should they spend that dollar, and what could they expect to get from it. The skill set involved in this kind of role has all to do with being able to look at a dataset and make sense of it, and then tell a story to people who maybe don't have that same experience level with data. What's going on in the data? What's driving growth for your client or your company? What could they do more of or less of to drive more of what they want to happen? The analyze stage is like preparing a fabulous meal. You have done all the cleaning and the preparing and the cooking, and you're finally able to take a bite and to see if what you're originally hoping to happen or what you were expecting, to see if that is really the case. Is it delicious? Is it exactly like you expected? Or is the consistency a little off and you need to add a little bit more salt? The analysis stage begins once you've prepped and cleaned your data. You don't want to have those blank fields that will throw you off or duplicate entries that will enlarge your dataset beyond what's actually true. The analyze stage is where you become the expert about your dataset. Here, you're going to understand all of the different fields. You're going to understand their averages, potentially the median of the data. You're going to understand how different rows in your data differ from each other. And it's where you're going to gain the confidence to be able to explain your findings to an audience that maybe does not have the same level of expertise with data that you have.
When I analyze data, I often like to use SQL and spreadsheets. You can use these tools to, for example, sort your data and understand which entries are larger than others. Or to understand how many times something happens by selecting the distinct entries. Here, you can also filter out data that you are specifically interested in analyzing, or in a spreadsheet, use conditional formatting to show which entries show a more positive outcome and which ones are maybe more negative. Using SQL and spreadsheets to help you through the analyze phase are absolutely crucial. With these tools, you can format your dataset in a way that is digestible and then begin to tell a story with the data. My favorite part of working in a spreadsheet is when you finally have that data that you want cleaned up and exported from your SQL query. Then you get to turn it into a pivot table and chart out exactly the cut of data that you were interested in looking at in the first place and explore the trends that are happening there. When you get to do that, you basically unlock a whole world of information and you get to pick what story that you actually want to tell with your data without just saying, "This is the largest number; that's the smallest number." Here, you have to show what is happening over time potentially or what you should expect to see in the future.
Running into challenges? Not to worry!
Welcome back. Earlier, we talked a little bit about finding resources online to help you figure out solutions to problems during analysis. The internet has so much knowledge and advice to offer, but you need to know how to find it.
In this video, we'll talk more about finding answers online. You might think that great data analysts don't regularly rely on outside resources, but that's a myth. The best data analysts know that finding answers to their problems online can be empowering and give them new knowledge for the future.
Being able to find new ideas and combine them with what you already know can help you come up with some amazing things. Don't be afraid to turn to the internet to find your answers. It's a great resource that lots of analysts use. Me included.
But let's talk more about how you can make sure you're using web resources in the best way possible. There's a combination of best practices that you can use to guide your search for answers online.
By practicing the thinking skills we've learned in this program, using the right data analytics terms, and your basic knowledge of analysis tools, you have everything you need to find answers and apply them to your own work.
And it starts with how you approach a problem mentally. You've learned about different kinds of thinking skills and how to practice them in your data analysis work. From analytical to mathematical, to structured thinking. This helps build your mental model, or your thought process, and the way you approach a problem.
Data analysts use these thinking skills to approach a problem logically and break it into smaller parts. Building this into your own problem-solving process can help you pinpoint specific questions, which you can use to find resources more easily.
For example, maybe you keep running into an error in your analysis. You narrow it down to two possibilities: your formula or the data itself. You double check your formula, and you see that it's correct. So now you know that you need to make sure that data has been entered correctly. You consider the problem logically and track it down to the source, using your mental model.
Next, it's important to use the right terms when searching for solutions. Knowing how to frame data analytics questions with the same language other analysts are using will help you get more search results, and it'll help you understand what other analysts are saying.
For example, maybe you need to use the left four characters of a string for a column in SQL. How would you search for this? Searching for "four characters in a column" is a little vague and might not bring up specific resources. But "left string query SQL" uses some keywords that other data analysts are also using to talk about these things.
On top of being able to use the right terms to search online, you also need to be familiar with basic tools. That way, when an online resource is walking you through a new function and a tool that you've used before, you'll know how those tools work.
For example, if you find a spreadsheet formula online, you need to understand how formulas work to apply it to your own spreadsheet, or maybe the dataset you're working with is too large for a single spreadsheet, and you'll need to switch to SQL. Having a variety of tools in your toolkit is important as a data analyst, but just as important is knowing when to use them.
If you find yourself stuck on a problem, it can be a good idea to take a step back and reconsider how you're approaching a task. We've covered a lot of tools that you can use as a data analyst throughout this program.
Later, you'll learn one more, R. We'll talk about R more later on, but here's a sneak peak to get you excited for it.
R is another programming language, but it's not a database language like SQL. It's a programming language frequently used for statistical analysis, visualization, and other data analysis. R is a little different from other tools we've been working with, but it's a great complement for the tools you're already using, and it will give you more potential solutions when you run into problems.
Using the thinking skills we've learned throughout this program, the right terms, and your understanding of different analysis tools, we'll get you ready for the next steps of this process: actually searching for answers online.
There's a lot of resources like program support websites and forums where other data analysts are asking and answering questions.
In an earlier video, we ran into a problem trying to calculate the time elapse between bike rides and the bike-sharing data. Maybe our first search, "Calculate time in spreadsheets," didn't turn up the answers we needed. By thinking about our specific question and how other data analysts might be asking it, we could change that search into "Conditional formula for calculating elapsed time in spreadsheets." Now, we have more specific solutions to our problem.
Finally, being able to modify example code to fit your own needs is so useful. Understanding the syntax of formulas and functions for different tools will allow you to take what you learned online and make it work for you, and maybe even build on it to create a whole new solution.
For example, the MOD formula we built to account for trips that started and ended on different days in our bike-sharing data.
The MOD formula we found online wasn't created for the data we were working with. But because we are familiar with spreadsheet tools, we were able to apply it to our data and use it as a solution to our problem. Great data analysts know how to find and use resources online to help them build new solutions to problems they face. By using the thinking skills you've already learned in this program and using your knowledge of data analytics tools and terms, you can, too. Once you've found some answers to your problems, you can build them into your analysis work to get past any challenge you might face.
When to use which tool
Hello there. In these videos, you've been introduced to spreadsheets, SQL, and so many other tools. We've also talked about choosing the right tool before you start a project. But sometimes you find yourself stuck on a problem during your data analysis. That might mean it's time to reconsider which tool you're using for the job.
For example, if you're working with a simple spreadsheet, maybe five to ten rows and a few columns, then pivot tables are a great way to visualize that data. But if that spreadsheet is more than a million rows, it'll start to crash, making a pivot table hard to complete.
When you find yourself working with a huge spreadsheet that keeps crashing, you might switch to SQL to pull the data you need from different locations in a database instead of from a single spreadsheet. You might remember that SQL can handle trillions of rows of data and is now a standard language for working with database programs. SQL is great for querying, updating, and optimizing data. But trying to analyze your data with only SQL can get complicated.
As you continue to progress as a data analyst, you might find yourself spending a lot of time building long, nested queries and then debugging them. It might be time to consider another tool, R. R is a new tool that you'll work with later on, but for now, I'll tell you a little bit about it so that you can start getting excited.
R is another programming language, but it's not a database language like SQL. It's a programming language frequently used for statistical analysis, visualization, and other data analysis. R is a little different from other tools we've been working with, but it's a great complement for the tools you're already using. With R, you'll be able to analyze and visualize data in all kinds of new ways. We'll talk about R more later on, but I hope this sneak peek gives you an exciting first look.
Having a variety of tools in your tool kit is important as a data analyst, but just as important is knowing when to use them. If you find yourself stuck on a problem, it can be a good idea to take a step back and reconsider how you're approaching a task. "Do you have too much data for a single spreadsheet? Switch to SQL." "Are you spending more time debugging queries than actually analyzing data? Maybe you should consider R."
You also know how to find answers online now. So if you ever run into a problem and need to try a different tool, a quick search can be really helpful. There might be resources online, or someone else may have had the same problem and posted about it. This is great if you start feeling stuck on a problem, and you might even find a new way to use a tool you're already familiar with. That brings us to the end of this module. Great job.
We've covered a lot of information. We learned about converting and formatting data, how to combine multiple pieces of data, and how to search for help when you need support during your analysis.
Coming up next, you'll take on the weekly challenge. As always, feel free to go back over anything we've learned from these past videos. Then I'll see you for the next video. Good luck.