Course 5‐1 - Forestreee/Data-Analytics GitHub Wiki

Google Data Analytics Professional

[Analyze Data to Answer Questions]

WEEK1 - Organizing data to begin analysis

Organizing data makes the data easier to use in your analysis. In this part of the course, you’ll learn the importance of organizing your data through sorting and filtering. You’ll explore these processes in both spreadsheets and SQL as you continue to prepare your data for analysis.

Learning Objectives

  • Describe what is involved in the data analysis process with reference to goals and key tasks
  • Discuss the importance of organizing data before analysis with references to sorts and filters
  • Describe sorting as it relates to data in a spreadsheet or database with reference to functionality and benefits
  • Demonstrate an understanding of the steps involved in sorting and filtering data through the use of SQL queries

Let's get organized

Introduction to getting organized

Hey there, future data analysts! You've made a lot of progress so far. It's not an easy journey, but you're doing great. Before you started this program, something inside of you convinced you to get your Google Data Analytics Certificate. You had an idea, did some research, and made the time to get started. Then you made the decision to commit to your goal. Now look where you are! That is something to be proud of.

Early on, we jumped right into the world of data analytics and saw how data played a part in your everyday life. You learned how to navigate spreadsheets and why structured thinking was key to solving problems.

You also explored the best ways to collect and store your data. From there, you gained an understanding of clean data and data integrity. You've identified how to ask the right questions and learned to clean data.

Now we'll take your skills to the next level. Next up, you'll learn how to come up with clear and objective answers to any data question you encounter. Earlier, we learned about the data analysis process. As a quick reminder, the phases of that process are Ask, Prepare, Process, Analyze, Share, and Act.

We'll explore the Analyze phase more here, focusing on how to organize and format the data you have so that you can do all sorts of calculations. Knowing how to analyze the data you've collected and cleaned is essential to your work as an analyst.

Before we get started, I'd like to introduce myself. My name is Ayanna, and I'm excited to be your instructor for this course. I'm a global insights manager at Google, and I've also taught at the Google Analytics Academy, which is a training resource for Google analysts. In my job, I help advertisers determine the value of investing in Google products. When you search for something online, you'll often see an ad on the page. That's an investment an advertiser has made.

I use data analysis to show advertisers the value they could gain from investing in those ads. That's what I love about being a data analyst: figuring out how to create value anytime I enter a situation. The best way to know if you're creating value is if you have evidence. For me, that evidence is data. Now that you know a little bit about my love for data, let's talk about what you'll learn here.

You'll start by covering best practices for organizing your data and the different ways you can sort through that data using spreadsheets and SQL. We'll also spend time learning three important ways to work with data that will boost your analytical skills. Then we'll talk about saving time. You'll discover tips and tricks that can help you analyze data more efficiently. Last but not least, we'll work together to identify techniques to help you be as fair and unbiased as possible. Well, that's all you need to know for now.

Coming up, we'll break down the basics of data analysis and bring you one step closer to a future in data.

Course syllabus

Course content Course 5 – Analyze Data to Answer Questions

  1. Organizing data to begin analysis. Organizing data makes the data easier to use in an analysis. In this part of the course, you will learn the importance of organizing your data with sorting and filtering. You will explore organizing data in both spreadsheets and with SQL queries and temporary tables.

  2. Formatting and adjusting your data. As you move closer to analyzing your data, you will want to have the data formatted and ready to go. In this part of the course, you will learn all about converting and formatting data, including how to use SQL queries to combine data. You will also discover the value of feedback and support from your colleagues and how it can lead to new insights that you can apply to your work.

  3. Aggregating data for analysis. During an analysis, you might need to combine data to gain insights and complete business objectives. In this part of the course, you will explore the functions, procedures, and syntax to combine, or aggregate data. You will learn how to combine data within multiple cells in spreadsheets, and within multiple database tables using SQL queries.

  4. Performing data calculations. Calculations are one of the more common tasks that data analysts perform during an analysis. In this part of the course, you will explore formulas, functions, and pivot tables in spreadsheets and SQL queries. All of these are used in data calculations. You will also learn about the benefits of using SQL to manage temporary database tables.

  5. Completing the Course Challenge. At the end of this course, you will be able to put everything you have learned into perspective with the Course Challenge. The Course Challenge will ask you questions about the main concepts and then give you an opportunity to apply what you have learned in three scenarios.

Learning Log: Process and organize your data

Process and organize your data

Reflection Based on what you know, write 3-5 sentences (60-100 words) to answer the following questions:

  • What’s one organizational system you already use in real life? How can you apply that logic to data organization?
  • What do you think is involved in the process of organizing data?
  • What do you want to better understand about organizing data?

When you’ve finished your entry in the learning log template, make sure to save the document so your response is somewhere accessible. This will help you continue applying data analysis to your everyday life. You will also be able to track your progress and growth as a data analyst.

Meet and greet

  • What do you think will be the easiest and most difficult parts of analyzing data?
  • What are you most excited to learn how to do?
  • How has what you’ve learned so far prepared you for learning about analyzing data?

Refresher: Your data analytics certificate roadmap

Data analysis basics

The analysis process

Welcome back. It's great to see you again. So let's talk about analysis. We've learned how to ask the right questions, prepare data for exploration, and then process that data to make sure it's squeaky clean.

Now it's time for the heart of the process: the actual analysis! Finally, right? But what is analysis? Basically, analysis is the process used to make sense of the data collected. It means taking the right steps to proceed and think about your data in different ways.

The goal of analysis is to identify trends and relationships within the data so that you can accurately answer the question you're asking.

To do this, you should stick to the 4 phases of analysis: organize data, format and adjust data, get input from others, and transform data by observing relationships between data points and making calculations.

Let's apply the 4 phases of analysis to a real-world scenario. Imagine you want to buy a gift for your friend Zara's wedding. The problem is you're not sure what to get her. Fortunately, you have a ton of data from her wedding website. But instead of reading all the data on her website and scrolling through a photo album of her and her partner, you go straight to the online registry, a wish list of gifts they'd enjoy.

The registry is like a dataset that you can analyze to make a decision. Now that you're checking out organized data in the registry, you want to make sure that the list of data, or gifts in this case, is formatted in a way that's easy to reference.

Formatting data streamlines things and saves you time. Scrolling through hundreds of gifts can be time-consuming. Instead, you can adjust the data in a way that makes it easy to digest by filtering and sorting your data. You have a budget you want to stick to, so you sort the gift prices from low to high. You then filter prices to include gifts that are within your budget of $60. You're working with a newly formatted list of data.

At this point, it's good to remember that input from other people can also be really helpful when analyzing information and making decisions. You can check the list of gifts to figure out if anyone else has already bought any of the items. You realize a few of the items in the list have been purchased, and this informs your decision. When analyzing data, gaining input from others is important because it gives you a viewpoint you might not understand or have access to. On top of gaining input from other people, it's also important to seek out others' perspectives early. That way, if they predict any obstacles or challenges, you'll know beforehand. The people you'll look to for input don't have to be experts to be helpful. Sometimes all you need is for someone who's familiar with a topic or data you're considering. In our example, that would be Zara's wedding guests who are purchasing gifts from the same online registry. They probably aren't wedding gift experts, but their collaborative effort to mark off the item they purchase can help you figure out what not to buy, which will prevent Zara from getting the same gift twice. In the end, getting input is valuable to your analysis.

This brings us to the last step of the analysis: transforming data. Transforming data means identifying relationships and patterns between the data, and making calculations based on the data you have. Going back to our example, you were able to find a gift that you knew Zara would like, and one that fits your budget. You were also able to choose a gift that wasn't already purchased by someone else. By finding the relationship between these data points, you chose, purchased, and sent a gift that would answer the problem you wanted to solve. The beauty of the analysis process is that you probably already analyze situations in your everyday life.

Whether you're analyzing data in your personal life or in your career, these four tasks can help you make better decisions. The more you do it, the more comfortable you'll feel with the process. I hope this gives you a better understanding of the basics of analysis. As we move forward, we'll check out how to locate data for analysis, both in a spreadsheet and using SQL. When you're ready, you can go ahead. See you soon!

Ayanna: Sticking with it

I think one of the coolest things about working with data at Google is that we have one of the world's, most valuable datasets. People refer to Google data as really a lens into human curiosity. We often look at Google as really a proxy for what's happening in the world. And so for many of our advertisers, they really, really value the data and the insights that we're able to give them from Google because they believe it's a proxy or a reflection of what's happening in their business or within their industry. And so I think the value of the data that we're able to work with at Google really keeps me interested and excited about the work that I do.

So I came to Google about three years ago after spending a few years in consulting. And so I was really interested in switching into a role that was really focused on sales and marketing. But at the same time, I still wanted to be able to leverage the analytical skill set that I had gained prior. This role was a great complement to the skillsets that I already had and the interest that I had in moving into the sales and marketing function.

I think one important thing for all students to realize is that no one learns this material overnight. Many of your colleagues you may look at as experts, but most likely they've been able to gain that level of expertise through their years within the field.

I think one of the biggest attributes that students should keep in mind is that the most important thing that they need to have throughout this learning journey is grit. Grit to understand that it may be a struggle, it may be a challenge, but if you put in the work, you put in the time, these concepts will eventually click, and you'll be well on your way to becoming a data analyst. Hi, my name is Ayanna and I'm a global insights manager here at Google.

Practice Quiz: Understanding data analysis

Organize data for analysis

Keeping data organized with sorting and filters

Hi again. Let's jump back in. Right now we're in the Analyze phase of the data analysis process. And even though each phase is unique, data analysts make decisions about organization throughout all of them.

That's what we're talking about here: organization. It's super important that you keep your data organized throughout your analysis. How your data is classified and structured will impact your findings, whether you're working in a spreadsheet or a database. And once you know how your data is organized, you'll be able to capture or collect the information you need.

Most of the data you'll use in your analysis will be organized in tables. Tables help you organize similar kinds of data into categories and subject areas that you can focus on as you analyze.

For example, this basic database has tables for car dealerships, product details, and repair parts. Each table then has several fields of data, like branch owner and the cost of repair parts. You can use these tables and fields to help you decide how to move forward with your analysis.

The structure of this database can help you decide which data you need to pull to meet your objectives. For example, the total number of a particular brand of car sold, or a repair part for a specific make and model of a car at a certain branch.

Tables allow you to make decisions about data types. They help you to figure out what variables you need and the data type those variables should have. So if you have a database where you need to convert a data type during your analysis, you can do that by using the CAST command in SQL or any other method that you learn on the job or from your own research.

Like this example where we converted a purchase price column to be a FLOAT instead of a STRING so that it was in a numerical form we could use for calculations.

If you're performing your analysis in a spreadsheet, you want to make sure that the columns and rows are effectively organized. You can even hide columns that you won't need for analysis or that show duplicate information.

Once you have the data organized and formatted, you'll be ready to sort and filter it to find the data you need. We'll cover sorting and filtering soon. But for now, just know that both filters and sorts are affected by the type of data we're working with. The bottom line is that it's important to have your data in the right format. So always be prepared to adjust, no matter how far into your analysis you are. That's all for now. Coming up, we'll show you what filters are all about. Bye! ​

Keeping data organized with sorting and filters

hashtags: sorting, filtering, outliers

Sorting in a pivot table

Optional: Upload the movie dataset to BigQuery

More on sorting and filtering

Hey, great to see you again. Earlier we talked about why you should organize your data no matter what part of the life cycle it's in. Just like any collection, it's easier to manage and care for a group of things when they're structured around them.

Now we should keep in mind that organization isn't just about making things look orderly, it's also about making it easier to search and locate the data you need in a quick and easy way. As a data analyst, you’ll find yourself rearranging and sifting through databases pretty often. Two of the most common ways of doing this are with sorting and filtering. We briefly discussed sorting and filtering before and it's important you know exactly what each one does.

Sorting is when you arrange data into a meaningful order to make it easier to understand, analyze and visualize. Sorting ranks your data based on a specific metric that you can choose. You can sort data and spreadsheets and databases that use SQL. We'll get to all the cool functions you can use in both a little later on. A common way to sort items when you're shopping on a website is from lowest to highest price, but you can also sort by alphabetical order like books in a library or you can sort from newest to oldest like the order of text messages in a phone, or nearest to furthest away like when you're searching for restaurants online.

Another way to organize information is with a filter. Filtering is showing only the data that meets a specific criteria while hiding the rest. Typically you can use filters when you want to narrow down the amount of data you want to sift through.

Say you’re searching for green sneakers online. To save time, you filter the green shoes only. Using a filter slims down larger data sets to smaller subsets that are relevant to what you need.

Sorting and filtering are two actions you probably perform a lot online, whether you're sorting movie showtimes from earliest to latest, or filtering your search results to just images, you're probably already familiar with how helpful they can be for making sense of data.

Now let's take that knowledge and apply it when it comes to sifting through large disorganized piles of data, filters are your friend. You might remember from a previous video that you can use filters and spreadsheet programs like excel and sheets to only display data from rows that match the range or condition you set.

You can also filter data into SQL using the WHERE Clause. The WHERE Clause works similarly to filtering in a spreadsheet because it returns rows based on a condition you name. Let's learn how you can use a WHERE Clause in a database. We'll use BigQuery to access the database and run our query. If you're joining us open up your tool of choice for using SQL and reference the earlier resource on how to access the data set.

To follow along with the instructor, you should have already uploaded the movie data to BigQuery as a CSV file. If you haven't, follow the instructions in Optional: Upload the movie dataset to BigQuery.

Otherwise watch as the WHERE Clause does its thing.

Here's the database. You might recognize it from past videos. Basically, it's a long list of movies. Each row includes an entry for the columns named movie title, released date, genre, director, cast members, budget and total revenue. It also includes a link to the film's Wikipedia page. If you scroll down the list, the list goes on for a long time. Of course, we won't need to go through everything to find the data we want, that's the beauty of a filter.

In this case, we'll use the WHERE Clause to filter the database and narrow down the list to movies in the comedy genre.

To start, we’ll use the SELECT command followed by an asterisk. In SQL, an asterisk selects all of the data.

On a new line, we'll type FROM and the name of the database movie_data.movies. To filter the movies by comedy.

We're going to type WHERE, then list the condition which is genre. Genre is a column in the data set and we only want to select rows where the cell in the genre column exactly matches comedy.

Next, we'll type the equal sign and write the specific genre we're filtering for, which is comedy. Since the data and the genre column is a string format we have to use single or double quotations when writing it, and keep in mind that capitalization matters here, so we have to make sure that the letter casing matches the column name exactly, and now we can click RUN to check out the results.

What we're left with is a shorter list of comedy movies, pretty cool right?

Here's something else that you should know. You can apply multiple filters to a database, you can even sort and filter data at the same time for even more precise results. As a data analyst, knowing how to sort and filter data will make you a superstar, that's all for now.

Coming up we'll get down to the nitty-gritty of sorting functions and spreadsheets. See you there.

Practice Quiz: organizing data

Sort data in spreadsheets

Sort data in spreadsheets

Hey there, data pro! Happy to see you back and ready to explore more of the organizational side of data analysis. In this video, we'll learn how to sort data in spreadsheets. We've done some sorting in spreadsheets earlier in the program.

Now it's time to build on what we've covered and introduce some more advanced sorting techniques. Sorting is amazing. Not only does it add order and meaning to your spreadsheets, it also gives you the power to reimagine data altogether. When you sort data based on a specific metric, you can uncover new patterns and relationships within datasets you might not have otherwise noticed. This is especially true for spreadsheets, which you'll use a lot in your work as a data analyst.

Knowing how to sort data in spreadsheets can make you a stronger and more confident analyst. In many ways, sorting relies on your creativity to reimagine the information you have in front of you. In spreadsheets, you can sort data by ascending or descending order using numbers or letters. If cells are labeled with color, you can sort them by color, too.

When sorting data in a spreadsheet, you can choose to "Sort sheet" or "Sort range."

If "Sort sheet" is applied, all of the data in a spreadsheet is sorted by the conditions of a single column, but the related information across each row stays together.

On the other hand, "Sort range" doesn't keep the information across rows together. When you sort a range, you're selecting a specific collection of cells or the range that you want the sorting limited to. Nothing else on the spreadsheet gets rearranged but the specified cells.

There are two methods for sorting spreadsheet data: one involves using the menu; the other involves writing out the sort function. For now, we'll focus on sorting with the menu. We'll get to writing out functions later on. Now, depending on the program you use, the process might seem slightly different, but the instructions and concepts we discuss will be basically the same.

Back to sorting with the data menu. To give you an idea of how to do it, we'll use the movie spreadsheet.

Would you like to follow along with the instructor using a similar spreadsheet? If so, you may click the link below movie spreadsheet

Let's check it out. In this example, we'll sort movies by release date. We'll head to column B, which is listed as "Release Date." Click on the "B" button to highlight all the cells in the column. From there, we'll head to the Data tab in the menu. Now you have two choices: sort a sheet or a range of data. You'll notice that we've selected just the release dates, but these release dates are specifically related to the movies in their row.

In this case, you want the release date and the movie title to stay in the same row as you sort because they're related. To do this, you'll want to "Sort sheet."

This will keep all the data together by row, no matter how you sort it. Depending on the order you want the release dates to be in, you can sort from A to Z, which will also rank the dates numerically. Or you can sort from Z to A, which will sort data the opposite way.

Since we want the release dates to be in order, we'll click "Sort sheet by column B" from A to Z. And there you go. You just sorted a sheet of data using the menu. Now the movies are arranged in chronological order based on release date.

Let's say you want to sort data in a specific column, but don't need the cells in that column tied to a specific row of information. Instead, you want to isolate the column's data and sort it on its own without affecting how the rest of the sheet's arranged.

For fun, we'll use the Movie Title column in this example. First, we'll select the column we want to sort: column A. Clicking on column A highlights all the cells in the column which contain the movie titles.

Then we'll go to the menu and click Data because we're isolating the column from the rest of the sheet. When we sort this time, we'll click "Sort range by column A." For this example, we'll sort the movie titles alphabetically from A to Z. (not by range but sheet / error to be fixed) And that's it! You'll notice that "Sort range" doesn't keep the rows together, so the data are a bit jumbled.

You'll probably end up using "Sort sheet" more often, but it's important to understand them both so you don't accidentally get them mixed up. You've just sorted data in a spreadsheet using the menu, and you've learned how to sort data by an entire sheet or by a range of cells. That's something you'll be able to take with you wherever you go as a data analyst.

Coming up, we'll learn about the second way to sort in a spreadsheet: by writing out a function. We'll also take sorting to the next level by custom sorting your data. See you there!

The SORT function

Happy to have you back. Earlier in the program, we covered some basics of sorting in spreadsheets. We learned the differences between sorting a range and an entire sheet, and how to sort a spreadsheet using the menu. Now that we've laid the groundwork, it's time to move on to more advanced ways to sort information. We've talked about how there's two methods of sorting data in spreadsheets. The first method uses the Data tab in the menu of your spreadsheet program. The second way to store information in a spreadsheet is by writing a SORT function. In spreadsheets, functions are preset commands that perform a specific process. So in this case, the SORT function, as you might be able to guess, sorts your data.

Let's check out this spreadsheet of party plans to witness the SORT function in action.

If you would like to follow along with the instructor in this video, you can use the same spreadsheet in Google Sheets. To use the spreadsheet template, click the link below Party Plan Spreadsheet

The first arranged set of data is our original dataset of guests and some information about them. So let's say you want to sort the party guests by table to get an idea of who will be sitting where.

To do that, start by typing a function in an empty cell. Just like any function, you do this by typing the equal sign, and then write SORT after it.

After your first open parenthesis, reference the first cell in which data is collected from. In this case, that's A2. Then you'll include a colon and write the last cell you want included in the function, which is D6. A2 colon D6 is the range for this function.

Next, write a comma to separate the range from what we're sorting by, which is column B. You should keep in mind that this part of the function doesn't recognize column letters. So in this case, we use the corresponding number instead, which is 2, since column B is the second column in our range. Now add another comma.

In this next part you'll need to decide whether you want the data in this column to be in ascending or descending order. A TRUE statement is in ascending order, and FALSE is descending. Because we want the tables to be listed starting from table number one, we'll write TRUE for ascending, and then end the function with a closed parenthesis. Now, let's see our function play out.

Our party guests are now sorted by which table they're seated. Once you have an idea of the data you want to be sorted and how, applying functions to your data is simple. Now, you have two different tools in your tool belt for sorting data.

After you've tackled writing SORT functions, you'll want to customize sort orders, too. A customized sort order is when you sort data in a spreadsheet using multiple conditions. This means that sorting will be based on the order of the conditions you select.

Let's go back to our party spreadsheet. Imagine you want the guests to be sorted by whether or not they've been sent an invitation. And based on that, we want those guest names to be listed alphabetically. You can do that easily with the "Sort range" option under Data.

First, highlight all the data in the set from cells A1 to D6.

Then under the Data tab in the menu, click "Sort range."

The menu option has slightly changed. Click Data, then Sort range, then Advanced range sorting options.

In this case, check "Data has a header row," which makes sure that the title of the column isn't mixed into the sorting. Then, we'll make sure it's being sorted by "Sent invitation." Here, we want the "No" responses first and the "Yes" responses second, so we'll make sure A to Z is clicked to sort the responses in that order. Because we want to add an additional sorting condition, we'll now click on "Add another sort column." The guest names should be in alphabetical order. So let's select "Guest Names" and sort from A to Z.

Then we'll click Sort. And voilà! You've officially applied a custom sort order like a champ.

Okay, so you've tackled sorting in spreadsheets by sheet, by range, through the menu, and by using a function. On top of all that, you've added to your organizational skills by learning how to create custom-sort orders. Pretty soon you'll learn another powerful tool: how to sort data using SQL. Even though databases can sometimes be a lot to digest, learning these skills gives you the power to rearrange data in a way that makes sense to you. Once you've sorted data in a way that really clicks, you'll understand why it's so valuable to you as a data analyst. Bye for now!

  • The sort from a spreadsheet's Data tab overwrites the cells containing the unsorted data with the sorted data, while a written SORT function inserts the sorted data in a different cell range.
  • The sort from a spreadsheet's Data tab can exclude a header row in the data range from being sorted, while the data range for a written SORT function should never contain a header row.

Sorting and filtering in Sheets and Excel

Sorting and filtering in Sheets Sorting in Google Sheets helps you quickly spot trends in numbers. One trend might be gross revenue by sales region. In this case, you could sort the gross revenue column in descending (Z to A) order to spot the top performing regions at the top, or sort the gross revenue column in ascending (A-Z) order to spot the lowest performing regions at the top. Although an alphabetical order is implied, these sorting options do sort numbers, as our gross revenue example highlighted.

If you want to learn more about the set menu options for sorting and filtering, start with these resources:

In addition to the standard menu options, there is a SORT function for more advanced sorting. Use this function to create a custom sort. You can sort the rows of a given range of data by the values in one or more columns. And you get to set the sort criteria per column. Refer to the SORT function page for the syntax.

And like the SORT function, you can use the FILTER function to filter by any matching criteria you like. This creates a custom filter.

You might recall that you can filter data and then sort the filtered results. Using the FILTER and SORT functions together in a range of cells can programmatically and automatically achieve these results for you.

Sorting and filtering in Excel You can also sort in ascending (A-Z) and descending (Z-A) order in Microsoft Excel. Excel offers Smallest to Largest and Largest to Smallest sorting when you are working with numbers.

Similar to the SORT function in Google Sheets, Excel includes custom sort capabilities that are available from the menu. After you select the data range, click the Sort & Filter button to select the criteria for sorting. You can even sort by the data in rows instead of by the data in columns if you select Sort left to right under Options. (Sort top to bottom is the default setting to sort the data in columns.)

If you want to learn more about sorting and filtering in Excel, start with these resources:

Excel also has SORT, SORTBY, and FILTER functions. Explore how you can use these functions to automatically sort and filter your data in spreadsheets without having to select any menu options at all.

Emma: Journey to a meaningful career

I'm Emma, and I'm a product analyst on Google Health. The product that I am helping analyze data for is a tool for clinicians, and what this is going to allow clinicians to do is find their patients' health data just as easily as you can find data on Google Search.

What I specifically focus on is standardizing health care data, analyzing it to find anomalies or data quality issues, and talking to product managers about which features we should launch and why. I've worked on problems from solving locomotive failures in predicting them before they happen, to forecasting sales for handbags before the handbag is even launched, to now working with health care data to try to allow clinicians to easily have their patients' data at their fingertips.

I was really interested in working in data analytics, but I was trying to figure out what type of data I wanted to work with or what field within data analytics I wanted to focus on because there are so many. I was ultimately drawn towards working with health care data. I just really fell in love with all of the problems that we have today in health care and how just this vast amount of data that's available in the health care industry could be better utilized to help patients, to help clinicians, to improve population health.

Working with meaningful datasets is what really makes me excited to come to work every day and excited to solve these problems. What I've found in my career is that following the datasets that interested me, the types of problems that interested me, always yielded better results because I was just that much more driven to go to work every day to do my best to solve these interesting problems because it's just what interested me. The amazing thing is there's data everywhere, there are data problems at every company in every field, and you really get to just follow what you're passionate about.

Practice Quiz: sorting in spreadsheets

  • Sort Sheet is used to sort all data in a spreadsheet by the ranking of a specific sorted column.

  • Sorting a range and sorting a sheet can both be done from the menu and written as a function. Analysts can work from the Data tab in the menu or type a function directly into an empty cell.

  • To sort a spreadsheet in descending order using the SORT function, the analyst types FALSE at the end of their sort function.

Sort data using SQL

Sorting queries in SQL

Hello, there. If you're hoping to learn about sorting and SQL this time, you've definitely come to the right place. So far, we've sorted spreadsheets to the menu and with a written function, which brings us to the next part of our learning, more sort functions, but this time in SQL.

Data analysts love playing with the way data is presented. Sorting is a useful way to rearrange data because it can help you understand the data you have in a different light. As you've probably already noticed a lot of things you can do in spreadsheets can also be done in SQL. Sorting is one of those things. We've talked about using SQL with large data sets before. When a spreadsheet has too much data you can get error messages or it can cause your program to crash, that's definitely something we want to avoid. SQL shorten processes that would otherwise take a very long time or be impossible to complete in a spreadsheet.

Personally I use SQL to pull and combine different data tables, it's much quicker than a spreadsheet and that usually comes in handy. Here's something pretty helpful you can do with SQL.

You can use the order by clause to sort results return in a query.

Let's go back to our movie spreadsheet to get a better idea of how this works. Feel free to follow along in a SQL tool of your choice as we go.

As a quick refresher, we have a database of movies listed with data like release date, director and more. We can sort this table and lots of different ways using the order by function.

For this example, let's sort by release date.

First we have the select function and an asterisk. Keep in mind that the asterisk means all columns are selected, then we have FROM, and the name of the database and table were in right now.

If you are following along in BigQuery you may notice there is some information missing before the 'movie_data.movies' query line in the video. You can find the full Table ID in the top line of the DETAILS tab. In the case of this video example, the project name is oval-flow-286322. Anytime a Table ID is being typed in the FROM section of a SQL query, the dataset and data table file path name will always be preceded by the project name. Here is a template of the file path structure in this lecture example: personal project name.customer_data.customer_purchase Keep in mind that the created project name, unique to each learner, will be inserted before the dataset name. If you don't insert your personal project name, it will most likely cause an error to occur when the query is run.

Now let's check out the next line. It's empty, but that's where we write our ORDER BY function. The ORDER BY command is usually the last clause in your query. So back to the actual sorting, we'll type ORDER BY with the space. With this clause, you can choose to order data by fields in a certain column.

Because we want to sort by release date, we'll type Release_Date. By default, the ORDER BY clause sorts data in ascending order. If you run the query as it is right now, the movies will be sorted from oldest to the most recent release dates.

Let's run the query and see what we've got.

You can also sort the release date and the reverse order from the most recent dates to the oldest. To do this just specify descending order in the order by command written as DESC, d e s c. Let's run this query. As you'll notice the most recently released films, are now at the top of the database.

In a spreadsheet, you can combine sorts and filters to display information differently. You can do something similar in SQL too. You might remember that while sorting puts data in a specific order, filters narrow down data, so you only see data that fits in the filter.

For example, let's say we want to filter movies by genre, so that we're only working with comedies, but we still won't release dates to be sorted in descending order, from most recent to oldest films. We can do this with the WHERE clause. Let's try that now.

First we'll check that the order by clause is always the last line. That makes sure that all the results of the query you're running are sorted by that clause.

Then we'll add a new line for the WHERE clause after from and before order by. Here's what we've got so far. From there we want to type the column or filtering for. In this case, we want to filter the database for comedies, so after the WHERE clause, we’ll type the column list name as Genre. Now we'll add an equal sign after genre because we only want to include genres that match what we're filtering for. In this case, we're filtering for comedy, so we'll type comedy between two apostrophes.

Now if you check out the entire query as a whole, you'll notice that we're selecting all columns, and we know it's all columns because that's what an asterisk means. The FROM clause specifies the name of the movie database for using, and the WHERE clause filters the data to include entries whose genre is specified as comedy, and then in the last line we have the ORDER BY clause, which will sort the data we've chosen to filter by release dates in descending order. This means when we run the query it will only have comedy movies listed from newest releases to oldest releases.

So let's run it and figure out if that's the case. Cool, check out all those comedy movies and the way those dates are sorted.

Now let's take these queries a step further. We’ll filter for two conditions at once using the AND filter. Working off the query we've been using, we'll add a second condition in the WHERE clause. We'll keep the Sorting the same. Let's say you wanted to filter by comedy movies and movies that earned over 300 million in the box office.

In this case, after the AND function, you add the revenue condition by typing Revenue. From there, you'll specify that you only want to return films with revenues over 300 million dollars. To do that, type the greater than sign and then the complete number of 300 million without commas.

Now let's run the query.

Here, the data only shows comedy movies with revenues of over 300 million dollars and it is sorted in descending order by release date. It looks really good. You just filtered and sorted a database, like it's your job, and with practice, one day, it can be.

And just like that you finish another step and your data analyst Journey. By now, you really dug and learned about the analysis process, with a special emphasis on how organization can change how you go through your data.

You also learned about both spreadsheets and SQL and how to sort and filter data in both types of programs. To help you get more comfortable using spreadsheet and SQL features, you'll be getting some materials you can use as a resource.

Coming up, we'll check out how an organizational mindset can take your analytical skills even further. well also cover converting, formatting, and adjusting data to combine information in a way that makes sense. Learning those skills early on can make your work as a data analyst much more efficient and effective in the long run. See you soon

Optional Refresher: Using BigQuery

As a quick refresher, BigQuery is a data warehouse on Google Cloud that data analysts can use to query, filter large datasets, aggregate results, and perform complex operations.

In the next activity, you will use BigQuery to sort data using SQL queries. In case you jumped around in a few courses, set up a BigQuery account by following the instructions in Using BigQuery from the Prepare Data for Exploration course. If you already have a BigQuery account, you may mark this reading as complete and begin the next activity.

Hands-On Activity: SQL sorting queries

Hands-On Activity: Analyze weather data in BigQuery

Practice Quiz: sorting in SQL

Module 1 challenge


Course 5 Module 1 Glossary