4.2.2.Cleaning data in spreadsheets - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Workflow automation

In this reading, you will learn about workflow automation and how it can help you work faster and more efficiently. Basically, workflow automation is the process of automating parts of your work. That could mean creating an event trigger that sends a notification when a system is updated. Or it could mean automating parts of the data cleaning process. As you can probably imagine, automating different parts of your work can save you tons of time, increase productivity, and give you more bandwidth to focus on other important aspects of the job.

What can be automated?

Automation sounds amazing, doesn’t it? But as convenient as it is, there are still some parts of the job that can’t be automated. Let's take a look at some things we can automate and some things that we can’t.

Task Can it be automated? Why?
Communicating with your team and stakeholders No Communication is key to understanding the needs of your team and stakeholders as you complete the tasks you are working on. There is no replacement for person-to-person communications.
Presenting your findings No Presenting your data is a big part of your job as a data analyst. Making data accessible and understandable to stakeholders and creating data visualizations can’t be automated for the same reasons that communications can’t be automated.
Preparing and cleaning data Partially Some tasks in data preparation and cleaning can be automated by setting up specific processes, like using a programming script to automatically detect missing values.
Data exploration Partially Sometimes the best way to understand data is to see it. Luckily, there are plenty of tools available that can help automate the process of visualizing data. These tools can speed up the process of visualizing and understanding the data, but the exploration itself still needs to be done by a data analyst.
Modeling the data Yes Data modeling is a difficult process that involves lots of different factors; luckily there are tools that can completely automate the different stages.

More about automating data cleaning

One of the most important ways you can streamline your data cleaning is to clean data where it lives. This will benefit your whole team, and it also means you don’t have to repeat the process over and over. For example, you could create a programming script that counted the number of words in each spreadsheet file stored in a specific folder. Using tools that can be used where your data is stored means that you don’t have to repeat your cleaning steps, saving you and your team time and energy.

More resources

There are a lot of tools out there that can help automate your processes, and those tools are improving all the time. Here are a few articles or blogs you can check out if you want to learn more about workflow automation and the different tools out there for you to use:

As a data analyst, automation can save you a lot of time and energy, and free you up to focus more on other parts of your project. The more analysis you do, the more ways you will find to make your processes simpler and more streamlined.

VLOOKUP

A function that searches for a certain value in a column to return a corresponding piece of information

=VLOOKUP(data to look up, 'where to look'!Range, column, false)

Hands-On Activity: Clean data with spreadsheet functions

Test your knowledge on cleaning data in spreadsheets

TOTAL POINTS 3

Question 1

Describe the relationship between a text string and a substring.

  • A text string is a column of data within a table. A substring is one cell within that column.
  • A text string is the list of attributes at the top of columns within a table. A substring is a single attribute within that list.
  • A text string is a group of characters within a cell. A substring is a smaller subset of that text string.
  • A text string is a row of data within a table. A substring is one cell within that row.

Correct. A text string is a group of characters within a cell. A substring is a smaller subset of that text string.

Question 2

A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax?

  • =COUNTIF(A2:A100,"<5")
  • =COUNTIF(A2:A100,>5)
  • =COUNTIF(A2:A100,<5)
  • =COUNTIF(A2:A100,">5")

Correct. The correct syntax is ****=COUNTIF(A2:A100,"<5"). COUNTIF will return the number of cells that match a value. A2:A100 is the range. And “<5” is the specified value.

Question 3

Fill in the blank: To remove leading, trailing, and repeated spaces in data, analysts use the ____ function.

  • TRIM
  • RIGHT
  • MID
  • LEFT

Correct. TRIM is a function that removes leading, trailing, and repeated spaces in data.