1.3.3. The data analysis toolbox
Key data analyst tools
As you are learning, the most common programs and solutions used by data analysts include spreadsheets, query languages, and visualization tools. In this reading, you will learn more about each one. You will cover when to use them, and why they are so important in data analytics.
Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet applications you will probably use a lot in your future role as a data analyst are Microsoft Excel and Google Sheets.
Spreadsheets structure data in a meaningful way by letting you:
- Collect, store, organize, and sort information
- Identify patterns and piece the data together in a way that works for each specific data project
- Create excellent data visualizations, like graphs and charts
Databases and query languages
A database is a collection of structured data stored in a computer system. Some popular Structured Query Language (SQL) programs include MySQL, Microsoft SQL Server, and BigQuery.
Query languages:
- Allow analysts to isolate specific information from one or more databases
- Make it easier for you to learn and understand the requests made to databases
- Allow analysts to select, create, add, or download data from a database for analysis
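As a rough illustration of isolating and selecting data with a query language, the sketch below runs two SQL queries through Python's built-in `sqlite3` module. SQLite is used only because it ships with Python; it is not one of the programs named above. The `sales` table and its rows are hypothetical, invented for this example.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented purely for illustration.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("North", "Widget", 120.0),
        ("South", "Widget", 80.0),
        ("North", "Gadget", 200.0),
    ],
)

# Isolate specific information: only the rows for the North region.
north_rows = conn.execute(
    "SELECT product, amount FROM sales WHERE region = ? ORDER BY amount",
    ("North",),
).fetchall()

# Summarize: total sales amount per region.
totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
)

print(north_rows)
print(totals)
conn.close()
```

The `WHERE` clause does the isolating and `GROUP BY` does the summarizing. The same statements should work with little or no change in most SQL databases, although each product has its own dialect.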
Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and more. Two popular visualization tools are Tableau and Looker.
These tools:
- Turn complex numbers into a story that people can understand
- Help stakeholders come up with conclusions that lead to informed decisions and effective business strategies
- Have multiple features
  - Tableau's simple drag-and-drop feature lets users create interactive graphs in dashboards and worksheets
  - Looker communicates directly with a database, allowing you to connect your data right to the visual tool you choose
A career as a data analyst also involves programming languages, like R and Python, which are widely used for statistical analysis, visualization, and other data analysis tasks.
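To give a sense of what statistical analysis in a programming language can look like, here is a minimal sketch using Python's standard `statistics` module. The `daily_orders` values are hypothetical, made up for this example.

```python
import statistics

# Hypothetical daily order counts for one week, invented for illustration.
daily_orders = [12, 15, 11, 20, 15, 18, 14]

# Basic descriptive statistics.
mean_orders = statistics.mean(daily_orders)      # arithmetic average
median_orders = statistics.median(daily_orders)  # middle value when sorted
spread = statistics.stdev(daily_orders)          # sample standard deviation

print(f"mean={mean_orders}, median={median_orders}, stdev={spread:.2f}")
```

In practice, analysts often reach for dedicated data analysis libraries, but the standard library is enough to show the idea.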
Key takeaway
As a data analyst, you have many tools to choose from. This is a first glance at the possibilities, and you will explore many of these tools in depth throughout this program.
Choosing the right tool for the job
As a data analyst, you will usually have to decide which program or solution is right for the particular project you are working on. In this reading, you will learn more about how to choose which tool you need and when.
Depending on which phase of the data analysis process you’re in, you will need to use different tools. For example, if you are focusing on creating complex and eye-catching visualizations, then the visualization tools we discussed earlier are the best choice. But if you are focusing on organizing, cleaning, and analyzing data, then you will probably be choosing between spreadsheets and databases using queries. Spreadsheets and databases both offer ways to store, manage, and use data. The basic content of both tools is sets of values. Yet, there are some key differences, too:
Spreadsheets | Databases |
---|---|
Software applications | Data stores - accessed using a query language (e.g. SQL) |
Structure data in a row and column format | Structure data using rules and relationships |
Organize information in cells | Organize information in complex collections |
Provide access to a limited amount of data | Provide access to huge amounts of data |
Manual data entry | Strict and consistent data entry |
Generally one user at a time | Multiple users |
Controlled by the user | Controlled by a database management system |
You don’t have to choose one or the other because each serves its own purpose. Generally, data analysts work with a combination of the two, as both tools are very useful in data analytics. For example, you can store data in a database, then export it to a spreadsheet for analysis. Or, if you are collecting information in a spreadsheet, and it becomes too much for that particular platform, you can import it into a database. And, later in this course, you will learn about programming languages like R that give you even greater control of your data, its analysis, and the visualizations you create.
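The round trip described above (database to spreadsheet, and spreadsheet back to database) can be sketched with Python's built-in `sqlite3` and `csv` modules. This is an illustrative sketch under stated assumptions, not a prescribed workflow: CSV stands in for a spreadsheet file because Excel and Google Sheets can both open it, and the `customers` and `customers_import` tables are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for illustration.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Ben", "Oslo")],
)

# Database -> spreadsheet: write the query result as CSV text.
cursor = conn.execute("SELECT id, name, city FROM customers ORDER BY id")
buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.writer(buffer)
writer.writerow([column[0] for column in cursor.description])  # header row
writer.writerows(cursor)  # one CSV line per database row
csv_text = buffer.getvalue()

# Spreadsheet -> database: load the CSV rows into a new table.
conn.execute("CREATE TABLE customers_import (id INTEGER, name TEXT, city TEXT)")
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany(
    "INSERT INTO customers_import (id, name, city) VALUES (:id, :name, :city)",
    reader,
)

imported_rows = conn.execute(
    "SELECT id, name, city FROM customers_import ORDER BY id"
).fetchall()

print(csv_text)
print(imported_rows)
conn.close()
```

An in-memory `io.StringIO` buffer is used so the sketch creates no files. With a real file, you would pass a file object opened with `newline=""` to `csv.writer` and `csv.DictReader` instead.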
As you continue learning about these important tools, you will gain the knowledge to choose the right tool for any data job.
Practice Quiz: Self-Reflection: Reviewing past concepts
Overview
Now that you have been introduced to working with data, you can pause for a moment and think about what you are learning. In this self-reflection, you will consider your thoughts about the data analysis process and data life cycle, then respond to brief questions.
This self-reflection will help you develop insights into your own learning and prepare you to apply your knowledge of the phases of data analysis to your data analysis toolbox. As you answer questions—and come up with questions of your own—you will consider concepts, practices, and principles to help refine your understanding and reinforce your learning. You’ve done the hard work, so make sure to get the most out of it: This reflection will help your knowledge stick!
Review the phases of data
So far you’ve learned about the data analysis process and the data life cycle. They include the following steps:
Data Analysis Process:
- Ask
- Prepare
- Process
- Analyze
- Share
- Act
Data Life Cycle:
- Plan
- Capture
- Manage
- Analyze
- Archive
- Destroy
For a refresher on the phases of data, you can review the reading on the data analysis process and the video on the data life cycle.
Reflection
Consider what you reviewed about the phases of data:
- What is the relationship between the data life cycle and the data analysis process? How are the two processes similar? How are they different?
- What is the relationship between the Ask phase of the data analysis process and the Plan phase of the data life cycle? How are they similar? How are they different?
Reflect on your learning and think about how you can apply the phases of data to future projects.
Now, write 2-3 sentences (40-60 words) in response to each of these questions.
Practice Quiz: Test your knowledge on the data analysis toolbox
Question 1
Based on what you have learned in this course, spreadsheets are digital worksheets that enable data analysts to do which of the following tasks? Select all that apply.
- Choose a topic for data analysis
- Organize data in columns and rows
- Sort and filter data
- Store data
The correct answers are "Organize data in columns and rows", "Sort and filter data", and "Store data". Explain: Spreadsheets enable data analysts to store, organize, sort, and filter data. This helps them see patterns, group information, and easily find the information they need.
Question 2
Fill in the blank: A set of instructions that performs a specific calculation using spreadsheet data is called _____.
A. a report
B. an operation
C. a formula
D. a program
The correct answer is C. a formula. Explain: A set of instructions that performs a specific calculation using spreadsheet data is called a formula.
Question 3
A database is a collection of data stored in a computer system. True or False?
A. True
B. False
The correct answer is A. True. Explain: A database is a collection of data stored in a computer system.
Question 4
In data analytics, SQL is an acronym meaning _____ query language.
A. structured
B. syntax
C. software
D. statistical
The correct answer is A. structured. Explain: SQL stands for Structured Query Language. It enables data analysts to communicate with a database.
Question 5
What is the term for the graphical representation of data?
A. Data summary
B. Data collection
C. Data language
D. Data visualization
The correct answer is D. Data visualization. Explain: Data visualization is the graphical representation of data.