231009,11 - Forestreee/Data-Analytics GitHub Wiki
Data has its own life cycle, and the work of data analysts often intersects with that cycle. In this part of the course, you’ll learn how the data life cycle and data analysts' work both relate to your progress through this program. You’ll also be introduced to applications used in the data analysis process.
- Identify key software applications critical to the work of a data analyst including spreadsheets, databases, query languages, and visualization tools
- Identify relationships between the data analysis process and the courses in the Google Data Analytics Certificate
- Explain the data analysis process, making specific reference to the ask, prepare, process, analyze, share, and act phases
- Discuss the use of data in everyday life decisions
- Discuss the role of spreadsheets, query languages, and data visualization tools in data analytics
- Discuss the phases of the data life cycle
- Ask: Define the problem and confirm stakeholder expectations
- Prepare: Collect and store data for analysis
- Process: Clean and transform data to ensure integrity
- Analyze: Use data analysis tools to draw conclusions
- Share: Interpret and communicate results to others to make data-driven decisions
- Act: Put your insights to work in order to solve the original problem
The data life cycle starts with the right data analysis tool. These include spreadsheets, databases, query languages, and visualization software.
My aha spreadsheet moment came when I started researching shortcuts that I could use to work with spreadsheets more efficiently. This would streamline the process of moving those reports to the new system.
The stages of the data life cycle are plan, capture, manage, analyze, archive, and destroy.
- Planning
This actually happens well before starting an analysis project. During planning, a business decides what kind of data it needs, how it will be managed throughout its life cycle, who will be responsible for it, and the optimal outcomes.
For example, let's say an electricity provider wanted to gain insights into how to save people energy. In the planning phase, they might decide to capture information on how much electricity their customers use each year, what types of buildings are being powered, and what types of devices are being powered inside of them. The electricity company would also decide which team members will be responsible for collecting, storing, and sharing that data. All of this happens during planning, and it helps set up the rest of the project.
- Capture
This is where data is collected from a variety of different sources and brought into the organization. With so much data being created every day, the ways to collect it are truly endless. One common method is getting data from outside resources.
For example, if you were doing data analysis on weather patterns, you'd probably get data from a publicly available dataset like the National Climatic Data Center. Another way to get data is from a company's own documents and files, which are usually stored inside a database. While we've mentioned databases before, we haven't gone into too much detail about what they are. A database is a collection of data stored in a computer system. In the case of our electricity provider, the business would probably measure data usage among its customers within a database that it owns. As a quick note, when you maintain a database of customer information, ensuring data integrity, credibility, and privacy are all important concerns.
- Manage
Here we're talking about how we care for our data, how and where it's stored, the tools used to keep it safe and secure, and the actions taken to make sure that it's maintained properly. This phase is very important to data cleansing, which we'll cover later on.
- Analyze
This is where data analysts really shine. In this phase, the data is used to solve problems, make great decisions, and support business goals.
For example, one of our electricity company's goals might be to find ways to help customers save energy.
- Archive
Archiving means storing data in a place where it's still available, but may not be used again.
During analysis, analysts handle huge amounts of data. Can you imagine if we had to sort through all of the available data that's out there, even if it was no longer useful and relevant to our work? It makes way more sense to archive it than to keep it around.
- Destroy
Yes, it sounds sad, but when you destroy data, it won't hurt a bit.
So let's get back to our electricity provider example. They would have data stored on multiple hard drives. To destroy it, the company would use secure data erasure software. If there were any paper files, they would be shredded too. This is important for protecting a company's private information, as well as private data about its customers.
Recap:
- Plan: Decide what kind of data is needed, how it will be managed, and who will be responsible for it.
- Capture: Collect or bring in data from a variety of different sources.
- Manage: Care for and maintain the data. This includes determining how and where it is stored and the tools used to do so.
- Analyze: Use the data to solve problems, make decisions, and support business goals.
- Archive: Keep relevant data stored for long-term and future reference.
- Destroy: Remove data from storage and delete any shared copies of the data.
Warning: Be careful not to mix up or confuse the six stages of the data life cycle (Plan, Capture, Manage, Analyze, Archive, and Destroy) with the six phases of the data analysis life cycle (Ask, Prepare, Process, Analyze, Share, and Act). They shouldn't be used or referred to interchangeably.
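One way to keep the two sequences apart is to model them as separate types. The sketch below is only an illustration: the class names `DataLifeCycleStage` and `DataAnalysisPhase` are made up for these notes and do not come from the course.

```python
from enum import Enum


class DataLifeCycleStage(Enum):
    """Stages that describe how the data itself is managed over time."""
    PLAN = 1
    CAPTURE = 2
    MANAGE = 3
    ANALYZE = 4
    ARCHIVE = 5
    DESTROY = 6


class DataAnalysisPhase(Enum):
    """Phases an analyst works through on a project."""
    ASK = 1
    PREPARE = 2
    PROCESS = 3
    ANALYZE = 4
    SHARE = 5
    ACT = 6


# Names that appear in both sequences; only "ANALYZE" is shared.
shared_names = {s.name for s in DataLifeCycleStage} & {p.name for p in DataAnalysisPhase}
print(shared_names)

# Members of different Enum classes never compare equal,
# even when they share a name and a value.
print(DataLifeCycleStage.ANALYZE == DataAnalysisPhase.ANALYZE)
```

Only the Analyze step appears in both lists, and even then it belongs to a different sequence, which is why the two should not be used interchangeably.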
The data life cycle provides a generic or common framework for how data is managed. You may recall that variations of the data analysis life cycle were described in Origins of the data analysis process. The same can be done for the data life cycle.
The rest of this reading provides a glimpse of how government, finance, and education institutions can view data life cycles a little differently.
The U.S. Fish and Wildlife Service uses the following data life cycle:
- Plan
- Acquire
- Maintain
- Access
- Evaluate
- Archive
For more information, refer to U.S. Fish and Wildlife's Data Management Life Cycle page.
Historical data is important to both the U.S. Fish and Wildlife Service and the USGS, so their data life cycle focuses on archiving and backing up data.
The USGS uses the data life cycle below:
- Plan
- Acquire
- Process
- Analyze
- Preserve
- Publish/Share
Several cross-cutting or overarching activities are also performed during each stage of their life cycle:
- Describe (metadata and documentation)
- Manage Quality
- Backup and Secure
For more information, refer to the USGS Data Lifecycle page.
Financial institutions may take a slightly different approach to the data life cycle, as described in The Data Life Cycle, an article in Strategic Finance magazine:
- Capture
- Qualify
- Transform
- Utilize
- Report
- Archive
- Purge
In contrast, the data life cycle for finance clearly identifies archive and purge stages.
One final data life cycle informed by Harvard University research has eight stages:
- Generation
- Collection
- Processing
- Storage
- Management
- Analysis
- Visualization
- Interpretation
For more information, refer to 8 Steps in the Data Life Cycle.
Harvard's interests are in research and teaching, so its data life cycle includes visualization and interpretation even though these are more often associated with a data analysis life cycle. The HBS data life cycle also doesn't call out a stage for purging or destroying data.
To sum it up, although data life cycles vary, one data management principle is universal: govern how data is handled so that it is accurate, secure, and available to meet your organization's needs.
The scenario: interview for a data analyst position
Imagine that you interview for a data analyst role at a local ice cream company. The hiring manager explains that the company needs a data analyst because they want to learn more about their customers. First, they want to understand their customers' ice cream flavor preferences. Then, they will use this customer data to help make important decisions.
The hiring manager explains that they do not collect any customer data, and they don't know where to begin. The hiring manager asks you: Can you please explain how you would approach this task?
Recap: The data life cycle
Reflection
The steps of the data life cycle are plan, capture, manage, analyze, archive, and destroy.
Consider the steps of the data life cycle and reflect on the hiring manager's request. Review the following questions to help guide your thinking:
- Plan: What plans and decisions do you need to make? What data do you need to answer your question?
- What kind of data should they gather?
- Capture: Where does your data come from? How will you get it?
- How should they gather this data?
- Manage: How will you store your data? What should it be used for? How do you keep this data secure and protected?
- Where will the data live? How will they store the data?
- Analyze: How will the company analyze the data? What tools should they use?
- Once they have the data, how will they use it?
- Archive: What should they do with their data when it gets old? How do they know when it's time?
- How do they keep their data secure and protected?
- Destroy: Should they ever dispose of any data? If so, when and how?
- What should they do with old data? What are their options?
My reflection:
- Plan: Decide how to run a survey of customer flavor preferences
- Capture: Collect customers' top 1-10 flavor preferences
- Manage: Set up spreadsheets or databases to store the responses
- Analyze: Compare the top 1-10 general flavor preferences with the flavors your store sells
- Archive: Keep at least the top 5 preferences, to help choose interchangeable flavors for your store
- Destroy: Delete preference data for flavors that are not interchangeable
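The Capture and Analyze steps of this reflection could be sketched in a few lines of Python. Everything here is hypothetical: the survey responses, the store's flavor list, and the choice to look at only the top two flavors are made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses: each customer names a favorite flavor.
responses = ["vanilla", "chocolate", "Vanilla ", "mint", "chocolate", "vanilla"]

# Flavors the store currently sells (also hypothetical).
store_flavors = {"vanilla", "strawberry", "pistachio"}

# Normalize spelling so "Vanilla " and "vanilla" count as the same answer.
cleaned = [response.strip().lower() for response in responses]

# Count the answers and keep the two most popular flavors.
counts = Counter(cleaned)
top_flavors = counts.most_common(2)

# Popular flavors the store does not offer yet.
missing_from_store = [flavor for flavor, _ in top_flavors if flavor not in store_flavors]

print(top_flavors)
print(missing_from_store)
```

In this made-up data, chocolate is popular but not on the store's list. That is the kind of insight the company could act on.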
Great work reinforcing your learning with meaningful self-reflection! A good reflection on this topic would address that—to use their data successfully—the ice cream company must:
1. Figure out what data they need and where they can get it.
2. Collect the data, and be sure of what they will (and won’t) use it for.
3. Consider how to secure the data and deal with old data that is no longer useful.
As a data analyst, these are the types of questions you should always seek to answer about your data. From considering data collection before a project begins (in the plan phase) to removing data with data erasure software (in the destroy phase), data analysts must apply these concepts to effectively approach each data project.
Data analysis isn't a life cycle. It's the process of analyzing data.
- Ask
In this phase, we do two things. We define the problem to be solved and we make sure that we fully understand stakeholder expectations. Stakeholders hold a stake in the project. They are people who have invested time and resources into a project and are interested in the outcome. Let's break that down.
First, defining a problem means you look at the current state and identify how it's different from the ideal state. Usually, there's an obstacle we need to get rid of or something wrong that needs to be fixed.
For instance, a sports arena might want to reduce the time fans spend waiting in the ticket line. The obstacle is figuring out how to get the customers to their seats more quickly.
Another important part of the ask phase is understanding stakeholder expectations. The first step here is to determine who the stakeholders are. That may include your manager, an executive sponsor, or your sales partners. There can be lots of stakeholders. But what they all have in common is that they help make decisions, influence actions and strategies, and have specific goals they want to meet. They also care about the project and that's why it's so important to understand their expectations.
For instance, if your manager assigns you a data analysis project related to business risk, it would be smart to confirm whether they want to include all types of risks that could affect the company, or just risks related to weather such as hurricanes and tornadoes.
Communicating with your stakeholders is key in making sure you stay engaged and on track throughout the project. So as a data analyst, developing strong communication strategies is very important. This part of the ask phase helps you keep focused on the problem itself, not just its symptoms. As you learned earlier, the five whys are extremely helpful here.
- Prepare
This is where data analysts collect and store data they'll use for the upcoming analysis process. You'll learn more about the different types of data and how to identify which kinds of data are most useful for solving a particular problem. You'll also discover why it's so important that your data and results are objective and unbiased. In other words, any decisions made from your analysis should always be based on facts and be fair and impartial.
- Process
Here, data analysts find and eliminate any errors and inaccuracies that can get in the way of results. This usually means cleaning data, transforming it into a more useful format, combining two or more datasets to make information more complete and removing outliers, which are any data points that could skew the information. After that, you'll learn how to check the data you prepare to make sure it's complete and correct. This phase is all about getting the details right. So you'll also fix typos, inconsistencies, or missing and inaccurate data. To top it off, you'll gain strategies for verifying and sharing your data cleansing with stakeholders.
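A minimal sketch of what cleaning, transforming, and combining can look like, using only the Python standard library. The records, field names, and the helpers `clean` and `add_region` are hypothetical. Dropping rows with missing values is just one possible choice; a real project might flag them instead.

```python
# Hypothetical raw records with typical problems: inconsistent text,
# a duplicate row, and a missing value.
raw_usage = [
    {"customer_id": 1, "building": "House ", "kwh": "320"},
    {"customer_id": 2, "building": "apartment", "kwh": ""},
    {"customer_id": 1, "building": "house", "kwh": "320"},
    {"customer_id": 3, "building": "OFFICE", "kwh": "1250"},
]

# A second hypothetical dataset to combine with the first.
regions = {1: "north", 2: "south", 3: "north"}


def clean(rows):
    """Normalize text, convert types, and drop incomplete or duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["kwh"].strip():
            continue  # missing measurement: skipped here, but could be flagged instead
        record = (row["customer_id"], row["building"].strip().lower(), float(row["kwh"]))
        if record in seen:
            continue  # exact duplicate after normalization
        seen.add(record)
        cleaned.append({"customer_id": record[0], "building": record[1], "kwh": record[2]})
    return cleaned


def add_region(rows, region_by_customer):
    """Combine the usage rows with the region lookup so each row is more complete."""
    return [{**row, "region": region_by_customer.get(row["customer_id"])} for row in rows]


clean_rows = add_region(clean(raw_usage), regions)
for row in clean_rows:
    print(row)
```

Two of the four rows survive: the row with no measurement and the duplicate are removed, and the building names are written consistently.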
- Analyze
Analyzing the data you've collected involves using tools to transform and organize that information so that you can draw useful conclusions, make predictions, and drive informed decision-making. There are lots of powerful tools data analysts use in their work and in this course you'll learn about two of them, spreadsheets and structured query language, or SQL, which is often pronounced "sequel."
- Share
Here you'll learn how data analysts interpret results and share them with others to help stakeholders make effective data-driven decisions. In the share phase, visualization is a data analyst's best friend. So this course will highlight why visualization is essential to getting others to understand what your data is telling you. With the right visuals, facts and figures become so much easier to see and complex concepts become easier to understand. We'll explore different kinds of visuals and some great data visualization tools. You'll also practice your own presentation skills by creating compelling slideshows and learning how to be fully prepared to answer questions.
Then we'll take a break from the data analysis process to show you all of the really cool things you can do with the programming language R. You don't need to be familiar with R or programming languages in general. Just know that R is a popular tool for data manipulation, calculation, and visualization.
- Act
This is the exciting moment when the business takes all of the insights you, the data analyst, have provided and puts them to work in order to solve the original business problem. You will also be acting on what you've learned throughout this program.
This is when you prepare for your job search and have the chance to complete a case study project. It's a great opportunity for you to bring together everything you've worked on throughout this course. Plus adding a case study to your portfolio helps you stand out from the other candidates when you interview for your first data analyst job.
DAC(Data Analysis Core) The data analysis process

(231011)
- Ask
"You want to ask all of the right questions at the beginning of the engagement so that you better understand what your leaders and stakeholders need from this analysis."
What is the problem that we're trying to solve? What is the purpose of this analysis? What are we hoping to learn from it?
- Prepare
"We need to be thinking about what type of data we need in order to answer those key questions that we've set out to answer based on what we learned when we asked the right questions."
We also need to think about how we're going to collect that data or if we need to collect that data. It may be the case that we need to collect this data brand-new. So we need to think about what type of data we will be collecting and how.
This could be anything from quantitative data to qualitative data. It could be cross-sectional or points in time versus longitudinal over a long period of time.
For our employee engagement survey, we do that via survey of both quantitative and qualitative questions. But it may actually be the case that for many analyses, the data that you're looking for already exist. Then it's a question of working with those data owners to make sure that you are able to leverage that data and use it responsibly.
- Process
"This is where you get a chance to understand its structure, its quirks, its nuances, and you really get a chance to understand deeply what type of data you're going to be working with and understanding what potential that data has to answer all of your questions."
This is the part where we spend a lot of time really digging deeply into the structure and nuance of the data to make sure that you're able to analyze it appropriately and responsibly. It begins with cleaning. This is such an important part, too, where we're running through all of our quality assurance checks.
For example, do we have all of the data that we anticipated we would have? Are we missing data at random or is it missing in a systematic way such that maybe something went wrong with our data collection effort? If needed, did we code all of our data the right way? Are there any outliers that we need to treat differently?
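Two of these checks, counting missing values and looking for outliers, can be sketched as below. The records are made up, and the 1.5 × IQR rule is a common convention chosen for this example, not something the speaker prescribes. The sketch also does not answer whether data is missing at random or systematically; that still needs a closer look at how the data was collected.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "building": "house", "kwh": 295},
    {"id": 2, "building": "house", "kwh": 300},
    {"id": 3, "building": None, "kwh": 305},
    {"id": 4, "building": "office", "kwh": 310},
    {"id": 5, "building": "house", "kwh": None},
    {"id": 6, "building": "apartment", "kwh": 315},
    {"id": 7, "building": "house", "kwh": 320},
    {"id": 8, "building": "office", "kwh": 330},
    {"id": 9, "building": "house", "kwh": 2900},
]


def count_missing(records):
    """Count missing (None) values in each column."""
    columns = records[0].keys()
    return {col: sum(1 for r in records if r[col] is None) for col in columns}


def iqr_outliers(values):
    """Return values outside 1.5 * IQR of the quartiles (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


missing = count_missing(rows)
outliers = iqr_outliers([r["kwh"] for r in rows if r["kwh"] is not None])

print(missing)
print(outliers)
```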
- Analyze
"This is the point where we have to take a step back and let the data speak for itself."
Make sure to do so in as objective and unbiased a way as possible.
To do this, the first thing we do is run through a series of analyses that we've already planned ahead of time based on the questions that we know we want to answer from the very, very beginning of the process.
One thing that's probably the hardest about this particular process, the hardest thing about analyzing data, is that we as analysts are trained to look for patterns. Over time as we become better and better at our jobs, what we'll often find is that we can start to intuit what we might see in the data. We might have a sneaking suspicion as to what the data are going to tell us. This is the point where we have to take a step back and let the data speak for itself. As data analysts, we are storytellers, but we also have to keep in mind that it is not our story to tell. That story belongs to the data, and it is our job as analysts to amplify and tell that story in as unbiased and objective a way as possible.
- Share
"Share all of the data and insights that you've generated from your analyses."
Now typically for employee engagement surveys, we start by sharing the high-level findings with our executive team. We want them to have a landscape view of how the organization is feeling, and we want to make sure that there aren't any surprises as they dig deeper and deeper into the data to understand how teams are feeling and how individual employees are feeling.
- Act
"All of this work from asking the right questions to collecting your data, to analyzing and sharing doesn't mean much of anything if we aren't taking action on what we've just learned."
This to me is the most critical part, especially of our employee engagement survey. I like to say that the survey is actually the easy part, and acting on the results is really where the real work begins. This is where we use all of those data-driven insights to decide what types of interventions we want to introduce, not only at the organizational level but also at the team level as well.
We might find, for example, that the organization is working on a series of interventions to help improve part of the employee experience, whereas individual teams have additional roles, and responsibilities to play, to either bolster some of those efforts or to introduce new ones to better meet their team where their strengths and opportunity areas are.
Molly says: "The data analysis process is rigorous, but it is lengthy. I can completely appreciate that we as data analysts get so excited about just diving right into the data and doing what we do best. The challenge is that if we don't work through the process in its entirety, if we try to skip steps, we're not going to be able to elicit the insights that we're looking for. I absolutely love my job. I have such a deep appreciation for data and what it can do and what type of insight we can derive from it."
Understand how a table is structured:
- A table consists of rows and columns
- Each row is a different observation
- Each column is a different attribute of that observation
- Microsoft Excel
- Google Sheets
A spreadsheet stores, organizes, and sorts data. This is important because the usefulness of your data depends on how well it's structured. When you put your data into a spreadsheet, you can see patterns, group information, and easily find the information you need. Spreadsheets also have some really useful features called formulas and functions.
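The same structure, with rows as observations and columns as attributes, can be sketched outside a spreadsheet. The Python below is only an illustration with made-up values. It mirrors sorting and grouping, two things a spreadsheet does interactively.

```python
from collections import defaultdict

# Each dict is one row (an observation); each key is a column (an attribute).
# The values are made up for illustration.
table = [
    {"customer": "Ana", "building": "house", "kwh": 320},
    {"customer": "Ben", "building": "office", "kwh": 1250},
    {"customer": "Chen", "building": "house", "kwh": 280},
]

# Sort rows by one column, as a spreadsheet sort would.
sorted_rows = sorted(table, key=lambda row: row["kwh"])

# Group rows by another column to make a pattern easier to see.
kwh_by_building = defaultdict(list)
for row in table:
    kwh_by_building[row["building"]].append(row["kwh"])

print([row["customer"] for row in sorted_rows])
print(dict(kwh_by_building))
```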
Spreadsheets structure data in a meaningful way by letting you
- Collect, store, organize, and sort information
- Identify patterns and piece the data together in a way that works for each specific data project
- Create excellent data visualizations, like graphs and charts.
- Formula
A set of instructions that performs a specific calculation using the data in a spreadsheet.
Formulas can do basic things like add, subtract, multiply, and divide, but they don't stop there. You can also use formulas to find the average of a number set, look up a particular value, return the sum of a set of values that meets a particular rule, and so much more.
- Function
A preset command that automatically performs a specific process or task using the data in a spreadsheet
Just think of a function as a simpler, more efficient way of doing something that would normally take a lot of time. In other words, functions can help make you more efficient. Those are the spreadsheet basics for now.
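To make the calculations mentioned above concrete, here is a rough Python analogue. The sales rows are made up, and the spreadsheet names in the comments (AVERAGE, VLOOKUP, SUMIF) are given only as the closest familiar counterparts in Excel and Google Sheets, not as exact equivalents.

```python
from statistics import mean

# Hypothetical sales rows, as they might appear in a small sheet.
sales = [
    {"flavor": "vanilla", "region": "north", "units": 120},
    {"flavor": "chocolate", "region": "south", "units": 95},
    {"flavor": "mint", "region": "north", "units": 40},
]

# Basic arithmetic, like a formula that adds three cells.
total_units = sales[0]["units"] + sales[1]["units"] + sales[2]["units"]

# Average of a number set, similar in spirit to AVERAGE.
average_units = mean(row["units"] for row in sales)

# Look up a particular value, similar in spirit to VLOOKUP.
units_by_flavor = {row["flavor"]: row["units"] for row in sales}
mint_units = units_by_flavor["mint"]

# Sum of the values that meet a rule, similar in spirit to SUMIF.
north_units = sum(row["units"] for row in sales if row["region"] == "north")

print(total_units, average_units, mint_units, north_units)
```

The built-in `sum` and `mean` play the role of functions here: preset commands that replace arithmetic you would otherwise write out by hand.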
A database is a collection of structured data stored in a computer system. Some popular Structured Query Language (SQL) programs include MySQL, Microsoft SQL Server, and BigQuery.
Query languages
- Allow analysts to isolate specific information from a database(s)
- Make it easier for you to learn and understand the requests made to databases
- Allow analysts to select, create, add, or download data from a database for analysis
- Query language
A computer programming language that allows you to retrieve and manipulate data from a database
The most widely used query language is SQL (Structured Query Language).
- SQL
A language that lets data analysts communicate with a database.
With SQL, data analysts can access the data they need by making a query. Although query means question, I like to think of it as more of a request. So you're requesting that the database do something for you. You can ask it to do a lot of different things such as insert, delete, select or update data. Okay, that's a top level look at SQL.
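The four requests named above (insert, delete, select, and update) can be tried with Python's built-in `sqlite3` module. This is a minimal sketch: the `customers` table, its columns, and the values are hypothetical, and SQLite stands in here for the databases named earlier, such as MySQL or BigQuery.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; the table and
# column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, kwh REAL)")

# INSERT: add rows to the table.
conn.executemany(
    "INSERT INTO customers (id, name, kwh) VALUES (?, ?, ?)",
    [(1, "Ana", 320.0), (2, "Ben", 1250.0), (3, "Chen", 280.0)],
)

# UPDATE: change an existing value.
conn.execute("UPDATE customers SET kwh = ? WHERE id = ?", (300.0, 3))

# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM customers WHERE id = ?", (2,))

# SELECT: request only the rows and columns needed.
rows = conn.execute(
    "SELECT name, kwh FROM customers WHERE kwh >= ? ORDER BY name", (300.0,)
).fetchall()
conn.close()

print(rows)
```

Each statement is a request to the database, which matches the idea of a query as a request rather than a question.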
- Database
A collection of data stored in a computer system
- Data visualization
The graphical representation of information
Some examples include graphs, maps, and tables.
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and more. Two popular visualization tools are Tableau and Looker.
These tools
- Turn complex numbers into a story that people can understand
- Help stakeholders come up with conclusions that lead to informed decisions and effective business strategies
- Have multiple features
- Tableau's simple drag-and-drop feature lets users create interactive graphs in dashboards and worksheets
- Looker communicates directly with a database, allowing you to connect your data right to the visual tool you choose
A career as a data analyst also involves using programming languages, like R and Python, which are used a lot for statistical analysis, visualization, and other data analysis.
Most people process visuals more easily than words alone. That's why visualizations are so important. They help data analysts communicate their insights to others in an effective and compelling way. When you think about the data analysis process, after data is prepared, processed, and analyzed, the insights are visualized so they can be understood and shared. This makes it easier for stakeholders to draw conclusions, make decisions, and come up with strategies.
Some popular visualization tools are Tableau and Looker.
Data analysts like using Tableau because it helps them create visuals that are very easy to understand. This means that even non-technical users can get the information they need.
Looker is also popular with data analysts because it gives them an easy way to create visuals based on the results of a query. With Looker, you can give stakeholders a complete picture of your work by showing them visualization data and the actual data related to it.
As a data analyst, you will usually have to decide which program or solution is right for the particular project you are working on. If you are focusing on organizing, cleaning, and analyzing data, then you will probably be choosing between spreadsheets and databases using queries. Spreadsheets and databases both offer ways to store, manage, and use data. The basic content for both tools is a set of values. Yet, there are some key differences, too:

Generally, data analysts work with a combination of the two, as both tools are very useful in data analytics. For example, you can store data in a database, then export it to a spreadsheet for analysis. Or, if you are collecting information in a spreadsheet, and it becomes too much for that particular platform, you can import it into a database. And, later in this course, you will learn about programming languages like R that give you even greater control of your data, its analysis, and the visualizations you create.
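The round trip described above, from database to spreadsheet and back, can be sketched with CSV as the exchange format, since spreadsheet applications can open and save CSV files. The table names and values are hypothetical, and an in-memory `io.StringIO` buffer stands in for a real file.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy_usage (customer TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO energy_usage VALUES (?, ?)", [("Ana", 320.0), ("Ben", 1250.0)]
)

# Database -> spreadsheet: write the query results as CSV text.
cursor = conn.execute("SELECT customer, kwh FROM energy_usage ORDER BY customer")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([column[0] for column in cursor.description])  # header row
writer.writerows(cursor)
exported_csv = buffer.getvalue()

# Spreadsheet -> database: read the CSV rows back and insert them into a table.
conn.execute("CREATE TABLE imported_usage (customer TEXT, kwh REAL)")
reader = csv.DictReader(io.StringIO(exported_csv))
conn.executemany(
    "INSERT INTO imported_usage VALUES (?, ?)",
    [(row["customer"], float(row["kwh"])) for row in reader],
)
imported_count = conn.execute("SELECT COUNT(*) FROM imported_usage").fetchone()[0]
conn.close()

print(exported_csv)
print(imported_count)
```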
- Ask
- Prepare
- Process
- Analyze
- Share
- Act
- Plan
- Capture
- Manage
- Analyze
- Archive
- Destroy
- What is the relationship between the data life cycle and the data analysis process? How are the two processes similar? How are they different?
While the data analysis process will drive your projects and help you reach your business goals, you must understand the life cycle of your data in order to use that process. To analyze your data well, you need to have a thorough understanding of it. Similarly, you can collect all the data you want, but the data is only useful if you have a plan for analyzing it.
- What is the relationship between the Ask phase of the data analysis process and the Plan phase of the data life cycle? How are they similar? How are they different?
The Plan and Ask phases both involve planning and asking questions, but they tackle different subjects. The Ask phase in the data analysis process focuses on big-picture strategic thinking about business goals. However, the Plan focuses on the fundamentals of the project, such as what data you have access to, what data you need, and where you're going to get it.
=> Overall, my answer:
The data life cycle provides a generic or common framework for how data is managed, so it describes how the data itself is handled from planning through destruction. The data analysis process is more closely tied to business questions and to carrying out an analysis in a structured, organized way. Because that process works so closely with data, it also depends on the data life cycle, which describes a core aspect of the data. I suspect the Act phase of the data analysis process and the Destroy stage of the data life cycle could be designed to interact. As for the Plan stage and the Ask phase, Plan focuses more on the data: how it should be processed and managed overall, decided in advance. Ask is more about the business context and what kind of data should be used in that context. The similarity is that both are committed to data-driven decision-making, and both aim to use data in as objective and unbiased a way as possible. Both are crucial because the later steps follow from well-structured objectives.