1.1.3. Understanding the data ecosystem

What is the data ecosystem?

To put it simply, an ecosystem is a group of elements that interact with one another.

Data ecosystems are made up of various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data.

Data can also be found in something called the cloud. The cloud is a place to keep data online, rather than on a computer hard drive.

For example,

  • You could tap into your retail store's database, which is an ecosystem filled with customer names, addresses, previous purchases, and customer reviews. As a data analyst, you could use this information to predict what these customers will buy in the future, and make sure the store has the products in stock when they're needed.
  • Think about a data ecosystem used by a human resources department. This ecosystem would include information like postings from job websites, stats on the current labor market, employment rates, and social media data on prospective employees. A data analyst could use this information to help their team recruit new workers and improve employee engagement and retention rates.
  • Agricultural companies regularly use data ecosystems that include information such as geological patterns and weather movements. Data analysts can use this data to help farmers predict crop yields.
  • Some data analysts are even using data ecosystems to save real environmental ecosystems. At the Scripps Institution of Oceanography, coral reefs all over the world are monitored digitally, so researchers can see how organisms change over time, track their growth, and measure any increases or declines in individual colonies.

Q. In data analytics, what is the term for elements that interact with one another in order to produce, manage, store, organize, analyze, and share data?

A. Data ecosystems

  • Elements that interact with one another in order to produce, manage, store, organize, analyze, and share data are data ecosystems. These elements include hardware and software tools, as well as the people who use them.

Difference between data scientists and data analysts

  • Data science is defined as creating new ways of modeling and understanding the unknown by using raw data.
  • Data scientists create new questions using data, while analysts find answers to existing questions by creating insights from data sources.
  • Data analysis is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making.
  • Data analytics, in the simplest terms, is the science of data.

When you think about data, data analysis and the data ecosystem, it's important to understand that all of these things fit under the data analytics umbrella.

How data informs better decisions

One of the most powerful ways you can put data to work is with data-driven decision-making.

Data-driven decision-making is defined as using facts to guide business strategy.

Data analysts regularly empower organizations in many different industries to make better, data-driven decisions.

The first step in data-driven decision-making is figuring out the business need.

Usually, this is a problem that needs to be solved. For example,

  • A new company might need to establish better brand recognition so it can compete with bigger, more well-known competitors.
  • An organization might want to improve a product and need to figure out how to source parts from a more sustainable or ethically responsible supplier.
  • A business might be trying to solve the problem of unhappy employees and low levels of engagement, satisfaction, and retention.

Whatever the problem is, once it's defined, a data analyst finds data, analyzes it and uses it to uncover trends, patterns and relationships. Sometimes the data-driven strategy will build on what's worked in the past. Other times, it can guide a business to branch out in a whole new direction.

  • Let's look at a real-world example. Think about a music or movie streaming service. How do these companies know what people want to watch or listen to, and how do they provide it? Well, using data-driven decision-making, they gather information about what their customers are currently watching or listening to, analyze it, then use the insights they've gained to make suggestions for things people will most likely enjoy in the future. This keeps customers happy and coming back for more, which in turn means more revenue for the company.
  • Another example of data-driven decision-making can be seen in the rise of e-commerce. It wasn't long ago that most purchases were made in a physical store, but the data showed people's preferences were changing. So a lot of companies created entirely new business models that remove the physical store, and let people shop right from their computers or mobile phones with products delivered right to their doorstep.
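The streaming example above (gather what customers play, analyze it, suggest what they might enjoy next) can be illustrated with a very small sketch. This is not how any real streaming service works; the user names, track names, and the `suggest` function are hypothetical, and the "analysis" is just counting tracks played by users with overlapping taste.

```python
from collections import Counter

# Hypothetical listening history: user -> set of tracks they have played.
history = {
    "ana": {"track_a", "track_b", "track_c"},
    "ben": {"track_a", "track_b"},
    "cy": {"track_b", "track_d"},
}


def suggest(user, history, top_n=2):
    """Suggest tracks played by users who share at least one track with `user`."""
    seen = history[user]
    counts = Counter()
    for other, tracks in history.items():
        if other == user or not (seen & tracks):
            continue  # skip the user themself and users with no overlap
        counts.update(tracks - seen)  # count only tracks the user has not played
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:top_n]]


print(suggest("ben", history))
```

A real recommendation system would use far more data and a more careful model, but the shape is the same: collect behavior, analyze it, and turn the result into a suggestion.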

In fact, data-driven decision-making can be so powerful, it can make entire business methods obsolete.

It's important to note that no matter how valuable data-driven decision-making is, data alone will never be as powerful as data combined with human experience, observation, and sometimes even intuition. To get the most out of data-driven decision-making, it's important to include insights from people who are familiar with the business problem. These people are called subject matter experts, and they have the ability to look at the results of data analysis and identify any inconsistencies, make sense of gray areas, and eventually validate choices being made.

As a data analyst, you play a key role in empowering these organizations to make data-driven decisions, which is why it's so important for you to understand how data plays a part in the decision-making process.

Data and gut instinct

Detectives and data analysts have a lot in common. Both depend on facts and clues to make decisions. Both collect and look at the evidence. Both talk to people who know part of the story. And both might even follow some footprints to see where they lead. Because whether you’re a detective or a data analyst, your job is all about following steps to collect and understand facts.

Analysts use data-driven decision-making and follow a step-by-step process. You have learned that there are six steps to this process:

  1. Ask questions and define the problem.
  2. Prepare data by collecting and storing the information.
  3. Process data by cleaning and checking the information.
  4. Analyze data to find patterns, relationships, and trends.
  5. Share data with your audience.
  6. Act on the data and use the analysis results.

Analyzing facts is a key part of data-driven decision-making because facts lead to patterns that help guide the decisions we make, big and small. Data-driven decision-making is rooted in using facts to guide business strategy. As an analyst, you will be tasked with creating a verified story about the data and sharing it with stakeholders. These stakeholders use your story to make choices based on facts, and make sure that the company is focused on the right goals.

Gut instinct can be a problem

Other factors influence the decision-making process, too. You may have read mysteries where the detective used their gut instinct, and followed a hunch that helped them solve the case. Gut instinct is an intuitive understanding of something with little or no explanation. This isn’t always something conscious; we often pick up on signals without even realizing. You just have a “feeling” it’s right.

But for data analysts, just trusting our gut instinct can be a problem. At the heart of data-driven decision-making is data, so we always want to focus on the data to ensure that we’re making informed decisions. When we make decisions based on our gut instinct without any data to back them up, it can lead to mistakes. Or worse, when we ignore the data based on our own personal experiences, we can create bias in our analysis. Businesses that rely on gut instinct to make decisions often make bad choices because they aren’t considering the story their data is actually telling.

Instead of relying on gut instinct, you can build your business knowledge and experience over time. The more you know about how a business works, the easier it will be to figure out what that business needs. And that business knowledge and experience can also help you identify errors and gaps in your data and communicate your findings. For example, a detective might be able to crack open a case because they remember an old case just like the one they’re solving today. Their past experience could help them make a connection that no one else would notice. Maybe their unique background knowledge helps them discover someone is lying, or it could help them uncover new clues. Your business knowledge and experience may help you understand problems intuitively. But, unlike gut instinct, it will give you more than just a feeling to go on.

Data + business knowledge = mystery solved

Blending facts and data with your business knowledge will be a common part of your process. The key is figuring out the exact mix of data and business knowledge for each particular project. A lot of times it will depend on the goals of your analysis. That is why analysts often ask, “How do I define success for this project?”

Successful analysis needs to be accurate, and fast enough to help decision-makers. So try asking yourself these questions about a project:

  • What kind of results are needed?
  • Who will be informed?
  • Am I answering the question being asked?
  • How quickly does a decision need to be made?

For example, if you are working on a rush project, you might need to rely on your own knowledge and experience more than usual. There just isn’t enough time to thoroughly analyze all of the available data. But if you get a project that involves plenty of time and resources, then the best strategy would be to be more data-driven. It’s up to you, the data analyst, to think about the situation and make the best possible choice. You will probably blend facts and knowledge a million different ways over the course of your data analytics career. And the more you practice, the better you will get at finding that perfect blend.

Origins of the data analysis process

When you decided to join this program, you proved that you are a curious person. So let’s tap into your curiosity and talk about the origins of data analysis. We don’t fully know when or why the first person decided to record data about people and things. But we do know it was useful because the idea is still around today!

We also know that data analysis is rooted in statistics, which has a pretty long history itself. Archaeologists mark the start of statistics in ancient Egypt with the building of the pyramids. The Ancient Egyptians were masters of organizing data. They documented their calculations and theories on papyri (paper-like materials), which are now viewed as the earliest examples of spreadsheets and checklists. Today’s data analysts owe a lot to those brilliant scribes, who helped create a more technical and efficient process.

It is time to enter the data analysis life cycle—the process of going from data to decision. Data goes through several phases as it gets created, consumed, tested, processed, and reused. With a life cycle model, all key team members can drive success by planning work both up front and at the end of the data analysis process. While the data analysis life cycle is well known among experts, there isn't a single defined structure of those phases. There might not be one single architecture that’s uniformly followed by every data analysis expert, but there are some shared fundamentals in every data analysis process. This reading provides an overview of several, starting with the process that forms the foundation of the Google Data Analytics Certificate.

The process presented as part of the Google Data Analytics Certificate is one that will be valuable to you as you keep moving forward in your career:

  1. Ask: Business Challenge/Objective/Question
  2. Prepare: Data generation, collection, storage, and data management
  3. Process: Data cleaning/data integrity
  4. Analyze: Data exploration, visualization, and analysis
  5. Share: Communicating and interpreting results
  6. Act: Putting your insights to work to solve the problem
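As a rough illustration of how the six phases listed above fit together, here is a minimal sketch that walks a toy dataset through them. The business question, store names, and sales figures are all made up for illustration, and a real project would use proper tooling rather than plain lists.

```python
from statistics import mean

# Ask: which store has the higher average daily sales? (hypothetical question)

# Prepare: collect the raw records. These values are invented for illustration.
raw = [
    {"store": "north", "sales": "120"},
    {"store": "north", "sales": "80"},
    {"store": "south", "sales": "150"},
    {"store": "south", "sales": ""},  # a missing value
    {"store": "south", "sales": "90"},
]

# Process: drop rows with missing sales and convert the strings to numbers.
clean = [
    {"store": row["store"], "sales": float(row["sales"])}
    for row in raw
    if row["sales"]
]


# Analyze: compute the average sales per store.
def average_by_store(rows):
    stores = {}
    for row in rows:
        stores.setdefault(row["store"], []).append(row["sales"])
    return {store: mean(values) for store, values in stores.items()}


averages = average_by_store(clean)

# Share: summarize the finding in plain language for stakeholders.
best = max(averages, key=averages.get)
summary = f"{best} has the highest average daily sales ({averages[best]:.1f})"
print(summary)

# Act: what to do with the finding is a decision for the stakeholders,
# so it is not represented in code here.
```

Note that the Process step changes the answer: if the missing value were treated as zero instead of dropped, the average for the south store would fall, which is one reason data cleaning comes before analysis.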

Understanding this process—and all of the iterations that helped make it popular—will be a big part of guiding your own analysis and your work in this program. Let’s go over a few other variations of the data analysis life cycle.

EMC's data analysis life cycle

EMC Corporation's data analytics life cycle is cyclical with six steps:

  1. Discovery
  2. Pre-processing data
  3. Model planning
  4. Model building
  5. Communicate results
  6. Operationalize

EMC Corporation is now Dell EMC. This model, created by David Dietrich, reflects the cyclical nature of real-world projects. The phases aren’t static milestones; each step connects and leads to the next, and eventually repeats. Key questions help analysts test whether they have accomplished enough to move forward and ensure that teams have spent enough time on each of the phases and don’t start modeling before the data is ready. It is a little different from the data analysis life cycle this program is based on, but it has some core ideas in common: the first phase is interested in discovering and asking questions; data has to be prepared before it can be analyzed and used; and then findings should be shared and acted on.

For more information, refer to The Genesis of EMC's Data Analytics Lifecycle.

SAS' iterative life cycle

An iterative life cycle was created by a company called SAS, a leading data analytics solutions provider. It can be used to produce repeatable, reliable, and predictive results:

  1. Ask
  2. Prepare
  3. Explore
  4. Model
  5. Implement
  6. Act
  7. Evaluate

SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol. The life cycle has seven steps, many of which we have seen in the other models, like Ask, Prepare, Model, and Act. But this life cycle is also a little different; it includes a step after the Act phase designed to help analysts evaluate their solutions and potentially return to the Ask phase again.

For more information, refer to Managing the Analytics Life Cycle for Decisions at Scale.

Project-based data analytics life cycle

A project-based data analytics life cycle has five simple steps:

  1. Identifying the problem
  2. Designing data requirements
  3. Pre-processing data
  4. Data analysis
  5. Data visualizing

This data analytics project life cycle was developed by Vignesh Prajapati. It doesn’t include the sixth phase, or what we have been referring to as the Act phase. However, it still covers a lot of the same steps as the life cycles we have already described. It begins with identifying the problem, continues with preparing and processing data before analysis, and ends with data visualization.

For more information, refer to Understanding the data analytics project life cycle.

Big data analytics life cycle

Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics life cycle in their book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their life cycle suggests phases divided into nine steps:

  1. Business case evaluation
  2. Data identification
  3. Data acquisition and filtering
  4. Data extraction
  5. Data validation and cleaning
  6. Data aggregation and representation
  7. Data analysis
  8. Data visualization
  9. Utilization of analysis results

This life cycle appears to have three or four more steps than the previous life cycle models. But in reality, the authors have just broken down what we have been referring to as Prepare and Process into smaller steps. The model emphasizes the individual tasks required for gathering, preparing, and cleaning data before the analysis phase.

For more information, refer to Big Data Adoption and Planning Considerations.

Data life cycle based on research

One final data life cycle informed by Harvard University research has eight phases:

  1. Generation
  2. Collection
  3. Processing
  4. Storage
  5. Management
  6. Analysis
  7. Visualization
  8. Interpretation

This version includes storage, management, and interpretation phases, and excludes the Act phase that has appeared in other models.

For more information, refer to 8 Steps in the Data Life Cycle.
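One way to compare the life cycle models described in this reading is to write each one down as an ordered list of phases. The sketch below does that and counts the steps in each. The dictionary keys are informal labels chosen here, not official names, and the check for an "Act" phase only matches a phase literally named Act, so models that use a different name for a similar step are not counted.

```python
# Each life cycle from this reading, written as an ordered list of phases.
# The keys are informal labels used only for this comparison.
life_cycles = {
    "Google": ["Ask", "Prepare", "Process", "Analyze", "Share", "Act"],
    "EMC": [
        "Discovery", "Pre-processing data", "Model planning",
        "Model building", "Communicate results", "Operationalize",
    ],
    "SAS": ["Ask", "Prepare", "Explore", "Model", "Implement", "Act", "Evaluate"],
    "Project-based": [
        "Identifying the problem", "Designing data requirements",
        "Pre-processing data", "Data analysis", "Data visualizing",
    ],
    "Big data": [
        "Business case evaluation", "Data identification",
        "Data acquisition and filtering", "Data extraction",
        "Data validation and cleaning", "Data aggregation and representation",
        "Data analysis", "Data visualization", "Utilization of analysis results",
    ],
    "Research-based": [
        "Generation", "Collection", "Processing", "Storage",
        "Management", "Analysis", "Visualization", "Interpretation",
    ],
}

# Number of phases in each model.
step_counts = {name: len(steps) for name, steps in life_cycles.items()}

# Models that contain a phase literally named "Act".
has_act = [name for name, steps in life_cycles.items() if "Act" in steps]

for name, count in step_counts.items():
    print(f"{name}: {count} steps")
print("Models with a phase named Act:", has_act)
```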

Key takeaway

From our journey to the pyramids and data in Ancient Egypt to now, the way we analyze data has evolved (and continues to do so). The data analysis process is like real-life architecture: there are different ways to do things, but the same core ideas still appear in each model of the process. Whether you use the structure of this Google Data Analytics Certificate or one of the many other iterations you have learned about, we are here to help guide you as you continue on your data journey.