3.1.2. Difference between data formats and structures

Discover data formats

Discrete data

  • Data that is counted and has a limited number of values

Continuous data

  • Data that is measured and can have almost any numeric value

Nominal data

  • A type of qualitative data that is categorized without a set order

Ordinal data

  • A type of qualitative data with a set order or scale

Internal data

  • Data that lives within a company's own systems

External data

  • Data that lives and is generated outside of an organization

Structured data

  • Data organized in a certain format such as rows and columns

Unstructured data

  • Data that is not organized in any easily identifiable manner

Question 1

An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply.

  • Ordinal
  • Continuous
  • Nominal
  • Discrete

Correct. The star rating is an example of ordinal data because the stars are ordered by how much each person liked the movie. It’s also an example of discrete data because a person has to choose a whole number of stars; half-stars aren’t an option.
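
The star-rating answer can be checked with a short Python sketch. The names `ALLOWED_STARS` and `is_valid_star_rating` are hypothetical, and the sample ratings are made up; the only thing taken from the question is that users pick one to five whole stars.

```python
# Assumption from the question above: one to five whole stars, no half-stars.
ALLOWED_STARS = (1, 2, 3, 4, 5)


def is_valid_star_rating(value):
    """Return True if value is one of the five whole-star ratings."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value in ALLOWED_STARS


# Discrete: only a limited number of counted values is possible.
ratings = [4, 2, 5, 3]  # made-up sample ratings
all_valid = all(is_valid_star_rating(r) for r in ratings)

# A half-star such as 3.5 is not one of the allowed values.
half_star_valid = is_valid_star_rating(3.5)

# Ordinal: the values have a meaningful order, so sorting them makes sense.
ranked = sorted(ratings)
```

Here the limited set of allowed values reflects the discrete side of the rating, and the fact that sorting is meaningful reflects the ordinal side.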


Question 2

The use of external data is particularly valuable in which circumstances?

  • When analysis involves data that hasn’t been cleaned
  • When analysis requires a lot of structured data
  • When analysis includes data from audio files
  • When analysis depends on as many data sources as possible

Correct. External data is particularly valuable when an analysis depends on as many sources as possible.

Data formats in practice

When you think about the word "format," a lot of things might come to mind. Think of an advertisement for your favorite store. You might find it in the form of a print ad, a billboard, or even a commercial. The information is presented in the format that works best for you to take it in. The format of a dataset is a lot like that, and choosing the right format will help you manage and use your data in the best way possible.

Data format examples

As with most things, definitions click more easily when we pair them with real-life examples. Review each definition first and then use the examples to lock in your understanding of each data format.

  • Differences between primary and secondary data and examples of each

| Data format classification | Definition | Examples |
| --- | --- | --- |
| Primary data | Collected by a researcher from first-hand sources | Data from an interview you conducted; data from a survey returned from 20 participants; data from questionnaires you got back from a group of workers |
| Secondary data | Gathered by other people or from other research | Data you bought from a local data analytics firm’s customer profiles; demographic data collected by a university; census data gathered by the federal government |

  • Differences between internal and external data and examples of each

| Data format classification | Definition | Examples |
| --- | --- | --- |
| Internal data | Data that lives inside a company’s own systems | Wages of employees across different business units tracked by HR; sales data by store location; product inventory levels across distribution centers |
| External data | Data that lives outside of a company or organization | National average wages for the various positions throughout your organization; credit reports for customers of an auto dealership |

  • Differences between continuous and discrete data and examples of each

| Data format classification | Definition | Examples |
| --- | --- | --- |
| Continuous data | Data that is measured and can have almost any numeric value | Height of kids in third grade classes (52.5 inches, 65.7 inches); runtime markers in a video; temperature |
| Discrete data | Data that is counted and has a limited number of values | Number of people who visit a hospital on a daily basis (10, 20, 200); room’s maximum capacity allowed; tickets sold in the current month |

  • Differences between qualitative and quantitative data and examples of each

| Data format classification | Definition | Examples |
| --- | --- | --- |
| Qualitative | Subjective and explanatory measures of qualities and characteristics | Exercise activity most enjoyed; favorite brands of most loyal customers; fashion preferences of young adults |
| Quantitative | Specific and objective measures of numerical facts | Percentage of board certified doctors who are women; population of elephants in Africa; distance from Earth to Mars |

  • Differences between nominal and ordinal data and examples of each

| Data format classification | Definition | Examples |
| --- | --- | --- |
| Nominal | A type of qualitative data that is categorized without a set order | First time customer, returning customer, regular customer; new job applicant, existing applicant, internal applicant; new listing, reduced price listing, foreclosure |
| Ordinal | A type of qualitative data with a set order or scale | Movie ratings (number of stars: 1 star, 2 stars, 3 stars); ranked-choice voting selections (1st, 2nd, 3rd); income level (low income, middle income, high income) |

  • Differences between structured and unstructured data and examples of each

| Data format classification | Definition | Examples |
| --- | --- | --- |
| Structured data | Data organized in a certain format, like rows and columns | Expense reports; tax returns; store inventory |
| Unstructured data | Data that isn’t organized in any easily identifiable manner | Social media posts; emails; videos |
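
One way to make the nominal versus ordinal distinction concrete is with Python's standard `enum` module. This is a minimal sketch, not a prescribed method: the class names `CustomerType` and `IncomeLevel` are hypothetical, the categories come from the examples above, and the numbers assigned to income levels only encode rank.

```python
from enum import Enum, IntEnum


class CustomerType(Enum):
    """Nominal: categories with no set order."""
    FIRST_TIME = "first time customer"
    RETURNING = "returning customer"
    REGULAR = "regular customer"


class IncomeLevel(IntEnum):
    """Ordinal: categories with a set order (the numbers encode rank only)."""
    LOW = 1
    MIDDLE = 2
    HIGH = 3


# Ordinal values can be compared and sorted.
low_below_high = IncomeLevel.LOW < IncomeLevel.HIGH
sorted_levels = sorted([IncomeLevel.HIGH, IncomeLevel.LOW, IncomeLevel.MIDDLE])

# Nominal values support equality checks, but ordering them is an error.
try:
    CustomerType.FIRST_TIME < CustomerType.RETURNING
    nominal_is_orderable = True
except TypeError:
    nominal_is_orderable = False
```

Using a plain `Enum` for the nominal categories means an accidental "greater than" comparison fails loudly instead of silently producing a meaningless ranking.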

Data model

  • A model that is used for organizing data elements and how they relate to one another

Data elements

  • Pieces of information, such as people's names, account numbers, and addresses

Data modeling levels and techniques

In this reading, you will learn about data modeling and some different types of data models. Data models help keep data consistent and give us a map of how data is organized. This makes it easier for analysts and other stakeholders to make sense of their data and use it in the right ways. As a junior data analyst, you will probably be working with the data models your organization already has in place — but understanding how data models work can help you make sense of other models you might come across on the job.

What is data modeling?

Data modeling is the process of creating diagrams that visually represent how data is organized and structured. These visual representations are called data models. You can think of data modeling as a blueprint of a house. At any point, there might be electricians, carpenters, and plumbers using that blueprint. Each one of these builders has a different relationship to the blueprint, but they all need it to understand the overall structure of the house. Data models are similar; different users might have different data needs, but the data model gives them an understanding of the structure as a whole.

Levels of data modeling

Each level of data modeling has a different level of detail.

  1. Conceptual data modeling gives you a high-level view of your data structure, such as how you want data to interact across an organization.
  2. Logical data modeling focuses on the technical details of the model such as relationships, attributes, and entities.
  3. Physical data modeling depicts how the database is actually built. At this stage, you lay out how each database will be put in place and how the databases, applications, and features will interact in specific detail.
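
The three levels above can be sketched with Python's built-in `sqlite3` module. Everything in this example is an assumption made for illustration: the customer and order entities, the table and column names, and the sample rows are not from the course. The comments mark which level each piece roughly corresponds to.

```python
import sqlite3

# Conceptual level (assumed example): customers place orders.
# Logical level: entities Customer and Order, their attributes, and a
# one-to-many relationship from Customer to Order.
# Physical level: the concrete tables, column types, and keys below.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 25.0)")
conn.execute("INSERT INTO customer_order VALUES (11, 1, 40.0)")

# The relationship defined in the model is what makes this join possible.
rows = conn.execute(
    "SELECT c.name, COUNT(*), SUM(o.total) "
    "FROM customer c JOIN customer_order o ON o.customer_id = c.customer_id "
    "GROUP BY c.customer_id"
).fetchall()
conn.close()
```

In practice a junior analyst would more likely read a schema like this than write one, but seeing the levels side by side can make an existing model easier to interpret.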

More information can be found in this comparison of data models.

Data-modeling techniques

There are many approaches to developing data models, but two common methods are the Entity Relationship Diagram (ERD) and the Unified Modeling Language (UML) diagram. ERDs are a visual way to understand the relationships between entities in the data model. UML diagrams are very detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations, and their relationships. As a junior data analyst, you will need to understand that there are different data modeling techniques, but in practice, you will probably be using your organization’s existing model.
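
The vocabulary in the paragraph above (entities, attributes, operations, relationships) maps loosely onto Python classes. The sketch below is only an analogy, not a diagramming tool: the `Store` and `Product` classes, their fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    # Attributes of the Product entity.
    sku: str
    units_in_stock: int


@dataclass
class Store:
    # Attributes of the Store entity.
    store_id: int
    location: str
    # Relationship: one store stocks zero or more products.
    products: List[Product] = field(default_factory=list)

    # Operation: a UML class diagram lists behavior alongside attributes.
    def total_units(self) -> int:
        return sum(p.units_in_stock for p in self.products)


store = Store(store_id=1, location="Downtown")
store.products.append(Product(sku="A-100", units_in_stock=12))
store.products.append(Product(sku="B-200", units_in_stock=30))
units = store.total_units()
```

An ERD of the same model would show the two entities and the line between them; the `total_units` operation is the kind of detail a UML diagram adds.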

You can read more about ERD, UML, and data dictionaries in this data modeling techniques article.

Data analysis and data modeling

Data modeling can help you explore the high-level details of your data and how it is related across the organization’s information systems. Data modeling sometimes requires data analysis to understand how the data is put together; that way, you know how to map the data. And finally, data models make it easier for everyone in your organization to understand and collaborate with you on your data. This is important for you and everyone on your team!

The structure of data

Data is everywhere and it can be stored in lots of ways. Two general categories of data are:

  • Structured data: Organized in a certain format, such as rows and columns.
  • Unstructured data: Not organized in any easy-to-identify way.

For example, when you rate your favorite restaurant online, you're creating structured data. But when you use Google Earth to check out a satellite image of a restaurant location, you're using unstructured data.

Here's a refresher on the characteristics of structured and unstructured data:

Structured data

As we described earlier, structured data is organized in a certain format. This makes it easier to store and query for business needs. If the data is exported, the structure goes along with the data.
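
As a rough illustration of the structure traveling with exported data, the following sketch writes a few rows to CSV text and reads them back using Python's standard `csv` module. The column names and sales figures are made up for the example.

```python
import csv
import io

# Hypothetical sales records; the column names and values are made up.
rows = [
    {"store": "North", "units_sold": 12},
    {"store": "South", "units_sold": 7},
]

# Export: the header row carries the structure along with the data.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["store", "units_sold"])
writer.writeheader()
writer.writerows(rows)
exported = buffer.getvalue()

# Import: the columns can be recovered by name and then queried.
reader = csv.DictReader(io.StringIO(exported))
busy_stores = [r["store"] for r in reader if int(r["units_sold"]) > 10]
```

Because every value sits in a named column, the question "which stores sold more than 10 units?" becomes a one-line filter.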

Unstructured data

Unstructured data isn’t organized in any easily identifiable manner, and there is much more of it in the world than structured data. Video and audio files, text files, social media content, satellite imagery, presentations, PDF files, open-ended survey responses, and websites all qualify as types of unstructured data.
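
To see why unstructured data takes more work, consider open-ended survey responses. The sketch below uses made-up responses and a deliberately simple word count; it is one possible way to derive some structure from free text, not a recommended text-analysis method.

```python
import re
from collections import Counter

# Hypothetical open-ended survey responses (free text, no fields).
responses = [
    "The delivery was late but the food was great.",
    "Great food, friendly staff!",
    "Late delivery again. Not happy.",
]

# There is no "topic" column to filter on, so structure has to be derived:
# lowercase each response, split it into words, and count the words.
words = Counter()
for text in responses:
    words.update(re.findall(r"[a-z]+", text.lower()))

late_mentions = words["late"]
great_mentions = words["great"]
```

The counts only exist after a processing step; nothing in the raw text says which part of a response is a complaint and which part is praise.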

The fairness issue

The lack of structure makes unstructured data difficult to search, manage, and analyze. But recent advancements in artificial intelligence and machine learning algorithms are beginning to change that. Now, the new challenge facing data scientists is making sure these tools are inclusive and unbiased. Otherwise, certain elements of a dataset will be more heavily weighted and/or represented than others. And as you're learning, an unfair dataset does not accurately represent the population, causing skewed outcomes, low accuracy levels, and unreliable analysis.
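
One simple way to look for this kind of imbalance is to compare how often each group appears in a dataset with how often it appears in the population. The sketch below is a hedged illustration only: the groups, the population shares, the sample, and the 10-point tolerance are all made up.

```python
from collections import Counter

# Hypothetical numbers: both the population shares and the sample are made up.
population_share = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5

counts = Counter(sample)
total = len(sample)

# Difference between each group's share of the sample and of the population.
gaps = {
    group: round(counts[group] / total - share, 2)
    for group, share in population_share.items()
}

# Flag groups whose share is off by more than an assumed 10-point tolerance.
TOLERANCE = 0.10
flagged = sorted(g for g, gap in gaps.items() if abs(gap) > TOLERANCE)
```

In this made-up sample, urban records are overrepresented and rural records are underrepresented, so an analysis built on it would lean toward urban outcomes.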

Test your knowledge on data formats and structures

TOTAL POINTS 4

Question 1

Fill in the blank: The running time of a movie is an example of _____ data.

  • qualitative
  • continuous
  • discrete
  • nominal

Correct. Running times of movies are an example of continuous data, which is measured and can have almost any numeric value.

Question 2

What are the characteristics of unstructured data? Select all that apply.

  • Fits neatly into rows and columns
  • May have an internal structure
  • Is not organized
  • Has a clearly identifiable structure

Correct. Unstructured data is not organized, although it may have an internal structure.

Question 3

Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.

  • Analyze
  • Rewrite
  • Search
  • Store

Correct. Structured data that is grouped together to form relations enables analysts to more easily store, search, and analyze the data.

Question 4

Which of the following is an example of unstructured data?

  • Email message
  • Contact saved on a phone
  • Rating of a local favorite restaurant
  • GPS location

Correct. An example of unstructured data is an email message. Other examples of unstructured data are video files and social media content.
