3.3.1.Working with databases & Managing data with metadata - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Working with databases

Normalized database

  • A database in which only related data is stored in each table

Redundancy

  • When the same piece of data is stored in two or more separate places

Schema

  • A way of describing how something is organized

Databases in data analytics

In this reading you will learn about what relational databases are and how databases can be organized. Databases let analysts manipulate, store, and process data.  This helps them search through data a lot more efficiently to get the best insights.

Relational databases

A relational database is a database that contains a series of tables that can be connected to show relationships. Basically, they allow data analysts to organize and link data based on what the data has in common.

In a non-relational table, you will find all of the possible variables you might be interested in analyzing all grouped together. This can make it really hard to sort through. This is one reason why relational databases are so common in data analysis: they simplify a lot of analysis processes and make data easier to find and use across an entire database.

The key to relational databases

Tables in a relational database are connected by the fields they have in common. You might remember learning about primary and foreign keys before. As a quick refresher, a primary key is an identifier that references a column in which each value is unique. Primary keys are unique, and the value assigned to that key for each row can only exist once within a given table. For example, if customer_id is the primary key in the customer_address table, no two customers will ever have the same customer_id.  By contrast, a foreign key is a field within a table that is a primary key in another table. A table can only have one primary key, but it can have multiple foreign keys. These keys are what create the relationships between tables in a relational database, which helps organize and connect data across multiple tables in the database.

SQL? You’re speaking my language

Databases use a special language to communicate called a query language. Structured Query Language (SQL) is a type of query language that lets data analysts communicate with a database. So, a data analyst will use SQL to create a query to view the specific data that they want from within the larger set. In a relational database, data analysts can write queries to get data from the related tables. SQL is a powerful tool for working with databases — which is why you are going to learn more about it coming up!

Test your knowledge on working with databases

TOTAL POINTS 3

Question 1

Fill in the blank: A _____ is an identifier that references a database column in which each value is unique.

  • primary key
  • relation
  • foreign key
  • field

Correct. A primary key is an identifier that references a column in which each value is unique. A foreign key is a field within a table that’s a primary key in the original table.

Question 2

Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.

  • fields
  • spreadsheets
  • cells
  • tables

Correct. A relational database contains a series of tables that can be connected to form relationships.

Question 3

A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy?

  • The same piece of data being stored in two different places
  • A database containing two foreign keys
  • Team members in different office locations working with the same data
  • A database that forms two or more relationships

Correct. The same piece of data being stored in two different places is an example of redundancy.


Managing data with metadata

Metadata is used in database management to help data analysts interpret the contents of the data within the database.

3 common types of metadata

  • Descriptive - Metadata that describes a piece of data and can be used to identify it at a later point in time
  • Structural - Metadata that indicates how a piece of data is organized and whether it is part of one, or more than one, data collection
  • Administrative - Metadata that indicates the technical source of a digital asset

Metadata creates a single source of truth by keeping things consistent and uniform.

Metadata also makes data more reliable by making sure it's accurate, precise, relevant, and timely.

Metadata repositories

  • Describe the state and location of the metadata
  • Describe the structures of the tables inside
  • Describe how the data flows through the repository
  • Keep track of who accesses the metadata and when

Understanding metadata

Data analytics, by design, is a field that thrives on collecting and organizing data. In this reading, you are going to learn about how to analyze and thoroughly understand every aspect of your data.

https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/1CZaYJ0VSeKmWmCdFSni_A_e4cfae7d6e8a47babeeef90a2cfef9f1_Screen-Shot-2021-06-24-at-11.43.23-AM.png?expiry=1624752000000&hmac=ryozWTn_GPiwI_fTRhWdqWvCqgpppa2n15gFXt3rNe4

Take a look at any data you find. What is it? Where did it come from? Is it useful? How do you know? This is where metadata comes in to provide a deeper understanding of the data. To put it simply, metadata is data about data. In database management, it provides information about other data and helps data analysts interpret the contents of the data within a database.

Regardless of whether you are working with a large or small quantity of data, metadata is the mark of a knowledgeable analytics team, helping to communicate about data across the business and making it easier to reuse data. In essence, metadata tells the who, what, when, where, which, how, and why of data.

Elements of metadata

Before looking at metadata examples, it is important to understand what type of information metadata typically provides.

Title and description

What is the name of the file or website you are examining? What type of content does it contain?

Tags and categories

What is the general overview of the data that you have? Is the data indexed or described in a specific way?

Who created it and when

Where did the data come from, and when was it created? Is it recent, or has it existed for a long time?

Who last modified it and when

Were any changes made to the data?  If yes, were the modifications recent?

Who can access or update it

Is this dataset public? Are special permissions needed to customize or modify the dataset?

Examples of metadata

In today’s digital world, metadata is everywhere, and it is becoming a more common practice to provide metadata on a lot of media and information you interact with. Here are some real-world examples of where to find metadata:

Photos

Whenever a photo is captured with a camera, metadata such as camera filename, date, time, and geolocation are gathered and saved with it.

Emails

When an email is sent or received, there is lots of visible metadata such as subject line, the sender, the recipient and date and time sent. There is also hidden metadata that includes server names, IP addresses, HTML format, and software details.

Spreadsheets and documents

Spreadsheets and documents are already filled with a considerable amount of data so it is no surprise that metadata would also accompany them. Titles, author, creation date, number of pages, user comments as well as names of tabs, tables, and columns are all metadata that one can find in spreadsheets and documents.

Websites

Every web page has a number of standard metadata fields, such as tags and categories, site creator’s name, web page title and description, time of creation and any iconography.

Digital files

Usually, if you right click on any computer file, you will see its metadata. This could consist of file name, file size, date of creation and modification, and type of file.

Books

Metadata is not only digital. Every book has a number of standard metadata on the covers and inside that will inform you of its title, author’s name, a table of contents, publisher information, copyright description, index, and a brief description of the book’s contents.

Data as you know it

Knowing the content and context of your data, as well as how it is structured, is very valuable in your career as a data analyst. When analyzing data, it is important to always understand the full picture. It is not just about the data you are viewing, but how that data comes together. Metadata ensures that you are able to find, use, preserve, and reuse data in the future. Remember, it will be your responsibility to manage and make use of data in its entirety; metadata is as important as the data itself.

Test your knowledge on metadata

TOTAL POINTS 4

Question 1

A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections a piece of data lives in?

  • Administrative
  • Structural
  • Descriptive
  • Representative

Correct. Structural metadata indicates exactly how many collections data lives in. It provides information about how a piece of data is organized and whether it’s part of one, or more than one, data collection.

Question 2

The date and time a photo was taken is an example of which kind of metadata?

  • Representative
  • Administrative
  • Structural
  • Descriptive

Correct. The date and time a photo was taken is an example of administrative metadata. Administrative metadata indicates the technical source and details for a digital asset.

Question 3

A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?

  • Descriptive
  • Administrative
  • Structural
  • Representative

Correct. The ID numbers are descriptive metadata. Descriptive metadata describes a piece of data or can be used to identify it at any time.

Question 4

A company needs to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply.

  • Use the metadata to standardize the data.
  • Alter the company’s metadata to more closely reflect the incoming metadata.
  • Replace the incoming data’s metadata with its own company metadata.
  • Use the metadata to evaluate the third-party data’s quality and credibility.

Correct. The company can use the metadata to standardize the data and evaluate the third-party data’s quality and credibility.