3.2.3. Data ethics and privacy

Data anonymization

What is data anonymization?

You have been learning about the importance of privacy in data analytics. Now, it is time to talk about data anonymization and what types of data should be anonymized. Personally identifiable information, or PII, is information that can be used by itself or with other data to track down a person's identity.

Data anonymization is the process of protecting people's private or sensitive data by eliminating that kind of information. Typically, data anonymization involves blanking, hashing, or masking personal information, often by using fixed-length codes to represent data columns, or hiding data with altered values.
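The three techniques named above can be illustrated with a minimal Python sketch. The record, its field names, the salt value, and the helper functions `blank`, `hash_value`, and `mask` are all hypothetical and exist only for illustration; they are not part of the course material or of any particular anonymization tool.

```python
import hashlib


def blank(value: str) -> str:
    """Blanking: discard the value and leave the field empty."""
    return ""


def hash_value(value: str, salt: str) -> str:
    """Hashing: replace the value with a fixed-length SHA-256 hex code."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Masking: hide every character except the last `visible` ones."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


# Hypothetical record, made up for this example.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "account_number": "1234567890",
}

anonymized = {
    "name": blank(record["name"]),
    "email": hash_value(record["email"], salt="example-salt"),
    "account_number": mask(record["account_number"]),
}

print(anonymized["account_number"])  # ******7890
print(len(anonymized["email"]))      # 64
```

Hashing always yields a code of the same length (64 hex characters for SHA-256), which is one way to read "fixed-length codes" in the paragraph above. Note that hashing alone may not be enough for values that are easy to guess, such as phone numbers, because an attacker can hash candidate values and compare the results.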

Your role in data anonymization

Organizations have a responsibility to protect their data and the personal information that data might contain. As a data analyst, you might be expected to understand what data needs to be anonymized, but you generally wouldn't be responsible for the data anonymization itself. A rare exception might be if you work with a copy of the data for testing or development purposes. In this case, you could be required to anonymize the data before you work with it.

What types of data should be anonymized?

Healthcare and financial data are two of the most sensitive types of data, and both industries rely heavily on data anonymization techniques. After all, the stakes are very high. That’s why data in these two industries usually goes through de-identification, which is a process used to wipe data clean of all personally identifying information.

Data anonymization is used in just about every industry. That is why it is so important for data analysts to understand the basics. Here is a list of data that is often anonymized:

  • Telephone numbers
  • Names
  • License plates and license numbers
  • Social security numbers
  • IP addresses
  • Medical records
  • Email addresses
  • Photographs
  • Account numbers

For some people, it just makes sense that this type of data should be anonymized. For others, we have to be very specific about what needs to be anonymized. Imagine a world where we all had access to each other’s addresses, account numbers, and other identifiable information. That would invade a lot of people’s privacy and make the world less safe. Data anonymization is one of the ways we can keep data private and secure!
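Being specific about what needs to be anonymized can be as simple as writing down a pattern for each data type. The sketch below does this for two items from the list above, email addresses and IP addresses, and replaces each match in free text with a placeholder. The regular expressions are deliberately simplified assumptions (they will miss some valid addresses and do not cover IPv6), and the `redact` function and the sample note are hypothetical.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str) -> str:
    """Replace email addresses and IPv4 addresses with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = IPV4_RE.sub("[IP]", text)
    return text


note = "Contact jane@example.com, last login from 192.168.0.12."
print(redact(note))  # Contact [EMAIL], last login from [IP].
```

Other items on the list, such as names or photographs, cannot be found reliably with simple patterns like these and generally need different handling.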

Test your knowledge on data ethics and privacy

Question 1

Fill in the blank: _____ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.

A. Openness

B. Currency

C. Transaction transparency

D. Privacy

The correct answer is C. Transaction transparency. Explain: Transaction transparency states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.

Question 2

A data analyst removes personally identifying information from a dataset. What task are they performing?

A. Data visualization

B. Data sorting

C. Data anonymization

D. Data collection

The correct answer is C. Data anonymization. Explain: They are performing data anonymization, which is the process of protecting people's private or sensitive data by eliminating identifying information.

Question 3

Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?

A. Privacy

B. Consent

C. Discretion

D. Currency

The correct answer is B. Consent. Explain: This concept is called consent. Consent is the aspect of data ethics that presumes an individual’s right to know how and why their personal data will be used before agreeing to provide it.