3.4.2.Securing data - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Balancing security and analytics

The battle between security and data analytics

Data security means protecting data from unauthorized access or corruption by putting safety measures in place. Usually, the purpose of data security is to keep unauthorized users from accessing or viewing sensitive data. Data analysts have to find a way to balance data security with their actual analysis needs. This can be tricky: we want to keep our data safe and secure, but we also want to use it as soon as possible so that we can make meaningful and timely observations.

In order to do this, companies need to find ways to balance their data security measures with their data access needs.

Luckily, there are a few security measures that can help companies do just that. The two we will talk about here are encryption and tokenization.

Encryption uses an algorithm together with a secret value called a “key” to alter data and make it unreadable to users and applications that don’t have the key. The key can be used to reverse the encryption, so if you have the key, you can still use the data in its original form.
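The encrypt-and-reverse round trip can be sketched with Python's standard library. This is a toy stream cipher, assuming a keystream derived from HMAC-SHA256 in counter mode that is XORed with the data; it is for illustration only, is not authenticated, and should not be used to protect real data (a vetted cryptography library is the right tool for that). The names `encrypt`, `decrypt`, and `_keystream` are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_SIZE = 16  # bytes of random nonce stored in front of each ciphertext


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce (HMAC-SHA256, counter mode)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        stream.extend(block)
        counter += 1
    return bytes(stream[:length])


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Toy encryption: XOR the UTF-8 bytes with a keystream and prepend the nonce."""
    nonce = secrets.token_bytes(NONCE_SIZE)  # fresh nonce so equal inputs encrypt differently
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(d ^ s for d, s in zip(data, stream))


def decrypt(key: bytes, ciphertext: bytes) -> str:
    """Reverse `encrypt`: regenerate the same keystream from the key and XOR again."""
    nonce, body = ciphertext[:NONCE_SIZE], ciphertext[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(b ^ s for b, s in zip(body, stream)).decode("utf-8")


key = secrets.token_bytes(32)          # the secret key; whoever holds it can decrypt
secret = encrypt(key, "jane.doe@example.com")
print(secret.hex())                    # unreadable without the key
print(decrypt(key, secret))            # jane.doe@example.com
```

Without the key, the stored bytes are meaningless; with it, the analyst gets the original value back, which is the trade-off between security and access described above.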

Tokenization replaces the data elements you want to protect with randomly generated data referred to as a “token.” The original data is stored in a separate location and mapped to the tokens. To access the complete original data, the user or application needs to have permission to use the tokenized data and the token mapping. This means that even if the tokenized data is hacked, the original data is still safe and secure in a separate location.
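A minimal sketch of the tokenization idea, assuming an in-memory Python dictionary stands in for the separate, secured location that holds the token mapping. `TokenVault`, `tokenize`, and `detokenize` are hypothetical names; a real system would keep the mapping in a separately secured service with its own access controls.

```python
import secrets


class TokenVault:
    """Hypothetical vault: holds the token mapping apart from the analysis data."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Replace a sensitive value with a random token (same value -> same token)."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)   # random; reveals nothing about the value
        while token in self._token_to_value:     # avoid an (unlikely) random collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, has_permission: bool) -> str:
        """Return the original value only for callers allowed to use the mapping."""
        if not has_permission:
            raise PermissionError("not authorized to use the token mapping")
        return self._token_to_value[token]


vault = TokenVault()
orders = [{"customer": "C-001", "card": "4111 1111 1111 1111", "amount": 25.0}]
# The analysis table keeps only the token; the card number stays in the vault.
safe_orders = [{**row, "card": vault.tokenize(row["card"])} for row in orders]
print(safe_orders)
```

Reusing the same token for a repeated value is a design choice in this sketch: it lets analysts still count or group by the tokenized column without ever seeing the original values.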

Encryption and tokenization are just two of the many data security options available. Others include using authentication devices for AI technology.

As a junior data analyst, you probably won’t be responsible for building these systems. Many companies have entire teams dedicated to data security, or hire third-party companies that specialize in data security to create these systems. Still, it is important to know that all companies have a responsibility to keep their data secure, and to understand some of the systems your future employer might use.

Self-Reflection: Protecting your resources

Overview

Now that you have learned about the importance of data security, you can pause for a moment and think about what you are learning. In this self-reflection, you will consider your thoughts about data privacy, collaboration, and version control, then respond to brief questions.

This self-reflection will help you develop insights into your own learning and prepare you to apply your knowledge of data privacy to your experience with Kaggle. As you answer questions, and come up with questions of your own, you will consider concepts, practices, and principles to help refine your understanding and reinforce your learning. You’ve done the hard work, so make sure to get the most out of it: this reflection will help your knowledge stick!

Privacy

On Kaggle, you can upload your own datasets and keep them private, which means they are visible and accessible only to you. You can also add collaborators to your dataset as viewers or editors. Viewers can see your private dataset, and editors can make changes to it.

You can share the link to your private dataset so anyone with the link is able to view it. If you don’t want this feature, you can disable it in the Settings tab of your dataset.

Note: If you have a private dataset on Kaggle and you choose to make it public, you will not be able to make the dataset private again. The only option you would have is to delete the dataset from Kaggle completely.
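The sharing rules above can be restated as a small permission check. This is a hypothetical Python model written only to summarize the rules in this section; `SharedDataset` and its methods are invented for illustration and are not part of any Kaggle API. Link sharing is modeled as on unless disabled, which is how the text above reads.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDataset:
    """Hypothetical model of the dataset sharing rules described above (not a Kaggle API)."""
    owner: str
    public: bool = False                 # datasets start out private
    link_sharing: bool = True            # "anyone with the link" access; can be disabled in Settings
    viewers: set[str] = field(default_factory=set)
    editors: set[str] = field(default_factory=set)

    def can_view(self, user: str, has_link: bool = False) -> bool:
        if self.public or user == self.owner or user in self.viewers or user in self.editors:
            return True
        return self.link_sharing and has_link

    def can_edit(self, user: str) -> bool:
        # Viewers can only look; the owner and editors can change the dataset.
        return user == self.owner or user in self.editors

    def make_public(self) -> None:
        self.public = True

    def make_private(self) -> None:
        # Once public, the dataset cannot be made private again; it can only be deleted.
        if self.public:
            raise ValueError("a public dataset cannot be made private again; delete it instead")


ds = SharedDataset(owner="ana", viewers={"ben"}, editors={"cy"})
print(ds.can_edit("ben"), ds.can_edit("cy"))   # False True
```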

Collaboration

Any notebooks that you create on Kaggle are private by default. As with datasets, you can add collaborators as viewers or editors. You can also make a notebook public, which shares it with the entire Kaggle community.

If you add collaborators to your Kaggle notebook as editors, they can make changes to it. Make sure you communicate and coordinate with your collaborators, because the last person who saves the notebook will overwrite all of the previous work. If you’d like more fine-grained control over changes to your code, a system like GitHub offers more complete version control.
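The "last save wins" problem can be illustrated with a tiny simulation. This is a hypothetical Python model, assuming each collaborator loads a full copy of the notebook, edits it, and saves the whole copy back; `SharedNotebook` is an invented name, not how Kaggle is implemented internally.

```python
class SharedNotebook:
    """Hypothetical store where each save replaces the whole notebook (last save wins)."""

    def __init__(self, cells: list[str]) -> None:
        self._cells = list(cells)

    def load(self) -> list[str]:
        return list(self._cells)      # each collaborator edits a private copy

    def save(self, cells: list[str]) -> None:
        self._cells = list(cells)     # overwrites whatever was saved before

    @property
    def cells(self) -> list[str]:
        return list(self._cells)


nb = SharedNotebook(["load data"])
alice = nb.load()
bob = nb.load()
alice.append("clean data")      # Alice adds a cell
bob.append("plot results")      # Bob adds a different cell, unaware of Alice's change
nb.save(alice)
nb.save(bob)                    # Bob saves last, so Alice's cell is lost
print(nb.cells)                 # ['load data', 'plot results']
```

Nothing in this model merges the two edits, which is why coordinating who saves when matters.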

Version control

Kaggle has its own way of letting you keep a record of your progress. You can read all of the details in this post, but think back to when you’ve done some work in a Kaggle notebook and clicked the Save Version button.

When you clicked this button and then clicked Save, you probably saved without changing any of the options. However, you also have the option to add a short note describing the changes you’ve made.

This can be helpful when you’ve made changes to your notebook but want to go back to an earlier version. To do this, go to Edit mode and click on the number next to the Save Version text at the top of your notebook.

This opens a navigation bar on the right side of the screen that lists all of the versions of your notebook. When you click a version, the left side of the screen populates with the code and text from that version.

Then, once the version has run, your screen will appear like this:

[Screenshot: the selected notebook version after it has run]

From this screen you can also open the version in Viewer mode, pin a version as the default, or even change the version name. Pinning a version as the default can be helpful when you have a working version of your notebook available to the Kaggle community, but want to make changes and updates that might not work the first time you implement them. This allows you to safely make changes behind the scenes while sharing with the Kaggle community the most recent working version of your notebook.
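The idea of pinning a default version while continuing to save new ones can be sketched as a small data model. This is a hypothetical Python illustration of the behavior described above; `NotebookHistory`, `save_version`, `pin`, and `default` are invented names, not a Kaggle API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    number: int
    cells: list[str]
    note: str = ""


@dataclass
class NotebookHistory:
    """Hypothetical model of saved notebook versions with an optional pinned default."""
    versions: list[Version] = field(default_factory=list)
    pinned: Optional[int] = None  # None means "show the most recent version"

    def save_version(self, cells: list[str], note: str = "") -> int:
        number = len(self.versions) + 1
        self.versions.append(Version(number, list(cells), note))
        return number

    def get(self, number: int) -> Version:
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number}")
        return self.versions[number - 1]

    def pin(self, number: int) -> None:
        self.get(number)  # validate that the version exists
        self.pinned = number

    def default(self) -> Version:
        """The version others see: the pinned one if set, otherwise the latest."""
        if self.pinned is not None:
            return self.get(self.pinned)
        return self.versions[-1]


history = NotebookHistory()
history.save_version(["load data", "summarize"], note="Working analysis")
history.pin(1)  # share the known-good version
history.save_version(["load data", "summarize", "try new model"], note="Experiment")
print(history.default().note)   # Working analysis
print(history.get(2).note)      # Experiment
```

Because every save is kept, going back to an earlier version is a lookup rather than a rewrite, and the pin decides which version others see.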

Reflection

Consider what you learned about data security in Kaggle:

  • What are some cases in which you should use the privacy, collaboration, and version control features on Kaggle?
  • In what other scenarios might you want to pin a version of your notebook other than the most recent one?

Now, write 2-3 sentences (40-60 words) in response to each of these questions. Type your response in the text box below.

Explain: Great work reinforcing your learning with a thoughtful self-reflection! A good reflection on this topic would include how and when you should apply your knowledge of data privacy and version control when working in Kaggle.

Maintaining privacy and recording your progress with version control are essential skills for data analyst jobs, where you are often expected to collaborate with other analysts. Knowing about privacy standards and how to ensure effective collaboration will help you avoid exposing important data or losing precious work. Going forward, you can apply your knowledge of data security to other platforms or your future projects.

Test your knowledge on securing your data

Question 1

Fill in the blank: Data security involves using _____ to protect data from unauthorized access or corruption.

A. foldering

B. data validation

C. metadata

D. safety measures

The correct answer is D. safety measures. Explain: Data security involves using safety measures to protect data from unauthorized access or corruption.

Question 2

When using data security measures, analysts can choose between protecting an entire spreadsheet or protecting certain cells within the spreadsheet. True or False?

A. True

B. False

The correct answer is A. True. Explain: When using data security measures, analysts can choose between protecting an entire spreadsheet or protecting certain cells within the spreadsheet. Data security can be used to protect an entire spreadsheet, specific parts of a spreadsheet, or even just a single cell.

Question 3

What tools can data analysts use to control who can access or edit a spreadsheet? Select all that apply.

  • Sharing permissions
  • Encryption
  • Filters
  • Tabs

The correct answers are Sharing permissions and Encryption. Explain: Data analysts use encryption and sharing permissions to control who can access or edit a spreadsheet.