Course 3‐4 - Forestreee/Data-Analytics GitHub Wiki
Google Data Analytics Professional
Prepare Data for Exploration
WEEK4 - Organizing and protecting your data
Good organization skills are a big part of most types of work, and data analytics is no different. In this part of the course, you’ll learn the best practices for organizing data and keeping it secure. You’ll also learn how analysts use file naming conventions to help them keep their work organized.
Learning Objectives
- Explain steps that can be taken to secure data.
- Discuss the use of file-naming conventions by data analysts.
- Describe best practices for organizing data.
Effectively organize data
Feel confident in your data
Up until now, we've focused on preparing your data for processing and analysis.
Now we'll explore another big part of that process: organizing and protecting your data. Keeping your data organized is important for a few reasons: it makes the data easier to find and use, helps you avoid making mistakes during your analysis, and helps to protect it.
Coming up, we'll go over the basics of organizing data for personal and professional use and file naming conventions. Then we'll take a look at some security features for spreadsheets. By the end of these next few videos, you'll be able to do all these things and you'll be able to explain these steps to stakeholders, so they can feel confident that your data practices are safe and secure. When you're ready to get started, go ahead to the next video. There we'll get started with organizing data for personal use.
Let's get organized
Whether you're organizing your personal data for your own use or organizing project data for work, there are certain procedures you want to follow to make sure your data is easy to find and use. We'll cover some best organization practices and also check out some different ways project data can be organized.
There are plenty of best practices you can use when organizing data, including naming conventions, foldering, and archiving older files.
We've talked about file naming before, which is also known as naming conventions. These are consistent guidelines that describe the content, date, or version of a file in its name.
Basically, this means you want to use logical and descriptive names for your files to make them easier to find and use.
Speaking of easily finding things, organizing your files into folders helps keep project-related files together in one place. This is called foldering. For example, all the files related to your vacation plan might go in the Vacation2025 folder. You might then break that folder down even further by creating subfolders like itinerary or photos, depending on what else you'd like to easily access.
It can also be useful to move old projects to a separate location to create an archive and cut down on clutter. It's so much easier to find and use my files when I name them something meaningful and searchable and when I organize them into folders. It makes all my data more accessible and useful.
In addition to these three best practices, there are two more things you'll want to consider when organizing data for work use. First, the project data you'll be using for work could be accessed and used by multiple people. It's important to align your naming and storage practices with your team to avoid any confusion. Your team might also develop metadata practices like creating a file that outlines project naming conventions for easy reference.
Second, you want to think about how often you're making copies of data and storing them in different places. This matters most because data stored in lots of different databases or spreadsheets can contradict itself and lead to mistakes later on. Storing data in multiple places also takes up a lot of space. Relational databases can help you avoid data duplication and store your data more efficiently. You can use these practices to organize data in different ways according to your project.
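As a rough illustration of how a relational database avoids duplication, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are made up for this example and are not from the course. The customer's email is stored once, so a single update is reflected on every related invoice.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: customer details live in one table only,
# and each invoice points to a customer by ID instead of copying the details.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL
);
CREATE TABLE invoices (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ana', 'ana@old.example');
INSERT INTO invoices VALUES (101, 1, 250.0), (102, 1, 75.5);
""")

# The email changes in exactly one place.
conn.execute(
    "UPDATE customers SET email = ? WHERE customer_id = ?",
    ("ana@new.example", 1),
)

# Joining the tables shows the new email on every invoice.
rows = conn.execute("""
    SELECT i.invoice_id, c.email
    FROM invoices AS i
    JOIN customers AS c ON c.customer_id = i.customer_id
    ORDER BY i.invoice_id
""").fetchall()
print(rows)
```

If the email had been copied into each invoice row (or into several spreadsheets), every copy would need the same update, which is where contradictions tend to creep in.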
Let's look at some examples of data organization. I have some sample project folders here, each organized in a slightly different way. Let's open them up and see what they look like. We'll start with the high-level Finances folder. The Finances folder has been organized categorically. There are subfolders like budget, invoices, and payroll that represent different categories. Let's click on "Invoices" to see what's in there. In the invoices folder, you can see that we have another set of subfolders labeled by year: 2014, 2015, and so on. It looks like these are in chronological order. Sometimes the way files are organized can tell us how the data within those files is also organized.
Let's open a file to see if that's right. In the 2014 subfolder, there's a file with invoices from June.
If we open it, we can see that they've been organized by date, just like the folders. There are different ways to organize data depending on what you need it for.
The categorical organization of the subfolders in the Finances folder made it easy for me to go straight to the invoices, but the chronological organization of the invoices subfolder can help us find financial data from the exact date we're looking for.
Here's another way to think about it. Unorganized data is like a messy room. It's overwhelming, hard to find anything in, and gets worse the longer you avoid cleaning it up. But by making sure early on you know where to put your files, you can keep your work data organized, easy to use, and error-free.
Now that you see how important it is to keep data organized for both personal and work use, we'll take a closer look at file naming conventions and how they carry over into your databases. See you in the next video.
Organization guidelines
Best practices for file naming conventions
Review the following file naming recommendations:
- Work out and agree on file naming conventions early on in a project to avoid renaming files again and again.
- Align your file naming with your team's or company's existing file-naming conventions.
- Ensure that your file names are meaningful; consider including information like project name and anything else that will help you quickly identify (and use) the file for the right purpose.
- Include the date and version number in file names; common formats are YYYYMMDD for dates and v## for versions (or revisions).
- Create a text file as a sample file with content that describes (breaks down) the file naming convention and a file name that applies it.
- Avoid spaces and special characters in file names. Instead, use dashes, underscores, or capital letters. Spaces and special characters can cause errors in some applications.
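The recommendations above can be sketched as a small helper that assembles a file name. This is only one possible convention, shown as an assumption: the function name `build_file_name`, the order of the parts, and the sample project name are all hypothetical, and your team's convention may differ.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, file_date: date,
                    version: int, extension: str) -> str:
    """Compose a name like SalesReport_Q1Summary_20250131_v02.csv."""

    def clean(part: str) -> str:
        # Split on anything that is not a letter or digit, capitalize each
        # word, and join the words so no spaces or special characters remain.
        words = re.split(r"[^A-Za-z0-9]+", part)
        return "".join(w[:1].upper() + w[1:] for w in words if w)

    date_part = file_date.strftime("%Y%m%d")   # YYYYMMDD
    version_part = f"v{version:02d}"           # v## with a leading zero
    return f"{clean(project)}_{clean(description)}_{date_part}_{version_part}.{extension}"


print(build_file_name("sales report", "Q1 summary!", date(2025, 1, 31), 2, "csv"))
# prints SalesReport_Q1Summary_20250131_v02.csv
```

Generating names from one function like this is a simple way to keep everyone on a team applying the same convention.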
Best practices for keeping files organized
Remember these tips for staying organized as you work with files:
- Create folders and subfolders in a logical hierarchy so related files are stored together.
- Separate ongoing from completed work so your current project files are easier to find. Archive older files in a separate folder, or in an external storage location.
- If your files aren't automatically backed up, manually back them up often to avoid losing important work.
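Here is a minimal sketch of these tips using Python's pathlib and shutil modules. The folder names ("Active", "Archive", "Finances", and the subfolders) and both helper functions are assumptions made for illustration. The demo runs in a temporary directory so it does not touch real files.

```python
import shutil
import tempfile
from pathlib import Path


def create_project_folders(root: Path, subfolders: list[str]) -> None:
    """Create a folder hierarchy under root so related files stay together."""
    for name in subfolders:
        (root / name).mkdir(parents=True, exist_ok=True)


def archive_project(active_dir: Path, archive_root: Path) -> Path:
    """Move a completed project out of the active area into an archive folder."""
    archive_root.mkdir(parents=True, exist_ok=True)
    destination = archive_root / active_dir.name
    shutil.move(str(active_dir), str(destination))
    return destination


# Demo in a throwaway directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    finances = workspace / "Active" / "Finances"
    create_project_folders(
        finances, ["Budget", "Invoices/2014", "Invoices/2015", "Payroll"]
    )
    archived = archive_project(finances, workspace / "Archive")
    print(sorted(p.name for p in archived.iterdir()))
```

Moving a finished project is not the same as backing it up. A backup would be a separate copy kept in another location.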
All about file naming
So you've heard me mention the idea of using meaningful and logical file names to help organize your data. But using consistent file names can also streamline or even automate your analysis process, saving you time and energy in the long run.
When you use consistent guidelines that describe the content, date, or version of a file in its name, you're using file naming conventions. As we've already discovered, these file naming conventions help us organize, access, process, and analyze our data.
So here are some general tips on creating file naming conventions that are both logical and functional. Here are some quick file naming do's. Work out your conventions early to avoid having to spend time redoing them later. Align your file naming with your team and make sure your file names are meaningful, with references to the project name, creation date, revision version, or any other useful information needed to understand what's in that file.
Now, there are some other simple things you can do to make sure your file naming conventions are on point. First of all, you want to keep your file names short and sweet. They're supposed to be quick reference points that tell you what's in a file. From earlier videos, we know that we want to include dates and revision numbers in our file names. I recommend formatting dates by year, month, and day because that follows the international date standard. Different countries have different date conventions, so keep that in mind. When you include revision numbers in a file name, lead with a zero, so that if you run into double digits of revisions, it's already built into your conventions. Another good rule is to use hyphens, underscores, or capitalized letters instead of using spaces. Spaces and special characters might not be recognized by your software. Plus, avoiding spaces definitely makes it easier to work in SQL.
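To see why the year-month-day order and the leading zero matter, here is a short sketch that sorts a few made-up file names alphabetically, the way most file browsers do by default. All file names below are hypothetical.

```python
# Revision numbers without a leading zero sort out of order.
unpadded = ["Report_v10.csv", "Report_v2.csv", "Report_v1.csv"]
print(sorted(unpadded))
# ['Report_v1.csv', 'Report_v10.csv', 'Report_v2.csv']  (v10 lands before v2)

# With a leading zero, alphabetical order matches revision order.
padded = ["Report_v10.csv", "Report_v02.csv", "Report_v01.csv"]
print(sorted(padded))
# ['Report_v01.csv', 'Report_v02.csv', 'Report_v10.csv']

# Dates written as YYYYMMDD sort chronologically.
year_first = [
    "Invoices_20141105.csv",
    "Invoices_20140601.csv",
    "Invoices_20150102.csv",
]
print(sorted(year_first))
# ['Invoices_20140601.csv', 'Invoices_20141105.csv', 'Invoices_20150102.csv']

# The same dates written day-first (DDMMYYYY) do not.
day_first = [
    "Invoices_05112014.csv",
    "Invoices_01062014.csv",
    "Invoices_02012015.csv",
]
print(sorted(day_first))
# ['Invoices_01062014.csv', 'Invoices_02012015.csv', 'Invoices_05112014.csv']
# January 2015 now appears before November 2014.
```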
My last bit of advice: create a text file that lays out all your naming conventions on a project. This is really helpful if someone new joins your team, or if you just need a quick reminder while you're working on something.
We talked about this earlier when we covered metadata, which is data about data. It helps explain what data there is and how it's being organized. When you use consistent, meaningful file naming conventions throughout your project, your data will be easy to find and use, and you can save yourself time, too. Up next, we'll keep looking at spreadsheets and we'll talk about security features and how you can use them to protect your data now that it's organized.
Learning Log: Review file structure and naming conventions
Access your learning log
Link to learning log template: Review file structure and naming conventions
Reflection
- Why are file structure and naming conventions so important? What are the consequences of poor organization for data analysts at work?
- How would you structure folders and files? What naming conventions would you use?
- What appeals to you about these choices?
Effective naming and organization methods
- Good file names follow a naming convention: they reference the project, include a date or version number, and use hyphens, underscores, or capital letters instead of spaces.
- A file’s project and date or version must be clear, and the name must be free of spaces or special characters.
- Backing up files, creating a hierarchical folder structure, and keeping ongoing work separate are all effective ways to keep files organized.
- Good file organization includes making it easy to find current, related files that are backed up regularly.
Practice Quiz: how to organize data
Securing data
Security features in spreadsheets
Okay, now that our data's organized and easy to find, it's time to start thinking about how to protect it. The good news is that spreadsheets come with security features already built in.
We'll look at different spreadsheet programs and how their security features, like sheet protections and access control, are similar. When I say "security features," you might be imagining ways to protect data from other people. But that's just one kind of security.
Security features can be designed to keep unauthorized users from viewing certain files, or just lock your worksheets so that you don't accidentally break your formulas. This is called data security.
Data security means protecting data from unauthorized access or corruption by adopting safety measures.
Whatever spreadsheet program you're using will have similar security measures built in. As a data analyst, you'll run into Google Sheets and Excel a lot. Let's talk about what they have in common.
First, both programs have features that let you protect your spreadsheets or parts of your spreadsheets from being edited, from the entire worksheet down to single cells in a table. If you're collaborating with other users, you can easily lock down your formulas so that they aren't accidentally broken.
Because these programs run in different places (Excel files are often saved locally and shared by email, while Google Sheets live online), these features are slightly different. For Excel spreadsheets, you can encrypt files and worksheets with passwords before emailing them to other users. In Google Sheets, these settings are found under the sharing menu, which allows you to control who can see or edit the sheet online. Google Sheets can also be copied so that users can work with that data without altering the original.
Tabs can also be hidden and unhidden in Sheets and Excel, allowing you to change what data is being viewed. But remember, even hidden tabs can be unhidden by someone else, so be sure you're okay with those tabs still being accessible.
As a data analyst, data security will be a priority. But no matter which program you use to create spreadsheets, there are security features to help you keep your work safe and secure. There are some other basic best practices you can take to keep your data more secure overall, which we'll cover later in a reading.
You've made it to the end of this module. Congrats. In these videos, we've covered strategies for organizing data for personal and work use, how to develop functional file naming conventions, and some security measures you can take advantage of in spreadsheets.
Before you move on to the next step in the data analysis lifecycle, it's important that you make sure your data is prepared, and that includes organizing and securing it. As usual, after this video, you'll have your weekly challenge.
I know you've got this. Then after the weekly challenge, there's some optional material all about connecting to the online data community. As you start building your career in data analytics, it'll be really valuable to connect with others, learn about new trends in the field and share your own work.
I think you'll get a lot out of those videos. That'll help you develop a professional online presence and find ways to communicate with people in your field, which is key as networking becomes more and more online and remote work opportunities become the norm. But if you feel pretty confident about your online presence, you can move into the course challenge instead. Good luck on this weekly challenge, and I'll see you soon!
Balancing security and analytics
The battle between security and data analytics
Data security means protecting data from unauthorized access or corruption by putting safety measures in place. Usually the purpose of data security is to keep unauthorized users from accessing or viewing sensitive data. Data analysts have to find a way to balance data security with their actual analysis needs. This can be tricky: we want to keep our data safe and secure, but we also want to use it as soon as possible so that we can make meaningful and timely observations.
In order to do this, companies need to find ways to balance their data security measures with their data access needs.
Luckily, there are a few security measures that can help companies do just that. The two we will talk about here are encryption and tokenization.
Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don’t know the algorithm. This algorithm is saved as a “key” which can be used to reverse the encryption; so if you have the key, you can still use the data in its original form.
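To make the idea of a key concrete, here is a toy sketch of reversible encryption using a one-time pad, where each byte of the data is combined with a random key byte using XOR. This is an illustration only and not how production systems are built. Real systems rely on vetted encryption libraries and standard algorithms. In this sketch the XOR step itself is no secret and the key is what must be protected. The function names and the sample text are assumptions.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Return a random key with the same number of bytes as the data."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte.

    Applying the same key a second time reverses the operation,
    so this one function both encrypts and decrypts.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = "Customer: Ana, Card: 1234".encode("utf-8")
key = generate_key(len(plaintext))

ciphertext = xor_bytes(plaintext, key)                  # unreadable without the key
recovered = xor_bytes(ciphertext, key).decode("utf-8")  # original form with the key

print(ciphertext.hex())
print(recovered)
```

Anyone holding the key can recover the original data, which is why the key has to be stored and shared as carefully as the data itself.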
Tokenization replaces the data elements you want to protect with randomly generated data referred to as a “token.” The original data is stored in a separate location and mapped to the tokens. To access the complete original data, the user or application needs to have permission to use the tokenized data and the token mapping. This means that even if the tokenized data is hacked, the original data is still safe and secure in a separate location.
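The tokenization flow described above can be sketched with a small in-memory "vault" that holds the mapping between tokens and original values. The class name `TokenVault`, the `tok_` prefix, and the boolean `authorized` flag are assumptions for illustration. A real system would keep the mapping in a separate, access-controlled store and check permissions properly.

```python
import secrets


class TokenVault:
    """Toy token mapping, standing in for a separate secure store."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Replace a sensitive value with a randomly generated token."""
        if value in self._value_to_token:
            return self._value_to_token[value]  # reuse the existing token
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:    # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        """Return the original value only to callers with permission."""
        if not authorized:
            raise PermissionError("not allowed to use the token mapping")
        return self._token_to_value[token]


vault = TokenVault()

# The working dataset holds tokens only, never the card number itself.
record = {"customer": "Ana", "card": vault.tokenize("4111-1111-1111-1111")}
print(record)

# With permission and access to the mapping, the original can be looked up.
print(vault.detokenize(record["card"], authorized=True))
```

Because the token is random, it has no mathematical relationship to the original value. Someone who obtains only the tokenized records learns nothing about the card numbers.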
Encryption and tokenization are just some of the data security options out there. There are a lot of others, like using authentication devices for AI technology.
As a junior data analyst, you probably won’t be responsible for building out these systems. A lot of companies have entire teams dedicated to data security or hire third party companies that specialize in data security to create these systems. But it is important to know that all companies have a responsibility to keep their data secure, and to understand some of the potential systems your future employer might use.
Self-Reflection: Protecting your resources
Reflection
Practice Quiz: securing your data