7.1.1. Learn programming using R

The R vs. Python debate

People often wonder which programming language they should learn first. You might be wondering about this, too. This certificate teaches the open-source programming language, R. R is a great starting point for foundational data analysis, and it has helpful packages that beginners can apply to projects. Python isn’t covered in the curriculum, but we encourage you to explore Python after completing the certificate. If you are curious about other programming languages, make every effort to continue learning.

Any language a beginner starts to learn will have some advantages and challenges. Let’s put this into context by looking at R and Python. The following table is a high-level overview based on a sampling of articles and opinions of those in the field. You can review the information without necessarily picking a side in the R vs. Python debate. In fact, if you check out RStudio’s blog article in the Additional resources section, it’s actually more about working together than winning a debate.

| Languages | R | Python |
| --- | --- | --- |
| Common features | Open-source; data stored in data frames; formulas and functions readily available; community for code development and support | Open-source; data stored in data frames; formulas and functions readily available; community for code development and support |
| Unique advantages | Data manipulation, data visualization, and statistics packages; "scalpel" approach to data (find packages to do what you want with the data) | Easy syntax for machine learning needs; integrates with cloud platforms like Google Cloud, Amazon Web Services, and Azure |
| Unique challenges | Inconsistent naming conventions make it harder for beginners to select the right functions; methods for handling variables may be a little complex for beginners to understand | Many more decisions for beginners to make about data input/output, structure, variables, packages, and objects; "Swiss army knife" approach to data (figure out a way to do what you want with the data) |

Additional resources

For more information on comparing R and Python, refer to these resources:

R versus Python, a comprehensive guide for data professionals

  • This article is written by a data professional with extensive experience using both languages and provides a detailed comparison.

R vs Python for Data Analysis — An Objective Comparison

Key takeaways

Certain aspects make some programming languages easier to learn than others. But that doesn’t make the harder languages impossible for beginners to learn. On the flip side, a programming language’s popularity doesn’t always make it the best language for beginners either.

R has been used by professionals who have a statistical or research-oriented approach to solving problems; among them are scientists, statisticians, and engineers. Python has been used by professionals looking for solutions in the data itself, those who must heavily mine data for answers; among them are data scientists, machine learning specialists, and software developers.

As you grow as a data analytics professional, you may need to learn additional programming languages. The skills and competencies you learn from your first programming experience are a good foundation. That's why this course focuses on the basics of R. Along the way, you can develop the perspective that programming languages play an important part in the data analysis process, no matter what job title you have.

The good news is that many of the concepts and coding principles that you will learn from using R in this course are transferable to other programming languages. You will also learn how to write R code in an Integrated Development Environment (IDE) called RStudio. RStudio allows you to manage projects that use R or Python, or even a combination of the two. Refer to RStudio: A Single Home for R & Python for more information. So, after you have worked with R and RStudio, learning Python or another programming language in the future will be more intuitive.

For a better idea of popular programming languages by job role, refer to Ways to learn about programming. The programming languages most commonly used by data analysts, web designers, mobile and web application developers, and game developers are listed, along with links to resources to help you start learning more about those languages.

Data analyst

A data analyst collects, transforms, and organizes data to draw conclusions, make predictions, and drive informed decision-making. The most popular programming languages used by data analysts are R and Python.

R offers convenient statistical features for data analysis and is useful for creating advanced data visualizations. Check out these resources to learn more about R:

Python is a general-purpose language that you can use to create what you need for data analysis. Here are a few resources to begin learning Python:

From spreadsheets to SQL to R

Although the programming language R might be new to you, it actually has a lot of similarities to the other tools you have explored in this program. In this reading, you will compare spreadsheet programs, SQL, and R to have a better sense of how to use each moving forward.

Spreadsheets, SQL, and R: a comparison

As a data analyst, there is a good chance you will work with SQL, R, and spreadsheets at some point in your career. Each tool has its own strengths and weaknesses, but they all make the data analysis process smoother and more efficient. There are two main things that all three have in common:

  • They all use filters: for example, you can easily filter a dataset using any of these tools. In R, you can use the filter() function (from the dplyr package), which performs the same task as a basic SELECT-FROM-WHERE SQL query. In a spreadsheet, you can create a filter using the menu options.
  • They all use functions: In spreadsheets, you use functions in formulas, and in SQL, you include them in queries. In R, you will use functions in the code that is part of your analysis.
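The filter comparison above can be sketched in code. This is a minimal illustration in Python rather than R, and the `sales` table, its columns, the sample rows, and the `filter_rows` helper are all hypothetical. It runs a basic SELECT-FROM-WHERE query against an in-memory database, then applies the same condition with a filter function, and checks that both return the same rows.

```python
import sqlite3

# Hypothetical sample data: (city, year, amount). Not taken from the course.
rows = [
    ("Austin", 2020, 120.0),
    ("Austin", 2021, 150.0),
    ("Boston", 2020, 90.0),
    ("Boston", 2021, 200.0),
]

# SQL: a basic SELECT-FROM-WHERE query against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
sql_result = conn.execute(
    "SELECT city, year, amount FROM sales WHERE amount > 100 ORDER BY city, year"
).fetchall()
conn.close()


# Code: the same filter written as a function applied to the rows.
def filter_rows(data, min_amount):
    """Keep only the rows whose amount is greater than min_amount."""
    return [row for row in data if row[2] > min_amount]


code_result = sorted(filter_rows(rows, 100))

# Both approaches keep the same three rows.
print(sql_result == code_result)
```

The condition (`amount > 100`) is the same in both versions. Only the way it is expressed changes, which is why filtering skills transfer between tools.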

The table below presents key questions to explore a few more ways that these tools compare to each other. You can use this as a general guide as you begin to navigate R.

| Key question | Spreadsheets | SQL | R |
| --- | --- | --- | --- |
| What is it? | A program that uses rows and columns to organize data and allows for analysis and manipulation through formulas, functions, and built-in features | A database programming language used to communicate with databases to conduct an analysis of data | A general purpose programming language used for statistical analysis, visualization, and other data analysis |
| What is a primary advantage? | Includes a variety of visualization tools and features | Allows users to manipulate and reorganize data as needed to aid analysis | Provides an accessible language to organize, modify, and clean data frames, and create insightful data visualizations |
| Which datasets does it work best with? | Smaller datasets | Larger datasets | Larger datasets |
| What is the source of the data? | Entered manually or imported from an external source | Accessed from an external database | Loaded with R when installed, imported from your computer, or loaded from external sources |
| Where is the data from my analysis usually stored? | In a spreadsheet file on your computer | Inside tables in the accessed database | In an R file on your computer |
| Do I use formulas and functions? | Yes | Yes | Yes |
| Can I create visualizations? | Yes | Yes, by using an additional tool like a database management system (DBMS) or a business intelligence (BI) tool | Yes |

When to use RStudio

As a data analyst, you will have plenty of tools to work with in each phase of your analysis. Sometimes, you will be able to meet your objectives by working in a spreadsheet program or using SQL with a database. In this reading, you will go through some examples of when working in R and RStudio might be your better option instead.

Why RStudio?

One of your core tasks as an analyst will be converting raw data into insights that are accurate, useful, and interesting. That can be tricky to do when the raw data is complex. R and RStudio are designed to handle large datasets, which spreadsheets might not be able to handle as well. RStudio also makes it easy to reproduce your work on different datasets. Once your code is written, it's simple to load a new dataset and run your scripts again. You can also create more detailed visualizations using RStudio.

When RStudio truly shines

When the data is spread across multiple categories or groups, it can be challenging to manage your analysis, visualize trends, and build graphics. And the more groups of data that you need to work with, the harder those tasks become. That’s where RStudio comes in.

For example, imagine you are analyzing sales data for every city across an entire country. That is a lot of data from a lot of different groups–in this case, each city has its own group of data.

Here are a few ways RStudio could help in this situation:

  • Using RStudio makes it easy to take a specific analysis step and perform it for each group using basic code. In this example, you could calculate the yearly average sales data for every city.
  • RStudio also allows for flexible data visualization. You can visualize differences across the cities effectively using plotting features like facets–which you’ll learn more about later on.
  • You can also use RStudio to automatically create an output of summary stats—or even your visualized plots—for each group.
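The per-group step described above (averaging yearly sales for every city) can be sketched as follows. This is a Python illustration of the idea rather than R code from the course, and the sample records and the `average_sales_by_city` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (city, year, sales amount). Not taken from the course.
sales = [
    ("Austin", 2020, 100.0),
    ("Austin", 2021, 140.0),
    ("Boston", 2020, 80.0),
    ("Boston", 2021, 120.0),
    ("Boston", 2022, 100.0),
]


def average_sales_by_city(records):
    """Group records by city, then average the yearly amounts in each group."""
    groups = defaultdict(list)
    for city, _year, amount in records:
        groups[city].append(amount)
    return {city: mean(amounts) for city, amounts in groups.items()}


averages = average_sales_by_city(sales)

# Print one summary line per group.
for city, avg in sorted(averages.items()):
    print(f"{city}: {avg:.1f}")
```

The same code works whether the data has two cities or two thousand. That is the advantage the bullets above describe: the analysis step is written once and applied to every group.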

As you learn more about R and RStudio moving forward in this program, you’ll get a better understanding of when RStudio should be your data analysis tool of choice.

For more information

  • The Advantages of RStudio: This web page explains some of the reasons why RStudio is many analysts’ preferred choice for interfacing with R. You’ll learn about the advantages of using RStudio for data analysis, from ease of use to accessibility of graphics and more.
  • Data analysis and R programming: This online introduction to data analysis and R programming is a good starting point for R and RStudio users. It also includes a list of detailed explanations about the advantages of using R and RStudio. You’ll also find a helpful guide for getting set up with RStudio.

Connecting with other analysts in the R community

R is a powerful tool in your data analysis toolkit, and it also has a powerful community of users who are excited to share, collaborate, and connect with others. This reading will give you a few places where you can start to connect, online and in person, with other analysts in the R community.

Online communities

Online communities allow you to connect with other R users no matter where you live. This list includes forums and discussion channels where you can join the conversation. It also includes social media tags you can use on your existing social media platforms to connect with other data analysts.

  • RStudio Community: The RStudio Community forum is a great place to get help and find solutions to challenges you have with R–and maybe help someone else out, too!
  • r/RLanguage: The R language subreddit is an active online community on the social media platform Reddit, where R users go to discuss R, ask questions, and share tips.
  • rOpenSci: rOpenSci has a community forum where R users can ask questions and search for solutions. It also includes links to their Best Practices guide and support pages.
  • R4DS Online Learning Community and Slack channel: This is a community with a Slack channel where R learners and mentors can gather and connect. This is a great place to chat about using R for data science.
  • Twitter #rstats: If you use Twitter, you can connect with other R users using the hashtag #rstats; a lot of R developers and analysts are active on Twitter.

Meetups

Many organizations host both in-person and online meetups for R users; you should always practice caution and be safe whenever attending meetups in person.

  • Local Data Analytics meetups: These meetups are a great way to meet other people who are interested in data analytics and build your network. These meetups are location-based, so you can connect with other data analysts in your area.
  • R User Groups: This list contains links to regional R communities, including subreddits and meetup groups. This is a useful resource if you are interested in finding R users in your area.
  • RLadies Meetups: These are in-person and virtual meetups specifically for R enthusiasts who identify as underrepresented or marginalized. These meetups are also location-based and can help you connect with other data analysts in your area.

R can be tricky to learn, but luckily there is a strong community of R users who are interested in working together and helping each other out. These resources are a good starting point if you want to begin connecting with the larger data analyst community, so take advantage of them!

Test your knowledge on programming with RStudio

Question 1

What type of software application is RStudio?

  • Integrated development environment
  • Data visualization tool
  • Source editor
  • Database

Correct. RStudio is a type of software application known as an integrated development environment (IDE). An IDE brings together all the tools you may want to use in a single place.

Question 2

RStudio includes which of the following panes? Select all that apply.

  • Command pane
  • R console pane
  • Source editor pane
  • Environment pane

Correct. RStudio includes an R console pane for executing commands, a source editor pane for writing code, and an environment pane for managing loaded data.

Question 3

If you write code directly in the R source editor, RStudio can save your code when you close your current session.

  • True
  • False

Correct. If you write code directly in the R source editor, RStudio can save your code when you close your current session.
