7.1.2. Programming as a data analyst

Ways to learn about programming

Writing programming language code can be an exciting and rewarding experience. The programming field has a long history of people helping each other improve their skills and develop best practices. You will focus on the R programming language in this course, but in the future you might choose to pursue additional programming languages based on your interests and professional goals. This reading is a general guide to help you decide which programming languages are best suited for you.

Popular programming languages by profession

Let’s go through some potential job titles you might encounter and the most popular programming languages used in those professions. Also included is a list of additional resources for you to explore and learn more about each of the programming languages introduced.

Data analyst

A data analyst collects, transforms, and organizes data to draw conclusions, make predictions, and drive informed decision-making. The most popular programming languages used by data analysts are R and Python.

R offers convenient statistical features for data analysis and is useful for creating advanced data visualizations. Check out these resources to learn more about R:

Python is a general-purpose language that you can use to create what you need for data analysis. Here are a few resources to begin learning Python:

Kaggle is an online repository of various datasets that can be used in both R and Python. It's a robust platform that regularly hosts solution-based competitions using data sets in high-interest industries. Learners may also explore a vast trove of data modeling discussions, trending plug-in models, and useful code snippets. Here are some great resources to get started in Kaggle:

  • Datasets: explore and download a vast collection of data sets while up-voting your favorite collection.
  • Competitions: compete individually or as part of a team in data competitions for the possibility of financial rewards. Even if you don’t win, this is a great way to network with other analysts.
  • Learn: use this resource for an additional perspective on data visualization, linear regression techniques, or time series charting code.

Web designer

A web designer is responsible for the styling and layout of web pages containing text, graphics, and video. Web designers generally use Hypertext Markup Language v5 (HTML5) and Cascading Style Sheets (CSS) to create web pages.

HTML5 provides structure for web pages and is used to connect to hosting platforms. Learn more about HTML5 and CSS using these resources:

CSS is used for web page design and controls graphic elements (color, layout, and font) and page presentation on multiple devices (large screens, mobile screens, and printers). Check out these cheat sheets for CSS:

Mobile application developer

A mobile application developer uses programming to create applications used on laptops, mobile phones, and tablets. The most popular programming languages for mobile application developers are Swift, Java, and C#.

Swift (for Apple platforms) is an open source, compiled programming language for macOS, iOS, watchOS, and tvOS. It is designed to be safe, fast, and expressive. Browse these resources for more information about Swift:

  • Swift.org: an open source community with resources to learn how to use Swift, including videos and sample code
  • Swift developer site: an Apple developer website with information for developers who want to use Swift
  • Swift development resources: Apple’s collection of documentation, sample code, videos, and recommended books

Java (for Android devices) is one of the officially supported languages for Android development, alongside Kotlin. The article “I want to develop Android apps - which languages should I learn?” explores some other languages used for Android development. Check out these resources for Java:

C# (pronounced C-sharp) is an object-oriented programming language that is widely used to create mobile apps in the .NET open source developer platform. Xamarin extends the .NET platform with a framework for developers to create cross-platform mobile apps for both iOS and Android. Here are a few resources to help you learn C#:

Web application developer

A web application developer designs and develops network applications used across the web. The most popular programming languages used by web application developers are Java, Python, Ruby, and PHP.

Java is widely used to create enterprise web applications that can run on multiple clients. Java’s main strength is its “Write Once, Run Anywhere” (WORA) approach. Browse these resources to learn more about Java:

Python is a general-purpose programming language. Check out the Python resources listed in the data analyst section.

Ruby is a general-purpose, object-oriented programming language used for web application development. Ruby isn't the same as Ruby on Rails, which is an open source web application framework that runs using Ruby. Browse these resources to learn more about Ruby:

  • Ruby news: information about the latest Ruby releases and links to other resources
  • Ruby documentation: includes guides, tutorials, and reference material to help you learn more about Ruby
  • Ruby programmer’s guide: a tutorial and reference guide for Ruby
  • Learn Ruby from Codecademy: a website with free basic interactive lessons, and additional activities that can be accessed with a monthly subscription

PHP is a scripting language particularly suited for web application development. Much of its syntax is borrowed from C, Java, and Perl. PHP is simple, flexible, and relatively easy to learn. Check out these resources to learn more about PHP:

Game developer

A game developer is an application developer who specializes in video game creation. Game developers most commonly use the programming languages C# and C++.

C# is an object-oriented programming language that is widely used to create games. Check out the C# resources listed in the mobile application developer section.

C++ is an extension of the C programming language that is also used to create console games, like those for Xbox. Browse more information about C++:

Tips for learning programming languages

Here are a few tips to follow when you start learning a new programming language:

  • Define a practice project and use the language to help you complete it. This makes the learning process more practical and engaging.
  • Keep previous concepts and coding principles in mind. Many of these are transferable between programming languages. So, after you have learned one language, learning a second or third programming language tends to be much easier.
  • Create and keep good notes and cheat sheets in whatever format (handwritten or typed) works best for you.
  • Create an online filing system for information that you can easily access while you work in various programming environments.

From spreadsheets to SQL to R

Although the programming language R might be new to you, it actually has a lot of similarities to the other tools you have explored in this program. In this reading, you will compare spreadsheet programs, SQL, and R to have a better sense of how to use each moving forward.

Spreadsheets, SQL, and R: a comparison

As a data analyst, there is a good chance you will work with SQL, R, and spreadsheets at some point in your career. Each tool has its own strengths and weaknesses, but they all make the data analysis process smoother and more efficient. There are two main things that all three have in common:

  • They all use filters: for example, you can easily filter a dataset using any of these tools. In R, you can use the filter() function from the dplyr package. This performs the same task as a basic SELECT-FROM-WHERE SQL query. In a spreadsheet, you can create a filter using the menu options.
  • They all use functions: In spreadsheets, you use functions in formulas, and in SQL, you include them in queries. In R, you will use functions in the code that is part of your analysis.
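As a rough illustration of both points, the sketch below filters a small data frame and then applies a function to the result. The `sales` data frame and its columns are invented for this example, and it uses base R's `subset()` so that it runs without installing any packages; the same filter could also be written with `filter()` from the dplyr package.

```r
# Hypothetical data frame; the names and values are invented for illustration.
sales <- data.frame(
  region = c("East", "West", "East", "North"),
  units  = c(120, 85, 40, 200)
)

# Filter rows, roughly like the SQL query: SELECT * FROM sales WHERE units > 50
# (with dplyr loaded, filter(sales, units > 50) expresses the same idea)
large_orders <- subset(sales, units > 50)

# Apply a function to a column, much as a spreadsheet formula such as =SUM(...) would
total_units <- sum(large_orders$units)

print(large_orders)
print(total_units)
```

Here `large_orders` keeps the three rows with more than 50 units, and `total_units` is their sum, 405.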

The table below presents key questions to explore a few more ways that these tools compare to each other. You can use this as a general guide as you begin to navigate R.

| Key question | Spreadsheets | SQL | R |
| --- | --- | --- | --- |
| What is it? | A program that uses rows and columns to organize data and allows for analysis and manipulation through formulas, functions, and built-in features | A database programming language used to communicate with databases to conduct an analysis of data | A general purpose programming language used for statistical analysis, visualization, and other data analysis |
| What is a primary advantage? | Includes a variety of visualization tools and features | Allows users to manipulate and reorganize data as needed to aid analysis | Provides an accessible language to organize, modify, and clean data frames, and create insightful data visualizations |
| Which datasets does it work best with? | Smaller datasets | Larger datasets | Larger datasets |
| What is the source of the data? | Entered manually or imported from an external source | Accessed from an external database | Loaded with R when installed, imported from your computer, or loaded from external sources |
| Where is the data from my analysis usually stored? | In a spreadsheet file on your computer | Inside tables in the accessed database | In an R file on your computer |
| Do I use formulas and functions? | Yes | Yes | Yes |
| Can I create visualizations? | Yes | Yes, by using an additional tool like a database management system (DBMS) or a business intelligence (BI) tool | Yes |

Optional Hands-On Activity: Downloading and installing R

Activity overview

Earlier in this course, you learned about R, a programming language used for statistical analysis, visualization, and other data analysis. In this activity, you’ll complete the steps to download and install R on your computer.

By the time you complete this activity, you will be able to use R without internet access and independent of the RStudio cloud-based suite. This will enable you to use R with more flexibility, which is important for programming effectively during your career as a data analyst.

Prepare for installation

  • Note: This is an optional activity. RStudio Cloud (now called Posit Cloud) is the primary tool you will use for this course, but you can also install R to your computer for offline use. Please keep in mind that Chrome OS does not support the installation of R. If you are completing this course on a Chromebook, you should skip this activity or refer to the Linux workaround linked below.

To get started, you need to know your operating system. Your operating system (OS) is the software that manages your computer’s hardware and provides its main interface. Common operating systems include macOS (Apple), Windows (Microsoft), and Chrome OS (Google). The OS on your device determines which version of R you will install.

  • Note: If you use Chrome OS, you will need to enable Linux (Beta) in order to use R. This guide details how to install R on a Chromebook. Otherwise, you can use an online coding platform like RStudio Cloud or Kaggle.

Once you have determined your OS and the version of R it requires, it is time to download and install its assets.

Download R

  1. Go to the R website and navigate to the download page on the Comprehensive R Archive Network. The download page brings you to a list of locations to download R.
  2. Click one of the “mirrors,” or download locations. This will bring you to a page with download links corresponding to each OS. Don’t worry about which mirror to pick; all of them host the same R installation files.
  3. Find your OS, click its corresponding link, and download the base package. The description should say “Binaries for base distribution.”
  4. Click the download link to begin downloading R.

Install R

  1. Once your download is complete, open the downloaded file. This will launch the R installer.
  2. Select your preferred language from the drop-down menu. Then, click Next >.
  3. Review the license information for R for your OS. This describes its open-source availability, which means it may be modified and shared by the people who use it. Click Next >.
  4. Choose the install location for R. To pick an install location, click Browse and navigate to the folder you’d like to select. If you are not picky about where you want to install these files, the default location provided will be fine. Click Next >.
  5. Click the checkboxes for the files you need. For example, if you have a 64-bit system, select only the 64-bit files. Click Next >.
  6. Select No for customizing your startup options. Click Next >. Then at the following screen, click Next >. You have now installed R to your computer.

Using R

  1. Open R and locate the R Console. This is a window in which you can write and execute commands in R. Find the > symbol at the bottom of the console and click the empty space to the right of it.
  2. Enter a simple display command for your first command. Type print("Hello world!") into the command prompt, using straight quotation marks. Press Enter (Windows) or Return (Mac) to show the result: [1] "Hello world!" The [1] at the start of the output line is the index of the first value displayed on that line; here the output has a single value.
  3. Enter a simple arithmetic expression for your second command. Type 1+2 into the command prompt. Press Enter (Windows) or Return (Mac) to receive the answer, which is 3. Later in this course, you will practice more simple math in R.
  4. Enter a quit command for your last command. Type q() into the prompt and press Enter (Windows) or Return (Mac). The program will close.
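The commands from these steps can be sketched as the short listing below. The `# [1] ...` comment lines show what R prints after each command. The `print(101:130)` line is an extra illustration that is not part of the activity: a longer vector usually wraps across several output lines, and each line begins with the index of its first value.

```r
# First command: display a message.
print("Hello world!")
# [1] "Hello world!"

# Second command: simple arithmetic.
1 + 2
# [1] 3

# Extra illustration (not in the activity): each output line of a longer
# vector starts with the index of the first value shown on that line.
print(101:130)

# Last command: quit R. It is commented out here so the lines above can be rerun.
# q()
```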

Reflection

In this activity, you downloaded and installed files for the R programming language. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:

  • What is an advantage of installing R instead of using it on an online platform?
  • How will learning R help you build your data analytics skills?

Explain: Congratulations on completing this hands-on activity! A good response would include that downloading and installing R is very helpful for flexible programming, as you won’t have to use an online client.

You can use R for a variety of analytical and mathematical processes, which are crucial to your future duties as a data analyst. The more you become familiar with R and how to use it, the more prepared you will be for any data analysis problem that comes your way.

Optional Hands-On Activity: R Console

Activity overview

In the last activity, you downloaded and installed R. You can use the R environment and programming language to conduct data analysis and create visualizations. In this activity, you'll review the basics of working with the R Console and learn how to write and execute a basic command.

This will enable you to better understand the standard R interface. While you will use RStudio for most of the activities in this course, it is useful to know the basics of a programming interface as this will likely come up in your day-to-day work as a data analyst.

What is the R Console?

  • Note: This is an optional activity. RStudio Cloud is the primary tool you will use for this course, but you can also install R to your computer for offline use. Please keep in mind that Chrome OS does not support the installation of R. If you are completing this course on a Chromebook, you should skip this activity or refer to the Linux workaround linked below.

The R Console is the program window in R where you make use of the R programming language. It is an interface that lets you view, write, edit, and execute your R code.

Programs like RStudio, an integrated development environment (IDE) for programming in R, use the R Console and other tools to make it easier to write and execute R code. In RStudio, the R Console is often referred to as the console pane (pictured below). It lets you perform any tasks you’d do in the R Console.

[Screenshot: the console pane in RStudio]

However, as you start coding in R, it’s helpful to begin with the simplicity of just the R Console. During this hands-on activity, you’ll use the R Console to perform simple mathematical operations.

Use the R console

  1. Open the R program to use the R Console on your computer. The console displays a default startup message. The message starts with R version and your version number, and ends with Type ‘q()’ to quit R. Above the message, you will find a menu with icons that represent the functions of the console and graphical user interface (known in the program as RGui).

[Screenshot: the R Console in RGui with its startup message]

  2. Click in the blank space to the right of the > symbol at the bottom of the console.

This is the prompt, and anything you type after it will be read as executable R code when you press Enter (Windows) or Return (Mac). Keep in mind that everything you write in the R Console disappears after you end your session (or close the console). If you want to save the code you execute, it is better to save it in a text file or an .Rmd (R Markdown) file (which you will learn more about in upcoming lessons).

[Screenshot: the > prompt in the R Console]

  3. Type citation() after the prompt and press Enter (Windows) or Return (Mac). This returns instructions for how to cite R in a publication. You don’t need to worry about this now, but it will be helpful if you ever use R in a research paper or article.

After you execute the line, the > prompt will appear again and you will be able to write a new line of R code. Now, write a mathematical operation. Start with simple addition by using the plus operator (+).

  1. Type 4, then a +, then the number 5. The text you type should look like: 4+5. Press Enter (Windows) or Return (Mac). The R Console will return the answer to this question, which is 9.
  2. On a new line, type 5-4 to use the subtraction operator (-). Press Enter (Windows) or Return (Mac) to execute the code and return the answer, which is 1.
  3. On a new line, use the multiplication operator (*) to multiply two numbers. Type 10*2 and then press Enter (Windows) or Return (Mac). This will execute the code and return the answer, which is 20.
  4. On a new line, use the division (/) operator to divide two numbers. Type 10/2 and then press Enter (Windows) or Return (Mac). This will execute the code and return the answer, which is 5.

Your R code and results should look like this:

[Screenshot: the R Console after running the four arithmetic commands]
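For reference, a text sketch of that console session follows. It assumes the four commands are entered as described in the steps above; spaces around the operators are optional in R and are added here only for readability. The `# [1] ...` comment lines show what R prints for each command.

```r
# Addition
4 + 5
# [1] 9

# Subtraction
5 - 4
# [1] 1

# Multiplication
10 * 2
# [1] 20

# Division
10 / 2
# [1] 5
```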

Congratulations, you’ve written code in R! You can use R to complete mathematical operations, among many other useful data analysis tasks. This is just the beginning of your journey with writing in R.

Reflection

In this activity, you used the R console to run some basic commands. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:

  • What does the R console teach you about programming in the R interface?
  • What is the difference between using the R console versus writing R code in a text file?

Explain: Congratulations on completing this hands-on activity! A good response would include that learning how to use the R Console and other R programming environments is fundamental to performing data analysis.

The R console is a simple environment in which you can write single lines of R code. It won’t save your code beyond a single session, but it is very valuable for running simple functions. In upcoming activities, you will use RStudio, an integrated development environment that builds on the simplicity of the R console.
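To make the contrast with a saved file concrete, here is a minimal sketch of writing code to a text file and running it later. The file name `first_script.R` is hypothetical, and `tempdir()` is used only so the example runs anywhere; in practice you would save the file in a folder of your own so that it outlives the session.

```r
# Hypothetical file name, stored in R's temporary folder for this example.
script_path <- file.path(tempdir(), "first_script.R")

# Save two lines of R code to the file.
writeLines(c("result <- 4 + 5", "print(result)"), script_path)

# Run every line in the saved file, in order.
source(script_path)

# The code now lives in a file that can be reopened, edited, and rerun.
file.exists(script_path)
```

Code typed at the console prompt is not written to a script file like this one, which is why the text recommends saving anything you want to keep.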

Test your knowledge on programming languages

Question 1

Fill in the blank: Programming involves _____ a computer to perform an action or set of actions.

A. instructing

B. filtering

C. updating

D. training

The correct answer is A. instructing. Explain: Programming means giving instructions to a computer to perform an action or set of actions.

Question 2

What are the benefits of using a programming language to work with your data? Select all that apply.

  • Save time
  • Clarify the steps of your analysis
  • Easily reproduce and share your work
  • Choose a business task for analysis

Question 3

The R programming language can be used for which of the following tasks? Select all that apply.

  • Visualization
  • Statistical analysis
  • Gaming
  • Data analysis

Explain: The R programming language can be used for statistical analysis, visualization, and data analysis.