6.2.3.Optional: Work with multiple data sources - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Hands-On Activity: Practice linking data in Tableau

Activity overview

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

The video you just watched showed you how to make and visualize JOINs in Tableau. Now, you can use the datasets and instructions in this activity to perform the JOINs yourself. Feel free to refer back to the previous video if you get stuck.

In earlier activities, you worked in Tableau to create a data visualization. In this activity, you will review a scenario, link different data sources in Tableau, and create visualizations using multiple datasets.

By the time you complete this activity, you will be able to make visualizations out of data from multiple sources. This will enable you to visualize comparisons and combinations of data, which will allow you to share more complex projects in your career as a data analyst.

What you will need Click the link to create a copy of the datasets and download them. Note that upon opening the CO2 dataset below, you will immediately see the About page load with a single URL code at top. Navigate to the bottom of the document, and you can access all the raw and cleaned CO2 data by clicking on the different sheets. All of these sheets will be accessed by the Tableau platform once loaded. If you don’t have a Google account, download the datasets directly from the attachments below.

Link to datasets: CO2, energy, total population, and gdp total

OR

Download datasets:

1. CO2

2. Energy data

3. totalpopulation

4. gdptotal

The scenario

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Imagine you are working as a data analyst at a policy research institute. For your current project, you need to create a visualization that shows the CO2 emissions per capita for each country from 2000-2011. You need to provide a visual presentation that not only allows someone to visually compare CO2 emissions between countries from year to year, but also provides information about each county’s population, GDP, and energy use.

You already have a dataset that includes emissions for each country between the years 1960 - 2011. But, the information that you need on energy use, total population, and GDP you had to collect from a government website. Each dataset is in a separate file. Moreover, some of the information is missing for some countries.

Often you will work with datasets that are missing information. Whether or not you need to find this missing information will depend on your project. In this case, you will notice that the missing information is from the 1960s, 1970s, and 1980s.

Luckily, your project is only concerned with the data from 2000-2011. You need an efficient way to utilize some data from one source, and some data from other sources. Taking just the information that you need from each source and creating a new data source takes a lot of time.

Tableau allows you to link data from different sources, as well as import data from different formats. While you won’t be working with one in this assignment, Tableau allows you to use a Web Data Connector. This tool allows you to import the data you need directly from another site. Your visualizations will update when the data sources for your visualization are updated.

Load the data

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

  1. Log in to Tableau Public.
  • Note: Tableau frequently updates its user interface. The latest changes may not be reflected in the screenshots, but the principles in this activity remain the same. Adapting to changes in software updates is an essential skill for data analysts, and we encourage you to practice troubleshooting. You can also reach out to your community of learners on the discussion forum for help.
  1. Go to your profile and click Create a Viz.
  2. From the Connect to Data window, go to the Files tab and open the CO2 dataset you downloaded earlier.
  3. From the Data Source tab on the bottom of the interface, go to the Connections header at the top of the left-side column.

nNHLrn4KQZKRy65-CvGSfg_025c7ba9d0684a3f8b184e7ff68d21f1_findme

  1. Click the + icon to add another data source. Start with the energy dataset.
  2. Repeat step 5 for the other datasets, gdptotal and totalpopulation.

Now, you should have all four datasets loaded into Tableau. The datasets will be on the left-hand side of your screen under Connections.

Note: As you progress through the activity, make sure to save your progress by clicking File, then Publish or the Publish button in the top right corner of the screen. If you are asked to "Create an Extract," do so. It may take some time to create an extract of the data you are using in this activity, but it is essentially the same as saving your progress.

You’ll notice that Tableau has already added one of the data sources into the area Multiple Connections. In the screenshot below, Energy is already loaded.

xOBw5gwXRAmgcOYMF-QJIw_148b82778dfe41da86d07eba9d2629da_DAC6M2L4SR1-SS2 0

If one of the datasets has already been loaded in, you can remove it by dragging the box to the left-hand side (the grey area) of the screen. Click on the box labeled Energy in the center-top of the screen and drag it off to the left to delete it.

Once you remove the data from Multiple Connections, it should appear similar to the image below.

ElL7gKUVQbqS-4ClFfG6LQ_30e09cb76f15476187ebc8aa77fa52e1_DAC6M2L4SR1-SS2

Make connections with JOINs

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Now, you’ll set up the connections between the different datasets by creating JOINs between them. You learned about JOINs in the previous course.

As a refresher: INNER JOINs and OUTER JOINs are types of relationships that can be used to combine data based on common columns of information.

In Tableau, you’ll notice that the former JOINs window has now become a multi-purpose Relationships window. By double-clicking your data tables, you can edit the JOINs instead of the relationships.

Follow these instructions to create JOINs in Tableau:

  1. Click on CO2 under Connections.
  2. Under Sheets, you will notice all the different sheets in the CO2 dataset. Find CO2 Data Cleaned and double-click on it to load it.
  3. Hover your cursor over the right side of the CO2 Data Cleaned box and click on the arrow.
  4. Select Open to open the CO2 Data Cleaned dataset. Make sure you complete this step. This allows you to change the physical table, which will allow you to create JOINs. Otherwise, you will only be able to edit Relationships. Usually, you could use either option to accomplish the same goal. But for the purposes of this activity, we specifically want to use JOINs.

Your screen should appear similar to the screenshot below.

UqyTapQ1QhGsk2qUNfIRtw_a7c3d0cc5b9b459d869288ecf0edc1f1_Screenshot-2021-06-06-4 58 30-PM

  1. Click on the energy dataset under Connections.
  2. Drag the energy sheet across to the CO2 Data Cleaned box under Multiple Connections. A pop-up window for a Join will appear.
  3. The popup window may automatically populate with Year from CO2 Data Cleaned and Year1 from Energy. If not, put Year on the left side of the chart and Year1 on the right side.
  4. Click on Add new join clause under Year. A dropdown menu will appear.
  5. Select Country Name on the left side and Country on the right side.
  6. Click the X to close the dropdown menu.

lIBshvhNTneAbIb4TS53fw_83ad47273215482bbc58083435fe6df1_join-table

  1. Click Update now to examine the dataset. You will notice that Year and Year1 have a number sign above them. Change the data type to date for each of these columns.
  2. In the column, Year click on the # (not the arrow next to it) and select Date from the available options.

After completing the first field, you will notice a red exclamation mark between CO2 Data Cleaned and Energy. This indicates that the columns you have joined are no longer of the same data type. One is formatted as date, and the other numeric.

You will also notice that after changing Year (CO2 Data Cleaned) to a Date type, the data preview pane will no longer display properly.

  1. To fix this, go to the column list in the lower left of the screen.

yvCRj6NvQy2wkY-jb7MtiA_96788660bb1c467996d92db0be510ff1_tableaujoinfix1

  1. Select the # icon next to the Year1 (Energy) column. Then, change the data type to Date.

dzDttM_NTdKw7bTPzY3Sww_72b17dfc6d94400ba493f9cd473b7cf1_tableaujoinfix3

You may need to click Update Now in the preview pane to display the data properly. Make sure to repeat this step when you change more data types later on in this exercise.

The red exclamation mark will disappear. You might notice that all the years have been put into a month/day/year form with the default month and day as January 1st. This will not create any problems when creating a visualization, as you will filter the data by year.

Connect additional datasets

Before adding any additional joins, the data type for Year(Gdptotal) needs to change.

A pop-up window will appear for the join. It might already be populated with Year1 under Datasource and Year(Gdptotal) under gdptotal.

  1. Click on gdptotal under Connections.
  2. Under Sheets, drag the gdptotal sheet into the white space underneath the energy box.
  3. Go to the column list in the lower left of the screen, scroll until you find the column Year(Gdptotal). Click on # above it. A drop-down menu will appear.
  4. Select Date from the drop-down menu.

If the data preview does not display properly, fix the date type in the lower left pane.

  1. Click on the Venn diagram between energy and gdptotal. Click on Add new join clause under year. A drop-down menu will appear.
  2. Under CO2 Data Cleaned click on Country Name.

K4_JxDVaTFKPycQ1WrxSYw_9c604f9425d64f8fa2f23fc2f73883f1_realdropdown

  1. Click on the empty field under gdptotal across from Country Name. A dropdown menu will appear.
  2. Set the right side of the join statement to Country1.
  3. Close the Join pop-up by clicking on its exit button.

Now you are going to join totalpopulation, the last of the four datasets that you downloaded.

  1. Click on totalpopulation under Connections.
  2. Under Sheets, drag the totalpopulation sheet into the white space to the right of the energy and gdptotal boxes.

A pop-up window will appear for the join. It should already be populated with Year under Datasource and Year(totalpopulation) under totalpopulation.

  1. Go to the column list in the lower left of the screen, scroll until you find the Year(totalpopulation) column. Click on # above it. A drop-down menu will appear.
  2. Select Date from the drop-down menu.

If the data preview does not display properly, fix the date type in the lower left pane.

  1. Click on the Venn diagram to the left of totalpopulation. Click on Add new join clause under Year. A drop-down menu will appear.
  2. Under CO2 Data Cleaned click Country Name.
  3. Click on the empty field under totalpopulation across from Country Name. A dropdown menu will appear.
  4. Click Country (totalpopulation).
  5. Close the Join pop-up by clicking on its exit button.
  6. Click the Update button to view your data columns.

Congratulations! You have successfully joined four different sources of data.

You should take some time to study your dataset. The only years you should notice in your dataset are between 2000-2011. While your dataset CO2 went from 1960-2011, and your other datasets went from 2000-2015, the intersection (the years they have in common) only includes 2000-2011. This is just the time span that you need.

If the dataset had gone beyond those dates, you would have filtered out the unneeded years in your visualization.

Reviewing the dataset, you may have noticed that some of your measurement values need to be changed. The data type for the column Energy use is listed as string data. You can tell this because of the Abc icon above the name. The column currentGDP is also formatted as type string.

  1. Find the Abc icon above the Energy use column. Change it to Number (decimal).
  2. Find the Abc icon above the currentGDP column. Change it to Number (whole).

If the data preview does not display properly, fix the date type in the lower left pane.

Create a visualization

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

At the bottom of your screen, you will notice a tab labeled Sheet 1.

  1. Click on the tab Sheet1.
  2. Drag Country Name under CO2 Data Cleaned into the Detail square.
  1. Drag CO2 Per Capita to Color.
  2. Click on Color, then Edit Colors.
  3. Click on the Palette dropdown and change it from Automatic to Red-Green diverging.
  4. Check the boxes for Stepped Color and Reversed. (Because green is generally viewed as positive for CO2 emissions, you want the colors to move towards red as emissions go up.)
  5. Click the Show Advanced dropdown.
  6. Check the Start and End boxes.

You might have noticed that the legend on the right-hand side of the screen shows Sum(CO2 per capita). You need to change the start and end values in order to notice color contrasts showing red shades.

The lowest CO2 Per Capita emission for any year is 0.0396 and the largest is 61.9898.

  1. Enter 0 into the Start field, and 62 into the End field. Click OK. Click the X button.

Note: These values are the highest and lowest emissions between 2000-2011. Your screen should now look like this:

Dy9EYpbvTJ6vRGKW7xye1g_d2d01901c6d94693a60ce96acbbdadf1_rgdiv

  1. Drag Year from under CO2 Data Cleaned into the Filters area.

  2. Click on Years, Next, All, OK.

BOdEa3alR5qnRGt2pfeanQ_388635cd400f478598eaa8c36d59269c_image

  1. In the Filters box, right-click on YEAR(Year)

  2. Select Show Filter. The filter will appear on the right side of the screen.

  3. Click on the arrow to the right of YEAR(Year) on the far-right side of the screen.

  4. Select Single Value (dropdown). Now the areas are colored only for the values of each year. Use the checkboxes in the list to choose which years you want to include in the visualization. You can select only the years between 2000-2011 to view the emissions relevant to the scenario.

Congratulations! You've linked your data and made a comprehensive data visualization in Tableau.

Final Note: It is important to understand that you are not required to submit any exported dashboards or visualizations from the Tableau Public interface to move forward in this course. Your reflection answer is her for personal guidance and growth. Moreover, it is highly suggested that you export any visualizations from your gallery as a PDF or embedding the interactive dashboard for your portfolio.

For more information on exporting and embedding visualizations, please refer to the section titled, Tableau Public: Export.

Reflection

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

In this activity, you used Tableau to link data together and visualize it. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:

  • What did linking data from multiple sources allow you to do with your visualization in Tableau?
  • What other kinds of datasets could you link to the four you used in this activity? What kinds of comparisons or insights could you make?
  • If you couldn’t link data in this way, how would you make complex comparative datasets and visualizations like this?

Explain: Congratulations on completing this hands-on activity! A good response would include that linking data allows you to combine different features of multiple datasets without having to create a new dataset as you visualize comparisons and combinations of data.

With Tableau and other visualization software, you can simplify the process of combining and visualizing data. Otherwise, you would need to select the information you need and create a new data source, which takes a lot of time. This simplified process will allow you to share more insights with your peers and stakeholders throughout your career as a data analyst.

Tableau resources for combining multiple data sources

Now that you have some experience working with data in Tableau, you are ready to start doing more, including combining multiple data sources. This reading will provide you with some how-to guides for that, and other helpful resources you can use as you practice using Tableau on your own.

Resource Description
Set up data sources This page links to other resources explaining how to set up your data sources and prepare them for analysis once you have connected them to your Tableau account. It specifically includes articles explaining how to join or blend data, and what a union is and how they work. This is a great starting point as you get ready to begin using and combining data sources.
Join your data Joining refers to the process of combining data sources based on common fields. This article gives a more detailed explanation of the different joins, how to use them in Tableau, and an example join with a step-by-step guide.
Don't be scared of relationships Relationships allow you to combine multiple data sources in Tableau. This is a more flexible alternative to joins, and doesn’t force you to create one single table with your multiple data sources. This article will give you more insight into how relationships work.
How relationships differ from joins This article goes into more detail about the differences between using relationships and joins , and guides you through the process of using relationships to combine data.
Blend your data Data blending is another method you can use to combine multiple data sources. Instead of truly combining the data, blends allow you to query and aggregate data from multiple sources. This resource goes into more detail about blending and includes a tutorial.
Combining multiple date fields This resource provides examples that explain how to combine date fields when using four different methods of data combination in Tableau.

These are just a few resources you can use. You can also find more information online or in the Tableau community forums .