Getting Started with YouTube Networks - strohne/Facepager GitHub Wiki

This Getting Started guide helps you construct a network of related YouTube videos. In the first part, you will learn how to collect and export the required data with Facepager. The second part contains an R script that prepares the data by creating a node list and an edge list. The third part shows how to visualize the network with Gephi.

The following software is needed for this tutorial: Facepager, R with RStudio, and Gephi.

If you have never worked with these tools, it will take some time to become familiar with them. In the end, you will hopefully have learned the basics of creating all kinds of networks.

Part 1: Facepager

The following steps show how to extract the data for the network from YouTube and export it using Facepager. A network consists of nodes and the relations between these nodes, the edges. Think about what your network should look like before collecting the data. For a network of related videos you can, for example, decide whether the nodes should be the titles of the videos or the channels posting the videos. In the following example, two videos are connected whenever YouTube recommends one of them as related to the other.
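
As a minimal sketch of this structure, a node list and an edge list can be written as two small tables in R. The video IDs and titles below are made up for illustration; the column names Id, Label, Source and Target are the ones the R script in Part 2 produces.

```r
# Toy network: three videos (nodes) and the recommendations between them (edges).
# All IDs and titles are invented for illustration.
nodes <- data.frame(
  Id    = c("vidA", "vidB", "vidC"),
  Label = c("Video A", "Video B", "Video C"),
  stringsAsFactors = FALSE
)

# Each row is one directed edge: Target was recommended as related to Source.
edges <- data.frame(
  Source = c("vidA", "vidA", "vidB"),
  Target = c("vidB", "vidC", "vidC"),
  stringsAsFactors = FALSE
)

# Sanity check: every edge endpoint should also appear in the node list.
all(c(edges$Source, edges$Target) %in% nodes$Id)
```

If the nodes were channels instead of videos, only the contents of Id and Label would change; the two-table layout stays the same.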

  1. Create a database: Click New Database in the Menu Bar to create a blank database.

  2. Log in to YouTube: In the YouTube tab of the Query Setup, click on Login to Google and log in to get a valid Access Token. Note: the Access Token is like a password to Google. Since it may be printed in the Status Log and saved in the application settings, don't give anyone untrusted access to your computer or to the Status Log.

  3. Add nodes: As a starting point of the network, add a first YouTube video as a node by clicking Add Nodes in the Menu Bar. You find the ID of the video in the last part of the URL (the part coming after "watch?v="). For example, add O2t8EDyBPPs to start from the video https://www.youtube.com/watch?v=O2t8EDyBPPs. By the way: the video is very much related to this Getting Started, you should definitely watch it.

  4. Get information about the video: In a first query, you collect information about the video. This is required for creating the network later on, as this first video is one node of the network. Click on Presets in the Menu Bar and apply the YouTube preset "Get video statistics". You can adjust the parameter part by removing "contentDetails" and "statistics" in the right field, so that only "snippet" remains. This reduces the quota cost of your query. For more information, read the section "Quota usage" in the YouTube Data API Overview. Fetch the data by selecting the node and clicking Fetch Data. Inspect the data by expanding your node or clicking Expand nodes in the Menu Bar. Select the child node and look at the raw data in the Data View to the right.

  5. Get related videos: In a second query, fetch the videos related to your first video. Load the preset "Get related videos". To fetch the data, click on the first node and set the Node level to 2 in the settings section. This will fetch the data for the child node, which was created in the first query. Click Fetch Data and inspect the related videos by expanding the node.

  6. Get related videos of the related videos: In a third query, fetch the videos related to the videos from your second query. Click on your first node and set the Node level to 3 in the settings section. This will apply your query to all the child nodes on the third level. Don’t change any other settings of the query. Click Fetch Data.

  7. Get further related videos: You can collect more related videos for deeper levels. Set the Node level in the settings section to the level of the deepest child nodes. Beware: the higher the node level, the more child nodes exist and the longer the query will take. At some point you will exhaust the daily quota of the YouTube Data API. In this case, you can either stop your data collection and export the data as it is, or wait until the quota resets; according to the YouTube Data API documentation, daily quotas reset at midnight Pacific Time.

  8. Export data: Expand the nodes and select all the nodes you want to export. Click Export Data to get a CSV file. Note the options in the export mode field of the export dialog. You can open CSV files with Excel or any statistics software you like.
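
Two details from the steps above can be sketched in a few lines of base R: reading the video ID off a watch URL (step 3) and estimating how quickly the number of requests grows with the node level (step 7). Both helpers, `video_id()` and `requests_at_level()`, are hypothetical names introduced here. The estimate assumes one request per node and the same number `k` of related videos for every video; `k = 25` is purely illustrative, since the real number depends on the preset's parameters and on what the API returns.

```r
# Extract the video ID from a standard watch URL (the part after "v=").
# If the URL contains no "v=" parameter, sub() returns the input unchanged.
video_id <- function(url) {
  sub(".*[?&]v=([^&#]+).*", "\\1", url)
}

video_id("https://www.youtube.com/watch?v=O2t8EDyBPPs")
# "O2t8EDyBPPs"

# Rough upper bound on the number of requests for a query at a given node level.
# Assumptions: one request per node, k related videos per video, no duplicates.
# Node level 2 holds the single child node of the seed video, node level 3
# holds up to k videos, node level 4 up to k^2, and so on.
requests_at_level <- function(node_level, k) {
  k^(node_level - 2)
}

requests_at_level(2:5, k = 25)
# 1 25 625 15625
```

The numbers grow by a factor of `k` with every level, which is why deep levels quickly run into the quota.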

Part 2: R script

The following R script prepares the data for a network analysis by creating a node list and an edge list. These lists can be used in R or Gephi.

  1. Create a new RStudio project.
  2. Place the CSV file (related_videos.csv) into the project folder.
  3. Create a new R script, paste the following code, then run the script to produce two new CSV files containing nodes and edges.

```r
#
# Load packages and data ----
#

library(tidyverse)

# read_csv2() expects semicolon-separated values; "None" marks missing values
videos <- read_csv2("related_videos.csv", na = "None")

#
# Prepare data ----
#

# Keep only the data rows (filter)
# and the columns needed for the network (select)
videos <- videos %>%
  filter(object_type == "data") %>%
  select(id, parent_id, object_id, snippet.title, level)

# Create edge list:
# - Join the parent row to every row (left_join)
# - Select and rename columns (select)
# - Remove duplicates (distinct)
# - Drop rows with a missing source or target (na.omit)
videos.edges <- videos %>%
  left_join(videos, by = c("parent_id" = "id")) %>%
  select(Source = object_id.y, Target = object_id.x) %>%
  distinct() %>%
  na.omit()

# Create node list:
# - Select and rename columns (select)
# - Remove duplicates (distinct)
videos.nodes <- videos %>%
  select(Id = object_id, Label = snippet.title) %>%
  distinct()

#
# Save for Gephi ----
#

write_csv(videos.edges, "related_videos_edges.csv", na = "")
write_csv(videos.nodes, "related_videos_nodes.csv", na = "")
```
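
To see what the self-join in the script does, the following sketch repeats the edge list step on a tiny made-up table. It uses base R only, so it runs without the tidyverse: `merge()` stands in for `left_join()` followed by `na.omit()`, and `unique()` stands in for `distinct()`. The row IDs and video IDs are invented, and the `parents` look-up table is a helper introduced just for this example.

```r
# Toy version of the filtered Facepager export: every row knows its own
# database id, the id of its parent row, and the video it stands for.
videos <- data.frame(
  id        = c(2, 3, 4, 5, 6),
  parent_id = c(1, 2, 2, 3, 3),
  object_id = c("seed", "relA", "relB", "relB", "seed"),
  stringsAsFactors = FALSE
)

# Look-up table: database id of each row and the video ID it represents.
parents <- data.frame(
  id     = videos$id,
  Source = videos$object_id,
  stringsAsFactors = FALSE
)

# Attach the parent's video ID to every row. Rows whose parent is not in the
# table (here the row with parent_id 1) are dropped by the inner merge.
joined <- merge(videos, parents, by.x = "parent_id", by.y = "id")

# Keep one row per Source/Target pair.
edges <- unique(data.frame(
  Source = joined$Source,
  Target = joined$object_id,
  stringsAsFactors = FALSE
))

edges
```

The result contains four edges: from "seed" to "relA" and "relB", and from "relA" to "relB" and back to "seed".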

Part 3: Gephi

The following steps show you how to visualize the network of related videos and perform some basic network analysis.

  1. Import data: Click on Import spreadsheet in the section "Data Laboratory" to load first the node list and then the edge list. Navigate through the dialog box and make sure that the node list is imported as "Nodes table" and the edge list as "Edges table". In the import report window, also choose to append to the existing workspace.
  2. Organize network: You can see the randomly distributed nodes in the section "Overview". Choose the layout algorithm Force Atlas 2 to visualize the network. Click on Run to start the simulation. In this layout, connected nodes are positioned closer to each other. Play around with the settings: for example, pull the nodes apart with a higher scaling value or avoid overlapping nodes by clicking on Prevent Overlap. To explore the network further, calculate network measures in the "Statistics" section or change the colour or size of nodes in the "Appearance" section.
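
Simple network measures can also be cross-checked outside Gephi. As a sketch, the in-degree and out-degree of each video can be counted directly from an edge list in base R. The edge list below is made up; with real data you would read the edge file written in Part 2 instead.

```r
# Made-up edge list in the same Source/Target layout as the exported file.
edges <- data.frame(
  Source = c("vidA", "vidA", "vidB", "vidC"),
  Target = c("vidB", "vidC", "vidC", "vidA"),
  stringsAsFactors = FALSE
)

# In-degree: how often a video shows up as a recommendation of another video.
in_degree <- table(edges$Target)

# Out-degree: how many recommendations start at a video.
out_degree <- table(edges$Source)

in_degree[["vidC"]]   # 2
out_degree[["vidA"]]  # 2
```

Videos with a high in-degree are recommended from many other videos in the collected sample, which makes them good candidates for larger node sizes in the visualization.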

To get a better idea of the features available in Gephi, you can find tutorials on network visualization and analysis on YouTube.

What is next?

Credits go to ChantalGrtnr!
