Nodes - strohne/Facepager GitHub Wiki

Nodes are the objects of the data collection. A node can be any object returned by an API, such as a Facebook post, a tweet or a YouTube comment. When web scraping, nodes can be links or media files. Usually one object corresponds to one row. Here the terms row, object and node all refer to the same concept. We apologize for being somewhat unclear.

In Facepager, there are five different kinds of nodes. You can see the node type in the Object Type column:

| Object type | Explanation |
| --- | --- |
| seed | Seed nodes were manually added by you using the Add nodes button. |
| data | The data returned from the API or from webscraping is sliced into single data nodes. For example, a data node is created for each tweet. |
| empty | Requesting data for the chosen node did not return any errors, but the result was empty. |
| offcut | The remaining part of the returned data, after slicing data nodes, is put into an offcut node. Here you find, for example, data about the pagination. If you fetch multiple pages using the Maximum pages setting, one offcut node is created for each requested page. |
| unpacked | You can slice your data using the Extract data function. The created nodes have the object type "unpacked". |
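
To make the difference between data, offcut and empty nodes more concrete, here is a minimal Python sketch of the slicing idea. It is not Facepager's actual code: the function name `slice_response`, the node dictionaries and the sample response with its `data` and `meta` keys are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def slice_response(response: Dict[str, Any], key: str) -> List[Dict[str, Any]]:
    """Split one response into data nodes plus an offcut node (hypothetical helper)."""
    items = response.get(key) or []
    if not isinstance(items, list):
        items = [items]

    # One data node per item found under the chosen key.
    nodes = [{"objecttype": "data", "response": item} for item in items]

    # No items and no error: record an empty node instead.
    if not nodes:
        nodes.append({"objecttype": "empty", "response": {}})

    # Everything that was not sliced (for example paging info) goes to the offcut.
    rest = {k: v for k, v in response.items() if k != key}
    if rest:
        nodes.append({"objecttype": "offcut", "response": rest})
    return nodes


# Invented sample response: two tweets plus pagination data.
page = {
    "data": [{"id": "1", "text": "first tweet"}, {"id": "2", "text": "second tweet"}],
    "meta": {"next_token": "abc"},
}
nodes = slice_response(page, "data")
types = [node["objecttype"] for node in nodes]
```

With this sample, the two tweets become two data nodes and the pagination information ends up in a single offcut node. Fetching several pages would repeat the step once per page, which is why one offcut node appears per requested page.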

For example, if you want to collect the list of Twitter followers from the accounts "TheAcademy", "HBO" and "goldenglobes", these accounts are your nodes. You can add starting nodes (also called seed nodes) by clicking Add Nodes in the Menu Bar.

If you need seed nodes with additional keys, load a CSV file using the Add Nodes feature. Each column in the CSV file becomes a key in the node's detail data. Note: In the node view (the main table), Facepager only shows the data configured in the column setup. To see all data from a CSV file, click a node and then Add all columns.
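
The mapping from CSV columns to keys can be pictured with a short Python sketch. This is an illustration only, not Facepager's import code. The helper `seeds_from_csv`, the column names and the choice of the first column as the Object ID are assumptions.

```python
import csv
import io
from typing import Any, Dict, List


def seeds_from_csv(text: str) -> List[Dict[str, Any]]:
    """Turn CSV text into seed-node dictionaries (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: the first column holds the Object ID.
    id_column = reader.fieldnames[0]
    return [
        {"objectid": row[id_column], "objecttype": "seed", "response": dict(row)}
        for row in reader
    ]


# Invented example file with one extra column.
csv_text = "account,category\nTheAcademy,film\nHBO,tv\n"
seeds = seeds_from_csv(csv_text)
```

Each row becomes one seed node, and every column, including `category`, is available as a key in that node's detail data.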

Seed Nodes

Objects may contain other objects (they may be nested). For example, a Twitter account such as TheAcademy has followers. When you fetch the followers for this account, one node for every follower is automatically inserted under the node "TheAcademy". In this example, the IDs of the accounts are contained in the Object ID column. These new nodes are the child nodes of your previously added parent nodes. Depending on your operating system, you'll notice an arrow or plus sign beside the objects. Clicking it unfolds the object and shows its subordinate "child" objects. Of course, objects may have multiple levels of relationships. Manually added seed nodes are on the first node level, while the child nodes are on the second or deeper levels.
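
The nesting described above is an ordinary tree. The following Python sketch models it under stated assumptions: the `Node` class, its field names and the follower IDs are invented for illustration and do not mirror Facepager's internal data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in the tree; seed nodes sit on level 1 (illustrative model)."""
    objectid: str
    objecttype: str = "seed"
    level: int = 1
    children: List["Node"] = field(default_factory=list)

    def add_child(self, objectid: str) -> "Node":
        # A fetched object is inserted one level below its parent.
        child = Node(objectid, objecttype="data", level=self.level + 1)
        self.children.append(child)
        return child


seed = Node("TheAcademy")                        # manually added, level 1
follower = seed.add_child("12345")               # a follower, level 2
follower_of_follower = follower.add_child("67890")  # level 3
```

Unfolding a node in the table corresponds to walking its `children` list in this model.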

Parent and Child Nodes

You can easily fetch data for multiple nodes on deeper levels without selecting every single node. First, click the ancestor or parent node, regardless of its level. Second, increase the Node level in the general settings section. If you, for example, want to fetch the followers of TheAcademy’s, HBO’s and goldenglobes’ followers, you aim at the child nodes on node level 2. Select their common parent node, set the node level to "2" and fetch the data. As a result, new nodes (the followers’ followers) are automatically inserted as new rows on node level 3.
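
One way to picture the Node level setting is as a walk down the tree from the selected node. The sketch below assumes that level 1 means the selected node itself and level 2 its children. The function `nodes_at_level`, the `parent_of` mapping and the node names are hypothetical and only illustrate the selection logic.

```python
from typing import Dict, List, Optional


def nodes_at_level(
    parent_of: Dict[str, Optional[str]], selected: str, node_level: int
) -> List[str]:
    """Return the nodes targeted by a fetch, counted from the selected node."""
    current = [selected]
    # Each step moves one level down, from the current nodes to their children.
    for _ in range(node_level - 1):
        current = [node for node, parent in parent_of.items() if parent in current]
    return current


# Invented tree: child -> parent (None marks a seed node).
parent_of = {
    "TheAcademy": None,
    "follower_a": "TheAcademy",
    "follower_b": "TheAcademy",
    "follower_of_a": "follower_a",
}
targets = nodes_at_level(parent_of, "TheAcademy", 2)
```

Here `targets` contains both followers of TheAcademy, so a single fetch covers them without selecting each one, and the results would be inserted one level below them.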

Nodes on Level 2