Query Setup. CrowdTangle - strohne/Facepager GitHub Wiki
Note: This page is under construction. Please excuse errors.
CrowdTangle is a service from Facebook to analyze public social media data from Facebook and Instagram. For data collection, the service provides the CrowdTangle API, which can be accessed via the Generic Module in Facepager. The kind of data provided by CrowdTangle includes:
- the content of posts,
- metadata about the post itself, e.g. the author, the date or the type of post,
- interaction data, e.g. how many likes, comments, shares or views a post received.
Some basic options for interacting with the CrowdTangle API to collect data are explained below. If you are looking for more detailed information, have a look at the resources provided by CrowdTangle:
- You will find a quick introduction in the Getting Started section and a very helpful introductory video on the website.
- The complete documentation can be accessed at https://github.com/CrowdTangle/API/.
- A cheat sheet summarizes the central concepts and endpoints.
- The FAQ answers questions from academics and researchers.

To get a quick start with the CrowdTangle API, check out the Presets for predefined scenarios and the Getting Started.
At the moment, access to CrowdTangle’s data is restricted to academics and researchers in specific fields. Thus, before you can interact with the API, you have to apply for access. You will find more information about the registration process and how to apply for an access token in the help section of CrowdTangle.
Once your application has been approved, you can access CrowdTangle’s interface and the API. However, some endpoints are not available by default, such as the full-text search endpoint (/posts/search). If you need these endpoints, you have to contact the CrowdTangle team to get them unlocked.
Also, the number of possible calls to the API is limited, which you should keep in mind when planning your data collection. Here, too, you can ask CrowdTangle to raise the limits. The default rate limits are:
- /posts: 6 calls/minute
- /posts/search: 6 calls/minute
- /links: 2 calls/minute
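To get a feeling for what these limits mean for a collection plan, the waiting time can be worked out with a small calculation. The following is only a sketch: the names `RATE_LIMITS`, `min_interval_seconds` and `estimated_minutes` are hypothetical, and the numbers are the defaults listed above, which may differ for your account.

```python
# Default rate limits quoted above, in calls per minute.
# Assumption: your account uses the defaults; adjust if CrowdTangle raised them.
RATE_LIMITS = {"/posts": 6, "/posts/search": 6, "/links": 2}


def min_interval_seconds(endpoint: str) -> float:
    """Shortest pause between two calls that stays within the limit."""
    return 60.0 / RATE_LIMITS[endpoint]


def estimated_minutes(endpoint: str, n_calls: int) -> float:
    """Lower bound for the duration of n_calls requests to one endpoint."""
    return n_calls / RATE_LIMITS[endpoint]


if __name__ == "__main__":
    for endpoint in RATE_LIMITS:
        print(endpoint, min_interval_seconds(endpoint), "seconds between calls")
    print("90 calls to /posts take at least",
          estimated_minutes("/posts", 90), "minutes")
```

With the defaults, a call to /posts can be made every 10 seconds and a call to /links every 30 seconds, so larger collections should be planned accordingly.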
CrowdTangle does not track all data from Facebook and Instagram, but only a) data from very active pages, b) data from pages that other researchers have added, c) data from pages that you have added yourself. Further, all of the accounts and groups must be public. If you are looking for more information about which types of accounts and metrics are tracked and not tracked, have a look at CrowdTangle’s blog post about available data.
In order to search certain pages, they must first be added to CrowdTangle. This is what dashboards are for: With your CrowdTangle Dashboard you create custom lists of public accounts and groups you want to monitor and save searches or posts.
Once the desired content is tracked by CrowdTangle, there are three ways to collect the data via the API:
- If you know the ID of a post, the content can be retrieved directly by using the ID.
- If you have defined lists or searches in the dashboard, the list content can be retrieved.
- You can search CrowdTangle using keywords or URLs.
So, there are different objects you can query in CrowdTangle: posts, dashboards (via your access token), lists, URLs or search terms. To get the respective data, you have to add the search terms or identifiers of these objects in Facepager as seed nodes (Add Nodes). These seed nodes are shown in the column "Object ID" in the main window. Facepager then assembles the API query for each of these Object IDs with the help of placeholders in the query setup:
Please note: This screenshot was taken on 26 August 2021. If you try this out yourself, keep in mind that APIs are changing constantly, so this approach may be outdated by the time you read this text.
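The idea behind the placeholders can be illustrated with a few lines of Python. This is a minimal sketch and not Facepager's actual implementation: the function `fill_placeholders` is hypothetical, and the resource `/post/<Object ID>` and the example IDs are assumptions chosen for illustration.

```python
def fill_placeholders(template: str, values: dict) -> str:
    """Replace every <name> placeholder in the template with its value."""
    for name, value in values.items():
        template = template.replace(f"<{name}>", str(value))
    return template


if __name__ == "__main__":
    # Hypothetical seed nodes as they would appear in the "Object ID" column.
    seed_nodes = ["111_222", "333_444"]
    for object_id in seed_nodes:
        # One query is assembled per seed node.
        print(fill_placeholders("/post/<Object ID>", {"Object ID": object_id}))
```

Each seed node yields its own query, so two seed nodes result in two requests.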
If you want to fetch post data and don’t know the ID of a post in advance, you need to consider CrowdTangle’s hierarchical structure to fetch the IDs of the subordinate objects: dashboards contain lists, lists contain accounts (pages or groups), which in turn have posts. Consequently, if you want detailed data about posts, you need the data about the page or list first. Thus, a typical pipeline for collecting post or account data from lists would be:
- Get the list IDs from your dashboard (endpoint GET /lists). You have to use your own access token as a parameter for this query.
- Fetch accounts contained in the list. You can skip this step if you aim at all posts in a list.
- Fetch posts of an account or all posts in a list.
- Fetch details for the posts. Be aware of the different post ID formats. For Instagram it's `<post_id>_<page_id>`, while for Facebook it's `<page_id>_<post_id>`.
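Because the order of the two parts differs between the platforms, it can help to split the IDs explicitly before further processing. The sketch below follows the two formats described in the last step; the function `split_post_id` and the example IDs are hypothetical.

```python
def split_post_id(combined_id: str, platform: str) -> dict:
    """Split a combined CrowdTangle post ID into page ID and post ID.

    Instagram: <post_id>_<page_id>
    Facebook:  <page_id>_<post_id>
    """
    first, second = combined_id.split("_", 1)
    if platform == "instagram":
        return {"post_id": first, "page_id": second}
    if platform == "facebook":
        return {"page_id": first, "post_id": second}
    raise ValueError(f"Unknown platform: {platform}")


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    print(split_post_id("111_222", "facebook"))
    print(split_post_id("111_222", "instagram"))
```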
In Facepager’s query setup you can make all the configurations that are necessary for querying CrowdTangle.
Query settings | Explanation |
---|---|
Base path | The Base path should match the current version of the CrowdTangle API: https://api.crowdtangle.com. Check the documentation regularly to see whether something has been updated. |
Resource | Select the endpoint you want to use for fetching data from Facebook or Instagram. The options in the drop-down menu in Facepager are suggestions that show you what is possible. For example, if you are interested in the posts about a specific topic, choose /posts/search, or select /lists to get the items of a list. |
Parameters | Type the parameter names into the fields of the left column and put the corresponding parameter values into the right ones. See the documentation for all the possible parameter options. Click the little buttons next to the value column to open a window with more space for typing. Placeholders used in the Resource field have to be defined here and will be replaced by the given value. Helpful additional parameters include startDate and endDate, which reduce the fetching time, and sortBy, which lets you get posts by date. The default setting returns posts according to their performance, so you might miss underperforming posts. |
Maximum pages | With this setting you set the number of “pages” Facepager tries to download for the selected node. Imagine you want to fetch 900 posts for an Instagram list. CrowdTangle won’t let you get all these posts at once. Since the API returns at most about 100 posts per request at the time of this writing, you can set the limit in the parameters to 100 and the maximum pages to 9. Thus, the 900 posts are spread over 9 "pages" of data, each containing 100 posts. See the documentation of the CrowdTangle API for more information about pagination. |
Access Token | Crawling data from CrowdTangle requires an access token. Once you have pasted it into the access token field, Facepager is ready to fetch the data; you don’t have to click any other buttons. Note: The Access Token is stored locally on your computer. No personal data is submitted to the developers or any other authority. |
See the documentation of the Generic module for further explanation.
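How these settings combine into a request can be sketched in Python. This is an illustration under assumptions, not what Facepager does internally: the functions `pages_needed` and `build_url` are hypothetical, and the parameter names `token` and `count` as well as all values are assumptions that should be checked against the CrowdTangle documentation.

```python
import math
from urllib.parse import urlencode


def pages_needed(total_posts: int, per_page: int = 100) -> int:
    """Number of pages required to cover total_posts."""
    return math.ceil(total_posts / per_page)


def build_url(base_path: str, resource: str, params: dict) -> str:
    """Join base path, resource and URL-encoded parameters."""
    return f"{base_path.rstrip('/')}{resource}?{urlencode(params)}"


if __name__ == "__main__":
    # 900 posts at 100 posts per page correspond to "Maximum pages" = 9.
    print(pages_needed(900))

    # Placeholder values only; 'token' and 'count' are assumed parameter names.
    params = {
        "token": "YOUR_TOKEN",
        "count": 100,
        "startDate": "2021-01-01",
        "endDate": "2021-06-30",
    }
    print(build_url("https://api.crowdtangle.com", "/posts", params))
```

Restricting the time span with startDate and endDate keeps the number of pages, and therefore the number of rate-limited calls, small.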
Please read the following documents from CrowdTangle to get more information about
- the data policies: https://www.crowdtangle.com/data-policy
- and the CrowdTangle terms of service: https://www.crowdtangle.com/terms.
Searching for more answers and explanations? Just read our FAQ.