Extraction Keys - strohne/Facepager GitHub Wiki

Extraction keys are used to extract and prepare values from nested data.
For example, you define which values from a complex API response (as shown in the data view) should appear in the columns of the nodes view. Extraction keys are used in the following places:

Query Setup: Placeholders contain extraction keys for constructing the URL or the payload of a request from nodes data. Further, the query response is sliced using a key.
Column Setup: Each column in the nodes view is defined by an extraction key.
Extract Data Feature: You can slice your data using keys in the data view.

Each extraction key follows the pattern name[conditions]=path|modifiers:

Name (optional): Specifies the name of the column in the output table. If omitted, the column name defaults to the path key. The column name and the path key are separated by an equals sign =. Example: firstcomment=comments.data.0.message defines a column named firstcomment.
Conditions (optional): You can collapse values from different paths into a single column by using conditions. This is particularly useful when combining data from different levels. If you add a level comparison expression to a column name, the extraction key is only applied if the condition is met. Thus, add the same column with appropriate conditions and paths for every level. The following column setup example will generate one column named "label" with different data depending on the node level (counting starts at 0):
```
label[$level=1]=entity.name
label[$level=2]=mydata.description
```
Path (required): A dot-separated path that navigates through the nested data to extract the desired value. The path key may include wildcard symbols. * matches any single element at that level in the path. ** matches any number of nested levels in the path. Example: user.photos.*.url extracts the URL of each photo and concatenates the URLs with a semicolon. Further, the path key may be a literal string if surrounded by quotes. This way you can add fixed values to the columns or specify filenames such as "rules.json" to be passed to the file modifier. Moreover, there are some reserved extraction keys to access the fixed data of a node: $level, $object_id, $object_type, $object_key, $query_status, $query_time, $query_type.
Modifiers (optional): Postprocessing operations applied to the extracted value. Multiple modifiers can be chained and are separated by the pipe symbol |. Modifiers may have options, which follow a colon : and are separated by commas. Example: filepath|thumb:100 extracts the value at filepath and then applies the thumb modifier with the option 100.
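
To make the pattern name[conditions]=path|modifiers more concrete, the following is a minimal Python sketch that splits a key into its four parts. It is not Facepager's own parser: the function name `parse_extraction_key` and the returned dictionary layout are assumptions made for illustration, and quoted literal paths are not treated specially.

```python
import re

def parse_extraction_key(key):
    """Split an extraction key into name, condition, path and modifiers.

    Illustrative sketch only, not Facepager's implementation.
    """
    # Split on pipes that are not escaped with a backslash
    parts = re.split(r"(?<!\\)\|", key)
    head, modifiers = parts[0], parts[1:]

    # Optional "name[condition]=" prefix in front of the path
    pattern = r'^(?P<name>[^=\[\]"]+)(\[(?P<cond>[^\]]*)\])?=(?P<path>.*)$'
    match = re.match(pattern, head)
    if match:
        name = match.group("name")
        condition = match.group("cond")
        path = match.group("path")
    else:
        # Without a name, the column name defaults to the path
        name, condition, path = head, None, head

    # Each modifier is "name" or "name:options"; the raw option string is kept
    parsed = []
    for modifier in modifiers:
        mod_name, _, options = modifier.partition(":")
        parsed.append((mod_name, options))

    return {"name": name, "condition": condition, "path": path, "modifiers": parsed}

print(parse_extraction_key("label[$level=1]=entity.name"))
print(parse_extraction_key("filepath|thumb:100"))
```

The option string is kept as a whole here because some options, such as the separator in `join:,`, may themselves contain a comma.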

Usually, extraction keys work on the nodes data as it is visible in the data view. But if you have a filename in your data, you can also dive into files on your computer and extract data from them using the file modifier. That comes in handy, for example, if you already have a folder with data files or images. You can include file content in columns, query URLs or the query payload.

Path keys

With path keys you pull out data from the nodes. You can use this concept in placeholders or to define the columns of the Nodes View. The starting point is the data shown in the Data View. This data is formatted as JSON, which is built from key-value pairs: to get a value, you use the key to its left. Data may be arranged as a nested hierarchy. Nested key-value pairs are addressed by chaining the keys separated by a dot, e.g. comments.data.

To quickly get a specific key, select it in the Data View and click Add Column. This adds the corresponding key to the Custom Table Columns (Column Setup) field right below the Data View. You can click Add All Columns to add keys for all values at once. Nested data is output as a JSON string containing all the data. For example, the key comments.data returns all comment items together as one JSON string.

To get a single value, use a key that addresses deeper values. For example, the key comments.data.0.message gives you the message content of the first comment only. To address multiple values, use the asterisk operator *. Replace a key with the asterisk to address all keys on the same level. All values are concatenated with semicolons. While comments.data.0.message only addresses the first comment, comments.data.*.message gives you the messages of all comments, separated by semicolons.

This works the same way for other fields, e.g. comments.data.*.created_time.
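
The path logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Facepager's code: the helper names `resolve` and `to_cell` and the sample `node` data are hypothetical, and the ** wildcard is not covered.

```python
import json

def resolve(data, path):
    """Return all values matched by a dot-separated path.

    '*' matches every element on one level (hypothetical helper).
    """
    values = [data]
    for key in path.split("."):
        next_values = []
        for value in values:
            if key == "*":
                # Fan out over all list items or all dictionary values
                if isinstance(value, list):
                    next_values.extend(value)
                elif isinstance(value, dict):
                    next_values.extend(value.values())
            elif isinstance(value, dict) and key in value:
                next_values.append(value[key])
            elif isinstance(value, list) and key.isdigit() and int(key) < len(value):
                # Numeric keys address list items by position
                next_values.append(value[int(key)])
        values = next_values
    return values

def to_cell(values):
    """Join matched values with semicolons; nested data becomes a JSON string."""
    return ";".join(v if isinstance(v, str) else json.dumps(v) for v in values)

# Made-up sample data shaped like the comments example
node = {"comments": {"data": [
    {"message": "First!", "created_time": "2021-10-01"},
    {"message": "Nice post", "created_time": "2021-10-02"},
]}}

print(to_cell(resolve(node, "comments.data.0.message")))  # First!
print(to_cell(resolve(node, "comments.data.*.message")))  # First!;Nice post
```

Resolving comments.data with this sketch yields the whole list as one JSON string, which mirrors how nested data appears in a column.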

Remember that only columns defined in the Column Setup are exported by Facepager. Keys therefore correspond to columns in the resulting Excel sheet, while values are the row values within a single column.

Values in double quotes are handled as literal values. This may be useful to assemble complex payloads without saving them into a preset (e.g. "rules.json", see the examples below).

Modifiers

In placeholders and in the column setup, keys extract specific values from the detail data of a node. Sometimes those values need post-processing, for example extracting part of a value with a regular expression. In Facepager, post-processing steps are appended to the key as modifiers, introduced by the pipe symbol |. Multiple modifiers can be chained, each separated by a pipe.

Most modifiers consist of a name followed by a colon and options. Some modifiers take no options, in which case the colon is omitted; see below.

You can escape special characters such as the pipe with a backslash.
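
The chaining and escaping rules can be illustrated with a small Python sketch. It covers only a handful of modifiers and is not Facepager's implementation; the function names and sample URLs are hypothetical, and returning the whole match when a pattern has no group is an assumption.

```python
import re

def split_modifiers(chain):
    """Split a modifier chain on pipes that are not escaped with a backslash."""
    parts = re.split(r"(?<!\\)\|", chain)
    # Turn escaped pipes back into plain pipes inside each modifier
    return [part.replace("\\|", "|") for part in parts]

def apply_modifier(values, modifier):
    """Apply one modifier to a list of values (only a few are sketched)."""
    name, _, option = modifier.partition(":")
    if name == "re":
        results = []
        for value in values:
            match = re.search(option, value)
            if match:
                # First group if the pattern has one, else the whole match (assumption)
                results.append(match.group(1) if match.groups() else match.group(0))
        return results
    if name == "first":
        return values[:1]
    if name == "last":
        return values[-1:]
    if name == "length":
        return [len(values)]
    if name == "join":
        # Semicolon is the default separator
        return [(option or ";").join(values)]
    raise ValueError(f"modifier not covered by this sketch: {name}")

def apply_chain(values, chain):
    """Run the values through every modifier in the chain, left to right."""
    for modifier in split_modifiers(chain):
        values = apply_modifier(values, modifier)
    return values

urls = ["https://example.com/hashtag/python", "https://example.com/hashtag/data"]
print(apply_chain(urls, r"re:hashtag/(\w+)|join:,"))  # ['python,data']
print(apply_chain(urls, r"re:python\|data|length"))   # [2]
```

In the second call the escaped pipe stays part of the regular expression, so the chain consists of only two modifiers, re and length.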

| modifier | options | examples |
| --- | --- | --- |
| css | Add CSS selectors to extract elements from HTML or XML. | `css:a` `css:div.article` |
| xpath | Add XPath selectors to extract elements from HTML or XML. | `xpath://a/@href` `xpath://a/text()` `css:div.article\|xpath:string()` |
| re | Use regular expressions to find and extract text. The first matching group (first parentheses) is returned. Special characters are escaped by a backslash. | `re:[0-9]+` `re:hashtag/(\\w+)` |
| js | Parse JavaScript and get the content of variables or objects by their name. This is helpful if your HTML contains script tags. | `js:captions` `xpath://script/text()\|js:display_comments` |
| json | Parse JSON and get components by key. This is helpful if your HTML contains JSON inside of tags or JavaScript. Without a key after the colon, the value is converted to a JSON string. This is useful for reading text files and escaping the content so that it can be used in the payload. | `json:messagetext` `xpath://script/text()\|js:commentslist\|json:comment` |
| not | Return false if any of the values contains the given value, otherwise true. This is helpful for stopping pagination if specific data is not present. | `not:hasNextPage` `summary\|not:hasNextPage` |
| is | Return true if any of the values contains the given value, otherwise false. This is helpful for stopping pagination if specific data is present. | `is:lastPage` `summary\|is:lastPage` |
| length | Get the number of values in a list. | `length` `css:div.article\|length` |
| last | The last value of a list. | |
| first | The first value of a list. | |
| max | The maximum value in a list. | |
| min | The minimum value in a list. | |
| join | Concatenate the values. By default, a semicolon is used as separator. You can change this with the option. | `tags.*\|join:,` |
| utc | Convert a Unix timestamp to a formatted UTC date. Note: before Facepager 4.5 this modifier was called timestamp. | `utc` |
| timestamp | Convert a formatted ISO date (e.g. 2021-10-01 13:10:00) to a Unix timestamp. Note: before Facepager 4.5 the utc modifier was called timestamp. | `timestamp` |
| timestamp | Convert a formatted date to a Unix timestamp. Provide the format after the colon; the syntax is described in the strptime reference, see https://docs.python.org/3/library/datetime.html. Note: before Facepager 4.5 the utc modifier was called timestamp. | `timestamp:%Y-%m-%d %H:%M:%S` |
| shortdate | Convert short dates (e.g. on Twitter) to a formatted UTC date. The parsed pattern is `%a %b %d %H:%M:%S %z %Y`. | `shortdate` |
| encode | Change the encoding of the text. | `encode:utf-8` `css:div.article\|encode:utf-8` |
| base64 | Base64-encode the value. This is helpful for uploading base64-encoded data. | `base64` |
| file | The previous value is interpreted as a filename and loaded from your upload folder. By default, files are loaded in binary mode. Set the txt option to load the content in text mode. For fixed filenames you can use a literal value in double quotes instead of a key. You can add filenames from a folder with the `Add Nodes` -> `Add files` button. | `Object ID\|file` `filename\|file` `"rules.json"` |
| thumb | The previous value is interpreted as an image filename. A data URL with the base64-encoded image is generated. Optionally, provide the desired size in pixels. | `filename\|thumb:60` |
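
As a rough illustration of the three date modifiers, the following Python sketch performs the same kinds of conversion with the standard datetime module. The function names mirror the modifier names, but the code is not taken from Facepager. The output format `%Y-%m-%d %H:%M:%S` and the treatment of dates without a timezone as UTC are assumptions.

```python
from datetime import datetime, timezone

OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed output format

def utc(unix_seconds):
    """Unix timestamp -> formatted UTC date."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime(OUTPUT_FORMAT)

def timestamp(date_string, fmt="%Y-%m-%d %H:%M:%S"):
    """Formatted date -> Unix timestamp; the date is read as UTC (assumption)."""
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

def shortdate(date_string):
    """Short date such as 'Fri Oct 01 13:10:00 +0000 2021' -> formatted UTC date."""
    parsed = datetime.strptime(date_string, "%a %b %d %H:%M:%S %z %Y")
    return parsed.astimezone(timezone.utc).strftime(OUTPUT_FORMAT)

print(timestamp("2021-10-01 13:10:00"))             # 1633093800
print(utc(1633093800))                              # 2021-10-01 13:10:00
print(shortdate("Fri Oct 01 13:10:00 +0000 2021"))  # 2021-10-01 13:10:00
```

The `fmt` parameter plays the role of the option after the colon, as in `timestamp:%Y-%m-%d %H:%M:%S`. Parsing `%a` and `%b` depends on the locale, so the short date example assumes English day and month names.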