Files - strohne/Facepager GitHub Wiki
It is recommended to store media files (images, videos, audio) in a local folder rather than directly in the Facepager database. With large amounts of data, the database can grow very large and become unresponsive.
The query setup for downloading files to a folder is similar to working with APIs or doing web scraping.
First, configure a query in the Generic module. For example, if you added URLs as seed nodes, set the base path to <Object ID>, clear the resource field, parameters and headers, and select the GET method.
Second, in the query setup, select file in the response selector, choose a download folder and define the filename.
The default setting for the filename is <Object ID> and for the file extension <None>. This creates a file for each node, with the Object ID as the filename and an extension guessed from the content type returned by the server.
Note: Files are not overwritten. If a file with the same name already exists, a suffix number is added to the file name. For each downloaded file, a new child node is created with some basic information about the file. This file metadata is useful for matching your nodes with the files, even if a suffix was added to the file names.
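The naming behaviour described above can be illustrated with a minimal Python sketch. This is not Facepager's actual code: the function name target_path, the "-1", "-2" suffix scheme, and the assumption that the Object ID is already safe to use as a filename are all hypothetical and chosen for illustration only.

```python
import mimetypes
from pathlib import Path


def target_path(folder: Path, object_id: str, content_type: str) -> Path:
    """Pick a path in `folder` that does not overwrite an existing file."""
    # Drop parameters such as "; charset=utf-8" before guessing the extension.
    mime = content_type.split(";")[0].strip()
    extension = mimetypes.guess_extension(mime) or ""

    candidate = folder / f"{object_id}{extension}"
    counter = 1
    # Hypothetical suffix scheme: name.png, name-1.png, name-2.png, ...
    while candidate.exists():
        candidate = folder / f"{object_id}-{counter}{extension}"
        counter += 1
    return candidate
```

Because the final name can differ from the Object ID once a suffix is added, the file metadata stored in the child node is the reliable way to match nodes to files.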
In the Generic module, file content can be uploaded by using a placeholder in the payload of POST requests. A special kind of extraction key is used for working with file contents: <Object ID|file>.
By using the pipe operator | in conjunction with the file modifier, the value of the placeholder is interpreted as a file name. The placeholder is then replaced by the contents of the file.
The filename is relative to the upload folder selected in the query settings (below the payload field). To upload a file, you place it in the upload folder and insert the placeholder into the payload field.
If you need to upload files with base64 encoding, add the base64 modifier: <Object ID|file|base64>. To load files as text (and not bytes), use the txt option of the file modifier: <Object ID|file:txt>.
You can use other keys instead of Object ID, for example, if you have a filename key in your detail data: <filename|file>.
You can refer to fixed filenames (instead of keys to extract data) by quoting. For example, the placeholder <"rules.json"|file:txt> can be used to insert the content of a text file named "rules.json" into the payload field.
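To make the modifier chain concrete, here is a rough Python sketch of how such placeholders could be resolved. It only mimics the behaviour described above and is not Facepager's implementation; the function resolve_placeholder and the plain dict used as node data are assumptions made for this example.

```python
import base64
from pathlib import Path


def resolve_placeholder(placeholder, node, upload_folder):
    """Resolve placeholders such as <filename|file|base64> (illustrative only)."""
    parts = placeholder.strip("<>").split("|")
    key, modifiers = parts[0], parts[1:]

    # A quoted key is a fixed value; otherwise it is looked up in the node data.
    if len(key) >= 2 and key[0] == '"' and key[-1] == '"':
        value = key[1:-1]
    else:
        value = node[key]

    # Modifiers are applied from left to right.
    for modifier in modifiers:
        if modifier == "file":
            # The value is a filename relative to the upload folder; read bytes.
            value = (Path(upload_folder) / value).read_bytes()
        elif modifier == "file:txt":
            # Same lookup, but the content is read as text instead of bytes.
            value = (Path(upload_folder) / value).read_text(encoding="utf-8")
        elif modifier == "base64":
            raw = value if isinstance(value, bytes) else str(value).encode("utf-8")
            value = base64.b64encode(raw).decode("ascii")
        else:
            raise ValueError(f"unsupported modifier: {modifier}")
    return value
```

For instance, with a file rules.json in the upload folder, resolve_placeholder('<"rules.json"|file:txt>', {}, upload_folder) would return the text of that file, ready to be inserted into the payload.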
See the presets that come with Facepager for an example, or read Getting Started with Google Cloud Platform.
Instead of querying data from an API, Facepager supports processing files in a local folder as if they were downloaded from the web. This comes in handy, for example, if you have already downloaded HTML files and want to extract data from them offline.
The procedure is straightforward: Use the Add nodes button, click the files option and select a bunch of files. Each file will be added as a seed node:
- The Object ID is a file URL pointing to the file, which can be processed like any other URL.
- Each seed node will have the filename and the full path in the detail data.
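As a rough illustration of what such a seed node contains, the following Python sketch builds a file URL and the accompanying detail data for a local file. The function seed_node_for and the dictionary keys objectid, filename and filepath are hypothetical names used for this example; the actual field names in Facepager may differ.

```python
from pathlib import Path


def seed_node_for(path: Path) -> dict:
    """Build an illustrative seed node for a local file."""
    absolute = path.resolve()
    return {
        # File URL with percent-encoded special characters such as spaces.
        "objectid": absolute.as_uri(),
        # Detail data: the bare filename and the full path.
        "data": {"filename": absolute.name, "filepath": str(absolute)},
    }


node = seed_node_for(Path("pages") / "page 1.html")
print(node["objectid"])
```

The printed Object ID starts with file:// and ends with /pages/page%201.html; the part in between depends on the current working directory.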
You have three options to process the file content:
- Issue a GET request: In the query setup, set the base path to <Object ID>, clear the resource field, parameters and headers, and select the GET method. Fetch data. Use extraction keys as if the response came from the web.
- Use a placeholder in the payload as described above. You can either refer to the file by its Object ID (absolute path in <Object ID|file>) or, if you moved the files to a different location, select an upload folder and use the filename key containing a path relative to the upload folder: <filename|file>.
- You can extract data in the column setup in the same way as with placeholders in the query setup. For example, you can generate base64-encoded Data-URLs from images: <filename|file|thumb:60> (60x60px thumbnails).
If you refer to relative paths in the column setup, files are searched for in the upload folder, not in the download folder.
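For readers unfamiliar with Data-URLs, the following Python sketch shows the general shape of a base64-encoded Data-URL built from a file. It is only an approximation of what the thumb modifier produces: the resizing step is omitted because it would need an imaging library, and the function name data_url is an assumption for this example.

```python
import base64
import mimetypes
from pathlib import Path


def data_url(path: Path) -> str:
    """Encode a file as a base64 Data-URL (no thumbnail resizing)."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```

A value of this form can be embedded directly in HTML or spreadsheet exports, which is why small thumbnails are a better fit than full-size images.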