Configuration - CERIT-SC/funnel-gdi GitHub Wiki
Funnel can run without an external configuration file, as it ships with a default configuration. For most deployments, however, the defaults are not enough.
## Configuration From YAML File

You can begin with the configuration provided in the `default-config.yaml` file. Create a local YAML file, remove the configuration parts you don't need, and modify the ones that need adjusting for your scenario (the following sections give an overview of the setup). Finally, provide the location of the configuration file to the `funnel` command using either the `-c` or `--config` flag:

```shell
./funnel -c my-config.yaml server run
```

The position of the flag is not important as long as it is followed by the file path: `./funnel server run -c my-config.yaml` is also valid.
## Configuration From Flags

Though not commonly needed, it is good to know that there are also command-line flags that can override the configuration-file options. The list of flags is displayed by this command:

```shell
./funnel server run --help
```
## Logger

This section of the configuration file fine-tunes the verbosity and format of Funnel's log output:

- `Level`: verbosity, one of `debug`, `info` (default), `warn`, and `error`.
- `OutputFile`: when non-empty, redirects logging to the specified file path. By default it is empty, and logs are written to the console through STDERR.
- `Formatter`: when `json` is specified, logs are formatted as multi-line JSON records. Otherwise, multi-line attribute-value formatted log records are printed (the format is not configurable).

```yaml
Logger:
  Level: debug
  OutputFile: ""
  Formatter: json
```
## Database

The database is selected through the `Database` field and a corresponding database-specific section. Possible values for the field:

- Local file-based databases: `badger`, `boltdb` (default)
- Locally deployable databases: `elastic` (Elasticsearch), `mongodb` (MongoDB)
- Cloud services: `datastore` (Google Datastore), `dynamodb` (Amazon DynamoDB)

Funnel uses the database for storing tasks and logs. The data size depends on usage activity; per task, it should be quite modest.

Depending on the choice of database, also review and configure the corresponding database-specific section, and remove the other database sections from the configuration file.

For local databases, make sure the configured paths refer to mounted volumes, for example under the work directory. Otherwise the data is not persisted when the Funnel container is removed or upgraded.
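As an illustration, here is a minimal sketch that selects the default `boltdb` backend with a persistent path. The path is an example (point it at a mounted volume, as noted above), and the field names follow the `BoltDB` section of `default-config.yaml`; verify them against your Funnel version:

```yaml
Database: boltdb
BoltDB:
  Path: /mnt/funnel-data/funnel.db
```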
## Storages (Input And Output)

Funnel supports many storage backends, most of which are enabled by default. Therefore, when you want to disable a storage, it is not enough to remove its configuration section: you need to explicitly set `Disabled: true`.
### Local Storage (File-System)

Specify where task executors can read input files and write exported output files on the local file system. This is not to be confused with where task executors (e.g. containers) create and modify files during execution, which is contained within the work directory.

```yaml
LocalStorage:
  Disabled: false
  AllowedDirs:
    - /mnt/funnel-files/
```

This storage is used when a file URL uses the `file` protocol, for example: `file:///mnt/funnel-files/project-x/specimen.dat`.

More details are available in the official documentation.
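To show how the `file` protocol appears in practice, here is a sketch of a TES task whose input comes from the allowed directory. The task name, image, paths, and server URL are illustrative:

```shell
# A minimal TES task referencing a file:// input under AllowedDirs.
# Image, paths, and the server URL below are illustrative.
TASK='{
  "name": "checksum-example",
  "inputs": [
    {
      "url": "file:///mnt/funnel-files/project-x/specimen.dat",
      "path": "/inputs/specimen.dat"
    }
  ],
  "executors": [
    {
      "image": "alpine",
      "command": ["sha256sum", "/inputs/specimen.dat"]
    }
  ]
}'
# Submit it to the TES API, for example:
# curl -X POST -H 'Content-Type: application/json' \
#      -d "$TASK" http://localhost:8000/v1/tasks
echo "$TASK"
```

The `inputs[].url` must resolve inside one of the configured `AllowedDirs`, otherwise the task fails when Funnel tries to stage the file.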
### HTTP

Specify the timeout for retrieving a file over HTTP (using the `GET` method):

```yaml
HTTPStorage:
  Disabled: false
  Timeout: 30s
```

This storage is used when a file URL uses the `http` or `https` protocol, for example: `https://ftp.example.org/project-x/specimen.dat`.

NOTE: credentials in the URL are not supported for initiating Basic authentication.
### FTP

FTP can be used for fetching inputs and uploading outputs of executors. The provided URL may contain credentials for interacting with the service. Without credentials in the URL, the user and password from the configuration (by default `anonymous`:`anonymous`) are used.

```yaml
FTPStorage:
  Disabled: false
  Timeout: 30s
  User: anonymous
  Password: anonymous
```

This storage is used when a file URL uses the `ftp` or `sftp` protocol, for example: `sftp://ftp.example.org/project-x/specimen.dat`.
### Sensitive Data Archive (SDA)

This archive can be used only for input files (i.e. it is a read-only storage). In addition, to use this storage the user must be authenticated via Life Science AAI, so that SDA can use the access token to inspect the user's passport and visas and verify the user's permission to access the data.

To activate this storage, the URL of the SDA service (sda-download) is required, and the service must be available when Funnel is launched.

```yaml
SDAStorage:
  ServiceURL: https://sda.example.org/
  Timeout: 30s
```

This storage is used when a file URL uses the `sda` protocol, for example: `sda://DATASET_ID_001/specimen.dat`.

Note that users need to rely on the SDA service defined in the Funnel configuration.

NOTE: the custom-developed SDA plugin supports Crypt4GH encryption/decryption. When the referenced file ends with `.c4gh`, the plugin also tries to decrypt the file so that the executor does not have to. Failure to decrypt the file results in failure of the task.
### HTSGET

This service is used for filtering source BAM/VCF files to reduce the amount of data to be downloaded. The first request to the HTSGET service therefore returns a JSON document describing the additional data requests that Funnel must send (usually to the data storage) to fetch parts of the target file.

To activate this storage, the URL of the HTSGET service is required, and the service must be available when Funnel is launched.

```yaml
HTSGETStorage:
  ServiceURL: https://htsget.example.org/
  Timeout: 30s
```

This storage is used when a file URL uses the `htsget` protocol. Valid examples:

- `htsget://reads/DATASET_2000/synthetic-bam?class=header`
- `htsget://variants/DATASET_2000/synthetic-vcf?referenceName=chr20`

Note that users need to rely on the HTSGET service defined in the Funnel configuration.

More information is available in the htsget specification.

NOTE: the custom-developed HTSGET plugin supports Crypt4GH encryption/decryption. When the referenced file ends with `.c4gh`, the plugin also tries to decrypt the file so that the executor does not have to. Failure to decrypt the file results in failure of the task.
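For orientation, an `htsget` URL follows the shape `htsget://<endpoint>/<id>?<filter>`, where the endpoint is `reads` or `variants`. A small shell sketch decomposing the second example above (this only illustrates the URL shape; how Funnel translates it into service requests is internal):

```shell
# Split an htsget URL into endpoint, object path, and filter query.
URL='htsget://variants/DATASET_2000/synthetic-vcf?referenceName=chr20'
REST=${URL#htsget://}        # strip the protocol prefix
ENDPOINT=${REST%%/*}         # "reads" or "variants"
QUERY=${REST#*\?}            # filter parameters
OBJECT=${REST%%\?*}          # drop the query ...
OBJECT=${OBJECT#*/}          # ... and the endpoint
echo "endpoint=$ENDPOINT object=$OBJECT query=$QUERY"
# → endpoint=variants object=DATASET_2000/synthetic-vcf query=referenceName=chr20
```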
### S3 And Other Cloud Storages

As these storages are well documented, we just reference them here:

- S3 (both Amazon S3 and other S3 deployments)
- Google Cloud Storage
- OpenStack Swift

These storages are not explicitly disabled. Except for Amazon S3, they do not become active by default, as they need to be configured with access credentials. Access to public Amazon S3 buckets is enabled by default; for example, the following is a valid Amazon S3 URL: `s3://example-bucket/hello.txt`.

For convenience, here is a configuration snippet for disabling these providers:

```yaml
AmazonS3:
  Disabled: true
GenericS3: []
Swift:
  Disabled: true
```

NOTE: it is not possible to disable specific use-cases (reading or writing) of these storages. However, access can be restricted by:

- S3 role permissions (the role is defined by the configured `Key` and `Secret`);
- disabling S3 storages and forcing users to read files from public buckets over HTTP instead.
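Conversely, here is a sketch of enabling a non-Amazon S3 deployment with credentials. The endpoint and keys are placeholders; check the exact field names against the `GenericS3` section of your Funnel version's `default-config.yaml`:

```yaml
GenericS3:
  - Endpoint: s3.example.org
    Key: access-key-here
    Secret: secret-key-here
```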
## Computation

Only one computation method can be defined per Funnel instance. By default, it relies on Docker, so the `docker` command is expected to be available on the system path (along with a running Docker daemon).

Computation is defined as follows (local Docker in this example):

```yaml
Compute: local
```

If you run Funnel from a container, be sure to share the Docker socket with the container process:

```shell
docker run -v /var/run/docker.sock:/var/run/docker.sock ...
```

Container-based computation is also configurable, but the only time you might want to change it is when you want to use a different command-line tool. Here is the default Docker-based setup for `local` computation:

```yaml
Worker:
  Container:
    DriverCommand: docker
    RunCommand: >-
      run -i --read-only
      {{if .RemoveContainer}}--rm{{end}}
      {{range $k, $v := .Env}}--env {{$k}}={{$v}} {{end}}
      {{range $k, $v := .Tags}}--label {{$k}}={{$v}} {{end}}
      {{if .Name}}--name {{.Name}}{{end}}
      {{if .Workdir}}--workdir {{.Workdir}}{{end}}
      {{range .Volumes}}--volume {{.HostPath}}:{{.ContainerPath}}:{{if .Readonly}}ro{{else}}rw{{end}} {{end}}
      {{.Image}} {{.Command}}
    PullCommand: pull {{.Image}}
    StopCommand: rm -f {{.Name}}
```

Alternative values for `Compute`: `local`, `htcondor`, `slurm`, `pbs`, `gridengine`, `manual`, `aws-batch`.
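Since the container setup is driven by `DriverCommand` and the command templates, switching to another Docker-compatible CLI is mostly a matter of replacing the driver. A sketch for Podman, assuming its CLI accepts the Docker-style flags used in the default templates (verify against your Podman version):

```yaml
Worker:
  Container:
    DriverCommand: podman
```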
## HTTP and gRPC Servers

Funnel runs both an HTTP-based Task Execution Service (TES) API and a gRPC-based API behind the HTTP API. The APIs are defined in gRPC proto files:

- tes.proto defines the TES API.
- scheduler.proto defines endpoints for listing nodes (`GET /v1/nodes` and `GET /v1/nodes/{id}`).
- events.proto defines the internal API for managing computational events.

In terms of deployment, the gRPC API is relevant only if you plan to deploy a multi-node Funnel cluster: the events API is used for publishing events from nodes.

Therefore, just focus on configuring the HTTP server:

```yaml
Server:
  ServiceName: tes.example.org
  HostName: tes.example.org
  HTTPPort: 8000
  RPCPort: 9090
  DisableHTTPCache: true
```

The last parameter controls whether HTTP caching is turned off for TES API responses. If you plan to use the web-based dashboard, `true` is the best value, as otherwise the browser may serve stale responses from its cache.
## Access Control

By default, Funnel allows anyone to use its API and dashboard UI. To introduce user-based access control, there are two options:

- define users in the configuration for HTTP Basic authentication;
- define an OIDC service in the configuration for delegating user authentication.

Finally, once user authentication is enabled, also define the preferred task-access mode for users. More details below.
### Basic Authentication

Define users under the `Server` configuration. Optionally, a user can be marked as an administrator. Typically users can see only their own tasks, while admins can see (and cancel) the tasks of all users.

```yaml
Server:
  BasicAuth:
    - User: admin
      Password: admin-pass-example
      Admin: true
    - User: user1
      Password: user1-pass-example
```

If the list of users is empty, Basic authentication is NOT enforced.

Credentials must be passed to the Funnel API using the HTTP header: `Authorization: Basic encoded-credentials-here`.
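The encoded credentials are the Base64 encoding of `user:password`. A sketch producing the header value for the example `user1` account above:

```shell
# Base64-encode "user:password" for the Authorization header.
CRED=$(printf 'user1:user1-pass-example' | base64)
echo "Authorization: Basic $CRED"
# → Authorization: Basic dXNlcjE6dXNlcjEtcGFzcy1leGFtcGxl
# Use it with curl against the TES API (URL is illustrative):
# curl -H "Authorization: Basic $CRED" http://localhost:8000/v1/tasks
```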
### OIDC Authentication

This assumes that there is an authentication service that supports the OIDC standard. The Funnel service instance must be registered at the OIDC provider. Typically, a redirect URL must be registered, and it should end with `/login`, for example: `https://funnel.example.org/login`. The OIDC provider must provide the client ID and secret values, and the standard configuration URL. Register that information in the Funnel configuration file under the `Server` section:

```yaml
Server:
  OidcAuth:
    ServiceConfigURL: https://www.example.org/oidc/.well-known/openid-configuration
    ClientId: to-be-copied-from-oidc
    ClientSecret: to-be-copied-from-oidc
    RequireScope: email
    RequireAudience:
    RedirectURL: https://funnel.example.org/login
    Admins:
      - [email protected]
```

Note that the `RedirectURL` must be valid and must match the one registered at the OIDC provider. `RequireScope` defines one or more space-separated authentication scope values. `RequireAudience` is optional and can be used to enforce that the user-presented access token was issued for this service.

The `Admins` section is optional: it can be used to list user IDs (the `sub` claim) that are elevated to the admin role, just like with Basic authentication.

The user's access token must be presented to the Funnel API using the HTTP header: `Authorization: Bearer access-token-here`.

The dashboard stores the access token in an HTTP cookie named `jwt`. The expiration time of the cookie is determined by the OIDC service. Funnel does not invalidate the token, as that could affect running tasks that use token-based authentication for accessing storage.
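The `Admins` entries are matched against the `sub` claim, which you can inspect by decoding the payload (the second dot-separated part) of an access token. A sketch with a fabricated, unsigned payload (never paste real tokens into shared shells):

```shell
# Decode a JWT payload: base64url -> base64, pad, then decode.
# This payload is fabricated for illustration only.
PAYLOAD='eyJzdWIiOiJhYmNkLTEyMzQifQ'
PADDED=$(printf '%s' "$PAYLOAD" | tr '_-' '/+')
case $(( ${#PADDED} % 4 )) in
  2) PADDED="$PADDED==" ;;
  3) PADDED="$PADDED=" ;;
esac
printf '%s' "$PADDED" | base64 -d
# → {"sub":"abcd-1234"}
```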
### Task-Access Mode

Funnel provides the following options for the `Server.TaskAccess` setting:

- `All` (default): all authenticated users can view and cancel all tasks;
- `Owner`: tasks are visible only to the users who created them (admins are not privileged);
- `OwnerOrAdmin`: extends `Owner` by allowing admin users to see and cancel everything.

The recommended option for most cases is `OwnerOrAdmin`.
## Summary

Here is a simple configuration for running Funnel locally with Docker as the computation service.

```yaml
Compute: local
Database: badger
Badger:
  Path: /opt/funnel/badger.db
Logger:
  Level: info
  Formatter: json
EventWriters:
  - log
  - badger
Server:
  ServiceName: tes.example.org
  HostName: tes.example.org
  HTTPPort: 8000
  RPCPort: 9090
  DisableHTTPCache: true
  TaskAccess: OwnerOrAdmin
  # Keep either BasicAuth or OidcAuth but not both.
  BasicAuth:
    - User: admin
      Password: admin-pass-example
      Admin: true
  OidcAuth:
    ServiceConfigURL: https://www.example.org/oidc/.well-known/openid-configuration
    ClientId: to-be-copied-from-oidc
    ClientSecret: to-be-copied-from-oidc
    RequireScope: email
    RequireAudience:
    RedirectURL: https://funnel.example.org/login
    Admins:
      - [email protected]
Worker:
  WorkDir: /opt/funnel/work
LocalStorage:
  Disabled: false
  AllowedDirs:
    - /mnt/funnel-files/
HTTPStorage:
  Disabled: false
  Timeout: 30s
HTSGETStorage:
  ServiceURL: https://htsget.example.org/
  Timeout: 30s
SDAStorage:
  ServiceURL: https://sda.example.org/
  Timeout: 30s
GenericS3: []
AmazonS3:
  Disabled: true
```