FLARE OpenWhisk setup on Jetstream - FLARE-forecast/flare-forecast.github.io GitHub Wiki

Overview

This document describes how to deploy an OpenWhisk setup on Jetstream from scratch. This setup is the simplest possible OpenWhisk "local" deployment - with one node (m1.medium, or larger) hosting the entire cluster, and using Jetstream S3 storage for data. The instructions follow the documentation for deployment of OpenWhisk using ansible in Ubuntu 18.04. Future documentation will consider a multi-node deployment.

The cluster is setup as follows:

  • m1.medium instance
  • 100GB volume
  • a floating public IP address
  • a security group that allows incoming ssh (port 22 for management), icmp (for ping), http and https (ports 80 and 443, for the OpenWhisk event API).

Terminology:

  • host name: flare-frontend
  • Public IP address: PubIP
  • Private IP address: PrivateIP_frontend

Private network/subnet on Jetstream:

  • Network: flarenet
  • Associated subnet: flaresubnet
  • Subnet: 10.1.1.0/24
  • Gateway: 10.1.1.1

Before getting started: obtain Jetstream API Access

Contact XSEDE to request access to Jetstream API, following the guidelines

Review documentation on API access

Deploy flare-frontend instance (using Horizon GUI)

To avoid repeating documentation, refer to this page and follow the steps there to create a network, security group, and instance using the Horizon GUI. Before you start the steps in the document below to deploy an instance for your cluster front-end, make note of the following:.

  • When launching an instance, select Ubuntu 18.04 from the list of JS-API-Featured
  • Select the m1.medium flavor, and give it a 100GB volume
  • Leave the default settings (create new volume: YES; delete volume on instance delete: NO)
  • Name this instance “flare-frontend”
  • As part of the process, you need to create a network. Call it “flarenet”. You may use the default 10.1.1.0/24 subnet (call it “flaresubnet”) with 10.1.1.1 as gateway address.
  • You will also need to request a floating public IP address, so the cluster has a stable public address for the OpenWhisk NGINX server. We’ll refer to this floating IP address as Pub_IP in the rest of the documentation; it will look something like 129.114.x.y (depending on what Jetstream public network you’re assigned)
  • When creating security rules, ensure that you also add https, in addition to ssh and icmp as per the instructions
  • After the flare-frontend instance is running, use the “Actions” drop-down to assign the floating IP address PubIP

If all goes well, at the end of the process you should be able to ssh into your deployed instance with the key you provided:

ssh -i your_private_key ubuntu@Pub_IP

You should be able to see the status of the instance in the Horizon GUI under Project->Compute->Instances

Check and take note of the two IP addresses you see - you will need them later:

  • PrivateIP_Frontend (a private address, looks like 10.1.1.xx)
  • PubIP (a public address, looks like e.g. 129.114.xx.yy)

Initial configuration of flare-frontend (using terminal+ssh)

ssh into the front-end from your computer, and install dependences including Docker:

ssh -i your_private_key ubuntu@Pub_IP
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install curl git python-pip python-setuptools build-essential libssl-dev libffi-dev python-dev software-properties-common -y
sudo pip install ansible==2.5.2
sudo pip install jinja2==2.9.6
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository  "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y
sudo groupadd -f docker
sudo usermod -a -G docker $USER

Logout from flare-frontend, and log back in, so the Docker group addition takes effect. Sanity check that Docker is running:

exit
ssh -i your_private_key ubuntu@Pub_IP
docker ps

The command should work (but should not see no containers running yet)

Setup CouchDB

Here we will install CouchDB on the host. It can also be installed as a Docker container, but we’ll stick to a host install.

In a terminal in flare-frontend:

sudo apt-get install -y apt-transport-https gnupg ca-certificates
curl -L https://couchdb.apache.org/repo/bintray-pubkey.asc | sudo apt-key add -
echo "deb https://apache.bintray.com/couchdb-deb bionic main" | sudo tee -a /etc/apt/sources.list.d/couchdb.list
sudo apt-get update
sudo apt-get install couchdb -y

A text screen will pop up:

  • Click Ok
  • Then select Standalone, Ok
  • Then enter your PrivateIP_frontend private address (it will look like 10.1.1.xx, as per above) as the interface to bind
  • Click Ok
  • Enter a strong password for your CouchDB admin. Don't use special characters, though, as it can create problems with command-line arguments later on. We’ll call this CouchDB_pwd in the rest of the document

Sanity check that CouchDB is listening:

curl PrivateIP_frontend:5984/_node/_local/_config/ -u admin

Set the reduce_limit as per OpenWhisk recommendation:

curl -X PUT http://PrivateIP_frontend:5984/_node/_local/_config/query_server_config/reduce_limit -d '"false"' -u admin

Restart CouchDB:

sudo service couchdb restart

Sanity check

Check that you can access the database by issuing, from flare-frontend:

curl PrivateIP_frontend:5984

You should see an output like:

{"couchdb":"Welcome","version":"3.1.1","git_sha":"ce596c65d","uuid":"0faXXXXXXX91","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

Install OpenWhisk on flare-frontend (using terminal+ssh):

Clone the OpenWhisk repository and perform initial setup, using the provided setup script:

cd
git clone https://github.com/openwhisk/openwhisk.git
cd openwhisk
cd tools/ubuntu-setup && ./all.sh

Configure OpenWhisk to use the CouchDB instance you just set up - make sure to replace PrivateIP_fronend and CouchDB_pwd appropriately:

cd ~/openwhisk/ansible
export OW_DB=CouchDB
export OW_DB_PROTOCOL=http
export OW_DB_HOST=PrivateIP_frontend
export OW_DB_PORT=5984
export OW_DB_USERNAME=admin
export OW_DB_PASSWORD=CouchDB_pwd

Perform initial ansible setup:

The OpenWhisk git repo as of 03/23/2021 has a syntax error in a file used by ansible - you'll need to fix that manually. The file name is ~/openwhisk/ansible/group_vars/all - you need to look for a line with "default(1 second)", and remove the "second" string so it looks like: "default(1)"

cd ~/openwhisk/ansible
vi group_vars/all
ansible-playbook setup.yml

Check that the file db_local.ini is populated with the variables for CouchDB as configured above - in particular, double-check the password is correct:

cat db_local.ini
[db_creds]
db_provider=CouchDB
db_username=admin
db_password=CouchDB_pwd
db_protocol=http
db_host=PrivateIP_frontend
db_port=5984

Change the maximum run time and memory of containers

FLARE containers can run for longer than the default of 5 minutes. Let's set a generous timeout:

vi ~/openwhisk/common/scala/src/main/resources/application.conf

scroll down and change the time-limit max to 500 m (should be around line 435) and max memory to 4GB, and save:

    time-limit {
        min = 100 ms
        max = 500 m
        std = 1 m
    }

    # action memory configuration
    memory {
        min = 128 m
        max = 4096 m
        std = 256 m
    }

Finish OpenWhisk install and deploy

Some of these steps may take several minutes...

Install nodejs and npm, as prerequisites for OpenWhisk:

curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs

Run the pre-requisites ansible playbook

cd ~/openwhisk/ansible
ansible-playbook prereq.yml

Run Gradle

cd ~/openwhisk
./gradlew distDocker

Initialize and clear the database

** These two ansible playbooks (initdb, wipe) should only be activated the first time you install the cluster, or you will lose data!!!

cd ~/openwhisk/ansible
ansible-playbook initdb.yml
ansible-playbook wipe.yml

Deploy OpenWhisk containers

You will need to repeat these four steps after a reboot to deploy OpenWhisk

cd ~/openwhisk/ansible
ansible-playbook openwhisk.yml
ansible-playbook postdeploy.yml
ansible-playbook apigateway.yml
ansible-playbook routemgmt.yml

Now you should see all the containers running - docker ps should show containers for redis, kafka, apigateway, zookeeper, nginx, etc:

CONTAINER ID        IMAGE                                 COMMAND                  CREATED              STATUS              PORTS                                                                                                 NAMES
39821871f84a        openwhisk/action-nodejs-v10:nightly   "docker-entrypoint.s…"   About a minute ago   Up About a minute   8080/tcp                                                                                              wsk00_12_prewarm_nodejs10
fd243538e53b        openwhisk/apigateway:nightly          "/usr/bin/dumb-init …"   9 minutes ago        Up 9 minutes        80/tcp, 8423/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9001->8080/tcp                                      apigateway
55c8dca89186        redis:4.0                             "docker-entrypoint.s…"   10 minutes ago       Up 10 minutes       0.0.0.0:6379->6379/tcp                                                                                redis
fc5979aec02c        nginx:1.19                            "/docker-entrypoint.…"   22 minutes ago       Up 22 minutes       0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp                                                              nginx
cf6c3791bc17        whisk/invoker:latest                  "/bin/sh -c 'exec /i…"   22 minutes ago       Up 22 minutes       0.0.0.0:17000->17000/tcp, 0.0.0.0:18000->18000/tcp, 0.0.0.0:12001->8080/tcp                           invoker0
16289d9747e2        whisk/controller:latest               "/bin/sh -c 'exec /i…"   28 minutes ago       Up 28 minutes       0.0.0.0:15000->15000/tcp, 0.0.0.0:16000->16000/tcp, 0.0.0.0:8000->2551/tcp, 0.0.0.0:10001->8080/tcp   controller0
ea7f490a7f77        wurstmeister/kafka:2.12-2.3.1         "start-kafka.sh"         29 minutes ago       Up 29 minutes       0.0.0.0:9072->9072/tcp, 0.0.0.0:9093->9093/tcp                                                        kafka0
4f1f60c4c74f        zookeeper:3.4                         "/docker-entrypoint.…"   29 minutes ago       Up 29 minutes       0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp                                zookeeper0

Install wsk OpenWhisk command-line interface (CLI)

Download the wsk CLI amd64 binary from the releases repository, then copy it to /usr/local/bin and set properties to use the cluster:

cd 
wget https://github.com/apache/openwhisk-cli/releases/download/latest/OpenWhisk_CLI-latest-linux-amd64.tgz
tar -xzf OpenWhisk_CLI-latest-linux-amd64.tgz
chmod +x wsk
sudo cp wsk /usr/local/bin
wsk property set --apihost PrivateIP_frontend
wsk property set --auth `cat ~/openwhisk/ansible/files/auth.guest`

And you can test the wsk CLI and run a simple task:

wsk -i action invoke /whisk.system/utils/echo -p message hello --result

If all goes well, you should see:

{
    "message": "hello"
}

Use wskadmin to delete guest user and create FLARE user

The installation process creates a user account in the "guest" namespace:

cd ~/openwhisk/tools/admin
./wskadmin user list guest -a

Let us create a new user, FLARE, in a new namespace, FLARE:

./wskadmin user create FLARE -ns FLARE
./wskadmin user list FLARE -a

Now let's set the wsk CLI authentication key for this user; replace the argument after --auth with the key created for the FLARE user above:

wsk property set --auth XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

Now you should be able to issue wsk commands as FLARE:

wsk property get --auth
wsk -i action invoke /whisk.system/utils/echo -p message hello --result

And you can delete the default guest account:

./wskadmin user delete guest -ns guest

Install the OpenWhisk alarm package

Back to the flare-frontend node, let's install the alarm package, which creates timer events.

cd
git clone https://github.com/apache/openwhisk-package-alarms.git
export WSK_HOME="$HOME/openwhisk"
export OPENWHISK_HOME=$WSK_HOME
export AUTH_KEY=`cat ~/openwhisk/ansible/files/auth.whisk.system`
export DB_HOST=`awk -F "=" '/db_host/ {print $2}' $WSK_HOME/ansible/db_local.ini`
export DB_PORT=`awk -F "=" '/db_port/ {print $2}' $WSK_HOME/ansible/db_local.ini`
### replace CouchDB_Password below
export DB_URL=http://admin:CouchDB_Password@$DB_HOST:$DB_PORT
export DB_PREFIX=`awk -F "=" '/db.prefix/ {print $2}' $WSK_HOME/whisk.properties`
export EDGE_HOST=`awk -F "=" '/APIHOST/ {print $2}' $HOME/.wskprops`
export API_HOST=`awk -F "=" '/APIHOST/ {print $2}' $HOME/.wskprops`
cd openwhisk-package-alarms
./installCatalog.sh $AUTH_KEY $EDGE_HOST $API_HOST 2 $DB_URL $DB_PREFIX

To verify the installation, use following commands:

wsk -i package get --summary /whisk.system/alarms
wsk -i action get --summary /whisk.system/alarms/alarm

Now build the Docker image for the alarm:

./gradlew :distDocker

Now run the Docker image - again replace CouchDB_Password appropriately:

set -e
set -x
WSK_HOME=$HOME/openwhisk
DB_PROTOCOL=`awk -F "=" '/db_protocol/ {print $2}' $WSK_HOME/ansible/db_local.ini`
DB_HOST=`awk -F "=" '/db_host/ {print $2}' $WSK_HOME/ansible/db_local.ini`
DB_PORT=`awk -F "=" '/db_port/ {print $2}' $WSK_HOME/ansible/db_local.ini`
DB_PREFIX=`awk -F "=" '/db.prefix/ {print $2}' $WSK_HOME/whisk.properties`

### Substitute password here
$ docker run -d -p 11001:8080 -v $HOME/wsklogs/alarms:/logs \
-e PORT=8080 -e ROUTER_HOST=172.17.0.1 \
-e DB_PREFIX=${DB_PREFIX} -e DB_USERNAME=admin \
-e DB_PASSWORD=CouchDB_Password -e DB_HOST=${DB_HOST}:${DB_PORT} \
-e DB_PROTOCOL=${DB_PROTOCOL} whisk/catalog_alarms

The Docker container should now be running locally on port 11001, which is verified from docker ps:

$ docker ps | grep alarms
CONTAINER ID        IMAGE                                 COMMAND                  CREATED              STATUS              PORTS                                                                                                 NAMES
2a1fc0060272        whisk/catalog_alarms                  "docker-entrypoint.s…"   4 weeks ago          Up 4 weeks          0.0.0.0:11001->8080/tcp                                                                               boring_noether

Configure SSL certificates

Now we will configure SSL for the public API endpoint, so that our OpenWhisk cluster can receive requests from sites such as github.

Roadmap

We'll use letsencrypt to get a free certificate. letsencrypt requires that a web server successfully responds to a challenge in order to issue a short-term certificate; this is a little tricky, as the OpenWhisk nginx server is not configured to do that. The instructions below do this by temporarily bringing down OpenWhisk, then run the certbot CLI tool to use the ACME protocol to grab a certificate from letsencrypt.

Temporarily turn off nginx to request certificate

Let's kill our existing nginx and apigateway containers:

docker kill nginx
docker kill apigateway

Install certbot from snap

cd
mkdir certbot
cd certbot
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot

Grab certificate from letsencrypt using certbot

Note: for this, you will need to find out the domain name that is mapped to your flare-frontend's PubIP. You can find this out by running:

nslookup PubIP

It should look like this (replacing XXX, YYY, ZZZ, WWW with numbers which match the PubIP address) js-XXX-YYY-ZZZ-WWW.jetstream-cloud.org

cd ~/certbot
sudo certbot certonly --standalone

Copy certificate/key and save old nginx configuration

cd ~/certbot
mkdir old-certs
cp ~/openwhisk/ansible/roles/nginx/files/openwhisk-server-cert.pem old-certs
cp ~/openwhisk/ansible/roles/nginx/files/openwhisk-server-key.pem old-certs
sudo bash
# cp /etc/letsencrypt/live/js-XXX-YYY-ZZZ-WWW.jetstream-cloud.org/fullchain.pem ~/openwhisk/ansible/roles/nginx/files/openwhisk-server-cert.pem
# cp /etc/letsencrypt/live/js-XXX-YYY-ZZZ-WWW.jetstream-cloud.org/privkey.pem ~/openwhisk/ansible/roles/nginx/files/openwhisk-server-key.pem
# chown ubuntu.ubuntu ~/openwhisk/ansible/roles/nginx/files/openwhisk-*.pem

Restart OpenWhisk

cd ~/openwhisk/ansible
ansible-playbook openwhisk.yml
ansible-playbook postdeploy.yml
ansible-playbook apigateway.yml
ansible-playbook routemgmt.yml

Check that it all works

From a browser, open: https://js-XXX-YYY-ZZZ-WWW.jetstream-cloud.org

Configure GitHub to issue Webhook/API calls

Instructions adapted from this document

Create access token (using browser/GitHub)

First, create a personal access token with the GitHub account

** Make sure you copy and paste the token to a password manager - treat it like a password! You'll need to generate a new one if you lose it

While generating your personal access token, give it any available name (e.g. FLARE). Be sure to select the following scopes, which will allow you to use wsk to register an event on GitHub that will fire every time there is a commit to a repository:

  • repo: repo:status to allow access to commit status
  • admin:repo_hook: write:repo_hook to allow the feed action to create your webhook

Create OpenWhisk trigger (using ssh/flare-frontend)

On flare-frontend, create a trigger for the GitHub push event type by using the myGit/webhook feed.

First, create a package binding that is configured for your GitHub repository, and with the personal access token created in the previous step. Replace myGitUser with your GitHub user name, myGitRepo with your repository name, and accessToken you had saved from the GitHub page:

wsk -i package bind /whisk.system/github myGit \
  --param username myGitUser \
  --param repository myGitRepo \
  --param accessToken aaaaa1111a1a1a1a1a111111aaaaaa1111aa1a1a

Create a trigger for the GitHub push event type by using your myGit/webhook feed:

wsk -i trigger create myGitTrigger --feed myGit/webhook --param events push

Update webhook (on GitHub)

The previous step has created a webhook on GitHub, but you need to edit it as follows before it works:

Test webhook

On flare-frontend, use these commands to create an action and poll activations:

wsk -i rule create myRule myGitTrigger /whisk.system/samples/greeting
wsk -i activation poll

In GitHub, commit a new file to the repository, and check that the event appears on the activation poll

Deleting

If you want to delete what you created for testing above:

wsk -i package delete myGit
wsk -i trigger delete myGitTrigger

Add invoker nodes

TBD

Miscellaneous notes, to-dos

When ./gradlew distDocker failed

When it failed with the following message, the issue was a missing file in

Step 19/20 : COPY openwhisk-server-key.pem /cert-gen
COPY failed: stat /var/lib/docker/tmp/docker-builder297477457/openwhisk-server-key.pem: no such file or directory

This is due to how we're setting up the letsencrypt cert, and file names.

In principle, I'd expect OW to instead look for privkey.pem for this file from:

/home/ubuntu/openwhisk/ansible/group_vars/all

    cert: "{{ nginx_ssl_server_cert | default('fullchain.pem') }}"
    key: "{{ nginx_ssl_server_key | default('privkey.pem') }}"
#    cert: "{{ nginx_ssl_server_cert | default('openwhisk-server-cert.pem') }}"
#    key: "{{ nginx_ssl_server_key | default('openwhisk-server-key.pem') }}"

But it didn't seem to?

Perhaps better to just:

cd /openwhisk/ansible/roles/nginx/files
cp fullchain.pem openwhisk-server-cert.pem
cp privkey.pem openwhisk-server-key.pem

Using step tool for certs

Using the step CLI tool seems like a much cleaner way to get a certificate from letsencrypt.

Should try bringing nginx down, run step, copy certs to the right place, restart