Driver: GitHub - adobe/aquarium-fish GitHub Wiki
GitHub is the first gate driver that enables GitHub Actions and Workflows to work with Aquarium complex by providing self-hosted runners.
- REST API Pull way
- Webhook Push way (TODO)
- Automatic balancing of API rates (TODO)
- Managing the stale runners (in case images work improperly)
- Filtering by repo full name
This gate was designed around the GitHub API, but it has flaws with isolation - one of the important Aquarium features. That means you will need to open your workers to github.com (yes, even if you use enterprise). Both the workers and code checkout are affected, so you will have a hard time:
- Controlling your dependencies by firewall, because that will open access to everything GitHub has
- Preventing leaks of your intellectual property - yep, anyone who has access to pipelines will be able to push your code to a public repo and call it a day
In order to protect your IP and dependencies you could, in theory, use some sort of MITM HTTPS proxy, but that will complicate things quite a bit, so good luck!
You probably got the idea that at least you can filter access to GitHub via CIDRs, right? That is not easy - because it's everywhere! There is a huge list, and you will quickly find that even with consolidation it comes to well over 3000 entries (IPv4+IPv6).
But in general you will need to make sure your workers have access to the following services to be able to use all GitHub features:
- https://api.github.com:443 - general
- ssh://github.com:22 - code checkout through SSH
- https://github.com:443 - code checkout through HTTPS
- https://codeload.github.com:443 - actions
- https://*.actions.githubusercontent.com:443 - actions
You can find all the currently required CIDRs at https://api.github.com/meta (the most critical ones are "api" and "actions*").
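As a rough illustration of the consolidation mentioned above, here is a minimal Python sketch that picks the "api" and "actions*" lists out of an already downloaded and parsed /meta response and merges adjacent ranges. The function name `collect_cidrs` and the sample data (documentation-only address ranges) are assumptions made for this example; the real response is much larger.

```python
import ipaddress


def collect_cidrs(meta, keys_exact=("api",), key_prefixes=("actions",)):
    """Collect and consolidate CIDRs for the selected keys of a parsed /meta response."""
    v4, v6 = [], []
    for key, value in meta.items():
        wanted = key in keys_exact or key.startswith(key_prefixes)
        # Some keys in the response are not CIDR lists, so skip anything else
        if not wanted or not isinstance(value, list):
            continue
        for cidr in value:
            net = ipaddress.ip_network(cidr, strict=False)
            (v4 if net.version == 4 else v6).append(net)
    # collapse_addresses() only accepts networks of one IP version at a time
    merged = list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
    return [str(net) for net in merged]


# Made-up sample using documentation-only ranges, not real GitHub addresses
sample_meta = {
    "api": ["192.0.2.0/25", "192.0.2.128/25"],
    "actions": ["198.51.100.0/24", "2001:db8::/33", "2001:db8:8000::/33"],
    "web": ["203.0.113.0/24"],
}
print(collect_cidrs(sample_meta))
```

The resulting list could then be fed to whatever firewall tooling you use; the "web" entry is ignored here only because the example selects just the two most critical groups.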
You just need to specify the label you want for the workflow like that:
jobs:
  test1:
    runs-on:
      - self-hosted
      - <AQUARIUM_LABEL_NAME>[:VERSION]
Version pinning is useful for releases to completely reproduce the environment, but for current main branch it's better to use the latest label (no need to specify VERSION).
There are 2 ways to serve GitHub with the requested self-hosted workers - by receiving webhook requests and via direct REST API checking. Each method has pros and cons, but they really work best together, so if you can afford it I would recommend using both.
- Receiving webhook requests (Push) - you will need to set up a public internet endpoint to receive the github.com requests, and you can't verify whether a request was lost on the way, but it's worth a shot because the reaction to a request will be immediate and can scale as much as you can imagine.
- Periodic update (Pull) - requires direct access to github.com and auth on GitHub, is limited by the request rate, causes delays and is not really scalable. But it is easy to set up and quite reliable.
- Hybrid (Push+Pull) - when both ways work together they cancel out each other's weak points. This way you can be sure you don't miss a request, and the reaction to adding a new repo is as quick as possible.
If you are just starting and want to play with this gate without investing much in infrastructure, just choose the Pull way: simple Token auth and a repo webhook will work well for you.
When using the GitHub API you need to understand the request budget - by default for Token and App it's 5000 requests/h, but for an enterprise App it's 15000/h (you can read more in the GitHub REST docs). So in order to properly utilize the given budget, the Gate calculates how much is spent:
- Hooks receiving: for each repo that passed the filter (2 requests per repo)
- How many hooks the Gate found, and the rolling amount of deliveries received over the last 24 hours
All those numbers allow the Gate to keep the delivery check interval as low as possible (down to the check limit, which is set to 30s by default). But be careful if you want to use a lot of repositories - the request budget could easily be exhausted, so look at the log/monitoring to keep the budget under control.
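To get a feel for the numbers, here is a back-of-the-envelope Python sketch of how a check interval could follow from the budget. It is not the Gate's actual formula: it only counts the 2 requests per repo mentioned above, ignores the requests spent on reading deliveries, and the function name and parameters are assumptions made for this example.

```python
def check_interval_seconds(repos, hourly_budget=5000, requests_per_repo=2, min_interval=30.0):
    """Estimate the shortest check interval the hourly request budget can sustain."""
    requests_per_check = repos * requests_per_repo
    if requests_per_check == 0:
        return min_interval
    # Spread the checks evenly over the hour: seconds between two full checks
    interval = 3600.0 * requests_per_check / hourly_budget
    # Never go below the configured check limit (30s by default)
    return max(min_interval, interval)


print(check_interval_seconds(10))                        # 30.0 (14.4s raised to the limit)
print(check_interval_seconds(100))                       # 144.0
print(check_interval_seconds(100, hourly_budget=15000))  # 48.0
```

Under these assumptions 10 repos fit comfortably into the default budget, while 100 repos already push the interval to a couple of minutes unless the enterprise App limit is available.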
If Webhooks Push is enabled as well, much less of the budget will be used, because the Gate will skip the delivered events and only focus on undelivered ones.
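The idea of skipping already delivered events can be sketched in Python as below. This is only an illustration and not the Gate's code: it assumes the Push side remembers the delivery GUID of every request it received (GitHub sends it in the `X-GitHub-Delivery` header) and that the Pull side gets delivery records carrying the same `guid` plus an `event` field; the function name is hypothetical.

```python
def deliveries_to_process(polled, seen_guids):
    """Return polled deliveries that were not already handled via Push."""
    pending = []
    for delivery in polled:
        # Only workflow job events matter for requesting or releasing workers
        if delivery.get("event") != "workflow_job":
            continue
        # Already received through the webhook endpoint, nothing to do
        if delivery["guid"] in seen_guids:
            continue
        pending.append(delivery)
    return pending


# Made-up delivery records, reduced to the two fields used above
polled = [
    {"guid": "a", "event": "workflow_job"},
    {"guid": "b", "event": "workflow_job"},
    {"guid": "c", "event": "ping"},
]
print(deliveries_to_process(polled, seen_guids={"a"}))
```

Only the delivery with GUID "b" remains, so the Pull side would fetch details for a single event instead of three.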
No matter which way (or both) you choose, you need to create a Webhook in GitHub for the org or repository. In case you need to use Pull as well, please create an Auth method to access the GitHub API.
Please make sure you know the difference between the auth methods, especially if you are doing this for your company. Otherwise you will need to spend additional time on migration (which is possible, but takes quite a bit of time).
A token is the easiest way for testing, but could cause issues in the long run.
- Go to https://github.com/settings/personal-access-tokens
- Click "Generate new token" button
- Set whatever token name/description you like
- Resource owner is important - the token will allow access to hooks only within this organization
- In Repository permissions specify only the following:
  - Webhooks, Read only - it's needed to read webhooks via API
  - Administration, Read and Write - it's used to create and manage the self-hosted runners
- Copy and save the created token to put it later in your Fish github gate configuration
GitHub Apps have a number of benefits over the regular token - for example, in enterprise you get a higher API call limit - 15000 per hour instead of the default 5000 for a token.
Creating an App is a big topic, but here we will cover the simplest case.
- You need to choose where you want to create an App:
- For personal App: https://github.com/settings/apps/new
- For organization App: https://github.com/organizations/%ORG%/settings/apps/new
- Then pick an App name and put some description in. If you plan to use a number of Fish clusters for different purposes, it's better to reflect that in the App name - for example, add a -Org-Dev suffix to show it
- Choose Homepage URL - it's required, but we will not use it because the app will be private
- Skip Callback URL - it's needed for users, and we will be the only user of this App
- Deactivate webhook - we will not need it for our purposes
- In Repository/Organization permissions specify only the Webhooks one
- And select "Only on this account" in "Where can this GitHub App be installed?"
- Click "Create GitHub App"
After the app is created you need to copy its Client ID (at the top) and generate a private key:
- In Private keys click the "Generate a private key" button
- Save the file somewhere and copy its content
When it's done - it's time to install the app in the repos you want to read info from:
- Now in the same GitHub App settings pick "Install App" menu item
- Pick the account to use for installation and click "Install" button
- It will ask "How to install the App" - and you can choose all repos or just specific repos
- Pick the repos to restrict the App installation and click the confirmation button
- After that, on the installed App page, you will see the numeric App Installation ID in the browser address bar - copy it to put in the Fish github gate config later
Webhooks are used for 2 purposes: to store deliveries, which are available through the API, and to send push requests to the github gate.
Since a webhook is not just a sender but also a storage, it's relatively easy to use it both ways. To properly set up the webhook, follow these steps:
- Go to your repository settings and click "Webhooks" menu item
- Click "Add webhook" button
- Specify Payload URL - please be careful here: you need to pick a safe non-existent endpoint so you don't accidentally share your webhook information with hackers, who could register the domain you mention here. I would recommend using https://_aquarium_fish_github_gate_ because it will be hard to create a DNS record with something like that, and it's descriptive.
- Set the Content type to "application/json"
- Create some long random secret (>32 chars), so it will be hard for a human to remember, and save it.
- It's recommended to use SSL verification as an additional way to protect your webhook
- From the events list we need just one - Workflow jobs, because it's responsible for notifying about requested or no longer needed compute resources.
- After that click "Add webhook"
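For the secret step above, a small Python sketch can show both how such a secret might be generated and what it is later used for: GitHub signs each delivery body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function names are assumptions made for this example, and this is not the Gate's own verification code.

```python
import hashlib
import hmac
import secrets


def new_webhook_secret(nbytes=48):
    """Generate a random URL-safe secret (48 random bytes give 64 characters)."""
    return secrets.token_urlsafe(nbytes)


def signature_is_valid(secret, body, header_value):
    """Check a delivery body (bytes) against the X-Hub-Signature-256 header value."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected.encode(), header_value.encode())


secret = new_webhook_secret()
body = b'{"action": "queued"}'
# Simulate the header a sender would compute with the same secret
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(len(secret), signature_is_valid(secret, body, header))
```

A receiver that rejects deliveries failing this check only accepts requests from someone who knows the secret, which is why the secret should be long and never reused elsewhere.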
To receive push events you need a clear communication path from the GitHub server to your Aquarium Fish node. In case you are using github.com, you will need a public address or some sort of load balancer for your Fish cluster. With that comes the responsibility to protect your endpoint from all sorts of attacks, so please be careful with this way of receiving the events.
With that said, once you have all the required components and have set up the infrastructure, just follow the Pull-only way and set the Payload URL to the actual endpoint that points to the gate's bind_address (from the Aquarium Fish side configuration) you already have.
Yeah, you still need to configure the GitHub driver gate to make it work, because there are a lot of variables that can't be predicted.
TODO
You just need to define the binding_address to listen on a port, so it will stream data.
In general the configuration is not that hard, especially since you already have all the necessary things after configuring GitHub side. So let's put it all in Aquarium config.yml:
drivers:
  gates:
    github/github.com_<YOUR_ORG>:
      api_app_id: <GITHUB_APP_ID>
      api_app_install_id: <GITHUB_INSTALLATION_ID>
      api_app_key: |
        <GITHUB_APP_PEM_PRIVATE_KEY>
      filters:
        <ORG>/<REPO>:  # Supports path wildcards
          webhook_secret: <GITHUB_WEBHOOK_SECRET>
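To illustrate what wildcard filtering by repo full name could look like, here is a minimal Python sketch. It uses Python's `fnmatch` only as a stand-in: the Gate's real matching rules may differ (for instance, in `fnmatch` a `*` also matches `/`), and the function name and sample repo names are assumptions made for this example.

```python
from fnmatch import fnmatchcase


def repo_passes(full_name, patterns):
    """Return True if the repo full name (org/repo) matches any filter pattern."""
    return any(fnmatchcase(full_name, pattern) for pattern in patterns)


# Hypothetical filter patterns and repository names
patterns = ["my-org/service-*", "my-org/tools"]
print(repo_passes("my-org/service-api", patterns))  # True
print(repo_passes("my-org/website", patterns))      # False
```

Keeping the filters narrow also helps with the request budget, since every repo that passes the filter costs API requests on each check.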
In general it's not that hard to configure the runner - it just needs to be fed with the url, token and labels that this runner will serve. Aquarium supplies these to the worker through its metadata mechanism, so the runner startup script can execute the following steps:
# You need to download and unpack github runner for your os/arch and after that:
$ ./config.sh --unattended --ephemeral --no-default-labels --url "$GITHUB_RUNNER_URL" \
    --token "$GITHUB_RUNNER_REG_TOKEN" --name "$GITHUB_RUNNER_NAME" \
    --labels "$GITHUB_RUNNER_LABELS" --work "$GITHUB_RUNNER_WORKSPACE"
$ ./run.sh
Potentially you want to automate that and bake it in using Aquarium-Bait, so you don't need to deal with those commands anymore.
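As one possible starting point for such automation, the Python sketch below assembles the same `config.sh` command from the `GITHUB_RUNNER_*` variables and fails early if any of them is missing. It assumes the metadata values are already exported as environment variables on the worker; the function name and the sample values are made up for this example.

```python
import shlex

REQUIRED_VARS = (
    "GITHUB_RUNNER_URL",
    "GITHUB_RUNNER_REG_TOKEN",
    "GITHUB_RUNNER_NAME",
    "GITHUB_RUNNER_LABELS",
    "GITHUB_RUNNER_WORKSPACE",
)


def build_config_command(env):
    """Build the runner registration command from a mapping of variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing runner variables: " + ", ".join(missing))
    return [
        "./config.sh", "--unattended", "--ephemeral", "--no-default-labels",
        "--url", env["GITHUB_RUNNER_URL"],
        "--token", env["GITHUB_RUNNER_REG_TOKEN"],
        "--name", env["GITHUB_RUNNER_NAME"],
        "--labels", env["GITHUB_RUNNER_LABELS"],
        "--work", env["GITHUB_RUNNER_WORKSPACE"],
    ]


# Made-up values standing in for what the worker would receive
sample_env = {
    "GITHUB_RUNNER_URL": "https://github.com/my-org/my-repo",
    "GITHUB_RUNNER_REG_TOKEN": "example-token",
    "GITHUB_RUNNER_NAME": "fish-worker-1",
    "GITHUB_RUNNER_LABELS": "self-hosted,my-label",
    "GITHUB_RUNNER_WORKSPACE": "/tmp/runner-work",
}
print(shlex.join(build_config_command(sample_env)))
```

In a real startup script the returned list could be passed to `subprocess.run(..., check=True)` with `os.environ` as the input, followed by a second call for `./run.sh`.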