Credentials Setup - scrapinghub/shub-workflow GitHub Wiki
Previous Chapter: Introduction
The `shub-workflow` library depends on the `scrapinghub` Python bindings library. To operate on Scrapy Cloud (SC), the client from the `scrapinghub` library needs either the SC credentials to be passed explicitly to its constructor, or the `SH_APIKEY` environment variable to be set. `shub-workflow` relies on the second alternative, so you need to set up this environment variable in your project. There are several ways to do it. The recommended approach is to add a setting with the Scrapy Cloud API key in the Zyte dash project settings and export it as an environment variable in the project's `settings.py` file, e.g. (assuming the Zyte project setting is named `SC_APIKEY`; adapt it to your case):
```python
import os
# (...)
from shub_workflow.utils import kumo_settings
# (...)
settings_from_kumo = kumo_settings()
if "SC_APIKEY" in settings_from_kumo:
    # Expose the key under the name the scrapinghub client looks for.
    os.environ["SH_APIKEY"] = settings_from_kumo["SC_APIKEY"]
```
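To check the logic of the snippet above without access to Zyte dash, it can help to factor it into a function that receives the settings mapping and the environment explicitly. This is a minimal sketch under that assumption: `export_apikey` is a hypothetical helper (not part of `shub-workflow`), and a plain dict stands in for the value returned by `kumo_settings()`.

```python
import os
from typing import Mapping, MutableMapping, Optional


def export_apikey(
    settings: Mapping[str, str],
    environ: Optional[MutableMapping[str, str]] = None,
    setting_name: str = "SC_APIKEY",
) -> bool:
    """Copy the API key from the project settings into SH_APIKEY.

    Hypothetical helper mirroring the settings.py snippet: `settings` plays
    the role of kumo_settings() and `environ` defaults to os.environ.
    Returns True if SH_APIKEY was set, False if the setting is missing.
    """
    if environ is None:
        environ = os.environ
    if setting_name not in settings:
        # Outside Zyte dash (e.g. local runs) the setting is absent, so
        # SH_APIKEY has to be provided some other way.
        return False
    environ["SH_APIKEY"] = settings[setting_name]
    return True


# Usage with fake settings and a fake environment (no real key involved):
fake_env = {}
export_apikey({"SC_APIKEY": "dummy-key"}, fake_env)  # returns True
print(fake_env)  # {'SH_APIKEY': 'dummy-key'}
```

Injecting `environ` keeps tests from touching the real process environment, while the default still behaves like the original snippet.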
This approach avoids hard-coding the SC API key in `settings.py`. Of course, if you need to run `shub-workflow` scripts in your local environment, you will not have access to the kumo settings, so you need to set the `SH_APIKEY` environment variable locally.
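Rather than discovering a missing key only when a local script first talks to Scrapy Cloud, the script could check for it upfront. Below is a minimal sketch; `resolve_apikey` and `MissingApiKeyError` are hypothetical names, and the precedence they implement (an explicitly passed key first, then `SH_APIKEY`) simply mirrors the two options described at the beginning of this page, not the `scrapinghub` library's own code.

```python
import os
from typing import Mapping, Optional


class MissingApiKeyError(RuntimeError):
    """Raised when no Scrapy Cloud API key is available (hypothetical)."""


def resolve_apikey(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key to use, preferring an explicitly passed one.

    Falls back to the SH_APIKEY environment variable and raises
    MissingApiKeyError with a hint if neither is available.
    """
    if explicit:
        return explicit
    if environ is None:
        environ = os.environ
    apikey = environ.get("SH_APIKEY", "")
    if not apikey:
        raise MissingApiKeyError(
            "SH_APIKEY is not set; export it in your shell before running "
            "shub-workflow scripts locally"
        )
    return apikey


# Usage with a fake environment:
print(resolve_apikey(environ={"SH_APIKEY": "dummy-key"}))  # dummy-key
```

Calling `resolve_apikey()` with no arguments at script start-up makes a missing local key fail immediately with a clear message.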
There are even safer alternatives that further restrict the visibility of this credential. For example, if your project is deployed as a custom Docker image built from a `Dockerfile`, and assuming you have the `SH_APIKEY` environment variable set locally, you can add the following lines to the `Dockerfile`:
```dockerfile
ARG SHUB_APIKEY
ENV SH_APIKEY $SHUB_APIKEY
```
and, at deploy time, upload the image with `shub`:

```console
$ shub image upload <target> -b SHUB_APIKEY="$SH_APIKEY"
```
The `-b` option of `shub` passes build arguments to `docker build`. In this case, the build argument `SHUB_APIKEY` is set to the value of the local `SH_APIKEY` environment variable. The `ARG` directive in the `Dockerfile` declares that `SHUB_APIKEY` is an accepted build argument, and the `ENV` line then sets the environment variable `SH_APIKEY` from the value of that argument. The final effect is that the API key is not stored in the repository: it is only provided at build time and is available as `SH_APIKEY` inside the Docker container once instantiated. Note, however, that a value set with `ENV` is stored in the built image, so anyone with access to the image can read it (e.g. with `docker inspect`). This method ensures that only the developers involved in the project need to know the project API key.

For extra safety you may rely on Bitbucket Pipelines. The `shub image upload` line above is then specified in the `bitbucket-pipelines.yml` file, and the `SH_APIKEY` environment variable is set up via the Bitbucket repository settings (repository settings > repository variables). In this case, only the repository admin needs to have access to the API key.
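If the upload is run from a script or a CI step (such as a Bitbucket pipeline), the `shub image upload <target> -b SHUB_APIKEY="$SH_APIKEY"` command can be assembled from the environment as an argument list, which avoids shell quoting issues with the key and keeps it out of the script source. This is a sketch that assumes only the command form shown above; `build_upload_command` and `upload_image` are hypothetical helpers, `"default"` is just an illustrative target name, and actually running the upload requires `shub` and Docker to be installed.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def build_upload_command(
    target: str,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the `shub image upload` argument list with the key as a build arg.

    Equivalent to: shub image upload <target> -b SHUB_APIKEY="$SH_APIKEY"
    """
    if environ is None:
        environ = os.environ
    apikey = environ.get("SH_APIKEY")
    if not apikey:
        raise RuntimeError("SH_APIKEY must be set before uploading the image")
    # A list argument (no shell) means the key needs no quoting or escaping.
    return ["shub", "image", "upload", target, "-b", f"SHUB_APIKEY={apikey}"]


def upload_image(target: str) -> None:
    """Run the upload; raises CalledProcessError if shub exits with an error."""
    subprocess.run(build_upload_command(target), check=True)


# Usage with a fake environment (only builds the command, does not run it):
print(build_upload_command("default", {"SH_APIKEY": "dummy-key"}))
# ['shub', 'image', 'upload', 'default', '-b', 'SHUB_APIKEY=dummy-key']
```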
A more practical alternative, which would not require a custom Docker image, could be to include the `Dockerfile` lines above in the `scrapinghub-scrapy-stack` Dockerfile and make the `-b` option available to the `shub deploy` command. At the time of writing, however, this feature is not available.
Next Chapter: Crawl Managers