# 1c. Getting started: Local Environment

_EY-Data-Science-Program/2021-Better-Working-World-Data-Challenge GitHub Wiki_
This is a good option if you have access to a reasonably powerful computer or laptop. With a local environment, you don't need to worry about limited credits in Azure, or about running up a bill once your credits are exhausted.
The Cube in a Box uses Docker to launch a complete Open Data Cube environment. Once running you will be able to access the interactive Python environment using Jupyter in your browser, pre-indexed with satellite and linescan data.
## What You Need

You will need:

- Docker installed and running on your computer.
- Docker Compose to coordinate multiple Docker containers.
- A terminal to run the setup commands. On MacOS you can use Terminal.app; on Linux you probably know what to do; on Windows you can use PowerShell.
- Git installed to clone this repository. Alternatively, you can download a zip of this repository.
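A quick way to confirm the prerequisites are in place is to print each tool's version (the exact version strings will differ on your machine):

```shell
# Each command should print a version rather than "command not found".
docker --version
docker-compose --version
git --version
```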
## Running A Local Data Cube
Once you have met the above requirements, you're ready to launch your very own personal Open Data Cube!
Open a terminal and run the following commands:
### Step 1: Get the code
If using Windows, you can create a new folder in `C:\Users\user\dockerfiles\`, such as `cube-in-a-box`, open a Windows PowerShell terminal inside it, and then clone the repository.
If you are using git, you can clone the repository using:

```shell
git clone https://github.com/EY-Data-Science-Program/2021-Better-Working-World-Data-Challenge.git
```
Alternatively, download the zip file (linked above) and unzip it. Then, in your terminal, navigate to the repository folder. You can check you're in the right place by running `ls` and confirming the list of files includes `install-cube.sh`.
### Step 2: Start the containers
You may need to update your local `docker-compose.yml` file before starting the containers:

- Open the `docker-compose.yml` file in your editor. Under the `jupyter` service, change `ports` from `"80:8888"` to `"8888:8888"`.
- Add `JUPYTER_ALLOW_INSECURE_WRITES=true` under `jupyter`: `environment`.
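After those two edits, the relevant part of the `jupyter` service in `docker-compose.yml` should look roughly like this (other keys under the service are omitted here, and may differ in your copy of the file):

```yaml
services:
  jupyter:
    ports:
      - "8888:8888"          # host port 8888 -> container port 8888
    environment:
      - JUPYTER_ALLOW_INSECURE_WRITES=true
```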
Run the command:

```shell
docker-compose up -d
```

This will start the PostgreSQL database and Jupyter containers. You should see the following output:

```
Starting 2021-better-working-world-data-challenge_postgres_1 ... done
Starting 2021-better-working-world-data-challenge_jupyter_1 ... done
```
Later on, you can use `docker-compose stop` to stop the containers (so they're not using your computer's CPU and memory). To start them back up, run `docker-compose start`. Make sure you run the commands from the repository folder.
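At any point, you can check whether the containers are running (the output columns vary between Docker Compose versions, but both containers should show a running/Up state):

```shell
# List the containers managed by this compose project and their state.
docker-compose ps
```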
### Step 3: Populate the Data Cube
This step will load the image index data into the database.
If you have the `make` tool installed (likely if you're on MacOS/Linux) you can simply run:

```shell
make prepare
```
If you don't have make, and are on MacOS or Linux, or are using Cygwin or WSL on Windows, you can run:

```shell
./install-cube.sh true
```
If you're using Windows CMD, you may need to run each of the below commands directly.

Initialise the datacube DB:

```shell
docker-compose exec jupyter datacube -v system init
```

Add some custom metadata:

```shell
docker-compose exec jupyter datacube metadata add /scripts/data/metadata.eo_plus.yaml
docker-compose exec jupyter datacube metadata add /scripts/data/eo3_landsat_ard.odc-type.yaml
```

And add some product definitions:

```shell
docker-compose exec jupyter datacube product add /scripts/data/ga_s2a_ard_nbar_granule.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/ga_s2b_ard_nbar_granule.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/ga_ls7e_ard_3.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/ga_ls8c_ard_3.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/linescan.odc-product.yaml
```

Now index some datasets:

```shell
docker-compose exec jupyter bash -c "dc-index-from-tar --protocol https --ignore-lineage -p ga_ls7e_ard_3 -p ga_ls8c_ard_3 /scripts/data/ls78.tar.gz"
docker-compose exec jupyter bash -c "dc-index-from-tar --protocol https --ignore-lineage -p ga_s2a_ard_nbar_granule -p ga_s2b_ard_nbar_granule /scripts/data/s2ab.tar.gz"
docker-compose exec jupyter bash -c "dc-index-from-tar --protocol https --ignore-lineage -p linescan /scripts/data/linescan.tar.gz"
```
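To check the population step worked, you can list the products now registered in the index (`datacube product list` is part of the standard Open Data Cube CLI; the output format depends on your datacube version):

```shell
# The products added above (Landsat, Sentinel-2 and linescan)
# should appear in this list.
docker-compose exec jupyter datacube product list
```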
### Step 4: Open Jupyter
If all was successful, you should be able to access your local Jupyter. Go to http://localhost:8888/ (if you made the port change in Step 2; otherwise use http://localhost/). You will be prompted for a password, which is set to `secretpassword`. Don't worry, only users on your computer can access this.
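Once you're in Jupyter, a quick sanity check in a new notebook cell might look like the sketch below. It assumes the default Cube in a Box configuration, where the `datacube` Python library is pre-installed and already pointed at the local index:

```python
import datacube

# Connect to the local Open Data Cube index.
dc = datacube.Datacube(app="sanity-check")

# List the products indexed in Step 3; the Landsat, Sentinel-2
# and linescan products should all appear in the table.
print(dc.list_products())
```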
### Step 5: Do Some Data Science!
That's it!
If you have any trouble, head to Troubleshooting: FAQ.