1c. Getting started: Local Environment

This is a good option if you have access to a reasonably powerful computer or laptop. With a local environment, you don't need to worry about limited Azure credits or about running up a bill once your credits are exhausted.

The Cube in a Box uses Docker to launch a complete Open Data Cube environment. Once it is running, you can access an interactive Python environment through Jupyter in your browser, with satellite and linescan data already indexed.

What You Need

You will need:

  • Docker installed and running on your computer.
  • Docker Compose to coordinate multiple Docker containers.
  • A terminal to run the setup commands. On macOS you can use Terminal.app; on Linux you probably know what to do; on Windows you can use PowerShell.
  • Git installed, to clone this repository.
  • Alternatively, you can download a zip of this repository instead of using Git.
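
You can quickly confirm that Docker, Docker Compose, and Git are installed by checking their versions in a terminal (the exact version numbers will differ from machine to machine):

docker --version
docker-compose --version
git --version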

Running A Local Data Cube

Once you have met the above requirements, you're ready to launch your very own personal Open Data Cube!

Open a terminal and run the following commands:

Step 1: Get the code

If you are using Windows, you can create a new folder such as cube-in-a-box in C:\Users\user\dockerfiles\, open a Windows PowerShell terminal inside it, and then clone the repository.

If you are using git, you can clone the repository using:

git clone https://github.com/EY-Data-Science-Program/2021-Better-Working-World-Data-Challenge.git

Alternatively, download the zip file (linked above) and unzip it. Then, in your terminal, navigate to the repository folder.

You can check you're in the right place by running ls; you should see a list of files including install-cube.sh.
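
For example, assuming a default git clone (the folder name may differ if you downloaded and unzipped the repository):

cd 2021-Better-Working-World-Data-Challenge
ls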

Step 2: Start the containers

You may need to update your local docker-compose.yml file before starting the containers (see the sketch after this list):

  • Open the docker-compose.yml file in your editor.
  • Under the jupyter service, change ports from "80:8888" to "8888:8888".
  • Add JUPYTER_ALLOW_INSECURE_WRITES=true under the jupyter service's environment section.
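
After those edits, the jupyter service section of docker-compose.yml might look roughly like this (a minimal sketch; keep the image and any other settings your file already has):

jupyter:
  # ...keep the existing image and other settings...
  environment:
    # ...keep the existing variables and add this line:
    - JUPYTER_ALLOW_INSECURE_WRITES=true
  ports:
    - "8888:8888"   # changed from "80:8888"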

Run the command:

docker-compose up -d

This will start the PostgreSQL database and Jupyter containers. You should see output like the following:

Starting 2021-better-working-world-data-challenge_postgres_1 ... done
Starting 2021-better-working-world-data-challenge_jupyter_1  ... done

Later on, you can use docker-compose stop to stop the containers (so they're not using your computer's CPU and memory).

To start them back up, run docker-compose start. Make sure you run these commands from the repository folder.
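
For example, from the repository folder (docker-compose ps simply lists the containers and their current status):

docker-compose stop    # pause the containers
docker-compose start   # resume them later
docker-compose ps      # check what's running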

Step 3: Populate the Data Cube

This step will load the product definitions and image index metadata into the database.

If you have the make tool installed (likely if you're on macOS or Linux), you can simply run the command below:

make prepare

If you don't have make and are on macOS or Linux, or are using Cygwin or WSL on Windows, you can run:

./install-cube.sh - true

If you're using Windows CMD, you may need to run each of the following commands directly.

Initialise the datacube DB:

docker-compose exec jupyter datacube -v system init

Add some custom metadata:

docker-compose exec jupyter datacube metadata add /scripts/data/metadata.eo_plus.yaml
docker-compose exec jupyter datacube metadata add /scripts/data/eo3_landsat_ard.odc-type.yaml

And add some product definitions:

docker-compose exec jupyter datacube product add /scripts/data/ga_s2a_ard_nbar_granule.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/ga_s2b_ard_nbar_granule.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/ga_ls7e_ard_3.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/ga_ls8c_ard_3.odc-product.yaml
docker-compose exec jupyter datacube product add /scripts/data/linescan.odc-product.yaml

Now index some datasets:

docker-compose exec jupyter bash -c "dc-index-from-tar --protocol https --ignore-lineage -p ga_ls7e_ard_3 -p ga_ls8c_ard_3 /scripts/data/ls78.tar.gz"
docker-compose exec jupyter bash -c "dc-index-from-tar --protocol https --ignore-lineage -p ga_s2a_ard_nbar_granule -p ga_s2b_ard_nbar_granule /scripts/data/s2ab.tar.gz"
docker-compose exec jupyter bash -c "dc-index-from-tar --protocol https --ignore-lineage -p linescan /scripts/data/linescan.tar.gz"
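
If you want to confirm that the products were added, you can list them with the datacube command-line tool inside the Jupyter container:

docker-compose exec jupyter datacube product list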

Step 4: Open Jupyter

If all was successful, you should be able to access your local Jupyter.

Go to http://localhost:8888 (or http://localhost if you didn't make the port change in Step 2). You will be prompted for a password, which is set to secretpassword. Don't worry, only users on your computer can access this.
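
Once you're in, a quick sanity check in a new notebook confirms the datacube is reachable and the products from Step 3 are present (this uses the standard datacube Python API; the app name is just an arbitrary label):

import datacube

# Connect to the local Open Data Cube
dc = datacube.Datacube(app="sanity-check")

# List the products added in Step 3 (e.g. ga_ls8c_ard_3, linescan)
dc.list_products()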

Step 5: Do Some Data Science!

That's it!
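
As a starting point, a minimal load might look like the sketch below. The product name comes from Step 3, but the coordinates, dates, CRS, and resolution here are illustrative assumptions; replace them with values from the challenge notebooks:

import datacube

dc = datacube.Datacube(app="getting-started")

# Load a small cube of Landsat 8 data; the extents and dates below are
# placeholder assumptions, not values from the challenge
data = dc.load(
    product="ga_ls8c_ard_3",
    x=(145.0, 145.2),                   # longitude range (assumed)
    y=(-37.8, -37.6),                   # latitude range (assumed)
    time=("2019-01-01", "2019-03-31"),  # date range (assumed)
    output_crs="EPSG:3577",             # Australian Albers
    resolution=(-30, 30),               # 30 m pixels
)
print(data)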

If you have any trouble, head to Troubleshooting: FAQ.