Setting Up a Machine Deep Learning Project - WM-SEMERU/SemeruGuidelines GitHub Wiki

This section explains how to set up a machine learning project to run experiments or create a development environment. To use our scripts and steps, you should first be familiar with the following technologies: Docker containers, DVC, databases, and VSCode. We strongly suggest reviewing in-depth tutorials on these technologies; here we present only the basic elements needed to create an environment for our machines and resources. Please follow these steps:

1. Create a .devcontainer

Each GitHub project should have a development environment (.devcontainer folder) to develop, maintain and evolve the machine learning solution. Inside this folder, you need to create and configure two files: devcontainer.json and Dockerfile.

1.1 Configuring devcontainer.json

Under the label mount, you need to provide the paths of your GitHub project and the generic DVC data folder (see File System). Pay special attention to the label runArgs, since it is required for GPUs to work in the container. If you want to use more technologies or libraries in your Docker container, you will need to extend this file accordingly.

[Figure: devcontainer.json]
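As a rough illustration of the two settings described above, the following Python sketch builds a devcontainer.json-style configuration and prints it as JSON. It is not the project's actual file: the container name, the example paths, and the target folders inside the container are hypothetical placeholders, and it assumes the mount label corresponds to the `mounts` property of the dev container format and that GPU access is granted through Docker's `--gpus all` run argument.

```python
import json


def build_devcontainer_config(project_path, data_path):
    """Return a devcontainer.json-style dict with two bind mounts and GPU access.

    All names and target paths below are hypothetical placeholders.
    """
    return {
        "name": "ml-project",  # hypothetical container name
        "build": {"dockerfile": "Dockerfile"},  # Dockerfile in the same .devcontainer folder
        "mounts": [
            # GitHub project folder on the host (assumed target path)
            f"source={project_path},target=/workspaces/project,type=bind",
            # Generic DVC data folder on the host (assumed target path)
            f"source={data_path},target=/data,type=bind",
        ],
        # Passed to `docker run`; exposes the host GPUs to the container
        "runArgs": ["--gpus", "all"],
    }


# Example host paths are assumptions, not the lab's real layout.
config = build_devcontainer_config("/home/user/my-project", "/scratch/dvc-data")
print(json.dumps(config, indent=4))
```

The printed JSON could be saved as .devcontainer/devcontainer.json after replacing the placeholder paths with your own.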

1.2 Configuring Dockerfile

Our Dockerfile uses the TensorFlow GPU image. Update this image if required (line 4). The Dockerfile also needs some keyserver exceptions (line 7). If you have a list of library dependencies, install them in the image (lines 10-15). DVC is required for versioning data (line 18). A safe directory command must also be added for your workspace project (line 21).

[Figure: Dockerfile]

2. Building & Deploying the container

Once you have created the JSON config file and the Dockerfile, VSCode automatically detects the .devcontainer folder and asks you to build the Docker image and deploy your GitHub project in the container. You can also rebuild the container if your files are modified or updated (Ctrl + Shift + P -> Rebuild Container).

Docker Container under VSCode

VSCode manages the creation of the Docker container: it assigns an ID and installs all required dependencies. Note that the IDs usually have the form vsc-[project name]-[random number].

[Figure: Docker container created by VSCode]
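On a shared machine it can help to pick out the containers that belong to one project from a longer list. The sketch below checks a name against the vsc-[project name]-[random number] pattern. The helper name and the sample names are hypothetical, and it assumes the trailing part is a run of letters and digits, which may not hold for every VSCode version.

```python
import re


def belongs_to_project(container_name, project_name):
    """Return True if the name looks like vsc-[project name]-[suffix].

    Assumption: the suffix consists only of letters and digits.
    """
    pattern = rf"vsc-{re.escape(project_name)}-[0-9A-Za-z]+"
    return re.fullmatch(pattern, container_name) is not None


# Hypothetical names, e.g. copied from a container listing.
names = ["vsc-semeruguidelines-1234", "vsc-other-project-9876", "postgres"]
mine = [n for n in names if belongs_to_project(n, "semeruguidelines")]
print(mine)
```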

Docker Image under VSCode

VSCode also associates the images that each member creates with their username. Ideally, each person working on the same GitHub project should have their own image.

[Figure: Docker images created by VSCode]

3. Versioning Data

Semeru uses DVC to manage data versioning. You need to initialize DVC inside the GitHub project once the container is running. Initializing DVC creates a .dvc folder. The remote should be configured as follows:

[Figure: DVC remote configuration]
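The initialization and remote setup can also be scripted. The following Python sketch wraps the standard `dvc init` and `dvc remote add -d` commands. The remote name and the remote path are hypothetical placeholders and do not reproduce the lab's actual remote configuration, so replace them with the values from the figure above.

```python
import subprocess


def dvc_setup_commands(remote_name, remote_path):
    """Return the DVC commands that initialize a project and set a default remote."""
    return [
        ["dvc", "init"],  # creates the .dvc folder
        # -d marks the remote as the default one for push/pull
        ["dvc", "remote", "add", "-d", remote_name, str(remote_path)],
    ]


def run_dvc_setup(project_dir, remote_name, remote_path):
    """Run the setup commands inside the project folder (requires DVC to be installed)."""
    for cmd in dvc_setup_commands(remote_name, remote_path):
        subprocess.run(cmd, cwd=project_dir, check=True)


# Hypothetical remote name and path; print the commands without running them.
for cmd in dvc_setup_commands("my-remote", "/scratch/my-dvc-storage"):
    print(" ".join(cmd))
```

Call `run_dvc_setup` from inside the running container, with the project root as `project_dir`, to execute the commands instead of only printing them.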

4. The SEMERU Software Architecture

5. The File Systems

Tree Data Folder (/scratch)

Current Folder

[Figure: current /scratch folder tree]

How it should be

[Figure: intended /scratch folder tree]

Tree Semeru Folder

[Figure: Semeru folder tree, part 1]

[Figure: Semeru folder tree, part 2]