Run on Windows - MaastrichtU-IDS/data2services-pipeline GitHub Wiki

Disclaimer: the pipeline has not been tested as extensively on Windows as on Linux, and it is less stable there, so you might encounter some issues. Please report them in the repository's issues, especially if you have found a solution.

We recommend using Git Bash to clone the repository, and the Windows PowerShell terminal to run the commands (it is easier to use than the basic Command Prompt).

All Windows scripts are in the resources/windows_scripts folder and are designed to be run from that directory.

cd resources/windows_scripts

Install and fix Docker

  • Install Docker for Windows. You will need to create a Docker Hub account.

  • Virtualization and Hyper-V must be activated.

    • If they are not enabled, Docker will offer to enable them automatically after its installation.
    • Note that Hyper-V is not available on Windows 10 Home edition (you will need the Pro or Enterprise edition).
    • If you still have issues with activating virtualization, check here.
  • Share a drive in Docker > Settings > Shared Drives > Share Drive C (or whichever drive is available). You will need to work on this drive, since Docker can only access data on shared drives.

  • A "Firewall detected" error is common: check with your IT department or deactivate your firewall.

  • If Docker cannot access the internet when building (e.g. wget: unable to resolve host address), you might want to switch to Google's DNS: go to Docker Settings > Network > DNS Server > Fixed: 8.8.8.8

Clone

Open Git Bash and execute the following commands to download the code required to run the pipeline:

# IMPORTANT: works around a Windows issue where converted newlines (CRLF) cause Apache Drill execution to fail with:
# Standard_init_linux.go:175 exec user process caused no such file
git config --global core.autocrlf false

git clone --recursive https://github.com/MaastrichtU-IDS/data2services-pipeline.git
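To check whether a clone still contains Windows line endings, a small Git Bash helper along these lines may help. This is only a sketch: the function name find_crlf_files is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): list the text files under a
# directory that contain at least one carriage return (CRLF line endings).
find_crlf_files() {
  dir="${1:-.}"
  # -r: recurse, -l: print file names only, -I: skip binary files.
  # The .git directory is excluded because it holds Git's internal data.
  grep -rlI --exclude-dir=.git "$(printf '\r')" "$dir" 2>/dev/null
}
```

Running find_crlf_files data2services-pipeline after the clone lists the affected files. Any shell script in that list is a candidate for the error quoted above, while Windows-only files such as .bat scripts may legitimately appear.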

Build

  • You need to download the Apache Drill installation bundle and the GraphDB standalone zip

    • GraphDB: register to receive an email with the download URL (request the Free version)
    • GraphDB: download the standalone server as a zip file
  • Put the Apache Drill and GraphDB files in their respective folders in the data2services-pipeline git repository (leave the archives compressed, do not unzip them)

  • Build the images

cd resources/windows_scripts
./build.bat

# Create graphdb and graphdb-import directories in /data
mkdir /data/graphdb
mkdir /data/graphdb-import

Run Drill and GraphDB services

In a production environment, both the Apache Drill and GraphDB services are expected to be already available. On a local machine, use Docker to start them. Other RDF stores should also work, but have not been tested yet.

Be careful: on Windows the volume paths must include the drive location, e.g. c:/data:/data:ro

# Start Apache Drill
docker run -dit --rm -p 8047:8047 -p 31010:31010 --name drill -v c:/data:/data:ro apache-drill
# Start GraphDB
docker run -d --rm --name graphdb -p 7200:7200 -v c:/data/graphdb:/opt/graphdb/home -v c:/data/graphdb-import:/root/graphdb-import graphdb
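Both containers start in the background, so it can be useful to confirm that their web interfaces answer before going further. The following Git Bash sketch polls a URL with curl. The function name wait_for_http is hypothetical, and it assumes the services answer plain HTTP requests on the ports published above (8047 for Drill, 7200 for GraphDB).

```shell
# Hypothetical helper: poll a URL once per second until it answers
# or the given number of attempts (default 30) is exhausted.
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, -o /dev/null: discard the body
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example, wait_for_http http://localhost:8047 && wait_for_http http://localhost:7200 returns successfully only once both services respond.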

Create a repository named "test" by going to http://localhost:7200/repository

Run using Docker commands

See the "Run using Docker commands" section of the main documentation to run the different parts of the pipeline.

Be careful: you will need to edit the folder paths to match the location you are using (c:/data by default).

Also put each command on a single line: remove the newlines and the trailing \ characters, as PowerShell does not treat \ as a line continuation.
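Joining the lines by hand is error-prone for long commands, so the conversion can be scripted from Git Bash. The sketch below is one possible approach: the function name flatten_command is hypothetical, and the image name my-image in the usage example is a placeholder rather than an image from the pipeline.

```shell
# Hypothetical helper: read a backslash-continued command on stdin and
# print it as a single line, ready to paste into PowerShell.
flatten_command() {
  awk '
    {
      sub(/\r$/, "")               # drop a Windows carriage return, if present
      sub(/[ \t]*\\[ \t]*$/, "")   # drop the trailing line-continuation backslash
      sub(/^[ \t]+/, "")           # drop leading indentation
      if ($0 == "") next           # skip blank lines
      printf "%s%s", sep, $0       # join the pieces with single spaces
      sep = " "
    }
    END { print "" }
  '
}

# Usage example with a placeholder command:
printf 'docker run -it --rm \\\n  -v c:/data:/data \\\n  my-image\n' | flatten_command
```

The helper only joins lines. It does not rewrite paths, so the folder paths still need to be adapted as described above.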