Batch Scripts - LATC/EU-data-cloud GitHub Wiki

The following scripts are used in the Eurostat Linked Data conversion process. The scripts can be used together or standalone to support different scenarios. A brief description of how to run each script follows.

ParseToC

Parses the Table of Contents and prints the dataset URLs:

How to Run on Windows: ParseToC.bat -n 5

How to Run on Linux: sh ParseToC.sh -n 5

where

* `n` represents the number of dataset URLs to print

Type -h for help.
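The printed URLs can be fed into DownloadZip (described below). This is a hedged sketch: it assumes ParseToC prints one dataset URL per line, the `urls` list stands in for real `sh ParseToC.sh -n 2` output, and the leading `echo` makes it a dry run — remove it to actually download.

```shell
#!/bin/sh
# Dry-run sketch: hand each URL printed by ParseToC to DownloadZip.
# Assumption: ParseToC.sh emits one URL per line. The sample list below
# stands in for `sh ParseToC.sh -n 2` output.
urls="http://example.org/data/a.sdmx.zip
http://example.org/data/b.sdmx.zip"
out=$(printf '%s\n' "$urls" | while IFS= read -r url; do
  # echo prints the command instead of running it; drop it to download.
  echo sh DownloadZip.sh -p ~/zip/ -t ~/tsv/ -u "$url"
done)
echo "$out"
```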

UnCompressFile

Uncompresses the contents of the compressed dataset file:

How to Run on Windows: UnCompressFile.bat -i c:/test/zip/bsbu_m.sdmx.zip -o c:/uncompress/

How to Run on Linux: sh UnCompressFile.sh -i ~/test/zip/bsbu_m.sdmx.zip -o ~/uncompress/

where

* `i` is the file path of the compressed input file
* `o` is the output directory path where the contents of the compressed file will be stored

Type -h for help.
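To process a whole download directory, UnCompressFile can be invoked once per archive. A minimal sketch, shown as a dry run (`echo` prints each command instead of executing it); the empty sample files stand in for real archives, and the paths are illustrative:

```shell
#!/bin/sh
# Dry-run sketch: uncompress every .sdmx.zip in a directory by calling
# UnCompressFile.sh once per archive. The touched files are empty
# placeholders standing in for real downloads.
zipdir=$(mktemp -d)
touch "$zipdir/bsbu_m.sdmx.zip" "$zipdir/apro_cpb_sugar.sdmx.zip"
count=0
for f in "$zipdir"/*.sdmx.zip; do
  # Drop the echo to actually uncompress.
  echo sh UnCompressFile.sh -i "$f" -o ~/uncompress/
  count=$((count + 1))
done
```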

DownloadZip

Downloads the compressed dataset file from the specified URL:

How to Run on Windows: DownloadZip.bat -p c:/test/zip/ -t c:/test/tsv/ -u "http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"

How to Run on Linux: sh DownloadZip.sh -p ~/test/zip/ -t ~/test/tsv/ -u "http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"

where

* `p` is the directory path where the compressed `.zip` file will be stored
* `t` is the directory path where the compressed `.tsv` file will be stored
* `u` is the URL of the dataset file

Type -h for help.
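The local archive name can be recovered from the `downfile` query parameter of the bulk-download URL. This is an assumption, not something the wiki states: the sketch below presumes the script saves the archive under that name inside the `-p` directory.

```shell
#!/bin/sh
# Sketch: extract the archive name from the `downfile` parameter of a
# bulk-download URL. Assumption (unconfirmed): DownloadZip stores the
# file under this name in the -p directory.
url="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"
# Strip everything up to and including "downfile=data/".
file=$(printf '%s\n' "$url" | sed 's|.*downfile=data/||')
echo "$file"   # apro_cpb_sugar.sdmx.zip
```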

DSDParser

Parses the Data Structure Definition (DSD) of a dataset and converts it into RDF using the Data Cube vocabulary:

How to Run on Windows: DSDParser.bat -i c:/tempZip/bsbu_m.dsd.xml -o c:/test/ -f TURTLE -a c:/sdmx-code.ttl

How to Run on Linux: sh DSDParser.sh -i ~/tempZip/dsd/bsbu_m.dsd.xml -o ~/test/ -f TURTLE -a ~/sdmx-code.ttl

where

* `i` is the file path of the DSD XML file
* `o` is the output directory path where RDF will be stored
* `f` is the format for RDF serialization (RDF/XML, TURTLE, N-TRIPLES)
* `a` is the file path of `sdmx-code.ttl`. It can be downloaded from http://code.google.com/p/publishing-statistical-data/source/browse/trunk/specs/src/main/vocab/sdmx-code.ttl

Type -h for help.

SDMXParser

Parses the SDMX dataset observations and converts them into RDF using the Data Cube vocabulary:

How to Run on Windows: SDMXParser.bat -f tsieb010 -o c:/test/ -i c:/tempZip/tsieb010.sdmx.xml -l c:/log/ -t c:/tsv/tsieb010.tsv.gz

How to Run on Linux: sh SDMXParser.sh -f tsieb010 -o ~/test/ -i ~/sdmx/tsieb010.sdmx.xml -l ~/log/ -t ~/tsv/tsieb010.tsv.gz

where

* `f` is the name of the dataset
* `o` is the output directory path where RDF will be stored
* `i` is the file path of the SDMX `xml` file
* `l` is the directory path where the logs of the dataset conversion will be stored
* `t` is the file path of the SDMX `tsv` file

Type -h for help.
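In the examples above, the `-f` name matches the basename of the `-i` and `-t` files. A small sketch for deriving the name from the SDMX file path so the arguments cannot drift apart (the path is illustrative):

```shell
#!/bin/sh
# Sketch: derive the -f dataset name from the SDMX file path.
# basename strips the directory and the .sdmx.xml suffix.
sdmx="$HOME/sdmx/tsieb010.sdmx.xml"
name=$(basename "$sdmx" .sdmx.xml)
echo "$name"   # tsieb010
# Dry run of the resulting SDMXParser invocation; drop the echo to run it.
echo sh SDMXParser.sh -f "$name" -o ~/test/ -i "$sdmx" -l ~/log/ -t ~/tsv/"$name".tsv.gz
```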

Metadata

Generates the VoID file which will be used to populate the triple store described in Step 5 and Step 6.

How to Run on Windows: Metadata.bat -i c:/toc/table_of_contents.xml -o c:/test/

How to Run on Linux: sh Metadata.sh -i ~/toc/table_of_contents.xml -o ~/test/

where

* `i` is the file path of the table of contents (optional parameter)
* `o` is the output directory path where the VoID file will be stored

Type -h for help.

DictionaryParser

Converts the dictionaries/codelists into RDF. It also generates a catalog file which is used to load all dictionaries/codelists into the triple store:

How to Run on Windows: DictionaryParser.bat -i c:/dicPath/ -o c:/outputPath/ -c c:/catalogPath/ -f TURTLE

How to Run on Linux: sh DictionaryParser.sh -i ~/dicPath/ -o ~/outputPath/ -c ~/catalogPath/ -f TURTLE

where

* `i` is the directory path where the dictionaries are stored
* `o` is the directory path where the RDF will be stored
* `c` is the directory path where the catalog file will be stored
* `f` is the format for RDF serialization (RDF/XML, TURTLE, N-TRIPLES). This RDF serialization is *only* used to create the catalog file. Dictionaries are generated only in RDF/XML format 

Type -h for help.

EuroStatMirror

Downloads all the compressed dataset files from the Bulk Download page by extracting their URLs from the Table of Contents:

How to Run on Windows: EuroStatMirror.bat -p c:/zip/ -t c:/tsv/

How to Run on Linux: sh EuroStatMirror.sh -p ~/zip/ -t ~/tsv/

where

* `p` is the directory path where the `zip` files are downloaded
* `t` is the directory path where the `tsv` files are downloaded

Type -h for help.

Main

Converts the complete Eurostat datasets into RDF:

How to Run: sh Main.sh -i ~/sdmx-code.ttl -l ~/logs/

where

* `i` is the file path of `sdmx-code.ttl`. It can be downloaded from http://code.google.com/p/publishing-statistical-data/source/browse/trunk/specs/src/main/vocab/sdmx-code.ttl
* `l` is the directory path where logs will be generated

Type -h for help.

Dataset Titles

Generates the titles of the datasets in RDF.

How to Run: sh DatasetTitles.sh -o ~/title/

where

* `o` is the output directory path where the RDF will be stored

Type -h for help.