Batch Scripts - LATC/EU-data-cloud GitHub Wiki
The following scripts are used in the Eurostat Linked Data conversion process. They can be run together or standalone to serve different scenarios. Below is a brief description of how to run each script.
ParseToC
Parses the Table of Contents and prints the dataset URLs:
How to Run on Windows: ParseToC.bat -n 5
How to Run on Linux: sh ParseToC.sh -n 5
where
* `n` represents the number of dataset URLs to print
Type `-h` for help.
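The core of this step, pulling dataset URLs out of the table of contents, can be sketched as follows. The XML element names and the `format` attribute are illustrative assumptions, not the actual `table_of_contents.xml` schema:

```python
# Sketch of ToC parsing: collect dataset URLs from an XML table of contents.
# The element/attribute names below are assumptions for illustration, not
# the real Eurostat table_of_contents.xml layout.
import xml.etree.ElementTree as ET

SAMPLE_TOC = """<toc>
  <leaf><downloadLink format="sdmx">http://example.org/data/a.sdmx.zip</downloadLink></leaf>
  <leaf><downloadLink format="sdmx">http://example.org/data/b.sdmx.zip</downloadLink></leaf>
</toc>"""

def dataset_urls(toc_xml, limit):
    """Return at most `limit` dataset URLs, mirroring the -n option."""
    root = ET.fromstring(toc_xml)
    links = [e.text for e in root.iter("downloadLink") if e.get("format") == "sdmx"]
    return links[:limit]

urls = dataset_urls(SAMPLE_TOC, 5)
print(urls)
```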
UnCompressFile
Uncompress the contents of the compressed dataset file:
How to Run on Windows: UnCompressFile.bat -i c:/test/zip/bsbu_m.sdmx.zip -o c:/uncompress/
How to Run on Linux: sh UnCompressFile.sh -i ~/test/zip/bsbu_m.sdmx.zip -o ~/uncompress/
where
* `i` is the file path of the compressed input file
* `o` is the output directory path where the contents of the compressed file will be stored
Type `-h` for help.
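The uncompress step amounts to extracting a `.zip` archive into the output directory; a minimal sketch, with a throwaway archive standing in for a real download:

```python
# Sketch of the uncompress step: extract a dataset .zip into an output
# directory. All paths and file names here are examples.
import os, tempfile, zipfile

def uncompress(zip_path, out_dir):
    """Extract every member of zip_path into out_dir and list the members."""
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()

# Demo with a temporary archive instead of a real Eurostat download:
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "bsbu_m.sdmx.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("bsbu_m.sdmx.xml", "<data/>")
members = uncompress(zip_path, os.path.join(tmp, "uncompress"))
print(members)
```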
DownloadZipFile
Downloads the compressed dataset file from the specified URL:
How to Run on Windows: DownloadZip.bat -p c:/test/zip/ -t c:/test/tsv/ -u "http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"
How to Run on Linux: sh DownloadZip.sh -p ~/test/zip/ -t ~/test/tsv/ -u "http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"
where
* `p` is the directory path where the compressed `.zip` file will be stored
* `t` is the directory path where the compressed `.tsv` file will be stored
* `u` is the URL of the dataset file
Type `-h` for help.
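The download itself is a plain fetch into the target directory. The sketch below uses a `file://` URL so it runs offline, and derives the file name from the URL path, which is a simplification, since the real Bulk Download URLs carry the name in the `downfile` query parameter:

```python
# Sketch of the download step: fetch a compressed dataset from a URL into a
# target directory. Deriving the name from the URL path is a simplification;
# real Bulk Download URLs put the name in the downfile query parameter.
import os, tempfile, urllib.request
from urllib.parse import urlparse

def download_zip(url, zip_dir):
    os.makedirs(zip_dir, exist_ok=True)
    name = os.path.basename(urlparse(url).path) or "dataset.zip"
    dest = os.path.join(zip_dir, name)
    urllib.request.urlretrieve(url, dest)
    return dest

# Demo with a local file:// URL instead of a live Bulk Download URL:
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "apro_cpb_sugar.sdmx.zip")
open(src, "wb").write(b"PK")  # stand-in payload
dest = download_zip("file://" + src, os.path.join(tmp, "zip"))
print(dest)
```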
DSDParser
Parses the Data Structure Definition (DSD) of a dataset and converts it into RDF using the Data Cube vocabulary:
How to Run on Windows: DSDParser.bat -i c:/tempZip/bsbu_m.dsd.xml -o c:/test/ -f TURTLE -a c:/sdmx-code.ttl
How to Run on Linux: sh DSDParser.sh -i ~/tempZip/dsd/bsbu_m.dsd.xml -o ~/test/ -f TURTLE -a ~/sdmx-code.ttl
where
* `i` is the file path of the DSD `xml` file
* `o` is the output directory path where RDF will be stored
* `f` is the format for RDF serialization (RDF/XML, TURTLE, N-TRIPLES)
* `a` is the file path of `sdmx-code.ttl`. It can be downloaded from http://code.google.com/p/publishing-statistical-data/source/browse/trunk/specs/src/main/vocab/sdmx-code.ttl
Type `-h` for help.
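At the triple level, the conversion turns each DSD dimension into a `qb:DimensionProperty`. A minimal sketch of that kind of output; the dimension names and the `eus:` namespace are placeholders, not the converter's actual URI scheme:

```python
# Sketch of DSD-to-RDF output: one qb:DimensionProperty per DSD dimension,
# serialized as Turtle. Dimension names and the eus: namespace are
# placeholders, not the converter's actual URIs.
dimensions = ["freq", "geo", "unit"]

def dsd_to_turtle(dims):
    lines = [
        "@prefix qb: <http://purl.org/linked-data/cube#> .",
        "@prefix eus: <http://example.org/eurostat/property#> .",
    ]
    for d in dims:
        lines.append("eus:%s a qb:DimensionProperty ." % d)
    return "\n".join(lines)

ttl = dsd_to_turtle(dimensions)
print(ttl)
```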
SDMXParser
Parses the SDMX dataset observations and converts them into RDF using the Data Cube vocabulary:
How to Run on Windows: SDMXParser.bat -f tsieb010 -o c:/test/ -i c:/tempZip/tsieb010.sdmx.xml -l c:/log/ -t c:/tsv/tsieb010.tsv.gz
How to Run on Linux: sh SDMXParser.sh -f tsieb010 -o ~/test/ -i ~/sdmx/tsieb010.sdmx.xml -l ~/log/ -t ~/tsv/tsieb010.tsv.gz
where
* `f` is the name of the dataset
* `o` is the output directory path where RDF will be stored
* `i` is the file path of the SDMX `xml` file
* `l` is the directory path where the logs of the dataset conversion will be stored
* `t` is the file path of the SDMX `tsv` file
Type `-h` for help.
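Reading the observations out of the gzipped `tsv` file can be sketched as below. The layout shown, dimension values in the first column and one period per further column, is a simplified assumption about the bulk-download TSV format:

```python
# Sketch of the observation input handling: the bulk TSVs are gzipped, with
# the dimension key in the first column and one period per further column.
# This layout is a simplified assumption for illustration.
import gzip, os, tempfile

SAMPLE_TSV = "unit,geo\\time\t2009\t2010\nPC,AT\t1.5\t2.0\nPC,BE\t0.7\t1.1\n"

def read_observations(tsv_gz_path):
    """Yield (dimension-key, period, value) triples from a gzipped TSV."""
    with gzip.open(tsv_gz_path, "rt") as f:
        header = f.readline().rstrip("\n").split("\t")
        periods = header[1:]
        for line in f:
            cells = line.rstrip("\n").split("\t")
            for period, value in zip(periods, cells[1:]):
                yield cells[0], period, value

# Demo with a generated file standing in for tsieb010.tsv.gz:
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "tsieb010.tsv.gz")
with gzip.open(path, "wt") as f:
    f.write(SAMPLE_TSV)
obs = list(read_observations(path))
print(obs)
```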
Metadata
Generates the VoID file which will be used to populate the triple store described in Step 5 and Step 6:
How to Run on Windows: Metadata.bat -i c:/toc/table_of_contents.xml -o c:/test/
How to Run on Linux: sh Metadata.sh -i ~/toc/table_of_contents.xml -o ~/test/
where
* `i` is the file path of the table of contents (optional parameter)
* `o` is the output directory path where the VoID file will be stored
Type `-h` for help.
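A VoID file is a small RDF description of the published dataset. A sketch of the kind of Turtle this step produces; the dataset URI, title, and triple count below are placeholders:

```python
# Sketch of a minimal VoID description. The dataset URI, title, and triple
# count are placeholders, not the values the Metadata script emits.
def void_turtle(dataset_uri, title, triples):
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "<%s> a void:Dataset ;" % dataset_uri,
        '    dcterms:title "%s" ;' % title,
        "    void:triples %d ." % triples,
    ])

void_ttl = void_turtle("http://example.org/eurostat/void#ds", "Eurostat RDF", 1000)
print(void_ttl)
```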
DictionaryParser
Converts the dictionaries/codelists into RDF. It also generates a catalog file which is used to load all dictionaries/codelists into the triple store:
How to Run on Windows: DictionaryParser.bat -i c:/dicPath/ -o c:/outputPath/ -c c:/catalogPath/ -f TURTLE
How to Run on Linux: sh DictionaryParser.sh -i ~/dicPath/ -o ~/outputPath/ -c ~/catalogPath/ -f TURTLE
where
* `i` is the directory path where the dictionaries are stored
* `o` is the directory path where the RDF will be stored
* `c` is the directory path where the catalog file will be stored
* `f` is the format for RDF serialization (RDF/XML, TURTLE, N-TRIPLES). This RDF serialization is *only* used to create the catalog file. Dictionaries are generated only in RDF/XML format
Type `-h` for help.
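A dictionary/codelist maps codes to labels; converting one to RDF can be sketched as below. The tab-separated `.dic` layout and the output URIs are assumptions for illustration:

```python
# Sketch of a dictionary conversion: each code/label pair becomes a
# skos:Concept. The tab-separated .dic layout and the base URI are
# assumptions, not the actual converter output.
SAMPLE_DIC = "AT\tAustria\nBE\tBelgium\n"

def dic_to_turtle(dic_text, base="http://example.org/dic/geo#"):
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for row in dic_text.strip().split("\n"):
        code, label = row.split("\t")
        lines.append('<%s%s> a skos:Concept ; skos:prefLabel "%s" .' % (base, code, label))
    return "\n".join(lines)

dic_ttl = dic_to_turtle(SAMPLE_DIC)
print(dic_ttl)
```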
EuroStatMirror
Downloads all the compressed dataset files from the Bulk Download page by extracting their URLs from the Table of Contents:
How to Run on Windows: EuroStatMirror.bat -p c:/zip/ -t c:/tsv/
How to Run on Linux: sh EuroStatMirror.sh -p ~/zip/ -t ~/tsv/
where
* `p` is the directory path where the `zip` files are downloaded
* `t` is the directory path where the `tsv` files are downloaded
Type `-h` for help.
Main
Converts the complete Eurostat datasets into RDF:
How to Run: sh Main.sh -i ~/sdmx-code.ttl -l ~/logs/
where
* `i` is the file path of `sdmx-code.ttl`. It can be downloaded from http://code.google.com/p/publishing-statistical-data/source/browse/trunk/specs/src/main/vocab/sdmx-code.ttl
* `l` is the directory path where logs will be generated
Type `-h` for help.
Dataset Titles
Generates the titles of the datasets in RDF.
How to Run: sh DatasetTitles.sh -o ~/title/
where
* `o` is the output directory path where the RDF will be stored
Type `-h` for help.