CKAN Datahub Page Creation

The CKAN Datahub Page Creation module creates a page for the dataset in CKAN Datahub. It does so through the CKAN API, which follows a RESTful style and uses JSON by default.

The module publishes the dataset and automatically fills in the following information about it:

A description of the dataset
The author of the dataset
The source of the dataset
The license of the dataset
Resources associated with it:
- The dataset SPARQL endpoint.
- The compressed dataset dump files (one dump file for every subset).
- A VoID file in Turtle format describing the dataset.
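
As a rough illustration of the publishing step, the sketch below assembles a dictionary shaped like the JSON body that CKAN's `package_create` action accepts. It is not the module's actual code: the function name `build_package_payload`, its parameter names, the resource names and the `format` labels are assumptions made for this example. Only the list of published items comes from the description above.

```python
import json


def build_package_payload(name, description, author, source_url, license_id,
                          owner_org, sparql_endpoint, dump_urls, void_url):
    """Assemble a dict shaped like the body of CKAN's package_create action.

    Hypothetical helper: the mapping of items to CKAN fields is an
    assumption, not the module's documented behaviour.
    """
    # Resource names and format labels below are illustrative only.
    resources = [
        {"name": "SPARQL endpoint", "url": sparql_endpoint, "format": "api/sparql"},
    ]
    # One compressed dump file per subset.
    for url in dump_urls:
        resources.append({"name": "Dataset dump", "url": url, "format": "gzip"})
    resources.append({"name": "VoID description", "url": void_url, "format": "meta/void"})
    return {
        "name": name,
        "notes": description,      # CKAN stores the description in "notes"
        "author": author,
        "url": source_url,         # source of the dataset
        "license_id": license_id,
        "owner_org": owner_org,
        "resources": resources,
    }


if __name__ == "__main__":
    payload = build_package_payload(
        "demo-dataset", "A demo dataset", "Aliada Consortium",
        "http://example.org/source", "cc-zero", "demo-org",
        "http://data.example.org/sparql",
        ["http://data.example.org/dumps/a.gz"],
        "http://data.example.org/void.ttl",
    )
    print(json.dumps(payload, indent=2))
```

Building the payload separately from sending it keeps the field mapping easy to inspect before any call is made to the CKAN API.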

The VoID file describing the dataset contains the following information:

The web home page of the dataset
The dataset page in CKAN Datahub
The title of the dataset
The description of the dataset
The publisher of the dataset
The source of the dataset
The day the dataset was created
The contributor to the dataset creation process (ALIADA consortium)
The license of the dataset
The SPARQL endpoint of the dataset
The vocabulary used by the dataset (ALIADA ontology)
The number of triples of the dataset
The data dumps of the dataset
The subsets of the dataset
For each subset:
- The title of the subset
- The number of triples of the subset
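
To make the shape of such a file concrete, here is a minimal sketch that renders a few of the properties listed above as Turtle, using terms from the VoID and Dublin Core vocabularies. The helper names (`literal`, `describe`, `void_turtle`) and the example.org URIs are hypothetical, and the choice of property for each item is an assumption. The file the module actually generates may use different terms and covers more properties.

```python
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
)


def literal(text):
    """Quote a string as a Turtle literal, escaping \\, \" and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe(subject, pairs):
    """Render one void:Dataset resource with its predicate/object pairs."""
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return f"<{subject}> a void:Dataset ;\n{body} .\n"


def void_turtle(dataset_uri, title, sparql_endpoint, triples, subsets):
    """Build a small VoID document.

    `subsets` is a list of (subset_uri, title, triples) tuples.
    """
    pairs = [
        ("dcterms:title", literal(title)),
        ("void:sparqlEndpoint", f"<{sparql_endpoint}>"),
        ("void:triples", str(triples)),
    ]
    # Link the dataset to each of its subsets.
    pairs += [("void:subset", f"<{uri}>") for uri, _, _ in subsets]
    blocks = [describe(dataset_uri, pairs)]
    # Each subset gets its own title and triple count.
    for uri, sub_title, sub_triples in subsets:
        blocks.append(describe(uri, [
            ("dcterms:title", literal(sub_title)),
            ("void:triples", str(sub_triples)),
        ]))
    return PREFIXES + "\n".join(blocks)


if __name__ == "__main__":
    print(void_turtle(
        "http://data.example.org/", "Demo dataset",
        "http://data.example.org/sparql", 10,
        [("http://data.example.org/set/a", "Subset A", 4)],
    ))
```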

REST Interface

The CKAN Datahub Page Creation module provides a RESTful interface. It offers the following services:

Create a new CKAN Datahub Page Creation job. The identifier of the job to be started must be provided, and it must be a valid integer.
- method: POST
- URL: http://<host>:<port>/ckan-datahub/job
- parameters sent inside a form (APPLICATION_FORM_URLENCODED):
  - jobid=<job identifier>
Get the state/info of a CKAN Datahub Page Creation job. The identifier of the job must be provided, and it must be a valid integer.
- method: GET
- URL: http://<host>:<port>/ckan-datahub/job/<job identifier>
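
The two services above could be called from a client along the following lines. This is a sketch using only the Python standard library. The function names, the `base_url` value and the `Accept` header (used here to ask for JSON) are assumptions for illustration, since the page does not say how the response format is selected.

```python
import urllib.parse
import urllib.request


def create_job_request(base_url, job_id):
    """Build the POST request that starts a job (form-encoded `jobid`)."""
    # int() rejects identifiers that are not valid integers.
    body = urllib.parse.urlencode({"jobid": int(job_id)}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/ckan-datahub/job",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def job_info_request(base_url, job_id):
    """Build the GET request that fetches the state/info of a job."""
    req = urllib.request.Request(
        f"{base_url}/ckan-datahub/job/{int(job_id)}", method="GET"
    )
    # Assumption: the service honours the Accept header.
    req.add_header("Accept", "application/json")
    return req


if __name__ == "__main__":
    # Hypothetical host and port; replace with the real deployment.
    req = create_job_request("http://localhost:8080", 188)
    print(req.get_method(), req.full_url, req.data)
    # To actually send it: urllib.request.urlopen(req)
```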

When the CKAN Datahub Page Creation module receives either of these service invocations, it reads the input parameters of the job from the aliada.ckancreation_job_instances table of a relational DB. The parameters for connecting to this DB are obtained from the "context.xml" file of the CKAN Datahub Page Creation module. Both services return an XML or JSON structure with the following information:

id: the job identifier.
startDate: the starting date of the job.
endDate: the end date of the job.
status: the status of the job. Possible values:
- idle: the job hasn't started yet. That is, the DB table row exists, but the job creation REST service hasn't been invoked yet.
- running: the job is still running.
- finished: the job has finished.
ckanOrgURL: the URL of the organization page in CKAN Datahub.
ckanDatasetURL: the URL of the dataset page in CKAN Datahub.

Here is an example in JSON format:

    {
        "ckanDatasetURL": "http://datahub.io/dataset/datos-artium-org",
        "ckanOrgURL": "http://datahub.io/organization/artium",
        "endDate": "2015-04-22T13:05:41",
        "id": 188,
        "startDate": "2015-04-22T13:04:58",
        "status": "finished"
    }
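
A client might turn such a response into a typed object as sketched below. The `JobInfo` class and `parse_job_info` function are hypothetical names. Treating missing or empty fields as `None` (for example for a job that has not finished) is an assumption, since the page only shows a finished job.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class JobInfo:
    id: int
    status: str
    start_date: Optional[datetime]
    end_date: Optional[datetime]
    ckan_org_url: Optional[str]
    ckan_dataset_url: Optional[str]


def _parse_date(value):
    """Parse an ISO-8601 timestamp, or return None if it is absent."""
    return datetime.fromisoformat(value) if value else None


def parse_job_info(text):
    """Convert the JSON structure returned by the services into a JobInfo."""
    data = json.loads(text)
    return JobInfo(
        id=int(data["id"]),
        status=data["status"],
        start_date=_parse_date(data.get("startDate")),
        end_date=_parse_date(data.get("endDate")),
        ckan_org_url=data.get("ckanOrgURL"),
        ckan_dataset_url=data.get("ckanDatasetURL"),
    )
```

With the example response above, `parse_job_info(...)` yields a `JobInfo` whose `end_date - start_date` is 43 seconds.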

Relational DB tables used

The CKAN Datahub Page Creation module uses the following tables:

Table aliada.ckancreation_job_instances. This table stores the configuration parameters and the state of each job instance. The configuration parameters are set by the module that creates the job instance in the DB, that is, the user interface (UI) module. The state-related fields are set by the job itself.
Table aliada.dataset. This table contains information about the dataset to be published in CKAN Datahub.
Table aliada.subset. This table contains information about the subsets of a dataset.

Table aliada.ckancreation_job_instances

This table contains the following fields, grouped into configuration fields and state-related fields:

job_id: the job identifier.
Configuration fields:
- ckan_api_url: URL of the RESTful API of CKAN Datahub.
- ckan_api_key: Key to use the RESTful API of CKAN Datahub.
- tmp_dir: the name of the temporary folder used to store the organisation logo image. Afterwards, the image is copied to a folder under the web page folder of the dataset.
- store_ip: IP address of the machine where the RDF store resides.
- store_sql_port: port of the RDF store for SQL access.
- sql_login: the login of the SQL access.
- sql_password: the password of the SQL access.
- isql_command_path: full path to the ISQL command.
- isql_commands_file_graph_dump: full path of the ISQL commands file to dump the triples of a graph in Virtuoso into a compressed file.
- virtuoso_http_server_root: full path of Virtuoso HTTP server root folder, where the web page for the dataset resides.
- aliada_ontology: ALIADA ontology URI.
- org_name: organization name in CKAN Datahub.
- org_description: organization description.
- org_home_page: organization home page.
- datasetId: dataset identifier used to get the dataset information from the dataset table.
- organisationId: organization identifier used to get the organization information from the organisation table.
State fields:
- ckan_org_url: the URL of the organization page in CKAN Datahub.
- ckan_dataset_url: the URL of the dataset page in CKAN Datahub.
- start_date: the starting date of the job.
- end_date: the end date of the job.
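
The table has no explicit status column among the fields listed here, so one plausible reading is that the status reported by the REST services is derived from the two date fields. The sketch below shows that reading. It is an assumption, not documented behaviour, and `job_status` is a hypothetical name.

```python
def job_status(start_date, end_date):
    """Derive a job status from the state fields (assumed mapping).

    - no start_date            -> "idle" (row exists, job not started)
    - start_date, no end_date  -> "running"
    - both dates set           -> "finished"
    """
    if start_date is None:
        return "idle"
    if end_date is None:
        return "running"
    return "finished"
```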

Table aliada.dataset

This table contains the following fields:

datasetId: dataset identifier.
organisationId: organization identifier.
dataset_desc: dataset description.
domain_name: dataset domain name, e.g.: data.artium.org
uri_id_part: used to generate identifier URIs, e.g.: "id", URI: http://data.szepmuveszeti.hu/id/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
uri_doc_part: used to generate document URIs, e.g.: "doc", URI: http://data.szepmuveszeti.hu/doc/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
uri_def_part: used to generate the ontology URIs, e.g.: "def", URI: http://data.szepmuveszeti.hu/def/museumcollection
uri_concept_part: used in all URI types as a prefix to give a description of the dataset in the URI, e.g.: "data", URI: http://data.szepmuveszeti.hu/id/data/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
uri_set_part: used to generate the subset URIs, e.g.: "set", URI: http://data.artium.org/set/library/bib
listening_host: the address of the network interface that the Virtuoso HTTP server uses to listen for and accept connections.
virtual_host: the virtual host name that the browser presents as the Host: entry in the request headers, i.e. name-based virtual hosting. It has the same value as dataset.domain_name.
sparql_endpoint_uri: SPARQL endpoint URI.
sparql_endpoint_login: SPARQL endpoint user name.
sparql_endpoint_password: SPARQL endpoint password.
public_sparql_endpoint_uri: public SPARQL endpoint URI.
dataset_author: dataset author name. E.g.: Aliada Consortium.
ckan_dataset_name: dataset name in CKAN Datahub.
dataset_long_desc: dataset long description for CKAN Datahub.
dataset_source_url: URL of the data source from which the dataset has been generated.
license_ckan_id: CKAN license identifier of the dataset to be published in CKAN Datahub. E.g.: cc-zero.
license_url: license URL of the dataset to be published in CKAN Datahub. E.g.: http://creativecommons.org/publicdomain/zero/1.0/
isql_commands_file_dataset: full path of the ISQL commands file to execute for the dataset. If it is null or it does not exist, the linkeddataserver_job_instances.isql_commands_file_dataset_default field will be used.
dataset_web_page_root: full path of the dataset web page folder.

Table aliada.subset

This table contains the following fields:

datasetId: dataset identifier.
subsetId: subset identifier.
subset_desc: subset description.
uri_concept_part: used in all URI types as a prefix to give a description of the subset in the URI, e.g.: "museumcollection", URI: http://data.szepmuveszeti.hu/id/data/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
graph_uri: URI of the graph in Virtuoso where the generated RDF triples are saved.
links_graph_uri: URI of the graph in Virtuoso where the discovered links are saved.
isql_commands_file_subset: full path of the ISQL commands file to execute for the subset. If it is null or it does not exist, the linkeddataserver_job_instances.isql_commands_file_subset_default field will be used.
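
The URI examples given for the dataset and subset tables suggest that the URI parts are joined in a fixed order: domain name, type part (such as "id" or "doc"), dataset concept part, subset concept part, then the remaining path segments. The sketch below reproduces those examples under that assumption. `build_uri` is a hypothetical helper, and skipping empty parts is inferred from the examples rather than stated in the text.

```python
def build_uri(domain_name, type_part, dataset_concept_part,
              subset_concept_part, *tail):
    """Join the URI parts in the order suggested by the examples.

    Empty or None parts are skipped, so a dataset without a concept
    part still yields a well-formed URI.
    """
    parts = [type_part, dataset_concept_part, subset_concept_part, *tail]
    path = "/".join(p.strip("/") for p in parts if p)
    return f"http://{domain_name}/{path}"


if __name__ == "__main__":
    print(build_uri(
        "data.szepmuveszeti.hu", "id", "data", "museumcollection",
        "E18_Physical_Thing", "szepmuveszeti.hu_object_29",
    ))
```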
