CKAN_Datahub_Page_Creation - ALIADA/aliada-tool GitHub Wiki
The CKAN Datahub Page Creation module creates a page for the dataset in CKAN Datahub. This is carried out by using the CKAN API, which follows a RESTful style and uses JSON by default.
This module publishes the dataset automatically, filling in the following data about it:
- A description of the dataset
- The author of the dataset
- The source of the dataset
- The license of the dataset
- Resources associated with it:
  - The dataset SPARQL endpoint
  - The compressed dataset dump files (one dump file for every subset)
  - A VoID file in Turtle format describing the dataset
The VoID file describing the dataset contains the following information:
- The web home page of the dataset
- The dataset page in CKAN Datahub
- The title of the dataset
- The description of the dataset
- The publisher of the dataset
- The source of the dataset
- The day the dataset was created
- The contributor to the dataset creation process (ALIADA consortium)
- The license of the dataset
- The SPARQL endpoint of the dataset
- The vocabulary used by the dataset (ALIADA ontology)
- The number of triples of the dataset
- The data dumps of the dataset
- The subsets of the dataset
- For each subset:
  - The title of the subset
  - The number of triples of the subset
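As an illustration only, several of the fields listed above map naturally onto terms from the VoID and Dublin Core vocabularies. The Python sketch below renders a reduced description as Turtle. The choice of properties, the function name `render_void` and every sample value are assumptions made for this example; the file actually produced by the module may differ.

```python
def _literal(text):
    """Quote a single-line string as a Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_void(dataset_uri, title, description, sparql_endpoint, triples, subsets):
    """Render a reduced VoID description in Turtle.

    `subsets` is a list of (subset_uri, subset_title, subset_triples) tuples.
    The property mapping is an assumption, not the module's actual output.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    dcterms:title {_literal(title)} ;",
        f"    dcterms:description {_literal(description)} ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triples)} ;",
    ]
    for subset_uri, _title, _triples in subsets:
        lines.append(f"    void:subset <{subset_uri}> ;")
    # Close the dataset block: turn the trailing ';' into '.'.
    lines[-1] = lines[-1][:-1] + "."

    # One small block per subset, with its title and triple count.
    for subset_uri, subset_title, subset_triples in subsets:
        lines += [
            "",
            f"<{subset_uri}> a void:Dataset ;",
            f"    dcterms:title {_literal(subset_title)} ;",
            f"    void:triples {int(subset_triples)} .",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: the endpoint URL and triple counts are invented.
    print(render_void(
        "http://data.artium.org/",
        "Example dataset",
        "Example description",
        "http://data.artium.org/sparql",
        10,
        [("http://data.artium.org/set/library/bib", "bib", 4)],
    ))
```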
The CKAN Datahub Page Creation module provides a RESTful interface. It offers the following services:
- Create a new CKAN Datahub Page Creation job. The identifier of the job to be started must be provided, and it must be a valid integer.
  - method: POST
  - URL: http://<host>:<port>/ckan-datahub/job
  - parameters sent inside a form (APPLICATION_FORM_URLENCODED):
    - jobid=<job identifier>
- Get the state/info of a CKAN Datahub Page Creation job. The identifier of the job must be provided, and it must be a valid integer.
  - method: GET
  - URL: http://<host>:<port>/ckan-datahub/job/<job identifier>
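The two services above could be called from Python roughly as follows. This is a minimal sketch using only the standard library. The function names, the `Accept: application/json` header (which assumes the service selects JSON through content negotiation) and the host and port in the usage note are assumptions, not part of the module.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_create_job_request(host, port, job_id):
    """Build the POST request that starts a job; jobid travels as form data."""
    body = urlencode({"jobid": int(job_id)}).encode("ascii")
    return Request(
        f"http://{host}:{port}/ckan-datahub/job",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",  # assumption: JSON via content negotiation
        },
    )


def build_job_status_request(host, port, job_id):
    """Build the GET request that returns the state/info of a job."""
    return Request(
        f"http://{host}:{port}/ckan-datahub/job/{int(job_id)}",
        method="GET",
        headers={"Accept": "application/json"},
    )


def send(request, timeout=30):
    """Send a request and decode the JSON body of the response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_create_job_request("localhost", 8080, 188))` would start job 188 on a locally deployed instance, and `send(build_job_status_request("localhost", 8080, 188))` would fetch its state. Converting the identifier with `int()` rejects non-integer values before any request is made.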
Once the CKAN Datahub Page Creation module receives any of these service invocations, it reads the input parameters of the job from table aliada.ckancreation_job_instances of a relational DB. The parameters to connect to this DB are obtained from the "context.xml" file of the CKAN Datahub Page Creation module.
The services will return an XML or JSON structure with the following information:
- id: the job identifier.
- startDate: the starting date of the job.
- endDate: the end date of the job.
- status: the status of the job. Possible values:
  - idle: the job hasn't started yet. That is, the DB table row exists, but the job creation REST service hasn't been invoked yet.
  - running: the job is still running.
  - finished: the job has finished.
- ckanOrgURL: the URL of the organization page in CKAN datahub.
- ckanDatasetURL: the URL of the dataset page in CKAN datahub.
Here is an example in JSON format:
{
  "ckanDatasetURL": "http://datahub.io/dataset/datos-artium-org",
  "ckanOrgURL": "http://datahub.io/organization/artium",
  "endDate": "2015-04-22T13:05:41",
  "id": 188,
  "startDate": "2015-04-22T13:04:58",
  "status": "finished"
}
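A client might turn such a response into a typed object before using it. The sketch below is one possible way to do that; the `JobInfo` class and `parse_job_info` function are hypothetical names, and treating every field except `id` and `status` as optional is an assumption (an idle job, for instance, may carry no dates).

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# The example response shown above.
SAMPLE = (
    '{"ckanDatasetURL":"http://datahub.io/dataset/datos-artium-org",'
    '"ckanOrgURL":"http://datahub.io/organization/artium",'
    '"endDate":"2015-04-22T13:05:41","id":188,'
    '"startDate":"2015-04-22T13:04:58","status":"finished"}'
)


@dataclass
class JobInfo:
    id: int
    status: str
    start_date: Optional[datetime]
    end_date: Optional[datetime]
    ckan_org_url: Optional[str]
    ckan_dataset_url: Optional[str]

    @property
    def duration_seconds(self):
        """Elapsed time of the job, or None if either date is missing."""
        if self.start_date is None or self.end_date is None:
            return None
        return (self.end_date - self.start_date).total_seconds()


def _parse_date(value):
    # Dates in the example use the ISO 8601 form YYYY-MM-DDTHH:MM:SS.
    return datetime.fromisoformat(value) if value else None


def parse_job_info(text):
    """Parse the JSON structure returned by the services into a JobInfo."""
    raw = json.loads(text)
    return JobInfo(
        id=int(raw["id"]),
        status=raw["status"],
        start_date=_parse_date(raw.get("startDate")),
        end_date=_parse_date(raw.get("endDate")),
        ckan_org_url=raw.get("ckanOrgURL"),
        ckan_dataset_url=raw.get("ckanDatasetURL"),
    )


if __name__ == "__main__":
    info = parse_job_info(SAMPLE)
    print(info.status, info.duration_seconds)
```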
The CKAN Datahub Page Creation module uses the following tables:
- Table aliada.ckancreation_job_instances. This table is used for saving the configuration parameters and the state of each job instance. The configuration parameters are set by the module that creates the job instance in the DB, that is, the UI module. The state-related fields are set by the job itself.
- Table aliada.dataset. This table contains information about the dataset to be published in CKAN Datahub.
- Table aliada.subset. This table contains information about the subsets of a dataset.
The aliada.ckancreation_job_instances table contains the following fields, grouped into configuration fields and state fields:
- job_id
- Configuration fields:
  - ckan_api_url: URL of the RESTful API of CKAN Datahub.
  - ckan_api_key: key to use the RESTful API of CKAN Datahub.
  - tmp_dir: the name of the temporary folder used to store the organisation logo image temporarily. Afterwards, the image is copied to a folder under the web page folder of the dataset.
  - store_ip: IP address of the machine where the RDF store resides.
  - store_sql_port: port of the RDF store for SQL access.
  - sql_login: the login of the SQL access.
  - sql_password: the password of the SQL access.
  - isql_command_path: full path to the ISQL command.
  - isql_commands_file_graph_dump: full path of the ISQL commands file to dump the triples of a graph in Virtuoso into a compressed file.
  - virtuoso_http_server_root: full path of the Virtuoso HTTP server root folder, where the web page for the dataset resides.
  - aliada_ontology: ALIADA ontology URI.
  - org_name: organization name in CKAN Datahub.
  - org_description: organization description.
  - org_home_page: organization home page.
  - datasetId: dataset identifier to get the dataset information from the dataset table.
  - organisationId: organization identifier to get the organization information from the organisation table.
- State fields:
  - ckan_org_url: the URL of the organization page in CKAN Datahub
  - ckan_dataset_url: the URL of the dataset page in CKAN Datahub
  - start_date
  - end_date
The aliada.dataset table contains the following fields:
- datasetId: dataset identifier.
- organisationId: organization identifier
- dataset_desc: dataset description.
- domain_name: dataset domain name, e.g.: data.artium.org
- uri_id_part: used to generate identifier URIs, e.g. "id", as in the URI http://data.szepmuveszeti.hu/id/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
- uri_doc_part: used to generate document URIs, e.g. "doc", as in the URI http://data.szepmuveszeti.hu/doc/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
- uri_def_part: used to generate the ontology URIs, e.g. "def", as in the URI http://data.szepmuveszeti.hu/def/museumcollection
- uri_concept_part: used in all URI types as a prefix to give a description of the dataset in the URI, e.g. "data", as in the URI http://data.szepmuveszeti.hu/id/data/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
- uri_set_part: used to generate the subset URIs, e.g. "set", as in the URI http://data.artium.org/set/library/bib
- listening_host: the address of the network interface the Virtuoso HTTP server uses to listen and accept connections.
- virtual_host: the virtual host name that the browser presents in the Host: entry of the request headers, i.e. name-based virtual hosting. It will have the same value as dataset.domain_name.
- sparql_endpoint_uri: SPARQL endpoint URI.
- sparql_endpoint_login: SPARQL endpoint user name.
- sparql_endpoint_password: SPARQL endpoint password.
- public_sparql_endpoint_uri: public SPARQL endpoint URI.
- dataset_author: dataset author name. E.g.: Aliada Consortium.
- ckan_dataset_name: dataset name in CKAN datahub.
- dataset_long_desc: dataset long description for CKAN datahub.
- dataset_source_url: URL of the data source from where the dataset has been generated.
- license_ckan_id: CKAN license identifier of the dataset to be published in CKAN datahub. E.g.: cc-zero.
- license_url: license URL of the dataset to be published in CKAN datahub. E.g.: http://creativecommons.org/publicdomain/zero/1.0/
- isql_commands_file_dataset: full path of the ISQL commands file to execute for the dataset. If it is null or it does not exist, the linkeddataserver_job_instances.isql_commands_file_dataset_default field will be used.
- dataset_web_page_root: full path of the dataset web page folder.
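The example URIs in the field descriptions suggest how the URI parts are combined. The sketch below reproduces two of those examples; the helper name `build_uri` and the segment order (URI type part, dataset concept part, subset concept part, then the remaining path) are inferred from the examples and should be read as an assumption, not as the module's documented algorithm.

```python
def build_uri(domain_name, *parts):
    """Join a domain name and URI path parts, skipping empty or missing parts."""
    segments = [part.strip("/") for part in parts if part]
    return "http://" + domain_name + "/" + "/".join(segments)


if __name__ == "__main__":
    # Identifier URI: uri_id_part, dataset uri_concept_part,
    # subset uri_concept_part, then class name and local identifier.
    print(build_uri(
        "data.szepmuveszeti.hu",
        "id",
        "data",
        "museumcollection",
        "E18_Physical_Thing",
        "szepmuveszeti.hu_object_29",
    ))
    # Subset URI built from uri_set_part.
    print(build_uri("data.artium.org", "set", "library", "bib"))
```

Empty parts are skipped so that a dataset without a uri_concept_part still yields a well-formed path.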
The aliada.subset table contains the following fields:
- datasetId: dataset identifier.
- subsetId: subset identifier.
- subset_desc: subset description.
- uri_concept_part: used in all URI types as a prefix to give a description of the subset in the URI, e.g. "museumcollection", as in the URI http://data.szepmuveszeti.hu/id/data/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
- graph_uri: URI of the graph in Virtuoso where the generated RDF triples are saved.
- links_graph_uri: URI of the graph in Virtuoso where the discovered links are saved.
- isql_commands_file_subset: full path of the ISQL commands file to execute for the subset. If it is null or it does not exist, the linkeddataserver_job_instances.isql_commands_file_subset_default field will be used.
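The fallback rule described for isql_commands_file_dataset and isql_commands_file_subset can be sketched as below. The function name `resolve_commands_file` is hypothetical, and reading "does not exist" as "the path does not point to an existing file" is an assumption.

```python
import os


def resolve_commands_file(specific_path, default_path):
    """Return the dataset/subset commands file if usable, else the default.

    `specific_path` is the value of isql_commands_file_dataset or
    isql_commands_file_subset; `default_path` is the matching *_default
    field of linkeddataserver_job_instances.
    """
    # Use the specific file only when it is set and points to a real file.
    if specific_path and os.path.isfile(specific_path):
        return specific_path
    return default_path
```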