Datasource and access metadata files - ProofDrivenQuerying/pdq GitHub Wiki
PDQ's executables are run with three types of files.
This part of the guide describes the data source and access metadata.
Schema files give high-level information about relations and access methods available on a relation.
This is sufficient for the PDQ planning module.
But they do not contain any information about how to perform an access at runtime.
PDQ has separate specification files that describe this, with different specifications supported for database-based relations, main-memory relations (data files that are read into memory) and for relations accessed via web services.
These files would typically be written only once per datasource.
Assuming datasource description files are available, PDQ users would not need to be aware of the details of these specifications.
For each access method m
contained in a schema.xml
file, there should be a file m.xml
that contains the datasource details needed to invoke m
.
The PDQ runtime will look for these methods in a directory that is listed in the case.properties
file. By default the directory is called 'accesses'.
A very simple datasource description describes a relation stored in a relational database.
The data source description has some basic connectivity information for the DB (see the key-value pairs contained in <dbproperties>
below); this is analogous to what
one finds in JDBC or ODBC. It also describes mappings between attributes in the real DB and
the attributes of the access method in the schema.xml
file, allowing some renaming of attributes.
Show an example
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Access access-type="DB_ACCESS_METHOD" name="m1" relation-name="customer">
<attribute name="c_custkey" type="java.lang.Integer" input="false" mapsToRelationAttribute="c_custkey"/>
<attribute name="c_name" type="java.lang.String" input="false" mapsToRelationAttribute="c_name"/>
<attribute name="c_address" type="java.lang.String" input="false" mapsToRelationAttribute="c_address"/>
<attribute name="c_nationkey" type="java.lang.Integer" input="false" mapsToRelationAttribute="c_nationkey"/>
<attribute name="c_phone" type="java.lang.String" input="false" mapsToRelationAttribute="c_phone"/>
<attribute name="c_acctbal" type="java.math.BigDecimal" input="false" mapsToRelationAttribute="c_acctbal"/>
<attribute name="c_mktsegment" type="java.lang.String" input="false" mapsToRelationAttribute="c_mktsegment"/>
<attribute name="c_comment" type="java.lang.String" input="false" mapsToRelationAttribute="c_comment"/>
<dbProperties>
<entry>
<key xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">password</key>
<value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">root</value>
</entry>
<entry>
<key xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">url</key>
<value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">jdbc:postgresql://localhost:5432/</value>
</entry>
<entry>
<key xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">database</key>
<value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">tpch</value>
</entry>
<entry>
<key xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">username</key>
<value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">postgres</value>
</entry>
</dbProperties>
</Access>
For each access m
implemented by a REST web service, we have an XML document describing how m is implemented.
In the example below, an access method XML_WEB_ACCESS_METHOD
is described in a file myaccess.xml
.
The file describes the attributes returned by the service and their mapping to attributes in the relation.
It also describes which attributes are inputs -- in this case, it is only nationkey
.
The template or the post-parameter can be used to describe how the input values are passed.
The different inputs are referred to by their index within the set of all inputs; in the example there is only one input attribute, and thus only one index, namely 0
.
The value for this attribute is passed as a value through post.
Show an example
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Access access-type="XML_WEB_ACCESS_METHOD" name="mt_4" relation-name="TestRelation1">
<attribute name="n_name" type="java.lang.String" input="false" mapsToRelationAttribute="name"/>
<attribute name="n_nationkey" type="java.lang.Integer" input="true" mapsToRelationAttribute="key"/>
<attribute name="n_comment" type="java.lang.String" input="false" mapsToRelationAttribute="comment"/>
<attribute name="n_regionkey" type="java.lang.Integer" input="false"/>
<url-template>http://pdq-webapp.cs.ox.ac.uk:80/webapp/servlets/servlet/NationInput</url-template>
<post-parameter name="n_nationkey" value="{0}"/>
</Access>
For parameter values that are passed through the query string, rather than via post, one just puts the placeholder {i}
in the url-template.
An access can be implemented by reading in data from a data file. In this case the implementation description just gives the name of the file and a mapping of attribute names.
Show an example
In the example below, the access method m10
on relation nation is implemented by looking
up in a file located at ./data/nation_m10.csv
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Access access-type="IN_MEMORY_ACCESS_METHOD" name="m10" relation-name="nation">
<attribute name="n_nationkey" type="java.lang.Integer" input="false" mapsToRelationAttribute="n_nationkey"/>
<attribute name="n_name" type="java.lang.String" input="false" mapsToRelationAttribute="n_name"/>
<attribute name="n_comment" type="java.lang.String" input="false" mapsToRelationAttribute="n_comment"/>
<attribute name="n_regionkey" type="java.lang.Integer" input="false" mapsToRelationAttribute="n_regionkey"/>
<data-scv-file>./data/nation_m10.csv</data-scv-file>
</Access>
Each relation may also have a catalog.properties file giving some statistical information used to cost plans. For a relation that is stored in a database, PDQ can extract these files from reading the database manager's catalog automatically. But for web services there must be rewritten by hand.
Show an example
Here is an example catalog file for an example university schema, which might be implemented by a web service:
RE:Profinfo BI:m2 RT:5
RE:Udirectory BI:m3 RT:5
RE:Profinfo CA:1
RE:Udirectory CA:3
It says that the scenario deals with 2 relations, and it provides some (unrealistic) cardinality information about each. The relations have one access method each, and it provides information about the number of returned tuples per access.