Parameter files - ProofDrivenQuerying/pdq GitHub Wiki
PDQ's executables are run with three types of files.
This portion of the guide explains the format of the parameter files.
Each project comes with its own `pdq-<project>.properties` file, which is read when the program starts and contains project-specific information.
The default location of the file is the execution directory, but an alternate location can be passed on the command line to override the default (see here for more details).
Only one parameter file is read per execution, so it should contain parameters for the project being executed and for all other projects that it depends on. For instance, when running the `planner` project independently, the properties file may include parameters specific to `common`, `cost` and `reasoning`, which `planner` depends on.
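As a rough sketch of such a combined file, a `pdq-planner.properties` used to run the planner on its own might mix keys from several projects. The key names are taken from the parameter tables on this page and the file name follows the `pdq-<project>.properties` pattern; every value is an illustrative assumption, not a recommended setting.

```properties
# pdq-planner.properties (hypothetical example; all values are placeholders)

# Parameters shared across projects
seed=0
timeout=60000

# Cost estimation
costType=TEXTBOOK
cardinalityEstimationType=NAIVE

# Reasoning
reasoningType=RESTRICTED_CHASE

# Planning
plannerType=DAG_OPTIMIZED
```

Keeping the keys grouped by project with comment lines makes it easier to see which dependency each setting belongs to.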
Individual parameters may also be overridden directly from the command line using the `-D` option. For example,
`java -jar PDQ_v.X.X.X_full.jar <MODULE> -c /path/to/parameter/file -Dtimeout=10000`
forces the timeout to 10 seconds, regardless of the value specified in the given parameter file.
Parameters are specific to each sub-project; the details for each of them are given below.

The `common` project accepts the following parameters:

Parameter | Description | Default value |
---|---|---|
`seed` | Randomizer seed shared by all randomizers across the PDQ libraries | 0 |
`timeout` | Time limit (in ms) | ∞ |
`databaseDriver` | Canonical name of the driver class for the internal database used by the reasoner | |
`connectionUrl` | Connection URL for the internal database used by the reasoner | |
`databaseName` | Name of the internal database used by the reasoner | |
`databaseUser` | Username for the internal database used by the reasoner | |
`databasePassword` | Password for the internal database used by the reasoner | |
`numberOfThreads` | Number of threads and connections used to access the database | 10 |
`useInternalDatabaseManager` | True if the internal database manager should be used | true |
`factsAreUnique` | Whether the database should enforce a constraint making every fact unique | false |
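For illustration, the database-related parameters above might be set as in the fragment below. It is only a sketch: the use of PostgreSQL, the driver class, the URL, the database name and the credentials are all assumptions, and should be replaced with the values of your own installation.

```properties
# Hypothetical database settings (PostgreSQL is an assumption)
timeout=30000
databaseDriver=org.postgresql.Driver
connectionUrl=jdbc:postgresql://localhost/
databaseName=pdq
databaseUser=pdq_user
databasePassword=change_me

# Number of threads and connections used to access the database
numberOfThreads=10
factsAreUnique=false
```

Since `timeout` is expressed in milliseconds, the value `30000` in this sketch corresponds to 30 seconds.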

The `cost` project accepts the following parameters:

Parameter | Description | Default value |
---|---|---|
`blackBoxConnectionUrl` | Connection URL for the database used by the `BLACKBOX_DB` cost estimator (required if `costType=BLACKBOX_DB`) | |
`blackBoxDatabaseName` | Name of the database used by the `BLACKBOX_DB` cost estimator (required if `costType=BLACKBOX_DB`) | |
`blackBoxDatabaseDriver` | Driver for the database used by the `BLACKBOX_DB` cost estimator (required if `costType=BLACKBOX_DB`) | |
`blackBoxDatabaseUser` | Username for the database used by the `BLACKBOX_DB` cost estimator (required if `costType=BLACKBOX_DB`) | |
`blackBoxDatabasePassword` | Password for the database used by the `BLACKBOX_DB` cost estimator (required if `costType=BLACKBOX_DB`) | |
`costType` | Type of cost estimation to use. This affects the requirements on other planner parameters (if such requirements are violated, a `PlannerException` is thrown upon initialization of the `Planner`) | TEXTBOOK |
`cardinalityEstimationType` | Type of cardinality estimation to use | NAIVE |
`catalog` | File which stores the database metadata | |

The `costType` can be one of:

Value | Description |
---|---|
`FIXED_COST_PER_ACCESS` | Estimates the cost as the sum of the costs of all accesses in a plan, where access costs are provided externally |
`COUNT_NUMBER_OF_ACCESSED_RELATIONS` | Estimates the cost as the sum of all accesses in a plan |
`TEXTBOOK` | Estimates the cost through some externally defined cost function. Currently, this defaults to the white box cost functions relying on textbook cost estimation techniques |
`BLACKBOX_DB` | Estimates the cost by translating the query to SQL and asking a database for its cost |
`NUMBER_OF_OUTPUT_TUPLES_PER_ACCESS` | Estimates the cost as the sum of the estimated result sizes per invocation associated with each access method used in a plan |

The `cardinalityEstimationType` can be one of:

Value | Description |
---|---|
`NAIVE` | Naive cardinality estimation, based on externally defined constant join/selectivity reduction factors |
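Because the `blackBox*` parameters are only required with the `BLACKBOX_DB` cost estimator, a configuration using that estimator could look roughly like the sketch below. The choice of PostgreSQL, the driver class, the URL, the database name and the credentials are assumptions made for the example.

```properties
# Hypothetical black-box cost estimation setup (placeholder values)
costType=BLACKBOX_DB
cardinalityEstimationType=NAIVE

# Required when the BLACKBOX_DB cost estimator is selected
blackBoxDatabaseDriver=org.postgresql.Driver
blackBoxConnectionUrl=jdbc:postgresql://localhost/
blackBoxDatabaseName=pdq_cost
blackBoxDatabaseUser=pdq_user
blackBoxDatabasePassword=change_me
```

With any other cost type, such as the default `TEXTBOOK`, the five `blackBox*` keys are not described as required.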

The `planner` project accepts the following parameters:

Parameter | Description | Default value |
---|---|---|
`maxIterations` | Maximum number of iterations to perform in planning (this may have different semantics depending on which planning algorithm is used) | Integer.MAX_VALUE |
`plannerType` | Type of planning algorithm to use | DAG_OPTIMIZED |
`k_termination` | Number of rounds of the chase to perform, assuming `LINEAR_KCHASE` is used as the planner type | 10 |
`queryMatchInterval` | Number of exploration intervals to wait between query match checks (used in linear planning algorithms only) | |
`postPruningType` | Type of post-pruning. This is only used in optimized linear planning | |
`chaseInterval` | Number of intervals between which to run the chase | |
`maxDepth` | Maximum depth of the exploration (this may also have different semantics depending on which planning algorithm is used) | |
`exceptionOnLimit` | If true, a `LimitReachedException` is thrown during planning if a limit (e.g. time or maximum number of iterations) is reached. Otherwise, the event is logged and the planning completes gracefully | |
`validatorType` | Type of validator to use. Only required in conjunction with DAG planning algorithms | DEFAULT |
`filterType` | Type of filter to use. Only required in conjunction with DAG planning algorithms | |
`dominanceType` | Type of dominance checks to use. Only required in conjunction with DAG planning algorithms | STRICT_OPEN |
`successDominanceType` | Type of success dominance checks to use. Only required in conjunction with DAG planning algorithms | OPEN |
`followUpHandling` | Specifies how follow-up joins should be handled (only applies to DAG planning algorithms) | MINIMAL |
`iterativeExecutorType` | Type of iterative executor to use (only applies to DAG planning algorithms) | MULTITHREADED |
`dagThreads` | Number of threads to use in the parallel DAG planning algorithm (where different threads explore different subplans/subproofs in the search space) | 10 |
`depthThreshold` | Threshold for the DEPTH_THROTTLING validator | 2 |
`useInternalDatabase` | If true, an internal database manager is used instead of the external one | true |
`dagThreadTimeout` | DAG thread timeout value. Should be smaller than the main reasoning timeout | 120000 |

The `plannerType` can be one of:

Value | Description |
---|---|
`LINEAR_GENERIC` | Generic (exhaustive) linear planning algorithm |
`LINEAR_OPTIMIZED` | Optimized linear planning algorithm |
`LINEAR_KCHASE` | Linear planning algorithm relying on the `KTERMINATION_CHASE` reasoning type |
`DAG_GENERIC` | Generic (exhaustive) DAG planning algorithm |
`DAG_OPTIMIZED` | DAG DP planning algorithm, relying on parallelism |

The `postPruningType` can be one of:

Value | Description |
---|---|
`REMOVE_ACCESSES` | Removes redundant accesses |
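As a sketch of how the linear planner settings fit together, the fragment below selects a linear algorithm and sets the parameters that the table describes as specific to linear planning. The numeric values are arbitrary placeholders chosen for the example.

```properties
# Hypothetical linear planning configuration (placeholder values)
plannerType=LINEAR_OPTIMIZED
maxIterations=1000
queryMatchInterval=1
postPruningType=REMOVE_ACCESSES

# Log the event and finish gracefully instead of throwing LimitReachedException
exceptionOnLimit=false

# Alternative, assuming the bounded chase is wanted instead:
# plannerType=LINEAR_KCHASE
# k_termination=10
```

Lines starting with `#` are comments in a properties file, so the alternative at the bottom stays inactive until it is uncommented.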

The `validatorType` can be one of:

Value | Description |
---|---|
`DEFAULT_VALIDATOR` | Requires the left and right configurations to be non-trivial: an ordered pair of configurations (left, right) is non-trivial if the output facts of the right configuration are not included in the output facts of the left configuration, and vice versa |
`APPLYRULE_VALIDATOR` | Requires the input pair of configurations to be non-trivial and at least one of the input configurations to be an `ApplyRule` |
`DEPTH_VALIDATOR` | Requires the input pair of configurations to be non-trivial and their combined depth to be <= the depth threshold |
`RIGHT_DEPTH_VALIDATOR` | Requires the input pair of configurations to be non-trivial and the right configuration's depth to be <= the depth threshold |
`LINEAR_VALIDATOR` | Requires the input pair of configurations to be non-trivial and their composition to be a closed left-deep configuration |

The `filterType` can be one of:

Value | Description |
---|---|
`FACT_DOMINATED_FILTER` | Removes the fact-dominated configurations after each exploration step |
`NUMERICALLY_DOMINATED_FILTER` | Removes the numerically fact-dominated configurations after each exploration step |

The `dominanceType` can be one of:

Value | Description |
---|---|
`CLOSED` | Closed dominance. Given two closed configurations, one dominates the other if its facts are contained in the facts of the other, up to homomorphism |
`OPEN` | Open dominance. Given two possibly open configurations, one dominates the other if |

The `successDominanceType` can be one of:

Value | Description |
---|---|
`CLOSED` | Closed dominance on successful configurations |
`OPEN` | Open dominance on successful configurations |

The `followUpHandling` can be one of:

Value | Description |
---|---|
`MINIMAL` | Minimal follow-up join. Upon initialization of a DAG plan search, every follow-up join gives rise to an independent `ApplyRule` |
`MAXIMAL` | Maximal follow-up join. Upon initialization of a DAG plan search, all follow-up joins give rise to a single, common `ApplyRule` |

The `iterativeExecutorType` can be one of:

Value | Description |
---|---|
`MULTITHREADED` | Multi-threaded executor for running DAG planning rounds in parallel |
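Pulling the DAG-specific options together, a DAG planning configuration might look like the sketch below. Key names and value spellings are copied from the tables above, and the particular combination is an assumption made for illustration. Note that the defaults column lists `DEFAULT` and `STRICT_OPEN`, which do not appear in the value tables, so the exact spellings are worth checking against the PDQ version in use.

```properties
# Hypothetical DAG planning configuration (illustrative combination)
plannerType=DAG_OPTIMIZED

# Pair validation: bound the combined depth of the two configurations
validatorType=DEPTH_VALIDATOR
depthThreshold=2

# Pruning of dominated configurations
filterType=FACT_DOMINATED_FILTER
dominanceType=OPEN
successDominanceType=OPEN

# Follow-up joins and parallel execution
followUpHandling=MINIMAL
iterativeExecutorType=MULTITHREADED
dagThreads=10

# Kept below the main timeout, as the parameter table advises
timeout=600000
dagThreadTimeout=120000
```

The `timeout` value here is a placeholder chosen only so that `dagThreadTimeout` is smaller than it.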

The `reasoning` project accepts the following parameters:

Parameter | Description | Default value |
---|---|---|
`databaseDriver` | Canonical name of the driver class for the internal database used by the reasoner | |
`connectionUrl` | Connection URL for the internal database used by the reasoner | |
`databaseName` | Name of the internal database used by the reasoner | |
`databaseUser` | Username for the internal database used by the reasoner | |
`databasePassword` | Password for the internal database used by the reasoner | |
`reasoningType` | Type of reasoning to use | RESTRICTED_CHASE |
`terminationK` | Number of rounds of rule firings to perform in a single application of the chase (only applies to the `KTERMINATION_CHASE` reasoning type) | 10 |

The `reasoningType` can be one of:

Value | Description |
---|---|
`RESTRICTED_CHASE` | Restricted chase algorithm. Fires only dependencies that are not already satisfied |
`KTERMINATION_CHASE` | Restricted chase, where the number of rule firing rounds is bounded by a constant K |
`PARALLEL_EGD_CHASE` | Runs the parallel EGD chase algorithm |
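For example, bounding the chase to a fixed number of rounds could be expressed with the two reasoning keys as in this small sketch. The value `10` simply repeats the documented default.

```properties
# Hypothetical bounded-chase configuration
reasoningType=KTERMINATION_CHASE

# Only meaningful with KTERMINATION_CHASE
terminationK=10
```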

The `regression` project accepts the following parameters:

Parameter | Description | Default value |
---|---|---|
`expectedCardinality` | The expected cardinality of a plan execution result. This can be used to record the cardinality that was obtained before. Note that often there is no way to execute the input query to obtain the ground truth | -1 |
`skipRuntime` | If true, skip the runtime test. This can be used where testing the runtime would be too costly (e.g. web services) | false |
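A test case whose plan calls costly web services might, as a sketch, record a previously observed result size and disable the runtime test. The cardinality value is an arbitrary placeholder.

```properties
# Hypothetical test-case settings (placeholder cardinality)
expectedCardinality=42
skipRuntime=true
```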

The `runtime` project accepts the following parameters:

Parameter | Description | Default value |
---|---|---|
`tuplesLimit` | The maximum number of output tuples | |
`accessDirectory` | Relative or absolute location of the access directory | |
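These two parameters could be set as in the following sketch. Both the limit and the directory path are placeholders invented for the example.

```properties
# Hypothetical execution settings (placeholder values)
tuplesLimit=1000

# May be relative (as here) or absolute
accessDirectory=./accesses
```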