Parameter files - ProofDrivenQuerying/pdq GitHub Wiki

PDQ's executables are run with three types of files.

This portion of the guide explains the format of the parameter files.

Each project comes with its own pdq-<project>.properties file, which is read when the program starts and contains project-specific information. The default location of the file is the execution directory, but an alternate location can be passed on the command line to override the default (see here for more details).

Only one parameter file is read per execution, so it should contain the parameters for the project being executed and for all other projects that it depends on. For instance, when the planner project is run independently, the properties file may include parameters specific to common, cost and reasoning, which planner depends on.

Individual parameters may also be overridden directly from the command line using the option -D. For example,

java -jar PDQ_v.X.X.X_full.jar <MODULE> -c /path/to/parameter/file -Dtimeout=10000

will force the timeout to be 10 seconds, whatever value was specified in the given parameter file.
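
The precedence described above can be illustrated with a minimal sketch. It is not PDQ's implementation: the class `ParameterOverrideSketch` and its `resolve` method are hypothetical names, and the only assumption is that the parameter file uses the standard Java properties syntax, so that `java.util.Properties` can read it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of PDQ: file values are loaded first,
// then every -Dkey=value argument is applied on top of them.
final class ParameterOverrideSketch {

    /** Loads properties-file text, then applies each "-Dkey=value" argument. */
    static Properties resolve(String fileContent, String... args) {
        Properties params = new Properties();
        try {
            params.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // The command-line value replaces whatever the file specified.
                params.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String file = "timeout=60000\nseed=0\n"; // stand-in for a parameter file
        Properties params = resolve(file, "-Dtimeout=10000");
        System.out.println(params.getProperty("timeout")); // 10000
        System.out.println(params.getProperty("seed"));    // 0
    }
}
```

Parameters that are not mentioned on the command line keep the value from the file, which is why `seed` is unchanged in the example.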

Parameter values

Parameters are specific to each sub-project; the sections below give the details for each of them.

Common

| Parameter | Description | Default value |
| --- | --- | --- |
| seed | Randomizer seed shared by all randomizers across the PDQ libraries | 0 |
| timeout | Time limit (in ms) | |
| databaseDriver | Canonical name of the driver class for the internal database used by the reasoner | |
| connectionUrl | Connection URL for the internal database used by the reasoner | |
| databaseName | Name of the internal database used by the reasoner | |
| databaseUser | Username for the internal database used by the reasoner | |
| databasePassword | Password for the internal database used by the reasoner | |
| numberOfThreads | The number of threads and connections used to access the database | 10 |
| useInternalDatabaseManager | True if the internal database manager should be used | true |
| factsAreUnique | Whether the database should have a constraint making every fact unique | false |

Cost

| Parameter | Description | Default value |
| --- | --- | --- |
| blackBoxConnectionUrl | Connection URL for the database used by the BLACKBOX_DB cost estimator (required if the cost_type=BLACKBOX_DB) | |
| blackBoxDatabaseName | Name of the database used by the BLACKBOX_DB cost estimator (required if the cost_type=BLACKBOX_DB) | |
| blackBoxDatabaseDriver | Driver for the database used by the BLACKBOX_DB cost estimator (required if the cost_type=BLACKBOX_DB) | |
| blackBoxDatabaseUser | Username for the database used by the BLACKBOX_DB cost estimator (required if the cost_type=BLACKBOX_DB) | |
| blackBoxDatabasePassword | Password for the database used by the BLACKBOX_DB cost estimator (required if the cost_type=BLACKBOX_DB) | |
| costType | Type of cost estimation to use. This has an influence on the requirements of other planner parameters (if such requirements are violated, a PlannerException will be thrown upon initialization of the Planner) | TEXTBOOK |
| cardinalityEstimationType | Type of cardinality estimation to use | NAIVE |
| catalog | File which stores the database metadata | |
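
The dependency between the cost type and the black-box database parameters can be checked before the planner is started. The following is a minimal sketch under stated assumptions: `BlackBoxCostCheck` and `missingKeys` are hypothetical names, the property keys are assumed to match the parameter names in the table above, and the connection URL is a placeholder. PDQ itself reports such violations through a PlannerException, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not PDQ code.
final class BlackBoxCostCheck {

    // Parameters documented as required when the cost type is BLACKBOX_DB.
    static final String[] REQUIRED = {
        "blackBoxConnectionUrl", "blackBoxDatabaseName", "blackBoxDatabaseDriver",
        "blackBoxDatabaseUser", "blackBoxDatabasePassword"
    };

    /** Returns the required keys that are absent or blank; empty for other cost types. */
    static List<String> missingKeys(Properties params) {
        List<String> missing = new ArrayList<>();
        // TEXTBOOK is the documented default when costType is not set.
        if (!"BLACKBOX_DB".equals(params.getProperty("costType", "TEXTBOOK"))) {
            return missing;
        }
        for (String key : REQUIRED) {
            String value = params.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        params.setProperty("costType", "BLACKBOX_DB");
        params.setProperty("blackBoxConnectionUrl", "jdbc:postgresql://localhost/"); // placeholder
        System.out.println("Missing: " + missingKeys(params));
    }
}
```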

The costType can be one of:

| Value | Description |
| --- | --- |
| FIXED_COST_PER_ACCESS | Estimates the cost as the sum of the cost of all accesses in a plan, where access costs are provided externally |
| COUNT_NUMBER_OF_ACCESSED_RELATIONS | Estimates the cost as the sum of all accesses in a plan |
| TEXTBOOK | Estimates the cost through some externally defined cost function. Currently, this defaults to the white box cost functions relying on textbook cost estimation techniques |
| BLACKBOX_DB | Estimates the cost by translating the query to SQL and asking a database for its cost |
| NUMBER_OF_OUTPUT_TUPLES_PER_ACCESS | Estimates the cost as the sum of the estimated result size per invocation associated with each access method used in a plan |

The cardinalityEstimationType can be one of:

| Value | Description |
| --- | --- |
| NAIVE | Naive cardinality estimation, based on externally defined constant join/selectivity reduction factors |

Planner

| Parameter | Description | Default value |
| --- | --- | --- |
| maxIterations | The maximum number of iterations to perform in planning (this may have different semantics depending on which planning algorithm is used) | Integer.MAX_VALUE |
| plannerType | Type of planning algorithm to use | DAG_OPTIMIZED |
| k_termination | Number of rounds of the chase to perform, assuming LINEAR_KCHASE is used as the planner type | 10 |
| queryMatchInterval | Number of exploration intervals to wait between query match checks (used in linear planning algorithms only) | |
| postPruningType | Type of post-pruning. This is only used in optimized linear planning | |
| chaseInterval | Number of intervals between which to run the chase | |
| maxDepth | Maximum depth of the exploration (this may also have different semantics depending on which planning algorithm is used) | |
| exceptionOnLimit | If true, a LimitReachedException is thrown during planning if a limit (e.g. time or maximum number of iterations) is reached. Otherwise, the event is logged and the planning completes gracefully | |
| validatorType | Type of validator to use. Only required in conjunction with DAG planning algorithms | DEFAULT |
| filterType | Type of filter to use. Only required in conjunction with DAG planning algorithms | |
| dominanceType | Type of dominance checks to use. Only required in conjunction with DAG planning algorithms | STRICT_OPEN |
| successDominanceType | Type of success dominance checks to use. Only required in conjunction with DAG planning algorithms | OPEN |
| followUpHandling | Specifies how follow-up joins should be handled (only applies to DAG planning algorithms) | MINIMAL |
| iterativeExecutorType | Type of iterative executor to use (only applies to DAG planning algorithms) | MULTITHREADED |
| dagThreads | Number of threads to use in the parallel DAG planning algorithm (where different threads explore different subplans/subproofs in the search space) | 10 |
| depthThreshold | Threshold for the DEPTH_THROTTLING validator | 2 |
| useInternalDatabase | If true, an internal database manager is used instead of the external one | true |
| dagThreadTimeout | DAG thread timeout value. It should be smaller than the main reasoning timeout | 120000 |

The plannerType can be one of:

| Value | Description |
| --- | --- |
| LINEAR_GENERIC | Generic (exhaustive) linear planning algorithm |
| LINEAR_OPTIMIZED | Optimized linear planning algorithm |
| LINEAR_KCHASE | Linear planning algorithm relying on KTERMINATION_CHASE reasoning type |
| DAG_GENERIC | Generic (exhaustive) DAG planning algorithm |
| DAG_OPTIMIZED | DAG DP planning algorithm, relying on parallelism |
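
Several planner parameters only matter for one family of planning algorithms, so it can help to resolve the planner type first. The sketch below is illustrative only: the enum `PlannerType` is a hypothetical mirror of the values listed above (PDQ's own type may differ), and the property key is assumed to be `plannerType` as in the parameter table.

```java
import java.util.Properties;

// Hypothetical mirror of the documented values, not PDQ's own classes.
final class PlannerTypeSketch {

    enum PlannerType {
        LINEAR_GENERIC, LINEAR_OPTIMIZED, LINEAR_KCHASE, DAG_GENERIC, DAG_OPTIMIZED;

        /** DAG-only parameters (validatorType, filterType, dominanceType, ...) apply only here. */
        boolean isDag() {
            return name().startsWith("DAG_");
        }
    }

    /** Reads plannerType, falling back to the documented default DAG_OPTIMIZED. */
    static PlannerType plannerType(Properties params) {
        String raw = params.getProperty("plannerType", "DAG_OPTIMIZED").trim();
        return PlannerType.valueOf(raw); // IllegalArgumentException on an unknown value
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        params.setProperty("plannerType", "LINEAR_KCHASE");
        PlannerType type = plannerType(params);
        System.out.println(type + " uses DAG-only parameters: " + type.isDag());
    }
}
```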

The postPruningType can be one of:

| Value | Description |
| --- | --- |
| REMOVE_ACCESSES | Removes redundant accesses |

The validatorType can be one of:

| Value | Description |
| --- | --- |
| DEFAULT_VALIDATOR | Requires the left and right configurations to be non-trivial: an ordered pair of configurations (left, right) is non-trivial if the output facts of the right configuration are not included in the output facts of the left configuration and vice versa |
| APPLYRULE_VALIDATOR | Requires the input pair of configurations to be non-trivial and at least one of the input configurations to be an ApplyRule |
| DEPTH_VALIDATOR | Requires the input pair of configurations to be non-trivial and their combined depth to be <= the depth threshold |
| RIGHT_DEPTH_VALIDATOR | Requires the input pair of configurations to be non-trivial and the right configuration's depth to be <= the depth threshold |
| LINEAR_VALIDATOR | Requires the input pair of configurations to be non-trivial and their composition to be a closed left-deep configuration |
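
The non-triviality condition and the two depth-based validators can be read as simple predicates. The sketch below is a simplified model and not PDQ's classes: `Config` is a hypothetical stand-in that reduces a configuration to a set of output facts and a depth, and "combined depth" is assumed here to mean the sum of the two depths.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not PDQ classes.
final class ValidatorSketch {

    /** A configuration reduced to its output facts and its depth. */
    static final class Config {
        final Set<String> outputFacts;
        final int depth;

        Config(int depth, String... facts) {
            this.depth = depth;
            this.outputFacts = new HashSet<>(Arrays.asList(facts));
        }
    }

    /** Non-trivial pair: neither side's output facts are included in the other's. */
    static boolean nonTrivial(Config left, Config right) {
        return !left.outputFacts.containsAll(right.outputFacts)
            && !right.outputFacts.containsAll(left.outputFacts);
    }

    /** DEPTH_VALIDATOR reading: non-trivial and combined depth (assumed: sum) within the threshold. */
    static boolean depthValid(Config left, Config right, int depthThreshold) {
        return nonTrivial(left, right) && left.depth + right.depth <= depthThreshold;
    }

    /** RIGHT_DEPTH_VALIDATOR reading: non-trivial and only the right depth is bounded. */
    static boolean rightDepthValid(Config left, Config right, int depthThreshold) {
        return nonTrivial(left, right) && right.depth <= depthThreshold;
    }

    public static void main(String[] args) {
        Config left = new Config(2, "R(a)", "S(b)");
        Config right = new Config(1, "S(b)", "T(c)");
        int depthThreshold = 2; // documented default of depthThreshold
        System.out.println(nonTrivial(left, right));                       // true
        System.out.println(depthValid(left, right, depthThreshold));       // false: 2 + 1 > 2
        System.out.println(rightDepthValid(left, right, depthThreshold));  // true: 1 <= 2
    }
}
```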

The filterType can be one of:

| Value | Description |
| --- | --- |
| FACT_DOMINATED_FILTER | Removes the fact dominated configurations after each exploration step |
| NUMERICALLY_DOMINATED_FILTER | Removes the numerically fact dominated configurations after each exploration step |

The dominanceType can be one of:

| Value | Description |
| --- | --- |
| CLOSED | Closed dominance. Given two closed configurations, one dominates the other if its facts are contained in the facts of the other, up to homomorphism |
| OPEN | Open dominance. Given two possible open configurations, one dominates the other if |

The successDominanceType can be one of:

| Value | Description |
| --- | --- |
| CLOSED | Closed dominance on successful configurations |
| OPEN | Open dominance on successful configurations |

The followUpHandling can be one of:

| Value | Description |
| --- | --- |
| MINIMAL | Minimal follow-up join. Upon initialization of a DAG plan search, every follow-up join gives rise to an independent ApplyRule |
| MAXIMAL | Maximal follow-up join. Upon initialization of a DAG plan search, all follow-up joins give rise to a single/common ApplyRule |

The iterativeExecutorType can be one of:

| Value | Description |
| --- | --- |
| MULTITHREADED | Multi-threaded executor for running DAG planning rounds in parallel |

Reasoning

| Parameter | Description | Default value |
| --- | --- | --- |
| databaseDriver | Canonical name of the driver class for the internal database used by the reasoner | |
| connectionUrl | Connection URL for the internal database used by the reasoner | |
| databaseName | Name of the internal database used by the reasoner | |
| databaseUser | Username for the internal database used by the reasoner | |
| databasePassword | Password for the internal database used by the reasoner | |
| reasoningType | Type of reasoning to use | RESTRICTED_CHASE |
| terminationK | Number of rounds of rule firings to perform, in a single application of the chase (only applies to KTERMINATION_CHASE reasoning type) | 10 |

The reasoningType can be one of:

| Value | Description |
| --- | --- |
| RESTRICTED_CHASE | Restricted chase algorithm. Fires only dependencies that are not already satisfied |
| KTERMINATION_CHASE | Restricted chase, where the number of rule firing rounds is bounded by a constant K |
| PARALLEL_EGD_CHASE | Runs the parallel EGD chase algorithm |
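
The effect of bounding the number of rule firing rounds can be seen on a toy example. This is a deliberately simplified model and not PDQ's chase: facts are plain strings, each hypothetical `Rule` maps one premise fact to one conclusion fact, and there are no variables or existential quantifiers. It only illustrates that a rule is skipped when its conclusion already holds and that the loop stops after at most K rounds.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, not PDQ's chase implementation.
final class BoundedChaseSketch {

    /** A rule that derives one conclusion fact from one premise fact. */
    static final class Rule {
        final String premise;
        final String conclusion;

        Rule(String premise, String conclusion) {
            this.premise = premise;
            this.conclusion = conclusion;
        }
    }

    /** Fires rules round by round, stopping after k rounds or when a round adds nothing. */
    static Set<String> chase(Set<String> initial, List<Rule> rules, int k) {
        Set<String> facts = new LinkedHashSet<>(initial);
        for (int round = 0; round < k; round++) {
            Set<String> derived = new LinkedHashSet<>();
            for (Rule rule : rules) {
                // Restricted flavour: skip a rule whose conclusion already holds.
                if (facts.contains(rule.premise) && !facts.contains(rule.conclusion)) {
                    derived.add(rule.conclusion);
                }
            }
            if (derived.isEmpty()) {
                break; // fixpoint reached before the bound
            }
            facts.addAll(derived);
        }
        return facts;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule("A", "B"), new Rule("B", "C"), new Rule("C", "D"));
        Set<String> start = new LinkedHashSet<>(Arrays.asList("A"));
        System.out.println(chase(start, rules, 2));  // [A, B, C]
        System.out.println(chase(start, rules, 10)); // [A, B, C, D]
    }
}
```

With a bound of 2 the fact D is never derived, whereas the documented default of 10 rounds is more than enough for this small rule set to reach its fixpoint.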

Regression

| Parameter | Description | Default value |
| --- | --- | --- |
| expectedCardinality | The expected cardinality of a plan execution result. This can be used to record the cardinality that was obtained before. Note that often there is no way to execute the input query to obtain the ground truth | -1 |
| skipRuntime | If true, skip the runtime test. This can be used in the case where testing runtime can be too costly (e.g. web services) | false |

Runtime

| Parameter | Description | Default value |
| --- | --- | --- |
| tuplesLimit | The maximum number of output tuples | |
| accessDirectory | Relative or absolute location of the access directory | |