DGST Configuration - PasaLab/DGST GitHub Wiki
The default configuration file of DGST is conf/conf.properties.
The core parameters required by DGST are:
| Parameter Name | Default | Meaning |
| --- | --- | --- |
| alphabet.num | 128 | Size of alphabet |
| div.start | 2 | Initial count window size in sub-tree partitioning |
| div.step | 4 | Count window step size in sub-tree partitioning |
| root.max.count | 2000000 | Maximum sub-tree size (i.e., maximum S-prefix frequency) |
| fs.extra.len | 1024 | Tail length of input split |
| first.buffer | 10 | Number of symbols in the first element of the local LCP-Range array |
| lcp.range | 128 | Size of range in the LCP-Range structure |
| sorting.method | java | Element-wise sorting method |
| grouping.method | bfhg | Sub-tree construction task allocation strategy |
| spark.partitions | 96 | Computation parallelism on Spark |
| input.dir | hdfs://master:9000/input | The input data path on HDFS or local file system |
| output.location | hdfs://master:9000/output | The output data path on HDFS or local file system |
| working.dir | hdfs://master:9000/tmp | The temporary data path on HDFS or local file system |
| merged.filename | merged | Name of the merged string |
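
For reference, the defaults in the table could be written out in conf/conf.properties roughly as follows. This is a sketch that assumes the file uses the standard Java `key=value` properties syntax, which this page does not state explicitly. The comments only restate the table, and the `hdfs://master:9000/...` paths are the table's default values, which likely need to be changed to match your own cluster.

```properties
# Size of alphabet
alphabet.num=128

# Sub-tree partitioning: initial count window size and step size
div.start=2
div.step=4

# Maximum sub-tree size (maximum S-prefix frequency)
root.max.count=2000000

# Tail length of input split
fs.extra.len=1024

# LCP-Range structure: symbols in the first element, and range size
first.buffer=10
lcp.range=128

# Element-wise sorting method and task allocation strategy
sorting.method=java
grouping.method=bfhg

# Computation parallelism on Spark
spark.partitions=96

# Input, output, and temporary data paths (HDFS or local file system)
input.dir=hdfs://master:9000/input
output.location=hdfs://master:9000/output
working.dir=hdfs://master:9000/tmp

# Name of the merged string
merged.filename=merged
```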