Performance Comparison on Spark - PasaLab/DGST GitHub Wiki
Performance Comparison
DGST outperforms the state-of-the-art ERa algorithm, achieving a speedup of about 3x on both the DNA and English text datasets.
DNA Dataset
We first compare DGST with ERa on the DNA dataset, using strings of different lengths extracted from the pine genome (total length 12 Gbp). The performance comparison is shown below. On average, DGST achieves a 3x speedup over ERa.
English Text Dataset
We also compare DGST with ERa on the English text dataset, using strings of different lengths extracted from Wikipedia (total length 10 G characters). The performance comparison is shown below. On average, DGST achieves a 2.6x speedup over ERa.
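For readers who want to reproduce the "speedup on average" figures from their own runs, the following is a minimal sketch of one way to compute them. It assumes speedup means ERa runtime divided by DGST runtime for the same input, and that the average is an arithmetic mean over input sizes; this page does not state which averaging was used. The function names and the runtimes are hypothetical placeholders, not measurements from these experiments.

```python
from statistics import mean


def speedups(baseline_secs, candidate_secs):
    """Per-input speedup: baseline runtime divided by candidate runtime."""
    if len(baseline_secs) != len(candidate_secs):
        raise ValueError("runtime lists must have the same length")
    return [b / c for b, c in zip(baseline_secs, candidate_secs)]


def average_speedup(baseline_secs, candidate_secs):
    """Arithmetic mean of the per-input speedups (an assumed convention)."""
    return mean(speedups(baseline_secs, candidate_secs))


# Placeholder runtimes in seconds, one entry per input length.
# These are NOT results from the DGST or ERa experiments.
era_secs = [300.0, 600.0, 1200.0]
dgst_secs = [100.0, 240.0, 400.0]

print(speedups(era_secs, dgst_secs))         # [3.0, 2.5, 3.0]
print(average_speedup(era_secs, dgst_secs))
```

If runtimes span several orders of magnitude, a geometric mean of the ratios is a common alternative and may give a different average than the one computed here.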