Examples - ppKrauss/detect-template-region GitHub Wiki
Basic default example, analysing 20 random samples (s0 to s19)
// CONFIGS:
$originais = '../originalContent/sample01'; // original files in UTF-8 XML or HTML
$randSampling = true; // true for random sampling or false for sequential
$useClass = true; // use the class attribute of tags
$minSample = false; // enforce a minimum sample length of 2*$sliceLen_max
$sliceLen_max = 60; // maximum length of head or tail
$sliceLen_head = 2; // initial head length
$sliceLen_tail = 2; // initial tail length
$NCMP = 25; // number of samples; defines the items of the diagonal comparison matrix (permutation)
$MAXPERC = 30; // maximum percentage of differing samples to keep searching
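The head and tail settings above can be pictured with a small sketch. It is not the code of step1.php: the helper names (`headSlice`, `tailSlice`, `countDiffs`), the element notation such as `div.menu`, and the rule that a "diff" is a sample whose slice deviates from the most frequent slice are all assumptions made for illustration. It needs PHP 7.4 or later for the arrow functions.

```php
<?php
// Sketch only: these helpers are hypothetical and are not taken from step1.php.

/** First $len elements of a sample's element sequence. */
function headSlice(array $elements, int $len): array {
    return array_slice($elements, 0, $len);
}

/** Last $len elements of a sample's element sequence. */
function tailSlice(array $elements, int $len): array {
    return $len > 0 ? array_slice($elements, -$len) : [];
}

/**
 * Number of samples whose slice differs from the most frequent slice.
 * Assumption: a "diff" is a deviation from the majority slice.
 */
function countDiffs(array $slices): int {
    if (count($slices) === 0) {
        return 0;
    }
    $keys = array_map(fn($s) => implode('|', $s), $slices);
    $freq = array_count_values($keys);
    return count($slices) - max($freq);
}

// Toy samples: each one is a flat list of element names (tag plus class, assumed notation).
$samples = [
    ['html', 'head', 'title', 'body', 'div.menu', 'p', 'div.footer'],
    ['html', 'head', 'title', 'body', 'div.menu', 'h1', 'p', 'div.footer'],
    ['html', 'head', 'meta', 'body', 'div.menu', 'p', 'p', 'div.footer'],
];

// Grow the head slice from an initial length, as $sliceLen_head suggests.
for ($len = 2; $len <= 4; $len++) {
    $heads = array_map(fn($s) => headSlice($s, $len), $samples);
    printf("HEADS %d diffs when len=%d\n", countDiffs($heads), $len);
}
```

With these toy samples the heads agree at length 2 and one sample deviates from length 3 onward, which is the kind of transition the report lines make visible.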
Result with php src/php/step1.php:
----------------------------------------------------------------
--- SAMPLES FROM ../originalContent/sample01 ---
--- 25 samples, head-and-tail with 60 elements ---
---
----- processing sample s0 - 10200.html (600 lines) -----
----- processing sample s1 - 288433.html (11068 lines)-----
----- processing sample s2 - 240142.html (100 lines) -----
----- processing sample s3 - 137031.html (387 lines) -----
----- processing sample s4 - 4196.html (1328 lines) -----
....
----- processing sample s17 - 110242.html (134 lines) -----
----- processing sample s18 - 147821.html (205 lines) -----
----- processing sample s19 - 302917.html (765 lines) -----
RESULTS:
!HEADS 1 diffs (in 20=5%) when len=55
!HEADS 1 diffs (in 20=5%) when len=56
!HEADS 2 diffs (in 20=10%) when len=57
!HEADS 5 diffs (in 20=25%) when len=58
!HEADS 6 diffs (in 20=30%) when len=59
!TAILS 17 diffs (in 20=85%) when len=19
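The percentages in these lines are simply diffs divided by the number of samples, and they can be read against $MAXPERC = 30. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the comparison rule (continue while the percentage is at or below $MAXPERC) is an assumption, since the output alone does not show whether the limit is inclusive. Note also that the heads stop at len=59, which may be due to $sliceLen_max = 60 rather than to the percentage.

```php
<?php
// Hypothetical helpers, not taken from step1.php.

/** Percentage of differing samples, rounded to one decimal as in the report. */
function diffPercent(int $diffs, int $total): float {
    return round(100.0 * $diffs / $total, 1);
}

/**
 * Assumed rule: the search continues only while the percentage of
 * differing samples stays at or below $maxPerc.
 */
function withinMaxPerc(int $diffs, int $total, float $maxPerc): bool {
    return diffPercent($diffs, $total) <= $maxPerc;
}

$MAXPERC = 30; // same value as in the configs above

$rows = [[6, 20, 'HEADS len=59'], [17, 20, 'TAILS len=19']];
foreach ($rows as [$diffs, $total, $label]) {
    printf(
        "%s: %s%% %s\n",
        $label,
        diffPercent($diffs, $total),
        withinMaxPerc($diffs, $total, $MAXPERC) ? 'within limit' : 'over limit'
    );
}
```

Under that assumption the head at len=59 sits exactly on the limit, while the tail at len=19 is far over it, so the tails of these samples share much less structure than the heads.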
Same configs, 176 samples
RESULTS:
!HEADS 1 diffs (in 176=0.6%) when len=53
!HEADS 1 diffs (in 176=0.6%) when len=54
!HEADS 2 diffs (in 176=1.1%) when len=55
!HEADS 2 diffs (in 176=1.1%) when len=56
!HEADS 9 diffs (in 176=5.1%) when len=57
!HEADS 21 diffs (in 176=11.9%) when len=58
!HEADS 25 diffs (in 176=14.2%) when len=59
!TAILS 124 diffs (in 176=70.5%) when len=19
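To compare runs like the two above, the report lines can be turned into structured data. The following is a minimal sketch, assuming the line format is exactly the one shown in these results; `parseResultLine` is a hypothetical helper and not part of the project.

```php
<?php
/**
 * Hypothetical helper: parses one report line such as
 * "!HEADS 9 diffs (in 176=5.1%) when len=57".
 * Returns null for any line that does not match that format.
 */
function parseResultLine(string $line): ?array {
    $re = '/^!(HEADS|TAILS) (\d+) diffs \(in (\d+)=([\d.]+)%\) when len=(\d+)$/';
    if (!preg_match($re, trim($line), $m)) {
        return null;
    }
    return [
        'part'    => $m[1],          // HEADS or TAILS
        'diffs'   => (int) $m[2],    // number of differing samples
        'total'   => (int) $m[3],    // number of samples compared
        'percent' => (float) $m[4],  // percentage as printed
        'len'     => (int) $m[5],    // slice length in elements
    ];
}

$row = parseResultLine('!HEADS 9 diffs (in 176=5.1%) when len=57');
print_r($row);
```

Parsed rows from both runs can then be matched on `part` and `len` to see how the share of differing samples changes as the sample count grows.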