Examples - ppKrauss/detect-template-region GitHub Wiki

Basic default example, analysing 20 random samples (s0 to s19)

// CONFIGS:
	$originais = '../originalContent/sample01'; 	// original files in UTF-8 XML or HTML
	$randSampling = true;  // true for random sampling, false for sequential
	$useClass = true;  // use the class attribute of tags
	$minSample = false;      // enforce a minimum sample length of 2*$sliceLen_max
	$sliceLen_max = 60;  // maximum length of head or tail
	$sliceLen_head = 2;  // initial head length
	$sliceLen_tail = 2;  // initial tail length
	$NCMP = 25;  // number of samples; defines the items of the diagonal comparison matrix (permutation)
	$MAXPERC = 30; // maximum percentage of differing samples allowed for the search to continue
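
The settings above suggest a search that grows the head (or tail) slice from its initial length up to $sliceLen_max and stops once more than $MAXPERC percent of the samples disagree. The sketch below only illustrates that idea and is not the code of step1.php. The function name `growSlice`, the majority-vote definition of a "diff" and the toy element lists are all assumptions.

```php
<?php
// Hypothetical sketch, NOT the actual step1.php logic.
// Each sample is a list of element names taken from the start of a document.

/**
 * Grow the head length from $start to $max and record, for each length,
 * how many samples differ from the most common head.
 * Stops as soon as the percentage of differing samples exceeds $maxPerc.
 */
function growSlice(array $samples, int $start, int $max, float $maxPerc): array
{
    $report = [];
    $total = count($samples);
    for ($len = $start; $len <= $max; $len++) {
        // Serialize the first $len elements of every sample.
        $heads = array_map(
            fn(array $s): string => implode('/', array_slice($s, 0, $len)),
            $samples
        );
        // Assumption: the most frequent head is the template candidate.
        $counts = array_count_values($heads);
        $diffs = $total - max($counts);
        $perc = 100 * $diffs / $total;
        $report[$len] = ['diffs' => $diffs, 'perc' => $perc];
        if ($perc > $maxPerc) {
            break;  // too many samples disagree: stop searching
        }
    }
    return $report;
}

// Toy data: four samples that share their first three elements.
$samples = [
    ['html', 'head', 'body', 'div', 'p'],
    ['html', 'head', 'body', 'div', 'ul'],
    ['html', 'head', 'body', 'table', 'tr'],
    ['html', 'head', 'body', 'div', 'h1'],
];
$report = growSlice($samples, 2, 5, 30.0);
foreach ($report as $len => $r) {
    printf("len=%d diffs=%d (%.1f%%)\n", $len, $r['diffs'], $r['perc']);
}
```

In this toy run the loop records lengths 2 to 5 and stops at 5, where three of the four samples differ from the most common head.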

Result of running php src/php/step1.php:

----------------------------------------------------------------
--- SAMPLES FROM ../originalContent/sample01 		---
--- 25 samples, head-and-tail with 60 elements 		---
---
-----	processing sample s0 - 10200.html (600 lines)	-----
-----	processing sample s1 - 288433.html (11068 lines)-----
-----	processing sample s2 - 240142.html (100 lines)	-----
-----	processing sample s3 - 137031.html (387 lines)	-----
-----	processing sample s4 - 4196.html (1328 lines)	-----
....
-----	processing sample s17 - 110242.html (134 lines)	-----
-----	processing sample s18 - 147821.html (205 lines)	-----
-----	processing sample s19 - 302917.html (765 lines)	-----

 RESULTS:
	!HEADS 1 diffs (in 20=5%) when len=55
	!HEADS 1 diffs (in 20=5%) when len=56
	!HEADS 2 diffs (in 20=10%) when len=57
	!HEADS 5 diffs (in 20=25%) when len=58
	!HEADS 6 diffs (in 20=30%) when len=59

	!TAILS 17 diffs (in 20=85%) when len=19
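
Each !HEADS or !TAILS line reports the number of differing samples, the total and the resulting percentage at a given slice length. As a cross-check, the sketch below parses lines in that format and recomputes the percentage from the raw counts. The regular expression and the helper name `parseResultLine` are assumptions inferred from the output above, not part of the project.

```php
<?php
// Hypothetical helper: parse one "!HEADS" / "!TAILS" report line.
// Returns null when the line does not match the assumed format.
function parseResultLine(string $line): ?array
{
    $re = '/^\s*!(HEADS|TAILS) (\d+) diffs \(in (\d+)=([\d.]+)%\) when len=(\d+)\s*$/';
    if (!preg_match($re, $line, $m)) {
        return null;
    }
    return [
        'part'  => $m[1],
        'diffs' => (int) $m[2],
        'total' => (int) $m[3],
        'perc'  => (float) $m[4],
        'len'   => (int) $m[5],
    ];
}

$lines = [
    '!HEADS 1 diffs (in 20=5%) when len=55',
    '!HEADS 6 diffs (in 20=30%) when len=59',
    '!TAILS 17 diffs (in 20=85%) when len=19',
];
foreach ($lines as $line) {
    $r = parseResultLine($line);
    // Recompute the percentage from the counts, rounded to one decimal.
    $check = round(100 * $r['diffs'] / $r['total'], 1);
    printf("%s len=%d: reported %s%%, recomputed %s%%\n",
        $r['part'], $r['len'], $r['perc'], $check);
}
```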

Same configs, 176 samples

 RESULTS:
	!HEADS 1 diffs (in 176=0.6%) when len=53
	!HEADS 1 diffs (in 176=0.6%) when len=54
	!HEADS 2 diffs (in 176=1.1%) when len=55
	!HEADS 2 diffs (in 176=1.1%) when len=56
	!HEADS 9 diffs (in 176=5.1%) when len=57
	!HEADS 21 diffs (in 176=11.9%) when len=58
	!HEADS 25 diffs (in 176=14.2%) when len=59

	!TAILS 124 diffs (in 176=70.5%) when len=19
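
One way to read the two runs side by side is to ask up to which head length the share of differing samples stays within some tolerance. The sketch below does this for the HEADS figures reported in the two RESULTS blocks. The 5% tolerance and the function name `lastStableLen` are arbitrary, hypothetical choices, not something the tool itself defines.

```php
<?php
// Reported HEADS figures: slice length => number of differing samples.
$run20  = [55 => 1, 56 => 1, 57 => 2, 58 => 5, 59 => 6];
$run176 = [53 => 1, 54 => 1, 55 => 2, 56 => 2, 57 => 9, 58 => 21, 59 => 25];

/**
 * Return the largest reported length whose diff percentage is within
 * $tolerance, scanning lengths in ascending order and stopping at the
 * first length that exceeds it. Returns null if none qualifies.
 */
function lastStableLen(array $diffsByLen, int $total, float $tolerance): ?int
{
    ksort($diffsByLen);
    $stable = null;
    foreach ($diffsByLen as $len => $diffs) {
        if (100 * $diffs / $total > $tolerance) {
            break;
        }
        $stable = $len;
    }
    return $stable;
}

var_dump(lastStableLen($run20, 20, 5.0));    // int(56)
var_dump(lastStableLen($run176, 176, 5.0));  // int(56)
```

With these numbers both runs give 56 at a 5% tolerance, although the larger run starts reporting isolated differences two lengths earlier.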