Approach - sorrachai/FraudResumeDetection GitHub Wiki

After preprocessing, the approach proceeds as follows.

Data Mining Tasks (Training sets)

Baseline: search for the k nearest pairs of resumes using Jaccard similarity or q-gram similarity.
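The baseline search can be sketched as follows. This is a minimal illustration, not the project's code: it assumes resumes are available as plain strings keyed by an ID, builds character q-grams (q = 3 is an arbitrary choice), and compares every pair by brute force. The names `qgrams`, `jaccard`, and `nearest_pairs` are hypothetical.

```python
from itertools import combinations


def qgrams(text, q=3):
    """Return the set of character q-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_pairs(resumes, k, q=3):
    """Return the k most similar pairs as (score, id_1, id_2), highest score first.

    resumes: dict mapping a resume ID to its text (assumed input format).
    """
    grams = {rid: qgrams(text, q) for rid, text in resumes.items()}
    scored = [
        (jaccard(grams[a], grams[b]), a, b)
        for a, b in combinations(sorted(grams), 2)
    ]
    # Sort by descending score, breaking ties by ID for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:k]
```

Comparing all pairs is quadratic in the number of resumes, so a larger collection would likely need an index or a hashing scheme instead of this brute-force loop.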
Our approach (parameter: integer t):
1. Search for the t nearest pairs using the same similarity measure as the baseline.
2. Among the t candidate pairs, apply a Naive Bayes classifier using the indicators below.
Indicators:
1. (different owner*) Plagiarism between sections, detected with graph maximum matching (parameter: threshold)
2. University name (parameter: list of fake universities)
3. Date of birth (DOB) vs. claimed experience (parameter: threshold)
4. (same owner*) Inverse of plagiarism between sections (parameter: threshold)
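A rough sketch of how the first indicator and the classification step might fit together is given below. It rests on several assumptions that the page does not state: sections are compared with token-level Jaccard similarity, the threshold decides which section pairs count as an edge in the bipartite graph, the indicators are reduced to binary features, and the Naive Bayes variant is Bernoulli with Laplace smoothing. All function names and the default threshold of 0.8 are hypothetical.

```python
import math


def token_jaccard(a, b):
    """Jaccard similarity over the lowercase word sets of two sections."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def max_matching(adj, n_right):
    """Size of a maximum bipartite matching (Kuhn's augmenting-path algorithm).

    adj[u] lists the right-side vertices adjacent to left-side vertex u.
    """
    match_right = [-1] * n_right  # left vertex matched to each right vertex

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))


def plagiarism_indicator(sections_a, sections_b, threshold=0.8):
    """Fraction of sections that can be paired one-to-one with similarity >= threshold."""
    if not sections_a or not sections_b:
        return 0.0
    adj = [
        [j for j, sb in enumerate(sections_b) if token_jaccard(sa, sb) >= threshold]
        for sa in sections_a
    ]
    return max_matching(adj, len(sections_b)) / min(len(sections_a), len(sections_b))


def train_bernoulli_nb(X, y):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing.

    X: list of binary indicator vectors; y: list of 0/1 labels (1 = fraud, assumed).
    Returns {label: (prior, per-feature probability that the feature is 1)}.
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
        model[c] = (prior, probs)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior for indicator vector x."""
    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for xi, p in zip(x, probs)
        )
    return max((0, 1), key=log_posterior)
```

In this sketch each candidate pair would be turned into a vector such as `[plagiarism_indicator(...) >= some_cutoff, university_in_fake_list, dob_conflicts_with_experience]` before being passed to `predict`; the cutoff and the other two feature checks are left out because the page only names their parameters.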