Approach - sorrachai/FraudResumeDetection GitHub Wiki
After preprocessing:
- Each resume is a set of sections.
- Each section contains a bag of keywords.
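
The representation above can be sketched as a mapping from section name to keyword set. This is only an illustration: the tokenizer, the `STOP_WORDS` list, the section names, and the function names `keywords` and `preprocess` are all assumptions, not taken from the project. The sketch also collapses each bag of keywords to a set, which is sufficient for Jaccard similarity but discards repeat counts.

```python
import re

# Hypothetical stop-word list; the wiki does not say which words are dropped.
STOP_WORDS = frozenset({"a", "an", "and", "at", "in", "of", "the", "to", "with"})


def keywords(text):
    """Lower-case the text, split it into tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS)


def preprocess(raw_sections):
    """Map {section name: raw text} to {section name: keyword set}."""
    return {name: keywords(text) for name, text in raw_sections.items()}


# Made-up example resume with two sections.
resume = preprocess({
    "education": "B.Sc. in Computer Science at Example University",
    "experience": "Developed data pipelines with Python and SQL",
})
```

If repeat counts matter, `collections.Counter` could replace `frozenset` as the per-section container.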
Data Mining Tasks (Training sets)
- baseline: search for the k nearest pairs using Jaccard similarity or q-grams.
- our approach (parameter: int t):
  - search for the t nearest pairs using the same similarity measure as the baseline
  - among the t candidates, apply a Naive Bayes classifier using indicators
  - indicators:
    - (different owner*) plagiarism between sections (using graph maximum matching; parameter: threshold)
    - university name (parameter: list of fake universities)
    - DOB vs. claimed experience (parameter: threshold)
    - (same owner*) inverse of plagiarism between sections (parameter: threshold)
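
A minimal sketch of two of the steps above: the nearest-pair search with Jaccard similarity, and the section-plagiarism indicator computed as a maximum bipartite matching. It assumes each resume is a dict of section name to keyword set. The function names, the pooling of all sections for the resume-level similarity, and the sample data are assumptions made for illustration; the q-gram variant and the Naive Bayes step are not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resume_similarity(r1, r2):
    """Jaccard similarity over the pooled keywords of two resumes (assumption)."""
    return jaccard(frozenset().union(*r1.values()),
                   frozenset().union(*r2.values()))


def nearest_pairs(resumes, t):
    """Return the t most similar resume pairs as (similarity, id1, id2)."""
    scored = [(resume_similarity(resumes[i], resumes[j]), i, j)
              for i, j in combinations(sorted(resumes), 2)]
    return sorted(scored, reverse=True)[:t]


def plagiarised_sections(r1, r2, threshold):
    """Size of a maximum matching between sections with similarity >= threshold."""
    left, right = list(r1.values()), list(r2.values())
    # edges[i] lists the sections of r2 that section i of r1 may be matched to.
    edges = [[j for j, s in enumerate(right) if jaccard(sec, s) >= threshold]
             for sec in left]
    match_of_right = {}  # index in right -> index in left

    def augment(i, seen):
        """Try to find an augmenting path starting at left section i."""
        for j in edges[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_of_right or augment(match_of_right[j], seen):
                match_of_right[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in range(len(left)))


# Made-up data: alice and bob share a summary; carol is unrelated.
resumes = {
    "alice": {"summary": {"python", "sql", "etl"},
              "skills": {"python", "sql", "spark"}},
    "bob": {"summary": {"python", "sql", "etl"},
            "skills": {"java", "spring"}},
    "carol": {"summary": {"nursing", "care"}, "skills": {"triage"}},
}

candidates = nearest_pairs(resumes, t=1)
copied = plagiarised_sections(resumes["alice"], resumes["bob"], threshold=0.5)
```

In this example both of alice's sections are similar enough to bob's summary, but the matching counts that summary only once, which is the reason for using a matching rather than a plain count of similar section pairs. The resulting count could then serve as one feature for the classifier, alongside the other indicators.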