Approach - sorrachai/FraudResumeDetection GitHub Wiki

After preprocessing,

  • each resume is a set of sections;
  • each section contains a bag of keywords.
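Here is a minimal sketch of this representation. It assumes preprocessing has already split each resume into named sections. The lowercase word tokenization, the small stop-word list, and the helper names `to_keyword_bag` and `to_resume` are hypothetical illustrations, not the project's actual preprocessing. Each bag is stored as a Python set because set-based similarity (such as Jaccard) only needs which keywords are present.

```python
import re

# Hypothetical stop-word list; the real preprocessing may use a different one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at", "for", "to", "with"}


def to_keyword_bag(text: str) -> set[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tok for tok in tokens if tok not in STOP_WORDS}


def to_resume(sections: dict[str, str]) -> dict[str, set[str]]:
    """Represent a resume as a mapping from section name to its keyword bag."""
    return {name: to_keyword_bag(body) for name, body in sections.items()}


resume = to_resume({
    "education": "B.Sc. in Computer Science at the University of Example",
    "experience": "Software engineer for 5 years, Python and SQL",
})
print(sorted(resume["education"]))
# ['b', 'computer', 'example', 'sc', 'science', 'university']
```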

Data Mining Tasks (Training sets)

  • baseline: search for the k nearest pairs using Jaccard similarity or q-gram similarity.
  • our approach (parameter: integer t):
    1. search for the t nearest pairs using the same similarity measures as the baseline;
    2. among these t candidate pairs, apply a Naive Bayes classifier using the indicators below.
  • indicators:
    1. (different owner*) plagiarism between sections, detected with graph maximum matching (parameter: threshold);
    2. university name (parameter: list of fake universities);
    3. date of birth (DOB) vs. claimed experience (parameter: threshold);
    4. (same owner*) inverse of plagiarism between sections (parameter: threshold).
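The baseline search can be sketched as a brute-force scan over all pairs. This is a sketch under stated assumptions. Each resume is flattened to one keyword set. Q-gram similarity is taken to be the Jaccard similarity of the two texts' character q-gram sets, with q = 3 as a hypothetical default. The names `jaccard`, `qgrams`, `qgram_similarity`, and `k_nearest_pairs` are illustrative only. Calling `k_nearest_pairs` with k = t would produce the candidate set for step 1 of our approach.

```python
import heapq
from itertools import combinations
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def qgrams(text: str, q: int = 3) -> set[str]:
    """Set of overlapping character q-grams (substrings of length q)."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def qgram_similarity(s1: str, s2: str, q: int = 3) -> float:
    """Jaccard similarity of the two strings' q-gram sets."""
    return jaccard(qgrams(s1, q), qgrams(s2, q))


def k_nearest_pairs(items: Mapping[str, T], k: int,
                    sim: Callable[[T, T], float]) -> list[tuple[float, str, str]]:
    """Score all n(n-1)/2 pairs and return the k highest as (score, id1, id2)."""
    scored = ((sim(items[i], items[j]), i, j)
              for i, j in combinations(sorted(items), 2))
    return heapq.nlargest(k, scored)


resumes = {
    "r1": {"python", "sql", "mit"},
    "r2": {"python", "sql", "mit", "java"},
    "r3": {"cooking", "travel"},
}
print(k_nearest_pairs(resumes, 1, jaccard))  # [(0.75, 'r1', 'r2')]
```

The quadratic scan is the simplest correct choice. For large resume collections, an approximate index such as MinHash with locality-sensitive hashing is the usual way to avoid comparing every pair.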
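Indicators 1 and 4 can be read as a matching problem. This sketch rests on assumptions the page does not spell out:

- The sections of two resumes form the two sides of a bipartite graph.
- An edge joins two sections when the Jaccard similarity of their keyword bags is at least the threshold.
- A maximum-cardinality matching is computed with augmenting paths (Kuhn's algorithm).
- The score is the matched fraction of the smaller resume's sections.

A high score for resumes with different owners would point to plagiarism (indicator 1). For the same owner, indicator 4 presumably inverts this, so a low score is the suspicious case. The function names and the scoring formula are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_matching(adj: list[list[int]], n_right: int) -> int:
    """Size of a maximum bipartite matching via augmenting paths (Kuhn's algorithm)."""
    match_right = [-1] * n_right  # match_right[v] = left node matched to v, or -1

    def try_augment(u: int, seen: list[bool]) -> bool:
        # Try to match left node u, re-routing earlier matches if needed.
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return sum(try_augment(u, [False] * n_right) for u in range(len(adj)))


def section_overlap(a: list[set], b: list[set], threshold: float) -> float:
    """Fraction of the smaller resume's sections matched one-to-one to a similar section."""
    if not a or not b:
        return 0.0
    # Edge (i, j) exists when section i of a and section j of b are similar enough.
    adj = [[j for j, sb in enumerate(b) if jaccard(sa, sb) >= threshold] for sa in a]
    return max_matching(adj, len(b)) / min(len(a), len(b))


a = [{"python", "sql"}, {"mit", "bsc"}]
b = [{"python", "sql", "java"}, {"harvard", "mba"}, {"mit", "bsc"}]
print(section_overlap(a, b, 0.5))  # 1.0
print(section_overlap(a, b, 0.7))  # 0.5
```

Using a one-to-one matching rather than counting all similar pairs keeps a single generic section from being credited as a copy of several sections at once.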
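The second stage can be sketched as a Bernoulli Naive Bayes classifier over binary indicator features. It is written from scratch with Laplace (add-one) smoothing so the example stays self-contained; scikit-learn's `BernoulliNB` is an off-the-shelf alternative.

The sketch makes several assumptions:

- Each candidate pair becomes a 0/1 vector of the four indicators.
- Indicator 2 fires when the university appears on a lowercase fake-university list.
- Indicator 3 fires when claimed experience implies starting work younger than `min_age_at_start`. The default of 16 is a hypothetical value.
- Labels are 1 for fraudulent and 0 for genuine.

All names and the toy training data are illustrative, not taken from the project.

```python
import math
from datetime import date


def fake_university(university: str, fake_list: set[str]) -> int:
    """Indicator 2: 1 if the claimed university is on the (lowercase) fake list."""
    return int(university.strip().lower() in fake_list)


def implausible_experience(dob: date, years_experience: float, today: date,
                           min_age_at_start: float = 16.0) -> int:
    """Indicator 3: 1 if the claimed experience implies starting work too young."""
    age = (today - dob).days / 365.25
    return int(age - years_experience < min_age_at_start)


class BernoulliNB:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X: list[list[int]], y: list[int]) -> "BernoulliNB":
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior: dict[int, float] = {}
        self.p: dict[int, list[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature_j = 1 | class c), smoothed so no estimate is exactly 0 or 1.
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(n_features)]
        return self

    def log_posterior(self, x: list[int], c: int) -> float:
        """Unnormalized log P(c | x) under the feature-independence assumption."""
        return self.log_prior[c] + sum(
            math.log(p if xj else 1 - p) for xj, p in zip(x, self.p[c]))

    def predict(self, x: list[int]) -> int:
        return max(self.classes, key=lambda c: self.log_posterior(x, c))


# Toy data: each row is [indicator1, indicator2, indicator3, indicator4].
X = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0],   # fraudulent (label 1)
     [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]   # genuine (label 0)
y = [1, 1, 1, 0, 0, 0]
model = BernoulliNB().fit(X, y)
print(model.predict([1, 1, 1, 0]))  # 1
print(model.predict([0, 0, 0, 0]))  # 0
```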