Reverse map a categorical variable based on dictionary values - lvphj/epydemiology GitHub Wiki

myDF = epy.phjReverseMap(phjDF,
                         phjMappingDict,
                         phjCategoryVarName,
                         phjMappedVarName = 'mapped_cat',
                         phjUnmapped = np.nan,
                         phjTreatAsRegex = False,
                         phjDropPreExisting = False,
                         phjPrintResults = False)

When phjTreatAsRegex is set to True, the function calls phjFindRegexNamesGroups() with the phjSeparateRegexGroups argument set to True. This ensures that all possible matches are identified.

Example 1 - exact string matches

import pandas as pd
import epydemiology as epy

myDF = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                     'var':['dogg','canine','cannine','catt','felin','cot','feline']})

print(myDF)

d = {'dog':['dogg','canine','cannine'],
     'cat':['catt','felin','feline']}

myDF = epy.phjReverseMap(phjDF = myDF,
                         phjMappingDict = d,
                         phjCategoryVarName = 'var',
                         phjMappedVarName = 'new',
                         phjUnmapped = 'missing',
                         phjDropPreExisting = True,
                         phjTreatAsRegex = False,
                         phjPrintResults = True)

Produces the following output:

   id      var
0   1     dogg
1   2   canine
2   3  cannine
3   4     catt
4   5    felin
5   6      cot
6   7   feline

Reversed dictionary

{'felin': 'cat', 'cannine': 'dog', 'dogg': 'dog', 'canine': 'dog', 'feline': 'cat', 'catt': 'cat'}


   id      var      new
0   1     dogg      dog
1   2   canine      dog
2   3  cannine      dog
3   4     catt      cat
4   5    felin      cat
5   6      cot  missing
6   7   feline      cat
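
For readers who want to see the idea behind the exact-match case, the following is a minimal sketch in plain pandas. It is not the library's implementation: `reverse_mapping` is a hypothetical helper written for illustration. It assumes the behaviour is equivalent to inverting the dictionary (as in the "Reversed dictionary" output above) and filling unmatched values with the `phjUnmapped` value.

```python
import pandas as pd


def reverse_mapping(mapping):
    """Invert {category: [values]} into {value: category} (hypothetical helper)."""
    return {value: category
            for category, values in mapping.items()
            for value in values}


df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'var': ['dogg', 'canine', 'cannine', 'catt', 'felin', 'cot', 'feline']})

d = {'dog': ['dogg', 'canine', 'cannine'],
     'cat': ['catt', 'felin', 'feline']}

reversed_d = reverse_mapping(d)

# Values absent from the reversed dictionary become NaN, then 'missing'.
df['new'] = df['var'].map(reversed_d).fillna('missing')

print(df)
```

The value 'cot' does not appear in either list, so it is the only row that receives the unmapped value.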

Example 2 - regexes

myDF = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                     'var':['dogg','canine','cannine','catt','felin','cot','feline']})

print(myDF)

d = {'dog':['(?:(?:dog+))','(?:can*ine)'],
     'cat':['(?:cat+)','(?:fel+ine?)']}

myDF = epy.phjReverseMap(phjDF = myDF,
                         phjMappingDict = d,
                         phjCategoryVarName = 'var',
                         phjMappedVarName = 'new',
                         phjUnmapped = 'missing',
                         phjDropPreExisting = True,
                         phjTreatAsRegex = True,
                         phjPrintResults = True)

Produces the following output:

   id      var
0   1     dogg
1   2   canine
2   3  cannine
3   4     catt
4   5    felin
5   6      cot
6   7   feline

Full Regex string
(?P<cat>
    (?:cat+)|
    (?:fel+ine?))|
(?P<dog>
    (?:(?:dog+))|
    (?:can*ine))
cat ... done
dog ... done



Table of number of group matches identified per description term

                   Frequency
Number of matches           
0                          1
1                          6


   id      var     cat      dog  numberMatches matchedgroup
0   1     dogg     NaN     dogg              1          dog
1   2   canine     NaN   canine              1          dog
2   3  cannine     NaN  cannine              1          dog
3   4     catt    catt      NaN              1          cat
4   5    felin   felin      NaN              1          cat
5   6      cot     NaN      NaN              0      missing
6   7   feline  feline      NaN              1          cat
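
The regex case can be approximated in plain pandas as well. The sketch below is an illustration only and does not reproduce phjFindRegexNamesGroups(). It assumes that each category's patterns are joined into one named group, as in the "Full Regex string" output above, and that each group is tested separately so a value matching more than one category would be counted more than once. How the library resolves values with several matches is not stated here, so this sketch simply leaves them as 'missing'; the helper `single_match` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'var': ['dogg', 'canine', 'cannine', 'catt', 'felin', 'cot', 'feline']})

d = {'dog': ['(?:(?:dog+))', '(?:can*ine)'],
     'cat': ['(?:cat+)', '(?:fel+ine?)']}

categories = list(d)

# Test each category's named group separately, one column per category.
for category, patterns in d.items():
    group_regex = '(?P<{0}>{1})'.format(category, '|'.join(patterns))
    df[category] = df['var'].str.extract(group_regex, expand=False)

# Count how many categories matched each value.
df['numberMatches'] = df[categories].notna().sum(axis=1)


def single_match(row):
    """Return the matched category, or 'missing' if zero or several matched.

    Assumption for this sketch only: ambiguous rows are treated as unmapped.
    """
    matched = [c for c in categories if pd.notna(row[c])]
    return matched[0] if len(matched) == 1 else 'missing'


df['new'] = df.apply(single_match, axis=1)

print(df)
```

Testing the groups one at a time is what makes the match count meaningful: a single combined alternation would stop at the first group that matches and could never report more than one match per value.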