Reverse map a categorical variable based on dictionary values - lvphj/epydemiology GitHub Wiki
myDF = epy.phjReverseMap(phjDF,
                         phjMappingDict,
                         phjCategoryVarName,
                         phjMappedVarName = 'mapped_cat',
                         phjUnmapped = np.nan,
                         phjTreatAsRegex = False,
                         phjDropPreExisting = False,
                         phjPrintResults = False)
When phjTreatAsRegex is set to True, the function calls the phjFindRegexNamesGroups() function with the phjSeparateRegexGroups argument set to True. This ensures that all possible matches are identified.
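The reason for searching each group separately can be illustrated with a small sketch using only Python's `re` module. This is not the library's implementation; the patterns and the helper names `first_match_only` and `all_matches` are hypothetical and chosen so that one term matches both categories.

```python
import re

# Hypothetical patterns, chosen so that one term can match both categories.
patterns = {'dog': r'dog+', 'cat': r'cat+'}

# One combined regex: a named group per category, joined by alternation.
combined = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in patterns.items()))

def first_match_only(term):
    """Categories reported by a single search with the combined regex."""
    m = combined.search(term)
    if m is None:
        return []
    return [name for name, value in m.groupdict().items() if value is not None]

def all_matches(term):
    """Categories whose regex matches when each one is searched separately."""
    return [name for name, pat in patterns.items() if re.search(pat, term)]

print(first_match_only('catdog'))  # only the first alternative that matches is reported
print(all_matches('catdog'))       # every matching category is reported
```

A single combined search stops at the first match it finds, so a term that fits more than one category would be attributed to just one of them. Searching each category separately makes such ambiguous terms visible.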
Example 1 - exact string matches
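import pandas as pd
import epydemiology as epy

# Values to be mapped; 'cot' does not appear in the dictionary below.
myDF = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                     'var':['dogg','canine','cannine','catt','felin','cot','feline']})
# Each key is a category; each list holds the values that map to it.
d = {'dog':['dogg','canine','cannine'],
     'cat':['catt','felin','feline']}
myDF = epy.phjReverseMap(phjDF = myDF,
                         phjMappingDict = d,
                         phjCategoryVarName = 'var',
                         phjMappedVarName = 'new',
                         phjUnmapped = 'missing',
                         phjDropPreExisting = True,
                         phjTreatAsRegex = False,
                         phjPrintResults = True)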
Produces the following output:
   id      var
0   1     dogg
1   2   canine
2   3  cannine
3   4     catt
4   5    felin
5   6      cot
6   7   feline
Reversed dictionary
{'felin': 'cat', 'cannine': 'dog', 'dogg': 'dog', 'canine': 'dog', 'feline': 'cat', 'catt': 'cat'}
   id      var      new
0   1     dogg      dog
1   2   canine      dog
2   3  cannine      dog
3   4     catt      cat
4   5    felin      cat
5   6      cot  missing
6   7   feline      cat
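The exact-match behaviour in Example 1 can be sketched in plain Python as a dictionary inversion followed by a lookup. This is only an illustration of the idea, not the function's actual code; `reverse_mapping` and `map_values` are hypothetical helper names.

```python
# Same values and mapping dictionary as Example 1.
values = ['dogg', 'canine', 'cannine', 'catt', 'felin', 'cot', 'feline']
d = {'dog': ['dogg', 'canine', 'cannine'],
     'cat': ['catt', 'felin', 'feline']}

def reverse_mapping(mapping):
    """Invert {category: [values]} into {value: category}."""
    return {value: category
            for category, listed in mapping.items()
            for value in listed}

def map_values(items, mapping, unmapped=None):
    """Look up each item in the reversed dictionary; use `unmapped` when absent."""
    lookup = reverse_mapping(mapping)
    return [lookup.get(item, unmapped) for item in items]

print(reverse_mapping(d))
print(map_values(values, d, unmapped='missing'))
```

With a pandas column, the same lookup could be applied by passing the reversed dictionary to `Series.map` and filling the resulting gaps with `fillna`. Note that in this sketch a value listed under two categories would silently keep the last one.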
Example 2 - regexes
myDF = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                     'var':['dogg','canine','cannine','catt','felin','cot','feline']})
d = {'dog':['(?:(?:dog+))','(?:can*ine)'],
myDF = epy.phjReverseMap(phjDF = myDF,
                         phjMappingDict = d,
                         phjCategoryVarName = 'var',
                         phjMappedVarName = 'new',
                         phjUnmapped = 'missing',
                         phjDropPreExisting = True,
                         phjTreatAsRegex = True,
                         phjPrintResults = True)
Produces the following output:
   id      var
0   1     dogg
1   2   canine
2   3  cannine
3   4     catt
4   5    felin
5   6      cot
6   7   feline
Full Regex string
cat ... done
dog ... done
Table of number of group matches identified per description term
   Number of matches
0                  1
1                  6
   id      var     cat      dog  numberMatches matchedgroup
0   1     dogg     NaN     dogg              1          dog
1   2   canine     NaN   canine              1          dog
2   3  cannine     NaN  cannine              1          dog
3   4     catt    catt      NaN              1          cat
4   5    felin   felin      NaN              1          cat
5   6      cot     NaN      NaN              0      missing
6   7   feline  feline      NaN              1          cat
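The columns in the table above (one column per category, a count of matches, and a final group) can be approximated with a short sketch using Python's `re` module. The 'dog' patterns are those from Example 2; the 'cat' patterns are assumed for illustration only, and the helper `summarise` is hypothetical. Treating terms with more than one match as unmapped is also an assumption, since the example data contain no such case.

```python
import re

terms = ['dogg', 'canine', 'cannine', 'catt', 'felin', 'cot', 'feline']

# 'dog' patterns come from Example 2; the 'cat' patterns are assumed
# for illustration only.
d = {'dog': ['(?:(?:dog+))', '(?:can*ine)'],
     'cat': ['(?:cat+)', '(?:feline?)']}

def summarise(term, mapping, unmapped='missing'):
    """Return (matched text per category, number of matches, matched group)."""
    matched = {}
    for category, regexes in mapping.items():
        # Search each category separately so that no match is hidden by another.
        m = re.search('|'.join(regexes), term)
        matched[category] = m.group(0) if m else None
    hits = [c for c, text in matched.items() if text is not None]
    # Assumption: exactly one hit gives that category; otherwise use `unmapped`.
    group = hits[0] if len(hits) == 1 else unmapped
    return matched, len(hits), group

for term in terms:
    print(term, *summarise(term, d))
```

Counting the matches per term, as in the numberMatches column, makes it easy to spot terms that matched nothing (such as 'cot') or that matched more than one category and need a more specific regex.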