Convert placename conjunctions to lowercase - lvphj/epydemiology GitHub Wiki
phjSetPlacenameConjunctionsToLower()
df = epy.phjSetPlacenameConjunctionsToLower(phjDF,
                                            phjSmallWordList = ['of','upon','on','under','and','le','the'],
                                            phjSepList = ['\\s','-','_'],
                                            phjColNameList = ['city','county'],
                                            phjPrintResults = False)
Description
The function sets conjunctions found in placenames to lowercase, which makes the name format more consistent with the output of, for example, the OS Names API. For example, 'Isle Of Wight' is changed to 'Isle of Wight'. The separator on either side of a conjunction may be a space, a hyphen or an underscore. The function searches for every combination of conjunction and separator, so the conjunction can be converted to lowercase while the separator is left unchanged.
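The approach described above can be sketched with the standard re module. This is a minimal illustration only, not the library's actual implementation: the helper names phjBuildRegexes and phjLowerConjunctions are hypothetical, and the sketch works on a single string rather than on dataframe columns.

```python
import re

def phjBuildRegexes(phjSmallWordList, phjSepList):
    # Hypothetical helper: one case-insensitive pattern per
    # (conjunction, separator) pair, with the same separator on both sides.
    return ['(?i){0}{1}{0}'.format(sep, word)
            for word in phjSmallWordList
            for sep in phjSepList]

def phjLowerConjunctions(phjName, phjRegexList):
    # Hypothetical helper: apply each pattern in turn and lowercase the whole
    # match. Spaces, hyphens and underscores are unaffected by lower(), so the
    # separators are preserved.
    for regex in phjRegexList:
        phjName = re.sub(regex, lambda m: m.group(0).lower(), phjName)
    return phjName

regexes = phjBuildRegexes(['of', 'upon', 'on', 'under', 'and', 'le', 'the'],
                          ['\\s', '-', '_'])

for name in ['Poulton-LE-Fylde', 'Newcastle UPON Tyne', 'Stow-On-The-Wold']:
    print(phjLowerConjunctions(name, regexes))
```

Applying the patterns one after another, rather than combining them into a single pattern, lets adjacent conjunctions that share a separator (as in 'Stow-On-The-Wold') both be converted.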
Parameters
The function takes the following parameters:
- phjDF
  Pandas dataframe containing one or more columns of placenames.
- phjSmallWordList (default = ['of','upon','on','under','and','le','the'])
  List of conjunctions to search for.
- phjSepList (default = ['\\s','-','_'])
  List of separators. The default list consists of whitespace, a hyphen and an underscore. It is assumed that the separators on either side of a conjunction will be the same.
- phjColNameList (default = ['city','county'])
  List of column headings in which to search.
- phjPrintResults (default = False)
  If True, prints intermediate steps, such as the list of constructed regexes. The default is False.
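The assumption that both separators match can be illustrated with the three 'upon' patterns shown in the "List of constructed regexes" output in the example. This is a sketch using the standard re module; the helper phjMatchesAny is hypothetical and is not part of the library.

```python
import re

# Patterns for 'upon', copied from the list of constructed regexes.
phjUponPatterns = ['(?i)\\supon\\s', '(?i)-upon-', '(?i)_upon_']

def phjMatchesAny(phjName):
    # Hypothetical helper: True if any of the patterns is found in the name.
    return any(re.search(p, phjName) is not None for p in phjUponPatterns)

sameSeparators = phjMatchesAny('Newcastle-Upon-Tyne')   # hyphen on both sides
mixedSeparators = phjMatchesAny('Newcastle-Upon Tyne')  # hyphen, then space
```

With these patterns, a conjunction flanked by two different separators is not matched, so a name such as 'Newcastle-Upon Tyne' would presumably be left unchanged.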
Example
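import pandas as pd
import epydemiology as epy

# Placenames with conjunctions in mixed case and with different separators
df = pd.DataFrame({'town':['Poulton-LE-Fylde',
                           'Stratford Upon Avon',
                           'Newcastle UPON Tyne',
                           'Newcastle_Under_Lyme',
                           'Isle Of Wight',
                           'Stow-On-The-Wold']})

print('Original dataframe')
print('------------------')
print(df)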
Original dataframe
------------------
town
0 Poulton-LE-Fylde
1 Stratford Upon Avon
2 Newcastle UPON Tyne
3 Newcastle_Under_Lyme
4 Isle Of Wight
5 Stow-On-The-Wold
df = epy.phjSetPlacenameConjunctionsToLower(phjDF = df,
                                            phjSmallWordList = ['of','upon','on','under','and','le','the'],
                                            phjSepList = ['\\s','-','_'],
                                            phjColNameList = ['town'],
                                            phjPrintResults = True)

print('Dataframe with lowercase placename conjunctions')
print('-----------------------------------------------')
print(df)
List of constructed regexes
---------------------------
['(?i)\\sof\\s', '(?i)-of-', '(?i)_of_', '(?i)\\supon\\s', '(?i)-upon-', '(?i)_upon_', '(?i)\\son\\s', '(?i)-on-', '(?i)_on_', '(?i)\\sunder\\s', '(?i)-under-', '(?i)_under_', '(?i)\\sand\\s', '(?i)-and-', '(?i)_and_', '(?i)\\sle\\s', '(?i)-le-', '(?i)_le_', '(?i)\\sthe\\s', '(?i)-the-', '(?i)_the_']
Dataframe with lowercase placename conjunctions
-----------------------------------------------
town
0 Poulton-le-Fylde
1 Stratford upon Avon
2 Newcastle upon Tyne
3 Newcastle_under_Lyme
4 Isle of Wight
5 Stow-on-the-Wold