Retrieve unique values from dataframes - lvphj/epydemiology GitHub Wiki
A function to retrieve unique values from one or more dataframes.
```python
myDF = epy.phjRetrieveUniqueFromMultiDataFrames(phjDFList,
                                                phjVarNameList,
                                                phjSort = True,
                                                phjPrintResults = False)
```
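This function takes a list of dataframes and returns a single dataframe containing the unique values (or, when several variables are listed, the unique combinations of values) that occur in the listed variables across all the dataframes.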
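The general idea can be sketched with standard pandas operations. The function below, `retrieve_unique`, is a hypothetical illustration and not the epydemiology source code: it assumes that each dataframe is de-duplicated on the listed variables, the results are stacked with `pd.concat`, de-duplicated again, optionally sorted, and renumbered from 0.

```python
import pandas as pd

def retrieve_unique(df_list, var_names, sort=True):
    """Hypothetical sketch: unique values (or combinations) of var_names across df_list.

    Assumes df_list is a list of dataframes and var_names is a list of column names.
    """
    # Keep only the requested columns and drop duplicates within each dataframe
    per_df = [df[var_names].drop_duplicates() for df in df_list]
    # Stack the per-dataframe results and remove rows that occur in more than one dataframe
    combined = pd.concat(per_df).drop_duplicates()
    if sort:
        # Sort by the variables in the order they were listed
        combined = combined.sort_values(by=var_names)
    # Replace the original row labels with a clean 0..n-1 index
    return combined.reset_index(drop=True)

df1 = pd.DataFrame({'m': [1, 2, 3], 'n': ['a', 'b', 'c']})
df2 = pd.DataFrame({'m': [2, 4], 'n': ['b', 'd']})
print(retrieve_unique([df1, df2], ['m', 'n']))
```

Here the pair (2, 'b') occurs in both inputs but appears only once in the result, which has four rows.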
Function parameters
phjDFList
List containing Pandas dataframes from which unique values will be extracted. A single dataframe can also be passed.
phjVarNameList
List of variable names from which unique values will be extracted. A single variable name may also be passed as a string. The variables must exist in all dataframes passed in phjDFList.
phjSort (default = True)
Sort values in returned dataframe. Sorting will be performed using variables in the order given in phjVarNameList.
phjPrintResults (default = False)
Print intermediate results: the unique values found in each dataframe, followed by the final combined dataframe.
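The parameter descriptions state that a single dataframe or a single variable name may be passed instead of a list, and that every variable must exist in every dataframe. A minimal sketch of how such inputs could be handled, using a hypothetical helper `normalize_inputs` that is not part of epydemiology; raising `KeyError` for a missing variable is an assumption, since the behaviour of the real function in that case is not documented here.

```python
import pandas as pd

def normalize_inputs(df_list, var_names):
    """Hypothetical helper: accept a single dataframe or variable name as well as lists."""
    # A single dataframe is wrapped in a list
    if isinstance(df_list, pd.DataFrame):
        df_list = [df_list]
    # A single variable name (a string) is wrapped in a list
    if isinstance(var_names, str):
        var_names = [var_names]
    # Assumed check: every variable must exist in every dataframe
    for i, df in enumerate(df_list):
        missing = [v for v in var_names if v not in df.columns]
        if missing:
            raise KeyError(f"Dataframe at position {i} lacks variables: {missing}")
    return df_list, var_names

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
dfs, names = normalize_inputs(df, 'a')
print(len(dfs), names)  # 1 ['a']
```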
Exceptions raised
Returns
Pandas dataframe containing the unique values.
Other notes
Examples of the function in use are given below.
Single dataframe
```python
import pandas as pd
import epydemiology as epy

phjTempDF = pd.DataFrame({'a':[1,2,3,4,5,6,1,2,3,4,5,6],
                          'b':['a','b','c','d','e','f','a','b','w','d','e','f']})

print('Single variable')
phjOutDF = epy.phjRetrieveUniqueFromMultiDataFrames(phjDFList = [phjTempDF],
                                                    phjVarNameList = 'a',
                                                    phjSort = True,
                                                    phjPrintResults = True)

print('Multiple variables')
phjOutDF = epy.phjRetrieveUniqueFromMultiDataFrames(phjDFList = phjTempDF,
                                                    phjVarNameList = ['a','b'],
                                                    phjSort = True,
                                                    phjPrintResults = True)
```
This gives the following results:
```
Single variable
Unique values in dataframe at position 0
0  1
1  2
2  3
3  4
4  5
5  6
Dataframe of unique values from all dataframes
0  1
1  2
2  3
3  4
4  5
5  6
Multiple variables
Unique values in dataframe at position 0
   a  b
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
5  6  f
8  3  w
Dataframe of unique values from all dataframes
   a  b
0  1  a
1  2  b
2  3  c
3  3  w
4  4  d
5  5  e
6  6  f
```
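The row labels in the per-dataframe output (note the 8 in the multiple-variables case) are consistent with pandas `drop_duplicates()`, which keeps the first occurrence of each row together with its original index, while the final table has been sorted and renumbered. The sketch below reproduces both layouts with plain pandas; the data are inferred from the printed output, and this shows equivalent pandas behaviour rather than the library's own code.

```python
import pandas as pd

# Data inferred from the printed output of the single-dataframe example
phjTempDF = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                          'b': ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b', 'w', 'd', 'e', 'f']})

# drop_duplicates() keeps the first occurrence of each (a, b) pair and its original row label
unique_rows = phjTempDF[['a', 'b']].drop_duplicates()
print(unique_rows.index.tolist())  # [0, 1, 2, 3, 4, 5, 8]

# Sorting by ['a', 'b'] and resetting the index gives the layout of the final table
final = unique_rows.sort_values(by=['a', 'b']).reset_index(drop=True)
print(final)
```

Because sorting uses `a` first and `b` second, the pair (3, 'w') lands directly after (3, 'c').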
Multiple dataframes
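```python
import pandas as pd
import epydemiology as epy
df1 = pd.DataFrame({'m':[1,2,3,4,5,6], 'n':['a','b','c','d','e','f']})
df2 = pd.DataFrame({'m':[2,5,7,8], 'n':['b','e','g','h']})
phjOutDF = epy.phjRetrieveUniqueFromMultiDataFrames(phjDFList = [df1,df2],
                                                    phjVarNameList = ['m','n'],
                                                    phjSort = True,
                                                    phjPrintResults = True)
```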
This gives the following results:
```
Unique values in dataframe at position 0
   m  n
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
5  6  f
Unique values in dataframe at position 1
   m  n
0  2  b
1  5  e
2  7  g
3  8  h
Dataframe of unique values from all dataframes
   m  n
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
5  6  f
6  7  g
7  8  h
```
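A result like this can be checked independently by confirming that every (m, n) pair in each source dataframe appears in the combined table, and that the combined table contains no duplicate rows. The sketch below does this with pandas `merge(..., indicator=True)`; `all_rows_present` is a hypothetical helper, and the dataframes are typed in from the output above rather than produced by calling epydemiology.

```python
import pandas as pd

# Source data and combined result, typed in from the printed output
df1 = pd.DataFrame({'m': [1, 2, 3, 4, 5, 6], 'n': ['a', 'b', 'c', 'd', 'e', 'f']})
df2 = pd.DataFrame({'m': [2, 5, 7, 8], 'n': ['b', 'e', 'g', 'h']})
combined = pd.DataFrame({'m': [1, 2, 3, 4, 5, 6, 7, 8],
                         'n': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

def all_rows_present(source, result, cols):
    """Return True if every combination of cols in source also occurs in result."""
    # De-duplicate result first so the left merge cannot multiply rows
    merged = source[cols].merge(result[cols].drop_duplicates(), on=cols,
                                how='left', indicator=True)
    # '_merge' is 'left_only' for rows of source with no match in result
    return bool((merged['_merge'] == 'both').all())

print(all_rows_present(df1, combined, ['m', 'n']))  # True
print(all_rows_present(df2, combined, ['m', 'n']))  # True
# The combined table has no duplicated rows
print(combined.duplicated().any())  # False
```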