Load data from a named cell range in an Excel workbook - lvphj/epydemiology GitHub Wiki
Python function to read data from an Excel workbook named cell range into a Pandas dataframe.
phjReadDataFromExcelNamedCellRange()
df = epy.phjReadDataFromExcelNamedCellRange(phjExcelPathAndFileName = None,
phjExcelCellRangeName = None,
phjDatetimeFormat = "%Y-%m-%d %H:%M:%S",
phjMissingValue = "missing",
phjHeaderRow = False,
phjPrintResults = False)
Description
This function can be used to import data from a named range in a Microsoft Excel workbook. The function receives the full path and filename of the Excel document, together with the name of the named cell range of interest, and returns the data as a Pandas dataframe.
Function parameters
-
phjExcelPathAndFilename
The full path and filename of the Microsoft Excel workbook. If the file does not exist, the function prompts to re-enter a correct path and filename.
-
phjExcelCellRangeName
The name of the cell range of interest. It is import to ensure that the name range only occurs once in the workbook.
-
phjDatetimeFormat (default = "%Y-%m-%d %H:%M:%S")
The datatime format that will be used to display columns containing date (and time) data.
-
phjMissingValue (default = "missing")
A string or code that is used to replace empty cells.
-
phjHeaderRow (default = False)
Indicates whether the cell range has a header row. If so, the values in the first row of the cell range are used as the headings of the dataframe columns; otherwise, the columns are given default, generic names.
-
phjPrintResults (default = False)
Print the imported results.
Exceptions raised
None.
Returns
Pandas dataframe containing data read from named cell range.
Other notes
None.
Example
An example of the function in use is given below. An Excel workbook named 'myWorkbook.xlsx' is stored on the Desktop. The workbook contains several individual worksheets, one of which contains a named cell range called 'myCellRange', the first row of which contains the names of the columns. The data can be imported into a Pandas dataframe using:
# The following libraries are imported automatically but are included
# here for completeness.
import pandas as pd
import openpyxl
import epydemiology as epy
myDF = epy.phjReadDataFromExcelNamedCellRange(phjExcelPathAndFileName = '/Users/username/Desktop/myWorkbook.xlsx',
phjExcelCellRangeName = 'myCellRange',
phjDatetimeFormat = "%Y-%m-%d %H:%M:%S",
phjMissingValue = "missing",
phjHeaderRow = True,
phjPrintResults = False)