Home - microsoft/times-excel-reader GitHub Wiki

Setting up development environment

Clone these repos next to this one:

Currently, the main script expects the ground truth to be in CSV format, so convert it first by running: python utils/dd_to_csv.py ../times-ireland-model_gams/model ground_truth

Now you should be able to run python times_excel_reader.py to generate output.

Development tips

NOTE: the script currently uses a pickle file raw_tables.pkl to skip the slow reading of Excel files during development. If the input files change in any way, remember to delete the pickle to force a re-read.
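
Forgetting to delete the pickle is easy, so a small helper that removes it whenever an input file is newer can help. The following is only a minimal sketch and not part of the tool: the function names, the input_dir argument and the *.xlsx pattern are hypothetical, and the check relies on file modification times, so it will not notice inputs that were deleted or renamed.

```python
from pathlib import Path


def cache_is_stale(cache_path, input_dir, pattern="*.xlsx"):
    """Return True if the cache is missing or older than any matching input file."""
    cache = Path(cache_path)
    if not cache.exists():
        return True
    cache_mtime = cache.stat().st_mtime
    # Walk the input folder recursively and compare modification times.
    return any(f.stat().st_mtime > cache_mtime for f in Path(input_dir).rglob(pattern))


def drop_stale_cache(cache_path, input_dir, pattern="*.xlsx"):
    """Delete the cache file if it exists and is stale; return True if deleted."""
    cache = Path(cache_path)
    if cache.exists() and cache_is_stale(cache, input_dir, pattern):
        cache.unlink()
        return True
    return False
```

Calling drop_stale_cache("raw_tables.pkl", some_input_folder) before running the main script would then force a re-read only when needed. When in doubt, deleting the pickle by hand remains the safe option.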

To view dataframe contents when debugging:

  • Evaluate the dataframe variable in the immediate window/debug console

To compare output CSV to ground truth:

  • Use output/*_missing.csv and output/*_additional.csv files for each table generated when the tool is run
  • Use Beyond Compare, with Rules > Alignment > Sorted, Rules > Columns > gear icon > Key
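
For a quick check without a diff tool, the same kind of comparison can be sketched in a few lines of Python. This is an illustration only and does not claim to be how the tool produces its *_missing.csv and *_additional.csv files: the function names are hypothetical, and it assumes both files have a header row and the same column order. Rows are compared as whole tuples, so duplicate rows are ignored.

```python
import csv


def read_rows(path):
    """Read a CSV file and return (header, set of row tuples)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # None if the file is empty
        return header, {tuple(row) for row in reader}


def diff_csv(ground_truth_path, output_path):
    """Return (missing, additional) rows as sorted lists of tuples.

    missing:    rows in the ground truth that are absent from the output
    additional: rows in the output that are absent from the ground truth
    """
    _, truth = read_rows(ground_truth_path)
    _, output = read_rows(output_path)
    return sorted(truth - output), sorted(output - truth)
```

All values are compared as strings, so "1" and "1.0" count as different; normalise numeric columns first if that matters for the table being checked.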

To search within a single Excel file:

  • In Excel, press Ctrl+F, then set Within: Workbook and Look in: Values

To search in a folder of Excel files:

  • Search output/raw_tables.txt (the raw dataframes extracted from the Excel files) and output/merged_tables.txt (the dataframes just after merging) using any text editor. The extraction code is quite stable and believed to be complete, so the data from the Excel files is very likely to appear at least in raw_tables.txt.
  • Use dnGrep
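
Searching the two text dumps can also be scripted. The sketch below is a minimal, hypothetical helper (search_files is not part of the tool) that does a plain substring match and reports the file, line number and matching line.

```python
def search_files(paths, needle, ignore_case=True):
    """Yield (path, line_number, line) for every line that contains needle."""
    target = needle.lower() if ignore_case else needle
    for path in paths:
        # errors="replace" keeps the search going if a dump has odd bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            for line_number, line in enumerate(f, start=1):
                haystack = line.lower() if ignore_case else line
                if target in haystack:
                    yield str(path), line_number, line.rstrip("\n")


# Example usage (the search term is a placeholder):
# for path, n, line in search_files(
#         ["output/raw_tables.txt", "output/merged_tables.txt"], "some_value"):
#     print(f"{path}:{n}: {line}")
```

A hit in raw_tables.txt but not in merged_tables.txt suggests the value was extracted from Excel but lost or changed during merging.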