Examples: Resource Usage and How Much Data spark-excel Can Handle - crealytics/spark-excel GitHub Wiki

Purpose: Find out the resource usage and limitations of spark-excel

Limitations

Under the hood, spark-excel relies on Apache POI for all Excel handling. Here are the limits of a single Excel file, copied from SpreadsheetVersion in the Apache POI documentation (see References):

EXCEL97 format aka BIFF8 (xls)

  • The total number of available rows is 64k (2^16)
  • The total number of available columns is 256 (2^8)
  • The maximum number of arguments to a function is 30
  • Number of conditional format conditions on a cell is 3
  • Number of cell styles is 4000
  • Length of text cell contents is 32767

Excel2007 (xlsx)

  • The total number of available rows is 1M (2^20)
  • The total number of available columns is 16K (2^14)
  • The maximum number of arguments to a function is 255
  • Number of conditional format conditions on a cell is unlimited (actually limited by available memory in Excel)
  • Number of cell styles is 64000
  • Length of text cell contents is 32767
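
The limits above can be captured in a small lookup for a quick pre-flight check before writing. The following is a minimal Python sketch, not part of spark-excel or Apache POI: the names `SHEET_LIMITS` and `fits_in_one_sheet` are hypothetical, and the numbers are simply the ones listed above.

```python
# Per-sheet limits as listed above (taken from Apache POI's SpreadsheetVersion).
# SHEET_LIMITS and fits_in_one_sheet are illustrative names, not a library API.
SHEET_LIMITS = {
    "xls": {"max_rows": 2 ** 16, "max_columns": 2 ** 8, "max_text_length": 32767},
    "xlsx": {"max_rows": 2 ** 20, "max_columns": 2 ** 14, "max_text_length": 32767},
}


def fits_in_one_sheet(fmt, rows, columns, longest_text=0):
    """Return True if a table of the given shape fits in a single sheet."""
    limits = SHEET_LIMITS[fmt]
    return (
        rows <= limits["max_rows"]
        and columns <= limits["max_columns"]
        and longest_text <= limits["max_text_length"]
    )


print(fits_in_one_sheet("xls", 100_000, 20))   # False: more than 2^16 rows
print(fits_in_one_sheet("xlsx", 100_000, 20))  # True
```

The row count here should include any header row, since it occupies a row in the sheet as well.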

spark-excel supports reading and writing multiple Excel files, so the total number of rows in a data frame, for both reading and writing, depends only on the available resources and on how the data is partitioned when writing.

Let's do some tests
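
As a rough illustration of the partitioning point, a lower bound on the number of output files follows directly from the per-sheet row limit. This sketch assumes one sheet per file and one header row per sheet; both are assumptions made for illustration, not documented spark-excel behavior, and `min_files_needed` is a hypothetical helper.

```python
# Row limits per sheet, as listed in the Limitations section.
MAX_ROWS = {"xls": 2 ** 16, "xlsx": 2 ** 20}


def min_files_needed(total_rows, fmt="xlsx", header_rows=1):
    """Lower bound on file count, assuming one sheet per file (an assumption)."""
    data_rows_per_file = MAX_ROWS[fmt] - header_rows
    # Integer ceiling division; always report at least one file.
    return max(1, -(-total_rows // data_rows_per_file))


print(min_files_needed(10_000_000, "xlsx"))  # 10
print(min_files_needed(10_000_000, "xls"))   # 153
```

Assuming each partition is written to its own file, such a count could serve as a lower bound for the number of partitions (for example via `DataFrame.repartition`) before writing. Memory per partition is a separate constraint that this arithmetic does not cover.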

TBD

References