processLargeFile - cfsimplicity/spreadsheet-cfml GitHub Wiki

Reads an XLSX spreadsheet file using streaming and processes each row using a specified User Defined Function without returning any data. Highly memory efficient and suitable for large files.

NOTE: Only XLSX (XML format) spreadsheet files can be processed using this method. It cannot read binary XLS files.

By default the method returns an object to which further optional calls can be chained before finally calling execute() to process the file.

processLargeFile( filepath ).withRowProcessor( function )[...optional configuration calls].execute();

Chainable? No.

To read a large XLSX file into a query or CSV, see readLargeFile()

Passing in a row processing function

To specify how each row should be processed, write a function which accepts as its arguments:

  1. the current row values as an array
  2. the current row number
  3. the array of columns/headers if present
processor = function( array row, numeric rowNumber, array columns ){
  //insert row into a DB, or add to an array, or filter based on the row's values or its row number, etc.
}
spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .execute();

Specifying the sheet to process

If your spreadsheet workbook contains more than one sheet you can specify which one to process by name or number. By default the first sheet will be used.

spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withSheetName( "Sheet2" )
  .execute();
spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withSheetNumber( 2 )
  .execute();

Ignoring the header row

If your sheet has a header row which shouldn't be processed then simply add withFirstRowIsHeader():

spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withFirstRowIsHeader()
  .execute();

Skipping rows at the start of the sheet

If you want to skip some rows before starting processing, use withSkipFirstRows( N ):

spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withSkipFirstRows( 5 )
  .execute();

Using visible/formatted values

By default values are processed "raw", i.e without any applied formatting. If you want to use visible/formatted values add withUseVisibleValues(). This can be useful with dates or numbers, where you might for example want to avoid scientific notation.

spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withUseVisibleValues()
  .execute();

Password-protected files

If your spreadsheet file is encrypted, use withPassword():

spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withPassword( "topsecret" )
  .execute();

Streaming options

Your spreadsheet will be read using the Excel Streaming Reader which can be tuned using 2 optional parameters: bufferSize (number of bytes to use) and rowCacheSize (number of rows to process at a time). The defaults are 1024 bytes and 10 rows but you can adjust these by passing in a struct as follows:

options = {
  bufferSize: 2048,
  rowCacheSize: 20
};
spreadsheet.processLargeFile( filepath )
  .withRowProcessor( processor )
  .withStreamingOptions( options )
  .execute();