Significant Properties of Spreadsheets - Asbjoedt/CLISC GitHub Wiki

Introduction

The Open Preservation Foundation's Archives Interest Group collaborated on analyzing the significant properties of spreadsheets. The work resulted in a final report, which was published in 2021 and presented at the iPRES 2021 conference.

The collaboration involved employees from the National Archives of the Netherlands, the Estonian National Archives, Preservica and the Danish National Archives.

Spreadsheet Complexity Analyser

Remco van Veenendaal from the National Archives of the Netherlands developed a tool called Spreadsheet Complexity Analyser, which programmatically assesses the complexity of a spreadsheet by reading its content, such as macros, hyperlinks, the number of used cells and the number of fonts. These counts are then compared against defined thresholds to determine whether the spreadsheet is simple or complex.

Remco describes the purpose of this distinction in the repository:

The main reason for making this distinction is (cost-efficiency w.r.t.) normalisation and preservation: our hypothesis is that simple spreadsheets can be normalised to and preserved as e.g. PDF(/A), while more complex spreadsheets require a spreadsheet-specific file format. There is a lot of knowledge of and expertise in working with PDF(/A) in archives. Being able to preserve some percentage of spreadsheets as PDF(/A) is cost-efficient. Our work should also result in choosing the best suited spreadsheet-specific file format for the complex spreadsheets.
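
The threshold comparison described above can be illustrated with a minimal sketch. This is not the code of Spreadsheet Complexity Analyser: the metric names, the threshold values and the rule that a single exceeded threshold makes a spreadsheet complex are all assumptions chosen for illustration, and the counts are supplied by hand rather than read from a file.

```python
# Hypothetical thresholds for illustration only; the real tool defines
# its own metrics and limits.
THRESHOLDS = {
    "macros": 0,        # assumption: any macro counts as complex
    "hyperlinks": 0,    # assumption: any hyperlink counts as complex
    "used_cells": 1000,
    "fonts": 5,
}


def classify(counts, thresholds=THRESHOLDS):
    """Return a (label, exceeded) pair for one spreadsheet.

    counts maps a metric name to the number found in the spreadsheet;
    metrics missing from counts are treated as 0.
    """
    exceeded = [
        name for name, limit in thresholds.items()
        if counts.get(name, 0) > limit
    ]
    # Assumed rule: exceeding any single threshold makes the file complex.
    label = "complex" if exceeded else "simple"
    return label, exceeded


if __name__ == "__main__":
    small_sheet = {"macros": 0, "hyperlinks": 0, "used_cells": 240, "fonts": 2}
    macro_sheet = {"macros": 3, "hyperlinks": 0, "used_cells": 5000, "fonts": 2}
    print(classify(small_sheet))
    print(classify(macro_sheet))
```

Returning the list of exceeded metrics alongside the label makes it possible to see why a spreadsheet was classified as complex, which is useful when tuning the thresholds against a real collection.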

Links