Spoiwo - norbert-radyk/spoiwo GitHub Wiki

Overview

Spoiwo is an open-source library for functional-style spreadsheet generation in Scala. It started as a wrapper over Apache POI, and while XLSX generation is still at its core, the library has since been extended to also export to CSV.

The library addresses the issues Scala developers face when using Java spreadsheet libraries, which force a highly non-functional way of generating spreadsheets (mutable state, enforced indexes, execution-order dependency). To address these issues, Spoiwo introduces its own spreadsheet model, with a number of wrapper classes and caches that enable efficient report generation.

Problems with POI

  • Mutability - in Apache POI the spreadsheet model is implemented as a collection of Java beans, each of them a highly mutable entity with multiple getters and setters. This makes it possible to change the internal state of objects after their construction, which shouldn't be required for report generation and has several serious implications (including greater complexity and a lack of thread safety).

In Spoiwo we introduce a new, completely immutable spreadsheet model which benefits from all the positive aspects of immutability.

  • Enforced indexes - in Apache POI, whenever you create a row or a cell you are required to pass its index explicitly. This approach not only requires you to keep track of the indexes currently in use, but also introduces the risk of accidentally overwriting a row or cell (especially if you need to modify the spreadsheet at a later stage).

In Spoiwo we believe that in the majority of cases indexes can be left out, and if you need to leave a blank row or column, it makes sense to specify it explicitly rather than fiddle with indexes.
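The idea of leaving indexes out can be illustrated with a minimal, self-contained sketch. The names below (`ImplicitIndexSketch`, `Row.of`, `Row.Blank`, `indexed`) are hypothetical and chosen for illustration only; they are not Spoiwo's actual API. Positions are derived from where each element sits in the model, and a gap is expressed as an explicit blank row.

```scala
object ImplicitIndexSketch {
  // Hypothetical model for illustration only; not Spoiwo's actual API.
  final case class Cell(value: String)
  final case class Row(cells: List[Cell])

  object Row {
    // An explicit placeholder for a blank row, instead of skipping an index.
    val Blank: Row = Row(Nil)
    def of(values: String*): Row = Row(values.toList.map(Cell(_)))
  }

  final case class Sheet(name: String, rows: List[Row] = Nil) {
    // Returns a new Sheet; the receiver is never modified.
    def withRows(newRows: Row*): Sheet = copy(rows = newRows.toList)

    // Indexes are assigned only here, from each element's position.
    def indexed: List[(Int, Int, String)] =
      for {
        (row, r)  <- rows.zipWithIndex
        (cell, c) <- row.cells.zipWithIndex
      } yield (r, c, cell.value)
  }
}
```

With this shape, `Sheet("Report").withRows(Row.of("Name", "Qty"), Row.Blank, Row.of("Apples", "3"))` places the second data row at row index 2 without the caller ever writing an index.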

  • Structural coupling - in Apache POI a child object can only be created from its parent object. You can only create a cell from an existing row, a row from an existing sheet, and a sheet (or a cell style) from an existing workbook. Such an approach makes designing the system and structuring your code more difficult.

In Spoiwo the model represents a set of independent objects, which are bound together during composition. You can have a class responsible for generating a single sheet and another one for generating a list of predefined styles without worrying about passing the workbook to them. This way you can be much more flexible in structuring your code.
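As a rough sketch of what composing independent objects can look like, the example below defines styles and a sheet builder that never see a workbook, and binds them together only at the end. All names here (`CompositionSketch`, `Styles`, `salesSheet`) are assumptions made for illustration and do not reproduce Spoiwo's API.

```scala
object CompositionSketch {
  // Hypothetical model for illustration only; not Spoiwo's actual API.
  final case class Font(bold: Boolean = false)
  final case class CellStyle(font: Font = Font())
  final case class Cell(value: String, style: Option[CellStyle] = None)
  final case class Row(cells: List[Cell])
  final case class Sheet(name: String, rows: List[Row])
  final case class Workbook(sheets: List[Sheet])

  // Predefined styles live on their own; no workbook is needed to create them.
  object Styles {
    val header: CellStyle = CellStyle(Font(bold = true))
  }

  // Builds one sheet from plain data, again without any workbook in scope.
  def salesSheet(data: List[(String, Int)]): Sheet = {
    val header = Row(List("Item", "Sold").map(Cell(_, Some(Styles.header))))
    val body = data.map { case (item, sold) =>
      Row(List(Cell(item), Cell(sold.toString)))
    }
    Sheet("Sales", header :: body)
  }

  // The parts are bound together only at composition time.
  val workbook: Workbook = Workbook(List(salesSheet(List("Apples" -> 3))))
}
```

Because `Styles` and `salesSheet` have no dependency on a parent object, each can be tested or reused on its own.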

  • Order-dependent execution - in Apache POI some spreadsheet formatting depends on the order in which functions are executed (e.g. first setting up data and then adjusting a column width gives a completely different result than first adjusting the width of an empty column and then setting up its data). This creates yet another thing for developers using Apache POI to worry about.

In Spoiwo the model represents the final state of the spreadsheet you're looking to achieve, and two identical spreadsheet models will always be converted to the same result. The conversions implemented internally ensure that spreadsheets are converted correctly and always generate the expected results (e.g. if you mark a column to be autosized, it will always be adjusted to the full set of data in the column).
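The sketch below illustrates why a declarative model removes the ordering problem: autosizing is just a flag on the model, and the width is computed from the final data at conversion time. The types, the `columnWidths` helper and the default width of 8 are all hypothetical and only approximate the idea; this is not how Spoiwo is necessarily implemented.

```scala
object AutosizeSketch {
  // Hypothetical model for illustration only; not Spoiwo's actual API.
  final case class Column(index: Int, autoSized: Boolean = false)
  final case class Sheet(rows: List[List[String]], columns: List[Column] = Nil)

  // Widths are derived from the finished model, so it does not matter
  // whether the column or the data was declared first.
  def columnWidths(sheet: Sheet, defaultWidth: Int = 8): Map[Int, Int] =
    sheet.columns.map { col =>
      // Longest value found in this column across all rows (0 if none).
      val longest = sheet.rows
        .flatMap(_.lift(col.index))
        .map(_.length)
        .foldLeft(0)(_ max _)
      col.index -> (if (col.autoSized) longest else defaultWidth)
    }.toMap
}
```

Building the sheet as "columns first, then rows" or "rows first, then columns" yields equal `Sheet` values, and therefore equal widths.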

  • Duplicate objects - in Apache POI all model objects are created on user request, even if they duplicate already existing objects (this is a consequence of the mutable state, as the user can't be prevented from changing a particular object at a later stage). For some expensive resources (e.g. CellStyle or Font) this requires the user to manage and reuse resources carefully in their program, or it might lead to memory issues.

In Spoiwo we handle this problem behind the scenes: when the user tries to create a duplicate of an existing object, the already existing object is used instead. Should the user need to modify it at a later stage, the modified copy will be added to the cache (if it doesn't already exist).
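One way such deduplication could work is sketched below: because immutable case classes compare by value, an equal style can serve as a cache key, so the expensive backend object is created only once per distinct style. `StyleCacheSketch`, `NativeStyle` and `StyleCache` are hypothetical names used for illustration; Spoiwo's real caching may be organised differently.

```scala
import scala.collection.mutable

object StyleCacheSketch {
  // Hypothetical model for illustration only; not Spoiwo's actual API.
  final case class Font(bold: Boolean = false, heightInPoints: Int = 11)
  final case class CellStyle(font: Font = Font())

  // Stand-in for an expensive backend resource, such as a native cell style.
  final class NativeStyle(val source: CellStyle)

  final class StyleCache {
    private val cache = mutable.Map.empty[CellStyle, NativeStyle]

    // Case-class equality makes equal styles hit the same cache entry;
    // a modified copy is a different key and gets its own entry.
    def resolve(style: CellStyle): NativeStyle =
      cache.getOrElseUpdate(style, new NativeStyle(style))

    def size: Int = cache.size
  }
}
```

Resolving two separately constructed but equal styles returns the same `NativeStyle` instance, while a `copy` with a changed font adds exactly one new entry.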

What's more

However, Spoiwo is more than just a wrapper over Apache POI. Our goal is to be a one-stop shop for all spreadsheet operations required by Scala users, and to do that we realize we need to support a number of different conversions.

At the moment Spoiwo supports XLSX and CSV spreadsheet generation. In the future we might extend the library with additional spreadsheet formats (e.g. the Open Office format) and with the ability to create a model from existing data.

Summary

In this short overview we hope to have shown why Scala users should consider Spoiwo as an alternative to using plain Apache POI. Immutability and the avoidance of boilerplate code are at the core of Spoiwo's design, and we genuinely believe they will lead to a better and more rewarding coding experience. So please don't hesitate to get started with Spoiwo now.