Commits for the TOC pull request - letsfindaway/OpenBoard GitHub Wiki

The table-of-contents pull request is a significant undertaking consisting of 13 commits that add or change 2,390 lines of code and affect 102 files. This documentation will describe the work done in each commit.

refactor: let items know their media assets

Currently, the graphics items do not know their relative dependencies on media asset files. Instead, the scene has functions that derive that information depending on the item type. This commit moves the knowledge about media asset files to the items themselves, making media asset management much easier.

Media asset items are graphic items that have one or more backing files. The following is a list of all media asset item classes:

UBGraphicsMediaItem with the derived classes UBGraphicsAudioItem and UBGraphicsVideoItem
UBGraphicsPDFItem
UBGraphicsPixmapItem
UBGraphicsSvgItem
UBGraphicsWidgetItem with the derived class UBGraphicsW3CWidgetItem

We introduce a common base class UBMediaAssetItem, which is derived from UBItem. This base class defines a common interface for these classes. It contains the pure virtual function mediaAssets(), which all derived classes must implement, and the function setMediaAssets(), which a derived class can optionally implement. These functions provide a consistent way for items to identify and communicate their media assets.

We made a small change to UBSvgSubsetAdaptor so that a UBGraphicsPixmapItem always knows the underlying image file name.

We removed the clearSource() function from UBItem and all its derived classes. Previously an item could delete the associated media asset files. However, we now want to reuse the same media asset file for several items. This requires moving the management of these files out of the items.

We changed UBGraphicsScene::relativeDependencies() to return a list of strings instead of a list of URLs. Since all invokers of this function previously converted the result to a string, it makes more sense to return a string here. This causes minor changes in UBBoardController, UBDocumentController and UBPersistenceManager. The new function is much simpler because we can use the UBMediaAssetItem::mediaAssets() function instead of the previous switch statement which had to distinguish and handle each item type differently.

We introduced an additional function UBGraphicsScene::mediaAssetItems() to return the media asset items of a scene.
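The interface and the simplified dependency lookup described in this section can be sketched as follows. This is only an illustration under assumptions: std::string and std::vector stand in for whatever Qt types the real classes use, PixmapItemSketch is a hypothetical item with a single backing file, and the free function relativeDependencies() merely mirrors the idea of UBGraphicsScene::relativeDependencies().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch only: standard library types stand in for the Qt types.
class UBItem {
public:
    virtual ~UBItem() = default;
};

class UBMediaAssetItem : public UBItem {
public:
    // Every media asset item must report its backing files.
    virtual std::vector<std::string> mediaAssets() const = 0;

    // Optional: the default implementation ignores the request.
    virtual void setMediaAssets(const std::vector<std::string>& /*assets*/) {}
};

// Hypothetical stand-in for an item with a single backing file.
class PixmapItemSketch : public UBMediaAssetItem {
public:
    explicit PixmapItemSketch(std::string file) : mFile(std::move(file)) {}
    std::vector<std::string> mediaAssets() const override { return {mFile}; }

private:
    std::string mFile;
};

// No switch on the item type: one virtual call per media asset item.
// Items that are not media asset items are skipped.
std::vector<std::string> relativeDependencies(const std::vector<const UBItem*>& items) {
    std::vector<std::string> result;
    for (const UBItem* item : items) {
        assert(item != nullptr);
        if (auto* assetItem = dynamic_cast<const UBMediaAssetItem*>(item)) {
            for (const std::string& asset : assetItem->mediaAssets()) {
                result.push_back(asset);
            }
        }
    }
    return result;
}
```

Adding a new kind of media asset item then only requires a new subclass. The lookup function itself stays unchanged.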

refactor: single place to copy scenes

As discussed previously in https://github.com/letsfindaway/OpenBoard/issues/198, there are several different implementations of copying a scene in OpenBoard, each with slightly different behavior. This commit unifies these implementations and introduces a single method for copying a page within or between documents. Ultimately, this commit deletes more lines than it adds (123+/296-).

To accomplish this, we extend the UBDocument::copyPage() function to handle all scene copying and use it everywhere.

When copying a scene, we keep the item UUIDs because they only need to be unique within the scene. However, we replace the scene UUID with a new one. This is especially important when duplicating a scene, as it helps distinguish between the original scene and the copy. Note that the TOC uses the scene UUID to identify scenes and relies on their uniqueness.

performance: hash map for fast item lookup by uuid

In some cases, it is necessary to look up an item in a scene by its UUID. This has been implemented in UBGraphicsScene::itemForUuid() as a loop over all the scene's items. There was already a comment recommending replacing this with a map lookup. This commit implements that recommendation. The hash map and access function are now part of UBCoreGraphicsScene. Since all items are added to or removed from a scene using functions of this class, the map is naturally maintained here.

Therefore, it is no longer necessary to store the item UUID in an additional QGraphicsItem data attribute, so we removed it. Each item no longer needs to implement its own, mostly identical copy of setUuid(). There is now one single implementation in UBItem.

Some code that retrieves the UUID in UBSvgSubsetAdaptor and UBGraphicsScene had to be adapted.
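The idea of keeping the index in sync inside the add and remove functions can be sketched as follows. ItemSketch and SceneSketch are hypothetical stand-ins, and std::unordered_map is used here in place of whatever hash container the real UBCoreGraphicsScene uses.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical minimal item: only the UUID matters for this sketch.
struct ItemSketch {
    std::string uuid;
};

// Stand-in for the part of the scene class that owns the index.
class SceneSketch {
public:
    void addItem(ItemSketch* item) {
        assert(item != nullptr);
        mItemsByUuid[item->uuid] = item;   // keep the index in sync on add
    }

    void removeItem(ItemSketch* item) {
        assert(item != nullptr);
        mItemsByUuid.erase(item->uuid);    // and on remove
    }

    // Average constant-time lookup instead of a loop over all items.
    ItemSketch* itemForUuid(const std::string& uuid) const {
        auto it = mItemsByUuid.find(uuid);
        return it == mItemsByUuid.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, ItemSketch*> mItemsByUuid;
};
```

Because every item passes through addItem() and removeItem(), no other code has to know that the index exists.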

feat: asset based UUID for name of media asset files

This commit implements an important feature that allows media asset files to be reused for several items. The idea is to use SHA-1-based UUIDs for the names of media asset files. Such a UUID is not random but rather derived from the file contents in a deterministic way. Nevertheless, it is extremely unlikely that two different files receive the same UUID. We added the function mediaAssetUuid() to UBMediaAssetItem to calculate these UUIDs from arbitrary data.

Using these UUIDs as file names allows us to compare file names instead of content to determine if two files are identical.

It is now crucial to distinguish between the item UUID and the media asset UUID. Previously, these were often the same, but not always. This required changes to the UBSvgSubsetAdaptor, UBBoardController and UBDocumentController. When copying a media asset file in the UBPersistenceManager it is no longer an error if the target file already exists. This just indicates that another item is using the same media asset file, so we can reuse the file instead of copying it.
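The effect of content-derived file names can be illustrated with a small sketch. Note the substitution: to stay short and dependency-free, it uses a 64-bit FNV-1a hash rendered as hex instead of a SHA-1-based UUID, so it shows the principle only and is far weaker than the real scheme. AssetStoreSketch is a hypothetical in-memory replacement for the asset directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the SHA-1-based UUID: 64-bit FNV-1a, rendered as 16 hex digits.
// Deterministic like the real scheme, but for illustration only.
std::string contentBasedName(const std::string& content) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;   // FNV offset basis
    for (unsigned char byte : content) {
        hash ^= byte;
        hash *= 0x100000001b3ULL;                 // FNV prime
    }
    static const char digits[] = "0123456789abcdef";
    std::string name(16, '0');
    for (int i = 15; i >= 0; --i) {
        name[static_cast<std::size_t>(i)] = digits[hash & 0xf];
        hash >>= 4;
    }
    return name;
}

// Hypothetical in-memory asset directory: file name -> content.
class AssetStoreSketch {
public:
    // Returns the file name. An already existing target is not an error:
    // it means another item uses the same asset, so the file is reused.
    std::string addAsset(const std::string& content) {
        const std::string name = contentBasedName(content);
        auto entry = mFiles.emplace(name, content).first;  // no-op if present
        // Same name must mean same content; the sketch checks this explicitly.
        assert(entry->second == content);
        (void)entry;
        return name;
    }

    std::size_t fileCount() const { return mFiles.size(); }

private:
    std::map<std::string, std::string> mFiles;
};
```

Adding the same bytes twice yields the same name and a single stored file, which is exactly the comparison by name that the text describes.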

cleanup: remove UBForeignObjectsHandler

We can now remove the UBForeignObjectsHandler because it is no longer used anywhere. We only had to remove some #include statements and commented code.

drop: CFF support

Next, we removed support for the CFF file format. Export support had already been disabled, and import support was broken. To avoid incorporating such obsolete code into our subsequent work, we decided to remove these classes.

The related code in UBDocumentManager was also removed.

feat: add table-of-content (TOC)

This commit is constructive again, and arguably the most important one: it adds the basics of a table of contents (TOC) to the code. A new class, UBToc, holds the TOC data, and a UBTocSerializer loads and saves the TOC as JSON. A UBToc instance is created and managed by UBDocument.

For an in-depth discussion of the TOC, see Introducing an updated document format. The most important thing to understand about this commit is that page file names like page000.svg and page numbers are no longer directly related. Instead, we use the TOC to look up the page file name by page number.

We now use the following naming convention:

The page index is the page number minus one. The first page has an index of 0 and so on.
The page ID is the number used in the file name.

Previously, the page index and page ID were identical. Now, a lookup in the TOC is necessary to translate a page index to a page ID. See here for a deeper discussion.

For the classes and their APIs this means:

UBDocument functions always deal with a page index.
UBPersistenceManager functions always deal with a page ID.

Therefore, this commit had to touch all places where the UBPersistenceManager was called. A corresponding function in the UBDocument had to be created to perform the translation.

Conversely, we could remove much of the code from the UBPersistenceManager that was previously used to rename files when pages were inserted, deleted, or moved. These operations are now simple tasks performed by the TOC in the UBDocument.
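The separation of page index and page ID can be sketched with a very small TOC model. This is a hypothetical illustration, not the UBToc API: the position in a vector plays the role of the page index, the stored value is the page ID, and the sketch assumes that page IDs are handed out by a counter and never reused.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal TOC: vector position = page index, value = page ID.
class TocSketch {
public:
    // Inserts a new page before `index` and returns its page ID.
    // No existing page file has to be renamed.
    int insertPage(std::size_t index) {
        assert(index <= mPageIds.size());
        mPageIds.insert(at(index), mNextPageId);
        return mNextPageId++;
    }

    int appendPage() { return insertPage(mPageIds.size()); }

    void deletePage(std::size_t index) {
        assert(index < mPageIds.size());
        mPageIds.erase(at(index));
    }

    void movePage(std::size_t from, std::size_t to) {
        assert(from < mPageIds.size() && to < mPageIds.size());
        const int id = mPageIds[from];
        mPageIds.erase(at(from));
        mPageIds.insert(at(to), id);
    }

    // Translation from page index to page ID.
    int pageId(std::size_t index) const {
        assert(index < mPageIds.size());
        return mPageIds[index];
    }

    std::size_t pageCount() const { return mPageIds.size(); }

private:
    std::vector<int>::iterator at(std::size_t index) {
        return mPageIds.begin() + static_cast<std::ptrdiff_t>(index);
    }

    std::vector<int> mPageIds;
    int mNextPageId = 0;   // assumption of this sketch: IDs are never reused
};

// File name for a page ID, following the page000.svg pattern from the text.
std::string pageFileName(int pageId) {
    assert(pageId >= 0);
    std::string digits = std::to_string(pageId);
    if (digits.size() < 3) digits.insert(0, 3 - digits.size(), '0');
    return "page" + digits + ".svg";
}
```

Inserting a page in the middle only changes the vector. In this model, the file of the new page simply receives the next free ID, and all other files keep their names.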

To facilitate the transition from documents without TOC to the new document format we also implemented a scanner in the UBDocument::scan() function that creates a TOC by scanning the document directory and its contained scenes. This scanner will be further improved in later commits.

To distinguish the new format from documents created or modified by previous versions of OpenBoard, we updated the document version from 4.8.0 to 4.9.0. This update also causes a warning dialog to be displayed to users of previous OpenBoard versions when they attempt to open a version 4.9.0 document.

refactor: switch pageCount to UBDocument

Now that the TOC is in place, there is no need to store the document page count in UBDocumentProxy or to increment or decrement it when pages are added or removed. Instead, we can simply ask the TOC for the number of entries.

As a result, UBDocument becomes more important than UBDocumentProxy in some places, and we adapted some function signatures to use UBDocument.

We can now use UBDocument::pageCount() instead of UBDocumentProxy::pageCount() and have removed the latter.

We also removed UBPersistenceManager::sceneCount(), which determined the number of pages by iterating over the files in the document directory. A similar task is now performed by UBDocument::scan() when a document must be converted.

Using a TOC made it more difficult to delete all scenes of a document once that document is no longer used. Previously, we simply iterated over the file names starting from 0; however, the page IDs may no longer be contiguous. Therefore, we changed the type of the scene cache container from QHash to QMap to guarantee that scenes are sorted by document in the cache. This allows us to easily identify and delete the affected entries.
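Why an ordered container helps here can be shown with a short sketch. It assumes a cache key made of a document name and a page ID, which is an assumption for illustration and not necessarily the key used in OpenBoard, and it uses std::map as the standard library counterpart of an ordered map.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <limits>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache key: (document, page ID). std::map orders pairs
// lexicographically, so all entries of one document are adjacent.
using CacheKey = std::pair<std::string, int>;
using SceneCacheSketch = std::map<CacheKey, std::string>;  // value: scene placeholder

// Removes every cached scene of `document`, even if the page IDs have gaps.
// Returns the number of removed entries.
std::size_t removeDocument(SceneCacheSketch& cache, const std::string& document) {
    auto first = cache.lower_bound(CacheKey{document, std::numeric_limits<int>::min()});
    auto last = cache.upper_bound(CacheKey{document, std::numeric_limits<int>::max()});
    const auto removed = static_cast<std::size_t>(std::distance(first, last));
    cache.erase(first, last);
    return removed;
}
```

With an unordered hash container, the same operation would require visiting every entry or knowing every page ID in advance.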

feat: improve handling of media asset UUIDs

This commit refines the handling of media asset UUIDs and builds on the commit that introduced asset-based UUIDs. It includes:

Keep a separate asset UUID for SVG items.
Rename the function mediaAssetUuid() to createMediaAssetUuid() to clarify that it is not an access function.
Make setMediaAsset() pure virtual and implement it in all derived classes. There is no default implementation! Each derived class must handle this accordingly.
Add UBMediaAssetItem::uuidFromPath() to derive a UUID from a path.
Add UBMediaAssetItem::mediaAssetUuid() to retrieve the asset UUID.
Allow switching the renderer for UBGraphicsPDFItem when the underlying PDF file is renamed.

feat: fully convert document while scanning

This commit completes document conversion during scanning. Asset files are copied when they receive a new content-based UUID. Unreferenced asset files can be deleted at the end of the scanning process and when a document is closed.
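Two small building blocks of such a conversion can be sketched as follows. Both functions are hypothetical helpers, not OpenBoard functions. The first relies on the canonical 36-character UUID text form, in which the version digit is the first character of the third group and name-based SHA-1 UUIDs carry version 5. The second computes which asset files are no longer referenced, using plain sets of file names in place of a real directory listing.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// True if `uuid` (canonical 36-character form, no braces) has version 5,
// i.e. is a name-based SHA-1 UUID. Does not validate the other characters.
bool isSha1BasedUuid(const std::string& uuid) {
    return uuid.size() == 36 && uuid[14] == '5';
}

// Asset files that exist in the document but are referenced by no item.
std::set<std::string> unreferencedAssets(const std::set<std::string>& filesOnDisk,
                                         const std::set<std::string>& referenced) {
    std::set<std::string> result;
    std::set_difference(filesOnDisk.begin(), filesOnDisk.end(),
                        referenced.begin(), referenced.end(),
                        std::inserter(result, result.end()));
    return result;
}
```

An asset whose UUID fails the version check would receive a new content-based UUID during the scan, and anything left in the unreferenced set afterwards would be a candidate for deletion.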

Some now obsolete functions that were used to shift or move scenes have been removed from the UBSceneCache.

feat: renumber pages on export

To facilitate document exchange between OpenBoard versions, we added a function that renumbers the page files in ascending order during export, ensuring backward compatibility. This is now implemented for both UBZ and UBX exports.

We also added a UBPageMapper, which uses the TOC to convert the page file names to a contiguous sequence. At the same time, an updated TOC is created and also added to the export archive.
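The renumbering idea can be sketched as a simple mapping. This is not the UBPageMapper interface. It is a hypothetical function that assumes the exported page ID is simply the page index, so that the exported files form a contiguous sequence starting at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Input: page IDs in TOC order (position = page index), possibly with gaps.
// Output: old page ID -> new page ID, where the new ID equals the page index.
std::map<int, int> contiguousPageMapping(const std::vector<int>& pageIdsInTocOrder) {
    std::map<int, int> mapping;
    for (std::size_t index = 0; index < pageIdsInTocOrder.size(); ++index) {
        const bool inserted =
            mapping.emplace(pageIdsInTocOrder[index], static_cast<int>(index)).second;
        assert(inserted && "page IDs in a TOC must be unique");
        (void)inserted;
    }
    return mapping;
}
```

For a TOC that lists the page IDs 4, 0 and 9 in that order, the page file with ID 4 would be exported with ID 0, the one with ID 0 with ID 1, and the one with ID 9 with ID 2.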

refactor: use QtConcurrent for UBBackgroundLoader

This commit refactors UBBackgroundLoader to use the QtConcurrent framework in preparation for moving document scanning to the background. QtConcurrent::mapped() provides nearly all of the functions implemented in UBBackgroundLoader and is much more flexible. For example, it can easily use multiple threads, thus improving performance.

perf: document scan in background

This final commit now applies the UBBackgroundLoader to document scanning. More work was needed on both the scanning process and the UBBackgroundLoader.

First, we found that a full scan is unnecessary when a document is selected in Document mode. It is sufficient to load the first few bytes of each scene to retrieve the scene UUID. This information is necessary to identify a scene and detect if a scene file name has been changed by a previous version of OpenBoard, so that the table of contents (TOC) can be updated accordingly. Only with this information can we display the scene thumbnails in the correct order. This scan is performed in UBDocument::scan().

In a second phase, we scan the scenes for media asset items and their associated files. This requires loading each scene. However, this process can be deferred until a document is opened. Even then, the process can be performed in the background and interrupted at any time. This scan is performed in UBDocument::scanAssets().

To accomplish this, we had to extend the UBBackgroundLoader to allow throttling, even though the results are processed in the background. A detailed description of the problem and the chosen solution can be found in Background loading.

The procedure is as follows:

When a document is opened in Board mode, the UBDocumentController calls UBDocument::scanAssets().
Here, we first ensure that the initial scan is complete and that all scene UUIDs have been identified. The first pass also removes asset information from scenes that were manipulated by an earlier version of OpenBoard. Assets may have changed, causing the associated TOC entry to become inaccurate.
Next, we assemble a list of all pages with missing asset information.
If this list is not empty, we start a background loader for it.
When the background loader delivers a result, we use it to ask the UBPersistenceManager to prepare to load that scene without adding it to the cache.
prepareSceneLoading() returns an opaque handle if the scene needs to be loaded (i.e. if it is not already in the cache). This handle is actually a shared pointer to the SceneCacheEntry managing the loading process. Since this entry was not added to the cache, we must keep it alive by storing the handle in a set.
If, however, the background loader does not return a result, or if the scene is already in the cache, then we trigger the delivery of the next result by calling resultProcessed().
The loading process in the SceneCacheEntry is mainly controlled in startLoading(). A timer is started here, which performs stepwise loading of the scene on the main thread without blocking user interactions.
Once the scene is loaded, we call UBGraphicsScene::loadingCompleted(), which in turn informs the document by calling UBDocument::sceneLoaded(), passing a pointer to the scene and the aforementioned opaque handle.
Here, we remove the handle from the set again, which releases and deletes the SceneCacheEntry.
After that, the scene is scanned for media asset items.
If an item uses a media asset UUID that is not SHA-1 based (UUIDv5), then we calculate a new UUID and replace it. The media asset file is copied if necessary.
If the scene was modified by this process, we save it.
Finally, we trigger the delivery of the next result by calling resultProcessed().
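The throttling handshake in the procedure above, where the next result is delivered only after resultProcessed() has been called, can be sketched in a strongly simplified form. ThrottledLoaderSketch is hypothetical and single-threaded: the real UBBackgroundLoader produces its results on background threads, while this sketch starts from a prepared queue and only models the delivery protocol.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hands out one result at a time and waits for resultProcessed()
// before delivering the next one.
class ThrottledLoaderSketch {
public:
    using Callback = std::function<void(const std::string&)>;

    ThrottledLoaderSketch(std::deque<std::string> results, Callback onResult)
        : mPending(std::move(results)), mOnResult(std::move(onResult)) {}

    // Delivers the first result, if there is one.
    void start() { deliverNext(); }

    // The consumer calls this once it has finished with the current result.
    void resultProcessed() {
        assert(mInFlight);
        mInFlight = false;
        deliverNext();
    }

    bool finished() const { return mPending.empty() && !mInFlight; }

private:
    void deliverNext() {
        if (mInFlight || mPending.empty()) return;
        std::string result = std::move(mPending.front());
        mPending.pop_front();
        mInFlight = true;      // at most one result is outstanding
        mOnResult(result);     // the consumer may need a long time to finish
    }

    std::deque<std::string> mPending;
    Callback mOnResult;
    bool mInFlight = false;
};
```

In this model the consumer decides the pace: a scene that takes long to load and scan delays the next delivery, so results never pile up faster than they can be handled.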