I've decided not to have separate staging and commits. If this turns out to be useful in the future it won't be difficult to add. For now, though, I'm merging the intended add, scan and commit commands into a single add command that can directly add files and folders to Photosphere.
I've decided not to bother with unit testing for the command functions; it's just too much work for too little return. The smoke tests are working out really well and will hopefully cover my CLI testing needs.
Sort indexes
The sort index embeds entire records, making it very fast to retrieve the set of media files page by page in sorted order. The trade-off is that updating a record is a bit more expensive, because it means updating all the record data in the database shard and in every sort index, using a binary search to find and update the record in each sort index. An alternative is to store just the record id in the sort index, but this would mean an indirection that makes loading sorted records very expensive.
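To make the trade-off concrete, here is a minimal in-memory sketch of a sort index that embeds full records. It is not the Photosphere implementation: the names MediaRecord, EmbeddedSortIndex, photoDate and fileName are made up for illustration, and a single array stands in for the page files.

```typescript
// Hypothetical record shape; the field names are assumptions for illustration.
interface MediaRecord {
    _id: string;
    photoDate: string; // ISO date string, used as the sort key in this sketch.
    fileName: string;
}

// Orders records by sort key, falling back to the id so the ordering is total.
function compareRecords(a: MediaRecord, b: MediaRecord): number {
    if (a.photoDate !== b.photoDate) {
        return a.photoDate < b.photoDate ? -1 : 1;
    }
    if (a._id === b._id) {
        return 0;
    }
    return a._id < b._id ? -1 : 1;
}

// Binary search: returns the index of the first entry that is >= target.
function lowerBound(entries: MediaRecord[], target: MediaRecord): number {
    let lo = 0;
    let hi = entries.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (compareRecords(entries[mid], target) < 0) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

class EmbeddedSortIndex {
    private entries: MediaRecord[] = [];

    // Stores a full copy of the record at its sorted position.
    insert(record: MediaRecord): void {
        const pos = lowerBound(this.entries, record);
        this.entries.splice(pos, 0, { ...record });
    }

    // Updating means finding the old copy by binary search, removing it,
    // and inserting the new copy (its sort key may have changed).
    update(oldRecord: MediaRecord, newRecord: MediaRecord): boolean {
        const pos = lowerBound(this.entries, oldRecord);
        if (pos >= this.entries.length || this.entries[pos]._id !== oldRecord._id) {
            return false; // The old copy is not in this index.
        }
        this.entries.splice(pos, 1);
        this.insert(newRecord);
        return true;
    }

    // Reading a page needs no lookup elsewhere because the records are embedded.
    getPage(pageNumber: number, pageSize: number): { records: MediaRecord[]; totalRecords: number } {
        const start = pageNumber * pageSize;
        return {
            records: this.entries.slice(start, start + pageSize),
            totalRecords: this.entries.length,
        };
    }
}
```

With an id-only index, getPage would instead have to fetch every record on the page from the database, which is the indirection described above.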
The sort index for getting media files in sorted order is initially generated using a k-way merge. This is still pretty expensive, but only if it's done in one hit for a big database. It is memory efficient though, so it can handle huge databases, even if it takes significant time to generate. Normally the sort index should be built incrementally, so in normal circumstances it should not be necessary to rebuild the entire sort index.
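As a rough illustration of why a k-way merge is memory efficient, the sketch below merges several already-sorted runs while holding only one pending item per run. The RunReader interface and the arrayRun helper are hypothetical stand-ins for reading sorted chunks from storage, and a linear scan over the heads is used for simplicity where a min-heap would be the usual choice when there are many runs.

```typescript
// Hypothetical reader over one sorted run (for example, a sorted chunk on disk).
// Items themselves must not be undefined, as undefined marks the end of a run.
interface RunReader<T> {
    next(): T | undefined;
}

// In-memory stand-in for a run; a real reader would stream from storage.
function arrayRun<T>(items: T[]): RunReader<T> {
    let pos = 0;
    return {
        next: () => (pos < items.length ? items[pos++] : undefined),
    };
}

// Merges k sorted runs into a single sorted stream, pulling items on demand.
function kWayMerge<T>(runs: RunReader<T>[], compare: (a: T, b: T) => number): RunReader<T> {
    // Only the next unconsumed item of each run is kept in memory.
    const heads: (T | undefined)[] = runs.map(run => run.next());
    return {
        next: () => {
            // Find the run whose head is smallest.
            let best = -1;
            for (let i = 0; i < heads.length; i++) {
                const head = heads[i];
                if (head === undefined) {
                    continue; // This run is exhausted.
                }
                const currentBest = best === -1 ? undefined : heads[best];
                if (currentBest === undefined || compare(head, currentBest) < 0) {
                    best = i;
                }
            }
            if (best === -1) {
                return undefined; // All runs are exhausted.
            }
            const result = heads[best];
            heads[best] = runs[best].next(); // Refill from the run just consumed.
            return result;
        },
    };
}
```

Each call to next() does work proportional to the number of runs, so the cost is in time rather than memory, which matches the behaviour described above.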
Sort indexes do need a metadata file, because otherwise we have no way of telling an empty sort index that has been created from one that doesn't exist yet.
I considered simplifying the results of a query from the sort index, but decided against it because the information that comes back (e.g. totalRecords) could actually be really useful in the future.
After realizing that adding a record to a sort index and splitting a page file breaks the sorted order of the records, I've had Claude convert the data structure to a b-tree.
The b-tree seems to make the k-way merge unnecessary.
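For reference, this is a textbook-style b-tree insert, sketched to show how splitting a full node keeps everything in sorted order. It is not the code Claude generated for Photosphere: it stores plain numbers rather than embedded records, keeps everything in memory, and the names BTreeNode, MIN_DEGREE, btreeInsert and inOrder are made up for this example.

```typescript
// Minimum degree: every node holds at most 2 * MIN_DEGREE - 1 keys.
const MIN_DEGREE = 2;

interface BTreeNode {
    keys: number[];
    children: BTreeNode[]; // Empty for leaf nodes.
}

function isLeaf(node: BTreeNode): boolean {
    return node.children.length === 0;
}

// Splits the full child at parent.children[i] and moves its median key up
// into the parent, so keys left of the median stay left and the rest go right.
function splitChild(parent: BTreeNode, i: number): void {
    const full = parent.children[i];
    const median = full.keys[MIN_DEGREE - 1];
    const right: BTreeNode = {
        keys: full.keys.splice(MIN_DEGREE), // Keys after the median.
        children: isLeaf(full) ? [] : full.children.splice(MIN_DEGREE),
    };
    full.keys.pop(); // Remove the median from the left half.
    parent.keys.splice(i, 0, median);
    parent.children.splice(i + 1, 0, right);
}

// Inserts into a node that is known not to be full.
function insertNonFull(node: BTreeNode, key: number): void {
    let i = node.keys.length - 1;
    while (i >= 0 && key < node.keys[i]) {
        i--;
    }
    i++; // Position of the first key greater than the new key.
    if (isLeaf(node)) {
        node.keys.splice(i, 0, key);
        return;
    }
    if (node.children[i].keys.length === 2 * MIN_DEGREE - 1) {
        splitChild(node, i);
        if (key > node.keys[i]) {
            i++; // The new key belongs in the right half of the split.
        }
    }
    insertNonFull(node.children[i], key);
}

// Inserts a key and returns the (possibly new) root.
function btreeInsert(root: BTreeNode, key: number): BTreeNode {
    if (root.keys.length === 2 * MIN_DEGREE - 1) {
        const newRoot: BTreeNode = { keys: [], children: [root] };
        splitChild(newRoot, 0);
        insertNonFull(newRoot, key);
        return newRoot;
    }
    insertNonFull(root, key);
    return root;
}

// Walks the tree in sorted order.
function inOrder(node: BTreeNode, out: number[] = []): number[] {
    for (let i = 0; i < node.keys.length; i++) {
        if (!isLeaf(node)) {
            inOrder(node.children[i], out);
        }
        out.push(node.keys[i]);
    }
    if (!isLeaf(node)) {
        inOrder(node.children[node.keys.length], out);
    }
    return out;
}
```

Because every insert leaves the tree sorted, records can be added one at a time in any order without a separate merge step.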