Redesign for 0.20
Background
Box Backup was originally designed with a particular use in mind. After open sourcing the project, it's been used in many more ways than originally intended.
In addition, as with any project, some of the original decisions have been shown to be non-optimal. While the original code strove for simplicity, it sometimes did so at the expense of making new features algorithmically expensive and troublesome to code.
In the light of experience, we need to adjust some of the core design. This will be incompatible with older versions.
File Selection
Exclude/AlwaysInclude is cumbersome and has many limitations. For example, what do you do if you want to back up /home/chris but not /home/chris/music, while still including /home/chris/music/mp3/some/band/album? You have to write:
Path = /home/chris
ExcludeFilesRegex = ^/home/chris/music/
ExcludeDirsRegex = ^/home/chris/music/
AlwaysIncludeDir = /home/chris/music/mp3
AlwaysIncludeDir = /home/chris/music/mp3/some
AlwaysIncludeDir = /home/chris/music/mp3/some/band
AlwaysIncludeDir = /home/chris/music/mp3/some/band/album
AlwaysIncludeFilesRegex = ^/home/chris/music/mp3/some/band/album/
AlwaysIncludeDirsRegex = ^/home/chris/music/mp3/some/band/album/
And if you want to exclude a file inside that directory, you can't override the AlwaysIncludeFilesRegex; you have to rewrite it to specifically include just the files that you do want.
To solve this, I'd propose renaming AlwaysInclude to Include, and making Include and Exclude an ordered list, with rules applied in sequence and the last matching rule winning. Then you could write:
Path = /home/chris
ExcludeFilesRegex = ^/home/chris/music/
ExcludeDirsRegex = ^/home/chris/music/
IncludeDir = /home/chris/music/mp3
IncludeDir = /home/chris/music/mp3/some
IncludeDir = /home/chris/music/mp3/some/band
IncludeDir = /home/chris/music/mp3/some/band/album
IncludeFilesRegex = ^/home/chris/music/mp3/some/band/album/
IncludeDirsRegex = ^/home/chris/music/mp3/some/band/album/
ExcludeFilesRegex = ^/home/chris/music/mp3/some/band/album/.*\.txt$
In addition, it would be nice to have a way to tell Box to descend into a directory looking for included files, but not to automatically back up all of its contents. Then you could write:
Path = /home/chris
ExcludeDir = /home/chris/music
Descend = /home/chris/music
IncludeDir = /home/chris/music/mp3/some/band/album
ExcludeFilesRegex = ^/home/chris/music/mp3/some/band/album/.*\.txt$
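
As a sketch of how the ordered rules above might be evaluated (the types and names here are illustrative, not existing Box Backup code), each rule is tried in order and the last matching rule decides:

    #include <string>
    #include <vector>
    #include <regex>

    // Hypothetical sketch: one entry per Include*/Exclude* line, kept in
    // the order the lines appear in the config file.
    struct SelectionRule
    {
        bool include;        // true for Include*, false for Exclude*
        std::regex pattern;  // regex built from the config value
    };

    // Returns whether a path should be backed up. The default for
    // anything under Path is "include"; the last matching rule wins,
    // so a later Include can override an earlier Exclude and vice versa.
    bool ShouldBackUp(const std::string& path,
                      const std::vector<SelectionRule>& rules)
    {
        bool include = true;
        for (const auto& rule : rules)
        {
            if (std::regex_search(path, rule.pattern))
            {
                include = rule.include;
            }
        }
        return include;
    }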
Scanning the disc for changes is sometimes necessary, but modern OSes offer more efficient ways to keep track of file changes while Box Backup is running.
We should implement a generic interface for change notifiers, which could be either a disc scanner or an OS notification mechanism. The change notifications build an "upload set": a list of all files which need uploading.
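
As a sketch, the generic interface might look like this (all names hypothetical); both the disc scanner and an OS-specific notifier would implement it and feed the same upload set:

    #include <set>
    #include <string>

    // Hypothetical sketch of a generic change-notifier interface.
    // The upload set is just the list of files that need uploading.
    typedef std::set<std::string> UploadSet;

    class ChangeNotifier
    {
    public:
        virtual ~ChangeNotifier() { }
        // Called periodically by bbackupd; implementations add any
        // changed paths to the upload set. A disc scanner walks the
        // tree comparing timestamps; an OS notifier (inotify,
        // FSEvents, ReadDirectoryChangesW) just drains its event queue.
        virtual void CollectChanges(UploadSet& rUploadSet) = 0;
    };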
Filters would be lovely to have, so that external processes can filter files, for example transforming database files into dumps of their SQL.
Upload Engine
This should support much more complete progress notification to a GUI-based client or to bbackupctl, and support suspension and resumption.
The Boxi patches include significant extensions to progress notification that could easily be used to send notifications over a command socket, although Boxi currently links directly with Box code and uses the notifications as callbacks to update the GUI. The notification stubs in BackupDaemon are currently empty for backwards compatibility. --Chris
Some notes on exactly what's needed over the command socket for a good GUI client would be useful. --Ben
Another thing I would like to see is the ability for bbackupd to back up to a local disk, using the store file format. This would allow, on the first visit to a customer site, all their data to be backed up to a removable hard disk, which is then physically moved and copied to the store. Then only the incremental changes to that backup are sent over the link. It would also be useful for unit testing on platforms where the backup server cannot run, such as Windows. --Chris
To a certain extent this can be done now by using a local bbstored. However, perhaps we could create a mini-store daemon which pretends to have nothing, and serialises everything to a single file? And something at the other end which pretends to be a client? This would mean that the code for the client and server wouldn't have to be modified to cope. We could abstract out the interface to writing the files, but this just adds complications to the code. --Ben
Streams in Files
We need to support multiple streams of data in a file, for resource forks under Mac OS X and multiple streams in NTFS. There are two options: either change the format to include multiple streams, or have multiple entries in the directory (or multiple references to files within a single directory entry).
Attributes
While reasonably extensible, the existing attribute system is cumbersome. It has a type field, but that field affects the whole record, and Windows doesn't use it anyway. A file should instead contain multiple records, each with its own type and size, so that a client can use any or all of the information it requires.
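
As a sketch of the record-based layout (the exact field widths and type values are assumptions, not a settled format):

    #include <stdint.h>

    // Hypothetical layout: a file's attribute block is a sequence
    // of these self-describing records.
    struct AttributeRecord
    {
        uint32_t mType;   // e.g. UNIX_PERMISSIONS, WINDOWS_ACL, XATTR
        uint32_t mSize;   // size of the data that follows, in bytes
        // followed by mSize bytes of type-specific data
    };

    // A reader that wants only the records it understands just walks
    // the block, skipping mSize bytes for every unknown mType.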
Although it does mean a file format change, it would probably be more resilient to have a single directory entry point to a list of streams. As an optimisation, perhaps have two types of directory entry: single-stream ones, which are bitwise identical to current directory entries (making upgrade painless), and multi-stream ones, which point to a list of streams instead of to the first stream. My reason for this is that otherwise the directory operations code will always have to group all the stream entries in a directory together to report them as one file, which is just fiddly. -- Alaric 10:53, 24 February 2006 (AST)
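
A rough sketch of the two entry types Alaric describes (field and flag names invented for illustration):

    #include <stdint.h>

    // Hypothetical: the entry starts with flags, so readers can tell
    // the two kinds apart. Single-stream entries stay bitwise
    // identical to the current format, making upgrade painless.
    struct DirEntrySingleStream
    {
        uint16_t mFlags;        // MULTI_STREAM bit clear
        int64_t  mObjectID;     // points directly at the (only) stream
        // ... existing fields unchanged ...
    };

    struct DirEntryMultiStream
    {
        uint16_t mFlags;        // MULTI_STREAM bit set
        int64_t  mStreamListID; // points at a list of stream objects
        // ... same remaining fields ...
    };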
The Windows code will need to be adjusted to store Windows ACLs as extended attributes. --Chris
Store
This needs the most fundamental changes. Currently housekeeping is the biggest problem, as it needs to scan every single directory to build a list of files to delete. When this takes longer than the interval between connections from bbackupd, housekeeping never actually gets done.
I propose moving to a reference counted store. Each object has a reference count, and when it reaches zero, housekeeping deletes it. bbstored updates these counts during the connection, meaning that the store only has to be scanned when it is checked for errors.
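
As a sketch of what bbstored might do (names hypothetical; the real refcount index would live on disc per account):

    #include <stdint.h>
    #include <map>
    #include <vector>

    typedef int64_t ObjectID;

    std::map<ObjectID, uint32_t> refCounts;  // per-object counts
    std::vector<ObjectID> objectsToDelete;   // queue for housekeeping

    // bbstored adjusts counts as part of handling each command, so
    // housekeeping never needs to scan the store; it just deletes
    // the queued objects. (Counts are assumed consistent here.)
    void OnObjectReferenced(ObjectID id) { ++refCounts[id]; }

    void OnReferenceRemoved(ObjectID id)
    {
        if (--refCounts[id] == 0)
        {
            objectsToDelete.push_back(id);
            refCounts.erase(id);
        }
    }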
Note that the store will still need to rewrite some of the file objects to allow it to do the "changes only" store of updates to files.
I would suggest making housekeeping a function of the client, not the server. The client should be able to decide an arbitrary policy on which files should be kept, and for how long, and then instruct the server to delete a file when necessary to keep it under quota and allow the client to upload new files. The client would issue a command, something like DeleteFileNow, to the server. Assuming that the server stores old versions as reverse diffs, deleting the oldest diff should be trivial, and deleting a non-oldest diff would merge it with the one after, or the current version, potentially saving the space used by overlapping change blocks. Deleting the newest version of a file with diffs would not be possible. Deleting the current version of a file with no diffs would be easy. --Chris
The code for deleting diffs in the middle of the chain already exists. Not sure about making the client responsible for individual files, though, sounds like too much for the client to keep track of. It could delete backup sets, which causes the server to decrement ref counts and eventually delete the actual files. --Ben
Directory structure
I would like to completely change the way directories are stored. This allows 1) the directories to be signed, and 2) a lovely way of going back in time.
The directory is a hierarchy, as now, with a directory referring to subdirectories. However, we have a new object, which is a "backup set". This specifies a root directory, and a list of grafts.
A graft is a pair of (original dir ID, replacement dir ID). So when you are scanning a tree, if you see a reference to the original dir ID, you use the replacement one instead. This allows efficient replacement of leaves of the tree -- otherwise we would have to duplicate the whole path from the root.
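
In code, graft resolution during a tree walk could be as simple as this sketch (ObjectID and the field names are placeholders):

    #include <stdint.h>
    #include <map>

    typedef int64_t ObjectID;

    // A backup set is a root directory plus a set of grafts.
    struct BackupSet
    {
        ObjectID mRootDirID;
        // original dir ID -> replacement dir ID
        std::map<ObjectID, ObjectID> mGrafts;
    };

    // Every time the tree walker is about to follow a directory
    // reference, it substitutes the graft target if there is one.
    // Only the changed leaves need new objects; the rest of the
    // tree is shared with older backup sets.
    ObjectID ResolveDir(const BackupSet& set, ObjectID dirID)
    {
        std::map<ObjectID, ObjectID>::const_iterator
            it = set.mGrafts.find(dirID);
        return (it == set.mGrafts.end()) ? dirID : it->second;
    }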
Filenames are no longer encrypted as separate objects, as this was shown to be a pain and annoyingly inefficient. Instead, the directory is stored as a single encrypted stream. This is possible because the store no longer needs to modify directories. However, the referenced object IDs need to be stored in the clear so the server can use them; of course, these are included in the signed data, so the server can't modify them without detection.
NOTE: Needs to cope with large directories with lots of changes happening, eg spools, Maildirs, etc.
Attributes are only stored in the directory, never in the file. (Although what about xattrs, which could be big? They are relevant because they can store ACLs.)
Each time bbackupd connects, it makes a new backup set, marked with the current time. Rules are specified to the server (or client?) to say when a backup set can be automatically deleted.
By default bbackupquery will use the latest backup set, but can be instructed to use a different one.
There is an elegant technique I'd like to suggest here. See Venti, an archival storage manager. Note that it is archival, not backup: it assumes objects in the store will be there forever. To make it a backup system, you would need to reference-count each object and remove objects when they're not needed.
The magic of Venti is that it references stored blocks by their SHA1 hash. This means that if you try to insert the same block twice, it will just store it once and give you back the same hash. Now, this does mean it's slightly vulnerable to hash collisions - I would suggest making it slightly slower by forcing it to do a bytewise comparison of apparently identical blocks being inserted, and then making the keys an SHA1 hash plus a variable-length collision number!
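
A sketch of that insertion logic (using OpenSSL's SHA1 for the digest; the in-memory map is just a stand-in for the real store):

    #include <stdint.h>
    #include <string>
    #include <map>
    #include <vector>
    #include <openssl/sha.h>

    typedef std::vector<uint8_t> Block;

    // Key = SHA1 hash + collision number, as suggested above.
    struct BlockKey
    {
        std::string mSHA1;    // 20-byte digest
        uint32_t mCollision;  // 0 for the first block with this hash
        bool operator<(const BlockKey& o) const
        {
            return mSHA1 != o.mSHA1 ? mSHA1 < o.mSHA1
                                    : mCollision < o.mCollision;
        }
    };

    std::map<BlockKey, Block> store;  // stand-in for the real store

    std::string ComputeSHA1(const Block& data)
    {
        unsigned char md[SHA_DIGEST_LENGTH];
        SHA1(data.data(), data.size(), md);
        return std::string(reinterpret_cast<char*>(md),
                           SHA_DIGEST_LENGTH);
    }

    // Insert a block, deduplicating bytewise-identical ones.
    BlockKey InsertBlock(const Block& data)
    {
        BlockKey key = { ComputeSHA1(data), 0 };
        // Walk the collision chain: return the key of a bytewise-
        // equal block if one exists, else take the first free number.
        std::map<BlockKey, Block>::iterator it;
        while ((it = store.find(key)) != store.end())
        {
            if (it->second == data) return key;  // genuine duplicate
            ++key.mCollision;          // hash collision: keep looking
        }
        store[key] = data;
        return key;
    }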
Blocks are up to some limit in size (say 64k). If a file is larger than that, store it as a series of blocks with a block containing a list of pointers. Be careful - sign, but don't encrypt, the indirect blocks, so they can be traversed to rebuild the reference counts if the server is shut down messily. When storing directories, you'll need to encrypt the file metadata, but not the pointers to data and indirect blocks - perhaps if you have multi-stream files you can keep the filename in the xattr stream, and just have the directory entry contain a list of streams and their lengths, how many levels of indirection they use, and a pointer to data (if 0 levels) or the top level of indirect blocks.
Store the reference count in the B-Tree index for each block.
Now, at the top of the tree for a given backup, you create a top-level backup set object, which goes into a linked list to be the root set for garbage collections. The neat thing is, you don't need to be explicit about incremental updates - just share existing subtrees. In fact, a naive dump algorithm uploads everything, resulting in all the unchanged files being already found on the server, and the server only storing the changed objects and their parents in the tree. See the discussion of vac in the Venti paper for more details on how to optimise this.
When an old backup set is dropped, the reference count on its root directory block must be decremented; if it becomes 0, then recursively decrement all the children and delete the block, and so on. -- Alaric 11:09, 24 February 2006 (AST)
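
A sketch of that drop operation (assuming, as above, that the server can read the unencrypted child pointers of each block; all names illustrative):

    #include <stdint.h>
    #include <map>
    #include <vector>

    typedef int64_t BlockID;

    std::map<BlockID, uint32_t> refCount;               // per-block counts
    std::map<BlockID, std::vector<BlockID> > children;  // clear pointers

    // Called when a backup set is dropped, starting from its root
    // directory block. Decrements propagate down the tree; shared
    // subtrees survive because another backup set still references
    // them. (Counts are assumed already populated.)
    void DecrementAndMaybeDelete(BlockID id)
    {
        if (--refCount[id] > 0) return;  // still referenced elsewhere

        const std::vector<BlockID>& kids = children[id];
        for (size_t i = 0; i < kids.size(); ++i)
        {
            DecrementAndMaybeDelete(kids[i]);
        }
        // DeleteBlockFromDisc(id);  // stand-in for the real deletion
        refCount.erase(id);
        children.erase(id);
    }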
Lazy Mode
This was the only mode the original supported. It has its pros and cons. The original decision was made to ensure a slow trickle of data across a broadband connection.
The above may make supporting lazy mode a bit tricky, and we should decide whether we want to keep it, and if so, in what form. For example, directory entries may need to be marked as "changed but not included", prompting the restore to look in future versions.
This is potentially messy.
Regular backup sets can be made by continually uploading changed files, then every so often making a backup set object pointing to all the recent changes - and handling client/server failure during this without losing the uploaded-but-not-yet-made-into-a-backup-set data.
I don't see why a lazy backup should be any different to an explicit snapshot. In either case, files modified too recently will not be backed up. --Chris
Protocol
The current protocol can be a bit inefficient when uploading lots of little files, because it needs to send a command and receive a reply for each one. It might be a nice idea to implement a "bulk" version which doesn't need these acknowledgments, sending a stream of data instead.
Another possibility is to change the protocol so that the client allocates the object IDs, not the server. Then the client can just keep on sending data and commands and handle the results asynchronously -- the only reason the client can't keep on sending stuff now is that it needs to wait for the object ID to be returned.
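
As a sketch of the client-side allocation (the range-grant scheme and the names are an assumption, not a settled design):

    #include <stdint.h>

    typedef int64_t ObjectID;

    // Hypothetical: the server grants the client a range of IDs at
    // login, so the client never waits for an ID to come back.
    class ObjectIDAllocator
    {
    public:
        ObjectIDAllocator(ObjectID first, ObjectID last)
            : mNext(first), mLast(last) { }

        ObjectID Allocate()
        {
            // A real implementation would request another range
            // from the server when this one runs out.
            return (mNext <= mLast) ? mNext++ : -1;
        }

    private:
        ObjectID mNext, mLast;
    };

    // The upload loop then becomes fire-and-forget: send StoreFile
    // commands back to back with pre-allocated IDs, and collect the
    // replies asynchronously instead of blocking on each one.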
Resuming Uploads
Torsten reports:
Again, this is my problem: in Germany, the DSL provider sometimes kills the connection every 24 hours.
Before I try to understand the source code: is it possible to implement this function with Box Backup?
Unfortunately there is no support for resuming interrupted uploads at the moment. It would be possible to implement, but not trivial. The simplest thing would be for you to find another DSL provider who doesn't do such nasty things to you :)
The way I'd implement it is something like this. Remembering that the file could change in between upload attempts, we can't simply assume that the data already on the server (the partially-completed upload) matches the data currently in the start of the file on the client side (although that might be an easy way to get started).
In bin/bbstored/BackupCommands.cpp, BackupProtocolServerStoreFile handles file uploading. The easiest case is full uploads, but we'll have to deal with patch uploads as well, otherwise the resumed upload might fail for the same reason.
We'd need to catch the exception that gets thrown when the connection is interrupted (probably Connection TLSReadFailed) and save what we have received so far as a file, marked in the directory with a new flag, maybe PARTIAL. If the upload is a full upload, we can just save the bytes that we received, but if it's a patch then we'll have to construct a new version with whatever information we have. We need to make sure that the recipe is sent before the data blocks for this to work.
We'd probably have to generate some kind of block index ourselves, but since the normal block index relies on checksums of the unencrypted data, this might require us to switch to, or add support for, using encrypted data checksums + IVs in the block index instead (this would also allow the server to verify consistency of each file in bbstoreaccounts check). Alternatively we could send the block checksums along with each block, so the server knows all the checksums of complete blocks that it's received so far, and can construct a valid block index.
When bbackupd tries to decide whether to upload a new file or a patch, in BackupClientDirectoryRecord it should check for a PARTIAL file and try to send a diff against it, using a modified diffing algorithm that works on encrypted data checksums + IVs. This would allow us to start uploading just the changes.
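
Putting the client side together, the decision might look roughly like this sketch (the flag and the types are invented for illustration; the real BackupClientDirectoryRecord code is structured differently):

    #include <stdint.h>

    typedef int16_t Flags;

    // Hypothetical flag bit for a partially-uploaded object, plus a
    // simplified view of the server-side directory entry.
    const Flags Flags_PARTIAL = 0x100;  // value illustrative only

    enum UploadKind
    {
        FullUpload,
        PatchAgainstOldVersion,
        PatchAgainstPartial
    };

    struct EntryInfo
    {
        bool  exists;      // an old version is already on the server
        Flags flags;       // includes Flags_PARTIAL for a resume target
        bool  indexUsable; // block index uses encrypted checksums + IVs
    };

    // Sketch of the decision: diff against a PARTIAL object when
    // possible, so only the data that didn't make it across last
    // time (plus any changes made since) is re-sent.
    UploadKind ChooseUpload(const EntryInfo& e)
    {
        if (e.exists && (e.flags & Flags_PARTIAL) && e.indexUsable)
        {
            return PatchAgainstPartial;
        }
        return e.exists ? PatchAgainstOldVersion : FullUpload;
    }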