Next Gen File Management - gotolinux/gotolinux.github.com GitHub Wiki
Next Gen File Management
One of the most distinctive things about GotoLinux is the planned next generation file management system. One of the most frustrating things about operating systems today is just how dumb file systems remain. This leads to all sorts of specialty applications that try to manage your files in all sorts of incompatible ways. Photo managers, music managers, drop boxes and so on are all vying to wrest control of your files. The worst part is how easy it is to end up with a slew of duplicate files on your system after you import some files into one of these managers. The actual file system turns into a mess.
GotoLinux plans to move beyond these programs with a more fundamental solution in two specific ways.
First, it will ensure every file has an associated type. Not by an optional extension or magic MIME typing, but by an actual metadata field. The trick here, however, is that not all file systems support file metadata. In those cases it will use a fallback scheme akin to the old AmigaOS .info files. This isn't a perfect solution, because users may forget to copy the metadata files along with the main file when old-school tools are used. But with enough publicity to ensure users know what happens on certain file systems (e.g. FAT32), they will hopefully remember to stay on their toes.
Having a file type at the file system's disposal will allow files to be handled more conveniently, in much the same way that object-oriented languages can handle data more effectively when it is in the form of an object.
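The type-with-fallback idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute name `user.gotolinux.type`, the type string, and the helper names are all hypothetical, and the sidecar file simply reuses the `.info` suffix mentioned above. It tries a Linux extended attribute first and falls back to a sidecar file when the file system (or platform) does not support one.

```python
import os
from pathlib import Path

XATTR_NAME = "user.gotolinux.type"  # hypothetical attribute name
SIDECAR_SUFFIX = ".info"            # fallback, in the spirit of AmigaOS .info files


def set_file_type(path, file_type):
    """Record file_type for path and return the mechanism used."""
    path = str(path)
    if hasattr(os, "setxattr"):  # extended attribute calls exist on Linux only
        try:
            os.setxattr(path, XATTR_NAME, file_type.encode("utf-8"))
            return "xattr"
        except OSError:
            pass  # e.g. FAT32: no extended attribute support
    Path(path + SIDECAR_SUFFIX).write_text(file_type, encoding="utf-8")
    return "sidecar"


def get_file_type(path):
    """Return the recorded type, or None if nothing was recorded."""
    path = str(path)
    if hasattr(os, "getxattr"):
        try:
            return os.getxattr(path, XATTR_NAME).decode("utf-8")
        except OSError:
            pass  # attribute missing or unsupported; try the sidecar
    sidecar = Path(path + SIDECAR_SUFFIX)
    if sidecar.exists():
        return sidecar.read_text(encoding="utf-8")
    return None
```

The sidecar branch is exactly where the weakness described above shows up: copy `Civics1012.lyx` without `Civics1012.lyx.info` and the type is lost.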
Second, GotoLinux will use a storage scheme that will keep copies of all (stand-alone) files in a central location with a content-based checksum for identification and symlinked via a GUID (Globally Unique ID) so the file can be tracked even when it changes.
For example, let's say Bob has a school paper to write. It's for Civics class and is due in 10-12, so he calls it Civics1012.lyx.
home/
  bob/
    Civics1012.lyx
He gets a good start on it, and all is well. While he could wait for the task system (e.g. cron job) to get around to filing his new document automatically, perhaps he is an adept system manager and decides to do it manually, with something like:
$ File Civics1012.lyx
Then in the central file system (where it will live in the file hierarchy is yet to be determined), at least two entries will be made: a checksum entry for the contents of the file and a GUID entry which points to the checksum entry. While 128-bit cryptographic hash functions would be sufficient, we use 256-bit hashes for increased protection from collisions. The GUID is a double GUID (two standard GUIDs put together). In addition, entries are sub-categorized by their first four hexadecimal digits into two levels of sub-directories (two digits each). This helps prevent any single directory from holding a huge number of files.
/Index/
  csum/
    70/
      e7/
        70e7c7533d94c4067eabfbe424b15cad2ec7a864520ea65e691f46a2eff4e7bd
  guid/
    55/
      0e/
        550e8400e29b41d4a716446655440000550e8400e29b41d4a716446655440000 -> /Index/csum/70/e7/70e7c7533d94c4067eabfbe424b15cad2ec7a864520ea65e691f46a2eff4e7bd
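The layout above can be derived mechanically from an ID. Here is a minimal sketch, assuming SHA-256 as the 256-bit hash (the text does not name one) and `/Index` as the root (its real location is undecided). The function names are made up for illustration.

```python
import hashlib
import uuid
from pathlib import PurePosixPath

INDEX_ROOT = PurePosixPath("/Index")  # assumed; the real location is undecided


def sharded(kind, hex_id):
    """Place hex_id under two sub-directory levels taken from its first four digits."""
    return INDEX_ROOT / kind / hex_id[0:2] / hex_id[2:4] / hex_id


def checksum_path(data):
    # SHA-256 is an assumption: any 256-bit cryptographic hash would fit the text.
    return sharded("csum", hashlib.sha256(data).hexdigest())


def new_double_guid():
    # Two standard 128-bit GUIDs put together, written as 64 hex digits.
    return uuid.uuid4().hex + uuid.uuid4().hex


def guid_path(guid):
    return sharded("guid", guid)
```

For instance, `guid_path("550e8400e29b41d4a716446655440000550e8400e29b41d4a716446655440000")` yields the `/Index/guid/55/0e/...` path shown in the listing.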
The original file is then replaced with a link to the GUID.
Civics1012.lyx -> /Index/guid/55/0e/550e8400e29b41d4a716446655440000550e8400e29b41d4a716446655440000
Now the file is uniquely identified, and identical files will never take up more than one copy on disk. Yet any number of links to the one file can be made, and they are handled automatically.
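What a filing step might do can be sketched as below. This is not the actual `File` command, which does not exist yet. The function name, the SHA-256 choice, and passing the index root as a parameter are all assumptions. It also assumes a POSIX system with symlinks and that the file lives on the same file system as the index.

```python
import hashlib
import os
import uuid
from pathlib import Path


def file_into_index(path, index_root):
    """Move path into the index and leave a symlink to its GUID entry behind."""
    path = Path(path)
    index_root = Path(index_root).resolve()  # absolute, so the links stay valid
    if path.is_symlink():
        raise ValueError("already filed (or some other symlink): %s" % path)

    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    blob = index_root / "csum" / digest[:2] / digest[2:4] / digest
    blob.parent.mkdir(parents=True, exist_ok=True)
    if blob.exists():
        path.unlink()           # identical content is already stored once
    else:
        os.replace(path, blob)  # assumes the same file system as the index

    guid = uuid.uuid4().hex + uuid.uuid4().hex  # "double GUID"
    entry = index_root / "guid" / guid[:2] / guid[2:4] / guid
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.symlink_to(blob)      # GUID entry -> checksum entry
    path.symlink_to(entry)      # original name -> GUID entry
    return entry
```

Filing two files with identical contents produces two GUID entries that both point at one checksum entry, which is the deduplication described above.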
To shorten the file names further, the 256-bit hash could be written in base 32 or base 36 instead of hexadecimal, but we will use hexadecimal notation here. The exact encoding of these IDs can be worked out later, giving consideration to optimal performance.
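To get a feel for how much shorter the names would become, here is a small comparison. The sample input and the `to_base36` helper are illustrative only, and SHA-256 is again an assumed stand-in for the 256-bit hash.

```python
import base64
import hashlib
import string

DIGITS36 = string.digits + string.ascii_lowercase


def to_base36(raw):
    """Encode bytes as a base 36 number using 0-9 and a-z."""
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS36[r] + out
    return out or "0"


raw = hashlib.sha256(b"example contents").digest()  # 32 bytes = 256 bits
as_hex = raw.hex()                                   # 64 characters
as_base32 = base64.b32encode(raw).decode("ascii").rstrip("=").lower()  # 52 characters
as_base36 = to_base36(raw)                           # at most 50 characters
print(len(as_hex), len(as_base32), len(as_base36))
```

So the saving over hexadecimal is about a fifth of the name length. Both alternatives use a single letter case, so they remain safe on case-insensitive file systems.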
There are two difficulties with implementing this system.
- How to determine whether a file has changed and thus needs to be updated in the file index, without the end user having to file it manually.
- How to handle project directories which need to be treated as a single entity, more so than as a collection of individual files.
These issues need further consideration.
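As one possible starting point for the first issue (purely an assumption, not a decided design): if an editor writes through the symlinks and modifies a stored file in place, its contents stop matching the checksum it is named after. A periodic task could look for such entries. The function name and the SHA-256 choice are hypothetical.

```python
import hashlib
from pathlib import Path


def find_stale_blobs(index_root):
    """Yield (blob_path, actual_digest) for entries whose contents no longer match their name."""
    for blob in sorted((Path(index_root) / "csum").rglob("*")):
        if not blob.is_file():
            continue  # skip the two-digit sub-directories
        actual = hashlib.sha256(blob.read_bytes()).hexdigest()
        if actual != blob.name:
            yield blob, actual
```

This only covers in-place edits, and rehashing everything is expensive. An editor that saves by replacing the symlink with a fresh regular file would need a different check, and project directories are not addressed at all.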
One last note: It is entirely possible that [git](git-scm.org) could be used to handle all of this. Git handles version control in much the same manner. It might need some minor modifications to do the whole job as required, though, and at the very least some serious porcelain to make git sane for end users!
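To make the similarity concrete: git names each file's contents (a "blob") by a hash of a short header plus the data, and stores loose objects under a two-digit sub-directory. The sketch below reproduces that naming for git's default SHA-1 object format. The helper names are made up here, and the code only computes IDs and paths without touching a repository.

```python
import hashlib


def git_blob_id(data):
    """Object ID git assigns to a blob with these contents (default SHA-1 format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def git_object_path(object_id):
    """Where git keeps the loose (zlib-compressed) object inside a repository."""
    return ".git/objects/" + object_id[:2] + "/" + object_id[2:]


empty = git_blob_id(b"")
print(empty)                   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_path(empty))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Note the differences from the scheme above: git uses one level of fan-out rather than two, and it has no counterpart to the GUID entries that follow a file as its contents change.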