Phase 1 Notes

Jon's notes on Phase 1:

  1. Implement each of the layers (0-3) independently. Test each layer completely before linking with the layer above it.

  2. Layer 1 is the most important logic-wise: it must create the file system and manage the free lists, so it needs to be perfect. Layer 2 can be complicated, but its functionality is more "familiar" to us.

  3. We need to nail down our list of constants, work splitting policy, and exposed system calls per layer very early in this project.

TA's notes on Phase 1:

  1. How we implement the in-memory disk is totally up to us (one big array, or a small array for each block). However, if we use one big array we can use mmap to automatically link our memory with a backing file, which makes debug dumps very easy to record. This is a better approach than malloc.
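A minimal sketch of the mmap approach, assuming one big array, a fixed block size/count, and hypothetical names (disk_init, DISK_SIZE) that are not part of our agreed interface:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLK_SIZE  512
#define NUM_BLKS  4096
#define DISK_SIZE (BLK_SIZE * NUM_BLKS)

static unsigned char *disk;  /* one big array backing the whole "disk" */

int disk_init(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    /* Grow the backing file to the full disk size so every block is mapped. */
    if (ftruncate(fd, DISK_SIZE) < 0) {
        close(fd);
        return -1;
    }
    /* MAP_SHARED keeps the backing file in sync with memory, so a plain
     * hexdump of the file doubles as a debugging dump of the disk. */
    disk = mmap(NULL, DISK_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the fd is closed */
    return disk == MAP_FAILED ? -1 : 0;
}
```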

  2. For Layer 0, we certainly want the basic read/writeBlock(id, bytes) function. However, we will also want some form of read/writeBlock(id, offset, bytes) function since writing at the file level may not be in complete blocks--we need to translate a file write at an arbitrary location into a block write. In other words, suppose our file is composed of 2 4-byte blocks, [0, 1, 2, 3] and [4, 5, 6, 7]. If we do writeFile at offset 2 with 4 bytes (at layer 2) this will actually be 2 block operations (at layer 1) because we are crossing the "block border". Since block operations must write the entire block, these operations will consist of reading the whole block into memory, writing the portion at the offset, and writing the whole block back to disk. So it will be helpful to have an offset function for blocks as well.
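A rough sketch of that offset form, assuming hypothetical names (readBlock, writeBlock, writeBlockAt) and a 512-byte block; the read-modify-write is exactly the three steps described above:

```c
#include <string.h>

#define BLK_SIZE 512

int readBlock(int id, void *buf);          /* reads one whole block */
int writeBlock(int id, const void *buf);   /* writes one whole block */

/* Write `len` bytes into block `id` starting at `offset`; the caller splits
 * any write that crosses a block border into one call per block. */
int writeBlockAt(int id, int offset, const void *bytes, int len)
{
    unsigned char blk[BLK_SIZE];

    if (offset < 0 || len < 0 || offset + len > BLK_SIZE)
        return -1;
    if (readBlock(id, blk) < 0)            /* 1. read the whole block in */
        return -1;
    memcpy(blk + offset, bytes, len);      /* 2. patch the bytes at the offset */
    return writeBlock(id, blk);            /* 3. write the whole block back */
}
```

In the example above, the 4-byte write at offset 2 would become two such calls, one for the tail of the first block and one for the head of the second.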

  3. We can implement synchronization however we want. Alex said high-performance parallel file reads/writes won't be an issue, BUT large parallel allocations may be. We definitely need parallel processes to be able to allocate separate inodes at the same time--we cannot simply drop the second allocation. He recommended implementing the simplest correct logic first--it's perfectly ok to lock down the entire superblock on every operation, even though that will make the filesystem slow. We can optimize specific operations later, once we verify that the rest of our data operations work.
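A sketch of the "lock everything" starting point, assuming one pthread mutex guarding the whole superblock; allocInode and the field names are placeholders, not our real interface:

```c
#include <pthread.h>

struct superblock { int freeInodeHead; /* ...free lists, sizes, etc... */ };

static struct superblock sb;
static pthread_mutex_t sb_lock = PTHREAD_MUTEX_INITIALIZER;

int allocInode(void)
{
    int ino;

    /* Serializing every allocation is slow but trivially correct; two
     * parallel allocations can never hand out the same inode. */
    pthread_mutex_lock(&sb_lock);
    ino = sb.freeInodeHead;
    /* ...advance freeInodeHead to the next free inode here... */
    pthread_mutex_unlock(&sb_lock);
    return ino;
}
```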

  4. We don't really have to worry about critical failures like system crashes during superblock operations. We should first ensure that our system can start up and shut down correctly and handle basic concurrent commands. Then we can try to ensure that a system crash will only lose blocks or inodes without corrupting the entire file system. Modern filesystems actually keep a log of operations so that upon startup they can restore the working state. Alternatively, we can implement some kind of checkfs function, but that is still much later.

  5. Writing the superblock to disk on every operation is a bad idea. We should figure out how we want to periodically sync it with disk. Alex recommended running a separate thread that checks the modified flag every, say, 5 seconds, but I think we can do something even simpler for Phase 1, like syncing every 100 operations or something similarly naive.
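A sketch of the naive version, assuming hypothetical names (sb_modified, maybeSyncSuperblock, writeSuperblock); the thread Alex described would just loop, sleep 5 seconds, and run the same check:

```c
#define SYNC_EVERY 100   /* flush at most once per 100 operations */

int writeSuperblock(void);   /* Layer 1: flush the in-core superblock to disk */

static int sb_modified;      /* set whenever an operation touches the superblock */
static int sb_ops;           /* operations since the last flush */

void maybeSyncSuperblock(void)
{
    if (!sb_modified)
        return;
    if (++sb_ops >= SYNC_EVERY) {
        writeSuperblock();
        sb_modified = 0;
        sb_ops = 0;
    }
}
```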

  6. The free block list and free inode list can be implemented however we want. We can keep just the head in the superblock, or reserve some space there for the initial list. I think keeping the initial list in the superblock might actually make the makefs function simpler, since we don't have to allocate anything outside of the superblock (even if it wastes some space in the superblock, that space wouldn't have been allocated anyway). The free inode list is simple--we can use the circular queue we discussed.
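A sketch of that circular queue living inside the superblock; the cache size and field names are assumptions, and the refill-from-disk scan is left out:

```c
#define FREE_INODE_CACHE 64

struct superblock {
    int freeInodes[FREE_INODE_CACHE];  /* cached free inode numbers */
    int head;                          /* next inode to hand out */
    int tail;                          /* next empty slot when an inode is freed */
    int count;                         /* how many cached entries are valid */
    /* ...free block list head, filesystem sizes, etc... */
};

/* Pop a free inode number, or return -1 when the cache is empty and a disk
 * scan (not shown) is needed to refill it. */
int popFreeInode(struct superblock *sb)
{
    if (sb->count == 0)
        return -1;
    int ino = sb->freeInodes[sb->head];
    sb->head = (sb->head + 1) % FREE_INODE_CACHE;
    sb->count--;
    return ino;
}
```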

TA's 10/21 notes on Phase 1:

  1. There are fundamentally two designs for detecting free inodes on disk; either way, we can't simply rely on linkCount == 0. This is because it's possible to delete a file that another process is currently writing to, and the writing process will not be kicked out of the inode. That write is eventually discarded when it finishes (refCount = 0 and linkCount = 0, so the inode is flagged free), but if you allocate that inode too early, the write could still be going on and another process could see a corrupted inode.

The first solution is to have an extra flag (for example, folded into the type field). This will certainly work because the flag won't be updated until ALL processes finish writing to the file, so the inode is guaranteed to be free by the time the flag makes it to disk. The advantage is that we can implement this directly in Layer 1 without worrying about disk consistency. You only need to update the flag when either refCount or linkCount drops to 0 (and at that point, of course, you have to check that both are 0).

The second solution is to omit the flag: when you find an inode on disk with linkCount == 0, you must check your file table (or buffer cache, if implemented) to see whether its refCount == 0 as well, to ensure no process is finishing an operation. This is nice because you only need to check refCounts when scanning for freed inodes, and if your system crashes, you don't need to "recover" freed inodes (refCounts reset anyway since your process died). The disadvantage is that the scan now depends on Layer 2 functionality (if you have no buffer cache).

I prefer the extra flag, since we can simulate inode freeing without Layer 2 and the flag doesn't cost any extra space if we integrate it into the type.
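A sketch of the flag-in-the-type idea; the type values and field names are assumptions, and the point is only that the free mark is set (and later flushed to disk) once both counts are zero:

```c
enum { T_FREE = 0, T_FILE = 1, T_DIR = 2 };   /* T_FREE doubles as the "free" flag */

struct inode {
    int type;       /* T_FREE on disk means the inode can be reallocated */
    int linkCount;  /* directory entries pointing at this inode */
    int refCount;   /* in-core references from the open file table */
    /* ...block pointers, sizes, etc... */
};

/* Call whenever refCount or linkCount drops to zero; only when BOTH are zero
 * is the inode really free, so a disk scan never grabs an inode that some
 * process is still finishing a write on. */
void maybeFreeInode(struct inode *ip)
{
    if (ip->refCount == 0 && ip->linkCount == 0)
        ip->type = T_FREE;   /* written out with the rest of the inode */
}
```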

  2. I asked what happens if we unmount during an operation like the above. Alex said that the best way to do unmount is to force it to block if operations are still in progress, so you only unmount a filesystem that has completed all pending tasks. Otherwise you have to implement some "force unmount" function, which is complex.
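A sketch of the blocking unmount, assuming we track in-flight operations with a counter guarded by a mutex and condition variable (all names hypothetical):

```c
#include <pthread.h>

static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  fs_idle = PTHREAD_COND_INITIALIZER;
static int pending_ops;   /* incremented when an operation starts, decremented
                             (and fs_idle signaled) when it finishes */

void unmountfs(void)
{
    pthread_mutex_lock(&fs_lock);
    while (pending_ops > 0)                 /* block until all pending work drains */
        pthread_cond_wait(&fs_idle, &fs_lock);
    /* ...flush superblock, inodes, and any cached blocks to disk here... */
    pthread_mutex_unlock(&fs_lock);
}
```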

  3. Alex agreed that disk-writing functions like blockify are the simplest way to convert things like in-core superblocks to disk superblocks (we may want to do the same for inodes). There is another solution: keep separate structs for in-core data structures and disk data structures, and write code that converts between the two (or compose one with the other). That is a cleaner solution that avoids managing buffers explicitly, but it is more code to write.
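A sketch of the blockify style for the superblock, with assumed field names; the simplest form is a raw struct copy zero-padded out to a full block:

```c
#include <string.h>

#define BLK_SIZE 512

struct superblock {
    int numBlocks;
    int numInodes;
    int freeBlockHead;
    int freeInodeHead;
    int modified;        /* in-core bookkeeping; harmless if it lands on disk */
};

/* Serialize the in-core superblock into a block-sized buffer for a block write. */
void blockifySuperblock(const struct superblock *sb, unsigned char blk[BLK_SIZE])
{
    memset(blk, 0, BLK_SIZE);    /* deterministic padding past the struct */
    memcpy(blk, sb, sizeof *sb);
}
```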

TA's notes on Layer 2

  1. In the UNIX file descriptor table (textbook page 95), the user file descriptor tables are per-process, as we understood. The second column, the open file table, contains refCounts because processes that fork create copies of the file descriptor table and share an open file table entry (fork is not required for this project). Processes that open the same file twice always get separate entries in the open file table, which must be closed separately.
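A sketch of the two in-memory tables as plain C structs, with assumed sizes and field names, mostly to pin down where the refCount lives:

```c
#define MAX_OPEN_FILES 128   /* global open file table size */
#define MAX_FDS        32    /* per-process descriptor table size */

struct openFile {
    int inodeNum;   /* which inode this entry refers to */
    int offset;     /* current read/write position for this open */
    int refCount;   /* >1 only when descriptors are shared (e.g. via fork/dup) */
};

/* Second column of the picture: one global table. Two open() calls on the
 * same file get two separate entries here, each with its own offset. */
struct openFile openFileTable[MAX_OPEN_FILES];

/* First column: one of these per process, indexed by file descriptor. */
struct procFdTable {
    struct openFile *fds[MAX_FDS];
};
```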

  2. Data structures for the file table: since the file table structures are all in memory, their performance is pretty insignificant compared to the actual disk. In other words, even if we implement all our tables as plain arrays and scan them linearly, it will still be way faster than the disk operations, so we don't have to worry about performance.
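For instance, allocating an open file table entry by linear scan (same hypothetical shapes as the sketch above) costs a few hundred memory reads at most, which is nothing next to a single block read:

```c
#define MAX_OPEN_FILES 128

struct openFile { int inodeNum; int offset; int refCount; };
static struct openFile openFileTable[MAX_OPEN_FILES];

/* Find a free slot by scanning the array; refCount == 0 marks an unused entry. */
int allocOpenFile(int inodeNum)
{
    for (int i = 0; i < MAX_OPEN_FILES; i++) {
        if (openFileTable[i].refCount == 0) {
            openFileTable[i].inodeNum = inodeNum;
            openFileTable[i].offset = 0;
            openFileTable[i].refCount = 1;
            return i;               /* the index doubles as the handle */
        }
    }
    return -1;                      /* table full */
}
```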

  3. We should start worrying about synchronization, since even in sequential mode we probably want to specify what kinds of read/write interactions are allowed. I think there are 2 levels of locking in Layer 2: inode locking, where direct inode modifications must acquire a lock, and file locking, where processes cannot, for example, read and write the same file at the same time. One very easy policy Alex suggested: each file can only be opened by one process at a time, for any operation.
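A sketch of that simplest policy, assuming a per-inode busy flag guarded by a single table lock (all names hypothetical); per-inode or reader/writer locks can replace this later:

```c
#include <pthread.h>

#define MAX_INODES 1024

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int inodeBusy[MAX_INODES];   /* 1 while some process has the file open */

/* open() calls this first; returns 0 if another process already has the file. */
int tryOpenExclusive(int ino)
{
    int ok = 0;

    pthread_mutex_lock(&table_lock);
    if (!inodeBusy[ino]) {
        inodeBusy[ino] = 1;
        ok = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return ok;
}

/* close() releases the file so another process can open it. */
void closeExclusive(int ino)
{
    pthread_mutex_lock(&table_lock);
    inodeBusy[ino] = 0;
    pthread_mutex_unlock(&table_lock);
}
```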

  4. Phase 1 and 2 grading will be "informal": we only need to provide test functions that demonstrate the working state of the required functions (no menu or interface required). Alex said that FUSE will be the only really rigorous test and will make up the vast majority of our grade.