System V File System Help Guide

What is System V File System (S5FS)?

  • System V File System (S5FS) is a file system based on the original Unix file system. In this project, you will provide a new file system implementation for the VFS interface. Previous projects used RamFS as the file system; during S5FS and VM, you'll use your own implementation of S5FS instead
  • We want our file system to have permanent storage (such as residing on disk) and to be quick, easy, and efficient to use. Changes made during VFS did not persist because files were stored in RAM (volatile storage). Changes in S5FS will persist because files are stored on disk (nonvolatile storage)
    • You've already laid some of the groundwork for working with the disk in Drivers (sata.c) and in VFS (the layer of abstraction that will call the various functions you write in S5FS!)

Data Structures

Inodes (s5_inode_t)

  • Inodes store information about different types of data such as files and directories. In VFS, you worked with vnodes, but vnodes live in kernel memory and exist only while Weenix is running. When Weenix shuts down, you need some way of storing the changes on disk, and that's where inodes come in. Inodes hold the information used to create the vnodes you work with, so changes persist even after Weenix shuts down
  • The data structure includes the file size, the inode's number, the link count, and the blocks that make up the file. The direct blocks are an array of block numbers that make up the file. The indirect block is a single block number, and the block it names contains more block numbers. For example, if the indirect block is 526, then block 526 holds additional block numbers that can belong to this file. When using the indirect block, you will have to retrieve that block and treat it as an array of integers (using the pf_addr of a page frame). If the indirect block is 0, it hasn't been allocated yet
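
The direct/indirect lookup described above can be sketched in plain C. This is only an illustration under stated assumptions: demo_inode_t is a hypothetical, simplified stand-in for s5_inode_t, the constants (4096-byte blocks, 28 direct blocks) are assumed values rather than ones taken from s5fs.h, and the indirect block's contents are passed in as a plain array instead of being fetched through a page frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; check s5fs.h for the real ones. */
#define DEMO_BLOCK_SIZE 4096u
#define DEMO_NDIRECT 28u
#define DEMO_NINDIRECT (DEMO_BLOCK_SIZE / sizeof(uint32_t))

/* Hypothetical, simplified stand-in for s5_inode_t. */
typedef struct demo_inode {
    uint32_t direct[DEMO_NDIRECT]; /* disk block numbers; 0 means sparse */
    uint32_t indirect;             /* disk block holding more block numbers; 0 means unallocated */
} demo_inode_t;

/*
 * Map a file block number to a disk block number.
 * indirect_contents stands in for the indirect block viewed as an array of
 * integers (what pf_addr would give you); it may be NULL when
 * inode->indirect is 0.
 * Returns 0 for a sparse or unallocated block and -1 when file_block is
 * past the maximum file size.
 */
int64_t demo_file_block_to_disk_block(const demo_inode_t *inode,
                                      const uint32_t *indirect_contents,
                                      size_t file_block)
{
    assert(inode != NULL);
    if (file_block < DEMO_NDIRECT) {
        return inode->direct[file_block];
    }
    size_t idx = file_block - DEMO_NDIRECT;
    if (idx >= DEMO_NINDIRECT) {
        return -1; /* beyond what direct + indirect blocks can describe */
    }
    if (inode->indirect == 0) {
        return 0; /* indirect block not allocated yet */
    }
    assert(indirect_contents != NULL);
    return indirect_contents[idx];
}
```

In the real project the indirect block would come from a page frame that you must release afterwards; the sketch leaves that out to keep the index arithmetic visible.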

S5 Node (s5_node_t)

  • A more general structure that contains a vnode and inode (representations of the same data), as well as a field to indicate whether the inode is dirtied

S5FS (s5fs_t)

  • An in-memory representation of an S5FS file system

Super Block (s5_super_t)

  • The first block of the disk. It contains metadata about the file system. This will be something you become more acquainted with when you're writing s5_alloc_block() and manipulating the free blocks list

Dirents (s5_dirent_t)

  • Contents of a directory entry. It contains the inode number and name of the directory entry
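
To make the inode-number-plus-name layout concrete, here is a minimal sketch of searching an array of directory entries for a name. demo_dirent_t and demo_find_dirent are hypothetical names, the 28-byte name length is an assumption, and the real code would read the entries out of the directory's data blocks rather than receive them as an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NAME_LEN 28 /* assumed; see S5_NAME_LEN in s5fs.h */

/* Hypothetical stand-in for s5_dirent_t: an inode number and a name. */
typedef struct demo_dirent {
    uint32_t inode;
    char name[DEMO_NAME_LEN];
} demo_dirent_t;

/*
 * Linear search over `count` entries.
 * Returns the matching entry's inode number, or -1 if no entry has that
 * name. The return type is signed because inode number 0 can be valid.
 */
int64_t demo_find_dirent(const demo_dirent_t *entries, size_t count,
                         const char *name)
{
    assert(entries != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strncmp(entries[i].name, name, DEMO_NAME_LEN) == 0) {
            return entries[i].inode;
        }
    }
    return -1;
}
```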

Pframes (pframe_t)

  • Pframes (short for page frames) are part of the VM caching system. They allow us to manipulate pages in memory, which then get written back to disk when Weenix shuts down
  • You have page frames, which are each responsible for tracking one page/block of data, and memory objects, which have a number of pframes that hold the data for the memory object
    • Pframes are used to reference blocks of files, blocks of a device, and blocks of segments of memory. They store metadata about the page they hold and a reference to that page in memory. If a file isn't paged into memory, there isn't a pframe for it yet
  • When you're working with pframes, you're going to check if they exist, and if they don't then you will retrieve them. You won't directly call functions like mobj_get_pframe(). Instead, you'll use functions that wrap around those calls. When you are done using a pframe, you want to make sure you call the appropriate release function. For example, you may have to call s5_release_disk_block() or s5_release_file_block(). This will unlock the pframe's mutex (it will be locked when you get it)
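
The get/use/release discipline can be sketched with a toy model. Everything here is hypothetical: demo_pframe_t is not pframe_t, a plain flag stands in for the mutex, and demo_get_block/demo_release_block only imitate the shape of the wrapper functions. Clearing the caller's pointer on release is an assumption made for the sketch, not something the text above states.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page frame: a lock flag, a dirty flag, and some data. */
typedef struct demo_pframe {
    int locked;
    int dirty;
    unsigned char data[16];
} demo_pframe_t;

/* Hand out the frame locked, as the text says a retrieved pframe is. */
void demo_get_block(demo_pframe_t *cached, demo_pframe_t **pfp)
{
    assert(cached != NULL && pfp != NULL);
    assert(!cached->locked); /* a second get without a release would block */
    cached->locked = 1;
    *pfp = cached;
}

/* Unlock the frame and clear the caller's pointer so it cannot be reused. */
void demo_release_block(demo_pframe_t **pfp)
{
    assert(pfp != NULL && *pfp != NULL && (*pfp)->locked);
    (*pfp)->locked = 0;
    *pfp = NULL;
}

/* Typical usage: get, touch the data, mark dirty, release. */
void demo_set_byte(demo_pframe_t *cached, size_t offset, unsigned char value)
{
    demo_pframe_t *pf = NULL;
    demo_get_block(cached, &pf);
    assert(offset < sizeof(pf->data));
    pf->data[offset] = value;
    pf->dirty = 1;
    demo_release_block(&pf);
    assert(pf == NULL);
}
```

The point of the pattern is that every successful get is paired with exactly one release on every path, including error paths.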

Macros

There are a number of macros that you will become acquainted with throughout this project. Most of them can be found in s5fs.h. It will also be helpful to refer back to the macros from VFS

Casting Macros

  • These macros essentially cast types by using offsets and structs
    • VNODE_TO_S5NODE(vn) - cast from a vnode_t* to a s5_node_t*
    • FS_TO_S5FS(fs) - cast from a fs_t* to a s5fs_t*
    • VNODE_TO_S5FS(vn) - cast from a vnode_t* to a s5fs_t*
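
Casting "by using offsets and structs" usually means the container-of idiom: given a pointer to a member embedded in a struct, subtract the member's offset to recover the enclosing struct. The sketch below assumes that is how VNODE_TO_S5NODE works; all DEMO_ names and struct layouts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vnode_t. */
typedef struct demo_vnode {
    int vn_vno;
} demo_vnode_t;

/* Hypothetical stand-in for s5_node_t: the vnode is embedded, not pointed to. */
typedef struct demo_s5_node {
    int dirtied_inode;
    demo_vnode_t vnode;
} demo_s5_node_t;

/* Step back from a member pointer to the struct that contains it. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_VNODE_TO_S5NODE(vn) \
    DEMO_CONTAINER_OF(vn, demo_s5_node_t, vnode)

/* Given only a vnode pointer, reach the enclosing node and mark it dirty. */
demo_s5_node_t *demo_mark_dirty(demo_vnode_t *vn)
{
    assert(vn != NULL);
    demo_s5_node_t *sn = DEMO_VNODE_TO_S5NODE(vn);
    sn->dirtied_inode = 1;
    return sn;
}
```

This only works when the vnode really is embedded in the larger struct, which is why these macros should only be applied to vnodes that belong to S5FS.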

I/O Macros

  • These macros convert a byte offset or an inode number into a block number or an offset within a block
    • S5_DATA_BLOCK(seekptr) - given a file offset (in bytes), returns the file block number (e.g. 1000 would give file block number 0)
    • S5_DATA_OFFSET(seekptr) - given a file offset (in bytes), returns the byte offset within that file block (where in the block the data is)
    • S5_INODE_BLOCK(inum) - given an inode number, returns the block that inode is stored in
    • S5_INODE_OFFSET(inum) - given an inode number, returns the offset (in units of s5_inode_t) of that inode within the block returned by S5_INODE_BLOCK
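
The arithmetic behind these macros can be sketched as follows. The DEMO_ macros are illustrative re-creations, not copies of s5fs.h: the 4096-byte block size matches the example used elsewhere in this guide, while the 128-byte inode size and the inode table starting at block 1 (right after the super block) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BLOCK_SIZE 4096u
#define DEMO_INODE_SIZE 128u       /* assumed sizeof(s5_inode_t) */
#define DEMO_FIRST_INODE_BLOCK 1u  /* assumed: block 0 is the super block */
#define DEMO_INODES_PER_BLOCK (DEMO_BLOCK_SIZE / DEMO_INODE_SIZE)

#define DEMO_DATA_BLOCK(seekptr) ((seekptr) / DEMO_BLOCK_SIZE)
#define DEMO_DATA_OFFSET(seekptr) ((seekptr) % DEMO_BLOCK_SIZE)
#define DEMO_INODE_BLOCK(inum) \
    ((inum) / DEMO_INODES_PER_BLOCK + DEMO_FIRST_INODE_BLOCK)
#define DEMO_INODE_OFFSET(inum) ((inum) % DEMO_INODES_PER_BLOCK)

typedef struct demo_location {
    size_t block;  /* which block */
    size_t offset; /* position inside that block */
} demo_location_t;

/* Split a byte offset into (file block number, byte offset in block). */
demo_location_t demo_locate_byte(size_t seekptr)
{
    demo_location_t loc = { DEMO_DATA_BLOCK(seekptr), DEMO_DATA_OFFSET(seekptr) };
    assert(loc.block * DEMO_BLOCK_SIZE + loc.offset == seekptr);
    return loc;
}

/* Split an inode number into (disk block, index in units of inodes). */
demo_location_t demo_locate_inode(size_t inum)
{
    demo_location_t loc = { DEMO_INODE_BLOCK(inum), DEMO_INODE_OFFSET(inum) };
    return loc;
}
```

Under these assumptions, byte offset 5000 lands in file block 1 at offset 904, and inode 33 is the second inode (index 1) of disk block 2.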

Value Macros

  • These macros have some underlying value that doesn't require any input (like a constant)
    • S5_NIDIRECT_BLOCKS - the number of block numbers stored in the indirect block
    • S5_NDIRECT_BLOCKS - the number of direct blocks that can be allocated for a file
    • S5_BLOCK_SIZE - the size of a block on the disk (same size as a page!)
    • S5_NBLKS_PER_FNODE - the number of block numbers per node in the free block "linked list" (see the data blocks section of the S5FS handout for more information)
    • S5_MAX_FILE_BLOCKS - the maximum number of blocks that can be allocated for a file (# of direct blocks + the extra block numbers given by the indirect block)
    • S5_TYPE_[DATA, DIR, CHR, BLK] - S5 equivalents of S_IFREG, S_IFDIR, S_IFCHR, and S_IFBLK from VFS
    • S5_NAME_LEN - S5 equivalent of NAME_LEN from VFS
    • S5_DIRENTS_PER_BLOCK - the number of directory entries that can fit in a block
    • S5_INODES_PER_BLOCK - the number of inodes per block
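
Several of these values are derived from one another. The sketch below shows the relationships using assumed numbers (4096-byte blocks, 28 direct blocks, 28-byte names, 32-bit block numbers); the DEMO_ names and demo_dirent_t are hypothetical, so check s5fs.h for the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed base values for illustration only. */
#define DEMO_BLOCK_SIZE 4096u
#define DEMO_NDIRECT_BLOCKS 28u
#define DEMO_NAME_LEN 28u

/* Hypothetical directory entry layout: inode number plus name. */
typedef struct demo_dirent {
    uint32_t inode;
    char name[DEMO_NAME_LEN];
} demo_dirent_t;

/* An indirect block is a full block of 32-bit block numbers. */
#define DEMO_NIDIRECT_BLOCKS (DEMO_BLOCK_SIZE / sizeof(uint32_t))
/* Direct blocks plus the block numbers supplied by the indirect block. */
#define DEMO_MAX_FILE_BLOCKS (DEMO_NDIRECT_BLOCKS + DEMO_NIDIRECT_BLOCKS)
#define DEMO_DIRENTS_PER_BLOCK (DEMO_BLOCK_SIZE / sizeof(demo_dirent_t))

/* Largest file, in bytes, that the direct + indirect blocks can describe. */
size_t demo_max_file_size(void)
{
    return DEMO_MAX_FILE_BLOCKS * (size_t)DEMO_BLOCK_SIZE;
}

/* How many directory entries fit in one block under the assumed layout. */
size_t demo_dirents_per_block(void)
{
    assert(sizeof(demo_dirent_t) == 32); /* 4-byte inode + 28-byte name */
    return DEMO_DIRENTS_PER_BLOCK;
}

/* Nonzero if a byte offset is still inside the maximum file size. */
int demo_offset_fits(size_t seekptr)
{
    return seekptr < demo_max_file_size();
}
```

With these assumed numbers a file can span at most 28 + 1024 = 1052 blocks, which is the limit to aim for when testing the maximum file length.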

To-Dos and To-Don'ts

What is completed for you?

  • File system mounting/unmounting
  • File truncation
  • Allocating inodes
  • Freeing blocks
  • Many finer details and much of page frame retrieval/freeing

What do you have to complete?

  • High-level functions that work with the VFS interface
    • Linking, renaming, making directories, etc.
  • Low-level subroutines that get utilized in the higher-level functions

Files you'll be using during the project

You will be directly modifying the following files:

  • s5fs.c
    • High-level functions that provide the underlying functionality for do_x functions in VFS. For example, do_write() would utilize s5fs_write()
  • s5fs_subr.c
    • Low-level subroutines that provide helpful functionality in s5fs.c

It will be helpful to refer to s5fs.h. It contains a variety of structs and macros that you will be using throughout this project.

Testing

  • Write your own test code
  • You want to be sure that tests run by vfstest still pass while using S5FS as your file system
  • Use the tests in s5fstest.c (make sure you're using the file within the kernel folder as there is a s5fstest.c that looks very similar in the usr folder)
    • kshell command: s5fstest or call s5fstest_main() in kmain.c
    • Make sure the relevant debug printouts are on so you can see s5fstest's full output
    • You can comment out tests to make sure that individual tests/sections pass (and don't prevent Weenix from halting cleanly)
      • If you're encountering refcount issues, it helps to comment out every function in the test suite and then uncomment them one at a time to see where the refcount issue occurs
  • Weenix should be able to halt cleanly after using vfstest and s5fstest (it would be good to test these commands multiple times in a row)
  • You should also test indirect blocks, sparse blocks, running out of inodes or data blocks, reaching the maximum file length, and that changes persist even after shutting down and rebooting
    • Take advantage of the hamlet file that's provided to you. It will be helpful for testing
    • It's important that you try testing different cases because your implementation can still contain bugs despite passing s5fstest and vfstest, which you may not catch until VM (true story)
    • To test that changes persist, you should make changes to the disk (such as creating a new file), halt Weenix, then use ./weenix instead of ./weenix -n to start Weenix with the same disk (as opposed to using a new disk)

Debugging

  • Use the strategies from the previous projects (especially from VFS because there's overlap)
  • Enable/disable debug printouts by setting the INIT_DBG_MODES variable in kernel/util/debug.c
    • When your Weenix runs, you'll see things outputted like test results and other helpful information based on the flags you've enabled
    • Use test, vfs, fref, vnref, s5fs (they don't all have to be on at the same time)
      • You can find how these are used by searching for dbg(...), which should show you the different files (such as the testing files) that use debug printouts
    • You can find a list of possible flags in debug.h
  • A tool, fsmaker, is provided to help you examine your disk without running Weenix
    • Run ./fsmaker -h to see how to use it

FAQs

The following questions (and more) can be viewed on the Files and Memory Wiki

  • How are vnodes related to inodes?
    • Inodes are the file system counterpart to the vnodes. They are specific to the file system and they are stored on the disk (whereas vnodes are stored in kernel memory). Vnodes are put away as you use them, so by the time Weenix shuts down you shouldn't have any active vnodes. Inodes are saved on the disk, so when you want to create vnodes later you can use the inode information saved on disk (see s5fs_read_vnode() in s5fs.c)
  • What are inodes?
    • They represent files that are stored on disk, and inodes themselves are stored on disk. The inode keeps track of the data blocks associated with it using a list of block numbers (some stored in the inode itself (direct blocks) and some stored in a separate "indirect" block) that can point to blocks scattered throughout the disk
  • Do the direct blocks and indirect block actually store the data that the inode represents?
    • No, these are just block numbers that refer to where the data is stored on the disk. You will need to use the block numbers to get a page frame with the data
  • What are s5_node_ts?
    • s5_node_ts contain both an inode and a vnode (and whether the inode is dirty) that correspond to the same filesystem object
  • When should inodes be marked as dirty?
    • Marking an inode as dirty means that any changes you have made will eventually be flushed to disk, but prior to that the disk remains unmodified (which means that if the system crashes, these changes will be lost). The inode should be marked dirty when the metadata of the file that it represents has been modified, not when the data of the file has been modified
  • What are sparse blocks?
    • Blocks of all zeros (that don't contain any data) are considered "sparse." In an inode's direct blocks or in the indirect block, they are denoted with a disk block number of 0
  • What's the difference between a disk block number and a file block number?
    • A file block number indicates the block of the file relative to the file. For example, an offset of 5000 bytes into a file means you would be within file block number 1 (5000 / 4096, rounded down). A disk block number refers to blocks relative to the disk. File block number 1 may be the 500th block on the disk
  • How do memory objects relate to what is stored on the disk?
    • The disk on Weenix is represented as a block device, which is a device that accesses data in terms of blocks (in comparison to character devices, which transfer/receive a single character/byte at a time). The routines that you have written in sata.c (or that have been written for you) will eventually be called any time the disk is accessed. sata_read_block and sata_write_block read data from and write data to the disk, and they are reached through calls such as blockdev_fill_pframe. Anything from the disk that is cached in RAM will be in the block device's memory object. In other words, any disk block that is not sparse will be cached in the block device's memory object. When Weenix is shutting down, all the data blocks that have been modified will be flushed to disk using blockdev_flush_pframe
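
The sparse-block convention from the FAQ above (disk block number 0 means "all zeros, nothing stored") can be sketched with a flat byte array standing in for the disk. demo_read_block is a hypothetical helper, not part of Weenix, and the real code would go through page frames instead of copying from an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 4096u /* assumed block size */

/*
 * Copy one block's worth of file data into `out`.
 * A disk block number of 0 marks a sparse block, so the reader sees zeros
 * without touching the disk at all. Any other number is copied from the
 * fake disk, which is `disk_nblocks` blocks long.
 */
void demo_read_block(const unsigned char *disk, size_t disk_nblocks,
                     uint32_t disk_block, unsigned char *out)
{
    assert(disk != NULL && out != NULL);
    if (disk_block == 0) {
        memset(out, 0, DEMO_BLOCK_SIZE);
        return;
    }
    assert(disk_block < disk_nblocks);
    memcpy(out, disk + (size_t)disk_block * DEMO_BLOCK_SIZE, DEMO_BLOCK_SIZE);
}
```

Writing to a sparse block is the point where a real disk block has to be allocated; reads never need to allocate.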

Getting Started

  • Double check that S5FS = 1 is set in Config.mk and run make clean all
  • Read. Read through all of the documentation we give you. It will save you a lot of time if you spend some time reading through things and understanding how S5FS works. You won't get a 100% understanding on your first read, but it's good to have some baseline understanding of what you're implementing
  • You're only working in s5fs.c and s5fs_subr.c. You should be okay to start with either file
    • s5fs.c deals with the high-level system calls for S5FS (the underlying functions that get called by higher level VFS functions) -- similar to ramfs.c. For example, do_write() would call s5fs_write()
    • s5fs_subr.c deals with low-level subroutines. They represent common functionality that will be useful to reuse in the higher-level system calls. The subroutines are challenging, but they'll help you become very familiar with how the disk and file system work