OS

Credit

USC CS402

Synchronization

How to work on the projects?
    1. Understand the theory thoroughly
       design is important!
    2. Start coding
       iterative programming
    3. Remaining code
       split between partners, then integrate

    Writeup
    Comments
    Tests
    Execution

Primary goal: make a computer easier to use

Users | Application Programs | Virtual Machine Interface | OS kernel | Hardware Interface (Device Drivers) | Hardware

5 Major Components of an OS
    CPU scheduling
    File System
    Memory Management
    Networking / Distributed Systems
    Security

Multi-programmed
    Multiple users
    Multiple programs
    Need security between user programs
    OS is responsible for security

OS key objective: Keep as many resources on the computer as busy as possible
    -> switch efficiently between user programs (context switch)
    switching the CPU from running one user program to running a different user program
    transparent to the user program

When to context switch?
    When a user program requests a 'slow' operation
    When a user program finishes
    When a new user program arrives (optional)
    When a maximum amount of time has elapsed -> Time Slice (optional)

Context switching allows for concurrency -> user programs can behave as if they have the computer to themselves
    On a context switch, the OS must remember the user program's context (state) when it is evicted from the CPU

Process:
    The OS manages processes, NOT user programs
    used to keep track of all the things an OS must remember about the state of a user program
    3 Main parts:
        Code/data
        Allocated resources
        information for context switch
            CPU registers
            Other OS-specific data
    4 States:
        New:      process has just been created; not completely set up for CPU execution yet as a user program
        Ready:    process is ready for the CPU; waiting its turn
        Running:  process currently executing on the CPU
        Blocked:  process cannot use the CPU until some event occurs

Race condition:
    order of execution affects the results when sharing resources
    NO shared data, NO race condition
E.g.
    A                          B
(1) i = 0                  (3) i = 10
(2) i = i + 1              (4) i = i - 1
(1)(2)(3)(4) -> 9; (3)(4)(1)(2) -> 1; (1)(3)(2)(4) -> 10; (3)(1)(4)(2) -> 0
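
A minimal runnable demo of this race using POSIX threads (the thread function name and loop count are just for illustration): two threads increment one shared counter without synchronization, so the result depends on the interleaving.

    #include <pthread.h>
    #include <stdio.h>

    static int shared = 0;                  /* shared data -> race condition */

    static void *worker(void *arg) {
        for (int k = 0; k < 100000; k++)
            shared = shared + 1;            /* read-modify-write is NOT atomic */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("expected 200000, got %d\n", shared);   /* often less */
        return 0;
    }

Repeated runs print different totals, because the order of execution affects the result.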

Mechanisms for sharing data between execution streams (between processes):
    Message passing:
        Requires a system call so that OS can perform the data sharing transfer between 2 processes
        Disadvantage: OS must run code for every sharing event
    Global Memory:
        Processes can request a 'special' block of memory that can be shared.
        All authorized processes share this block of memory.
        Disadvantage: still requires OS involvement at startup time
    Threads:
        Allow multiple execution streams inside a single process

How does the OS track threads?
    Each thread has its own CPU state in their process
    CPU scheduler now schedules threads

Now a process contains:
    A grouping of resources
    One, or more, threads
    CPU state saved separately for each thread

How to solve race condition when using threads?
    1. How to share data?
       Use global variables to share data

    2. How to ensure threads, in a process, execute one at a time?
    3. How to ensure proper sequencing of events?
       Atomic Operation
           One, or more, tasks that execute as a "single operation", can't be split up
           No interference with shared data
       Critical Region
            A section of code that utilizes shared resources
           Idea: use atomic operations to control access to critical regions

       Mutual Exclusion
            If one thread is in a critical region, no other thread in the same process
            is allowed to enter that critical region; they must wait

           4 necessary conditions for M.E.
               Only 1 thread in a C.R. at a time
               No thread should wait forever to enter a C.R.
               No assumptions about CPU speed
               No thread outside a C.R. can block a thread in a C.R.

       Avoiding busy waiting!
       Need higher-level atomic primitives (in software)

       Using Monitors
           3 components:
           1. Lock for mutual exclusion
              Acquire: lock
              Release: unlock
              both operations MUST be atomic

              Atomic:
                  Use interrupts (disable, restore)
                  Sleep/ Wakeup mechanism:
                   Sleep: A thread state in which the CPU scheduler cannot give the thread the CPU on its own
                          -> remove the thread from the ready queue
                          -> to find sleeping threads, use a separate queue to store them,
                             specific to each lock
                          -> before going to sleep, the thread adds itself to the lock's wait queue
                  Wakeup: The event, a sleeping thread has been waiting for, has occurred
                         -> Wakeup 1 sleeping thread
                         -> Give this 1 thread ownership of the lock
                         -> Put the thread in the Ready Queue in the Ready State

                  For a thread to gain access to the CPU:
                      it must be in the Ready Queue
                      it must be in the Ready State

              2 States:
                  Free - available
                  Busy - Not available

               Properties of this solution:
                   Works!
                   All threads have the same code
                   No busy waiting
                   Only solves mutual exclusion
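
        A sketch of Acquire/Release built this way; every helper here (ints_off, ints_restore, current_thread, sleep_current_thread, make_ready, and the queue routines) is a hypothetical kernel primitive for illustration, not a real API.

            struct lock {
                int owned;                      /* Free (0) or Busy (1) */
                struct thread *owner;
                struct queue waiters;           /* sleeping threads, specific to this lock */
            };

            void lock_acquire(struct lock *l) {
                int old = ints_off();           /* disable interrupts: make it atomic */
                if (!l->owned) {
                    l->owned = 1;               /* lock was Free: take it */
                    l->owner = current_thread();
                } else {
                    queue_push(&l->waiters, current_thread());
                    sleep_current_thread();     /* off the ready queue; no busy waiting */
                }
                ints_restore(old);
            }

            void lock_release(struct lock *l) {
                int old = ints_off();
                if (queue_empty(&l->waiters)) {
                    l->owned = 0;               /* back to Free */
                    l->owner = NULL;
                } else {
                    struct thread *t = queue_pop(&l->waiters);
                    l->owner = t;               /* hand ownership directly to the woken thread */
                    make_ready(t);              /* Ready Queue + Ready State */
                }
                ints_restore(old);
            }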

           2. One, or more, condition variables for sequencing
               Have no state, but 3 operations
              Wait: put myself to sleep, waiting on some condition
              Signal: wake up 1 waiting thread if there is one
              Broadcast: wake up all waiting threads

           3. Monitor variables for making sequencing decisions
              Just data
              2&3 are shared data

           Form of a Monitor
           Acquire a lock - Entering the monitor
              Access/ manipulate our monitor variables to make synchronization decision
              Use C.V. operations to wait, signal or broadcast
           Release the lock - Exiting the monitor

            Mesa-style: if Q and P are in the same monitor, Q has to wait for access to the monitor when it
                        is woken up from a CV by P.

           How to design multi-threads:
              What lock?
              What monitor variables?
              What condition variable?
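
        The monitor form above as a minimal Mesa-style sketch using POSIX threads: one lock, one condition variable, and a monitor variable (count). The while loop is what Mesa-style wakeups require, since the condition must be rechecked after waking.

            #include <pthread.h>

            static pthread_mutex_t mon = PTHREAD_MUTEX_INITIALIZER;  /* the lock */
            static pthread_cond_t  notEmpty = PTHREAD_COND_INITIALIZER;
            static int count = 0;                                    /* monitor variable */

            void produce(void) {
                pthread_mutex_lock(&mon);          /* enter the monitor */
                count++;
                pthread_cond_signal(&notEmpty);    /* wake 1 waiting thread, if any */
                pthread_mutex_unlock(&mon);        /* exit the monitor */
            }

            void consume(void) {
                pthread_mutex_lock(&mon);
                while (count == 0)                 /* Mesa-style: recheck after waking */
                    pthread_cond_wait(&notEmpty, &mon);
                count--;
                pthread_mutex_unlock(&mon);
            }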

Methodology for writing multi-threaded interactions:
    'write' one interaction at a time - between only 2 entities:
        1. Analysis
            lock
            1 or more CV
            monitor variables
        2. Design - end up with pseudocode; ignore any other entities
        3. Write code for both entities
        4. Compile the code
        5. Test the code you just wrote

Semaphores:
    can handle all of the problems monitors can handle
    2 Operations:
        Wait   = Down = P
        Signal = Up   = V
    State:
        an integer
            ranges from 0 to any positive value
    Have a wait queue

    Up:
        Increment the semaphore value by 1
        Wakeup 1 waiting thread, if any
    Down:
        Decrement the value by 1 - if not 0
        If the value is 0, the thread goes to sleep

    Semaphore value is private

    1. Mutual Exclusion:
        Binary semaphore - 0 or 1
            Initialized to 1
        to enter a C.R. - Down()
        to exit a C.R.  - Up()

    2. Synchronization with Semaphore
        Can start with any valid value
        Usage can be an Up, or Down to start with

        Application decision

    E.g.
        Soda and Student

        Needs 3 semaphores:
            mutex - lock - binary semaphore
            empty - #empty slots - start @ 10
            full - #full slots  - start @ 0
        SodaGal
            empty.Down() // reserve an empty slot for me
            mutex.Down()
                // add 1 soda
            mutex.Up()
            full.Up()

        Student
            full.Down() // reserve a full slot (a soda) for me
            mutex.Down()
                // take 1 soda
            mutex.Up()
            empty.Up()
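
        The same machine written with POSIX semaphores, where sem_wait is Down/P and sem_post is Up/V; sem_init at startup supplies the initial values from the notes.

            #include <semaphore.h>

            #define SLOTS 10
            sem_t mutex;    /* binary semaphore - the lock        */
            sem_t empty;    /* #empty slots     - starts at SLOTS */
            sem_t full;     /* #full slots      - starts at 0     */
            int sodas = 0;

            void soda_gal(void) {
                sem_wait(&empty);   /* reserve an empty slot for me */
                sem_wait(&mutex);
                sodas++;            /* add 1 soda */
                sem_post(&mutex);
                sem_post(&full);
            }

            void student(void) {
                sem_wait(&full);    /* reserve a full slot (a soda) for me */
                sem_wait(&mutex);
                sodas--;            /* take 1 soda */
                sem_post(&mutex);
                sem_post(&empty);
            }

            /* at startup:
               sem_init(&mutex, 0, 1);
               sem_init(&empty, 0, SLOTS);
               sem_init(&full,  0, 0);     */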

System call & CPU schedule

System calls
    mechanism by which the OS allows user programs to perform kernel tasks

    Fork - create a new thread
    Exec - create a new single threaded process
    Exit - when a thread is done executing

    Synchronization
        Locks: Acquire, Release, Create Lock, Destroy Lock
        CVs: Wait, Signal, Broadcast, CreateCV, DestroyCV

    miscellaneous:
        Yield

    The purpose of a system call:
        'Safe' mechanism by which user programs are able to access kernel-protected resources
        Lock & Condition classes are kernel resources
        Lock & Condition system calls are just interfaces to Lock & Condition kernel objects
        Locks are specific to a process

    Format of sys calls from user program viewpoint:
        1. What data does the OS need to carry out the request?
        2. Is there a return value to the user program?

    Must return a handle to the user program - a resource identifier

    OS Help: Only destroy a resource when it is not in use!
    Remember that a destroy system call has been requested!
    Need a status variable tracking whether the lock is still in use!
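
    A sketch of this bookkeeping; the table layout and field names (inUse, toBeDestroyed, free_lock) are assumptions for illustration, not a real kernel's API. The handle returned to the user program is just an index into the table.

        #define MAX_LOCKS 64

        struct kernel_lock {
            struct lock *lock;      /* the kernel Lock object                  */
            int inUse;              /* threads currently using/waiting on it   */
            int toBeDestroyed;      /* a Destroy syscall has been requested    */
        };
        struct kernel_lock lockTable[MAX_LOCKS];   /* handle = index into table */

        void destroy_lock_syscall(int handle) {
            struct kernel_lock *e = &lockTable[handle];
            if (e->inUse > 0)
                e->toBeDestroyed = 1;   /* remember the request for later       */
            else {
                free_lock(e->lock);     /* hypothetical cleanup routine         */
                e->lock = NULL;
            }
        }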

CPU Scheduling:
    Task: decide which thread gets the CPU & when it gets it

    Policy: Rules
    Mechanism: how we implement the rules

    5 possible goals for scheduler:
        Fairness: every job gets the same amount of CPU time
        Efficiency: Keep as many of the computer's resources as busy as possible
        Response Time: Minimize / Maximize interactive response time
        Turnaround Time: Give CPU preference (or not) to background tasks
        Throughput: Complete as many jobs as possible in some amount of time

    Issue: Do jobs run to completion or not (non-preemptive or preemptive)
        How long is a time slice?
            Too long: interactive use will suffer
            Too short: too many unnecessary context switches

    Scheduling Policies:
        First-Come First-Served
            Similar to 'standing in line'
            Non-preemptive
            The job that has waited the longest gets the CPU next

            + Fair (No starvation)
            + Simple
            - Short jobs can get 'stuck' behind long jobs

        Round Robin:
            preemptive FCFS
            Jobs run until their time slice has expired, they are blocked on a 'slow operation',
            or they voluntarily give up the CPU

            + Fair
            + Short jobs don't get stuck behind long jobs
            - Unnecessary context switches when all jobs are about the same length (not short)

        Shortest Job First:
            Maximize throughput
            Run the job requiring the least CPU time to complete

            Non-preemptive
            - Not fair (starvation)
            - Not implementable (requires knowing each job's CPU time in advance)

        Shortest Remaining Time to Complete First
            Preemptive
            The job 'we' predict will use the smallest time slice percentage - based on its recent history
            - Can still have starvation
            - The scheduler must track & compute the time slice usage

        Priority -based Scheduling:
            Priority: Some jobs are favored over other jobs
            Policy: 'Higher' priority jobs run before lower priority jobs
            Issue: Preemptive or not?
            New Issue: Do priorities change over time?
                static priorities: priorities are fixed
                    - starvation
                    + easier on OS
                dynamic priority: priority can change
                    up or down
                    wait time
                    CPU time
                    + NO starvation
                    - Extra overhead, during context switch, in recomputing priorities
            Final issue: How many priorities are needed?
                somewhere around 4 priorities are typical

                The most effective Ready Queue organization is to have a separate Ready Queue for each priority;
                    This allows for different scheduling policies for each priority
                    The time slice interval can be different for each priority queue

Deadlock:
    Occurs because of competition for resources:
        hardware
        files
        synchronization primitives

    E.g.1
        lock1->Acquire()                     lock1->Acquire()
        lock2->Acquire()                     lock2->Acquire()
        cv2->wait(lock2)                     cv2->signal(lock2)

    E.g.2
        lock1->Acquire()                     lock2->Acquire()
        lock2->Acquire()                     lock1->Acquire()

    Definition:
        Two, or more, jobs each waiting on an event that can only be produced by one of the waiting jobs.

    Resource usage sequence:
        1. user program requests resource access from the OS
        2. Once access is granted, the user's job uses the resource
        3. When user program is done with the resource, it is given back to the OS
    What if a requested resource is not available?
        1. Fail the request
            Application has the responsibility to handle failed requests
            + No deadlock possible
        2. Queue the request
            OS has responsibility to manage queued requests
            - Deadlock is possible

    4 conditions for deadlock:
        1. Mutual exclusion:
            Some resources that cannot be shared
        2. Hold & Wait
            Jobs can own resources & request other resources, without giving up what they already own
        3. No preemption
            Once a resource is given to a job, it cannot be 'safely' taken away
        4. Circular wait
            X, Y resources
            A, B jobs
            A owns X and asking for Y
            B owns Y and asking for X

    Deadlock solution strategies:
        1. Do nothing
        2. Detection & Recovery
            have a way to detect deadlock has occurred
            kill a job
        3. Dynamic Avoidance
            We try to keep deadlock from happening by carefully allocating resources
        4. Prevention
            Eliminate 1 of 4 required conditions for deadlock

Memory Management

Virtual address space (part of our security model)
    First executable instruction is at virtual address 0
    Code, Data, Heap, Stack
    Code and Data built by compiler
    Heap and Stack assigned at run time start-up

How to load the executable contents into memory?
    Memory is 'divided' into equal size pieces - pages
        The OS allocates memory 1 page at a time

    The OS must now track the used/unused pages of memory (using a bitmap)
    For each process, we need to track where each virtual address space page is loaded into memory

    Page table:
        An array, indexed by V.P.N., mapping virtual page locations (the starting virtual
        address of each page) to the physical page locations (where they were copied into memory)
    One stack per thread

Physical Memory Management
    Most Basic:
        Only 1 process, with only 1 kernel thread, in memory at a time
        No memory sharing between user programs -> no security
    Fixed Partitions:
        Memory is divided into a set of partitions of various sizes
        Load a program into a partition
            No sharing of partitions
        + Multiprogramming is possible
        + The partitions are the security
        - Can only run a program up to the largest partition
        - Internal memory fragmentation
            Unused memory allocated to a process
    Dynamic Partitions (Base & Bounds)
        A partition is created at process initialization
        + A process can be loaded anywhere in memory
        + Can run one job almost the size of physical memory
        - Need security between user programs
            Need 2 new registers
                Base: starting memory address
                Bounds: partition size
        - Problem: a new process may need more memory than any single piece of available memory
          Wait until memory can be defragmented later
    One more disadvantage shared by all 3 methods so far: memory allocated to a process has been contiguous.

    Address Translation:
        Validating memory references & converting virtual addresses to real (physical) addresses
        Done in hardware, MMU

For Base & Bounds:
    1. validation:
        Compare needed virtual address to the bounds reg.
        V.A >=0 && V.A. < Bounds Reg
    2. Getting Physical Address
        P.A. = Base Reg + V.A.
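
    The same check and translation as a small C function; a real MMU raises an exception instead of returning -1.

        long translate_bb(unsigned long va, unsigned long base, unsigned long bounds) {
            if (va >= bounds)             /* validate: 0 <= V.A. < Bounds Reg */
                return -1;                /* address-out-of-range fault       */
            return (long)(base + va);     /* P.A. = Base Reg + V.A.           */
        }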

    Segmentation:
        Divide the address space into segments - these are smaller than whole address space
        Each segment is contiguous in memory
        Any segment can be loaded into an unused block of memory that is large enough

        One set of Base & Bounds registers is not enough
            Too expensive to have lots of pairs of registers
        We will store these B&B values in a table - segment table

        New + Memory sharing can now occur between processes
            entire segment
        New + Virtual memory is now possible
            allow user program to ignore physical memory limitations
        - Memory allocation is still a problem
        - External fragmentation exists

    How to do Address Translation:
            MMU process:
                0. Receive V.A
                1. Validate V.A
                    split the V.A into 2 parts
                        Segment number (index position in segment table; array)
                        Address offset in this segment
                2. If valid, perform the translation
            E.g.
                16 bit V.A.
                5 bits Segment #, 11 bits V.A. offset
                Segment Table (valid, base, bounds)

            Validation Process:
                Does the segment # exist in the segment table? (is there such an index in the segment table)
                Is this segment already in memory? (valid bit is true)
                Is the address offset contained within the segment? (compare offset with size)
                Compute physical address:
                    P.A. = Segment start (Base) + address offset
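
            The 16-bit example as code, assuming the (valid, base, bounds) segment-table layout from the notes: 5-bit segment number, 11-bit offset.

                struct seg { int valid; unsigned base; unsigned bounds; };

                long translate_seg(unsigned va, struct seg *table, unsigned nsegs) {
                    unsigned segno  = (va >> 11) & 0x1F;   /* top 5 bits: segment #  */
                    unsigned offset = va & 0x7FF;          /* low 11 bits: offset    */
                    if (segno >= nsegs)                return -1; /* no such entry        */
                    if (!table[segno].valid)           return -1; /* segment not in memory */
                    if (offset >= table[segno].bounds) return -1; /* offset past the end   */
                    return (long)table[segno].base + offset;      /* P.A. = Base + offset  */
                }
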
    Paging:
        Allocate memory in fixed size blocks - all the same size ->Pages
        Logically divide the address space into virtual pages - the same size as physical memory pages

        + Any virtual page can be loaded into any available page of physical memory
        OS. tracks all virtual page locations, for each address space, in physical memory
            use a page table for this

        Address Translation:
            Similar to segmentation:
                split the V.A. into 2 parts
                    virtual page #
                    address offset within a page

        Paging has a new problem:
            Page Table is an array (It must be physically contiguous in memory)
                indexed by V.P. #
                managed by kernel
                    no address translation
        Page table can be very large & mostly empty
            E.g. 32-bit OS -> 2^32 bytes available to a user program -> 4 billion bytes; with 4 KB pages, that is 2^20 page table entries per process

    Multi-level Translation:
        The lowest-level table, which points to the physical page where the virtual page is
        located, is always a page table.
        E.g.
            2-level paging
            Divide the V.A. into 3 parts
            32-bit address space (10-bit top-level index, 10-bit 2nd-level index, 12-bit address offset)
            2^20 virtual pages, 1025 total tables
        In real OSes:
            32 bits: 3 or 4 translation levels
            64 bits: 4 translation levels
        But we have new problem:
            MMU:
                1. V.A. received & validated
                2. MMU accesses memory for 1st-level P.T.
                3. ... 2nd-level P.T.
                4. ... 3rd-level P.T.
                5. ... 4th-level P.T.
                6. ... Get P.A.
            5 memory accesses for 1 user program address
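
        The 10/10/12 split as a two-level table walk; representing the tables as arrays of pointers/frame numbers, with NULL and 0 meaning "not present", is an assumption for illustration.

            #define PAGE_SIZE 4096

            long walk_2level(unsigned va, unsigned **top /* top-level table */) {
                unsigned i1     = (va >> 22) & 0x3FF;   /* 10-bit top-level index */
                unsigned i2     = (va >> 12) & 0x3FF;   /* 10-bit 2nd-level index */
                unsigned offset = va & 0xFFF;           /* 12-bit address offset  */

                unsigned *second = top[i1];             /* memory access #1       */
                if (second == NULL) return -1;          /* no 2nd-level table     */
                unsigned frame = second[i2];            /* memory access #2       */
                if (frame == 0)     return -1;          /* page not in memory     */
                return (long)(frame * PAGE_SIZE + offset);
            }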

Translation Lookaside Buffer (TLB):
    Cache of page table entries
    Map virtual pages to physical pages - like a page table
    The V.P.N is the Key
    Contains recently accessed V.P.s & their physical page location

    MMU process:
        1. Receive & validate V.A.
        2. Compute the V.P.N
        3. Use the V.P.N to look up in the TLB
            If found, directly compute the physical address - done
        4. If not found in TLB, go to the translation tables
            Add this entry to the TLB
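
    A sketch of step 3 with a tiny fully-associative TLB searched by V.P.N.; the size and fields are assumptions for illustration.

        #define TLB_SIZE 64
        struct tlb_entry { int valid; unsigned vpn; unsigned frame; };
        struct tlb_entry tlb[TLB_SIZE];

        long tlb_lookup(unsigned vpn, unsigned offset) {
            for (int i = 0; i < TLB_SIZE; i++)
                if (tlb[i].valid && tlb[i].vpn == vpn)
                    return (long)(tlb[i].frame * 4096 + offset); /* hit - done */
            return -1;  /* miss: go to the translation tables, then fill the TLB */
        }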

    Issue: what happens on a context switch?
        Invalidate the TLB!!

    Issue: Real OSes, when they get busy, don't have enough memory
        The OS must manage memory - it can be full
        -> Global page replacement
            The OS can remove a page from any process

Inverted Page Table (IPT):
    Goal: check whether a V.P. is in memory!
    kernel needs some data structure that maps physical memory - independent of user programs
    maps physical pages to virtual pages
        only 1 for OS
        always in memory
        1 entry for each physical memory page
        managed by kernel
        'Indexed' by physical page #

    In real OSes, the IPT is a hash table
        key: VPN & process ID
        An IPT entry contains @ least:
            physical page #
            virtual page #
            process id
            valid bit
            dirty bit - when true, the physical page has been modified

    MMU process:
        1. receive & validate V.A.
        2. check TLB
            if found, compute P.A. - done
        3. If not in TLB, look in IPT,
            if found, compute P.A., update TLB - done
        4. (Page fault) If not in IPT, go to the translation table (needed V.P. is not in memory)
            a. load needed page into an available page of memory
            b. update IPT, TLB
            c. update translation table
            d. restart the user instruction

Demand Paged Virtual Memory:
    Don't preload any virtual pages into memory - wait for a page fault

What if physical memory is full on a page fault?
    must evict a page of memory
        What if the selected page is dirty?
            we must save this dirty page to disk

    MMU process:
        1. receive & validate V.A.
        2. check TLB
        3. If not in TLB, check IPT
        4. If not in IPT, check translation tables
            a. to look in our T.T., we use the V.P.N to find the appropriate entry
            b. read that entry to find the disk location
            c. copy this page (if on disk) to a physical memory page
                if none available, evict one
                for the selected physical page:
                    is it dirty?
                        Yes, save it to disk
                    update the translation table // record where it is stored (none, swap file, executable)
            d. now we can load in the needed virtual page - into the newly available physical memory page
            e. update the translation table for the newly loaded page of the current process
            f. update IPT & TLB for the newly loaded page
            g. the evicted page, if it is in my process, might be present in the TLB
                If so, invalidate that TLB entry
            h. restart the user instruction
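
    The same steps as a sketch; every helper here (find_free_frame, evict_frame, frame_is_dirty, write_to_swap, read_page, ipt_update, tlb_update, tlb_invalidate_frame) is a hypothetical kernel routine named after the step it performs, and the struct layouts are assumptions.

        void handle_page_fault(struct process *p, unsigned vpn) {
            struct tt_entry *e = &p->tt[vpn];       /* a. find the T.T. entry    */
            long where = e->disk_location;          /* b. where the page lives   */
            int frame = find_free_frame();          /* c. need a physical page   */
            if (frame < 0) {
                frame = evict_frame();              /*    none free: evict one;
                                                          assumed to update the
                                                          evicted page's T.T.    */
                if (frame_is_dirty(frame))
                    write_to_swap(frame);           /*    save the dirty page    */
                tlb_invalidate_frame(frame);        /* g. drop any stale TLB entry */
            }
            read_page(where, frame);                /* d. load the needed page   */
            e->frame = frame; e->valid = 1;         /* e. update the T.T.        */
            ipt_update(frame, p->pid, vpn);         /* f. update the IPT         */
            tlb_update(vpn, frame);                 /*    and the TLB            */
            /* h. restart the faulting user instruction on return */
        }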

Difference between the translation table and the page table??

Page Replacement Policies:
'global' choices:
    1. Whether to prefer evicting a non-dirty page over a dirty one
    2. If the page is dirty ->
        maintain a cache of memory pages not allocated to user programs (safety net)
        instead of writing the dirty page to the swap file immediately, use a 'cache' page

Specific Policies:
    1. Random
        + simple
        - not based on usage
    2. FIFO
        select the page that has been in memory the longest
        - not based on usage
    3. Optimal:
        replace the page that will not be needed for the longest time
        - not implementable
        + used for comparisons with implementable policies
    4. Aging algorithm:
        replace the 'oldest' page
            we have a way to measure age (length of time since a page was accessed)
            Clock - 1 bit
            Second Chance - 2 bits
            Several algorithms - 4 or 8 bits
            Least Recently Used (LRU)
                typically 32 bits
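
        A sketch of the aging idea with 8-bit counters: on each clock tick every counter is shifted right and the page's reference bit is ORed into the top bit, so the smallest counter marks the 'oldest' page. Array sizes and names are assumptions for illustration.

            #define NPAGES 256
            unsigned char age[NPAGES];      /* larger value = more recently used */
            int referenced[NPAGES];         /* hardware-set reference bits       */

            void tick(void) {               /* run on every clock interrupt      */
                for (int i = 0; i < NPAGES; i++) {
                    age[i] = (unsigned char)((age[i] >> 1) | (referenced[i] ? 0x80 : 0));
                    referenced[i] = 0;
                }
            }

            int pick_victim(void) {
                int victim = 0;
                for (int i = 1; i < NPAGES; i++)
                    if (age[i] < age[victim]) victim = i;  /* 'oldest' page */
                return victim;
            }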

Protection Systems:

The OS has the responsibility for the protection of computer resources

General characteristics:
    An OS has resources (objects)
    These resources are managed by the OS for user programs
    Each resource has a unique identifier
    Each resource has a set of valid operations that can be performed on it

Protection systems have 3 main components:
    object/ resource to be protected
    set of allowed operations
    processes (user programs)

Domain Matrix Implementation:
    A 2D array
    Domain: A set of pairs - object/rights (authorized operations)
    Rules: A user program can only be in one domain at a time
           A user program can change its domain
    A 'typical' domain is the user account

    Each object/rights pair specifies:
        the object being protected
        the subset of allowable operations the domain can perform

    In our matrix:
        a row represents one domain
        a column represents one object
        a cell contains the rights for a single domain to a single object
    This is a kernel data structure
    disadvantage: the matrix is sparse

Do not store empty cells - only store rights that exist
    No entry means no access

    1. by row (domain) -> protection domain
    2. by column (object) -> access control list

    Protection domain:
        for each domain, we have a set of pairs (object, rights)
    ACL:
        for each object, we have a set of pairs (domain, rights)

    Which method to choose for an OS?
        Protection Domain:
            + users can see all files they can access
            + a process has only one active domain at a time, so we can store a reference to that protection domain in the process
        ACL:
            + can show all domains which can access a single object
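
    A minimal sketch of the two sparse representations (struct names are assumptions): store only the pairs that exist, and no entry means no access.

        struct right_pair  { int object_id; unsigned rights; };   /* one cell, by row    */
        struct domain_pair { int domain_id; unsigned rights; };   /* one cell, by column */

        struct protection_domain {          /* stored by row (domain)    */
            int domain_id;
            struct right_pair *pairs;       /* (object, rights) pairs    */
            int npairs;
        };

        struct acl {                        /* stored by column (object) */
            int object_id;
            struct domain_pair *pairs;      /* (domain, rights) pairs    */
            int npairs;
        };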

Protection data is privileged (kernel access)
    no direct user access

Issue: at any instant, the protection data determines what a domain can do (mechanism)
    It does not control what a domain is authorized to do (policy)
    Protection data may not match the protection policy
    Protection systems only validate against the protection data, NOT the protection policy

Networks

Remote Procedure Calls
    Based on the concept of a local procedure call
        jump to another section of code
        execute it
        return to calling address & continue - may return a value
    The difference with RPCs - the jump is to code on another computer

How to implement RPCs?
    1. Require user programs to explicitly issue Send/Receive syscalls
        + Easy OS implementation
        - Requires the application to 'know' the server location & possibly the protocol (limits flexibility)
    2. Hide the fact that work is being done remotely from user programs
        No networking code in the user program
        Send/Receive work done by a 'middle layer', or by OS kernel

        How to do this?
        use 'stub' programs to do the networking
            client stub (issues requests)
            server stub (receives requests)

        Where do stubs come from?
            Idea: Client stub & server stub must understand each other
            Their primary task is to create, parse, send & receive msgs between each other

            Have a program to generate the stubs => stub generator
                web services -> Axis wsdl2java
                CORBA -> IDL, idlj (IDL-to-Java)

            Specification Data:
                Request type
                Request version (optional)
                Order of data in msg
                Format & type of each data item

         How does the client stub 'know' where the server stub is located?
            static binding -> hard code the network location of the server stub in the client stubs - not flexible
            dynamic binding ->
                client stub obtains the network location of server stub @ request time
                need a new program: binder
                    track location of server stubs
                    respond to client stub requests
                    server stubs register with binders
                    Deregistration msg
                    binders can 'poll' server stubs
                    load balancing
                    authorization
                advantage:
                    flexible
                    achieves transparency for the user apps
                disadvantage
                    complexity

Distributed Mutual Exclusion:

    1. Centralized
        Have 1 mutual exclusion server
        It handles all requests to enter CRs
        A request message is sent from the client to the server
        If no client is in that particular CR, the server replies with an OK msg
        If a client is in that CR, the server queues the request
        upon exiting a C.R., a client sends a release msg to the server
            If a client is waiting for that CR, the server sends 1 OK msg to a waiting client
        For this to work, CRs need unique identifiers
        + correct
        + fair
        - does not scale well
        - single point of failure - server

    2. Fully distributed
        no centralization
        no central server
            all clients work together
        no centralized decision making
            clients make group decisions

        Requirements:
            1. Reliable communication
                no lost messages
            2. Globally unique identifier for each group member
            3. Total ordering of events
                An event, in a distributed system, is a Send/Receive
                All group members agree on the order of Sends
                    requires that messages, from a single group member, are delivered in timestamp order

            How to ensure 'total ordering'?
                A group member does not process a received request msg until it 'knows' it cannot receive a request msg from
                ANY group member with an earlier timestamp

            New requirement:
                Messages from a single group member are received in timestamp order & no 2 msgs have the same timestamp
            Solution:
                Each group member maintains a 'last timestamp received' table

        Process for Total Event Ordering:
            1. Receive a msg
            2. Extract timestamp & member ID
            3. Update last timestamp, in my table, for that member
            4. Extract the earliest timestamp value from my table
            5. Process any msg, in timestamp order, with a timestamp <= value from step 4
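
        A sketch of the 'last timestamp received' table: a queued msg may be processed once its timestamp is <= the minimum over the table, since no member can still send anything earlier. The array representation and names are assumptions for illustration.

            #define NMEMBERS 8
            long last_ts[NMEMBERS];        /* last timestamp seen from each member */

            void on_receive(long ts, int member) {
                last_ts[member] = ts;      /* step 3: update the table             */
            }

            long processable_up_to(void) { /* step 4: earliest value in the table  */
                long min = last_ts[0];
                for (int i = 1; i < NMEMBERS; i++)
                    if (last_ts[i] < min) min = last_ts[i];
                return min;                /* step 5: process queued msgs <= min   */
            }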

        Mutual Exclusion Message Processing Algorithm:
        1. A group member sends a request msg to all group members
            msg contains: identifier of C.R., timestamp, ID of requesting member
        2. A member processing a request msg can be in 1 of 3 states:
            a. Member is not in that C.R. & has no pending request for that C.R.
            b. Member is in that C.R.
            c. Member has a pending request for that C.R.
        3. How/when to respond?
            a. Send OK msg to requester
            b. Queue the request, when I exit the C.R., send OK msg to all queued requests for that C.R.
            c. Compare timestamps (& global IDs if timestamps are the same)
                if my timestamp is earlier, respond as in b (queue the request)
                if my timestamp is later, respond as in a (send OK)

        C.R. Entry Rule: A member can enter a C.R. when OK msgs have been received from all other members
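
        Step 3's decision as a small function, Ricart-Agrawala style; the enum and names are assumptions for illustration. Case c breaks ties with the global member ID.

            enum cr_state { IDLE, IN_CR, WANT_CR };

            /* returns 1 = reply OK now, 0 = queue the request */
            int should_reply_ok(enum cr_state my_state,
                                long my_ts, int my_id,
                                long req_ts, int req_id) {
                if (my_state == IDLE)  return 1;   /* case a: send OK      */
                if (my_state == IN_CR) return 0;   /* case b: queue it     */
                /* case c: both pending - earlier timestamp (then smaller ID) wins */
                if (req_ts < my_ts || (req_ts == my_ts && req_id < my_id))
                    return 1;                      /* their request wins   */
                return 0;                          /* mine wins: queue it  */
            }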

        + Correct
        + Scales somewhat (O(n^2))
        - N points of failure

        Variation #1 - Always send a reply to a request
        Member will send an OK, or a NOT OK msg upon receiving a request
        + Requesting members can use the lack of a response to start a process to determine if a member is dead
        - Still have a problem in 2 cases
            A member dies after sending a NOT OK, before exiting the C.R.
            A member in the C.R. dies

       Variation #2 - Majority Rules
       A member can enter a C.R. when N/2 + 1 OK msgs are received
       Members only send 1 OK msg when they receive a Release msg (someone exits the C.R.)
       + N/2 - 1 members can die without affecting the algorithm
       - Single point of failure - member in the C.R. dies (no sending of release msg)

       Another approach to distributed M.E.: Token Ring algorithm
       For each C.R. there is a Token msg 
            it contains the C.R. identifier

       To enter a C.R. the member must have the token msg
           the member keeps the token until they exit the C.R.
       Group organization is a 'logical' ring
            Members know who is in the group & the order of members

       If a member does not want to enter a C.R., when receiving that token msg they forward the msg to their neighbor
       + Correct
       - Token msgs consume network resources even with no C.R. activity

       When can the Token Ring algorithm fail?
           Failure model: Token msg lost
           where this can happen:
                network failure -> reliable messaging
                my neighbor is dead -> have neighbor ack; if no ack, send to next neighbor
                a member can die holding the token
                    critical failure mode
                    -> one solution for distributed algorithms is an Election Algorithm

       Every algorithm has a 'critical failure' point
           whole algorithm stops

       In some applications, it may be necessary to have the ability to restart the algorithm
           -> election algorithms are one way to do this

       To be able to do this:
           1. Reliable method to identify the critical failure has occurred
           2. A reliable method to select a single group member to 'restart' the algorithm
           3. These methods must work even when multiple members are doing items 1&2 at the same time

       Election algorithm:
           'Elect' a new critical member
           1. Determine that the critical member has died
           2. Select a new critical member AND the entire group agrees

       Two Implementations:
           Bully:
               A member does step 1
               They initiate an election:
                   Send an election msg to all members with a higher group member ID
                   Upon receipt of an Election msg, they respond to the sender with OK & start their own election
                   Eventually, one group member will get no response -> they win
               The winning member sends a msg to all other members to let them know who is the winner
               + Correct, we can pick a new critical member
               - lots of msgs O(N^2)
               + Can use timestamps to ensure the results of 1 election are 'official'

           Ring algorithm:
               Like Token Ring, all members:
                    know all other members
                   know the order of members
               The initiating member sends an election msg to their neighbor
                   include our group ID
               When an election msg is received, append your ID & forward
               When msg returns to initiator, change msg to 'CompletedElection' & send to its neighbor
               All group members record the new group data
               When completedElection msg returns to the initiator, they kill it

              Member with highest/lowest ID wins

              + Correct
              - O(N) msgs sent

              Failure Modes:
                  a member dies after the election msg passes & before the completedElection msg.
              If Elections are timestamped, any 'failure' in an Election can be solved by initiating a new Election