OS
Credit
USC CS402
Synchronization
How to work on the projects?
1. Understand the theory thoroughly
design is important!
2. Start coding
Iterative Programming
3. Remaining code
split between partners, then integrate
Writeup
Comments
Tests
Execution
Primary goal: make a computer easier to use
Users | Application Programs | Virtual Machine Interface | OS kernel | Hardware Interface (Device Drivers) | Hardware
5 Major Components to an OS
CPU scheduling
File System
Memory Management
Networking - Distributed Systems
Security
Multi-programmed
Multiple users
Multiple programs
Need security between user programs
OS is responsible for security
OS key objective: Keep as many resources on the computer as busy as possible
-> switch efficiently between user programs (context switch)
switching the CPU from running one user program to running a different user program
transparent to the user program
When to context switch?
When a user program requests a 'slow' operation
When a user program finishes
When a new user program arrives (optional)
When a maximum amount of time has elapsed -> Time Slice (optional)
Context switching allows for concurrency -> user programs can behave as if they have the computer to themselves
On a context switch, the OS must remember the user program context (state) when being evicted from the CPU
Process:
The OS manages processes, NOT user programs
used to keep track of all the things an OS must remember about the state of a user program
3 Main parts:
Code/data
Allocated resources
information for context switch
CPU registers
Other OS-specific data
4 States:
New: process has just been created; not completely set up for CPU execution as a user program yet
Ready: process is ready for the CPU; waiting its turn
Running: process currently executing in the CPU
Blocked: process cannot use the CPU until some event occurs
Race condition:
order of execution affects the results when sharing resources
NO shared data, NO race condition
E.g.
        A                     B
(1) i = 0               (3) i = 10
(2) i = i + 1           (4) i = i - 1
(1)(2)(3)(4) -> 9; (3)(4)(1)(2) -> 1; (1)(3)(2)(4) -> 10; (3)(1)(4)(2) -> 0
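A minimal runnable sketch of the race above (C++; the interleaving, and therefore the result, varies from run to run):

```cpp
#include <iostream>
#include <thread>

int i = 0;   // shared data, deliberately unsynchronized

int main() {
    std::thread A([] { i = 0;  i = i + 1; });   // (1)(2)
    std::thread B([] { i = 10; i = i - 1; });   // (3)(4)
    A.join();
    B.join();
    std::cout << "i = " << i << '\n';           // 9, 1, 10, 0, ...
}
```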
Mechanisms for sharing data between execution streams (between processes):
Message passing:
Requires a system call so that the OS can perform the data-sharing transfer between the 2 processes
Disadvantage: OS must run code for every sharing event
Global Memory:
Processes can request a 'special' block of memory that can be shared.
All authorized processes share this block of memory.
Disadvantage: still requires OS involvement at startup time
Thread:
Allow multiple execution streams inside a single process
How does the OS track threads?
Each thread has its own CPU state within its process
CPU scheduler now schedules threads
Now a process contains:
A grouping of resources
One, or more, threads
CPU state saved separately for each thread
How to solve race condition when using threads?
1. How to share data?
Use global variable to share data
2. How to ensure threads, in a process, execute one at a time?
3. How to ensure proper sequencing of events?
Atomic Operation
One, or more, tasks that execute as a "single operation", can't be split up
No interference with shared data
Critical Region
A section of code that utilizes shared resources
Idea: use atomic operations to control access to critical regions
Mutual Exclusion
If one thread is in a critical region, no other thread in the same process
is allowed to enter that critical region; they must wait
4 necessary conditions for M.E.
Only 1 thread in a C.R. at a time
No thread should wait forever to enter a C.R.
No assumptions about CPU speed
No thread outside a C.R. can block a thread in a C.R.
Avoiding busy waiting!
Need higher-level atomic primitives (in software)
Using Monitors
3 components:
1. Lock for mutual exclusion
Acquire: lock
Release: unlock
both operations MUST be atomic
Atomic:
Use interrupts (disable, restore)
Sleep/ Wakeup mechanism:
Sleep: a thread 'state' in which the CPU scheduler will not give it the CPU on its own
-> remove the thread from the Ready Queue
-> to find sleeping threads, we use a separate queue to store them,
specific to each lock
-> before going to sleep, add myself to the lock's wait queue
Wakeup: The event, a sleeping thread has been waiting for, has occurred
-> Wakeup 1 sleeping thread
-> Give this 1 thread ownership of the lock
-> Put the thread in the Ready Queue in the Ready State
For a thread to gain access to the CPU:
it must be in the Ready Queue
it must be in the Ready State
2 States:
Free - available
Busy - Not available
Solved problems:
Works!
All threads have the same code
No busy waiting
Only solves mutual exclusion (see the lock sketch after this list)
2. One, or more, condition variables for sequencing
Have no state, but 3 operations
Wait: put myself to sleep, waiting on some condition
Signal: wake up 1 waiting thread if there is one
Broadcast: wake up all waiting threads
3. Monitor variables for making sequencing decisions
Just data
2&3 are shared data
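A minimal sketch of a sleep/wakeup lock in C++ (hedged stand-ins: a spin on `guard` plays the role of the kernel's disable/restore-interrupts step, and a C++20 `binary_semaphore` per waiter provides the Sleep/Wakeup mechanism):

```cpp
#include <atomic>
#include <deque>
#include <semaphore>

class SleepLock {
    std::atomic_flag guard;                    // spin: stands in for "disable interrupts"
    bool busy = false;                         // 2 states: Free / Busy
    std::deque<std::binary_semaphore*> waitq;  // this lock's wait queue

public:
    void Acquire() {
        while (guard.test_and_set(std::memory_order_acquire)) {}  // "disable interrupts"
        if (!busy) {
            busy = true;                       // lock was Free: take it
            guard.clear(std::memory_order_release);
            return;
        }
        std::binary_semaphore me{0};
        waitq.push_back(&me);                  // add myself to the lock's wait queue
        guard.clear(std::memory_order_release);
        me.acquire();                          // Sleep until Release() wakes me;
    }                                          // the waker passed ownership to me

    void Release() {
        while (guard.test_and_set(std::memory_order_acquire)) {}
        if (waitq.empty()) {
            busy = false;                      // no waiter: mark lock Free
        } else {
            std::binary_semaphore* next = waitq.front();
            waitq.pop_front();                 // Wakeup 1 sleeping thread and
            next->release();                   // give it ownership (busy stays true)
        }
        guard.clear(std::memory_order_release);
    }
};
```

Note the handoff in Release: `busy` stays true, so the one woken thread owns the lock when it runs, and no thread ever busy-waits for the lock itself.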
Form of a Monitor
Acquire a lock - Entering the monitor
Access/ manipulate our monitor variables to make synchronization decision
Use C.V. operations to wait, signal or broadcast
Release the lock - Exiting the monitor
Mesa-style: if P signals a C.V. that Q is waiting on, and both are in the same monitor,
Q must wait for access to the monitor (reacquire the lock) when it wakes up - so Q should
re-check its condition in a while loop.
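A minimal Mesa-style monitor sketch (C++; `SodaMachine`, the 10-slot capacity, and the two C.V. names are illustrative assumptions): `std::mutex` is the lock, and the `while` re-check is exactly the Mesa-style rule above.

```cpp
#include <condition_variable>
#include <mutex>

class SodaMachine {
    std::mutex lock;                            // 1. lock for mutual exclusion
    std::condition_variable notFull, notEmpty;  // 2. condition variables
    int sodas = 0;                              // 3. monitor variable
    static constexpr int kSlots = 10;

public:
    void AddSoda() {                            // SodaGal
        std::unique_lock<std::mutex> m(lock);   // enter the monitor
        while (sodas == kSlots)                 // Mesa-style: re-check after waking
            notFull.wait(m);                    // Wait: sleep & release the lock
        ++sodas;                                // add 1 soda
        notEmpty.notify_one();                  // Signal: wake 1 waiting student
    }                                           // exit the monitor (lock released)

    void TakeSoda() {                           // Student
        std::unique_lock<std::mutex> m(lock);
        while (sodas == 0)
            notEmpty.wait(m);
        --sodas;                                // take 1 soda
        notFull.notify_one();
    }
};
```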
How to design multi-threaded code:
What lock?
What monitor variables?
What condition variable?
Methodology for writing multi-threaded interactions:
'write' one interaction at a time - consider only 2 entities:
1. Analysis
lock
1 or more CV
monitor variables
2. Design: end up with pseudocode; ignore any other entities
3. Write code for both entities
4. Compile the code
5. Test the code you just wrote
Semaphores:
can handle all of the problems monitors can handle
2 Operations:
Wait / Down / P
Signal / Up / V
State:
an integer
0 -> any positive value
Have a wait queue
Up:
Increment semaphore value by 1
Wakeup 1 waiting thread
Down:
Decrements value by 1 - if not 0
If value is 0, the thread goes to sleep
Semaphore value is private
1. Mutual Exclusion:
Binary semaphore - 0 or 1
Initialized to 1
to enter a C.R. - Down()
to exit a C.R. - Up()
2. Synchronization with Semaphore
Can start with any valid value
Usage can be an Up, or Down to start with
Application decision
E.g.
Soda and Student
Needs 3 semaphores:
mutex - lock - binary semaphore
empty - #empty slots - start @ 10
full - #full slots - start @ 0
SodaGal
empty.Down() // reserve an empty slot for me
mutex.Down()
// add 1 soda
mutex.Up()
full.Up()
Student
full.Down() // reserve a full slot (a soda) for me
mutex.Down()
// take 1 soda
mutex.Up()
empty.Up()
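A runnable version of the same example using C++20 semaphores (the loop counts and the `sodas` counter are assumptions added so the sketch is self-contained):

```cpp
#include <iostream>
#include <semaphore>
#include <thread>

std::binary_semaphore        mutex{1};     // lock
std::counting_semaphore<10>  empty{10};    // # empty slots, start @ 10
std::counting_semaphore<10>  full{0};      // # full slots,  start @ 0
int sodas = 0;

void SodaGal() {
    for (int n = 0; n < 20; ++n) {
        empty.acquire();       // Down: reserve an empty slot for me
        mutex.acquire();
        ++sodas;               // add 1 soda
        mutex.release();
        full.release();        // Up: 1 more full slot
    }
}

void Student() {
    for (int n = 0; n < 20; ++n) {
        full.acquire();        // Down: reserve a full slot (a soda) for me
        mutex.acquire();
        --sodas;               // take 1 soda
        mutex.release();
        empty.release();       // Up: 1 more empty slot
    }
}

int main() {
    std::thread a(SodaGal), b(Student);
    a.join(); b.join();
    std::cout << sodas << '\n';   // always 0, and sodas never exceeds 10
}
```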
System call & CPU schedule
System calls
mechanism by which the OS allows user programs to perform kernel tasks
Fork - create a new thread
Exec - create a new single threaded process
Exit - when a thread is done executing
Synchronization
Locks: Acquire, Release, Create Lock, Destroy Lock
CVs: Wait, Signal, Broadcast, CreateCV, DestroyCV
miscellaneous:
Yield
The purpose of a system call:
'Safe' mechanism by which user programs are able to access kernel-protected resources
Lock class & condition are kernel resources
Lock & Condition system calls are just interfaces to Lock & Condition kernel objects
Locks are specific to a process
Format of sys calls from user program viewpoint:
1. What data does the OS need to carry out the request?
2. Is there a return value to the user program?
must return a handle to the user program - a resource identifier
OS Help: only destroy a resource when it is not in use!
remember that a destroy system call has been requested!
need a status variable indicating that the lock is still in use!
CPU Scheduling:
Task: which thread gets the CPU & when it gets it
Policy: Rules
Mechanism: how we implement the rules
5 possible goals for scheduler:
Fairness: every job gets the same amount of CPU time
Efficiency: keep as many of the computer's resources as busy as possible
Response Time: minimize interactive response time
Turnaround Time: Give CPU preference (or not) to background tasks
Throughput: Complete as many jobs as possible in some amount of time
Issue: Do jobs run to completion or not (non-preemptive or preemptive)
How long is a time slice?
Too long: interactive use will suffer
Too short: too many unnecessary context switches
Scheduling Policies:
First-Come First-Served
Similar to 'standing in line'
Non-preemptive
The job that has waited the longest gets the CPU next
+ Fair (No starvation)
+ Simple
- Short jobs can get 'stuck' behind long jobs
Round Robin:
preemptive FCFS
Jobs run until their time slice has expired, they are blocked on a 'slow operation',
or they voluntarily give up the CPU
+ Fair
+ Short jobs don't get stuck behind long jobs
- Unnecessary context switches when all jobs are about the same length (not short)
Shortest Job First:
Maximize throughput
Run the job requiring the least CPU time to complete
Non-preemptive
- Not fair (starvation)
- Not implementable (requires knowing each job's CPU time in advance)
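E.g. (a classic worked example; burst times assumed): jobs A = 24, B = 3, C = 3 (ms), all arriving together
FCFS order A, B, C: waits are 0, 24, 27 -> average wait 17 ms
SJF order B, C, A: waits are 0, 3, 6 -> average wait 3 ms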
Shortest Remaining Time to Complete First
Preemptive
The job 'we' predict will use the smallest time slice percentage - based on its recent history
- Can still have starvation
- The scheduler must track & compute the time slice usage
Priority -based Scheduling:
Priority: Some jobs are favored over other jobs
Policy: 'Higher' priority jobs run before lower priority jobs
Issue: Preemptive or not?
New Issue: Do priorities change over time?
static priorities: priorities are fixed
- starvation
+ easier on OS
dynamic priority: priority can change
up or down
wait time
CPU time
+ NO starvation
- Extra overhead, during context switch, in recomputing priorities
Final issue: How many priorities are needed?
somewhere around 4 priorities are typical
Most effective Ready Queue organization is to have a separate Ready Queue for each priority;
This allows for different scheduling policies for each priority
The time slice interval can be different for each priority queue
Deadlock:
Occurs because of competition for resources:
hardware
files
synchronization primitives
E.g. 1 (threads A and B):
A: lock1->Acquire()        B: lock1->Acquire()
   lock2->Acquire()           lock2->Acquire()
   cv2->Wait(lock2)           cv2->Signal(lock2)
A waits on cv2 while still holding lock1, so B can never acquire lock1 to signal - deadlock.
E.g. 2 (opposite lock ordering):
A: lock1->Acquire()        B: lock2->Acquire()
   lock2->Acquire()           lock1->Acquire()
Definition:
Two, or more, jobs each waiting on an event that can only be produced by one of the waiting jobs.
Using resource sequence:
1. user program requests resource access from the OS
2. Once access is granted, the user's job uses the resource
3. When user program is done with the resource, it is given back to the OS
What if a requested resource is not available?
1. Fail the request
Application has the responsibility to handle failed requests
+ No deadlock possible
2. Queue the request
OS has responsibility to manage queued requests
- Deadlock is possible
4 conditions for deadlock:
1. Mutual exclusion:
Some resources cannot be shared
2. Hold & Wait
Jobs can own resources & request other resources, without giving up what they already own
3. No preemption
Once a resource is given to a job, it cannot be 'safely' taken back
4. Circular wait
X, Y resources
A, B jobs
A owns X and asking for Y
B owns Y and asking for X
Deadlock solution strategies:
1. Do nothing
2. Detection & Recovery
have a way to detect deadlock has occurred
kill a job
3. Dynamic Avoidance
We try to keep deadlock from happening by carefully allocating resources
4. Prevention
Eliminate 1 of the 4 required conditions for deadlock (e.g. circular wait - see the sketch below)
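A minimal sketch of prevention by attacking circular wait (C++; `jobA`/`jobB` are assumed names). Every job takes the locks in one agreed order; `std::scoped_lock` additionally uses a deadlock-avoiding algorithm when given several mutexes, so E.g. 2 above cannot happen:

```cpp
#include <mutex>

std::mutex lock1, lock2;

void jobA() {
    std::scoped_lock both(lock1, lock2);   // one global lock order; never
    /* ... use both shared resources ... */ // hold one & wait on the other
}

void jobB() {
    std::scoped_lock both(lock1, lock2);   // same order everywhere
    /* ... */                               // -> no circular wait
}
```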
Memory Management
Virtual address space (part of our security model)
First executable instruction is at virtual address 0
Code, Data, Heap, Stack
Code and Data built by compiler
Heap and Stack assigned at run time start-up
How to load the executable contents into memory?
Memory is 'divided' into equal size pieces - pages
The OS allocates memory 1 page at a time
The OS must now track the used/ unused pages of memory (use Bitmap)
For each process, we need to track where each virtual address space page is loaded into memory
Page table:
A mapping of virtual page locations (the starting virtual address of each page) to the physical page
locations (where they were copied into memory); it is an array indexed by V.P.N.
One stack per thread
Physical Memory Management
Most Basic:
Only 1 process, with only 1 kernel thread, in memory at a time
No memory sharing between user programs -> no security
Fixed Partitions:
Memory is divided into a set of partitions of various sizes
Load a program into a partition
No sharing of partitions
+ Multiprogramming is possible
+ The partitions are the security
- Can only run a program up to the size of the largest partition
- Internal memory fragmentation
Unused memory allocated to a process
Dynamic Partitions (Base & Bounds)
A partition is created at process initialization
+ A process can be loaded anywhere in memory
+ Can run one job almost the size of physical memory
- Need security between user programs
Need 2 new registers
Base: starting memory address
Bounds: partition size
- Problem: what if the new process needs more memory than any single piece of available memory?
Wait, and later defragment (compact) memory
One more disadvantage shared by all 3 methods so far: memory allocated to a process must be contiguous.
Address Translation:
Validating memory references & converting virtual addresses to real (physical) addresses
Done in hardware, by the MMU
For Base & Bounds:
1. validation:
Compare needed virtual address to the bounds reg.
V.A >=0 && V.A. < Bounds Reg
2. Getting Physical Address
P.A. = Base Reg + V.A.
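E.g. (register values assumed): Base = 0x4000, Bounds = 0x1000
V.A. 0x0200 -> valid (0 <= 0x0200 < 0x1000), P.A. = 0x4000 + 0x0200 = 0x4200
V.A. 0x1400 -> invalid (0x1400 >= Bounds) -> address error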
Segmentation:
Divide the address space into segments - these are smaller than whole address space
Each segment is contiguous in memory
Any segment can be loaded into an unused block of memory that is large enough
One set of Base & Bounds registers is not enough
Too expensive to have lots of pairs of registers
We will store these B&B values in a table - segment table
New + Memory sharing can now occur between processes
entire segment
New + Virtual memory is now possible
allow user program to ignore physical memory limitations
- Memory allocation is still a problem
- External fragmentation exists
How to do Address Translation:
MMU process:
0. Receive V.A
1. Validate V.A
split the V.A into 2 parts
Segment number (index position in segment table; array)
Address offset in this segment
2. If valid, perform the translation
E.g.
16 bit V.A.
5 bits Segment #, 11 bits V.A. offset
Segment Table (valid, base, bounds)
Validation Process:
Does the segment # exist in the segment table? (is there such an index in the segment table)
Is this segment already in memory? (valid bit is true)
Is the address offset contained within the segment? (compare offset with size)
Compute physical address:
P.A. = address offset + segment start (Base)
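E.g. (table values assumed): V.A. 0x1A6C -> segment # = 0x1A6C >> 11 = 3, offset = 0x1A6C & 0x7FF = 0x26C
If segment 3 is valid with Base = 0x5000 and size 0x400: offset 0x26C < 0x400 is valid,
so P.A. = 0x5000 + 0x26C = 0x526C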
Paging:
Allocate memory in fixed size blocks - all the same size ->Pages
Logically divide the address space into virtual pages - the same size as physical memory pages
+ Any virtual page can be loaded into any available page of physical memory
The OS tracks all virtual page locations, for each address space, in physical memory
use a page table for this
Address Translation:
Similar to segmentation
split the V.A. into 2 parts
virtual page #
address offset within a page
Paging has a new problem:
Page Table is an array (it must be physically contiguous in memory)
indexed by V.P. #
managed by the kernel
accessed with physical addresses (no address translation on the table itself)
E.g. a 32-bit OS -> 2^32 bytes available to a user program -> 4 billion bytes; with 4 KB pages, that is 2^20 (~1 million) page table entries per process
Multi-level Translation:
The lowest-level table, which points to the physical page where the virtual page is located,
is always a page table.
E.g.
2-level paging
Divide the V.A. into 3 parts
32bit address space (10 bit top-level index, 10 bit 2nd-level index,12 bits address offset)
2^20 virtual pages, 1025 total tables (1 top-level + 1024 2nd-level)
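A small sketch of the V.A. split for this 2-level layout (C++; the example address is assumed):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t va  = 0x00403804;          // example virtual address (assumed)
    uint32_t top = (va >> 22) & 0x3FF;  // 10-bit index into top-level table
    uint32_t mid = (va >> 12) & 0x3FF;  // 10-bit index into 2nd-level table
    uint32_t off =  va        & 0xFFF;  // 12-bit offset within the page
    std::printf("top=%u mid=%u offset=0x%X\n", top, mid, off); // top=1 mid=3 offset=0x804
    // P.A. = (physical page # from 2nd-level entry [top][mid]) * 4096 + off
}
```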
In real OSes:
32 bits: 3 or 4 translation levels
64 bits: 4 translation levels
But we have a new problem:
MMU:
1. V.A. received & validated
2. MMU accesses memory for 1st-level P.T.
3. ... 2nd-level P.T.
4. ... 3rd-level P.T.
5. ... 4th-level P.T.
6. ... Get P.A.
5 memory accesses for 1 user program address
Translation Lookaside Buffer (TLB):
Cache of page table entries
Maps virtual pages to physical pages - like a page table
The V.P.N is the Key
Contains recently accessed V.P.s & their physical page location
MMU process:
1. Receive & validate V.A.
2. Compute the V.P.N
3. Use the V.P.N to look up in the TLB
If found, directly compute the physical address - done
4. If not found in TLB, go to the translation tables
Add this entry to the TLB
Issue: what happens on a context switch?
Invalidate the TLB!!
Issue: Real OSes, when they get busy, don't have enough memory
The OS must manage memory - it can be full
-> Global page replacement
The OS can remove a page from any process
Inverted Page Table (IPT):
Goal: check whether a V.P. is in memory!
kernel needs some data structure that maps physical memory - independent of user programs
maps physical pages to virtual pages
only 1 for OS
always in memory
1 entry for each physical memory page
managed by kernel
'Indexed' by physical page #
In the real OSes, the IPT is a hash table
key: VPN & process ID
lookups are fast
IPT contains @ least:
physical page #
virtual page #
process id
valid bit
dirty bit - when true, physical page has been modified
MMU process:
1. receive & validate V.A.
2. check TLB
if found, compute P.A. - done
3. If not in TLB, look in IPT,
if found, compute P.A., update TLB - done
4. (Page fault) If not in IPT, go to the translation table (needed V.P. is not in memory)
a. load needed page into an available page of memory
b. update IPT, TLB
c. update translation table
d. restart the user instruction
Demand Paged Virtual Memory:
Don't preload any virtual pages into memory - wait for a page fault
What if physical memory is full on a page fault?
must evict a page of memory
What if the selected page is dirty?
we must save this dirty page to disk
MMU process:
1. receive & validate V.A.
2. check TLB
3. If not in TLB, check IPT
4. If not in IPT, check translation tables
a. to look in our T.T., we use the V.P.N to find the appropriate entry
b. read that entry to find the disk location
c. copy this page (if on disk) to a physical memory page
if none available, evict one
for the selected physical page:
is it dirty?
Yes, save it to disk
update translation table // record where it is stored (none, swap file, executable)
d. now we can load in the needed virtual page - into the newly available physical memory page
e. update the translation table for the newly loaded page of the current process
f. update IPT & TLB for the newly loaded page
g. the evicted page, if it is in my process, might be present in the TLB
If so, invalidate that TLB entry
h. restart the user instruction
Difference between the translation table and the page table??
Page Replacement Policies:
'global' choices:
1. Prefer a non-dirty page over a dirty page (no disk write needed)
2. If the page is dirty ->
maintain a cache of memory pages not allocated to user programs (safety net)
instead of waiting for the dirty page to be written to the swap file, use a 'cache' page
Specific Policies:
1. Random
+ simple
- not based on usage
2. FIFO
select the page that has been in memory the longest
- not based on usage
3. Optimal:
replace the page that will not be needed for the longest time
- not implementable
+ used for comparisons with implementable policies
4. Aging algorithm:
replace the 'oldest' page
we have a way to measure age (length of time since a page was accessed)
Clock - 1 bit
Second Chance - 2 bit
Several algorithms - 4 or 8 bits
Least Recently Used (LRU)
typically 32 bits
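A minimal sketch of the 1-bit Clock algorithm (C++; `Frame` and the function name are assumptions): the 'hand' sweeps physical pages, clearing referenced bits, and evicts the first page whose bit is already 0:

```cpp
#include <vector>

struct Frame { int vpn; bool referenced; };

// Returns the index of the frame to evict, advancing the clock hand.
int clockEvict(std::vector<Frame>& frames, int& hand) {
    for (;;) {
        Frame& f = frames[hand];
        if (!f.referenced) {                    // not used recently: evict
            int victim = hand;
            hand = (hand + 1) % static_cast<int>(frames.size());
            return victim;
        }
        f.referenced = false;                   // give it a second chance
        hand = (hand + 1) % static_cast<int>(frames.size());
    }
}
```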
Protection Systems:
The OS has the responsibility for the protection of computer resources
General characteristics:
An OS has resources (objects)
These resources are managed by the OS for user programs
Each resource has a unique identifier
Each resource has a set of valid operations that can be performed on it
Protection systems have 3 main components:
object/ resource to be protected
set of allowed operations
processes (user programs)
Domain Matrix Implementation:
A 2D array
Domain: A set of pairs - object/ rights (authorized operations)
Rules: A user program can only be in one domain at a time
A user program can change their domain
A 'typical' domain is the user account
Each object/rights pair specifies:
the object being protected
the subset of allowable operations the domain can perform
In our matrix:
a row represents one domain
a column represents one object
a cell contains the rights for a single domain to a single object
This is a kernel data structure
disadvantage: sparse
Do not store empty cells - only store rights that exist
No entry means no access
1. by row (domain) -> protection domain
2. by column (object) -> access control list
Protection domain:
for each domain, we have a set of pairs (object, rights)
ACL:
for each object, we have a set of pairs (domain, rights)
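A minimal sketch of the two sparse representations (C++; the names and string keys are assumptions). The same matrix cell appears once per view, and an absent entry means no access:

```cpp
#include <map>
#include <set>
#include <string>

using Rights = std::set<std::string>;   // e.g. {"read", "write"}

// Protection domain view (by row): domain -> (object -> rights)
std::map<std::string, std::map<std::string, Rights>> domains;

// ACL view (by column): object -> (domain -> rights)
std::map<std::string, std::map<std::string, Rights>> acls;

int main() {
    domains["alice"]["grades.txt"] = {"read"};   // row view
    acls["grades.txt"]["alice"]    = {"read"};   // column view
    // Empty cells are simply absent: no entry means no access.
}
```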
Which method to choose for an OS?
Protection Domain:
+ users can see all files they can access
+ a process has only one active domain at a time, so we can store a reference to that protection domain in the process
ACL:
+ can show all domains which can access a single object
Protection data is privileged (kernel access)
no direct user access
Issue: at any instant, the protection data determines what a domain can do (mechanism)
It does not control what a domain is authorized to do (policy)
Protection data may not match the protection policy
Protection systems only validate against the protection data, NOT the protection policy
Networks
Remote Procedure Calls
Based on the concept of a local procedure call
jump to another section of code
execute it
return to calling address & continue - may return a value
The difference with RPCs - the jump is to code on another computer
How to implement RPCs?
1. Require user programs to explicitly issue Send/Receive syscalls
+ Easy OS implementation
- Requires the application to 'know' the server location & possibly the protocol (limits flexibility)
2. Hide the fact that work is being done remotely from user programs
No networking code in the user program
Send/Receive work done by a 'middle layer', or by OS kernel
How to do this?
use 'stub' programs to do the networking
client stub (issues requests)
server stub (receives requests)
Where do stubs come from?
Idea: Client stub & server stub must understand each other
Their primary task is to create, parse, send & receive msgs between each other
Have a program to generate the stubs => stub generator
web services -> Axis wsdl2java
CORBA - IDL idl2java
Specification Data:
Request type
Request version (optional)
Order of data in msg
Format & type of each data item
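A sketch of what a stub generator might emit for one request msg (C++; the `AddRequest` layout and field names are assumptions that mirror the specification data above):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct AddRequest {            // request type & version come first
    uint16_t type    = 1;      // request type ("Add", assumed)
    uint16_t version = 1;      // request version (optional)
    int32_t  a = 0, b = 0;     // order, format & type of each data item
};

// Client stub: marshal the request into a flat msg to Send.
// (A real stub would also fix the byte order of each field.)
std::vector<std::byte> marshal(const AddRequest& r) {
    std::vector<std::byte> msg(sizeof r);
    std::memcpy(msg.data(), &r, sizeof r);
    return msg;
}
```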
How does the client stub 'know' where the server stub is located?
static binding -> hard code the network location of the server stub in the client stubs - not flexible
dynamic binding ->
client stub obtains the network location of server stub @ request time
need a new program: binder
track location of server stubs
respond to client stub requests
server stubs register with binders
Deregistration msg
binders can 'poll' server stubs
load balancing
authorization
advantage:
flexible
achieved transparency to the user apps
disadvantage
complexity
Distributed Mutual Exclusion:
1. Centralized
Have 1 mutual exclusion server
It handles all requests to enter C.R.s
A request message is sent from the client to the server
If no client is in that particular CR, the server replies with an OK msg
If a client is in that CR, the server queues the request
upon exiting a C.R., a client sends a Release msg to the server
If a client is waiting for that C.R., the server sends 1 OK msg to a waiting client
For this to work, C.R.s need unique identifiers
+ correct
+ fair
- does not scale well
- single point of failure - server
2. Fully distributed
no centralization
no central server
all clients work together
no centralized decision making
clients make group decisions
Requirements:
1. Reliable communication
no lost messages
2. Globally unique identifier for each group member
3. Total ordering of events
An event, in a distributed system, is a Send/Receive
All group members agree on the order of Sends
requires that messages from a single group member are delivered in timestamp order
How to ensure 'total ordering'?
A group member does not process a received request msg until it 'knows' it cannot receive a request msg from
ANY group member with an earlier timestamp
New requirement:
Messages from a single group member are received in timestamp order & no 2 msgs have the same timestamp
Solution:
Each group member maintains a 'last timestamp received' table
Process for Total Event Ordering:
1. Receive a msg
2. Extract timestamp & member ID
3. Update last timestamp, in my table, for that member
4. Extract the earliest timestamp value from my table
5. Process any msg, in timestamp order, with a timestamp <= value from step 4
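A minimal sketch of the table and hold-back rule (C++; `Msg` and the container choices are assumptions; it presumes `lastSeen` already has an entry for every group member):

```cpp
#include <algorithm>
#include <map>
#include <queue>
#include <vector>

struct Msg { long timestamp; int memberId; /* payload ... */ };
struct Later { bool operator()(const Msg& a, const Msg& b) const {
    return a.timestamp > b.timestamp; } };

std::map<int, long> lastSeen;  // member ID -> last timestamp received
std::priority_queue<Msg, std::vector<Msg>, Later> held;  // not yet processable

// Steps 1-5: returns the msgs now safe to process, in timestamp order.
std::vector<Msg> onReceive(const Msg& m) {
    lastSeen[m.memberId] = m.timestamp;          // steps 2 & 3
    held.push(m);
    long earliest = lastSeen.begin()->second;    // step 4: earliest in table
    for (const auto& [id, ts] : lastSeen) earliest = std::min(earliest, ts);
    std::vector<Msg> ready;                      // step 5
    while (!held.empty() && held.top().timestamp <= earliest) {
        ready.push_back(held.top());
        held.pop();
    }
    return ready;
}
```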
Mutual Exclusion Message Processing Algorithm:
1. A group member sends a request msg to all group members
msg contains: identifier of C.R., timestamp, ID of requesting member
2. A member processing a request msg can be in 1 of 3 states:
a. Member is not in that C.R. & has no pending request for that C.R.
b. Member is in that C.R.
c. Member has a pending request for that C.R.
3. How/when to respond?
a. Send OK msg to requester
b. Queue the request, when I exit the C.R., send OK msg to all queued requests for that C.R.
c. Compare timestamps (& global IDs if timestamps are the same)
if my timestamp is earlier, act as in b (queue the request)
if my timestamp is later, act as in a (send OK)
C.R. Entry Rule: A member can enter a C.R. when OK msgs have been received from all other members
+ Correct
+ Scales somewhat (O(n^2))
- N points of failure
Variation #1 - Always send a reply to a request
Member will send an OK, or a NOT OK msg upon receiving a request
+ Requesting members can use the lack of a response to start a process to determine if a member is dead
- Still have a problem in 2 cases
A member dies after sending a NOT OK, before exiting the C.R.
A member in the C.R. dies
Variation #2 - Majority Rules
A member can enter a C.R. when N/2 + 1 OK msg are received
Members only send 1 OK msg when they receive a Release msg (someone exits the C.R.)
+ N/2 - 1 members can die without affecting the algorithm
- Single point of failure - member in the C.R. dies (no sending of release msg)
Another approach to distributed M.E.: Token Ring algorithm
For each C.R. there is a Token msg
it contains the C.R. identifier
To enter a C.R. the member must have the token msg
the member keeps the token until they exit the C.R.
Group organization is a 'logical' ring
Members know who is in the group & the order of members
If a member does not want to enter a C.R., when receiving that token msg they forward the msg to their neighbor
+ Correct
- Token msgs consume network resources even with no C.R. activity
When can the Token Ring algorithm fail?
Failure model: Token msg lost
where this can happen:
network failure -> reliable messaging
my neighbor is dead -> have neighbor ack; if no ack, send to next neighbor
a member can die holding token
critical failure mode
-> one solution for distributed algorithms is an Election Algorithm
Every algorithm has a 'critical failure' point
whole algorithm stops
In some applications, it may be necessary to have the ability to restart the algorithm
-> election algorithms are one way to do this
To be able to do this:
1. Reliable method to identify the critical failure has occurred
2. A reliable method to select a single group member to 'restart' the algorithm
3. These methods must work even when multiple members are doing items 1&2 at the same time
Election algorithm:
'Elect' a new critical member
1. Determine that the critical member has died
2. Select a new critical member AND the entire group agrees
Two Implementations:
Bully:
A member does step 1
They initiate an election:
Send an election msg to all members with a higher group member ID
Upon receipt of an Election msg, they respond to the sender, with OK
Eventually, one group member will get no response -> they win
The winning member sends a msg to all other members to let them know who is the winner
+ Correct, we can pick a new critical member
- lots of msgs O(N^2)
+ Can use timestamps to ensure the results of 1 election are 'official'
Ring algorithm:
Like Token Ring, all members:
knows all other members
know the order of members
The initiating member sends an election msg to their neighbor
include our group ID
When an election msg is received, append your ID & forward
When msg returns to initiator, change msg to 'CompletedElection' & send to its neighbor
All group members record the new group data
When completedElection msg returns to the initiator, they kill it
Member with highest/lowest ID wins
+ Correct
- O(N) msgs sent
Failure Modes:
a member dies after the election msg passes & before the completedElection msg.
If Elections are timestamped, any 'failure' in an Election can be solved by initiating a new Election