uthreads - brown-cs1690/handout GitHub Wiki
In this project you will develop and test your very own user-level threads package complete with thread creation/deletion/joining, mutexes, condition variables, and a priority-based scheduler. Having completed this assignment, you should be able to write multi-threaded applications using your package instead of, say, pthreads.
Download the stencil here.
If this is your first time installing a Github Classroom assignment, or you need a refresher, please refer to the Github and Gradescope guide. This will walk you through how to get setup with Git and install an assignment.
Once the assignment is installed, ensure you are working in a Linux environment with the proper libraries installed. Please see the Developing Locally Guide for more details.
Be sure to make commits frequently so you won’t lose your work! This project is due on Feb 5th at 11:59 PM ET.
Linux (and most other operating systems) have the notion of a "machine context" which contains the complete state (at least everything viewable from user-land) of the CPU at any given time. Usually this includes a pointer to the stack, register contents, and other bookkeeping information. The idea is that since contexts contain a complete description of the CPU state, you should be able to save the current state and exchange it with another state (which has either been previously saved or constructed by hand). This enables the operating system to suspend and resume execution of programs at will, and also enables you to write a user-land threads package. As a note, "user-land" refers to threads run in user-mode that are created by user applications (as opposed to a kernel thread, which is directly supported by the OS).
Let's try an example. Say you have threads TA, TB, and TC with
contexts CA, CB, and CC of which TA is currently running.
Say TA calls swap(
CA, CB)
, causing the current machine state
to be saved as CA and the thread TB to begin executing. TB then
does a swap(
CB, CC)
, causing TC to start running. Eventually
TC decides that for whatever reason it wants to allow thread TA
to run again. When it calls swap(
CC, CA)
, the machine state that
was previously saved as CA is restored, and TA starts running
again. From TA's perspective, the call to swap()
simply did nothing
except pause for some period of time.
It is important to note that context and threads are fundamentally different things. A context is merely a snapshot of the CPU as it is at a point in time/code. Threads are a kernel construct that contain a bunch of other information which is unimportant to the hardware (scheduling priority, thread state information, signal mask, etc.). Threads contain a context, not the other way around.
uthreads is a user-level threading package which has been loosely based on the familiar pthreads interface. It supports the creation of threads which can be joined with or detached, a simple priority based scheduler, mutexes and condition variables. You will be writing the majority of the uthreads code, but your generous TAs have provided you with some code for dealing with dead threads in addition to a wrapper around the Linux function for creating contexts. The uthreads functions which we give you that you might have to call yourself are:
// uthread.c
char *alloc_stack();
void free_stack(char*stack);
// uthread_ctx.c
void uthread_makecontext(uthread_ctx_t *ctx, char *stack,
int stacksz, void (*func)(),
long arg1, void *arg2);
as well as the functions in uthread_queue.h
and the macros in
list.h
.
The uthreads API functions which you must implement are:
// uthread.c
void uthread_start(uthread_func_t firstFunc, long argc, char *argv[]);
int uthread_create(uthread_id_t *uidp, uthread_func_t func,
long arg1, void *arg2, int prio);
void uthread_exit(void *status);
int uthread_join(uthread_id_t uid, void **return_value);
int uthread_detach(uthread_id_t uid);
uthread_id_t uthread_alloc();
void uthread_destroy(uthread_t *uth);
void reaper_init(void);
void make_reapable(uthread_t *uth);
// uthread_sched.c
void uthread_yield();
void uthread_wake(uthread_t *uthr);
int uthread_setprio(uthread_id_t id, int prio);
void uthread_switch(utqueue_t *q, int saveonrq);
void clock_interrupt(int sig);
// uthread_mtx.c
void uthread_mtx_lock(uthread_mtx_t *mtx);
int uthread_mtx_trylock(uthread_mtx_t *mtx);
void uthread_mtx_unlock(uthread_mtx_t *mtx);
// uthread_cond.c
void uthread_cond_wait(uthread_cond_t *cond,
uthread_mtx_t *mtx);
void uthread_cond_broadcast(uthread_cond_t *cond);
void uthread_cond_signal(uthread_cond_t *cond);
This may seem like a lot of work, but most of these functions are short. As a case-in-point, we are providing you with 800+ lines of stencil code and the TA implementation is less than 1000 lines total. So, while this assignment contains some concepts that you might not have been exposed to, it is mercifully short.
These functions are marked by NOT_YET_IMPLEMENTED
macros found in the
code. To see a list of all the functions you still need to implement,
type make nyi
from the command line, while in the same directory as
the Makefile we have included.
Each of the functions mentioned above has extensive comments in the source code which explain what is expected of you, but to save you some time, we will give you a brief summary of how the system works as a whole.
The first thing that any executable that uses your threads package
should call is uthread_start
. This should be called exactly once and
is responsible for setting up global data structures such as the
uthreads
array. It creates the first thread (and allocates its stack),
which then calls the function that was passed to uthread_start
.
Threads may be in one of a number of states:
-
UT_ON_CPU
, at most one thread is in this state and it is running -
UT_RUNNABLE
, it would like to be running, but is waiting its turn for the CPU -
UT_WAIT
, it is in a queue, waiting for something to happen, such as a mutex being unlocked -
UT_ZOMBIE
, it has terminated and is waiting to be reaped (explained soon)
After the uthreads have been initialized (via the call to uthread_start
), all actions are done in the context of the current thread, which is pointed to by ut_curthr
(among your duties is to make sure that this is true). If the current thread wishes to make some other thread the current thread, it calls uthread_switch
(in uthread_sched.c). Often, but not always, when the current thread calls uthread_switch
, it puts itself on a queue as part of the call. This queue may be a queue of threads waiting for something to happen (such as a mutex being unlocked) or a queue of threads waiting for their turn to run (this queue is called the runq
-- more on it later). Thus uthread_switch
has two arguments. The first, if not zero, is the address of a wait queue the thread should be put on as part of the call to uthread_switch
. The second, if non-zero, means that the current thread should be put on the runq
and the state of the thread should be modified to be UT_RUNNABLE
. It definitely shouldn't be the case that both arguments are non-zero.
The caller of uthread_switch
will set the state of the current thread to what it should be after it yields its current-thread status. For example, in uthread_mtx_lock
, if the mutex is already locked, the current thread sets its state to UT_WAIT
and then calls uthread_switch
, specifying in its first argument the address of the queue of threads waiting on this mutex. Thus uthread_switch
will put the thread on the proper queue (and it's already marked as being in the UT_WAIT
state) and switch to some other thread.
When the current thread wishes to wake up another, it calls uthread_wake
. Its argument is a pointer to the thread to be woken up. If that thread isn't in the UT_WAIT
state, it does nothing, but if it is (the usual case), that thread is removed from the wait queue it is in, its state is set to UT_RUNNABLE
, and it is moved to the runq
.
When we say that uthread_switch
gives the CPU to some other thread, we mean that it gives the CPU to the highest priority, longest waiting runnable thread. the runq
is actually an array of queues, one for each priority level -- threads with higher priority values run in preference to those with lower values. This priority value is initially set when the thread is created (it's an argument to uthread_create
), but can be changed via calls to uthread_setprio
. Keep in mind that if one thread sets the priority of another thread to a value higher than its own, then that other thread, if it's runnable, preempts the thread calling uthread_setprio
.
In general, the uthreads assignment has been designed to behave like a
system you are familiar with, pthreads. As such, it has functions to
create, detach, and join threads. As with pthreads, if an undetached
thread finishes executing, its cleanup should be deferred until a call
to uthread_join()
is made. Happily, since you are writing the scheduling mechanism, you know that a call to
uthread_cond_wait()
will not return randomly (as it might with
pthreads).
If you need a refresher on these subjects, feel free to consult the CS033 slides on pthreads.
![]() |
---|
Figure 1: Life cycle of a uthread |
A note on some of the assumptions that you may make when writing this assignment: uthreads will never have more than one thread running at any one time, but threads may be preempted (control can be randomly taken away from the thread that is running). Handling multiple CPUs (the ability to run more than one thread concurrently) is beyond the scope of this assignment. These simplifications allow you to forgo taking locks on global uthreads data structures if you do not call a function that might put the thread to sleep while modifying the data (and will not be preempted). This should greatly simplify writing all the code in this assignment.
In uthreads, you will use uthread_makecontext()
to create a machine
context for a thread. When you want to change which thread is currently
executing (i.e. in uthread_switch
), you will need to call
getcontext
to save the context of the thread that has just called uthread_switch
, and at some later point call setcontext
to resume the context of the thread that is now to run. You might think of getcontext
as saving its caller's registers, particularly including the stack pointer, and you might think of setcontext
as restoring the registers of some thread that has recently saved them.
Note that when a thread is resumed via a call to setcontext
, it resumes execution returning from its previous call to getcontext
. We suggest using a local variable, declared volatile int
, to help keep track of whether a thread has returned from getcontext
immediately after calling it, or is it returning from it because of the setcontext
. There is an example of this in the lecture slides.
Previous versions of uthreads only allowed for threads to be
de-scheduled if they explicitly yield
ed the processor. Most interfaces
for threads do not work this way1; instead the execution of various
processes are interleaved implicitly by the scheduler.
A common way of implementing this is by time slicing: each thread is granted a certain amount of time that it can run continuously. After this time is up, the thread is "preempted" (i.e. descheduled) and another thread allowed to run. This is typically implemented by scheduling a periodic timer interrupt, and yielding the processor in an interrupt handler.
You will use interval timers (see setitimer
) and SIGVTALRM
to
simulate a timer interrupt; your signal handler clock_interrupt
will simulate an
interrupt handler.
1: And when they do, they often aren't called threads; more often, they are called coroutines
Unfortunately, adding preemption to your code does not "just work." Some
routines (think about mutexes) must run to completion without yielding
the processor. Use the uthread_nopreempt_on/off
functions which have already been written for you to maintain
the correctness of key functions by disabling/enabling preemption.
In general, it is safest to disable preemption for long as necessary within
your functions (but don't forget to re-enable it before every return
!).
There may also be cases with nested preemption; we deal with this by letting
each thread keep track of its own no_preempt_count
.
As discussed in lecture, the reaper thread is responsible for cleaning
up the resources of unused threads. It is important to note that the
reaper does not clean up non-detached threads which have exited
but not yet been joined. In other words, either uthread_join
or uthread_detach
must be called in order for those threads to be properly cleaned up. We have given you a complete reaper as reaper
in
uthread.c
. You should look at it to understand what this means.
As a rule, one should use the standard error types defined in errno.h
(although it should not be necessary to include this file to use the
values) and as mentioned in the pthreads man
page. Following the POSIX
specification, return the proper error value from the function, or '0' if
no error occurs.
Currently, it is not possible to take any (already compiled) program and have it use uthreads instead of pthreads. In order to use uthreads, you will need to add all of uthreads's files to its project, modify it to use the uthreads API functions and recompile it.
As always, use of gdb
will make your life much easier when trying to
get this assignment up and running. However, gdb
can sometimes get
confused in multithreaded programs, and may have trouble printing stack
traces. If that happens, don't worry; it doesn't mean your code is
broken. Also, since uthreads is built as a library, gdb
won't find
the symbols in it right away, so tell it to wait for the "future shared
library to load". Note that gdb will not be able to differentiate between the
different uthreads. If you need a refresher on gdb, we have put together a short section that you can find here.
If you are using an M1-M4 Mac, you'll need to run gdb
a bit differently:
- To run
gdb
on thetest
program, usegdb-multiarch -ix test.gdb
. - Once you are in
gdb
, usert
instead ofrun
to run the program. - You can use
gdb
normally otherwise.
Also, there's an additional tool you can utilize that was created by a former CS1670 student, Alex Mazansky, which you can find here. For this assignment, the tool allows you to visualize the states of the different threads and other functionality that you may find helpful. Instructions for using the tool are on the repository in the README.
We can't stress enough how important test code is in an assignment like
this. Without proper test code, finding bugs will be next to impossible.
Make sure to test all sorts of situations with lots of threads at
different priority levels. The Makefile
included with the assignment
will compile an interesting test program which uses the uthreads functions,
just to get you started (run ./test
from the directory your uthreads
library is in). The expected output is to first get "hello" from each
thread 10 times. Next, each thread will exit and be joined.
If it runs and exits cleanly, most of your basic
functionality is working, but be sure to test other cases.
We also provide two other test files, premptive_test.c
and non_preemptive_test.c
(run ./preemptive_test <test number>
and ./non_preemptive_test <test number>
from the directory your uthreads library is in). These tests are more comprehensive, so we'd suggest using these to test your implementation beyond ./test
. Note that these tests might exit cleanly even if your implementation fails. Therefore, you should carefully check the output for errors after running the test.
In addition, You might find it helpful to write your own tests as you are writing code for this assignment. Feel free to add those tests to the test.c
file!
Here is the expected output from ./test
:
Judicious use of assert()
will help you both understand your threads
package and debug it. This is your first real systems-level coding
project, and it is highly recommended that you assert a general sane
state of the system whenever you enter a function. Thinking about what a
"sane state" means should lead to a greater understanding of what is
happening at any given time and what could go wrong.
To hand in your project please refer to the Github and Gradescope guide.
Remember to include a README that briefly describes any important design decisions you made and mentions known bugs! You can use the cs1670_reformat
script to improve the format of your code, but you will not be graded on style (./cs1670_reformat *.c *.h
). If you get the following error: -bash: ./cs1670_reformat: /bin/bash^M: bad interpreter: No such file or directory
, running dos2unix cs1670_reformat
should resolve it.
Your grade for this project will be composed of two main components: Functionality and Interactive Grading.
-
Functionality (70%): This portion of your grade is graded manually, with the help of results from the autograder. There are no hidden tests: we only run
test.c
,premptive_test.c
, andnon_preemptive_test.c
in the autograder. Note that you will not see point values assigned to the tests in Gradescope. We have not yet decided the point values for each test, but our suggestion is to focus on basic functionality (test.c
) before moving on to the more comprehensive tests. - Interactive Grading (30%): This component of your score focuses on your code and your understanding of the project conceptually. You will meet one-on-one with your mentor to discuss your project in detail. During this session, your mentor will ask you to navigate to certain parts of the codebase, explain your implementation, and articulate your rationale behind certain design choices. This is an opportunity for you to demonstrate your understanding of the project beyond the autograder component and receive feedback on your implementation.