POSIX, Part 1: Error handling - angrave/SystemProgramming GitHub Wiki
What is POSIX error handling?
In other languages, you may see error handling implemented with exceptions. Although you technically can emulate them in C (you keep a stack of every try/catch block, use setjmp to mark each block and longjmp to jump back to it), error handling in C is typically done with POSIX error handling. The code typically looks like this:
int ret = some_system_call();
if (ret == ERROR_CODE) {
    switch (errno) {
    // Do different stuff based on the errno number.
    }
}
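As a concrete instance of the pattern above, here is a minimal sketch built around open. The helper name open_readonly and its convention of returning the negated errno value are assumptions made for illustration, not part of POSIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only.
 * Returns a file descriptor on success, or the negated errno value on failure. */
int open_readonly(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno; /* copy before making any further library calls */
        switch (saved) {
        case ENOENT:
            fprintf(stderr, "%s: no such file\n", path);
            break;
        case EACCES:
            fprintf(stderr, "%s: permission denied\n", path);
            break;
        default:
            fprintf(stderr, "%s: open failed (errno=%d)\n", path, saved);
            break;
        }
        return -saved;
    }
    return fd;
}
```

A caller can then compare the result against a specific error, e.g. `open_readonly(path) == -ENOENT`.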
In the kernel, goto is used heavily to clean up different parts of the application. You should not use gotos because they make code harder to read. The gotos in the kernel are there out of necessity, so don't treat them as a model for your own code.
What is errno and when is it set?
POSIX defines a special integer errno that is set when a system call fails.
The initial value of errno is zero (i.e. no error).
When a system call fails it will typically return -1 to indicate an error and set errno.
What about multiple threads?
Each thread has its own copy of errno. This is very useful; otherwise an error in one thread would interfere with the error status of another thread.
When is errno reset to zero?
It's not unless you specifically reset it to zero! When system calls are successful they do not reset the value of errno.
This means you should only rely on the value of errno if you know a system call has failed (e.g. it returned -1).
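One place where resetting errno yourself is the usual idiom is a library function such as strtol, whose return value alone cannot signal a range error. The sketch below assumes a hypothetical helper named parse_long; it clears errno before the call so that a stale value from an earlier failure is not mistaken for a new one.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 long from text.
 * Returns 0 and stores the value in *out on success; returns -1 on failure. */
int parse_long(const char *text, long *out) {
    char *end = NULL;
    errno = 0; /* clear any stale value left by an earlier failed call */
    long value = strtol(text, &end, 10);
    if (errno == ERANGE) {
        return -1; /* value does not fit in a long */
    }
    if (end == text || *end != '\0') {
        return -1; /* no digits at all, or trailing characters */
    }
    *out = value;
    return 0;
}
```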
What are the gotchas and best practices of using errno?
Be careful when complex error handling makes library calls or system calls that may change the value of errno. In practice it's safer to copy the value of errno into an int variable:
// Unsafe - the first fprintf may change the value of errno before we use it!
if (-1 == sem_wait(&s)) {
    fprintf(stderr, "An error occurred!");
    fprintf(stderr, "The error value is %d\n", errno);
}
// Better, copy the value before making more system and library calls
if (-1 == sem_wait(&s)) {
    int errno_saved = errno;
    fprintf(stderr, "An error occurred!");
    fprintf(stderr, "The error value is %d\n", errno_saved);
}
In a similar vein, if your signal handler makes any system or library calls, then it is good practice to save the original value of errno and restore the value before returning:
void handler(int signal) {
    int errno_saved = errno;
    // make system calls that might change errno
    errno = errno_saved;
}
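To see the save-and-restore pattern in action, here is a small sketch. The names noisy_handler, handler_ran and install_noisy_handler are made up for this example; the handler deliberately makes a failing call (close on an invalid descriptor) and then puts errno back.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

/* Handler that triggers a failing call, then restores errno. */
static void noisy_handler(int signo) {
    (void)signo;
    int errno_saved = errno;
    close(-1);           /* fails and sets errno to EBADF */
    handler_ran = 1;
    errno = errno_saved; /* interrupted code sees its original errno */
}

/* Hypothetical helper: installs noisy_handler for SIGUSR1.
 * Returns 0 on success, -1 on failure. */
int install_noisy_handler(void) {
    struct sigaction sa;
    sa.sa_handler = noisy_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

After installing the handler, setting errno to some value and calling raise(SIGUSR1) should leave errno unchanged even though the handler's close call failed.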
How can you print out the string message associated with a particular error number?
Use strerror to get a short (English) description of the error value:
char *mesg = strerror(errno);
fprintf(stderr, "An error occurred (errno=%d): %s", errno, mesg);
How are perror and strerror related?
In previous pages we've used perror to print out the error to standard error. Using strerror, we can now write a simple implementation of perror:
void perror(const char *what) {
    fprintf(stderr, "%s: %s\n", what, strerror(errno));
}
What are the gotchas of using strerror?
Unfortunately strerror is not threadsafe. In other words, two threads cannot call it at the same time!
There are two workarounds. Firstly, we can use a mutex lock to define a critical section and copy the message into a local buffer. The same mutex should be used by all threads in all places that call strerror:
int errno_saved = errno; // copy first: the calls below may change errno
pthread_mutex_lock(&m);
char *result = strerror(errno_saved);
char *message = malloc(strlen(result) + 1);
strcpy(message, result);
pthread_mutex_unlock(&m);
fprintf(stderr, "An error occurred (errno=%d): %s\n", errno_saved, message);
free(message);
Alternatively, use the less portable but thread-safe strerror_r. perror is thread safe, which is why it is preferred in multi-threading environments if possible.
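Part of what makes strerror_r less portable is that two variants exist: the XSI one returns an int and fills the caller's buffer, while the GNU one (selected on glibc when _GNU_SOURCE is defined) returns a char pointer that may not point into the buffer. The sketch below, with a hypothetical helper called describe_error, tries to cope with both; treat the preprocessor check as an assumption about glibc rather than a complete portability solution.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the message for errnum into buf and returns buf. */
char *describe_error(int errnum, char *buf, size_t buflen) {
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: the returned pointer may be buf or a static string. */
    char *msg = strerror_r(errnum, buf, buflen);
    if (msg != buf) {
        snprintf(buf, buflen, "%s", msg);
    }
#else
    /* XSI variant: returns 0 on success and fills buf. */
    if (strerror_r(errnum, buf, buflen) != 0) {
        snprintf(buf, buflen, "Unknown error %d", errnum);
    }
#endif
    return buf;
}
```

Each thread passes its own buffer, so no mutex is needed.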
What is EINTR? What does it mean for sem_wait? read? write?
Some system calls can be interrupted when a signal (e.g. SIGCHLD, SIGPIPE, ...) is delivered to the process. At this point the system call may return without performing any action! For example, bytes may not have been read or written, and a semaphore wait may not have waited.
This interruption can be detected by checking the return value and whether errno is EINTR, in which case the system call should be retried. It's common to see the following kind of loop that wraps a system call (such as sem_wait).
while ((-1 == systemcall(...)) && (errno == EINTR)) { /* repeat! */}
Be careful to write == EINTR, not = EINTR.
Or, if the result value needs to be used later...
while ((-1 == (result = systemcall(...))) && (errno == EINTR)) { /* repeat! */}
On Linux, calling read and write on a local disk will normally not return with EINTR (instead the function is automatically restarted for you). However, calling read and write on a file descriptor that corresponds to a network stream can return with EINTR.
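Applied to read, the retry loop might be packaged as a small wrapper. This is only a sketch: read_retry is a made-up name, and it retries only the EINTR case, leaving short reads and every other error for the caller to handle.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: behaves like read(2) but retries when the call
 * is interrupted by a signal before transferring any data. */
ssize_t read_retry(int fd, void *buf, size_t count) {
    ssize_t result;
    do {
        result = read(fd, buf, count);
    } while (result == -1 && errno == EINTR);
    return result;
}
```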
Which system calls may be interrupted and need to be wrapped?
Use the man page! The man page includes a list of errors (i.e. errno values) that may be set by the system call. A rule of thumb is 'slow' (blocking) calls (e.g. writing to a socket) may be interrupted but fast non-blocking calls (e.g. pthread_mutex_lock) will not.
From the Linux signal(7) man page:
"If a signal handler is invoked while a system call or library function call is blocked, then either:
- the call is automatically restarted after the signal handler returns; or
- the call fails with the error EINTR.
Which of these two behaviors occurs depends on the interface and whether or not the signal handler was established using the SA_RESTART flag (see sigaction(2)). The details vary across UNIX systems; below, the details for Linux.
If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if the SA_RESTART flag was used; otherwise the call will fail with the error EINTR:
- read(2), readv(2), write(2), writev(2), and ioctl(2) calls on "slow" devices. A "slow" device is one where the I/O call may block for an indefinite time, for example, a terminal, pipe, or socket. (A disk is not a slow device according to this definition.) If an I/O call on a slow device has already transferred some data by the time it is interrupted by a signal handler, then the call will return a success status (normally, the number of bytes transferred). "
Note, it is easy to believe that setting the SA_RESTART flag is sufficient to make this whole problem disappear. Unfortunately that's not true: there are still system calls that may return early and set EINTR! See signal(7) for details.
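For reference, a handler is established with SA_RESTART through sigaction roughly as follows. The names on_sigchld and install_restarting_handler are placeholders for this sketch, and the empty handler body stands in for whatever async-signal-safe work a real program would do.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

static void on_sigchld(int signo) {
    (void)signo; /* placeholder: real code would do async-signal-safe work */
}

/* Hypothetical helper: installs a SIGCHLD handler with SA_RESTART set.
 * Returns 0 on success, -1 on failure (with errno set by sigaction). */
int install_restarting_handler(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Even with this flag set, keeping the EINTR retry loops around blocking calls is the safer choice, for the reason given above.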
Errno exceptions?
Some POSIX functions have their own error numbers and do not use errno. One example is getaddrinfo, which returns the error code as its result. The function that converts the returned error number to a string is gai_strerror.
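A minimal sketch of that convention, using a hypothetical helper named check_numeric_host. It sets AI_NUMERICHOST so that no DNS lookup happens; the point is only that the return value, not errno, carries the error.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: checks that host is a numeric IP address.
 * Returns 0 on success, or the non-zero getaddrinfo error code. */
int check_numeric_host(const char *host) {
    struct addrinfo hints;
    struct addrinfo *result = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST; /* parse only, no name resolution */

    int rc = getaddrinfo(host, NULL, &hints, &result);
    if (rc != 0) {
        /* rc is the error; errno is only meaningful if rc == EAI_SYSTEM */
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return rc;
    }
    freeaddrinfo(result);
    return 0;
}
```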