socket - yszheda/wiki GitHub Wiki

select

poll


The Problem with select()

int select(int numfd, fd_set * readfds, fd_set * writefds,
           fd_set * exceptfds, struct timeval * tv);

The performance of select() depends on the values of the file descriptors being monitored, not just on how many there are. numfd must be one more than the highest descriptor in any of the sets, and the kernel scans every bit from 0 up to numfd - 1. Almost no other Linux system call has a cost that depends on descriptor values; the number of descriptors monitored would be a more natural characteristic for performance to depend on. As a result, select() performs very poorly once descriptor values get large, even when only a few descriptors are being watched.
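A minimal sketch of watching one descriptor with select(), assuming a POSIX system. select_readable() and pipe_select_demo() are hypothetical helper names. The point to notice is the first argument: it is the descriptor's value plus one, so the scan length follows the descriptor number rather than the count of descriptors watched.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait until fd is readable. Returns 1 if readable, 0 on timeout, -1 on error.
 * select() takes numfd = highest descriptor + 1, so even a single descriptor
 * numbered 900 makes the kernel examine bits 0 through 900. */
int select_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    /* FD_SET() on a descriptor >= FD_SETSIZE is undefined behavior. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    /* select() may modify both the set and the timeout, so rebuild them
     * before every call. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(fd + 1, &readfds, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Usage: an empty pipe is not readable; after one write it is. */
int pipe_select_demo(int *before, int *after)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;
    *before = select_readable(p[0], 0);
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    *after = select_readable(p[0], 100);
    close(p[0]);
    close(p[1]);
    return 0;
}
```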

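There is also a size limit: fd_set is a fixed-size bitmap of FD_SETSIZE bits (1024 with glibc on Linux), so select() cannot handle descriptors numbered FD_SETSIZE or higher. If fd_set were enlarged to cover, say, 100,000 descriptors, each fd_set would need 100,000 bits, roughly 12K, and select() takes up to three of them per call. Needless to say, copying and scanning fd_set structures of that size would have a noticeable performance impact.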
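The 12K figure is plain bitmap arithmetic: one bit per descriptor, rounded up to whole bytes. A small sketch, where fd_bitmap_bytes() and current_fd_set_bytes() are hypothetical helpers and 100,000 descriptors is an illustrative assumption:

```c
#include <limits.h>
#include <stddef.h>
#include <sys/select.h>

/* Bytes needed for a select()-style bitmap covering descriptors 0..nfds-1:
 * one bit per descriptor, rounded up to whole bytes. */
size_t fd_bitmap_bytes(size_t nfds)
{
    return (nfds + CHAR_BIT - 1) / CHAR_BIT;
}

/* Size of the fd_set actually compiled into this program. With glibc,
 * FD_SETSIZE is 1024, so this is 128 bytes. */
size_t current_fd_set_bytes(void)
{
    return sizeof(fd_set);
}
```

fd_bitmap_bytes(100000) is 12,500 bytes, while the stock glibc fd_set is only 128 bytes.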

int poll(struct pollfd *ufds, nfds_t nfds, int timeout);

struct pollfd {
    int fd;          /* file descriptor */
    short events;    /* requested events */
    short revents;   /* returned events */
};

The number of struct pollfd entries passed to poll() does affect performance. That is reasonable, because the work the system call does is proportional to the length of that array. The values of the individual file descriptors, however, have no effect. This is a major win for poll() over select().
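A minimal poll() sketch over two pipes, assuming a POSIX system (poll_two_pipes_demo() is a hypothetical name). The caller fills in fd and events; the kernel fills in revents. The work done is proportional to the two-element array, whatever descriptor numbers the pipes happen to get.

```c
#include <poll.h>
#include <unistd.h>

/* Poll two pipes at once and report which read ends are ready.
 * Returns a bitmask (bit 0 = first pipe readable, bit 1 = second),
 * or -1 on error. */
int poll_two_pipes_demo(void)
{
    int a[2], b[2];
    int mask = -1;

    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    /* Only the second pipe gets data. */
    if (write(b[1], "y", 1) == 1) {
        /* events is the input; revents starts at 0 and is set by the kernel. */
        struct pollfd fds[2] = {
            { .fd = a[0], .events = POLLIN },
            { .fd = b[0], .events = POLLIN },
        };
        if (poll(fds, 2, 100) >= 0) {
            mask = 0;
            if (fds[0].revents & POLLIN)
                mask |= 1;
            if (fds[1].revents & POLLIN)
                mask |= 2;
        }
    }

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return mask;
}
```

Only the second pipe has data, so the function returns 2.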

The other advantage of poll() is that the largest file descriptor it can handle is limited only by the range of an int (the type of the fd field), not by the size of a fixed structure such as fd_set. This lets poll() work with Linux kernels that allow hundreds of thousands of file descriptors per process, which is extremely useful in some cases.

Pipes

Linux treats writing to a pipe with no readers as a serious matter, since the data would be lost. Rather than let a process blithely ignore error codes from write() and go about its business, Linux sends the process a signal called SIGPIPE. Unless the process has specifically asked to catch or ignore SIGPIPE, the default action kills the process. The "Broken pipe" message comes from the shell, which is telling you that the process died because of a broken pipe. A process that catches or ignores SIGPIPE instead sees write() fail with errno set to EPIPE.
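A sketch of the non-fatal path, assuming a POSIX system. With SIGPIPE ignored via sigaction(), writing to a pipe whose read end is closed makes write() fail with EPIPE instead of terminating the process. broken_pipe_errno() is a hypothetical helper; it restores the previous SIGPIPE disposition before returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe whose read end is already closed.
 * Returns the errno observed (expected: EPIPE), 0 if write() unexpectedly
 * succeeded, or -1 if setup failed. */
int broken_pipe_errno(void)
{
    int p[2];
    struct sigaction sa, old;

    sa.sa_handler = SIG_IGN;            /* ignore SIGPIPE so we survive */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGPIPE, &sa, &old) != 0)
        return -1;

    if (pipe(p) != 0) {
        sigaction(SIGPIPE, &old, NULL);
        return -1;
    }
    close(p[0]);                        /* no readers left */

    int err = 0;
    if (write(p[1], "x", 1) < 0)
        err = errno;

    close(p[1]);
    sigaction(SIGPIPE, &old, NULL);     /* restore previous disposition */
    return err;
}
```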


Poll vs Select

  • poll() does not require the caller to compute the highest-numbered file descriptor + 1.
  • poll() is more efficient for large-valued file descriptors. Imagine watching a single file descriptor with the value 900 via select(): the kernel would have to check each bit of each passed-in set, up to the 900th bit.
  • select()’s file descriptor sets are statically sized (FD_SETSIZE bits).
  • With select(), the file descriptor sets are overwritten on return, so each subsequent call must reinitialize them. poll() separates the input (events field) from the output (revents field), allowing the array to be reused without change.
  • The timeout argument to select() is undefined on return (Linux overwrites it with the time not slept), so portable code must reinitialize it before each call. This is not an issue with pselect(), which does not modify its timeout.
  • select() is more portable, as some Unix systems do not support poll().
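To illustrate the reuse point, here is a sketch that drains a pipe with poll() while filling in the pollfd once and never touching events again. drain_with_poll() and drain_demo() are hypothetical names, and the one-second timeout is arbitrary.

```c
#include <poll.h>
#include <unistd.h>

/* Read fd to EOF one byte at a time, calling poll() before each read.
 * pfd is initialized once: poll() writes only revents, so the same struct
 * is valid on every iteration. With select(), the fd_set and numfd would
 * have to be rebuilt before each call. Returns bytes read, or -1. */
int drain_with_poll(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char c;
    int total = 0;

    for (;;) {
        if (poll(&pfd, 1, 1000) <= 0)
            return -1;                  /* error or timeout */
        /* revents may be POLLIN, POLLHUP, or both; read() tells us which. */
        ssize_t n = read(fd, &c, 1);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;               /* EOF: writer closed */
        total++;
    }
}

int drain_demo(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "abc", 3) != 3) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);                        /* EOF after the three bytes */
    int n = drain_with_poll(p[0]);
    close(p[0]);
    return n;
}
```

drain_demo() writes three bytes, closes the write end, and returns 3.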

Epoll vs Select/Poll

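  • File descriptors can be added to and removed from the interest list (epoll_ctl()) even while another thread is waiting in epoll_wait().
  • epoll_wait() returns only the descriptors that are ready, so the caller never scans the full set.
  • epoll scales better: select() and poll() do O(n) work per call in the number of watched descriptors, whereas the cost of epoll_wait() depends on the number of ready descriptors, not on how many are registered.
  • epoll can be level-triggered (the default) or edge-triggered (EPOLLET); see the epoll(7) man page.
  • epoll is Linux-specific, so code that relies on it is not portable.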
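A minimal level-triggered epoll sketch (Linux only): two pipes are registered, only one is written to, and epoll_wait() reports just that one. epoll_demo() is a hypothetical name, and error handling is kept deliberately simple.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register two pipes with epoll, make only the second readable, and return
 * the number of events epoll_wait() reports (-1 on error). *ready_fd gets
 * the first reported fd; *expected_fd gets the fd that was written to. */
int epoll_demo(int *ready_fd, int *expected_fd)
{
    int a[2] = {-1, -1}, b[2] = {-1, -1};
    int ep = -1, n = -1;
    struct epoll_event ev, out[4];

    if (pipe(a) != 0 || pipe(b) != 0)
        goto done;
    ep = epoll_create1(0);
    if (ep < 0)
        goto done;

    /* The interest list lives in the kernel; entries can be added or
     * removed later with EPOLL_CTL_ADD / EPOLL_CTL_DEL. */
    ev.events = EPOLLIN;                /* level-triggered by default */
    ev.data.fd = a[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, a[0], &ev) != 0)
        goto done;
    ev.data.fd = b[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, b[0], &ev) != 0)
        goto done;

    if (write(b[1], "z", 1) != 1)
        goto done;

    /* Only ready descriptors come back; nothing else to scan. */
    n = epoll_wait(ep, out, 4, 100);
    if (n > 0)
        *ready_fd = out[0].data.fd;
    *expected_fd = b[0];

done:
    if (ep >= 0)
        close(ep);
    for (int i = 0; i < 2; i++) {
        if (a[i] >= 0)
            close(a[i]);
        if (b[i] >= 0)
            close(b[i]);
    }
    return n;
}
```

Adding EPOLLET to ev.events would switch these registrations to edge-triggered mode, where the reader is expected to drain the descriptor (typically until read() on a non-blocking fd fails with EAGAIN) before waiting again.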

options

MSG_WAITALL
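MSG_WAITALL is a recv() flag: on a stream socket it asks the call to block until the full requested length has arrived rather than returning whatever is available. recv(2) notes it can still return less if a signal is caught, an error or disconnect occurs, or the next data is of a different type. A sketch over an AF_UNIX socket pair, where waitall_demo() is a hypothetical name. Because both sends happen before recv() in this single-threaded sketch, it shows the call shape rather than the blocking itself.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send 8 bytes in two 4-byte pieces, then read them with one
 * recv(..., MSG_WAITALL). Returns bytes received, or -1. */
ssize_t waitall_demo(char out[8])
{
    int sv[2];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], "abcd", 4, 0) == 4 && send(sv[0], "efgh", 4, 0) == 4) {
        /* Without MSG_WAITALL, recv() may return as soon as any data is
         * available; with it, recv() waits for all 8 bytes unless a
         * signal, error, or EOF cuts the wait short. */
        n = recv(sv[1], out, 8, MSG_WAITALL);
    }

    close(sv[0]);
    close(sv[1]);
    return n;
}
```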


Trouble-shooting

Transport endpoint is already connected
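"Transport endpoint is already connected" is the strerror() text for EISCONN. One way to hit it is calling connect() again on a TCP socket that is already connected. A sketch that reproduces it over loopback, assuming 127.0.0.1 is available; eisconn_demo() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno from a second connect() on a connected TCP socket
 * (expected: EISCONN), 0 if it unexpectedly succeeded, -1 on setup failure. */
int eisconn_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = -1, cfd = -1, result = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto done;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0)
        goto done;
    if (listen(lfd, 1) != 0)
        goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) != 0)
        goto done;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto done;
    /* First connect succeeds via the listen backlog; accept() is not needed. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0)
        goto done;

    /* Second connect on the same socket fails with EISCONN. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0)
        result = errno;
    else
        result = 0;

done:
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```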

Bad file descriptor
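"Bad file descriptor" is the strerror() text for EBADF: the descriptor passed to a call is not open, or not open for the requested operation. A sketch that triggers it by writing to an already-closed descriptor; ebadf_demo() is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Returns the errno from writing to a closed descriptor (expected: EBADF),
 * 0 if the write unexpectedly succeeded, or -1 if setup failed. */
int ebadf_demo(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    close(p[1]);

    /* p[1] is closed; nothing in this single-threaded sketch reuses the
     * number in between, so write() fails with EBADF. */
    if (write(p[1], "x", 1) < 0)
        return errno;
    return 0;
}
```

In real programs the usual culprits are a double close() or one thread closing a socket that another thread is still using. The closed number can also be handed out again by a later open() or socket(), which turns the error into silent use of the wrong descriptor.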