Debugging Methods Used and Errors Faced - gautamramk/FQ-PIE-for-Linux-Kernel GitHub Wiki

Debugging Methods Used and Errors Faced

The major error we faced in this project was a segmentation fault caused by an uninitialized skb pointer in the dequeue function. The pointer held a garbage value, and invoking 'qdisc_pkt_len' on it caused the system to hang.

The declarations at the start of the dequeue function, after the fix, are as follows.

struct fq_pie_sched_data *q = qdisc_priv(sch);
struct sk_buff *skb = NULL;
struct fq_pie_flow *flow;
struct list_head *head;
u32 uninitialized_var(pkt_len);

Initially, the skb pointer was declared without an initializer:

struct fq_pie_sched_data *q = qdisc_priv(sch);
struct sk_buff *skb;
struct fq_pie_flow *flow;
struct list_head *head;
u32 uninitialized_var(pkt_len);
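The difference between the two versions can be illustrated with a minimal user-space sketch. It is not the project's dequeue code: `struct pkt`, `struct flow` and `dequeue_len` are hypothetical stand-ins for `struct sk_buff`, `struct fq_pie_flow` and the dequeue function, and we assume the pointer is only assigned on some code paths. With the pointer initialized to NULL, the later check is well defined. Without the initializer, the check would read an indeterminate value and the dereference could touch arbitrary memory.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct sk_buff. */
struct pkt {
    unsigned int len;
};

/* Hypothetical stand-in for a flow; head is NULL when the flow is empty. */
struct flow {
    struct pkt *head;
};

/* Removes the packet at the head of the flow and returns its length,
 * or 0 when the flow is empty. */
unsigned int dequeue_len(struct flow *f)
{
    struct pkt *p = NULL;   /* initialized, so the check below is meaningful */

    if (f->head) {          /* p is assigned only on this path */
        p = f->head;
        f->head = NULL;
    }

    if (!p)                 /* without "= NULL", p would be indeterminate here */
        return 0;

    return p->len;          /* safe: p is known to point at a real packet */
}
```

Calling `dequeue_len` twice on a flow that holds one packet returns the packet length the first time and 0 the second time, because the second call takes the NULL path instead of dereferencing a stale pointer.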

Debugging Methods Tried

Standard printk statements

We added printk statements between critical lines in the code. From their output, we were able to determine that the error was in the dequeue function.
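The checkpoint idea can be sketched in user space as follows. This is an illustration under assumptions, not the statements we actually added: `CHECKPOINT`, `last_checkpoint` and `traced_dequeue` are hypothetical names, and `fprintf` stands in for the kernel's `printk`. In a kernel module the macro body would call `printk` with a log level such as `KERN_INFO` instead.

```c
#include <stdio.h>

/* Number of the last checkpoint that was reached (hypothetical helper). */
static int last_checkpoint;

/* Records and prints a numbered checkpoint with function name and line.
 * In kernel code the fprintf call would be a printk call. */
#define CHECKPOINT(n)                                             \
    do {                                                          \
        last_checkpoint = (n);                                    \
        fprintf(stderr, "%s:%d reached checkpoint %d\n",          \
                __func__, __LINE__, (n));                         \
    } while (0)

/* Hypothetical function under test: fails right after checkpoint fail_at,
 * or runs to completion when fail_at is 0. */
int traced_dequeue(int fail_at)
{
    CHECKPOINT(1);
    if (fail_at == 1)
        return -1;

    CHECKPOINT(2);
    if (fail_at == 2)
        return -1;

    CHECKPOINT(3);
    return 0;
}
```

The last checkpoint that appears in the log bounds the failure from above: the fault lies between that message and the next one that was expected but never printed.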

kgdb and kdb

This method did not work for us. We tried to debug the kernel through kgdb, but kgdb requires a serial port to connect the target and host machines. Setting up kgdboe (kgdb over Ethernet) would have been an extremely time-consuming process.

Lockdep

We initially thought that the error was caused by a deadlock arising from locking the qdisc in the timer function. We enabled lockdep in the kernel configuration and recompiled the kernel. This failed because the system hit a kernel panic while booting with lockdep enabled, so we could not proceed with this approach.
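For reference, lockdep is usually switched on through the lock debugging options in the kernel configuration. The fragment below is a sketch of a typical setup, not a record of the exact options we used. The option names exist in mainline kernels, but which of them are available or selected automatically depends on the kernel version.

```
# Kernel hacking -> Lock Debugging (typical lockdep setup; assumed, not our exact config)
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_LOCKDEP=y
```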

Changing the location of the return statement in the dequeue function

We repeatedly moved the return statement in the dequeue function. We initially put return NULL at the top of dequeue and kept moving it down until the system hung. From this, we could isolate the area of the error.
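The procedure amounts to a linear search over the statements of the function. The user-space sketch below models it under simplifying assumptions: the function body is reduced to `NUM_STEPS` numbered steps, a return value of -1 stands in for the system hang, and `run_steps` and `find_faulty_step` are hypothetical names that do not appear in the project.

```c
#define NUM_STEPS 4

/* Runs the steps in order. Returns 0 early when it reaches the step
 * numbered return_before (the moved "return NULL"). Returns -1 when it
 * executes faulty_step, which stands in for the system hang. */
static int run_steps(int return_before, int faulty_step)
{
    for (int step = 0; step < NUM_STEPS; step++) {
        if (step == return_before)
            return 0;       /* early return placed just before this step */
        if (step == faulty_step)
            return -1;      /* the faulty step was executed */
    }
    return 0;
}

/* Moves the early return down one step at a time and reports the first
 * step whose execution triggers the failure, or -1 if none does. */
int find_faulty_step(int faulty_step)
{
    for (int cut = 0; cut <= NUM_STEPS; cut++) {
        if (run_steps(cut, faulty_step) != 0)
            return cut - 1; /* the step just above the early return */
    }
    return -1;
}
```

Each iteration corresponds to one rebuild and reload of the module with the return statement one step lower. The first position at which the failure appears identifies the step immediately above the return as the culprit.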