NAPI - awokezhou/LinuxPage GitHub Wiki

NAPI

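NAPI ("New API") is an extension to the device driver packet processing framework, designed to improve the performance of high-speed networking.

Interrupt mitigation

A high-throughput network can generate thousands of interrupts per second, each telling the system that a frame has arrived, which is information the system already has. NAPI lets a driver run with receive interrupts disabled during periods of heavy traffic, which reduces system load.

Packet throttling

When the system must drop packets, it is best to drop them before any effort is spent processing them. With NAPI, such packets can be dropped in the network adapter itself, before the kernel ever sees them.

New drivers should use NAPI where possible. NAPI does not break backward compatibility, however: a driver that processes packets directly in interrupt context without NAPI will still work.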

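NAPI driver design

NAPI initialisation

The first step, during initialisation, is to create a struct napi_struct instance, which is usually embedded in the device's private structure. struct napi_struct is defined as follows: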

/*
 * Structure for NAPI scheduling similar to tasklet but with weighting
 */
struct napi_struct {
	/* The poll_list must only be managed by the entity which
	 * changes the state of the NAPI_STATE_SCHED bit.  This means
	 * whoever atomically sets that bit can add this napi_struct
	 * to the per-cpu poll_list, and whoever clears that bit
	 * can remove from the list right before clearing the bit.
	 */
	struct list_head	poll_list;

	unsigned long		state;
	int			weight;
	int			(*poll)(struct napi_struct *, int);
#ifdef CONFIG_NETPOLL
	spinlock_t		poll_lock;
	int			poll_owner;
#endif

	unsigned int		gro_count;

	struct net_device	*dev;
	struct list_head	dev_list;
	struct sk_buff		*gro_list;
	struct sk_buff		*skb;
};

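weight sets how many frames a single poll may receive. Gigabit drivers typically use 64; slower networks should use a smaller value. poll is a function pointer that the network subsystem calls once the NAPI instance has been scheduled; it usually receives packets and passes them up to the protocol stack.

A napi_struct must be initialised and registered with netif_napi_add() and unregistered with netif_napi_del().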
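The embedding pattern can be sketched in user space. This is a minimal model, not kernel code: mock_napi, my_priv, my_poll, my_container_of and mock_napi_add() are hypothetical stand-ins for struct napi_struct, the driver's private structure, its poll callback, the kernel's container_of() and netif_napi_add(). It shows why embedding is convenient: the poll callback is handed only the NAPI pointer and recovers the enclosing private structure from it.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct napi_struct (assumption). */
struct mock_napi {
	int weight;
	int (*poll)(struct mock_napi *napi, int budget);
};

/* Hypothetical driver-private structure with the NAPI instance embedded. */
struct my_priv {
	int rx_packets;
	struct mock_napi napi;
};

/* Same idea as the kernel's container_of(): member pointer -> enclosing struct. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Poll callback: only the napi pointer is passed in. */
static int my_poll(struct mock_napi *napi, int budget)
{
	struct my_priv *priv = my_container_of(napi, struct my_priv, napi);

	(void)budget;          /* a real poll would receive up to budget packets */
	priv->rx_packets++;    /* touch private state to show the lookup worked */
	return 1;
}

/* Models what netif_napi_add() records: the callback and its weight. */
static void mock_napi_add(struct mock_napi *napi,
			  int (*poll)(struct mock_napi *, int), int weight)
{
	napi->poll = poll;
	napi->weight = weight;
}
```

In a real driver the registration call would typically sit in the probe or open path, with netif_napi_del() in the matching teardown path.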

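Interrupt service routine

The next step is to change the interrupt handler. When the interrupt service routine runs because a complete frame has been received, it should not process that frame right away. Instead, the driver first disables the hardware receive interrupt and then tells the network subsystem to poll for incoming packets for a while, by calling the following function: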

void napi_schedule(struct napi_struct *napi);

Some drivers use the following form instead:

if (napi_schedule_prep(napi))
       __napi_schedule(napi);

The end result is the same. If napi_schedule_prep() returns 0, a poll was already scheduled, and the driver should not have received another interrupt.
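Why the two forms are equivalent can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: mock_napi_state, MOCK_STATE_SCHED and the mock_* functions are hypothetical names that imitate the NAPI_STATE_SCHED bit described in the struct napi_struct comment. The "prep" step is an atomic test-and-set, so only the caller that actually sets the bit goes on to queue the instance.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MOCK_STATE_SCHED 1UL   /* models the NAPI_STATE_SCHED bit */

struct mock_napi_state {
	atomic_ulong state;
	int on_poll_list;      /* 1 while queued for polling */
};

/* Models napi_schedule_prep(): true only for the caller that sets the bit. */
static bool mock_schedule_prep(struct mock_napi_state *n)
{
	unsigned long old = atomic_fetch_or(&n->state, MOCK_STATE_SCHED);

	return !(old & MOCK_STATE_SCHED);
}

/* Models __napi_schedule(): queue the instance for polling. */
static void mock_do_schedule(struct mock_napi_state *n)
{
	n->on_poll_list = 1;
}

/* Models napi_schedule(): the two steps combined. */
static void mock_schedule(struct mock_napi_state *n)
{
	if (mock_schedule_prep(n))
		mock_do_schedule(n);
}
```

Drivers that use the split form usually do so because they want to do extra work, such as masking the interrupt, only when the prep step succeeds.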

Poll scheduling

__napi_schedule(napi) raises a softirq whose handler was registered when the network subsystem was initialised:

open_softirq(NET_RX_SOFTIRQ, net_rx_action);
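The register-then-raise mechanism can be modelled in a few lines of user-space C. This is only a sketch: every mock_* name is hypothetical, the handler takes no argument (the kernel's handler signature differs), and the real dispatcher is considerably more involved. The point it illustrates is that raising a softirq only marks it pending; the registered handler runs later, when the dispatcher processes the pending set.

```c
#include <stddef.h>

enum { MOCK_NET_TX_SOFTIRQ, MOCK_NET_RX_SOFTIRQ, MOCK_NR_SOFTIRQS };

typedef void (*mock_softirq_fn)(void);

static mock_softirq_fn mock_softirq_vec[MOCK_NR_SOFTIRQS];
static unsigned int mock_pending;          /* one bit per softirq number */
static int mock_rx_runs;                   /* counts handler invocations */

/* Models open_softirq(): remember the handler for this softirq number. */
static void mock_open_softirq(int nr, mock_softirq_fn fn)
{
	mock_softirq_vec[nr] = fn;
}

/* Models raising a softirq: only marks it pending, runs nothing yet. */
static void mock_raise_softirq(int nr)
{
	mock_pending |= 1u << nr;
}

/* Models the softirq dispatcher: run each pending handler once. */
static void mock_do_softirq(void)
{
	unsigned int pending = mock_pending;
	int nr;

	mock_pending = 0;
	for (nr = 0; nr < MOCK_NR_SOFTIRQS; nr++)
		if ((pending & (1u << nr)) && mock_softirq_vec[nr] != NULL)
			mock_softirq_vec[nr]();
}

/* Stand-in for net_rx_action(); here it only counts its invocations. */
static void mock_net_rx_action(void)
{
	mock_rx_runs++;
}
```

Raising the same softirq twice before the dispatcher runs still results in a single handler invocation, which is one reason the handler itself has to loop over all scheduled NAPI instances.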

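net_rx_action() ends up calling the poll function registered with netif_napi_add(). The poll function pulls packets from the DMA buffer and passes them to the protocol stack. Note that it must not use netif_rx() to hand packets to the stack; it must use the following instead: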

 int netif_receive_skb(struct sk_buff *skb);

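The poll function keeps receiving until no packets are left or the number received reaches the weight limit (the count must never exceed the limit). If it stops because no packets are left, that is, it received fewer packets than the limit, it turns polling off by calling the following function:

 void napi_complete(struct napi_struct *napi);

It then re-enables the hardware interrupt. If it stops because it reached the limit, it returns without calling napi_complete() and is polled again.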
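The interplay between the interrupt handler, the budget and completion can be sketched as a user-space simulation. Everything here is hypothetical (mock_dev, mock_irq, mock_poll); a counter stands in for the DMA ring, and the sketch assumes the usual convention that polling ends only when fewer than budget packets were processed.

```c
#include <stdbool.h>

/* Hypothetical device state; a real driver would read a DMA ring. */
struct mock_dev {
	int ring_pending;     /* packets waiting in the rx ring */
	int delivered;        /* packets handed to the stack */
	bool irq_enabled;     /* models the device's rx interrupt enable */
	bool scheduled;       /* models NAPI_STATE_SCHED */
};

/* Models the interrupt handler: mask the interrupt, schedule polling. */
static void mock_irq(struct mock_dev *dev)
{
	dev->irq_enabled = false;
	dev->scheduled = true;
}

/* Models a poll callback: returns the number of packets processed. */
static int mock_poll(struct mock_dev *dev, int budget)
{
	int work_done = 0;

	while (work_done < budget && dev->ring_pending > 0) {
		dev->ring_pending--;   /* take one packet off the ring */
		dev->delivered++;      /* stands in for netif_receive_skb() */
		work_done++;
	}

	if (work_done < budget) {
		/* Ring drained: stands in for napi_complete(), then unmask. */
		dev->scheduled = false;
		dev->irq_enabled = true;
	}
	return work_done;
}
```

With 100 packets pending and a budget of 64, the first call consumes the full budget and leaves the instance scheduled with interrupts still masked; the second call drains the remaining 36 packets, completes, and unmasks the interrupt.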

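Hardware requirements

NAPI requires the following hardware features:

A DMA ring, or enough RAM to store packets
The ability to disable the interrupts (or other events) that send packets up the stack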

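Races and concurrency

At any time only one CPU can call napi->poll(), because only one CPU can take the initial interrupt and schedule the poll.
The core layer invokes devices to send packets in round-robin fashion; because only one CPU runs a given poll at a time, the receive path needs no locks.
Contention can only arise when another CPU accesses the rx ring. That happens only in close() and suspend(); driver authors need not worry about it, because the upper network layer takes care of the synchronisation.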

API

netif_napi_add(dev, napi, poll, weight)
Initialises and registers napi structure for polling dev

netif_napi_del(napi)
Unregisters napi structure; must be called after the associated device is unregistered. free_netdev(dev) will call netif_napi_del() for all napi_structs still associated with the net device, so it may not be necessary for the driver to call this directly.

napi_schedule(napi) 
Called by an IRQ handler to schedule a poll for napi

napi_schedule_prep(napi) 
Puts napi in a state ready to be added to the CPU polling list if it is up and running. You can look at this as the first half of napi_schedule(napi).

__napi_schedule(napi)
Add napi to the poll list for this CPU; assuming that napi_schedule_prep(napi) has already been called and returned 1

napi_reschedule(napi) 
Called to reschedule polling for napi specifically for some deficient hardware.

napi_complete(napi) 
Remove napi from the CPU poll list: it must be in the poll list on current cpu. This primitive is called by napi->poll(), when it completes its work. The structure cannot be out of poll list at this call, if it is then clearly it is a BUG().

__napi_complete(napi)
Same as napi_complete() but called when local interrupts are already disabled.

napi_disable(napi)
Temporarily disables napi structure from being polled. May sleep if it is currently being polled

napi_enable(napi)
Reenables napi structure for polling, after it was disabled using napi_disable()

Performance test

Psize	Ipps	Tput	Rxint	Txint	Done	Ndone
60	890000	409362	17	27622	7	6823
128	758150	464364	21	9301	10	7738
256	445632	774646	42	15507	21	12906
512	232666	994445	241292	19147	241192	1062
1024	119061	1000003	872519	19258	872511	0
1440	85193	1000003	946576	19505	946569	0

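Ipps: input packets per second

Tput: packets, out of a total of 1M, that made it out

Rxint: receive interrupts seen

Txint: transmit completion interrupts seen

Done: number of times poll() managed to pull all packets out of the rx ring

Ndone: the converse of Done, the number of times poll() could not empty the rx ring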

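With a packet size of 60, only 17 receive interrupts were generated; at that load level the system cannot keep up with one interrupt per packet. As the load drops, the number of receive interrupts rises.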
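The trend in the table is easier to see as ratios. The sketch below copies the rows verbatim and computes two derived figures; the helper names are hypothetical, and treating Done + Ndone as the total number of poll() calls is an assumption that follows from Ndone being defined as the converse of Done.

```c
/* One row of the table above. */
struct perf_row {
	int psize;
	long ipps, tput, rxint, txint, done, ndone;
};

static const struct perf_row perf_rows[] = {
	{   60, 890000,  409362,     17, 27622,      7,  6823 },
	{  128, 758150,  464364,     21,  9301,     10,  7738 },
	{  256, 445632,  774646,     42, 15507,     21, 12906 },
	{  512, 232666,  994445, 241292, 19147, 241192,  1062 },
	{ 1024, 119061, 1000003, 872519, 19258, 872511,     0 },
	{ 1440,  85193, 1000003, 946576, 19505, 946569,     0 },
};

/* Receive interrupts per forwarded packet (Rxint / Tput). */
static double rxint_per_packet(const struct perf_row *r)
{
	return (double)r->rxint / (double)r->tput;
}

/* Fraction of poll() calls that emptied the rx ring: Done / (Done + Ndone). */
static double done_fraction(const struct perf_row *r)
{
	return (double)r->done / (double)(r->done + r->ndone);
}
```

For the 60-byte row the first ratio is roughly 0.00004 interrupts per packet, while for the 1440-byte row it is about 0.95, close to one interrupt per packet, which matches the observation that interrupt counts rise as the load falls.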

References

networking:napi