Study of waveforms and speed when the Zynq ps_dma accesses an axi_full_ip - minichao9901/TangNano-20k-Zynq-7020 GitHub Wiki

Hardware design


Key parts of the program (note that the D-cache is disabled)

```c
/* Status, Index, DmaCmd, DmaInst and Channel are declared elsewhere in the program */
#define DMA_LENGTH	256	/* Length of the DMA transfers, in 32-bit words */
int *Src = (int *)0x10000000;	/* source buffer in DDR */
int *Dst = (int *)0x44000000;	/* destination: the axi_full_ip (BRAM) in the PL */

/* Disable the D-cache so CPU and DMA accesses go straight to memory */
Xil_DCacheDisable();

/* 4-byte beats, 4 beats per burst, incrementing source and destination */
DmaCmd.ChanCtrl.SrcBurstSize = 4;
DmaCmd.ChanCtrl.SrcBurstLen = 4;
DmaCmd.ChanCtrl.SrcInc = 1;
DmaCmd.ChanCtrl.DstBurstSize = 4;
DmaCmd.ChanCtrl.DstBurstLen = 4;
DmaCmd.ChanCtrl.DstInc = 1;
DmaCmd.BD.SrcAddr = (u32) Src;
DmaCmd.BD.DstAddr = (u32) Dst;
DmaCmd.BD.Length = DMA_LENGTH * sizeof(int);	/* in bytes */

XTime t1, t2;
int dt;

/* 1) PS DMA: time the call to XDmaPs_Start */
XTime_GetTime(&t1);
Status = XDmaPs_Start(DmaInst, Channel, &DmaCmd, 0);
if (Status != XST_SUCCESS) {
	return XST_FAILURE;
}
XTime_GetTime(&t2);
dt = (u32)(t2 - t1) * (1000000.0 / COUNTS_PER_SECOND);	/* ticks -> us */
xil_printf("dt=%dus\r\n", dt);

/* 2) CPU copy with a for loop */
XTime_GetTime(&t1);
for (Index = 0; Index < DMA_LENGTH; Index++) {
	Dst[Index] = Src[Index];
}
XTime_GetTime(&t2);
dt = (u32)(t2 - t1) * (1000000.0 / COUNTS_PER_SECOND);
xil_printf("dt=%dus\r\n", dt);

/* 3) CPU copy with memcpy */
XTime_GetTime(&t1);
memcpy(Dst, Src, DMA_LENGTH * sizeof(int));
XTime_GetTime(&t2);
dt = (u32)(t2 - t1) * (1000000.0 / COUNTS_PER_SECOND);
xil_printf("dt=%dus\r\n", dt);
```
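The `dt` lines above convert an `XTime` tick difference into microseconds. Below is a minimal host-side sketch of that arithmetic. It assumes `COUNTS_PER_SECOND` is the Zynq-7000 global timer rate, which is taken to be half the CPU clock as defined in `xtime_l.h`. `ticks_to_us` is a hypothetical helper and is not part of the original program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert two global-timer readings to microseconds.
 * counts_per_second plays the role of COUNTS_PER_SECOND (assumed to be
 * the CPU clock / 2 on Zynq-7000). */
static double ticks_to_us(uint64_t t1, uint64_t t2, double counts_per_second)
{
    assert(t2 >= t1);
    assert(counts_per_second > 0.0);

    /* Keep the difference in 64 bits; the (u32) cast in the original
     * code wraps for intervals longer than 2^32 ticks. */
    uint64_t ticks = t2 - t1;
    return (double)ticks * (1000000.0 / counts_per_second);
}
```

Suppose the timer runs at an assumed 333,333,333 counts/s. In that case a difference of 4000 ticks comes out at about 12 us. Returning a `double` also keeps the fractional microseconds, which the original discards when it stores the result in an `int` and prints it with `%d`.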

Experimental results

ps_dma reading and writing the axi_full_ip


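A careful measurement with the ILA's advanced capture feature shows that ps_dma does issue bursts when it accesses the BRAM. The burst length is 4, and every 4 u32 words take 6 cycles. Transferring 256 u32 words therefore takes 384 cycles, which is 3.84 us (10 ns per cycle, i.e. the 100 MHz bus clock these figures imply). The AWPROT signal also switches mode during the transfer, from Data Secure Privileged to Data Secure Unprivileged.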
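The AWPROT states named above come from the three AxPROT bits defined by the AXI specification:

- bit 0: privileged
- bit 1: non-secure
- bit 2: instruction

The small decoder sketch below uses hypothetical names. It shows that "Data Secure Privileged" corresponds to AWPROT = 0b001 and "Data Secure Unprivileged" corresponds to 0b000.

```c
#include <assert.h>
#include <stdio.h>

/* Attributes carried by AXI AxPROT[2:0]. */
typedef struct {
    int privileged;   /* bit 0: 1 = privileged, 0 = unprivileged */
    int non_secure;   /* bit 1: 1 = non-secure, 0 = secure */
    int instruction;  /* bit 2: 1 = instruction, 0 = data */
} axprot_t;

static axprot_t decode_axprot(unsigned prot)
{
    assert(prot <= 7u);  /* AxPROT is a 3-bit field */
    axprot_t p;
    p.privileged  = (prot & 1u) != 0;
    p.non_secure  = (prot & 2u) != 0;
    p.instruction = (prot & 4u) != 0;
    return p;
}

/* Print in the word order used above, e.g. "Data Secure Privileged". */
static void print_axprot(unsigned prot)
{
    axprot_t p = decode_axprot(prot);
    printf("%s %s %s\n",
           p.instruction ? "Instruction" : "Data",
           p.non_secure ? "Non-secure" : "Secure",
           p.privileged ? "Privileged" : "Unprivileged");
}
```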

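tu (cycles per u32) = 6 clk / 4 u32 = 1.5 clk/u32
total = 256 × tu = 384 clk
Actual (ILA): 384 clk, i.e. 3.84 us
Measured in software: 12 us. The DMA setup overhead is therefore very large, so DMA only pays off for large transfers.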
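The burst arithmetic above can be reproduced with a small sketch. It rests on these assumptions:

- every burst costs the same number of cycles;
- the transfer length is a whole number of bursts;
- the bus clock is 100 MHz, inferred from 384 clk = 3.84 us.

The function names are hypothetical. The sketch models only the bus transfer seen on the ILA, not the 12 us measured in software.

```c
#include <assert.h>

/* Bus cycles for a transfer of `words` words when each burst of
 * `burst_len` words takes `cycles_per_burst` cycles (ILA observation). */
static unsigned long burst_cycles(unsigned long words, unsigned burst_len,
                                  unsigned cycles_per_burst)
{
    assert(burst_len > 0u);
    assert(words % burst_len == 0u);  /* whole bursts only */
    return (words / burst_len) * cycles_per_burst;
}

/* Convert bus cycles to microseconds for a clock given in MHz. */
static double cycles_to_us(unsigned long cycles, double clk_mhz)
{
    assert(clk_mhz > 0.0);
    return (double)cycles / clk_mhz;
}
```

For the DMA case, `burst_cycles(256, 4, 6)` gives 384 cycles, and `cycles_to_us(384, 100.0)` gives about 3.84 us.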


Copying with memcpy


tu (cycles per u32) = 29 clk / 3 u32 ≈ 9.667 clk/u32
total = 256 × tu ≈ 2475 clk
Actual (ILA): 3594 clk, i.e. 35.9 us
Measured in software: 35 us, which agrees with the ILA figure.
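The same per-word model can be checked against the memcpy capture. This sketch assumes two things:

- the 29-clk-per-3-words pattern repeats for the whole copy;
- the bus clock is 100 MHz, as implied by 3594 clk = 35.9 us.

The helper name is hypothetical.

```c
#include <assert.h>

/* Estimated cycles when a repeating pattern copies `words_per_pattern`
 * words in `cycles_per_pattern` cycles. */
static double pattern_cycles(unsigned total_words, unsigned cycles_per_pattern,
                             unsigned words_per_pattern)
{
    assert(words_per_pattern > 0u);
    return (double)total_words * cycles_per_pattern / words_per_pattern;
}
```

`pattern_cycles(256, 29, 3)` gives about 2474.7 cycles. The capture shows 3594 cycles, about 1.45 times the estimate, so the repeating pattern alone does not account for the whole memcpy time. At 100 MHz, the 3594 cycles correspond to 35.94 us, which matches the 35 us software measurement.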


Further study:

Speed comparison when copying 8192 u32 words from DDR3 to BRAM

DMA is about 15 times faster than memcpy, and memcpy is about 14 times faster than the for loop.

Speed comparison when copying 8192 u32 words from DDR3 to DDR3

DMA is about 3.8 times faster than memcpy, and memcpy is about 46 times faster than the for loop.
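Chaining the two ratios in each comparison gives the implied speedup of DMA over the for loop. The sketch below derives all three ratios from measured times. The type and parameter names are hypothetical; plug in the times from your own run.

```c
#include <assert.h>

typedef struct {
    double dma_vs_memcpy;   /* t_memcpy / t_dma */
    double memcpy_vs_loop;  /* t_loop / t_memcpy */
    double dma_vs_loop;     /* t_loop / t_dma */
} speedups_t;

/* The speedup of X over Y is t_Y / t_X (how many times faster X is). */
static speedups_t speedups(double t_loop_us, double t_memcpy_us, double t_dma_us)
{
    assert(t_loop_us > 0.0 && t_memcpy_us > 0.0 && t_dma_us > 0.0);
    speedups_t s;
    s.dma_vs_memcpy  = t_memcpy_us / t_dma_us;
    s.memcpy_vs_loop = t_loop_us / t_memcpy_us;
    s.dma_vs_loop    = t_loop_us / t_dma_us;  /* = product of the two above */
    return s;
}
```

The ratios reported above imply these speedups of DMA over the for loop:

- DDR3 to BRAM: roughly 15 × 14 ≈ 210 times faster.
- DDR3 to DDR3: roughly 3.8 × 46 ≈ 175 times faster.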

Further study:

Changing the burst length to 16

```c
DmaCmd.ChanCtrl.SrcBurstSize = 4;
DmaCmd.ChanCtrl.SrcBurstLen = 16;	/* was 4 */
DmaCmd.ChanCtrl.SrcInc = 1;
DmaCmd.ChanCtrl.DstBurstSize = 4;
DmaCmd.ChanCtrl.DstBurstLen = 16;	/* was 4 */
DmaCmd.ChanCtrl.DstInc = 1;
DmaCmd.BD.SrcAddr = (u32) Src;
DmaCmd.BD.DstAddr = (u32) Dst;
DmaCmd.BD.Length = DMA_LENGTH * sizeof(int);
```


The result is correct.

Changing the burst length to 64

```c
DmaCmd.ChanCtrl.SrcBurstSize = 4;
DmaCmd.ChanCtrl.SrcBurstLen = 64;	/* was 16 */
DmaCmd.ChanCtrl.SrcInc = 1;
DmaCmd.ChanCtrl.DstBurstSize = 4;
DmaCmd.ChanCtrl.DstBurstLen = 64;	/* was 16 */
DmaCmd.ChanCtrl.DstInc = 1;
DmaCmd.BD.SrcAddr = (u32) Src;
DmaCmd.BD.DstAddr = (u32) Dst;
DmaCmd.BD.Length = DMA_LENGTH * sizeof(int);
```


The observed burst length is still 16, and the result is wrong.


The reason is that the Zynq-7000 PS interfaces are AXI3, and the maximum AXI3 burst length is 16.
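AXI3 encodes the burst length in a 4-bit AxLEN field as beats - 1, so 16 is the largest burst that fits. The sketch below, with hypothetical helper names, shows what happens when a requested length of 64 is simply masked to 4 bits: (64 - 1) & 0xF = 15, i.e. 16 beats. That matches the burst length seen above. Where the truncation actually happens (driver, DMA controller or interconnect) is an assumption here and is not established by the measurements. The practical rule is to keep `SrcBurstLen`/`DstBurstLen` within 1..16.

```c
#include <assert.h>

#define AXI3_MAX_BURST_LEN 16u  /* AxLEN is 4 bits in AXI3: 1..16 beats */

/* Is `beats` a legal AXI3 burst length? */
static int axi3_burst_len_valid(unsigned beats)
{
    return beats >= 1u && beats <= AXI3_MAX_BURST_LEN;
}

/* Encode `beats` as an AXI3 AxLEN value (beats - 1), keeping only the
 * low 4 bits the way a 4-bit register field would. */
static unsigned axi3_len_field(unsigned beats)
{
    assert(beats >= 1u);
    return (beats - 1u) & 0xFu;
}
```

For example, `axi3_len_field(16)` and `axi3_len_field(64)` both return 15, which means 16 beats. However, `axi3_burst_len_valid(64)` returns 0, so validating the length before building the DMA command catches the problem.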