Jetson TX1 TK1






Architecture

CUDA

Caffe

version

cat /etc/nv_tegra_release
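To turn that file into a single L4T version string, something like the following may help. This is a minimal sketch: it assumes the first line looks like `# R24 (release), REVISION: 2.1, GCID: ..., BOARD: ..., ...`, which can differ between releases, and the function name `l4t_version` is made up for illustration.

```shell
# Hypothetical helper: print the L4T version (for example 24.2.1) parsed from
# the first line of a release file. The expected line layout is an assumption:
#   # R24 (release), REVISION: 2.1, GCID: ..., BOARD: ..., ...
l4t_version() {
    # $1: path to the release file (defaults to /etc/nv_tegra_release)
    release_file="${1:-/etc/nv_tegra_release}"
    # Keep the major release number and the revision, joined with a dot.
    # Prints nothing if the line does not match the assumed layout.
    head -n 1 "$release_file" |
        sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.]*\).*/\1.\2/p'
}
```

On the board, calling `l4t_version` with no argument reads /etc/nv_tegra_release.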

Performance

TX1

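# Run everything below in a root shell
sudo su
# Switch each core to the userspace governor and raise its minimum frequency to its maximum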
echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo userspace > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo userspace > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo userspace > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq
# Stop cpuquiet from taking cores offline, then bring every core online
echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
for file in /sys/devices/system/cpu/cpu*/online; do
 if [ "$(cat "$file")" -eq 0 ]; then
  echo 1 > "$file"
 fi
done
echo runnable > /sys/devices/system/cpu/cpuquiet/current_governor
# Inspect the GPU DVFS tables, then lock the GPU clock at its maximum rate
cat /sys/kernel/debug/clock/gpu_dvfs_t
cat /sys/kernel/debug/clock/dvfs_table
cat /sys/kernel/debug/clock/gbus/max > /sys/kernel/debug/clock/override.gbus/rate
echo 1 > /sys/kernel/debug/clock/override.gbus/state
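The four per-core governor and frequency commands above can be collapsed into a loop. A minimal sketch, assuming the same sysfs layout as in the commands above; the function name `pin_cpu_max_freq` and its optional root-directory argument are hypothetical and exist only so the loop can be tried against a scratch directory first.

```shell
# Hypothetical helper: for every cpuN directory that exposes cpufreq, select
# the userspace governor and raise scaling_min_freq to scaling_max_freq.
pin_cpu_max_freq() {
    # $1: CPU sysfs root (defaults to the real one; writing there needs root)
    cpu_root="${1:-/sys/devices/system/cpu}"
    for policy in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip cores without a cpufreq directory (and the unmatched glob itself)
        [ -d "$policy" ] || continue
        echo userspace > "$policy/scaling_governor"
        cat "$policy/scaling_max_freq" > "$policy/scaling_min_freq"
    done
}
```

The `cpu[0-9]*` glob matches only the numbered core directories, so siblings such as cpuquiet are left untouched.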

TK1

# Maximizing CPU performance
echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
echo 1 > /sys/devices/system/cpu/cpu0/online
echo 1 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu3/online
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Controlling GPU performance
echo 852000000 > /sys/kernel/debug/clock/override.gbus/rate
echo 1 > /sys/kernel/debug/clock/override.gbus/state
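Rather than hard-coding 852000000, the highest rate could be derived from the list of rates the board reports. The sketch below assumes that list is whitespace-separated values in kHz, optionally followed by a `(kHz)` label, as /sys/kernel/debug/clock/gbus/possible_rates is believed to print on TK1; both that path and the function name `max_gbus_rate_hz` are assumptions, not something stated in these notes.

```shell
# Hypothetical helper: read a list of rates in kHz on stdin and print the
# largest one in Hz, ready to write to override.gbus/rate.
max_gbus_rate_hz() {
    tr -s ' \t' '\n' |        # one token per line
        grep -E '^[0-9]+$' |  # drop non-numeric tokens such as "(kHz)"
        sort -n |
        tail -n 1 |
        sed 's/$/000/'        # kHz -> Hz by appending three zeros
}
```

As root, and only if the assumed file exists, it might be used as `max_gbus_rate_hz < /sys/kernel/debug/clock/gbus/possible_rates > /sys/kernel/debug/clock/override.gbus/rate`.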

Development

zero-copy memory

Jetson TK1 supports the complete CUDA Toolkit 6.0. Tegra K1 supports Unified Memory; unlike current desktop and server GPUs, memory on Tegra is physically unified, although the GPU and CPU still have separate caches. In practice you allocate with the cudaMallocManaged API on Tegra K1 just as you do on Tesla and GeForce, so the programming model is the same across all GPUs.

On Tegra, the GPU and CPU allocate memory from the same physical DRAM. The main differences between unified and zero-copy memory are in synchronization and cache handling.

Sync:

  • Unified: synchronized automatically by the GPU driver
  • Zero-copy: pinned memory; access to some locations may be slow

Cache:

  • Unified: cached
  • Zero-copy: not cached

We recommend that Jetson users use unified memory. More information can be found here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-introduction

The 2014 article http://arrayfire.com/zero-copy-on-tegra-k1/, which states that zero-copy is faster than cudaMalloc, is misleading because it generalizes from the zero-copy case.

Zero-copy is only faster in cases where the access pattern does not benefit from caches.

Zero-copy memory on Tegra is uncached on both the CPU and the GPU, so every access by a CUDA kernel goes to DRAM. If the kernel repeatedly accesses the same memory location, cudaMalloc memory is likely to be faster.

cudaHostRegister() is not supported on ARM platforms. This is because the caching attribute of an existing allocation can't be changed on the fly.

If required, please use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory.

Unified Memory

From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements

K.1.1. System Requirements

Unified Memory has two basic requirements:

  • a GPU with SM architecture 3.0 or higher (Kepler class or newer)
  • a 64-bit host application and non-embedded operating system (Linux, Windows, macOS)

OpenCV

Troubleshooting

CUDA driver version is insufficient for CUDA runtime version

monitor

sudo ~/tegrastats
sudo ~/jetson_clocks.sh --show

status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR

Must be run as root.

Installing libopencv4tegra on TK1

Depends: libavcodec54 (>= 6:9.1-1) but it is not installable or
libavcodec-extra-54 (>= 6:9.16) but it is not installable
Depends: libavformat54 (>= 6:9.1-1) but it is not installable
Depends: libavutil52 (>= 6:9.1-1) but it is not installable
Depends: libswscale2 (>= 6:9.1-1) but it is not installable
E: Unable to correct problems, you have held broken packages.
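Fix: enable the universe repository, refresh the package lists, then retry the install:

sudo apt-add-repository universe
sudo apt-get update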