The Tegra Architecture
The NVIDIA Tegra K1 mobile processor was developed to meet the demands of next-generation PCs (e.g., clamshells) and mobile devices (e.g., smartphones, tablets) running workloads such as:
- Realistic 3D gaming;
- Computational photography;
- Speedy web page rendering;
- High resolution displays and delivery of high quality HD video to multiple screens;
- Automotive navigation requiring 3D Google Earth rendering;
- Driver assist systems requiring object tracking and multiple video camera inputs;
- Visual computing.
Meeting the growing performance needs of mobile platforms requires advanced multicore CPUs, built on the latest architectures, working together with powerful GPUs. The GPUs for these applications must also be highly power efficient to fit within the power and thermal constraints of mobile devices.
Tegra K1 is available in two versions: one uses a 32-bit quad-core, 4-PLUS-1 ARM Cortex-A15 CPU, and the other uses an NVIDIA 64-bit dual Super Core CPU, codenamed “Denver” and based on the ARMv8 architecture, which brings the energy efficiency of ARM processors to 64-bit computing.
The GPU in Tegra K1 is built on the same Kepler architecture used in higher-end systems, such as desktop and laptop PCs, workstations, and some of the world’s fastest supercomputers. However, while Kepler GPUs in high-end systems include up to 2880 single-precision floating-point CUDA cores and consume a few hundred watts, the Kepler GPU in Tegra K1 has 192 CUDA cores and consumes less than two watts. Nevertheless, the Tegra K1 GPU has more cores than many state-of-the-art GPUs of just a few years ago.
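The two core counts quoted above are related by the number of streaming multiprocessors in each chip. The following is a minimal sketch of that arithmetic, not code from this wiki. It assumes figures that are not stated in the text above: that each Kepler streaming multiprocessor (SMX) contains 192 single-precision CUDA cores, that the Tegra K1 GPU has a single SMX, and that a fully enabled high-end Kepler GPU has 15 of them. The names `kCoresPerKeplerSmx` and `keplerCudaCores` are hypothetical and chosen only for illustration.

```cpp
// Host-only C++ (it also compiles unchanged under nvcc).

// Assumption, not stated in the text above: one Kepler SMX holds
// 192 single-precision CUDA cores.
constexpr int kCoresPerKeplerSmx = 192;

// Total single-precision CUDA cores of a Kepler GPU with smxCount SMX units.
constexpr int keplerCudaCores(int smxCount) {
    return smxCount * kCoresPerKeplerSmx;
}

// Assumed SMX counts: 1 for the Tegra K1 GPU, 15 for a fully enabled
// high-end Kepler GPU. These reproduce the two figures quoted in the text.
static_assert(keplerCudaCores(1) == 192, "Tegra K1 core count");
static_assert(keplerCudaCores(15) == 2880, "high-end Kepler core count");
```

Under these assumptions, `keplerCudaCores(15) / keplerCudaCores(1)` evaluates to 15, so the mobile GPU has one fifteenth of the cores of the largest Kepler part while sharing the same multiprocessor design.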
The Kepler GPU in Tegra K1 also includes a number of optimizations that save power in mobile systems. For example:
- a large unified L2 cache that significantly reduces accesses to power-hungry off-chip memory;
- low-level optimizations that reduce both idle and dynamic power consumption; these optimizations identify blocks of the GPU core that are idle and turn off both their clock and voltage sources to reduce the idle power consumption of those blocks;
- optimized routing of data paths and interconnects.