# Archived: L4T Integration Issues
Some notes on the issues with adapting L4T/JetPack for use in OE/Yocto Project builds, and with using the BSP to create and maintain a product based on Jetson hardware.
Note that this is a bit of a laundry list, may be too detailed, and may not cover everything.
## Release engineering issues
There have been improvements in this area, with more consistency between releases recently. Hopefully this will continue to improve.
- Why separate L4T packages for t210 and t186/t194? This is an ongoing maintenance headache. One package for all supported hardware would be helpful.
- Discussed in the May 2021 meeting. This was a historical issue related to a different build setup for t210 and won't be the case for future products.
- L4T releases supporting only a subset of available hardware.
- Discussed in the May 2021 meeting. This was an occasional issue in the 28.x releases and should not be a problem for future products. New hardware will be handled with developer-preview releases, and production releases will cover all supported products going forward.
- General upgrading issues: apparently no consideration is given to products that need to upgrade from one version of L4T to another without manual re-flashing via USB. There is no documentation on the interdependencies between the various bootloader/binary-blob components, the kernel, and the userland libraries, which makes planning for upgrades difficult. Custom partitioning is difficult to maintain across releases when `flash.xml` layouts change in incompatible ways.
- Some changes are coming to help with legacy upgrades in the L4T r32.6 release, per the March 2021 meeting notes.
- See also the mechanism detailed in https://github.com/OE4T/meta-tegra/wiki/Over-the-air-reflashing-process and discussed in the May 2021 meeting.
- Additional changes discussed in the May 2021 meeting: packages will be made more fine-grained, and dependencies between packages will be documented. Partition changes will be minimized going forward; if needed, they will be made in a backward-compatible way, and will be discussed with the OE4T community before being implemented.
- Spotty releases of cboot sources, particularly for t186.
- CBoot source releases will be provided for all releases going forward, as discussed in OE4T meetings.
- URL/path/filename/version numbering changes "under the hood" from release to release. Relatively minor, and probably doesn't affect their primary customers (SDK Manager users).
- Using identical package names with different content - in particular, the TensorRT packages in JetPack are different between DLA (Xavier) and non-DLA platforms, but the packages are named identically.
- See discussion in the May 2021 meeting - NVIDIA will avoid this issue going forward.
- Better separation between host-side and target-side files in L4T releases would be helpful. Breaking the software components down further and releasing them individually, instead of big-bang releases, would help with getting fixes out in a more timely fashion, too.
- It would be ideal to have support for a more standardized way of partitioning through WIC or a similar tool; see discussion here, and the sketch after this list.
- Download links for package content, to remove the requirement to use SDK Manager. This is addressed for everything but the host-side CUDA tools, as discussed in the March 2021 meeting. Resolved with https://github.com/OE4T/meta-tegra/pull/677.
- Unclear plan for container support across glibc version upgrades and pre-built containers. See https://github.com/OE4T/meta-tegra/issues/857
- Challenges related to GStreamer version support beyond 1.14, and an unclear plan for NVIDIA support of future GStreamer versions. See https://github.com/OE4T/meta-tegra/pull/864
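As a reference point for the WIC request above, here is a minimal sketch of what a kickstart-driven layout could look like. The kickstart contents, labels, and image name are hypothetical illustrations - this is not a supported Jetson layout today.

```sh
# Hypothetical minimal kickstart (jetson.wks); partition sources and labels
# are illustrative only, not a supported Jetson layout:
#
#   part /boot --source bootimg-partition --fstype=vfat --label boot --size 64M
#   part /     --source rootfs            --fstype=ext4 --label APP
#
# Build the image from within a Yocto build environment:
wic create jetson.wks -e core-image-minimal
```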
## Tooling issues
The tools are quite difficult to use outside of the L4T directory tree. The shell script (`flash.sh`) -> Python (`tegraflash.py`) -> binary tools flow (sketched below) isn't inherently bad, but the Python layer is poorly implemented, the binary tools have quirks/bugs that are difficult to debug (and have no documentation), and the shell scripts have several hacks to deal with differences between chips/modules, spread all over. Features like multi-target BUP generation appear to be an afterthought and involve more layers of scripts that repeat a lot of operations, slowing down execution substantially (particularly with code signing) for every new hardware variant introduced.
- Discussed in the May 2021 meeting. NVIDIA will work to document the flow of the scripts, and the tegraflash team plans to refactor them.
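To make the flow concrete, here is a rough sketch of how the layers stack up. The file names and arguments are abbreviated placeholders, not an exact invocation - they vary considerably by release and target.

```sh
# Typical top-level invocation for a developer kit with an eMMC rootfs:
sudo ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1

# Under the hood, flash.sh assembles a much longer tegraflash.py command line,
# roughly of this shape (arguments abbreviated and release-dependent):
#
#   tegraflash.py --chip 0x19 --bl <bootloader blob> \
#       --cfg <flash layout .xml> --cmd "flash; reboot" ...
#
# tegraflash.py in turn drives the undocumented binary tools (tegrarcm,
# tegrasign, tegradevflash, or their _v2 variants) over USB recovery mode.
```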
The secure boot tools don't appear to get fully tested, particularly for burning fuses. Even code signing, however, has been a problem, with users having to contribute fixes to the scripts to support, e.g., PKC+SBK BUP generation (see the sketch below). The documentation covers hardware features that aren't necessarily supported in L4T, and could be clearer about that.
- Discussed in the May 2021 meeting. A new flag supports testing the process without burning fuses. Looking into future hardware changes to support fuse emulation for easier testing. Working on improving the QA process regarding secure boot testing.
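For illustration, signed/encrypted flashing is driven through `flash.sh` options; a hedged sketch, where the key file names are placeholders and the exact options vary by L4T release:

```sh
# Sign boot components with a PKC key and encrypt with an SBK key, without
# actually flashing the device. Key files are placeholders; the option names
# follow the L4T flashing scripts but differ across releases.
sudo ./flash.sh -u rsa_priv.pem -v sbk.key --no-flash \
    jetson-xavier-nx-devkit-emmc mmcblk0p1
```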
## Hardware variants
This is a difficult problem, but more documentation on what the variants are and which components need to be changed for them would help. Variants are handled differently for each of the SoCs/modules - kernel device tree, pinmux configurations, BPMP/PMIC/memory controller configurations, etc. It is unclear which module/chip revisions customers need to worry about (vs. pre-release internal-only revisions), and what "board ID", "FAB (board version)", "SKU", "chip revision", and "board revision" differences entail. The separation between module and carrier is not as clean as it could be in the kernel configuration. Each variant drives up the BUP package size (see the sketch below).
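As an example of how the variant dimensions surface in practice, multi-spec BUP generation is parameterized by board-identification environment variables; the values below are illustrative placeholders (on real hardware they come from the module EEPROM):

```sh
# Generate a bootloader update payload (BUP) entry for one board variant.
# This is repeated per variant, and each one grows the payload.
# BOARDID/FAB/BOARDSKU/BOARDREV values here are illustrative placeholders.
sudo BOARDID=2888 FAB=400 BOARDSKU=0001 BOARDREV=A.0 \
    ./flash.sh --no-flash --bup jetson-agx-xavier-devkit mmcblk0p1
```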
The build workflow baked into the tooling and BSP assumes you know the target hardware revision before constructing the rootfs. The `flash.sh` script adjusts files in the rootfs image based on the hardware revision (in particular, the `nv_boot_control.conf` file, sketched below). This makes automating builds and imaging in manufacturing more difficult. This has been worked around in meta-tegra.
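For illustration, the file in question is a small key/value config; the field names below match what the flashing tools write, but the values are placeholders that differ per module, carrier, revision, and boot device:

```sh
# Inspect the board spec that flash.sh baked into the rootfs. The values
# shown in the comments are illustrative placeholders, not from a real device.
cat /etc/nv_boot_control.conf
#   TNSPEC 2888-400-0001-A.0-1-2-jetson-agx-xavier-devkit-mmcblk0p1
#   COMPATIBLE_SPEC 2888-400-0001---1--jetson-agx-xavier-devkit-mmcblk0p1
#   TEGRA_CHIPID 0x19
#   TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
#   TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1
```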
MIPI camera support is also messy - the interdependencies between the drivers, the device tree sections, the overrides configuration files, and the libnvodm_imager library have been problematic even when working with an approved partner. Changes between L4T versions have required expensive rework and retesting, with no warning.
## Kernel and general open-source support
- Kernel releases stay stuck at a particular LTS point release for too long - 4.9.140 being the latest.
- The multi-repo kernel sources and "overlay" Kbuild changes are a nuisance (see the sync sketch at the end of this section).
- Discussed in the May 2021 meeting. Several NVIDIA teams share the same kernel tree. NVIDIA will have internal discussions about better ways to manage this going forward.
- The Android patches make it more difficult to stay up-to-date with security patches and can cause issues when compiling drivers.
- Stop doing Coverity runs and "fixing" issues found in static analysis in the upstream kernel code - stick to just your downstream code. Ran across several bad fixes or conflicts with upstream fixes.
- Discussed in the May 2021 meeting.
- The switch to libglvnd has been helpful for GL/EGL support, and the libdrm shim is better than it was, but there are still many userland binary blobs and little-to-no information on which blobs are required for what functionality, and which can be replaced with either open source or with NVIDIA-released sources.
- Discussed in the May 2021 meeting. NVIDIA requests additional information on this one.
- As mentioned above, CBoot source releases, particularly for TX2, have been spotty. Nice to finally see a LICENSE file accompanying the sources, though. CBoot source releases will be continued going forward, per the March 2021 meeting notes.
- Would be nice to see some of the system-level support code released as source (e.g., power model/hinting, bootloader update engine).
- Would be nice to see open-source Python bindings for DeepStream. To be addressed, based on the March 2021 meeting notes.
- Using publicly shared git repositories as a part of open source software release flow would make it much easier to track updates to packages, see reasons for updates, and integrate/submit upstream patches. See this discussion.
- Discussed in the May 2021 meeting. NVIDIA has ideas about how to manage contributions and how to support this request going forward. Will plan to incorporate this in future releases, especially JetPack 5.x.
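Returning to the multi-repo kernel sources item above: assembling the L4T kernel tree means syncing several separate repos via the BSP's sync script. A sketch, with an illustrative release tag:

```sh
# Sync the split kernel source repos (kernel, NVIDIA driver sources, device
# trees) for a given release tag; the tag shown is illustrative and
# release-specific.
./source_sync.sh -k tegra-l4t-r32.5
```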