AGDI: Arm Generic Diagnostic Dump and Reset - AmpereComputing/ampere-lts-kernel---DEPRECATED GitHub Wiki

Some use cases, such as system management, require the ability to send a non-maskable event to the OS, asking the kernel to perform a diagnostic dump and reset the system.

Upstream Patches

The upstream 5.18 kernel added AGDI support in the following commits:

  1. https://github.com/torvalds/linux/commit/e86801b0ff1c5c6d1f78232f7e3b52c0b0631560, "ACPI: tables: Add AGDI to the list of known table signatures"
  2. https://github.com/torvalds/linux/commit/a2a591fb76e6f5461dfd04715b69c317e50c43a5, "ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device"
  3. Optional patch: https://github.com/torvalds/linux/commit/5579649e7eb756a4e3d5784b6958374e5bfc41de, "ACPICA: iASL: Add suppport for AGDI table"

Besides the two required patches above, one more dependent patch is needed: https://github.com/torvalds/linux/commit/dc4e8c07e9e2f69387579c49caca26ba239f7270. Because the AGDI driver depends on the SDEI driver, sdei_init() must be called before agdi_init().

These patches have been backported to the 5.15, 5.10 and 5.4 kernels.

Test on an Altra system

  1. Enable AGDI in the BIOS: Chipset->Enable AGDI
  2. Enable AGDI in the kernel: CONFIG_ACPI_AGDI=y (this option depends on CONFIG_ARM_SDE_INTERFACE)
  3. When the kernel boots, it lists the AGDI table among the ACPI tables:
     ACPI: AGDI 0x00000000AEBB0000 000030 (v01 Ampere Altra    00000001 INTL 20200717)
  4. Issue an IPMI command to trigger the dump:

In-band (from the host OS):

ipmitool raw 0x3c 0x16

Out-of-band (OOB, through the BMC):

ipmitool -I lanplus -H <hostip> raw 0x3c 0x16

This triggers a kernel panic. If kdump is properly configured (see https://github.com/AmpereComputing/ampere-lts-kernel/wiki/kdump-on-arm64), the kdump kernel is started automatically to capture the dump.
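Because the IPMI command panics the machine immediately, a small pre-flight check can save a wasted reboot. The following is a minimal C sketch with hypothetical helper names. One helper reads /sys/kernel/kexec_crash_loaded, where the kernel reports 1 when a crash kernel is loaded; the other looks for the ACPI AGDI banner from step 3 in a line of dmesg output and extracts the table length, which the kernel prints in hex (000030 = 48 bytes on the Altra system above).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read a kexec_crash_loaded-style file. On a live system pass
 * "/sys/kernel/kexec_crash_loaded". Returns 1 if a crash kernel is
 * loaded, 0 if not, and -1 if the file is missing or unreadable.
 */
static int crash_kernel_loaded(const char *path)
{
	FILE *f;
	int val = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return (val == 0 || val == 1) ? val : -1;
}

/*
 * Check one line of dmesg output for the ACPI AGDI table banner
 * (a leading "[ timestamp ]" prefix is allowed). Returns 1 and stores
 * the table length in bytes if found, 0 otherwise.
 */
static int agdi_banner_length(const char *line, unsigned long *len)
{
	const char *p;

	assert(line != NULL && len != NULL);
	p = strstr(line, "ACPI: AGDI ");
	/* Skip the physical address, then parse the hex length field. */
	if (!p || sscanf(p, "ACPI: AGDI 0x%*llx %lx", len) != 1)
		return 0;
	return 1;
}
```

On the target, call crash_kernel_loaded("/sys/kernel/kexec_crash_loaded") and run each dmesg line through agdi_banner_length() before issuing ipmitool; if either check fails, the dump would be lost or AGDI is not active.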

Kernel panic output when kdump is not configured:

[  817.412987] Kernel panic - not syncing: Arm Generic Diagnostic Dump and Reset SDEI event issued
[  817.413001] CPU: 36 PID: 0 Comm: swapper/36 Not tainted 5.17.0+ #3
[  817.413009] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 0.00.100 (SCP: 0.11.20220331) 2022/03/31
[  817.413014] Call trace:
[  817.413017]  dump_backtrace+0x100/0x128
[  817.413036]  show_stack+0x20/0x68
[  817.413041]  dump_stack_lvl+0x68/0x84
[  817.413053]  dump_stack+0x18/0x34
[  817.413057]  panic+0x12c/0x324
[  817.413063]  nmi_panic+0x88/0xa8
[  817.413069]  agdi_sdei_handler+0x24/0x38
[  817.413078]  sdei_event_handler+0x28/0x90
[  817.413087]  do_sdei_event+0x94/0x148
[  817.413093]  __sdei_handler+0x68/0xe8
[  817.413099]  __sdei_asm_handler+0xbc/0x180
[  817.413104]  __arm_smccc_smc+0x14/0x40
[  817.413110]  __invoke_psci_fn_smc+0x48/0x78
[  817.413117]  psci_0_2_cpu_suspend+0x34/0x60
[  817.413123]  psci_cpu_suspend_enter+0x88/0xe0
[  817.413126]  acpi_processor_ffh_lpi_enter+0x44/0x88
[  817.413133]  acpi_idle_lpi_enter+0x50/0x68
[  817.413139]  cpuidle_enter_state+0x180/0x370
[  817.413146]  cpuidle_enter+0x40/0x58
[  817.413148]  call_cpuidle+0x38/0x48
[  817.413153]  do_idle+0x228/0x288
[  817.413158]  cpu_startup_entry+0x2c/0x30
[  817.413160]  secondary_start_kernel+0x1c4/0x1d8
[  817.413162]  __secondary_switched+0xa0/0xa4
[  817.413169] SMP: stopping secondary CPUs
[  817.413899] Kernel Offset: 0x4f29ba160000 from 0xffff800008000000
[  817.413902] PHYS_OFFSET: 0xfff1000080000000
[  817.413903] CPU features: 0x000,00081c0d,19805c82
[  817.413906] Memory Limit: none
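When the test is automated, the captured log (for example a serial-console capture, or the dmesg saved by kdump) can be checked for the exact panic string shown above. A minimal C sketch; the function name is hypothetical, and it assumes the "CPU: N PID: ..." line follows the panic message, as in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Panic string emitted by agdi_sdei_handler() via nmi_panic(). */
static const char AGDI_PANIC_MSG[] =
	"Kernel panic - not syncing: Arm Generic Diagnostic Dump and Reset SDEI event issued";

/*
 * Scan a captured console log (NUL-terminated, newline-separated text).
 * Returns the CPU number from the first "CPU: N" line after the AGDI
 * panic message, or -1 if no AGDI-triggered panic is found.
 */
static int agdi_panic_cpu(const char *log)
{
	const char *p, *cpu;
	int n;

	assert(log != NULL);
	p = strstr(log, AGDI_PANIC_MSG);
	if (!p)
		return -1;
	/* "CPU: " does not match the later "CPU features:" line. */
	cpu = strstr(p, "CPU: ");
	if (!cpu || sscanf(cpu, "CPU: %d", &n) != 1)
		return -1;
	return n;
}
```

A -1 result distinguishes an unrelated panic (or no panic at all) from one raised by the AGDI SDEI event.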

Internals

  1. The ACPI for Arm Components specification defines the AGDI table. Besides a Flags field, whose bit 0 selects the signaling mode, the table has two fields:
     SDEI Event number: the SDEI event number of the event generated by the device. Valid only if the signaling mode is 0b.
     GSIV: the GSIV of the interrupt generated by the device. Valid only if the signaling mode is 1b.
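The 0b/1b choice is a single flag bit that decides which of the two fields is meaningful. The following hypothetical C mirror of the table illustrates this. The layout assumed here (standard 36-byte ACPI header, a flags byte, three reserved bytes, then two 32-bit fields) is consistent with the 0x30-byte length in the boot log above; it also assumes a little-endian host such as arm64, since ACPI tables are little-endian. The struct, macro and function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 36-byte ACPI table header; every field is naturally aligned, so no padding. */
struct acpi_header {
	char     signature[4];        /* "AGDI" */
	uint32_t length;              /* whole table in bytes, header included */
	uint8_t  revision;
	uint8_t  checksum;
	char     oem_id[6];
	char     oem_table_id[8];
	uint32_t oem_revision;
	char     asl_compiler_id[4];
	uint32_t asl_compiler_revision;
};

/* Hypothetical mirror of the AGDI table body. */
struct agdi_table {
	struct acpi_header header;
	uint8_t  flags;               /* bit 0: signaling mode */
	uint8_t  reserved[3];
	uint32_t sdei_event;          /* valid only when signaling mode is 0b */
	uint32_t gsiv;                /* valid only when signaling mode is 1b */
};

#define AGDI_FLAG_SIGNALING_MODE 0x1u

/* 36-byte header + 12-byte body = 0x30 bytes. */
_Static_assert(sizeof(struct agdi_table) == 48, "AGDI table should be 48 bytes");

enum agdi_signal { AGDI_SIGNAL_SDEI, AGDI_SIGNAL_INTERRUPT };

/* Report which field is valid and copy its value into *id. */
static enum agdi_signal agdi_decode(const struct agdi_table *t, uint32_t *id)
{
	assert(t != NULL && id != NULL);
	if (t->flags & AGDI_FLAG_SIGNALING_MODE) {
		*id = t->gsiv;
		return AGDI_SIGNAL_INTERRUPT;
	}
	*id = t->sdei_event;
	return AGDI_SIGNAL_SDEI;
}
```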

The GSIV (interrupt) signaling mode is currently not supported. The AGDI driver reads the SDEI event number from the ACPI table and registers a handler for that SDEI event:

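	/* Look up the AGDI table; without it there is nothing to do. */
	status = acpi_get_table(ACPI_SIG_AGDI, 0,
				(struct acpi_table_header **) &agdi_table);
	if (ACPI_FAILURE(status))
		return;

	/* Signaling mode 1b (GSIV/interrupt) is rejected. */
	if (agdi_table->flags & ACPI_AGDI_SIGNALING_MODE) {
		pr_warn("Interrupt signaling is not supported");
		goto err_put_table;
	}

	/* Signaling mode 0b: keep the SDEI event number for the driver. */
	pdata.sdei_event = agdi_table->sdei_event;

... ...

	/* adata: the platform data holding the event number saved in pdata above */
	sdei_event_register(adata->sdei_event, agdi_sdei_handler, pdev);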
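The control flow of this excerpt (reject interrupt mode, otherwise register a handler for the SDEI event, then run it when the event fires) can be exercised outside the kernel with a small model. In this hypothetical sketch, mock_event_register() and mock_event_deliver() stand in for the SDEI layer and are not kernel APIs, and a counting handler replaces the panicking one so the flow can be tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type, loosely shaped like an SDEI event handler. */
typedef int (*event_cb_t)(uint32_t event, void *arg);

#define MAX_HANDLERS 8

/* Hypothetical stand-in for the SDEI layer's table of registered events. */
static struct {
	uint32_t   event;
	event_cb_t cb;
	void      *arg;
} registry[MAX_HANDLERS];
static size_t nr_registered;

static int mock_event_register(uint32_t event, event_cb_t cb, void *arg)
{
	assert(cb != NULL);
	if (nr_registered == MAX_HANDLERS)
		return -1;
	registry[nr_registered].event = event;
	registry[nr_registered].cb = cb;
	registry[nr_registered].arg = arg;
	nr_registered++;
	return 0;
}

/* Simulate firmware signaling an event: run the matching handler, if any. */
static int mock_event_deliver(uint32_t event)
{
	for (size_t i = 0; i < nr_registered; i++)
		if (registry[i].event == event)
			return registry[i].cb(event, registry[i].arg);
	return -1;	/* no handler registered for this event */
}

/* Driver-side decision from the excerpt: reject interrupt mode, else register. */
static int model_agdi_setup(uint8_t flags, uint32_t sdei_event,
			    event_cb_t cb, void *arg)
{
	if (flags & 0x1u)	/* signaling mode 1b (GSIV): not supported */
		return -1;
	return mock_event_register(sdei_event, cb, arg);
}

/* Demo handler: counts deliveries. The real agdi_sdei_handler panics instead. */
static int counting_handler(uint32_t event, void *arg)
{
	(void)event;
	++*(int *)arg;
	return 0;
}
```

With flags = 0x1 nothing is registered, so a later event has no handler; with flags = 0x0 the handler runs when that event number is delivered.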

The SDEI event handler simply calls nmi_panic():

static int agdi_sdei_handler(u32 sdei_event, struct pt_regs *regs, void *arg)
{
	nmi_panic(regs, "Arm Generic Diagnostic Dump and Reset SDEI event issued");
	return 0;
}
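The handler uses nmi_panic() rather than panic() because it runs in an NMI-like SDEI context on whichever CPU took the event (CPU 36 in the log above), and another CPU may already be panicking. In kernels of this era, nmi_panic() claims the global panic_cpu with an atomic compare-and-exchange: the first CPU proceeds to panic(), a different CPU parks itself in nmi_panic_self_stop(), and a CPU that is already panicking simply returns. The following is a simplified userspace model of that decision using C11 atomics; it is not the kernel code, and the names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define PANIC_CPU_INVALID (-1)

/* Which CPU owns the panic; plays the role of the kernel's panic_cpu. */
static atomic_int model_panic_cpu = PANIC_CPU_INVALID;

enum nmi_panic_action {
	NMI_PANIC_DO_PANIC,	/* first CPU in: go on to panic() */
	NMI_PANIC_SELF_STOP,	/* another CPU is already panicking: park this one */
	NMI_PANIC_RETURN,	/* this CPU is already panicking: just return */
};

/* Decide what a CPU receiving the NMI-like event should do. */
static enum nmi_panic_action model_nmi_panic(int this_cpu)
{
	int expected = PANIC_CPU_INVALID;

	assert(this_cpu >= 0);
	if (atomic_compare_exchange_strong(&model_panic_cpu, &expected, this_cpu))
		return NMI_PANIC_DO_PANIC;
	/* On failure, expected now holds the CPU that won the race. */
	if (expected != this_cpu)
		return NMI_PANIC_SELF_STOP;
	return NMI_PANIC_RETURN;
}
```

Only the last case lets control come back to the handler, which is the only way its `return 0` is reached.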