Build kernel with 64KB or 4KB page - AmpereComputing/ampere-lts-kernel---DEPRECATED GitHub Wiki

WIP:

Suggested configurations:

  1. 4KB page + ARM64_PA_BITS_48 + ARM64_VA_BITS_48
  2. 64KB page + ARM64_PA_BITS_48 + ARM64_VA_BITS_48
  3. 64KB page + ARM64_PA_BITS_52 + ARM64_VA_BITS_52

Configuration that cannot boot on Altra:

  1. 4KB page + ARM64_PA_BITS_39 + ARM64_VA_BITS_39

References:

  1. https://www.kernel.org/doc/html/latest/arm64/memory.html
  2. https://www.spinics.net/lists/arm-kernel/msg319552.html
  3. https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html

=================================================================

VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT

With 16K pages and 3 levels we can cover 47 bits. So we'll eventually
have the following VA bits options:

39 if 4K (3 levels)
42 if 64K (2 levels)
47 if 16K (3 levels)
48 if 4K || 16K || 64K (4/4/3 levels depending on page size)
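The formula can be checked with a few lines of C. This is only a sketch: the function name `max_va_bits` is made up for illustration and is not a kernel symbol. It relies on the fact that an arm64 page-table entry is 8 bytes, so one table page holds 2^(PAGE_SHIFT - 3) entries and each level resolves PAGE_SHIFT - 3 address bits.

```c
#include <assert.h>

/*
 * Upper bound on the virtual address bits that `levels` page-table levels
 * can translate with pages of 2^page_shift bytes.
 * Hypothetical helper for illustration, not a kernel function.
 */
static unsigned int max_va_bits(unsigned int page_shift, unsigned int levels)
{
    assert(page_shift > 3);     /* a page must hold at least one 8-byte entry */

    /* bits resolved by the table walk + bits of the in-page offset */
    return (page_shift - 3) * levels + page_shift;
}
```

For example, `max_va_bits(12, 3)` is 39, `max_va_bits(16, 2)` is 42 and `max_va_bits(14, 3)` is 47, matching the list above. The result is an upper bound: `max_va_bits(16, 3)` is 55, of which the 64K option above uses only 48.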

config PGTABLE_LEVELS
        int
        default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36
        default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42
        default 3 if ARM64_64K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
        default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
        default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
        default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48
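The defaults in this table follow from the VA_BITS formula read in the other direction. The sketch below assumes that relationship and uses a made-up helper name, `pgtable_levels`; it is not taken from the kernel sources.

```c
#include <assert.h>

/*
 * Number of page-table levels needed to translate va_bits of virtual
 * address with pages of 2^page_shift bytes.
 * Hypothetical helper for illustration, not a kernel function.
 */
static unsigned int pgtable_levels(unsigned int page_shift, unsigned int va_bits)
{
    unsigned int bits_per_level = page_shift - 3;   /* 8-byte entries */

    assert(page_shift > 3 && va_bits > page_shift);

    /* bits above the page offset, divided by bits per level, rounded up */
    return (va_bits - page_shift + bits_per_level - 1) / bits_per_level;
}
```

With these inputs the helper reproduces every `default` line above: 4K pages give 3 levels for 39 bits and 4 levels for 48 bits, 16K pages give 2, 3 and 4 levels for 36, 47 and 48 bits, and 64K pages give 2 levels for 42 bits and 3 levels for both 48 and 52 bits.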

However, with the 5.4 kernel, a 4KB-page build with VA_BITS_48 sometimes fails to boot on Altra:

[    0.000000] Booting Linux on physical CPU 0x0000120000 [0x413fd0c1]
[    0.000000] Linux version 5.4.93+ (adam@adam_mj_cent83) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Wed May 19 14:57:36 CST 2021
[    0.000000] earlycon: pl11 at MMIO32 0x0000100002600000 (options '')
[    0.000000] printk: bootconsole [pl11] enabled
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.70 by American Megatrends
[    0.000000] efi:  ACPI 2.0=0xb0190000  TPMFinalLog=0xb01d0000  SMBIOS 3.0=0xb3f7ff98  MEMATTR=0xa7121018  ESRT=0xae86a318  RNG=0xb79cb318  MEMRESERVE=0xa7125e18
[    0.000000] efi: seeding entropy pool
[    0.000000] esrt: Reserving ESRT space from 0x00000000ae86a318 to 0x00000000ae86a378.
[    0.000000] ------------[ cut here ]------------
[    0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ...
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:382 arm64_memblock_init+0x1c0/0x420
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.93+ #1
[    0.000000] pstate: 60000089 (nZCv daIf -PAN -UAO)
[    0.000000] pc : arm64_memblock_init+0x1c0/0x420
[    0.000000] lr : arm64_memblock_init+0x1c0/0x420
[    0.000000] sp : ffffffe2dc9d3e50
[    0.000000] x29: ffffffe2dc9d3e50 x28: ffffffe2dc6be000
[    0.000000] x27: 0000000001db0000 x26: ffffffe2dc546000
[    0.000000] x25: ffffffe2dc546080 x24: 0000000001bfd000
[    0.000000] x23: ffffffe2db800000 x22: 00000000a30f4000
[    0.000000] x21: ffffffe2dc546000 x20: 00000000a4ea4000
[    0.000000] x19: 0000004000000000 x18: 0000000000000010
[    0.000000] x17: 0000000000000007 x16: 000000000000000e
[    0.000000] x15: ffffffffffffffff x14: 79206b6365686320
[    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
[    0.000000] x11: 6d207261656e696c x10: 2065687420616976
[    0.000000] x9 : 20656c6269737365 x8 : ffffffe2dbdfb1c0
[    0.000000] x7 : 000000000000000b x6 : ffffffe2dcbe7b95
[    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
[    0.000000] x3 : 0000000000000000 x2 : 00000000ffffffff
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000000
[    0.000000] Call trace:
[    0.000000]  arm64_memblock_init+0x1c0/0x420
[    0.000000]  setup_arch+0x26c/0x644
[    0.000000]  start_kernel+0x8c/0x4fc
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x30/0x58 with crng_init=0
[    0.000000] ---[ end trace 0cb965cc14bdf8c4 ]---
[    0.000000] crashkernel: memory value expected
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000B0190000 000024 (v02 Ampere)
[    0.000000] ACPI: XSDT 0x00000000B0180000 0000E4 (v01 Ampere Altra    00000000 AMI  01000013)
[    0.000000] ACPI: FACP 0x00000000B0160000 000114 (v06 Ampere Altra    00000000 INTL 20190509)
[    0.000000] ACPI: DSDT 0x00000000B00D0000 02EE02 (v02 Ampere Jade     00000001 INTL 20200717)
[    0.000000] ACPI: DBG2 0x00000000B0170000 00005C (v00 Ampere Altra    00000000 INTL 20190509)
[    0.000000] ACPI: GTDT 0x00000000B0150000 000110 (v03 Ampere Altra    00000000 INTL 20190509)
[    0.000000] ACPI: SSDT 0x00000000B0140000 00002D (v02 Ampere Altra    00000001 INTL 20190509)
[    0.000000] ACPI: BERT 0x00000000B0130000 000030 (v01 Ampere Altra    00000001 INTL 20200717)
[    0.000000] ACPI: EINJ 0x00000000B0120000 000150 (v01 Ampere Altra    00000001 INTL 20200717)
[    0.000000] ACPI: HEST 0x00000000B0110000 000308 (v01 Ampere Altra    00000001 INTL 20200717)
[    0.000000] ACPI: SDEI 0x00000000B0100000 000024 (v01 Ampere Altra    00000001 INTL 20200717)
[    0.000000] ACPI: SPMI 0x00000000B00C0000 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
[    0.000000] ACPI: SPMI 0x00000000B00B0000 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
[    0.000000] ACPI: SPMI 0x00000000B00A0000 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
[    0.000000] ACPI: FIDT 0x00000000B0090000 00009C (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: SPCR 0x00000000B0080000 000050 (v02 A M I  APTIO V  01072009 AMI. 0005000F)
[    0.000000] ACPI: BGRT 0x00000000B0070000 000038 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: TPM2 0x00000000B0060000 000064 (v04 ALASKA A M I    00000001 AMI  00000000)
[    0.000000] ACPI: PPTT 0x00000000B0040000 006E60 (v02 Ampere Altra    00000000 AMP. 01000013)
[    0.000000] ACPI: SLIT 0x00000000B0030000 000030 (v01 Ampere Altra    00000000 AMP. 01000013)
[    0.000000] ACPI: SRAT 0x00000000B0020000 000CF0 (v03 Ampere Altra    00000000 AMP. 01000013)
[    0.000000] ACPI: MCFG 0x00000000B0010000 0000EC (v01 Ampere Altra    00000001 AMP. 01000013)
[    0.000000] ACPI: IORT 0x00000000B0000000 000900 (v00 Ampere Altra    00000000 AMP. 01000013)
[    0.000000] ACPI: APIC 0x00000000B0050000 003354 (v05 Ampere Altra    00000003 AMI  01000013)
[    0.000000] ACPI: PCCT 0x00000000AFFF0000 000ABC (v02 Ampere Altra    00000003 AMP. 01000013)
[    0.000000] ACPI: WSMT 0x00000000AFFE0000 000028 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: FPDT 0x00000000AFFD0000 000044 (v01 ALASKA A M I    01072009 AMI  01000013)
[    0.000000] ACPI: SPCR: console: pl011,mmio32,0x100002600000,115200
[    0.000000] Unable to handle kernel paging request at virtual address 0000078a31ffc000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000044
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000044
[    0.000000]   CM = 0, WnR = 1
[    0.000000] [0000078a31ffc000] address between user and kernel address ranges
[    0.000000] Internal error: Oops: 96000044 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W         5.4.93+ #1
[    0.000000] pstate: 20000089 (nzCv daIf -PAN -UAO)
[    0.000000] pc : numa_init+0x108/0x3d4
[    0.000000] lr : numa_init+0xbc/0x3d4
[    0.000000] sp : ffffffe2dc9d3de0
[    0.000000] x29: ffffffe2dc9d3de0 x28: 0000080ab0df9340
[    0.000000] x27: 0000000000000000 x26: ffffffe2dc9e3000
[    0.000000] x25: ffffffe2dc6c3bb8 x24: ffffffe2dc6266c8
[    0.000000] x23: 0000078a31ffc000 x22: ffffffe2dc546000
[    0.000000] x21: ffffffe2dc9d9908 x20: ffffffe2dc6c3000
[    0.000000] x19: ffffffe2dc9e3000 x18: 0000000000000010
[    0.000000] x17: 0000000000000007 x16: 000000000000000e
[    0.000000] x15: ffffffffffffffff x14: ffffffffffffffff
[    0.000000] x13: 0000000000000028 x12: 0000000000000028
[    0.000000] x11: 0101010101010101 x10: 0000080ab1ffd000
[    0.000000] x9 : 0000000000001000 x8 : 00000000000002a5
[    0.000000] x7 : 000000000000000a x6 : 000000000000000a
[    0.000000] x5 : 0000000000000014 x4 : 0000000000000000
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000040
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000001
[    0.000000] Call trace:
[    0.000000]  numa_init+0x108/0x3d4
[    0.000000]  arm64_numa_init+0x5c/0x78
[    0.000000]  bootmem_init+0x60/0xd8
[    0.000000]  setup_arch+0x294/0x644
[    0.000000]  start_kernel+0x8c/0x4fc
[    0.000000] Code: 6b00003f 1a8500c7 11000400 6b00005f (3824cae7)
[    0.000000] ---[ end trace 0cb965cc14bdf8c5 ]---
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---

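This warning indicates that the bootloader (GRUB) did not place the kernel and the initrd close enough to each other for the linear mapping to cover both. Because the failure occurs randomly, it is only logged here. The warning is raised by the following check in arm64_memblock_init() (arch/arm64/mm/init.c):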

		/*
		 * We can only add back the initrd memory if we don't end up
		 * with more memory than we can address via the linear mapping.
		 * It is up to the bootloader to position the kernel and the
		 * initrd reasonably close to each other (i.e., within 32 GB of
		 * each other) so that all granule/#levels combinations can
		 * always access both.
		 */
		if (WARN(base < memblock_start_of_DRAM() ||
			 base + size > memblock_start_of_DRAM() +
				       linear_region_size,
			"initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
			phys_initrd_size = 0;
		} else {
			memblock_remove(base, size); /* clear MEMBLOCK_ flags */
			memblock_add(base, size);
			memblock_reserve(base, size);
		}
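
The condition inside WARN() is a plain range check, which can be tried in user space. The sketch below is an assumption-laden standalone version: the function name `initrd_in_linear_map` and all addresses in the usage note are hypothetical and are not values read from the log above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the initrd range [base, base + size) lies entirely inside
 * the window [dram_start, dram_start + linear_region_size), i.e. the case in
 * which the kernel check above does NOT warn.
 * Hypothetical helper for illustration, not a kernel function.
 */
static bool initrd_in_linear_map(uint64_t base, uint64_t size,
                                 uint64_t dram_start,
                                 uint64_t linear_region_size)
{
    /* Treat arithmetic overflow as "not accessible" (a choice of this sketch). */
    if (size > UINT64_MAX - base ||
        linear_region_size > UINT64_MAX - dram_start)
        return false;

    return base >= dram_start &&
           base + size <= dram_start + linear_region_size;
}
```

As a made-up example, with DRAM starting at 0x80000000 and a 2^38-byte linear region, a 32 MB initrd at 0x90000000 passes the check, while the same initrd loaded at 0x80000000000 fails it. In the failing case the kernel sets phys_initrd_size to 0, so the initrd is not used.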