2024 MSI Prestige 13 AI Evo A2VM - lhl/linuxlaptops GitHub Wiki
2024 MSI Prestige 13 AI+ Evo A2VM
(NOTE: Github Wiki does not like the "+" symbol in the title)
- Global Product Page: https://www.msi.com/Business-Productivity/Prestige-13-AI-plus-Evo-A2VMX
- Global Support Page: https://www.msi.com/Business-Productivity/Prestige-13-AI-plus-Evo-A2VMX/support
- US Support Page: https://us.msi.com/Business-Productivity/Prestige-13-AI-plus-Evo-A2VM/support?sku_id=95206#driver
Ordered from: https://www.newegg.com/global/jp-en/p/N82E16834156671?Item=N82E16834156671 (Ships to Japan w/ free shipping, charged standard 10% JCT)
Hardware
For the full hw-probe output, see: https://linux-hardware.org/?probe=9be89b2454
But, some abbreviated info on the A2VMG-014US SKU:
- Intel Core Ultra 7 258V (4P,4E)
- 32GB LPDDR5X-8533
- 1TB SSD (Micron 2400 2280 NVMe, PCIe Gen 4, QLC)
- 13.3" 2.8K (2880x1800), OLED (60Hz, 500 nits)
- 75Wh Battery (4-cell)
- 299 x 210 x 16.9 mm (WxDxH)
- 0.99kg
- I/O
- 2x Thunderbolt™ 4 (DisplayPort™/ Power Delivery 3.0)
- 1x Type-A USB3.2 Gen1
- 1x Micro SD Card Reader
- 1x HDMI™ 2.1 (8K @ 60Hz / 4K @ 120Hz)
- 1x Mic-in/Headphone-out Combo Jack
- Goodix Fingerprint Reader in Power Button
- 5MP IR webcam (30fps@1944p)
- Proximity Sensor
- Ambient Light Sensor
- Backlit Keyboard (White) with Copilot Key
- Intel® Killer™ BE Wi-Fi 7 + Bluetooth 5.4
- 2x 2W Speaker
Lunar Lake
This laptop uses Intel's latest and greatest SoC, with a drastically different chip architecture built to fend off Qualcomm and AMD by improving power efficiency and performance/watt for thin-and-lights (with high-performance applications being pushed to a forthcoming Arrow Lake H processor). Here are the highlights:
- Intel Arc Graphics 140V (Xe2) iGPU - this is in the same class of performance as the AMD Radeon 890M on AMD's latest Strix Point (Ryzen AI 9) processors, and an Nvidia GTX 1650 Max-Q processor. Hard to find accurate details, but one source says it has an estimated 8 FP16 TFLOPS (vs 890M's 11.8).
- Intel NPU 4 - this claims to have up to "48 peak TOPS" (INT8, supports half-speed FP16). There is a linux-npu-driver (AUR package)
https://www.youtube.com/watch?v=ymoiWv9BF7Q
SSD Replacement
1TB will be a bit cramped (one of the main tasks for this laptop is for offloading drone video), so I wanted to replace the SSD almost immediately. I ended up going with a Predator SSD GM7 M.2 4TB as the 1TB version was incredibly power efficient. The Lexar NM790 4TB was similarly efficient with almost identical controller and memory, but the GM7 was a bit cheaper and could arrive a day earlier, so that sealed the deal. Here are its power modes reported by smartctl:
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.50W - - 0 0 0 0 0 0
1 + 5.80W - - 1 1 1 1 0 0
2 + 3.60W - - 2 2 2 2 0 0
3 - 0.0500W - - 3 3 3 3 5000 10000
4 - 0.0025W - - 4 4 4 4 8000 41000
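Whether the two non-operational states (3 and 4) are actually used depends on the NVMe driver's APST latency budget. Here is a minimal sketch, assuming the Linux driver skips any non-operational state whose entry + exit latency exceeds nvme_core.default_ps_max_latency_us, and assuming that parameter is at its usual default of 100000 us (check /sys/module/nvme_core/parameters/default_ps_max_latency_us on your own system). The state values are copied from the table above; the function name is just illustrative.

```python
# Power states copied from the smartctl table above:
# (state, operational, max_power_watts, entry_latency_us, exit_latency_us)
POWER_STATES = [
    (0, True, 6.50, 0, 0),
    (1, True, 5.80, 0, 0),
    (2, True, 3.60, 0, 0),
    (3, False, 0.0500, 5000, 10000),
    (4, False, 0.0025, 8000, 41000),
]

# Assumption: the usual default of nvme_core.default_ps_max_latency_us.
DEFAULT_MAX_LATENCY_US = 100_000


def apst_eligible_states(states, max_latency_us=DEFAULT_MAX_LATENCY_US):
    """Return non-operational states whose entry + exit latency fits the budget."""
    eligible = []
    for state, operational, watts, entry_us, exit_us in states:
        if operational:
            continue  # only non-operational states are idle targets
        round_trip_us = entry_us + exit_us
        if round_trip_us <= max_latency_us:
            eligible.append((state, watts, round_trip_us))
    return eligible


for state, watts, round_trip_us in apst_eligible_states(POWER_STATES):
    print(f"PS{state}: {watts * 1000:.1f} mW max, {round_trip_us} us round trip")
```

Under those assumptions both low-power states fit the budget, so the drive should be able to idle down to the 2.5 mW state.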
CachyOS
This laptop of course came with Windows 11, and I ran a couple benchmark tests before getting rid of it.
- https://browser.geekbench.com/v6/cpu/compare/8055961
- SC: 2770
- MC: 10310
For this laptop, I'm giving CachyOS a try. It's a variant of Arch Linux that has a bunch of performance improvements.
CachyOS has both x86-64-v3 and x86-64-v4 repos available (zenver4 as well). AVX-512 is required for v4, so this Lunar Lake laptop probably doesn't support it:
❯ /lib/ld-linux-x86-64.so.2 --help | grep supported
x86-64-v3 (supported, searched)
x86-64-v2 (supported, searched)
Nope. Anyway, you will need to modify your pacman.conf to enable the x86-64-v3 repo.
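If you want to script the check above, here is a small sketch that pulls the supported levels out of the loader's help text. The function names and the sample text are mine; the sample mirrors the two lines shown above plus an unsupported v4 line, which is how I'd expect glibc to print it on this machine.

```python
import re


def supported_levels(ld_help_text):
    """Return the x86-64 levels the dynamic loader marks as supported."""
    pattern = re.compile(r"^\s*(x86-64-v\d)\s+\(supported", re.MULTILINE)
    return pattern.findall(ld_help_text)


def highest_level(ld_help_text):
    """Return the highest supported level, or None if none are listed."""
    levels = supported_levels(ld_help_text)
    return max(levels, key=lambda name: int(name[-1])) if levels else None


# Sample text (assumed shape): an unsupported level has no "(supported" suffix.
SAMPLE = """\
  x86-64-v4
  x86-64-v3 (supported, searched)
  x86-64-v2 (supported, searched)
"""

print(highest_level(SAMPLE))
```

To run it against the real loader, pass in the output of subprocess.run(["/lib/ld-linux-x86-64.so.2", "--help"], capture_output=True, text=True).stdout.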
Linux Compatibility
When I received my laptop (2024-09-30), a week after release, Lunar Lake was quite new and required a bleeding-edge kernel for support.
Drivers and FW
The default Arch Linux kernel when the laptop arrived was 6.11.1-arch1-1 and a fair number of things are broken under it, like the graphics and wifi drivers. These are fixed with the current linux-mainline (6.12rc1)
You will also need sof-firmware to get the sound working.
I'm using linux-firmware-git as well to troubleshoot.
I'm also using CachyOS's mesa-git
Webcam
The built-in webcam uses an OmniVision OV5675 sensor (ACPI HID OVTI5675) behind a TI TPS68470 PMIC, connected via the Intel IPU7. Getting it working on Linux required kernel patches for three things: IPU7 driver support, ipu-bridge sensor registration, and MSI-specific INT3472/TPS68470 board data (regulator mappings and GPIO configuration).
Current status: working with libcamera + pipewire-libcamera on a patched kernel (tested on 7.0.0-rc2-1-mainline-dirty). The camera shows up as "Built-in Front Camera" in Chrome and Firefox with no manual setup beyond enabling PipeWire camera support in each browser.
Full bring-up notes, ACPI reverse-engineering, patches, and usage guide: https://github.com/lhl/msi-prestige13-a2vm-webcam/
Upstream Status
IPU7 core driver: Merged in Linux 6.17 (staging, CONFIG_VIDEO_INTEL_IPU7). Provides base hardware support for Lunar Lake/Panther Lake IPU.
ipu-bridge OV5675 sensor config (Leif Skunberg): Merged into mainline (commit d6576b85d3fe), backported to stable 6.12+ via AUTOSEL. Adds OVTI5675 to ipu_supported_sensors[]. Originally submitted for the Lenovo ThinkPad X1 Fold 16 Gen 1.
MSI Prestige board data (Antti Laakso, Intel): A 5-patch series (v2, March 11 2026) adds the TPS68470 regulator/GPIO board data and I2C daisy-chain support for the Prestige 14 AI+ Evo. A follow-up patch (March 19 2026) extends the same board data to the Prestige 13 AI+ Evo A2VMG (MS-13Q3) and the Prestige 16 AI+ Evo B2VMG. The series received detailed review from Hans de Goede, Dan Scally, Bartosz Golaszewski, Sakari Ailus, and Ilpo Järvinen. As of May 2026, the board-data patches appear to still be under review — an issue report from a Prestige 14 user on kernel 6.18.4 (Fedora 43) still hit the missing board-data error. These patches will likely land in the 7.1 or later merge window.
What this means practically: On a stock kernel ≤7.0, the webcam won't work out of the box because the MSI-specific TPS68470 board data is missing. You'll need either Antti's patches or the local bring-up patches from the webcam repo until the board data is upstreamed.
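A quick way to sanity-check the first prerequisite (the IPU7 core driver) is to look at the kernel version and config. This is only a sketch: the helper names are made up, and having CONFIG_VIDEO_INTEL_IPU7 enabled is necessary but not sufficient, since the MSI board data described above is a separate piece.

```python
import re


def kernel_option(config_text, name):
    """Return the value ('y', 'm', ...) of a CONFIG_ option, or None if unset."""
    match = re.search(rf"^{re.escape(name)}=(\S+)$", config_text, re.MULTILINE)
    return match.group(1) if match else None


def kernel_at_least(release, major, minor):
    """Compare a `uname -r` style string against a major.minor version."""
    found = re.match(r"(\d+)\.(\d+)", release)
    if not found:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(found.group(1)), int(found.group(2))) >= (major, minor)


# Kernel release strings taken from this page.
print(kernel_at_least("6.11.1-arch1-1", 6, 17))
print(kernel_at_least("7.0.0-rc2-1-mainline-dirty", 6, 17))
```

On a running system you could feed kernel_option() the text from gzip.open("/proc/config.gz", "rt").read(), assuming your kernel exposes its config there, and platform.release() gives the release string.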
Related
- Upstream issue tracking IPU7 webcam support: https://github.com/intel/ipu7-drivers/issues/17
- Intel out-of-tree IPU7 drivers: https://github.com/intel/ipu7-drivers
- MSI Prestige 14 board-data missing report: https://github.com/intel/ipu6-drivers/issues/414
brltty
CachyOS loads brltty, which causes shutdown to hang. Here's how to fix that:
pacman -R brltty orca
dracut --hostonly --no-hostonly-cmdline /boot/initramfs-linux.img --force
Suspend
Resume Lockups
I'm seeing lockups on resume that involve RCU issues, which I've been trying to debug. Rather than hard-locking, resume seems to immediately send the laptop back into suspend mode.
The initial logs suggested that boltd.service might be involved, so I stopped the service and unloaded the Thunderbolt module before suspend (and did the reverse on resume), since this seemed to help in some other cases. However, this turned out not to be the fix for my problem.
I'm still diagnosing this.
logs for race https://chatgpt.com/c/66fcd1f3-a7b8-8012-88a3-0d1d16f49c61
Power Drain
Suspend battery drain, as measured by my https://github.com/lhl/batterylog tool, is quite high:
Slept for 8.01 hours
Used 7.17 Wh, an average rate of 0.89 W
At 0.89/Wh drain you battery would be empty in 45.24 hours
For your 75.99 Wh battery this is 1.18%/hr or 28.25%/day
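The rate figures above are simple arithmetic on the sleep duration, energy used, and battery capacity. A minimal sketch that reproduces them from the rounded values shown (this is not batterylog's actual code, and the function name is hypothetical):

```python
def suspend_drain(hours_asleep, wh_used, battery_wh):
    """Derive average drain figures from one suspend session."""
    watts = wh_used / hours_asleep
    pct_per_hour = 100 * watts / battery_wh
    return {
        "watts": watts,
        "pct_per_hour": pct_per_hour,
        "pct_per_day": pct_per_hour * 24,
        # Time to drain a full battery at this rate.
        "hours_from_full": battery_wh / watts,
    }


result = suspend_drain(hours_asleep=8.01, wh_used=7.17, battery_wh=75.99)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Because the inputs here are already rounded, the last digit can differ slightly from what the tool prints.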
How can we tune this? First go through power calibration to make sure powertop is working.
Then use the S0ixSelftestTool:
Power Usage
https://chatgpt.com/c/6702b30d-6e00-8012-aa9f-44bcf290e15e
Performance mode
❯ sudo python3 -c "import time; f='/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj'; s=int(open(f).read()); time.sleep(2); e=int(open(f).read()); print(f'{(e-s)/(2*1e6):.2f} W')"
29.21 W
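The one-liner above is easier to reuse as a small script. This sketch does the same two-sample measurement and also handles a single counter wraparound using max_energy_range_uj from the same powercap directory; reading energy_uj usually needs root. The function names are my own.

```python
import time
from pathlib import Path

RAPL_DIR = Path("/sys/class/powercap/intel-rapl/intel-rapl:0")


def average_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power between two energy_uj readings, allowing one wraparound."""
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The counter wrapped past its maximum between the two samples.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / seconds


def measure_package_watts(seconds=2.0, rapl_dir=RAPL_DIR):
    """Sample the package energy counter twice and return average watts."""
    max_range_uj = int((rapl_dir / "max_energy_range_uj").read_text())
    start_uj = int((rapl_dir / "energy_uj").read_text())
    time.sleep(seconds)
    end_uj = int((rapl_dir / "energy_uj").read_text())
    return average_watts(start_uj, end_uj, seconds, max_range_uj)
```

Run as root, print(f"{measure_package_watts():.2f} W") should give the same kind of reading as above.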
Here are the turbostat results now:
RAPL: 15420 sec. Joule Counter Range, at 17 Watts
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000088 (17 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x12800dc8640 (UNlocked)
cpu0: PKG Limit #1: ENabled (200.000 Watts, 28.000000 sec, clamp DISabled)
cpu0: PKG Limit #2: DISabled (37.000 Watts, 0.000977* sec, clamp DISabled)
cpu0: MSR_VR_CURRENT_CONFIG: 0x000002f8
cpu0: PKG Limit #4: 95.000000 Watts (UNlocked)
cpu0: MSR_DRAM_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: DRAM Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x85640000 (95 C) (100 default - 5 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88140808 (80 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x02000003 (100 C, 100 C)
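To double-check how turbostat arrives at the PKG limits, the raw MSR values can be decoded by hand. This is a sketch based on my reading of the RAPL register layout in Intel's SDM (power limit in bits 14:0, enable in bit 15, clamp in bit 16, time window as 2^Y * (1 + Z/4) time units, with the second limit in the upper 32 bits); the function names are illustrative and the raw values are the ones printed above.

```python
def decode_rapl_units(raw):
    """Decode MSR_RAPL_POWER_UNIT into (watts, joules, seconds) per count."""
    power_unit = 1 / (1 << (raw & 0xF))
    energy_unit = 1 / (1 << ((raw >> 8) & 0x1F))
    time_unit = 1 / (1 << ((raw >> 16) & 0xF))
    return power_unit, energy_unit, time_unit


def decode_limit(field, power_unit, time_unit):
    """Decode one 32-bit half of MSR_PKG_POWER_LIMIT."""
    y = (field >> 17) & 0x1F
    z = (field >> 22) & 0x3
    return {
        "watts": (field & 0x7FFF) * power_unit,
        "enabled": bool(field & (1 << 15)),
        "clamp": bool(field & (1 << 16)),
        "seconds": (1 << y) * (1 + z / 4) * time_unit,
    }


def decode_pkg_power_limit(raw, units_raw):
    """Split MSR_PKG_POWER_LIMIT into its two limits plus the lock bit."""
    power_unit, _, time_unit = decode_rapl_units(units_raw)
    return {
        "PL1": decode_limit(raw & 0xFFFFFFFF, power_unit, time_unit),
        "PL2": decode_limit((raw >> 32) & 0xFFFFFFFF, power_unit, time_unit),
        "locked": bool(raw & (1 << 63)),
    }


limits = decode_pkg_power_limit(0x12800DC8640, 0x000A0E03)
for name in ("PL1", "PL2"):
    print(name, limits[name])
```

This agrees with the turbostat lines: Limit #1 is enabled at 200 W over a 28 second window (so effectively uncapped), and Limit #2 is programmed to 37 W but disabled.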
Power Usage
This is a minimum usage type of scenario
50%
0%
100%
OLED
100% Inverse
setterm --background white --foreground black --clear
PowerTop
First, go to the Settings:Power in GNOME and disable:
- Screen Dimming
- Screen Blank
- Suspend after Inactivity
Then run:
powertop --calibrate
powertop
https://lore.kernel.org/lkml/[email protected]/
https://www.phoronix.com/review/intel-meteorlake-epp-perf
https://www.phoronix.com/news/Intel-MTL-EPP-Tuning-64
https://web.archive.org/web/20230614200816/https://01.org/blogs/qwang59/2018/how-achieve-s0ix-states-linux
https://web.archive.org/web/20230614200306/https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
https://github.com/system76/firmware-open/issues/506
https://forums.linuxmint.com/viewtopic.php?t=386187
https://stackoverflow.com/questions/28078711/switch-pci-device-to-d3-cold-d3cold-state
https://www.graniteriverlabs.com/en-us/technical-blog/thunderbolt-fv-modern-standby
Performance
Geekbench
Memory Bandwidth
mbw pmbw
Passmark
Performance BIOS
So interestingly setting things to max performance...
# Set everything to max
# Performance mode
sudo cpupower frequency-set -g performance
# EPB: 0
sudo x86_energy_perf_policy performance
Double check the status:
❯ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
❯ cat /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference
performance
performance
performance
performance
performance
performance
performance
performance
❯ sudo x86_energy_perf_policy -r
cpu0: EPB 0
cpu0: HWP_REQ: min 5 max 5 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu0: HWP_CAP: low 1 eff 18 guar 26 high 55
cpu1: EPB 0
cpu1: HWP_REQ: min 5 max 5 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu1: HWP_CAP: low 1 eff 18 guar 26 high 56
cpu2: EPB 0
cpu2: HWP_REQ: min 5 max 5 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu2: HWP_CAP: low 1 eff 18 guar 26 high 55
cpu3: EPB 0
cpu3: HWP_REQ: min 5 max 5 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu3: HWP_CAP: low 1 eff 18 guar 26 high 56
cpu4: EPB 0
cpu4: HWP_REQ: min 4 max 4 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu4: HWP_CAP: low 1 eff 18 guar 22 high 37
cpu5: EPB 0
cpu5: HWP_REQ: min 4 max 4 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu5: HWP_CAP: low 1 eff 18 guar 22 high 37
cpu6: EPB 0
cpu6: HWP_REQ: min 4 max 4 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu6: HWP_CAP: low 1 eff 18 guar 22 high 37
cpu7: EPB 0
cpu7: HWP_REQ: min 37 max 37 des 0 epp 0 window 0x0 (0*10^0us) use_pkg 0
cpu7: HWP_CAP: low 1 eff 18 guar 22 high 37
pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
pkg0: MSR_HWP_INTERRUPT: 0x00000005 (Excursion_Min-Disabled, Guaranteed_Perf_Change-ENabled)
pkg0: MSR_HWP_STATUS: 0x00000000 (No-Excursion_Min, No-Guaranteed_Perf_Change)
sysbench
| | Powersave | Performance | % Difference |
|---|---|---|---|
| Single-core | 3550.04 | 3615.52 | +1.84% |
| Multi-core | 35298.07 | 35550.51 | +0.72% |
sysbench 1.0.20 (using system LuaJIT 2.1.1720049189). Powersave: powersave, EPB=6. Performance: performance, EPB=0.
Here's what it looks like running stress -c 8
# CPU cores loaded - initial
❯ sudo cpupower monitor
...
| Nehalem || Mperf || RAPL || Idle_Stats
CPU| C3 | C6 | PC3 | PC6 || C0 | Cx | Freq || pack | dram | core | unco || POLL | C1_A | C2_A | C3_A
0| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 4200||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
1| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 4198||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
2| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 4197||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
3| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 4200||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
4| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3703||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
5| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3703||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
6| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3703||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
7| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3703||27595083| 26429|26116022| 0|| 0.00| 0.00| 0.00| 0.00
# Sustained
| Nehalem || Mperf || RAPL || Idle_Stats
CPU| C3 | C6 | PC3 | PC6 || C0 | Cx | Freq || pack | dram | core | unco || POLL | C1_A | C2_A | C3_A
0| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3452||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
1| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3521||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
2| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3456||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
3| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 3525||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
4| 0.00| 0.00| 0.00| 0.00|| 99.64| 0.36| 2701||16973040| 31006|15404868| 0|| 0.00| 0.03| 0.01| 0.00
5| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 2701||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
6| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 2701||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
7| 0.00| 0.00| 0.00| 0.00|| 99.68| 0.32| 2701||16973040| 31006|15404868| 0|| 0.00| 0.00| 0.00| 0.00
# Temps
❯ sensors
...
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +62.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +60.0°C (high = +100.0°C, crit = +100.0°C)
Core 4: +58.0°C (high = +100.0°C, crit = +100.0°C)
Core 8: +62.0°C (high = +100.0°C, crit = +100.0°C)
Core 12: +61.0°C (high = +100.0°C, crit = +100.0°C)
Core 32: +53.0°C (high = +100.0°C, crit = +100.0°C)
Core 33: +53.0°C (high = +100.0°C, crit = +100.0°C)
Core 34: +53.0°C (high = +100.0°C, crit = +100.0°C)
Core 35: +53.0°C (high = +100.0°C, crit = +100.0°C)
...
# Power
❯ sudo python3 -c "import time; f='/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj'; s=int(open(f).read()); time.sleep(5); e=int(open(f).read()); print(f'{(e-s)/(5*1e6):.2f} W')"
17.01 W
llama.cpp
First, let's build llama.cpp:
# 17W Balanced BIOS, balanced mode
❯ make clean && time make -j
________________________________________________________
Executed in 51.53 secs fish external
usr time 206.39 secs 282.00 micros 206.39 secs
sys time 7.51 secs 0.00 micros 7.51 secs
# 28W Performance BIOS, balanced mode
________________________________________________________
Executed in 43.59 secs fish external
usr time 172.84 secs 293.00 micros 172.84 secs
sys time 6.00 secs 143.00 micros 6.00 secs
# 5950X (-j32)
real 0m52.351s
user 3m51.819s
sys 0m21.703s
# 6900HX (-j16)
________________________________________________________
Executed in 69.49 secs fish external
usr time 387.87 secs 586.00 micros 387.87 secs
sys time 15.98 secs 248.00 micros 15.98 secs
CPU:
# -t 2 : 2 P-Cores
❯ ./llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 2 | pp512 | 12.94 ± 0.01 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 2 | tg128 | 5.28 ± 0.00 |
build: 71967c2a (3884)
# -t 4 : 4 P-Cores
❯ ./llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf -t 4
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 4 | pp512 | 19.19 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 4 | tg128 | 8.28 ± 0.16 |
build: 71967c2a (3884)
# -t 8 : 4 P-Cores 4 E-Cores
❯ ./llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf -t 8
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 8 | pp512 | 14.99 ± 1.06 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 8 | tg128 | 9.67 ± 0.07 |
build: 71967c2a (3884)
Some interesting results with the Performance BIOS settings. With 4 threads (4 P-cores), both pp and tg are better, so it was previously power limited. With 8 threads, tg is about 30% better than with 4 threads (that still comes out to only about 55GB/s, much lower than the theoretical MBW of 133GB/s)
❯ ./llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf -t 4
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 4 | pp512 | 24.91 ± 0.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 4 | tg128 | 11.89 ± 0.12 |
build: 71967c2a (3884)
❯ ./llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf -t 8
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 8 | pp512 | 14.85 ± 0.64 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 8 | tg128 | 15.49 ± 0.09 |
build: 71967c2a (3884)
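The bandwidth estimate comes from treating token generation as streaming the whole model through memory once per token. A rough sketch of that arithmetic, using the tg128 rate and model size from the table above and the 133GB/s theoretical figure quoted in the text (the function names are hypothetical, and this ignores KV cache traffic and any cache reuse):

```python
GIB = 1024 ** 3


def effective_bandwidth_gib_s(tokens_per_second, model_size_gib):
    """Rough estimate: each generated token reads all of the weights once."""
    return tokens_per_second * model_size_gib


def fraction_of_peak(tokens_per_second, model_size_gib, peak_gb_s):
    """Effective bandwidth as a fraction of a peak given in decimal GB/s."""
    achieved_gb_s = effective_bandwidth_gib_s(tokens_per_second, model_size_gib) * GIB / 1e9
    return achieved_gb_s / peak_gb_s


tg128 = 15.49      # t/s, 8 threads, Performance BIOS
model_gib = 3.56   # llama 7B Q4_0 size reported by llama-bench

print(f"{effective_bandwidth_gib_s(tg128, model_gib):.1f} GiB/s")
print(f"{fraction_of_peak(tg128, model_gib, 133.0):.0%} of the quoted peak")
```

Either way you slice the units, token generation is using well under half of the theoretical memory bandwidth.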
SYCL:
# Build
source /opt/intel/oneapi/setvars.sh
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
cmake --build build --config Release
# CPU -t 2
$ build/bin/llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | BLAS | 2 | pp512 | 22.97 ± 0.56 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | BLAS | 2 | tg128 | 5.32 ± 0.00 |
build: 71967c2a (3884)
# CPU -t 4
$ build/bin/llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf -t 4
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | BLAS | 4 | pp512 | 23.80 ± 0.35 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | BLAS | 4 | tg128 | 7.57 ± 0.10 |
build: 71967c2a (3884)
# CPU -t 8
$ build/bin/llama-bench -m ~/ai/models/gguf/llama-2-7b.Q4_0.gguf -t 8
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | BLAS | 8 | pp512 | 24.05 ± 0.11 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | BLAS | 8 | tg128 | 8.68 ± 0.02 |
build: 71967c2a (3884)
- On CPU, all cores seem to be used for prompt processing no matter the -t setting, and then the set number of cores for token generation
SYCL GPU:
$ clinfo -l
Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
`-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
`-- Device #0: Intel(R) Core(TM) Ultra 7 258V
Platform #2: Intel(R) OpenCL Graphics
`-- Device #0: Intel(R) Graphics [0x64a0]
$ sycl-ls
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 258V OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x64a0] OpenCL 3.0 NEO [24.35.30872]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x64a0] 1.5 [1.3.30872]
- Note the IPEX llama.cpp container sadly does not work for LNL: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md#start-docker-container
# sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 258V OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
Oof we need to fix our packages...
for basekit...
$ pacman -Qq | grep '^intel-oneapi-'
intel-oneapi-common
intel-oneapi-compiler-dpcpp-cpp-runtime
intel-oneapi-compiler-dpcpp-cpp-runtime-libs
intel-oneapi-compiler-shared
intel-oneapi-compiler-shared-runtime
intel-oneapi-compiler-shared-runtime-libs
intel-oneapi-dev-utilities
intel-oneapi-dpcpp-cpp
intel-oneapi-dpcpp-debugger
intel-oneapi-mkl
intel-oneapi-openmp
intel-oneapi-tbb
intel-oneapi-tcm
# More dependencies
paru -Rns $(paru -Qq | grep '^intel-oneapi-') onnxruntime blas-mkl blas64-mkl intel-opencl-runtime opencv
# OK, let's try the basekit
paru -S intel-oneapi-basekit
# Install stuff back in
paru -S onnxruntime blas-mkl blas64-mkl opencv
# OK, lets get in bash and compile
# https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#ii-build-llamacpp
source /opt/intel/oneapi/setvars.sh
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j -v
# Get devices
$ ./build/bin/llama-ls-sycl-device
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
found 1 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Graphics [0x64a0]| 1.5| 64| 1024| 32| 15064M| 1.3.30872|
# See: ./examples/sycl/run-llama2.sh
# PROBLEM:
ipex-llm
# wow really?
# https://github.com/pytorch/pytorch/issues/123097
mamba install mkl==2024.0
- https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/benchmark_quickstart.md
NPU
mamba create -n npu python=3.11
pip install openvino-genai==2024.4.0
pip install optimum[openvino,nncf]
python -m pip install intel-extension-for-pytorch
python -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
# Well I can get GPU working
https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/hello-npu/hello-npu.ipynb
# Models
https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html
https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html
git clone https://github.com/openvinotoolkit/openvino.genai.git
python openvino.genai/samples/python/benchmark_genai/benchmark_genai.py -m TinyLlama-1.1B-Chat-v1.0
- OpenVINO: https://docs.openvino.ai/2024/get-started/install-openvino.html?PACKAGE=OPENVINO_GENAI&VERSION=v_2024_4_0&OP_SYSTEM=LINUX&DISTRIBUTION=PIP
- https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html
- https://github.com/intel/intel-npu-acceleration-library
- https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.4.0%2bcpu&os=linux%2fwsl2&package=pip
- https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu
- broken, wants lsb_release
- https://github.com/openvinotoolkit/nncf
- https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html
- https://github.com/openvinotoolkit/openvino.genai
- https://docs.openvino.ai/2024/ovms_what_is_openvino_model_server.html
- https://medium.com/openvino-toolkit/how-to-run-openvino-on-a-linux-ai-pc-52083ce14a98
- https://raymondlo84.medium.com/running-llama2-on-cpu-with-openvino-125fbf10daa1
- https://docs.llamaindex.ai/en/stable/examples/llm/openvino/
Packages Installed
### PACSTRAP
base base-devel linux linux-firmware linux-headers intel-ucode
### CHROOT
fish vim neovim
# cd /usr/bin; ln -s nvim vi
networkmanager
openssh
avahi nss-mdns
paru # in cachyos
### INSTALL
# shell
byobu kitty
starship atuin
less ripgrep tree
keyd
tldr
# system info
btop htop dool
powertop powerstat turbostat
linux-tools
usbutils
smartmontools
lshw
bmon iftop nethogs
hw-probe
fastfetch
stress-ng s-tui
msr-tools
nvtop intel-gpu-tools
# basic
rsync wget
# hardware support
sof-firmware
linux-mainline linux-mainline-headers
bluez-tools
pipewire pipewire-alsa alsa-utils
fwupd
linux-firmware-git # why not
acpilight
# Energy tuning
auto-cpufreq thermald tuned tuned-ppd
gnome-power-manager
# GUI
gdm gnome
appimagelauncher
ulauncher
# Fonts
noto-fonts
noto-fonts-emoji
noto-fonts-cjk
nerd-fonts
ttf-ms-fonts
ttf-liberation
# intel GPU drivers
mesa intel-media-driver vulkan-intel
libva-utils clinfo mesa-utils vulkan-tools
intel-compute-runtime intel-opencl-runtime
# intel-oneapi-basekit is sort of integrated but has conflicts w/ runtime install
# intel-oneapi-hpckit is another megapackage
intel-oneapi-dpcpp-cpp intel-oneapi-compiler-shared-runtime intel-oneapi-mkl onednn onnx onnxruntime
blas-mkl blas64-mkl
# broken
# openvino
# Benchmarks
geekbench passmark-performancetest-bin
kdiskmark
sysbench
stress-ng
bandwidth mbw pmbw
# Media
ffmpeg mpv vlc
spotify
yt-dlp
# Browsers
firefox
firefox-developer
google-chrome
# Chat
discord vesktop slack-desktop-wayland
# Docker
docker docker-compose
BIOS Update
MSI BIOS updates can be downloaded from the US Support Page.
Via fwupd
MSI doesn't typically publish to LVFS, but it's worth checking since it's the easiest method:
fwupdmgr get-devices
fwupdmgr refresh
fwupdmgr get-updates
If an update shows up, just run fwupdmgr update and follow the prompts.
Via M-FLASH (EFI Partition)
If fwupd doesn't have the update (likely), you can use MSI's built-in M-FLASH utility. Rather than needing a USB drive, you can place the BIOS file directly on your EFI System Partition since it's already FAT32:
1. Download the BIOS zip from MSI's support page and extract it
2. Copy the BIOS file to your ESP:
sudo cp E13Q3IMS.112 /boot/efi/
(Adjust the ESP mount point if yours differs, e.g. /efi or /boot)
3. Reboot and enter BIOS setup (press Delete at boot)
4. Navigate to M-FLASH - it should see the ESP as a FAT32 volume and list the BIOS file
5. Select the file and follow the prompts to flash
Make sure the laptop is plugged in before flashing. After the update completes, clean up:
sudo rm /boot/efi/E13Q3IMS.112
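If you're not sure where your ESP is mounted, here is a small sketch that looks for the first common mount point that is mounted as vfat. The candidate list is just the three paths mentioned above, and the function name is hypothetical; a vfat mount at one of those paths is a strong hint, not proof, that it's the ESP.

```python
ESP_CANDIDATES = ("/boot/efi", "/efi", "/boot")


def find_esp(mounts_text, candidates=ESP_CANDIDATES):
    """Return the first candidate path that is mounted as vfat, else None."""
    vfat_mounts = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] == "vfat":
            vfat_mounts.add(fields[1])
    for path in candidates:
        if path in vfat_mounts:
            return path
    return None


# Made-up sample in /proc/mounts format, for illustration only.
SAMPLE_MOUNTS = """\
/dev/nvme0n1p2 / btrfs rw,noatime 0 0
/dev/nvme0n1p1 /boot vfat rw,relatime 0 0
"""

print(find_esp(SAMPLE_MOUNTS))
```

On a real system, call find_esp(open("/proc/mounts").read()) and use the result in place of /boot/efi in the cp and rm commands.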
Notes
I recently bought a Lunar Lake laptop myself (MSI Prestige 13 AI+ EVO w/ an Intel Core Ultra 7 258V - boy that naming is awful) and these performance results don't surprise me that much - LNL has 4 P-Cores locked at 17W. But, I mainly got it because it was a great ultralight config at <1kg of weight with a 75Wh battery. My powertop and powerstat testing has the whole laptop idling as low as 2.3W and under light use (text editing, browsing) in GNOME it seems to hang around 5-8W, which is not bad.
For me a bunch of stuff (GPU, WiFi) didn't work well w/ 6.11, so I had to go to 6.12rc1 mainline - I do think the Linux support is undercooked overall. Also, the suspend on my laptop is wonky. When it does work, it still burns about 0.9% battery/h (almost 30% battery/day), doesn't seem to ever get to PC10 and, running S0ixSelftestTool, only ever gets to S0i2.1. That isn't a dealbreaker, assuming suspend-then-hibernate can work, but it also seems to have intermittent RCU timeouts on resume that cause the laptop to immediately go back to suspend and never wake up (not great). Just normal Linux laptop things I guess, but hopefully some kernel updates can iron that out.
I haven't done much testing yet, but on the performance front, one thing that was surprising to me was that even with optimal settings, on CPU the llama.cpp tg128 result for Llama2 7B Q4_0 was only 9.67 tokens/s. This is significantly lower than on my 7940HS minipc which gets 14.42 tokens/s on CPU. Token generation should be mostly memory bandwidth bound, and Lunar Lake has 128-bit LPDDR5X-8533 vs my 7940HS's 128-bit DDR5-5600 so I would have expected the laptop to do a fair bit better (I ran bandwidth as a sanity check and past 2.5MB of sequential reads, MBW drops off a cliff for some reason).
Anyway, I think LNL hits the spot for its target market. For people looking for workstations I think ARL-H vs Strix (or better yet, Strix Halo) will be where the fun is.
06 October 2024, 11:36 AM
Interesting, how did you test the idle state, like what was still running?
By coincidence I just tested my 11th gen laptop and got it down to 2,9W (brightness 10%) or so but basically nothing running, even no Wifi.
But it's also bad with S0idle, it doesn't really work, like gpu crashing when waking up if it went lowest power state.
I tested the idle in TTY (with GDM running in the bg) while running powerstat. It's Arch-based so there's very little running in the background atm (mostly systemd stuff), but I did install tuned, thermald, and auto-cpufreq. powertop reports that all tuning parameters are 'GOOD'. What's somewhat impressive w/ the idle power is that I didn't turn the brightness down (I believe it was 50%), but it's an OLED display, so a black TTY should favor lower power consumption. Wifi was left on. Note, my old Framework laptop had a 1260P that could also idle quite low, as low as 2.6W in TTY w/ 0% backlight, but it was also at ~99.8% C10 when idling, whereas the new LNL chip doesn't seem to get below C7 atm (from turbostat results; powerstat only reports ACPI states (C1_ACPI-C3_ACPI)). Still, powerstat -d 0 -c -H 1 480 -R -D shows the CPU averaging 1.61W. Here's the breakdown from the RAPL reports:
| Uncore | Package | Core | DRAM | Platform |
|---|---|---|---|---|
| 0.00 | 0.60 | 0.02 | 0.03 | 0.95 |
Pretty impressive if I'm reading them correctly.
For your S0idle, I'd recommend trying out the S0ixSelftestTool (on Github) if you haven't yet, it might tell you something useful.
Oh also, in the BIOS I found a "Performance" setting which seems to disable the PL limits from turbostat's perspective. In practice, this seems to take the power from being locked at 17W to about 29.5W.
The fans get quite loud, but temps on the laptop hold steady at about 84C. Anyway, at this point, if I could figure out the intermittent resume issues I'd be a pretty happy camper.