PVE Proxmox Tips - hpaluch/hpaluch.github.io GitHub Wiki
PVE Proxmox is Linux + KVM (or LXC) virtualization. It is freely available from: https://www.proxmox.com/en/downloads
Tested version:
pveversion
pve-manager/5.1-41/0b958203 (running kernel: 4.13.13-2-pve)
Notable highlights:
- HTML5/JS web client - works from both Linux and Windows. The only exception is the SPICE console (requires the native virt-viewer binary). However, you can always use VNC as a fallback.
- KVM/QEMU supports memory overcommit well - see Hypervisor Memory overcommit tests. The only comparable hypervisor (in this respect) is ESXi from VMware.
- also supports LXC containers (no HW virtualization needed - least possible overhead)
- also supports Software QEMU mode (slow, but useful when you have no HW virtualization - for example when running as nested VM on older CPU).
Traps
LVM-thin warnings
Proxmox by default creates an lvm-thin LV (the logical volume where thin VMs will be placed) called data in VG pve. It may later cause serious issues:
If you plan to deploy lvm-thin on regular HDD you should definitely place at least Metadata on SSD. It is official advice from Zdenek (current lvm-thin maintainer) from: https://listman.redhat.com/archives/linux-lvm/2022-October/026291.html
... - there is rolling update of bTrees which is by design 'seek unfriendly' - so for the performance hunting users the use of SSD/NVMe type storage for these metadata volumes is basically a must (and it's been designed for that).
Otherwise the performance of VMs will seriously suffer as the disk fills up.
In my case - when I used lvm-thin for several years on 1TB SATA Seagate HDD (default Proxmox install - both Data and Metadata on HDD), problems started when I extended lvm-thin to most of the disk and filled around 80%. Symptoms were:
- Windows 11 & Windows 2022 guests booting and running extremely slowly (boot took several minutes just after fresh install)
- testing all of the available cache= modes, including unsafe, did not improve boot time (!)
- however, when I moved the same VM to ext4 "dir" storage in .raw format, it was a different story. With any read-cache mode (including cache=writethrough) I got superior performance - boot in around 10s (sometimes I had no chance to see the boot logo, only the login screen, when I initiated a VNC or SPICE connection to the VM).
Diagnostics:
- there was just high load average (over 2.00 when only 1 VM was booting and then running) and high system CPU usage (around 20%).
- however there was no clear pointer to lvm-thin as culprit - iowait was around 2% (iostat), similarly (pidstat) did not report LVM (dm-thin processes) as cause
- so in my case it was pure luck that I moved the VM storage from lvm-thin to ext4/raw, enabled any cache mode (even writethrough helps a lot) and proved that way that lvm-thin was the cause of the performance regression
Also do not overflow Metadata:
- never fill up Metadata - watch the Meta% column in lvs -a output. It is a known "feature" of lvm-thin that filling up metadata leads to metadata corruption, where manual action is necessary - see examples on:
Also never fill-up Data (that's not bug but rather feature of thin and over-provisioning):
- when data is filling up, lvm-thin will invoke callback that should increase space in LV, but you have to create such callback first (normally it is no-op)
- when Data fills-up it will for short time suspend I/O on such LV (but only for short time)
- after that timeout it will drop pending I/O requests and report errors (can be seen with dmesg). But! If you use any of the writeback/unsafe cache modes, the VM has already got confirmation that the data were written successfully (because a write to cache never fails and the I/O layer has to return immediately to the caller), but the data were actually lost. This will lead to massive data corruption in the VM.
- if this happens to you, you should not just increase Data for lvm-thin but also restore all affected VMs (those that wrote any data to the filled-up lvm-thin) from backup! Do not attempt to just ignore this problem! I really mean that!
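To watch both Data% and Meta% without reading lvs output by hand, a small filter like the one below may help. It is only a sketch: the function name check_thin_usage and the 80% limit are my assumptions, and it expects three columns (LV name, data percent, metadata percent) on standard input.

```shell
#!/bin/sh
# Sketch: warn when Data% or Meta% of a thin pool reaches a limit.
# Expects lines "LV_NAME DATA_PERCENT META_PERCENT" on stdin.
# ASSUMPTION: function name and default limit (80) are illustrative only.
check_thin_usage() {
    limit="${1:-80}"
    awk -v limit="$limit" '
        NF >= 3 {
            if ($2 + 0 >= limit) printf "WARNING: %s Data%% is %s\n", $1, $2
            if ($3 + 0 >= limit) printf "WARNING: %s Meta%% is %s\n", $1, $3
        }'
}

# On a Proxmox host with the default pool pve/data:
#   lvs --noheadings -o lv_name,data_percent,metadata_percent pve/data | check_thin_usage 80

# Demo with a made-up sample line (not real measurements):
printf '  data 81.25 3.10\n' | check_thin_usage 80
```

Such a check could be run from cron, so you get a warning well before the pool is full.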
And finally (not bug of lvm-thin):
- CloneZilla backup software is unable to backup your machine if any of disk contains lvm-thin partition (not just disk you want to backup)
- it will first halt for long time trying to examine every LV in lvm-thin
- it will later crash while trying to again examine and backup disk...
However if you are aware of all above points:
- always place at least Metadata (or both Data and Metadata) on SSD
- never fill-up Metadata - you will have to manually intervene and restore corrupted metadata
- if you fill up Data you will have to restore all VMs that tried to write data at that time, because some writes will be lost without the VM knowing it (when any kind of write cache mode is used). Also, even if the VM detects the I/O error it will likely end up with a corrupted FS, because it is an unexpected event
- never use CloneZilla to backup computer with lvm-thin LVs
You may still find scenarios where lvm-thin will do its job, for example prototyping and testing new OS...
Emulated SATA partition corruption
Never(!) use emulated SATA in Proxmox VE. It will sooner or later corrupt your MBR partition on disk (it really happened to us). It was fixed only recently:
- https://git.proxmox.com/?p=pve-qemu.git;a=commitdiff;h=816077299c92b2e20b692548c7ec40c9759963cf;hp=ef3308db717b00df6dfc25495a6263703fa84d67
- https://mail.gnu.org/archive/html/qemu-devel/2023-08/msg04264.html
- https://forum.proxmox.com/threads/vm-efi-boot-corrupt.82922/
- https://bugzilla.proxmox.com/show_bug.cgi?format=multiple&id=2874
Other
If you use LVM and/or ZFS, please be aware that you can't put 2 HDDs with installed Proxmox to single PC. It will fail with error "duplicate VG" or with error "duplicate ZFS pool"...
In my case I rather renamed VG:
- original pointer is here:
- booting Proxmox install ISO in Advanced mode
- exiting 1st RAMdisk shell (pretty useless)
- in 2nd shell (much more useful) I issued
vgrename pve pvessd
- and mounted original FS and updated VG names in its /etc/fstab
- next you have to bind mount /dev, /proc and /sys
- enter chroot
- call update-grub in chroot
- exit chroot
- unmount all filesystems in chroot
- reboot
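The steps above can be sketched as a script. This is not the exact sequence I typed, only a reconstruction under assumptions: the root LV is called root (Proxmox default), /etc/fstab references /dev/pve/... or /dev/mapper/pve-... paths, and the mount point /mnt/pve-root is made up. By default it only prints the commands (DRY_RUN=1), so you can review them first.

```shell
#!/bin/sh
# Sketch of the VG rename + chroot steps, run from the installer's 2nd shell.
# ASSUMPTIONS: root LV is "root", mount point /mnt/pve-root is hypothetical,
# fstab uses /dev/pve/... or /dev/mapper/pve-... paths.
set -eu

DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands, 0 = execute them
NEW_VG="${NEW_VG:-pvessd}"
TARGET="${TARGET:-/mnt/pve-root}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rename_and_fix_grub() {
    run vgrename pve "$NEW_VG"
    run vgchange -ay "$NEW_VG"
    run mkdir -p "$TARGET"
    run mount "/dev/$NEW_VG/root" "$TARGET"
    # update VG name in fstab of the mounted system
    run sed -i -e "s|/dev/pve/|/dev/$NEW_VG/|g" \
               -e "s|/dev/mapper/pve-|/dev/mapper/$NEW_VG-|g" "$TARGET/etc/fstab"
    for d in dev proc sys; do
        run mount --bind "/$d" "$TARGET/$d"
    done
    run chroot "$TARGET" update-grub
    for d in sys proc dev; do
        run umount "$TARGET/$d"
    done
    run umount "$TARGET"
}

rename_and_fix_grub
```

On a UEFI install you would probably also have to mount the EFI system partition inside the chroot before calling update-grub.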
Putty SSH ciphers error
If you get SSH connection error like
Couldn't agree a client-to-server cipher ...
then you need to upgrade your PuTTY client (had success with putty-64bit-0.70-installer.msi from https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html)
List and download PVE LXC appliances
To use Linux containers (LXC based) you need to download appliance filesystem archive
Use this command to list LXC appliances:
pveam available
Use this command to download specific appliance:
pveam download local debian-9.0-standard_9.3-1_amd64.tar.gz
NOTE: The local argument is your storage name. You can list available storages using command:
pvesm status
You can then create new LXC container using this template. Example using Web UI:
- logon to your Proxmox Web UI at
https://IP_ADDRESS:8006
- click on "Create CT" (blue button at right-top corner of webpage)
- on "General" tab fill:
- "Password"
- "Confirm Password"
- on "Template" tab fill:
- "Storage" - keep "local" if possible
- "Template" - there should be only one option - the previously downloaded debian-9.0-standard_9.3-1_amd64.tar.gz
- keep defaults on "Root Disk" (lvm-thin), "CPU" and "Memory" tabs
- on "Network" tab remember to either fill IPv4 (or IPv6) address or choose DHCP (if you have DHCP server on your network)
- "DNS" - I have available "use host settings" only so it is easy
- click on "Confirm" tab and then click on "Finish" button
- wait for creation to complete:
- on "Status" tab you should see "stopped: OK"
Now you can:
- expand in tree "Data Center" -> "pve" -> "100 (CT100)"
- click on "Start" button
- click on "Console" (or "Console JS") to login to your container
- in case of our Debian you can query the container IP address using command:
ip addr
NOTE: the above Debian 9 image does not allow root to log in via SSH - so you need to either create a non-root user to log in, or set PermitRootLogin yes in /etc/ssh/sshd_config of your LXC
Getting container info from Proxmox SSH connection:
- list LXC containers:
lxc-ls 100
- list information about container 100:
lxc-info --name 100
NOTE: grep for the IP: line to get the IP address of the container.
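The grep from the note above can be wrapped into a tiny helper. It is just a sketch: the function name lxc_ips is mine, and it assumes that lxc-info prints the address on lines starting with IP:.

```shell
#!/bin/sh
# Sketch: print only the IP address(es) from "lxc-info" output read on stdin.
# ASSUMPTION: the helper name lxc_ips is illustrative only.
lxc_ips() {
    awk '$1 == "IP:" { print $2 }'
}

# On the Proxmox host (container 100 as in the text):
#   lxc-info --name 100 | lxc_ips

# Demo with made-up sample output:
printf 'Name:           100\nState:          RUNNING\nIP:             192.168.0.77\n' | lxc_ips
```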
Quick Download of ISO vm
To have new VM you typically need installation ISO. You can use either web UI to upload ISO this way:
- logon to your Proxmox Web UI at
https://IP_ADDRESS:8006
- expand/click on "Datacenter" -> "pve" -> "local"
- click on "Upload" on right-pane to Upload your ISO
Or you can download ISO image directly to your proxmox storage:
- SSH to your Proxmox
- cd to ISO directory and download installation ISO image, for example:
cd /var/lib/vz/template/iso
wget http://ftp.linux.cz/pub/linux/debian-cd/9.4.0/amd64/iso-cd/debian-9.4.0-amd64-netinst.iso
NOTE: If your download fails you can continue it (without need to download whole file again) using the -c argument, for example:
wget -c http://ftp.linux.cz/pub/linux/debian-cd/9.4.0/amd64/iso-cd/debian-9.4.0-amd64-netinst.iso
Now you can create your first VM using this ISO, for example:
- logon to your Proxmox Web UI at
https://IP_ADDRESS:8006
- click on "Create VM" blue button at the right-top of web page
- click on "OS" tab
- select your "Iso image:" - debian-9.4.0-amd64-netinst.iso
- click on "Hard Disk", "CPU", "Memory", "Network", "Confirm" tabs and on "Finish" button.
- expand "Data Center" -> "pve" -> "101 (VM 101)"
- click on "Start"
- click on "Console" and follow standard Debian installation
PVE SSH commands to get basic info about VMs:
qm list
# 101 is VMID from "qm list"
qm config 101
Listing available templates
Use this command to list installed LXC and ISO templates:
# note: "local" is storage name
pvesm list local
Install QEMU Agent
QEMU Agent is used to run commands (for example shutdown...) in guest from Host.
Please see https://pve.proxmox.com/wiki/Qemu-guest-agent for guide.
You can then run Agent commands from Proxmox SSH, for example:
qm agent 101 network-get-interfaces
Enable backups for local storage
It is a two-step operation:
- enable backup for local storage:
pvesm set local --content rootdir,images,backup,iso
- set maximum number of backups to 99 (should be more than enough):
pvesm set local --maxfiles 99
Set of misc. scripts
WARNING! Always customize and/or double-check these scripts before running them !!!
Script batch_create.sh to create multiple VMs from single backup ("full clone" from backup):
#!/bin/bash
set -e
# creates VMs 102..115 from the same backup archive
for i in $(seq 2 15)
do
    vmid=$((100 + i))
    set -x
    qm create "$vmid" \
        --archive /var/lib/vz/dump/vzdump-qemu-101-2018_05_09-17_07_13.vma.lzo \
        --unique 1
    # --name can't be specified together with the --archive option...
    qm set "$vmid" --name "CentOS7.4MariaBench-$i"
    set +x
done
echo "ALL Done"
exit 0
Script batch_remove.sh to quickly remove VMs from 102 to 112(!):
#!/bin/bash
set -e
for i in $(seq 2 12)
do
    vmid=$((100 + i))
    set -x
    qm destroy "$vmid"
    set +x
done
echo "ALL DONE"
exit 0
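Script batch_start_and_wait.sh to start a sequence of VMs and wait for each one (till QEMU agent responds):
#!/bin/bash
set -e
t1=$(mktemp)
# remove the temporary file on exit
trap 'rm -f "$t1"' EXIT
for i in $(seq 106 110)
do
    echo -n "Starting $i: "
    qm start "$i"
    while true
    do
        if qm agent "$i" ping 2> "$t1"
        then
            echo
            break
        fi
        # if the error is not a timeout, grep fails and "set -e" aborts the script
        grep -Fq timeout "$t1"
        echo -n "."
        sleep 1
    done
done
echo "All done"
exit 0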
Script shutdown_all_running.sh to shutdown all running VMs using QEMU Guest Agent's Shutdown command:
#!/bin/bash
set -e
# 3rd column of "qm list" is the VM status
for i in $(qm list | awk '$3 == "running" {print $1}')
do
    set -x
    qm agent "$i" shutdown
    set +x
done
echo "Shutdown requested for all running VMs"
exit 0
SPICE - limit resolution under CentOS7
I have Proxmox VE 6.2-15 host.
My CentOS 7.9.2009 (Core) guest uses:
- Display: SPICE (qxl)
- or on command line:
qm config VMID_NUMBER | egrep '^vga:'
vga: qxl
I want to reduce ridiculously high console resolution to something sane. I had great luck with documentation on: https://wiki.archlinux.org/index.php/kernel_mode_setting
Here is step by step guide:
- verify that kernel really uses qxl drm device:
lsmod | grep qxl
qxl 59032 1
ttm 96673 1 qxl
drm_kms_helper 186531 1 qxl
drm 456166 4 qxl,ttm,drm_kms_helper
- there must be qxl driver and usage count must be at least 1
- now we need to find used video output - using modified scriptlet from https://wiki.archlinux.org/index.php/kernel_mode_setting
for p in /sys/class/drm/*/status; do con=${p%/status}; echo -n "${con#*/card?-}: "; cat $p; done | egrep ':\s+connected'
Virtual-1: connected
- now we can add (again from ArchLinux wiki) proper video= settings to grub configuration - update line below in /etc/default/grub:
GRUB_CMDLINE_LINUX="video=Virtual-1:800x600"
- (of course change your output and resolution to suit your needs)
- regenerate boot grub2 configuration using:
grub2-mkconfig -o /boot/grub2/grub.cfg
- reboot your computer using init 6 and watch your SPICE console (using virt-viewer associated to *.vv extension)
Enable os-prober
I share my Proxmox installation with other Linux distributions and therefore I need os-prober to automatically add those in Grub menu.
However in Proxmox os-prober is disabled and not installed, to avoid errors caused by scanning LVM (where VMs are installed - lvm-thin storage).
To enable it we need to:
- install os-prober using (obvious):
apt-get install os-prober
- Now we have to edit os-prober script to disable scanning LVM (my foreign OS - Gentoo Linux - is installed in a normal primary partition):
--- /usr/bin/os-prober.orig 2022-11-13 08:50:14.873216283 +0100
+++ /usr/bin/os-prober 2022-11-13 08:51:45.234407275 +0100
@@ -70,10 +70,10 @@
 	fi
 	# Also detect OSes on LVM volumes (assumes LVM is active)
-	if type lvs >/dev/null 2>&1; then
-		echo "$(LVM_SUPPRESS_FD_WARNINGS=1 log_output lvs --noheadings --separator : -o vg_name,lv_name |
-			sed "s|-|--|g;s|^[[:space:]]*\(.*\):\(.*\)$|/dev/mapper/\1-\2|")"
-	fi
+	#if type lvs >/dev/null 2>&1; then
+	#	echo "$(LVM_SUPPRESS_FD_WARNINGS=1 log_output lvs --noheadings --separator : -o vg_name,lv_name |
+	#		sed "s|-|--|g;s|^[[:space:]]*\(.*\):\(.*\)$|/dev/mapper/\1-\2|")"
+	#fi
 }
- Finally we have to enable os-prober by changing line in /etc/default/grub.d/proxmox-ve.cfg to:
GRUB_DISABLE_OS_PROBER=false
- And run update-grub
- now you should have detected other OSes installed.
Comparing Proxmox backends - "benchmark"
I decided to compare the best speed of various Proxmox backends (settings to get max speed, not data safety!):
- ext4 (Dir backend) - using these mount flags:
commit=60,noatime,barrier=0,data=writeback
- lvm-thin
- zfs
WARNING! It is not real benchmark. It just gives me little clue what can I expect from various Proxmox backends...
I use:
- MB: K9N Platinum, MSI 7250
- CPU: AMD X2 (dual-core, 2GHz)
- PCIe AHCI SATA 3 controller:
# Numeric IDs
03:00.0 0106: 1b21:1164 (rev 02) (prog-if 01 [AHCI 1.0])
Subsystem: 2116:2116
# Names
03:00.0 SATA controller: ASMedia Technology Inc. Device 1164 (rev 02) (prog-if 01 [AHCI 1.0])
Subsystem: ZyDAS Technology Corp. Device 2116
- details here: https://www.axagon.eu/en/produkty/pces-sa4x4
- chipset (from above page): ASMedia ASM1164
- SATA3 SSD: KINGSTON SA400S37480G, 480GB
- 8 GB RAM
Hypervisor: Proxmox VE 7.4-3 (May 2023)
Disabled mitigations for both Host and Guest in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"
And run sudo update-grub
Tested VM:
- latest Debian 11
- recommended settings:
  - Options -> Use tablet for pointer: No (USB eats lot of CPU, even when Idle)
  - Hotplug: Disabled
- 1 CPU core (host type for passthrough), 2GB RAM
- filesystem: BTRFS with default settings relatime,space_cache,subvolid=256,subvol=/@rootfs
- testing package build (mc):
  - source build installed:
mkdir ~/src
cd ~/src
sudo apt-get install devscripts dpkg-dev
sudo apt-get build-dep mc
apt-get source mc
  - full rebuild of package with
cd ~/src/mc-4.8.26
time debuild -i -us -uc -b
General results:
- Proxmox iowait barely touches 1% (!)
- But System is around 30% in guest, and 20% in host
Ext4 backend results:
- build time:
real 8m49.175s
user 5m50.623s
sys 2m47.223s
  - 2nd run - very consistent(!)
real 8m58.381s
user 5m53.755s
sys 2m53.494s
- under ZFS backend:
real 9m8.706s
user 5m56.599s
sys 3m0.121s
  - 2nd run was even worse:
real 9m40.666s
user 6m12.011s
sys 3m16.522s
- under LVM-thin backend (Proxmox default)
real 8m58.277s
user 5m55.074s
sys 2m51.104s
  - 2nd run worse, but not so much:
real 9m17.272s
user 6m8.327s
sys 2m58.437s
Trying the Debian 11 guest on ext4 instead of BTRFS shows no measurable difference. Here is the result:
- guest fs: ext4
- Proxmox backend: ext4 (dir)
- build time:
real 8m57.992s
user 5m51.097s
sys 2m56.637s
Retest: Proxmox Host installed just on single ext4 LV (removed lvm-thin, /var/lib/vz uses main root LV, using qcow2), and same Debian 11 VM on ext4:
- build time:
real 8m35.107s
user 5m44.194s
sys 2m42.038s
- results are very consistent - on 2nd run just +/- 1 second.
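To put the numbers above into perspective, the "real" times can be converted to seconds and compared in percent. The helpers below are only a sketch (names to_seconds and pct_slower are mine); the input values are the first-run times for ext4, ZFS and LVM-thin from this page.

```shell
#!/bin/sh
# Sketch: compare "real" build times such as 8m49.175s.
# ASSUMPTION: helper names are illustrative only.

# Convert "XmY.ZZZs" to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN { split(t, a, /[ms]/); printf "%.3f\n", a[1] * 60 + a[2] }'
}

# Print how many percent OTHER is slower than BASE (both in seconds).
pct_slower() {
    awk -v b="$1" -v o="$2" 'BEGIN { printf "%.1f\n", (o - b) * 100 / b }'
}

ext4=$(to_seconds 8m49.175s)
zfs=$(to_seconds 9m8.706s)
thin=$(to_seconds 8m58.277s)

echo "ZFS      vs ext4: +$(pct_slower "$ext4" "$zfs")%"
echo "LVM-thin vs ext4: +$(pct_slower "$ext4" "$thin")%"
```

The differences between backends are only a few percent on first runs, which is in the same range as the run-to-run variation.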
Disable KSM
Kernel Samepage Merging (KSM) attempts to deduplicate identical memory pages of VMs. However, it comes at the expense of CPU usage. Because I have only 2 cores and I rarely run more than 1 VM, I disabled it by following: https://pve.proxmox.com/wiki/Kernel_Samepage_Merging_(KSM)
- script ./disable-ksm.sh
#!/bin/bash
# https://pve.proxmox.com/wiki/Kernel_Samepage_Merging_(KSM)
set -xeuo pipefail
systemctl disable --now ksmtuned
echo 2 > /sys/kernel/mm/ksm/run
exit 0
Accessing partitions on LVM-Thin
Scenario:
- have installed Fedora 39 on lvm-thin
- want to mount that volume directly on Proxmox VE and backup it with tar
The problem is how to tell the kernel to recognize partitions inside an LVM volume.
- solution is described here: https://serverfault.com/questions/440287/accessing-the-partitions-on-an-lvm-volume
- use kpartx -a LVM_PATH to add partitions to the device mapper
Example for Fedora 39, using GPT UEFI:
# NOTE: disk-0 is holding UEFI variables, so OS disk is disk-1 !
echo p | fdisk /dev/mapper/pveiron-vm--103--disk--1
...
Disklabel type: gpt
...
Device Start End Sectors Size Type
/dev/mapper/pveiron-vm--103--disk--1-part1 2048 1230847 1228800 600M EFI System
/dev/mapper/pveiron-vm--103--disk--1-part2 1230848 30590975 29360128 14G Linux filesystem
/dev/mapper/pveiron-vm--103--disk--1-part3 30590976 33552383 2961408 1.4G Linux swap
# now list partitions that will be mapped:
kpartx -l /dev/mapper/pveiron-vm--103--disk--1
pveiron-vm--103--disk--1p1 : 0 1228800 /dev/mapper/pveiron-vm--103--disk--1 2048
pveiron-vm--103--disk--1p2 : 0 29360128 /dev/mapper/pveiron-vm--103--disk--1 1230848
pveiron-vm--103--disk--1p3 : 0 2961408 /dev/mapper/pveiron-vm--103--disk--1 30590976
# Finally map these partitions
kpartx -a /dev/mapper/pveiron-vm--103--disk--1
# mount filesystem of interest
mount -r /dev/mapper/pveiron-vm--103--disk--1p2 /mnt/source/
mkdir -p /PATH_TO_BACKUPS/fedora39-in-vm
tar -cva --numeric-owner --one-file-system -f /PATH_TO_BACKUPS/fedora39-in-vm/f39-rootfs.tar.zst -C /mnt/source .
WARNING! The above tar command does not include extended attributes (required for SELinux), but I will disable SELinux.
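If you want to keep extended attributes and also clean up the mappings afterwards, the whole sequence could look like the sketch below. It is not what I ran: the function name is made up, the GNU tar options --xattrs, --selinux and --acls are added to address the warning above, and the p<N> partition suffix is assumed to match what kpartx -l printed. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch: map partitions, back up one of them with xattrs, then clean up.
# ASSUMPTIONS: function name is illustrative; partition mapping is named
# "<LV>p<N>" as in the kpartx -l output above; GNU tar is used.
set -eu

DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_lv_partition() {
    lv="$1"; part="$2"; mnt="$3"; archive="$4"
    run kpartx -a "$lv"
    run mkdir -p "$mnt" "$(dirname "$archive")"
    run mount -r "${lv}p${part}" "$mnt"
    run tar -cva --xattrs --selinux --acls --numeric-owner --one-file-system \
        -f "$archive" -C "$mnt" .
    # clean up: unmount and remove the partition mappings again
    run umount "$mnt"
    run kpartx -d "$lv"
}

backup_lv_partition /dev/mapper/pveiron-vm--103--disk--1 2 /mnt/source \
    /PATH_TO_BACKUPS/fedora39-in-vm/f39-rootfs.tar.zst
```

Note that with set -e a failing tar stops the script before the cleanup, so in that case umount and kpartx -d have to be run by hand.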
Getting Storage usage from CLI
When using pvesm status it displays disk usage in bytes, which is not very useful for humans. But based on https://forum.proxmox.com/threads/how-to-alert-on-disk-space.90982/ I found that this simple command produces nice output:
pvesh get /nodes/`hostname`/storage
┌──────────────────────────┬───────────┬─────────┬────────┬────────────┬─────────┬────────┬────────────┬────────────┬───────────────┐
│ content │ storage │ type │ active │ avail │ enabled │ shared │ total │ used │ used_fraction │
╞══════════════════════════╪═══════════╪═════════╪════════╪════════════╪═════════╪════════╪════════════╪════════════╪═══════════════╡
│ images,rootdir │ ssd-thin │ lvmthin │ 1 │ 42.37 GiB │ 1 │ 0 │ 56.27 GiB │ 13.90 GiB │ 24.70% │
├──────────────────────────┼───────────┼─────────┼────────┼────────────┼─────────┼────────┼────────────┼────────────┼───────────────┤
│ images,vztmpl,backup,iso │ local │ dir │ 1 │ 24.02 GiB │ 1 │ 0 │ 103.78 GiB │ 74.54 GiB │ 71.83% │
├──────────────────────────┼───────────┼─────────┼────────┼────────────┼─────────┼────────┼────────────┼────────────┼───────────────┤
│ rootdir,images │ local-lvm │ lvmthin │ 1 │ 154.47 GiB │ 1 │ 0 │ 392.75 GiB │ 238.28 GiB │ 60.67% │
└──────────────────────────┴───────────┴─────────┴────────┴────────────┴─────────┴────────┴────────────┴────────────┴───────────────┘
All storage units are nicely formatted - you can see both GiB's and percentages. Tested on Proxmox VE 8.1-4.
Tip: There is also one handy parameter to specify output format:
man pvesh
...
--output-format <json | json-pretty | text | yaml> (default = text)
I plan to use some simple monitoring (probably monit?) to send alerts when disk usage exceeds a specific threshold.
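Until such monitoring is in place, a threshold check can be sketched with plain awk on top of pvesm status. The function name and the 70% limit are my assumptions; the filter relies only on the first column (storage name) and the last column (usage in percent). The sample lines reuse the percentages from the table above, the other numbers are made up.

```shell
#!/bin/sh
# Sketch: list storages whose usage exceeds a threshold (in percent).
# Reads "pvesm status" output on stdin; uses 1st column (name) and last column (%).
# ASSUMPTION: function name and default threshold are illustrative only.
storage_over_threshold() {
    awk -v limit="${1:-80}" '
        NR == 1 { next }                      # skip header line
        {
            pct = $NF
            sub(/%$/, "", pct)                # strip trailing percent sign
            if (pct + 0 > limit + 0) printf "%s %s%%\n", $1, pct
        }'
}

# On the Proxmox host:
#   pvesm status | storage_over_threshold 70

# Demo with sample data (sizes are made up):
sample='Name             Type     Status           Total            Used       Available        %
ssd-thin      lvmthin     active        59000000        14570000        44430000   24.70%
local             dir     active       108800000        78160000        25190000   71.83%
local-lvm     lvmthin     active       411800000       249850000       161950000   60.67%'
printf '%s\n' "$sample" | storage_over_threshold 70
```

The output (one line per storage over the limit) can then be mailed from cron or checked by monit.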
Add NAT Network with custom DHCP and DNS
Sometimes it is useful to have dedicated NAT network with custom DHCP and DNS server for VMs, for example to log DNS queries. We will partially follow https://pve.proxmox.com/wiki/Network_Configuration and also [[Proxmox in Azure]]
Here is my original standard /etc/network/interfaces with standard bridge vmbr0:
auto lo
iface lo inet loopback
iface enp0s8 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.168.0.51/24
gateway 192.168.0.1
bridge-ports enp0s8
bridge-stp off
bridge-fd 0
iface enp0s9 inet manual
To add NAT network append to it this:
# Experimental NAT network
auto vmbr2
iface vmbr2 inet static
address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j MASQUERADE
Please note that -o vmbr0 on the last 2 lines must specify a routable interface that has Internet access - normally vmbr0.
If you enable firewall on VM you also need to add these two lines to the vmbr2 definition in /etc/network/interfaces:
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
See https://pve.proxmox.com/wiki/Network_Configuration for details.
Now we will use dnsmasq to provide DHCP and DNS server for NAT network
- WARNING! System wide dnsmasq may clash with Proxmox SDN feature. Do not use configuration below if you also use Proxmox SDN!
- install dnsmasq with:
apt-get install dnsmasq
- create new configuration file /etc/dnsmasq.d/nat.conf with contents:
listen-address=10.10.10.1
# below specify interface where is NAT network:
interface=vmbr2
log-queries
log-dhcp
dhcp-range=10.10.10.100,10.10.10.200,12h
# set gateway in DHCP response;
dhcp-option=option:router,10.10.10.1
# set dnsmasq as DNS server:
dhcp-option=option:dns-server,10.10.10.1
# DNS: Do NOT return IPv6 addresses (AAAA)
filter-AAAA
# register static IP for specific MAC address
#dhcp-host=11:22:33:44:55:66,192.168.0.60
# add custom DNS entry
#address=/double-click.net/127.0.0.1
- if you disabled resolvconf you may have to also uncomment in /etc/default/dnsmasq:
IGNORE_RESOLVCONF=yes
Now reboot to both:
- apply new /etc/network/interfaces with NAT network on bridge vmbr2
- apply new configuration for dnsmasq
After reboot:
- to see log output, use
journalctl -u dnsmasq
- to test it on VM replace original bridge vmbr0 with our NAT bridge vmbr2 in VM -> Options -> Hardware -> Network
- then boot VM and ensure that it uses DHCP assigned IP. In case of RHEL and clones you can use nmtui (NetworkManager Text User Interface) to change IPv4 configuration from "Manual" to "Automatic"
- verify that assigned IP is in expected range (10.10.10.100 to 10.10.10.200), that gateway is 10.10.10.1 (with ip r command) and that DNS is correct (inspecting /etc/resolv.conf for example)
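The range check from the last step can also be scripted inside the guest. This is only a sketch: the function name in_dhcp_range is mine and the range is hard-coded to match the dhcp-range line from nat.conf above.

```shell
#!/bin/sh
# Sketch: succeed only if the given IPv4 address is in 10.10.10.100 - 10.10.10.200.
# ASSUMPTION: function name is illustrative; range matches dhcp-range in nat.conf.
in_dhcp_range() {
    case "$1" in
        10.10.10.*) last="${1#10.10.10.}" ;;
        *) return 1 ;;
    esac
    # last octet must consist of digits only
    case "$last" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$last" -ge 100 ] && [ "$last" -le 200 ]
}

# Inside the guest, check every global IPv4 address:
#   for ip in $(ip -4 -o addr show scope global | awk '{print $4}' | cut -d/ -f1); do
#       in_dhcp_range "$ip" && echo "$ip: OK (from NAT DHCP range)"
#   done

if in_dhcp_range 10.10.10.150; then echo "10.10.10.150: in range"; fi
```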
Enabling Debug mode in pvestatd
The pvestatd service does many important housekeeping jobs including VMs balloon management when memory allocation exceeds 80%. To see memory goals (every 10s) and other debug info you can enable debug mode with:
systemctl edit pvestatd
- paste following lines to uncommented area:
[Service]
ExecStart=
ExecStart=/usr/bin/pvestatd start --debug 1
# Fork will no longer work in debug mode
Type=simple
- note: the first (empty) ExecStart= line clears the list of commands (otherwise the new command would be appended to the list)
- finally restart service with
systemctl restart pvestatd
- and watch incoming messages using:
journalctl -f -u pvestatd
Tip: To print memory in Megabytes instead of Bytes try this patch:
--- usr/share/perl5/PVE/Service/pvestatd.pm.orig 2024-07-21 07:15:03.996548609 +0000
+++ usr/share/perl5/PVE/Service/pvestatd.pm 2024-07-21 07:22:57.255070977 +0000
@@ -206,6 +206,7 @@
my ($vmstatus) = @_;
my $log = sub { $opt_debug and printf @_ };
+ my $fmt = sub { sprintf("%.2f MB",$_[0]/1024/1024) };
my $hostmeminfo = PVE::ProcFSTools::read_meminfo();
# NOTE: to debug, run 'pvestatd -d' and set memtotal here
@@ -214,7 +215,8 @@
# try to use ~80% host memory; goal is the change amount required to achieve that
my $goal = int($hostmeminfo->{memtotal} * 0.8 - $hostmeminfo->{memused});
- $log->("host goal: $goal free: $hostfreemem total: $hostmeminfo->{memtotal}\n");
+ $log->("host goal: %s free: %s total: %s\n",
+ $fmt->($goal), $fmt->($hostfreemem), $fmt->($hostmeminfo->{memtotal}));
my $maxchange = 100*1024*1024;
my $res = PVE::AutoBalloon::compute_alg1($vmstatus, $goal, $maxchange);
@@ -224,7 +226,7 @@
my $current = int($vmstatus->{$vmid}->{balloon});
next if $target == $current; # no need to change
- $log->("BALLOON $vmid to $target (%d)\n", $target - $current);
+ $log->("BALLOON $vmid to %s (%s)\n", $fmt->($target), $fmt->($target - $current));
eval { PVE::QemuServer::Monitor::mon_cmd($vmid, "balloon", value => int($target)) };
warn $@ if $@;
}
Get memory usage from CLI
How to get value of Datacenter -> Node -> Summary -> RAM Usage on CLI:
Script name show_used_ram.sh:
#!/bin/bash
set -euo pipefail
echo -e "Node\t\tRAM Usage Pct"
pvesh get /cluster/resources --output-format json-pretty |
jq -r '.[] | select(.type == "node") | [.id,(10000*.mem/.maxmem|round/100)] | @tsv'
exit 0
If you have wide terminal you can try:
pvesh get /cluster/resources
To see all resources.
Development
How to get source:
- install devel packages with:
apt-get install dpkg-dev
- recommended: create a non-privileged user and become that user:
/usr/sbin/useradd -m -s /bin/bash dev
su - dev
- now as dev do checkout and run the make task to fetch submodules:
mkdir -p ~/src
cd ~/src
# below is only master package project that uses submodules for kernel and zfs:
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
# prepared task to retrieve submodules for kernel and zfs:
# Warning! It will download around 2GB of data!
make submodule
- kernel is under submodules/ubuntu-kernel and ZFS stuff under submodules/zfsonlinux
Transparent Huge Pages - THP
Work in Progress...
When Virtual Memory (paging) support was introduced on the i386 CPU, the page size was fixed to 4KB, because the first 386 computers had barely 4MB of RAM. The problem is that on each "new" page access the address must be translated using a PDE (Page Directory Entry) and PTE (Page Table Entry) for each such 4KB page (once the mapping is known, the CPU can use the TLB to avoid the costly translation on each access).
Imagine a PC with 4GB RAM. Using 4KB pages means there are about 1 million pages - which has serious overhead.
Additionally, old CPUs have no HAP (Hardware Assisted Paging), which means that each access to a new page must be trapped by the Hypervisor and translated (using so called "Shadow Pages").
To remedy this problem, 2MB pages were introduced (and later even larger ones).
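The page counts are easy to verify with shell arithmetic. A minimal sketch (the function name pages_needed is mine, sizes are given in KiB):

```shell
#!/bin/sh
# Sketch: number of pages needed to map a given amount of RAM.
# Both arguments are in KiB. ASSUMPTION: function name is illustrative only.
pages_needed() {
    ram_kib="$1"
    page_kib="$2"
    echo $(( ram_kib / page_kib ))
}

ram_4g=$(( 4 * 1024 * 1024 ))      # 4 GiB expressed in KiB

pages_needed "$ram_4g" 4           # 4 KiB pages -> 1048576
pages_needed "$ram_4g" 2048        # 2 MiB pages -> 2048
```

So with 2MB pages the same 4GB of RAM needs 512 times fewer mappings.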
Under Linux there are 2 ways to utilize Huge Pages:
- Mostly obsolete: hugetlbfs - pages exposed as filesystem
- Transparent Huge Pages (THP) - integrated into system, default behavior depends on the switch below.
To see if THP are available, try:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
Boot default can be set with the kernel boot parameter transparent_hugepage=KEYWORD.
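The active mode is the keyword in square brackets. For scripts it can be extracted with sed, as in this sketch (the function name thp_active_mode is mine):

```shell
#!/bin/sh
# Sketch: print the active THP mode (the keyword in square brackets).
# ASSUMPTION: function name is illustrative only.
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# On a real system:
#   thp_active_mode < /sys/kernel/mm/transparent_hugepage/enabled

echo 'always [madvise] never' | thp_active_mode    # prints: madvise
```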
On my Proxmox Host I see:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ fgrep Huge /proc/PID_OF_KVM/status
HugetlbPages: 0 kB
$ fgrep Huge /proc/meminfo | fgrep -vw 0
AnonHugePages: 3082240 kB
Hugepagesize: 2048 kB
Inside Guest (openSUSE LEAP 15.5 with GitLab):
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ fgrep Huge /proc/meminfo | fgrep -vw 0
AnonHugePages: 266240 kB
Hugepagesize: 2048 kB
Hmm, only around 256MB of HugePages used.
However this helped:
# On Proxmox Host:
$ echo always > /sys/kernel/mm/transparent_hugepage/enabled
# started Linux VM (5GB RAM in VM settings):
# and watched on Host:
$ fgrep HugePages /proc/meminfo | grep -vw 0
AnonHugePages: 2408448 kB
About 2.3GB in Huge Pages (host has 8GB RAM) - looks promising.
Inside guest:
$ fgrep Huge /proc/meminfo | fgrep -vw 0
AnonHugePages: 278528 kB
Hugepagesize: 2048 kB
Works even for Windows guest.
On the Host, System CPU% dropped to half (from 40% to 20% while booting the VM!).
Resources
- https://www.linux-kvm.org/images/9/9e/2010-forum-thp.pdf
- https://www.percona.com/blog/benchmark-postgresql-with-linux-hugepages/
Linux guest tips
Running Linux guest under KVM is paradoxically more challenging than running BSD or Windows guest. Why? Because Linux (including guest) tends to eat all free memory for Cache and/or Buffers - which puts significant memory pressure on Host (Proxmox).
I have found that decreasing vm.swappiness in guest does NOT help.
What however helped is increasing vm.vfs_cache_pressure from default 100 to 300.
Why? When the guest is short on memory (for example running GitLab CI or Artifact checks or LFS checks) it will more easily evict cache, avoiding very costly guest swapping (that kills Proxmox host performance globally - as expected). Reading data again from disk into cache is significantly faster than swapping in (or even swapping out).
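To make the setting survive a reboot it can be written to a sysctl drop-in file inside the guest. A minimal sketch - the function name and the file name 90-vfs-cache-pressure.conf are my assumptions, any *.conf name under /etc/sysctl.d should do:

```shell
#!/bin/sh
# Sketch: write a sysctl drop-in that raises vm.vfs_cache_pressure.
# ASSUMPTIONS: function name and file name are illustrative only.
# Arguments: [config directory] [value]
write_cache_pressure_conf() {
    conf_dir="${1:-/etc/sysctl.d}"
    value="${2:-300}"
    printf 'vm.vfs_cache_pressure = %s\n' "$value" \
        > "$conf_dir/90-vfs-cache-pressure.conf"
}

# As root inside the guest:
#   write_cache_pressure_conf
#   sysctl --system                 # reload all sysctl configuration files
#   sysctl vm.vfs_cache_pressure    # verify the current value
```

For a quick test without a config file, sysctl -w vm.vfs_cache_pressure=300 changes the value until the next reboot.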
Unfortunately the whole mechanism is poorly documented - there is "shrinker" infrastructure in many places (including fs/) called centrally from mm/vmscan.c.
Some little information is available here:
Also found old but useful:
There also exists the vmstat -m command to show various caches, but again I did not find more details on how to use it for tuning...