k8s_storage - henk52/knowledgesharing GitHub Wiki

Kubernetes storage

Introduction

Purpose

References

Note: the nodes need the iSCSI admin tools (open-iscsi) and an NFS client; see "Pre-requirements for installation" below.

Vocabulary

Longhorn

Open issues for Longhorn

  • How is data replicated?
  • When is data replicated?
  • What happens when a pod is started on a node without a replica?
  • How do we add Prometheus scraping of Longhorn metrics?
  • Which engine version are we using, v1 or v2?
  • How does a container/pod talk to a volume in general?

Longhorn overview

  • Architecture and Concepts

  • UI

  • Engine

    • The Longhorn Engine always runs in the same node as the Pod that uses the Longhorn volume.
      • The engine synchronously replicates the volume across the multiple replicas stored on multiple nodes.
    • The v2 engine is built on the Storage Performance Development Kit (SPDK).
    • Each engine manages one volume.
  • The Longhorn CSI driver takes the block device, formats it, and mounts it on the node. Then the kubelet bind-mounts the device inside a Kubernetes Pod. This allows the Pod to access the Longhorn volume.

  • A Longhorn volume does not shrink when content is removed from it.

    • For example, if you create a 20 GB volume, use 10 GB, and then remove 9 GB of content, the actual size on disk is still 10 GB instead of 1 GB.
  • It seems the engine is paused when creating a new replica; see "How New Replicas are Added" in the Longhorn documentation.
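
From the Pod's point of view, the CSI flow described above is reached through an ordinary PersistentVolumeClaim. The sketch below prints a claim and a Pod that mounts it. It assumes the default StorageClass named longhorn that the Helm chart normally creates. The function name, the busybox image and the /data mount path are illustrative and not taken from this setup.

```shell
#!/bin/sh
# Print a PVC plus a Pod that mounts it (illustrative names and image).
# Usage: longhorn_demo_manifest NAME SIZE
longhorn_demo_manifest() {
  name="$1"
  size="$2"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: ${size}
---
apiVersion: v1
kind: Pod
metadata:
  name: ${name}-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ${name}
EOF
}

# Example (needs a cluster, so it is left commented out):
#   longhorn_demo_manifest demo 1Gi | kubectl apply -f -
```

Once the Pod is running, the kubelet has bind-mounted the formatted Longhorn block device at /data inside the container.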

Installing longhorn

Pre-requirements for installation

See: Requirements, Quick Installation

On all nodes:

  • open-iscsi
  • iscsid daemon
  • NFSv4 client
  • Host filesystem formatted as XFS or ext4
  • bash, curl, findmnt, grep, awk, blkid, lsblk
  • Mount propagation must be enabled.
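
A quick way to spot missing tools on a node before installing is sketched below. The function name missing_tools is made up for this page. The default tool list is the one above plus iscsiadm, which open-iscsi (iscsi-initiator-utils) provides.

```shell
#!/bin/sh
# Print every required tool that is NOT found on PATH, one per line.
# With no arguments, the tools from the requirements list are checked.
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- bash curl findmnt grep awk blkid lsblk iscsiadm
  fi
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example, run as root on a node so that /sbin and /usr/sbin are on PATH:
#   missing_tools
```

Empty output means every tool was found. The remaining items need separate checks, for example systemctl is-active iscsid for the daemon. Mount propagation can probably be inspected with findmnt -no PROPAGATION /, which should report shared.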

ansible-playbook playbooks/longhorn_requirements.yml -u root -b -v -i kubespray/inventory/test/hosts.yml --private-key=~/.ssh/test_ops

---
- name: Configure longhorn requirements
  hosts: all
  become: yes

  tasks:
    - name: Install iscsi-initiator-utils without scripts
      ansible.builtin.shell:
        cmd: dnf --setopt=tsflags=noscripts install -y iscsi-initiator-utils
      args:
        creates: /usr/sbin/iscsiadm

    - name: Generate iSCSI initiator name
      ansible.builtin.command:
        cmd: /sbin/iscsi-iname
      register: iscsi_initiator_name
      changed_when: false

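    - name: Configure initiator name
      ansible.builtin.copy:
        content: "InitiatorName={{ iscsi_initiator_name.stdout }}\n"
        dest: /etc/iscsi/initiatorname.iscsi
        owner: root
        group: root
        mode: '0600'
        # iscsi-iname returns a new random name on every call, so never
        # overwrite a name that is already in place.
        force: false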

    - name: Enable and start iscsid service
      ansible.builtin.systemd:
        name: iscsid
        enabled: yes
        state: started

    - name: Load iscsi_tcp kernel module
      community.general.modprobe:
        name: iscsi_tcp
        state: present

    - name: Install nfs-utils
      ansible.builtin.dnf:
        name: nfs-utils
        state: present

    - name: Load nfs kernel module
      community.general.modprobe:
        name: nfs
        state: present

    - name: Install Cryptsetup
      ansible.builtin.dnf:
        name: cryptsetup
        state: present

    - name: Install Device Mapper Userspace Tool
      ansible.builtin.dnf:
        name: device-mapper
        state: present

    - name: Load dm_crypt kernel module
      community.general.modprobe:
        name: dm_crypt
        state: present

Then verify the nodes with the Longhorn preflight checker:

~/longhornctl check preflight
INFO[2026-01-30T20:20:45+08:00] Initializing preflight checker               
INFO[2026-01-30T20:20:45+08:00] Cleaning up preflight checker                
INFO[2026-01-30T20:20:45+08:00] Running preflight checker                    
WARN[2026-01-30T20:21:19+08:00] Failed to get pod container log               container=output-longhornctl error="Get \"https://10.26.101.186:10250/containerLogs/longhorn-system/longhorn-preflight-checker-2wjrv/output-longhornctl\": dial tcp 10.26.101.186:10250: i/o timeout" kind=DaemonSet name=longhorn-preflight-checker namespace=longhorn-system pod=longhorn-preflight-checker-2wjrv
WARN[2026-01-30T20:21:49+08:00] Failed to get pod container log               container=output-longhornctl error="Get \"https://10.26.101.71:10250/containerLogs/longhorn-system/longhorn-preflight-checker-6f7xl/output-longhornctl\": dial tcp 10.26.101.71:10250: i/o timeout" kind=DaemonSet name=longhorn-preflight-checker namespace=longhorn-system pod=longhorn-preflight-checker-6f7xl
WARN[2026-01-30T20:22:19+08:00] Failed to get pod container log               container=output-longhornctl error="Get \"https://10.26.101.174:10250/containerLogs/longhorn-system/longhorn-preflight-checker-d66tt/output-longhornctl\": dial tcp 10.26.101.174:10250: i/o timeout" kind=DaemonSet name=longhorn-preflight-checker namespace=longhorn-system pod=longhorn-preflight-checker-d66tt
INFO[2026-01-30T20:22:20+08:00] Retrieved preflight checker result:
node1:
  info:
  - '[KubeDNS] Kube DNS "coredns" is set with 2 replicas and 2 ready replicas'
  - '[IscsidService] Service iscsid is running'
  - '[MultipathService] multipathd.service is not found (exit code: 4)'
  - '[MultipathService] multipathd.socket is not found (exit code: 4)'
  - '[NFSv4] NFS4 is supported'
  - '[Packages] nfs-utils is installed'
  - '[Packages] iscsi-initiator-utils is installed'
  - '[Packages] cryptsetup is installed'
  - '[Packages] device-mapper is installed'
  - '[KernelModules] nfs is loaded'
  - '[KernelModules] iscsi_tcp is loaded'
  - '[KernelModules] dm_crypt is loaded' 
INFO[2026-01-30T20:22:20+08:00] Cleaning up preflight checker                
INFO[2026-01-30T20:22:20+08:00] Completed preflight checker       
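
  • Note: iscsi-iname generates a new random name on every call, so the playbook must not overwrite an existing /etc/iscsi/initiatorname.iscsi (copy with force: false); otherwise the initiator name changes on every run.
  • TODO: check what impact a changed initiator name has on a reboot.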

Longhorn configurations

Installing longhorn via helm

  • Install with Helm

  • helm repo add longhorn https://charts.longhorn.io

  • helm repo update

  • export KUBECONFIG=~/.kube/longhorn_test

  • helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.11.0

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/heko/.kube/longhorn_test
E0206 17:02:27.717677 2319823 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
...
E0206 17:02:43.740110 2319823 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0206 17:02:43.811645 2319823 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAME: longhorn
LAST DEPLOYED: Fri Feb  6 17:02:27 2026
NAMESPACE: longhorn-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Longhorn is now installed on the cluster!

Please wait a few minutes for other Longhorn components such as CSI deployments, Engine Images, and Instance Managers to be initialized.

Visit our documentation at https://longhorn.io/docs/
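
While waiting for the remaining components, a small filter over the pod list shows what is still starting. This is a sketch: pods_not_ready is a made-up helper that only looks at the STATUS column of kubectl get pods --no-headers.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS (third column) is neither Running nor Completed.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example (needs a cluster, so it is left commented out):
#   kubectl -n longhorn-system get pods --no-headers | pods_not_ready
```

Empty output means every pod in the namespace reports Running or Completed. It does not check the READY column, so a pod with failing readiness probes would still pass.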

Troubleshooting longhorn

Multi-Attach error for volume

This is shown for about a minute, then it comes back.

Multi-Attach error for volume "pvc-ee166920-785f-4312-b1c8-1a9965457c8a" Volume is already exclusively attached to one node and can't be attached to another. (40s)
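
This error typically appears with ReadWriteOnce volumes when a pod is rescheduled to another node before the old attachment has been released. To see which node Kubernetes still records the volume on, the VolumeAttachment objects can be filtered as sketched below. The helper name nodes_for_pv is made up for this page.

```shell
#!/bin/sh
# Read lines of "PV NODE ATTACHED" on stdin and print the node(s) that the
# given PV is recorded as attached to.
# Usage: ... | nodes_for_pv PV_NAME
nodes_for_pv() {
  awk -v pv="$1" '$1 == pv { print $2 " attached=" $3 }'
}

# Example (needs a cluster, so it is left commented out):
#   kubectl get volumeattachments --no-headers \
#     -o custom-columns=PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached \
#     | nodes_for_pv pvc-ee166920-785f-4312-b1c8-1a9965457c8a
```

If the old node is still listed while the new pod is pending, the error should clear once that attachment is removed, which would match the roughly one-minute delay seen here.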