Kdump_for_Linux_diskless_nodes - xcat2/xcat-core GitHub Wiki
The following sections are for the internal code changes.
Add the dump attribute to the linuximage schema. Users can set or change the dump attribute of an image with the chdef command.
Disable the kdump service by default.
chroot $rootimg_dir chkconfig kdump off
Create a fake fsck.nfs command that always returns true, if fsck.nfs does not already exist in the root image.
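The fake command step could look roughly like the sketch below. The function name ensure_fake_fsck_nfs and the sbin location inside the root image are assumptions made for illustration; the real genimage code may place the file elsewhere.

```shell
# Hypothetical helper: create a placeholder fsck.nfs inside a root image.
# $1: root image directory. The sbin/ location is an assumption.
ensure_fake_fsck_nfs() {
    rootimg_dir=$1
    target="$rootimg_dir/sbin/fsck.nfs"

    # Leave an existing fsck.nfs untouched.
    if [ -e "$target" ]; then
        return 0
    fi

    mkdir -p "$rootimg_dir/sbin" || return 1
    # The placeholder does no checking and always reports success.
    printf '#!/bin/sh\nexit 0\n' > "$target" || return 1
    chmod 755 "$target"
}
```

A call such as ensure_fake_fsck_nfs "$rootimg_dir" is safe to repeat, because an existing file is never overwritten.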
Update the code for nodeset <noderange> osimage=<osimagename>. If the dump attribute is set for the corresponding image, add the kernel parameter crashkernel=128M@32M to the boot configuration file (on platforms that use yaboot, this file is /tftpboot/etc/<nodename>), and then append a second kernel parameter, dump=<dump value>.
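As a rough illustration of the two kernel parameters, the sketch below builds the string that would be appended to the kernel command line. The real change lives in the Perl plugins; the function name kdump_kernel_params and the sample dump value in the usage note are hypothetical.

```shell
# Hypothetical helper: print the kernel parameters to add when the image
# has a dump attribute. Prints nothing when the attribute is empty.
# $1: value of the dump attribute of the osimage.
kdump_kernel_params() {
    dump_value=$1
    if [ -z "$dump_value" ]; then
        return 0
    fi
    # crashkernel reserves memory for the capture kernel; dump carries the
    # target so that the postscript on the node can read it back later.
    printf 'crashkernel=128M@32M dump=%s\n' "$dump_value"
}
```

For example, kdump_kernel_params "nfs://10.1.0.1/install/kdump" prints both parameters on one line, ready to be appended to the existing kernel arguments.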
While the node boots, the enablekdump postscript starts the kdump service. For RHEL6, the postscript also applies a workaround to generate the initial ramdisk for kdump. It parses /proc/cmdline; if dump= is found, it parses the value and updates the /etc/kdump.conf file accordingly. After /etc/kdump.conf is updated, the kdump service is started with the command:
/etc/init.d/kdump start
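The command-line parsing done by the postscript can be sketched as follows. This is a minimal example rather than the actual enablekdump code; the function name get_dump_value is hypothetical, and the optional file argument exists only so the logic can be tried on a sample file.

```shell
# Hypothetical helper: print the value of the dump= kernel parameter.
# $1: file holding the kernel command line (defaults to /proc/cmdline).
# Returns 1 when no dump= parameter is present.
get_dump_value() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each parameter on its own line, then keep the first dump= value.
    value=$(tr -s ' \t' '\n\n' < "$cmdline_file" | sed -n 's/^dump=//p' | head -n 1)
    if [ -z "$value" ]; then
        return 1
    fi
    printf '%s\n' "$value"
}
```

A postscript could then guard the kdump setup with something like: if dump_target=$(get_dump_value); then ... fi.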
For SLES11, a workaround is also needed to generate the initial ramdisk for kdump. The enablekdump postscript parses /proc/cmdline; if dump= is found, it parses the value and updates the /etc/sysconfig/kdump file accordingly. After /etc/sysconfig/kdump is updated, the kdump service is started with the command:
/etc/init.d/boot.kdump start
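One possible way to update the sysconfig file is sketched below. The text does not say which variable is changed; the sketch assumes it is KDUMP_SAVEDIR and that the dump value can be used as its URL unchanged. The function name set_kdump_savedir is hypothetical.

```shell
# Hypothetical helper: set KDUMP_SAVEDIR in a sysconfig-style file.
# Assumption: KDUMP_SAVEDIR is the variable that holds the dump target.
# $1: path of the config file, $2: dump target URL.
set_kdump_savedir() {
    conf=$1
    url=$2
    [ -f "$conf" ] || return 1
    tmp=$(mktemp) || return 1

    # Keep every other setting, drop any old KDUMP_SAVEDIR line,
    # then append the new value at the end of the file.
    grep -v '^KDUMP_SAVEDIR=' "$conf" > "$tmp" || true
    printf 'KDUMP_SAVEDIR="%s"\n' "$url" >> "$tmp"

    # Write back in place so the file keeps its owner and mode.
    rc=0
    cat "$tmp" > "$conf" || rc=$?
    rm -f "$tmp"
    return $rc
}
```

Rewriting the whole file instead of editing it with sed avoids having to escape the slashes in the URL.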
Before the kdump service is started, an NFS directory is mounted on /var/tmp, which the mkdumprd command uses as a temporary directory while it generates the initial ramdisk for kdump. The NFS directory is readable and writable. The $xcatmaster:/install/kdump/tmp directory is created when the xCAT package is installed; because the /install directory is exported by default, $xcatmaster:/install/kdump/tmp is readable and writable as well. After the kdump service has started successfully, the NFS directory is unmounted from /var/tmp, so the workaround does not affect the running node.
For rhels6.1 the kdump service needs /tmp instead of /var/tmp for this workaround.
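The mount, start, unmount sequence and the rhels6.1 exception can be summarized in a dry-run sketch that only prints the commands instead of running them. The function names, the plain mount -t nfs invocation, and the idea of passing the OS version as a string are assumptions for illustration.

```shell
# Hypothetical helper: choose the scratch directory for the workaround.
# $1: OS version string, for example rhels6.1.
kdump_scratch_dir() {
    case $1 in
        rhels6.1) echo /tmp ;;
        *)        echo /var/tmp ;;
    esac
}

# Hypothetical helper: print (not run) the workaround steps.
# $1: xCAT master, $2: OS version, $3: command that starts kdump.
print_kdump_workaround_plan() {
    scratch=$(kdump_scratch_dir "$2")
    echo "mount -t nfs $1:/install/kdump/tmp $scratch"
    echo "$3"
    echo "umount $scratch"
}
```

Calling print_kdump_workaround_plan "$xcatmaster" rhels6.1 "/etc/init.d/kdump start" lists the three steps with /tmp as the mount point; any other version string selects /var/tmp.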
The enablekdump postscript adds link_delay = 180 to /etc/kdump.conf. Some network cards take a long time to initialize, and some spanning-tree-enabled networks do not transmit user traffic for long periods after a link state change. This optional parameter defines how long the initramfs waits after a link is activated before it attempts to transmit user data.
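A minimal sketch of adding the parameter without duplicating it is shown below. It assumes that kdump.conf directives are written as an option name followed by a space and the value, so it writes link_delay 180; the function name add_link_delay is hypothetical.

```shell
# Hypothetical helper: add a link_delay line to a kdump.conf-style file.
# Assumption: directives use the "option value" form.
# $1: path of the config file, $2: delay (defaults to 180).
add_link_delay() {
    conf=$1
    delay=${2:-180}
    # Do nothing when a link_delay directive is already present.
    if grep -q '^link_delay[[:space:]]' "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'link_delay %s\n' "$delay" >> "$conf"
}
```

Because of the check, the postscript can run again on the same node without appending a second link_delay line.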
On SLES, the boot.kdump service is configured through the /etc/sysconfig/kdump file. The boot.kdump script under /etc/init.d calls mkdumprd -K "$kdump_kernel" -I "$kdump_initrd" -q to create the initrd used by kdump (referred to here as kdumpinit). mkdumprd in turn calls /sbin/mkinitrd to create kdumpinit. However, mkinitrd only works for diskful installs; it does not consider the diskless install scenario. /sbin/mkinitrd runs all of the shell scripts under /lib/mkinitrd/setup to generate kdumpinit, and packs all of the scripts under /lib/mkinitrd/boot into it. To simulate a crash, run:
echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
The kdumpinit generated by /sbin/mkinitrd contains all of the shell scripts under /lib/mkinitrd/boot, and all of them appear in its init. Two of these scripts are special: 83-mount.sh mounts and checks the root device, and 84-remount.sh mounts the root file system and runs the init found on that file system instead of the normal init binary. These two scripts are the cause of the problem. In a diskless install, the root file system is tmpfs and has no corresponding device, so the boot hangs when 83-mount.sh runs. When dumping to a remote server, the root file system is not needed; the initrd alone is enough, so there is no need to pack these two scripts into it. The workaround is to rename the two scripts so that they are not packed into the initrd, and to restore the original names once the initrd has been created. Without the root device discovery and checking steps, the script 91-kdump.sh runs correctly and the dump succeeds.
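The hide-and-restore idea could be sketched as a wrapper like the one below. Instead of renaming the scripts in place, the sketch moves them into a temporary stash directory while the wrapped command runs, which has the same effect without relying on how mkinitrd selects file names. The function name and the stash approach are assumptions, not the actual enablekdump code.

```shell
# Hypothetical helper: run a command while 83-mount.sh and 84-remount.sh
# are hidden from the boot script directory, then put them back.
# $1: boot script directory, remaining arguments: command to run.
run_without_root_mount_scripts() {
    boot_dir=$1
    shift
    stash=$(mktemp -d) || return 1

    # Move the two root-device scripts out of the way.
    for s in 83-mount.sh 84-remount.sh; do
        if [ -e "$boot_dir/$s" ]; then
            mv "$boot_dir/$s" "$stash/$s"
        fi
    done

    # Run the initrd-generating command and remember its exit status.
    rc=0
    "$@" || rc=$?

    # Restore the scripts whether or not the command succeeded.
    for s in 83-mount.sh 84-remount.sh; do
        if [ -e "$stash/$s" ]; then
            mv "$stash/$s" "$boot_dir/$s"
        fi
    done
    rmdir "$stash"
    return $rc
}
```

On a SLES node this might be invoked as run_without_root_mount_scripts /lib/mkinitrd/boot /etc/init.d/boot.kdump start, so that the scripts are restored even if the service fails to start.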
In a hierarchical diskless environment, the /install directory of the Service Node is mounted from the Management Node. When a node starts up, the $xcatmaster:/install/kdump/tmp directory cannot be mounted, because NFS denies the re-mount. How should this scenario be handled?
xCAT/xCAT.spec
perl-xCAT/xCAT/Schema.pm
xCAT-server/share/xcat/netboot/rh/genimage
xCAT-server/share/xcat/netboot/add-on/statelite/rc.statelite
xCAT-server/lib/xcat/plugins/anaconda.pm
xCAT-server/lib/xcat/plugins/sles.pm
xCAT/postscripts/enablekdump
- Required reviewers: Bruce Potter
- Required approvers: Bruce Potter
- Database schema changes: N/A
- Effect on other components: N/A
- External interface changes, documentation, and usability issues: N/A
- Packaging, installation, dependencies: N/A
- Portability and platforms (HW/SW) supported: N/A
- Performance and scaling considerations: N/A
- Migration and coexistence: N/A
- Serviceability: N/A
- Security: N/A
- NLS and accessibility: N/A
- Invention protection: N/A