Terascala Management Server
The Terascala Management Server (tms) monitors and manages the Terascala Lustre Storage system. It should not be confused with the Lustre Management Server (MGS), which stores configuration information for all the Lustre file systems in a cluster and provides this information to other Lustre components. In the Terascala Lustre Storage, the Lustre MGS, along with the Lustre Metadata Servers (MDS), runs on the Terascala Metadata Servers.
The Terascala Management Server (tms) is a Dell PowerEdge R210 II server, equipped with a quad-core Intel Sandy Bridge Xeon E3-1230 processor at 3.20 GHz and 8 GB of memory.
There are 2 Gigabit Ethernet interfaces on tms:
```
# lspci
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
```
Interface | IP Address | Netmask | Subnet |
---|---|---|---|
eth0 | 10.6.8.2 | 255.255.0.0 | Hyades Private GbE |
eth1 | 192.168.3.1 | 255.255.255.0 | Management #1 |
eth1:0[1] | 192.168.4.1 | 255.255.255.0 | Management #2 |
Terascala uses the init script /etc/init.d/network_config (written in Perl) to generate the network configuration scripts (/etc/sysconfig/network-scripts/ifcfg-eth*) from the properties file /usr/local/terascala/etc/allNetwork.properties.
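For illustration, the generated script for eth0 would presumably look something like the following, using the addresses from the table above; this is a hypothetical reconstruction in standard RHEL ifcfg syntax, not a copy of the actual file:

```
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-eth0,
# reconstructed from the address table above.
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.6.8.2
NETMASK=255.255.0.0
ONBOOT=yes
```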
The SAS controller appears to be an LSI SAS 2008:
```
# lspci
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
```
One 2TB hard drive is attached to the SAS controller:
```
# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0  1.8T  0 disk
|-sda1                    8:1    0 62.7M  0 part /bootcom
|-sda2                    8:2    0 62.8M  0 part
|-sda3                    8:3    0 62.8M  0 part /boot
`-sda4                    8:4    0  1.8T  0 part
  |-rootVG-A (dm-0)     253:0    0   40G  0 lvm
  |-rootVG-B (dm-1)     253:1    0   40G  0 lvm  /
  |-rootVG-data (dm-2)  253:2    0  1.7T  0 lvm  /usr/local/terascala/data
  |-rootVG-crash (dm-3) 253:3    0   10G  0 lvm
  `-rootVG-swap (dm-4)  253:4    0    2G  0 lvm  [SWAP]
```

So it seems that sda2 & sda3 are /boot partitions (only one is mounted and active), and dm-0 & dm-1 are root partitions (only one is mounted and active). But why is there a /bootcom partition? It seems that Terascala did something unique! They put the GRUB[2] configuration file in one partition, /bootcom (/bootcom/grub/menu.lst):
```
#
# Version: 3.1.10-0
# Boot Location: B
#
serial --unit=1 --speed=115200
terminal --timeout=10 console serial
default 1
timeout 10
fallback 2

title vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7-SERIAL (B)
        root (hd0,2)
        kernel /vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7 root=/dev/mapper/rootVG-B console=ttyS1,115200n8 elevator=deadline selinux=0
        initrd /initrd-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7
        boot

title vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7-CONSOLE (B)
        root (hd0,2)
        kernel /vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7 root=/dev/mapper/rootVG-B elevator=deadline selinux=0
        initrd /initrd-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7
        boot

title Boot Other Location
        configfile /grub/menu.lst.other
```
but put the kernel image and initrd (initial ramdisk) in a separate partition, /boot (root (hd0,2), i.e., sda3). This appears to be a clever way of updating the system. For example, the kernel initially running on tms was 2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4; its kernel image was (and still is) in sda2, and its root partition was rootVG-A (dm-0). tms was updated in May 2014: the new kernel image (2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7) was installed in sda3. With this scheme, they were able to activate the new kernel and the new root partition (rootVG-B) simply by editing the GRUB configuration file (/bootcom/grub/menu.lst), while preserving the old kernel and old root partition!
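For illustration, rolling back to location A would amount to restoring a stanza like the following in /bootcom/grub/menu.lst and pointing default at it. This is a hypothetical sketch based on the layout above; the exact kernel and initrd file names in sda2 are assumptions:

```
# Hypothetical menu.lst stanza for booting location A again.
# (hd0,1) is sda2, which holds the old kernel; the initrd file
# name below is an assumption.
default 0

title vmlinuz-2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4 (A)
        root (hd0,1)
        kernel /vmlinuz-2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4 root=/dev/mapper/rootVG-A elevator=deadline selinux=0
        initrd /initrd-2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4
        boot
```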
Terascala created an LVM Physical Volume (PV) on /dev/sda4, then a Volume Group (VG) on the PV[3][4]:
```
# file -s /dev/sda4
/dev/sda4: LVM2 (Linux Logical Volume Manager) , UUID: EpapbtsYfae2hP8Lc6IYgOoYF9bQTlq

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda4
  VG Name               rootVG
  PV Size               1.82 TiB / not usable 1.70 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              476741
  Free PE               9125
  Allocated PE          467616
  PV UUID               Epapbt-sYfa-e2hP-8Lc6-IYgO-oYF9-bQTlq7

# vgdisplay
```
They then created 5 Logical Volumes (LV) on the VG:
```
# lvdisplay -C
  LV    VG     Attr      LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
  A     rootVG -wi-a---- 40.00g
  B     rootVG -wi-ao--- 40.02g
  crash rootVG -wi-a---- 10.00g
  data  rootVG -wi-ao---  1.69t
  swap  rootVG -wi-ao---  1.98g
```
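For reference, the layout above could be reproduced with a handful of standard LVM commands; a minimal sketch (the command sequence is an assumption; the sizes are taken from the lvdisplay output):

```
# Create the PV and the VG, then carve out the five LVs.
pvcreate /dev/sda4
vgcreate rootVG /dev/sda4
lvcreate -L 40.00g -n A     rootVG
lvcreate -L 40.02g -n B     rootVG
lvcreate -L 10.00g -n crash rootVG
lvcreate -L 1.69t  -n data  rootVG
lvcreate -L 1.98g  -n swap  rootVG
```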
Note that Device Mapper is the kernel driver that provides a framework for volume management, and is the foundation for LVM:
```
# ls -l /dev/mapper/rootVG-A
lrwxrwxrwx 1 root root 7 Feb 22 18:04 /dev/mapper/rootVG-A -> ../dm-0
# ls -l /dev/rootVG/A
lrwxrwxrwx 1 root root 7 Feb 22 18:04 /dev/rootVG/A -> ../dm-0
```
Curiously, Terascala installed SLURM (Simple Linux Utility for Resource Management) 2.6.2 on tms. However, they didn't specify ControlMachine in the configuration file /usr/local/etc/slurm.conf, so the init script /etc/init.d/slurm fails to start slurmctld. munged and slurmdbd are started nonetheless. Although the commands sacct and sreport fail with the fatal error ControlMachine not specified, one can still access the MySQL database for slurmdbd directly (the username and password are listed in /usr/local/etc/slurmdbd.conf):
```
$ mysql -u root -p
Enter password:
mysql> use slurm_acct_db;
mysql> show tables;
+-------------------------+
| Tables_in_slurm_acct_db |
+-------------------------+
| acct_coord_table        |
| acct_table              |
| cluster_table           |
| qos_table               |
| table_defs_table        |
| txn_table               |
| user_table              |
+-------------------------+
7 rows in set (0.00 sec)

mysql> select * from acct_table;
+---------------+------------+---------+------+----------------------+--------------+
| creation_time | mod_time   | deleted | name | description          | organization |
+---------------+------------+---------+------+----------------------+--------------+
|    1400254337 | 1400254337 |       0 | root | default root account | root         |
+---------------+------------+---------+------+----------------------+--------------+
1 row in set (0.00 sec)

mysql> quit
Bye
```
It is not clear why Terascala put a broken SLURM on tms; the fix would presumably be a one-line change, as sketched below.
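A minimal sketch of the missing piece, assuming standard SLURM 2.x configuration syntax (using tms as the hostname is an assumption):

```
# Hypothetical addition to /usr/local/etc/slurm.conf;
# ControlMachine names the host that runs slurmctld.
ControlMachine=tms
```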
TeraView (http://10.6.8.2/) is a PHPDevShell web application. The DocumentRoot is /usr/local/terascala/phpdevshell. There are 7 tabs on the TeraView web application:
- Dashboard
- http://10.6.8.2/index.php?m=3751860488, which contains an IFRAME whose SRC URL is http://10.6.8.2/dashboard. /dashboard is aliased to /usr/local/terascala/home on tms (see /etc/httpd/conf/httpd.conf);
- Alerts
- http://10.6.8.2/index.php?m=1386757813, which contains an IFRAME whose SRC URL is http://10.6.8.2/nagios/cgi-bin/status.cgi?hostgroup=all&style=summary&servicestatustypes=23&noheader. /nagios/cgi-bin is aliased to /usr/local/nagios/sbin on tms;
- Management Console
- Analytics
- http://10.6.8.2/index.php?m=181699303, which contains an IFRAME whose SRC URL is http://10.6.8.2/gweb?tab=v. /gweb is aliased to /var/www/html/gweb on tms;
- Administration
- http://10.6.8.2/index.php?m=4116449254, which contains an IFRAME whose SRC URL is http://10.6.8.2/dashboard/admin.php. /dashboard is aliased to /usr/local/terascala/home on tms.
- Nagios Administration
- http://10.6.8.2/index.php?m=1933646959, which contains an IFRAME whose SRC URL is http://10.6.8.2/nagios/index.php. /nagios is aliased to /usr/local/nagios/share on tms (these aliases are collected in a sketch after this list).
- Terascala
- http://www.terascala.com/. Unfortunately, the site is down because Terascala has gone out of business.
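The aliases mentioned above would correspond to directives along these lines in /etc/httpd/conf/httpd.conf; this is a reconstruction from the list above, not a verbatim copy of the file (directive order matters: the more specific /nagios/cgi-bin mapping must precede the /nagios alias):

```
# Reconstructed from the aliases noted in the list above.
Alias /dashboard /usr/local/terascala/home
Alias /gweb /var/www/html/gweb
ScriptAlias /nagios/cgi-bin/ /usr/local/nagios/sbin/
Alias /nagios /usr/local/nagios/share
```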
Terascala uses Ganglia to monitor the Terascala Lustre Storage.
- gmetad
- The Ganglia Meta Daemon (gmetad) runs on tms. It periodically polls a collection of child data sources, parses the collected XML, saves all numeric, volatile metrics to round-robin databases, and exports the aggregated XML over a TCP socket to clients. On tms, the configuration file for gmetad is /usr/local/terascala/etc/ganglia/gmetad.conf, and the RRD data are stored in /usr/local/terascala/data/gmetad/rrds (a configuration sketch follows this list).
- gmond
- A Ganglia Monitoring Daemon (gmond) also runs on tms. Its configuration file is /usr/local/terascala/etc/ganglia/gmond.conf.
- gweb
- The Ganglia PHP Web Front-end (gweb) also runs on tms. Its URL is http://10.6.8.2/gweb/, which is aliased to /var/www/html/gweb on tms.
- rrdcached
- The data caching daemon for rrdtool (rrdcached) also runs on tms. rrdcached receives updates to existing RRD files, accumulates them and, if enough have been received or a defined time has passed, writes the updates to the RRD file.
```
/usr/bin/rrdcached -p /var/run/rrdcached/rrdcached.pid -z 30 \
    -s ganglia -m 777 \
    -l unix:/var/run/rrdcached/rrdcached.sock \
    -s nagios -m 777 -P FLUSH,STATS,HELP \
    -l unix:/var/run/rrdcached/rrdcached.limited.sock \
    -b /usr/local/terascala/data/gmetad/rrds -B
```
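As for how gmetad is wired up, a hypothetical sketch of the relevant directives in /usr/local/terascala/etc/ganglia/gmetad.conf (the data_source name and port are assumptions; rrd_rootdir matches the RRD location noted above):

```
data_source "terascala" localhost:8649
rrd_rootdir "/usr/local/terascala/data/gmetad/rrds"
```

One can also talk to rrdcached directly over its sockets; per the -P flag above, even the limited socket permits FLUSH, STATS and HELP (socat and the RRD file path below are assumptions):

```
# Query daemon statistics over the limited socket (requires socat).
echo STATS | socat - UNIX-CONNECT:/var/run/rrdcached/rrdcached.limited.sock

# Flush a single RRD through the daemon before reading it directly;
# the file path is a placeholder, not an actual file on tms.
rrdtool flushcached --daemon unix:/var/run/rrdcached/rrdcached.sock \
    /usr/local/terascala/data/gmetad/rrds/some_cluster/some_host/load_one.rrd
```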
Another monitoring tool, Nagios, also runs on the Terascala Lustre Storage. Nagios offers both monitoring and alerting services. The main configuration file for Nagios is /usr/local/nagios/etc/nagios.cfg[5].
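As a quick sanity check, Nagios can validate its own configuration; a minimal sketch, assuming the daemon binary sits under the usual /usr/local/nagios/bin prefix (the binary path is an assumption):

```
# Verify the Nagios configuration; prints errors and warnings, if any.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```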
jee_server is started by the init script /etc/init.d/jee_server:
```
/usr/local/terascala/isb_jee/bin/jee_server -D -l finest -p 10000 \
    http://localhost:80/index.php?m=2609527728
```
jee_server tries to POST to http://localhost:80/index.php?m=2609527728. However, the URL doesn't exist and the Apache HTTP server returns a 404 (Not Found) response code.
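This is easy to reproduce by hand (assuming curl is available on tms):

```
# POST to the same URL and print only the response code; expect 404.
curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    'http://localhost/index.php?m=2609527728'
```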
It is not clear what Terascala was trying to accomplish with jee_server.
terascala-management-server is started by the init script /etc/init.d/terascala-management-server:
```
java \
    -Djava.security.egd=file:/dev/urandom \
    -Xmx1024m \
    -Djava.library.path=/usr/lib:/usr/lib64:/lib/lib64:/usr/tera/lib:/usr/local/lib:/usr/local/lib/ipmitool \
    -Dorg.newsclub.net.unix.library.path=/usr/tera/lib \
    -cp /usr/tera/lib/BladeManagerServer.jar:/usr/tera/lib/BuildProperties.jar:/usr/tera/lib/AntTar.jar:/usr/tera/lib/SNMP4J.jar:/usr/tera/lib/SNMP4J-agent.jar:/usr/tera/lib/SYMsdk.jar:/usr/tera/lib/xercesImpl.jar:/usr/tera/lib/activation.jar:/usr/lib/dns_sd.jar:/usr/local/lib/ipmitool/ipmitool.jar:/usr/tera/lib/mail.jar:/usr/tera/lib/commons-exec.jar:/usr/tera/lib/junixsocket-1.3.jar \
    com.terascala.manager.server.BladeManagerServer
```
Terascala Management Console is a Java application (/usr/local/terascala/tmc/TerascalaManagementConsole.jar). One can start it as a Java Web Start application from a web browser (http://10.6.8.2/tmc/tmc.php).
Or one can download the jar file (http://10.6.8.2/tmc/TerascalaManagementConsole.jar) and then run the application:
```
$ java -cp TerascalaManagementConsole.jar com.terascala.manager.client.Splasher
```
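Alternatively, assuming tmc.php serves a JNLP descriptor (the usual mechanism behind Java Web Start links), the console could be launched from the command line as well:

```
# Launch via Java Web Start; assumes tmc.php returns a JNLP file.
$ javaws http://10.6.8.2/tmc/tmc.php
```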
There are CLI tools at:
- /root/admin-bin
- /root/bin
- /usr/local/bin
- /usr/local/terascala/bin