Terascala Management Server

The Terascala Management Server (tms) monitors and manages the Terascala Lustre Storage system. It should not be confused with the Lustre Management Server (MGS), which stores configuration information for all the Lustre file systems in a cluster and provides this information to other Lustre components. In the Terascala Lustre Storage system, the Lustre MGS, along with the Lustre Metadata Servers (MDS), runs on the Terascala Metadata Servers.

The Terascala Management Server (tms) is a Dell PowerEdge R210 II server. It is equipped with a quad-core Intel Sandy Bridge Xeon E3-1230 processor at 3.20 GHz and 8 GB of memory.

Network

There are 2 Gigabit Ethernet interfaces on tms:

# lspci 
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)

Interface   IP Address    Netmask          Subnet
eth0        10.6.8.2      255.255.0.0      Hyades Private GbE
eth1        192.168.3.1   255.255.255.0    Management #1
eth1:0[1]   192.168.4.1   255.255.255.0    Management #2

Terascala uses the init script /etc/init.d/network_config (written in Perl) to generate the network configuration files (/etc/sysconfig/network-scripts/ifcfg-eth*) from the properties file /usr/local/terascala/etc/allNetwork.properties.
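
The generated files use the stock RHEL ifcfg key=value syntax. For illustration, given the addresses in the table above, the generated ifcfg-eth0 and the alias file ifcfg-eth1:0 would presumably look roughly like this (a sketch, not a verbatim copy from tms):

# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch)
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.6.8.2
NETMASK=255.255.0.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1:0 (sketch; ONPARENT is the alias-specific flag)
DEVICE=eth1:0
IPADDR=192.168.4.1
NETMASK=255.255.255.0
ONPARENT=yes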

Storage

The SAS controller appears to be an LSI SAS 2008:

# lspci
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

One 2TB hard drive is attached to the SAS controller:

# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0  1.8T  0 disk 
|-sda1                    8:1    0 62.7M  0 part /bootcom
|-sda2                    8:2    0 62.8M  0 part 
|-sda3                    8:3    0 62.8M  0 part /boot
`-sda4                    8:4    0  1.8T  0 part 
  |-rootVG-A (dm-0)     253:0    0   40G  0 lvm  
  |-rootVG-B (dm-1)     253:1    0   40G  0 lvm  /
  |-rootVG-data (dm-2)  253:2    0  1.7T  0 lvm  /usr/local/terascala/data
  |-rootVG-crash (dm-3) 253:3    0   10G  0 lvm  
  `-rootVG-swap (dm-4)  253:4    0    2G  0 lvm  [SWAP]
So it seems that sda2 and sda3 are /boot partitions (only one is mounted and active at a time); and dm-0 and dm-1 are root partitions (again, only one is mounted and active). But why is there a /bootcom partition? It seems that Terascala did something unique! They put the GRUB[2] configuration file in one partition, /bootcom (/bootcom/grub/menu.lst):

#
# Version:        3.1.10-0
# Boot Location:  B
#

serial --unit=1 --speed=115200
terminal --timeout=10 console serial
default 1
timeout 10
fallback 2

title vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7-SERIAL (B)
root (hd0,2)
kernel /vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7 root=/dev/mapper/rootVG-B console=ttyS1,115200n8 elevator=deadline selinux=0
initrd /initrd-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7
boot

title vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7-CONSOLE (B)
root (hd0,2)
kernel /vmlinuz-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7 root=/dev/mapper/rootVG-B elevator=deadline selinux=0
initrd /initrd-2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7
boot

title Boot Other Location
configfile /grub/menu.lst.other

but put the kernel image and initrd (initial ramdisk) in a separate partition, /boot (root (hd0,2), which in GRUB Legacy's zero-based numbering is the third partition, /dev/sda3). Note this appears to be a clever way of updating the system! For example, the kernel initially running on tms was 2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4; its kernel image was (and still is) in sda2, and its root partition was rootVG-A (dm-0). tms was updated in May 2014. The new kernel image (2.6.32-279.19.1.el6_lustre.2.1.5_1.0.7) was installed in sda3. With this scheme, they were able to activate the new kernel and the new root partition (rootVG-B) by simply editing the GRUB configuration file (/bootcom/grub/menu.lst), while preserving the old kernel and the old root partition!
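
Presumably /grub/menu.lst.other (loaded by the "Boot Other Location" entry above) mirrors these entries for location A. Assuming that, it would contain something along these lines (a sketch; (hd0,1) is sda2, where the old kernel image lives):

title vmlinuz-2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4-SERIAL (A)
root (hd0,1)
kernel /vmlinuz-2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4 root=/dev/mapper/rootVG-A console=ttyS1,115200n8 elevator=deadline selinux=0
initrd /initrd-2.6.18-308.4.1.el5_lustre.1.8.8_1.3.4
boot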

Terascala created an LVM Physical Volume (PV) on /dev/sda4, then a Volume Group (VG) on the PV[3][4]:

# file -s /dev/sda4
/dev/sda4: LVM2 (Linux Logical Volume Manager) , UUID: EpapbtsYfae2hP8Lc6IYgOoYF9bQTlq

# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sda4
  VG Name               rootVG
  PV Size               1.82 TiB / not usable 1.70 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              476741
  Free PE               9125
  Allocated PE          467616
  PV UUID               Epapbt-sYfa-e2hP-8Lc6-IYgO-oYF9-bQTlq7

# vgdisplay

They then created 5 Logical Volumes (LV) on the VG:

# lvdisplay -C    
  LV    VG     Attr      LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
  A     rootVG -wi-a---- 40.00g                                             
  B     rootVG -wi-ao--- 40.02g                                             
  crash rootVG -wi-a---- 10.00g                                             
  data  rootVG -wi-ao---  1.69t                                             
  swap  rootVG -wi-ao---  1.98g
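
For reference, a layout like this could be recreated with the standard LVM tools (a sketch with sizes rounded from the lvdisplay output above, not the actual commands Terascala ran):

# create the PV on sda4, the rootVG VG, and the five LVs
pvcreate /dev/sda4
vgcreate rootVG /dev/sda4
lvcreate -L 40G -n A rootVG
lvcreate -L 40G -n B rootVG
lvcreate -L 10G -n crash rootVG
lvcreate -L 2G -n swap rootVG
lvcreate -L 1.69T -n data rootVG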

Note that Device Mapper is the kernel driver that provides a framework for volume management, and is the foundation for LVM:

# ls -l /dev/mapper/rootVG-A
lrwxrwxrwx 1 root root 7 Feb 22 18:04 /dev/mapper/rootVG-A -> ../dm-0
# ls -l /dev/rootVG/A
lrwxrwxrwx 1 root root 7 Feb 22 18:04 /dev/rootVG/A -> ../dm-0

SLURM

Curiously, Terascala installed SLURM (Simple Linux Utility for Resource Management) 2.6.2 on tms. However, they didn't specify ControlMachine in the configuration file /usr/local/etc/slurm.conf, so the init script /etc/init.d/slurm fails to start. But munged and slurmdbd get started nonetheless. Although the commands sacct and sreport fail with the fatal error "ControlMachine not specified", one can still access the MySQL database for slurmdbd directly (the username and password are listed in /usr/local/etc/slurmdbd.conf):

$ mysql -u root -p
Enter password:
mysql> use slurm_acct_db;

mysql> show tables;
+-------------------------+
| Tables_in_slurm_acct_db |
+-------------------------+
| acct_coord_table        |
| acct_table              |
| cluster_table           |
| qos_table               |
| table_defs_table        |
| txn_table               |
| user_table              |
+-------------------------+
7 rows in set (0.00 sec)

mysql> select * from acct_table;
+---------------+------------+---------+------+----------------------+--------------+
| creation_time | mod_time   | deleted | name | description          | organization |
+---------------+------------+---------+------+----------------------+--------------+
|    1400254337 | 1400254337 |       0 | root | default root account | root         |
+---------------+------------+---------+------+----------------------+--------------+
1 row in set (0.00 sec)

mysql> quit
Bye
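
StorageUser and StoragePass are the standard slurmdbd.conf parameter names for those credentials, so on tms something like the following should reveal them (a sketch):

# grep -E '^Storage(User|Pass)' /usr/local/etc/slurmdbd.conf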

Not sure why Terascala put a broken SLURM installation on tms.

PHPDevShell

TeraView (http://10.6.8.2/) is a PHPDevShell web application; its DocumentRoot is /usr/local/terascala/phpdevshell. There are 7 tabs in the TeraView web interface (the Apache aliases they rely on are sketched after the list):

Dashboard
http://10.6.8.2/index.php?m=3751860488, which contains an IFRAME whose SRC URL is http://10.6.8.2/dashboard. /dashboard is aliased to /usr/local/terascala/home on tms (see /etc/httpd/conf/httpd.conf).
Alerts
http://10.6.8.2/index.php?m=1386757813, which contains an IFRAME whose SRC URL is http://10.6.8.2/nagios/cgi-bin/status.cgi?hostgroup=all&style=summary&servicestatustypes=23&noheader. /nagios/cgi-bin is aliased to /usr/local/nagios/sbin on tms.
Management Console
http://10.6.8.2/tmc/tmc.php, which is described in the Terascala Management Console section below.
Analytics
http://10.6.8.2/index.php?m=181699303, which contains an IFRAME whose SRC URL is http://10.6.8.2/gweb?tab=v. /gweb is aliased to /var/www/html/gweb on tms.
Administration
http://10.6.8.2/index.php?m=4116449254, which contains an IFRAME whose SRC URL is http://10.6.8.2/dashboard/admin.php. /dashboard is aliased to /usr/local/terascala/home on tms.
Nagios Administration
http://10.6.8.2/index.php?m=1933646959, which contains an IFRAME whose SRC URL is http://10.6.8.2/nagios/index.php. /nagios is aliased to /usr/local/nagios/share on tms.
Terascala
http://www.terascala.com/. Unfortunately, the site is down because Terascala is out of business.
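
Pulling the alias mappings above together, the relevant directives in /etc/httpd/conf/httpd.conf presumably read roughly as follows (a reconstruction from the mappings listed above, not a verbatim copy):

# the more specific ScriptAlias must precede the /nagios Alias
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
Alias /nagios /usr/local/nagios/share
Alias /dashboard /usr/local/terascala/home
Alias /gweb /var/www/html/gweb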

Ganglia

Terascala uses Ganglia to monitor the Terascala Lustre Storage.

gmetad
The Ganglia Meta Daemon (gmetad) runs on tms. It periodically polls a collection of child data sources, parses the collected XML, saves all numeric, volatile metrics to round-robin databases and exports the aggregated XML over a TCP socket to clients. On tms, the configuration file for gmetad is /usr/local/terascala/etc/ganglia/gmetad.conf; and the RRD data are stored in /usr/local/terascala/data/gmetad/rrds.
gmond
A Ganglia Monitoring Daemon (gmond) also runs on tms. Its configuration file is /usr/local/terascala/etc/ganglia/gmond.conf.
gweb
The Ganglia PHP Web Front-end (gweb) also runs on tms. Its URL is http://10.6.8.2/gweb/, which is aliased to /var/www/html/gweb on tms.
rrdcached
The data caching daemon for rrdtool (rrdcached) also runs on tms. rrdcached receives updates to existing RRD files, accumulates them and, if enough have been received or a defined time has passed, writes the updates to the RRD file. On tms it is invoked as:
/usr/bin/rrdcached -p /var/run/rrdcached/rrdcached.pid -z 30 \
-s ganglia -m 777 \
-l unix:/var/run/rrdcached/rrdcached.sock \
-s nagios -m 777 -P FLUSH,STATS,HELP \
-l unix:/var/run/rrdcached/rrdcached.limited.sock \
-b /usr/local/terascala/data/gmetad/rrds -B
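
As a quick sanity check, the aggregated XML that gmetad exports can be pulled straight off its TCP socket; assuming tms uses gmetad's stock export port 8651, something like this should work:

$ nc 10.6.8.2 8651 | head -n 20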

Nagios

Another monitoring tool, Nagios, also runs on the Terascala Lustre Storage. Nagios offers both monitoring and alerting services. The main configuration file for Nagios is /usr/local/nagios/etc/nagios.cfg[5].
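
Nagios has a built-in configuration verification mode, so assuming the binary sits under the usual /usr/local/nagios prefix seen elsewhere on tms, the configuration can be checked with:

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg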

Misc

jee_server

jee_server is started by the init script /etc/init.d/jee_server:

/usr/local/terascala/isb_jee/bin/jee_server -D -l finest -p 10000 http://localhost:80/index.php?m=2609527728

jee_server tries to POST to http://localhost:80/index.php?m=2609527728. However, the URL doesn't exist and the Apache HTTP server returns a 404 (Not Found) response code.
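
This is easy to confirm from tms with curl (the m= value is taken verbatim from the init script above):

$ curl -s -o /dev/null -w '%{http_code}\n' -X POST 'http://localhost/index.php?m=2609527728'
404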

Not sure what Terascala tried to accomplish with jee_server.

terascala-management-server

terascala-management-server is started by the init script /etc/init.d/terascala-management-server:

java \
  -Djava.security.egd=file:/dev/urandom \
  -Xmx1024m \
  -Djava.library.path=/usr/lib:/usr/lib64:/lib/lib64:/usr/tera/lib:/usr/local/lib:/usr/local/lib/ipmitool \
  -Dorg.newsclub.net.unix.library.path=/usr/tera/lib \
  -cp /usr/tera/lib/BladeManagerServer.jar:/usr/tera/lib/BuildProperties.jar:/usr/tera/lib/AntTar.jar:/usr/tera/lib/SNMP4J.jar:/usr/tera/lib/SNMP4J-agent.jar:/usr/tera/lib/SYMsdk.jar:/usr/tera/lib/xercesImpl.jar:/usr/tera/lib/activation.jar:/usr/lib/dns_sd.jar:/usr/local/lib/ipmitool/ipmitool.jar:/usr/tera/lib/mail.jar:/usr/tera/lib/commons-exec.jar:/usr/tera/lib/junixsocket-1.3.jar \
   com.terascala.manager.server.BladeManagerServer

Terascala Management Console

Terascala Management Console is a Java application (/usr/local/terascala/tmc/TerascalaManagementConsole.jar). One can start it as a Java Web Start application from a web browser (http://10.6.8.2/tmc/tmc.php).

Or one can download the jar file (http://10.6.8.2/tmc/TerascalaManagementConsole.jar) and then run the application:

$ java -cp TerascalaManagementConsole.jar com.terascala.manager.client.Splasher

CLI tools

There are CLI tools at:

  • /root/admin-bin
  • /root/bin
  • /usr/local/bin
  • /usr/local/terascala/bin

References

  1. RHEL6 - Network Interface Alias
  2. GNU GRUB Legacy Manual
  3. Linux lvm - Logical Volume Manager
  4. A Beginner's Guide To LVM
  5. Nagios Core Documentation