Installation and usage Script Only - ricniew/check-itm-agent-responsiveness GitHub Wiki

1. INSTALLATION

2. CONSIDERATIONS

2.1 Interpreting output

2.2 Run time

3. USAGE

3.1 Syntax

3.2 Sample executions

3.3 Sample output

4. TROUBLESHOOTING and files created

1. INSTALLATION

  1. If you have not done so already, download the latest source code release of this tool (check-itm-agent-responsiveness-01.21.zip/tar.gz).

  2. The downloaded archive contains the following files:

    • Readme: README.md
    • Agentbuilder Custom Agent install packages smai-agent_responsiveness-[version].[zip,tgz]
    • Agentbuilder project K55.01.21.00.00.zip
  3. Extract check_resp.pl from the smai-agent_responsiveness-01.21.00.00.[zip,tgz] archive

    • Windows: open the zipped folder with File Explorer, navigate to the "..\smai-agent_responsiveness-01.21.00.00.zip\ira\agent\common\scripts" directory, then drag the check_resp.pl file from the zipped folder to a new location.
    • Linux/Unix: run one of the following commands, depending on your tar version:
      1. tar 1.14.90 or later: tar -xf smai-agent_responsiveness-01.21.00.00.tgz ira/agent/common/scripts/check_resp.pl --strip-components 4
      2. Older tar versions: tar -xf smai-agent_responsiveness-01.21.00.00.tgz ira/agent/common/scripts/check_resp.pl --strip-path 4
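
If you would rather not look up the tar version by hand, one possible approach is to check which strip option the tar help text mentions. The helper below is only a sketch: pick_strip_option is a hypothetical name, and it assumes that your tar lists the supported option in the output of tar --help.

```shell
# pick_strip_option HELP_TEXT
# Prints the strip option named in the given tar help text.
# Returns 1 if neither option is mentioned.
pick_strip_option() {
    case $1 in
        *--strip-components*) echo "--strip-components" ;;
        *--strip-path*)       echo "--strip-path" ;;
        *)                    return 1 ;;
    esac
}
```

It could be used like this: opt=$(pick_strip_option "$(tar --help 2>&1)") and then tar -xf smai-agent_responsiveness-01.21.00.00.tgz ira/agent/common/scripts/check_resp.pl "$opt" 4.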

2. CONSIDERATIONS

2.1 Interpreting output

Please note that responsive agents are not reported to STDOUT. For the not-responsive agents the procedure prints the following (the output files are described in section 4, "files created"):

ManagedSystemName       ProductCode StatusB ResponseStatus     StatusA  ManagingSystem
--------------          ----------- ------- ----------------   -------  --------------
itaipu000:LZ            ;LZ         ;Y      ;NO Response       ;Y       ;HUB_TEMSLAB
krakow111:LZ            ;LZ         ;Y      ;NO Response       ;N       ;HUB_TEMSLAB

Where:

    ManagedSystemName = Managed System Name

    ProductCode = Product Code

    StatusB = Status Before Real Time Data Request (Y = online; N = offline)

    ResponseStatus = Responsiveness Status

    StatusA = Status After Real Time Data Request (Y = online; N = offline)

    ManagingSystem = TEMS the agent is connected to

When StatusA = "N" and ResponseStatus = "NO Response", this can have the following reasons:

  1. In large environments with thousands of agents it is normal for some agents to be stopped (for example during a maintenance window). In this specific case the agent may have been stopped manually just after the real-time data request, so no data was sent back to the TEMS. The tool assumes that the agent is not responsive and performs no further checks (to reduce the run time). Therefore you need to execute this script twice: run it, wait 15 minutes and run it again. Then compare both outputs and mark as not-responsive only those agents that are reported in both runs.

  2. Similar results could be seen during TEMS restart and/or failback of agents from the failover to primary RTEMS. Therefore you need to know if there are any running housekeeping activities on your ITM Server before running this script.

  3. Agent timed out after data was requested. Here no further actions are required. The MS-OFFLINE situation will trigger this event and your defined "Agent Offline Process" takes place.
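
The comparison of two runs described in item 1 can be scripted. The following is a minimal sketch and not part of the tool: it assumes that both runs were saved to files in the semicolon-separated format shown above, with the Managed System Name as the first of six fields. The function name reported_in_both and the file names are hypothetical.

```shell
# reported_in_both FIRST_RUN SECOND_RUN
# Prints each Managed System Name that appears in both files.
# Assumption: data lines have at least six ';'-separated fields; header
# and log message lines have fewer and are skipped.
# The two arguments must be different files.
reported_in_both() {
    awk -F';' '
        NF < 6 { next }                          # not a data line
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim column padding
        }
        FILENAME == ARGV[1] { first[name] = 1; next }
        (name in first) && !(name in printed) { print name; printed[name] = 1 }
    ' "$1" "$2"
}
```

For example, reported_in_both run1.log run2.log, where run1.log and run2.log are copies of the not-responsive output that you saved after each run.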

Conclusion:

You may consider agents as not-responsive only when StatusA = "Y" and ResponseStatus = "NO Response". If needed, further selective analysis can be performed for agents with StatusA = "N" (for example when the agent was not stopped manually).
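
The rule from the conclusion can be applied to saved output with a small filter. This is a sketch under assumptions, not part of the tool: it expects lines in the semicolon-separated format shown in section 2.1, and the function name classify_agents and the labels "not-responsive" and "review" are made up for illustration.

```shell
# classify_agents [FILE...]
# Reads result lines (from the files or from STDIN) and prints
# "<ManagedSystemName>;<verdict>" for every agent with "NO Response":
#   StatusA = Y  ->  not-responsive
#   StatusA = N  ->  review (agent went offline, analyse selectively)
classify_agents() {
    awk -F';' '
        NF < 6 { next }                                   # skip non-data lines
        {
            # trim the column padding from every field
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if (toupper($4) != "NO RESPONSE") next
            if ($5 == "Y")      print $1 ";not-responsive"
            else if ($5 == "N") print $1 ";review"
        }
    ' "$@"
}
```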

2.2 Run time

Depending on the parameters used and the number of agents, the procedure can take several minutes to finish. If you do not set the "-n" parameter (a specific TEMS Node ID), all agent types given with the "-t" parameter are checked across the entire environment, that is, on every TEMS. It is recommended to run the script against a single TEMS first. In that case it typically finishes within 1-2 minutes, and possibly within seconds, depending on the network bandwidth and especially on the number of agents connected to that TEMS. In my tests it took about 60 seconds for a TEMS with 1000 agents (nt, lz) connected. Please also note that the script finishes much faster when all agents are responsive, because the TEMS otherwise waits for non-responding agents until the SQL timeout expires (see parameter -o: SQL Timeout value, default is 60 = 1 min).
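
To follow the recommendation of checking one TEMS at a time, the command lines can be generated per TEMS name. The helper below is a hypothetical sketch: it only prints the commands and does not run them, it uses the -s, -u, -t and -n parameters described in section 3.1, and it deliberately leaves out the password argument (-w), which you need to add as your environment requires. The TEMS names in the example are placeholders.

```shell
# per_tems_commands HUB_URL SOAP_USER "AGENT TYPES" TEMS_NAME...
# Prints (does not run) one check_resp.pl command line per TEMS name.
per_tems_commands() {
    hub=$1
    user=$2
    types=$3
    shift 3
    for tems in "$@"; do
        printf '%s\n' "check_resp.pl -s $hub -u $user -t $types -n $tems"
    done
}
```

For example, per_tems_commands http://hubtems sysadmin "nt lz" RTEMS_01 RTEMS_02 prints two command lines, one for each TEMS, which you can review and then run one after the other.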

3. USAGE

3.1 Syntax

check_resp.pl {-s host} [-p port] {-u username} {-w userpass} [-n temsname] {-t agt type} [-l dir] [-o timeout] [-pr] [-h]
     -s  : Protocol and Hostname or IP address of the SOAP HUB
          (e.g. HTTP://host, HTTP://10.12.33.123). Only HTTP and HTTPS are supported.
     -p  : Port of the SOAP HUB. (default is 1920)
     -u  : Soap user
     -w  : Soap user pw
     -n  : TEMS name you want to check agents on (RTEMS_01).
              (by default all agents defined by the "-t" parameter are checked on all existing TEMS)
     -t  : Agent type (e.g. "ux lz"; no default set)
     -l  : Log file home directory (Default is ITMHOME/logs)
     -o  : SQL Timeout value (default is 60 = 1min)
     -pr : Print results to STDOUT (not set by default)
     -h  : Display help

3.2 Sample executions

  • Verify the health of all OS agents (nt ux lz). This is the most common use.

    check_resp.pl -s http://hubtems -p 1920 -u sysadmin -t nt ux lz
    
  • Verify the health of all log file agents (lo) and print the results to STDOUT

    check_resp.pl -s localhost -p 1920 -u sysadmin -t lo -pr
    
  • Verify the health of all OS agents (nt ux lz) on a specific remote TEMS

    check_resp.pl -s localhost -u sysadmin -w sysadpw -t nt ux lz -pr -n REMOTE_TEMS01
    
  • Verify the health of all OS agents (nt ux lz) and specify the directory where the output is saved

    check_resp.pl -s localhost -u sysadmin -w sysadpw -t nt ux lz -l D:\IBM\scripts
    
  • Verify the health of the Unix and Linux OS agents (ux lz), setting the hub's SOAP port number and a 30-second SQL timeout

    check_resp.pl -s https://10.23.144.213 -p 3661 -u sysadmin -w sysadpw -t ux lz -o 30
    

3.3 Sample output

sles1164:/opt/IBM/ITM/logs # check-resp.pl -s 192.168.65.111 -t lz nt -u sysadmin -w sysadmpw -l /opt/IBM/ITM/logs -pr
2019-02-05.172337 INFO: Procedure "/tmp/test.pl" Version V3.5 started. LOG Home is /opt/IBM/ITM/logs/
2019-02-05.172337 INFO: Sub_AgentMslist - Get existing MSLs
2019-02-05.172337 INFO: Sub_Get_TEMSes - Get list of TEMS servers
2019-02-05.172337 INFO: Main - Argument "-n" not set. Selected agents on all TEMS will be checked.
2019-02-05.172337 INFO: Sub_AgentStatus - Get Agents status
2019-02-05.172337 INFO: Main - Get data for all agents of type "*LINUX_SYSTEM" , product code = LZ
2019-02-05.172337 INFO: Main - On HUB_TEMSLAB
2019-02-05.172337 INFO: Main - On REMOTE_LAB
2019-02-05.172337 INFO: Main - Get data for all agents of type "*NT_SYSTEM" , product code = NT
2019-02-05.172337 INFO: Main - On HUB_TEMSLAB
2019-02-05.172337 INFO: Main - On REMOTE_LAB
2019-02-05.172337 WARNING: Main -  Sub_CreatePaylaodGetDataFromTEMA did not returned any data for REMOTE1_LAB
2019-02-05.172337 INFO: Sub_AgentStatus - Get Agents status
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Start analysis.. 
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Analysing responsive agents... 
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Analysing if NOT responsive... 
itaipu000:LZ                     ;LZ;Y;NO Response   ;Y;HUB_TEMSLAB
krakow111:LZ                     ;LZ;Y;NO Response   ;Y;HUB_TEMSLAB
niagara12:LZ                     ;LZ;Y;NO Response   ;Y;HUB_TEMSLAB
wisla1223:LZ                     ;LZ;Y;NO Response   ;Y;REMOTE_LAB
wisla0011:LZ                     ;LZ;Y;NO Response   ;Y;REMOTE_LAB
iguazu124:LZ                     ;LZ;Y;NO Response   ;N;REMOTE_LAB
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Counting offline... 
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Result NOT Responsive = 6
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Result OFF-LINE       = 0
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Result Responsive     = 3
2019-02-05.172337 INFO: Logs created:
2019-02-05.172337 INFO:    /opt/IBM/ITM/logs/checkresp.messages-0.log
2019-02-05.172337 INFO:    /opt/IBM/ITM/logs/checkresp.summary.log 
2019-02-05.172337 INFO:    /opt/IBM/ITM/logs/checkresp.responsive.log 
2019-02-05.172337 INFO:    /opt/IBM/ITM/logs/checkresp.notresponsive.log 
2019-02-05.172337 INFO: Sub_AnalyseOutputs - Finished analysing... RC 0, -0 
2019-02-05.172337 INFO: Message log created:
2019-02-05.172337 INFO:   /opt/IBM/ITM/logs/checkresp.messages-0.log 
2019-02-05.172337 INFO: Procedure "check-resp.pl" end

4. TROUBLESHOOTING and files created

Check the logs, or run the procedure again with the "-pr" argument (messages are then written to STDOUT), and check the message flow for any Perl or other errors.

If you get the following error while executing the script on Linux/Unix:

bash: /opt/IBM/ITM/lx8266/55/bin/check_resp.pl: /usr/bin/perl^M: bad interpreter: No such file or directory

Please do the following:

    sed -e "s/^M//" scriptname > newscriptname

Or open the script with vi and run the following commands:

    :e ++ff=unix
    :set list
    :%s/^M//g

To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key, then press V and M in succession.

Alternatively, you can execute the script as shown below; that way you will not get the shell-related error:

    /usr/bin/perl /opt/IBM/ITM/lx8266/55/bin/check_resp.pl ...
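
The ^M in the error message is a carriage return left over from Windows (CRLF) line endings. If typing a literal ^M is inconvenient, the following sketch uses tr instead. The two function names are hypothetical helpers, not part of the tool. Note that, unlike the sed command above, tr removes every carriage return in the file.

```shell
# has_cr_shebang FILE
# Succeeds if the first line of FILE ends with a carriage return.
has_cr_shebang() {
    cr=$(printf '\r')
    head -n 1 "$1" | grep -q "${cr}\$"
}

# strip_cr SRC DST
# Writes a copy of SRC to DST with all carriage returns removed.
# SRC and DST must be different files.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example: has_cr_shebang check_resp.pl && strip_cr check_resp.pl check_resp.fixed.pl, where check_resp.fixed.pl is just an example name for the cleaned copy.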

Files created: If you do not use the "-n" parameter (the check is done on all TEMS for all agent types given with the "-t" parameter), the following files are created:

File                                  Description
------------------------------------  ------------------------------------
LOGDIR/checkresp.messages-0.log       Procedure message flow
LOGDIR/checkresp.messages-1.log       Procedure message flow, previous run
LOGDIR/checkresp.summary.log          Summary results
LOGDIR/checkresp.responsive.log       List of responsive agents
LOGDIR/checkresp.notresponsive.log    List of not-responsive agents

If you check the agents of a specific TEMS (TEMS Node Id set with the "-n" parameter), the following files are created:

File                                                 Description
---------------------------------------------------  ------------------------------------
LOGDIR/checkresp.[TEMS Node Id].messages-0.log       Procedure message flow
LOGDIR/checkresp.[TEMS Node Id].messages-1.log       Procedure message flow, previous run
LOGDIR/checkresp.[TEMS Node Id].summary.log          Summary results
LOGDIR/checkresp.[TEMS Node Id].responsive.log       List of responsive agents
LOGDIR/checkresp.[TEMS Node Id].notresponsive.log    List of not-responsive agents
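
When you post-process the logs from your own scripts, the naming scheme from the two tables can be captured in a small helper. This is only a sketch: checkresp_log_files is a hypothetical name, and the function merely builds the expected file names, it does not check that the files exist.

```shell
# checkresp_log_files LOGDIR [TEMS_NODE_ID]
# Prints the five log file paths expected for a run, with or without "-n".
checkresp_log_files() {
    logdir=$1
    tems=${2:-}
    if [ -n "$tems" ]; then
        prefix="checkresp.$tems"    # run limited to one TEMS ("-n" set)
    else
        prefix="checkresp"          # run over all TEMS
    fi
    for part in messages-0 messages-1 summary responsive notresponsive; do
        printf '%s/%s.%s.log\n' "$logdir" "$prefix" "$part"
    done
}
```

For example, checkresp_log_files /opt/IBM/ITM/logs REMOTE_LAB lists the files for a run that used "-n REMOTE_LAB".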