ConnectionDependencyAgent2 - TADDM/taddm-wiki GitHub Wiki

The topology builder agent named ConnectionDependencyAgent2 (CDA2 for short) can suffer from performance issues and, when it does, block TADDM topology building (which runs in the background) almost entirely. This page collects tips and tricks for troubleshooting issues with this agent.

How the Agent Works

The main purpose of the agent is to query all the LogicalConnection components discovered since the last complete agent run, search for matching endpoints, and then create dependency relationships between those endpoints. When it comes to defining dependencies, this agent is very important to TADDM.

What Can Go Wrong

Database performance issues can cause some major problems for this agent. If database maintenance is not done properly, specifically if runstats are not being collected or are being collected improperly, then this can cause the CDA2 agent to take a long time. The problem can be compounded by a lot of logical connections being discovered.

The problem can be compounded further by scheduled restarts of TADDM. The first thing that the agent does when it starts is look at its last run time and use that to collect the list of logical connections. That last run time only gets updated when the agent completes! So if the agent runs for 5 days, and then TADDM is restarted, the agent must start completely over. If weekly restarts are part of your process, and the agent is taking longer than 7 days to complete, it will essentially never complete.
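The restart trap described above can be illustrated with a small simulation. This is only a sketch, not TADDM code: the function name `first_completion_day` is hypothetical, and it assumes whole-day granularity, an agent that starts right after a restart, and restarts that take no time.

```python
def first_completion_day(run_days, restart_every_days, horizon_days):
    """Return the day on which the agent first completes, or None.

    Hypothetical model: the agent needs `run_days` of uninterrupted work,
    and every restart throws away all progress because the last run time
    is only updated when the agent completes.
    """
    progress = 0
    for day in range(1, horizon_days + 1):
        progress += 1
        if progress >= run_days:
            return day  # the agent finished before the next restart
        if day % restart_every_days == 0:
            progress = 0  # restart: the agent starts completely over
    return None


# A 5-day run fits between weekly restarts; an 8-day run never does.
print(first_completion_day(5, 7, 365))
print(first_completion_day(8, 7, 365))
```

In this model, the only ways out are to shorten the run (fix database performance) or to lengthen the interval between restarts.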

Enabling TRACE logging

Set the following in collation.properties on the primary storage server.

com.collation.log.level.com.ibm.cdb.topomgr.topobuilder.agents.ConnectionDependencyAgent2=TRACE

This will cause trace logging on the agent class, but will not produce all the logging statements. The following properties will enable full tracing, but will also enable tracing for other topology agents.

com.collation.log.level.com.ibm.cdb.topomgr.topobuilder.agents.DependencyAgentBase=TRACE
com.collation.log.level.com.ibm.cdb.topomgr.topobuilder.agents.AgentBase=TRACE

You must restart the PSS in order for the log settings to take effect. Keep in mind that a TADDM restart also causes the agent to start completely over, so you may not want to apply this change if you expect the processing to end soon. Then again, you probably won't have any idea how close it is to completion without this log setting.

The Root of the Problem - Database Performance

The root of the problem is database performance. This needs to be addressed first and foremost. See the wiki page for Server Sizing and Tuning Guidelines and pay special attention to the database tuning sections and ensure they are followed. For Oracle specifically, ensure that the SGA values are properly set for automatic memory management and that the stats collection is being done properly.

Also ensure that proper database maintenance is being done on a regular basis.

Increasing Cache Size

APAR IV10245 was created against TADDM 7.2.1 to make the cache size used for search results adjustable. Increasing it can, in theory, avoid repeating database searches that have already been done. Set the following property to increase the cache size.

com.ibm.cdb.topomgr.topobuilder.agents.ConnectionDependencyAgent2.FromAppSocketCacheSize=60000

If there are underlying database performance problems, this is not going to make a huge difference, but it should slightly speed up a long-running agent.

timeframeBunch Property

There is a property documented in the TADDM troubleshooting guide for memory management that causes relationships to be stored throughout the run, rather than all at the end. It is a helpful property to have enabled because the log will show the progress much more clearly. The property breaks up the query for logical connections into bunches according to millisecond timeframes. The 60 second value (60000 milliseconds) in the documentation is much too small a bunch, so I recommend the following 1 hour bunch.

com.ibm.cdb.topomgr.topobuilder.agents.ConnectionDependencyAgent2.timeframeBunch=3600000

Warning! If you use the value recommended in the documentation (60000 milliseconds) this might actually cause the agent to run very long. With a small value, the loop will have to execute millions of times and potentially print a logging statement for each loop.
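To get a feel for the difference between the two values, the number of bunches can be estimated with simple arithmetic. The sketch below assumes that the agent loops once per bunch over the whole window between its last run time and now; that is an inference from the description above, not something taken from the agent's source, and `bunch_count` is a hypothetical helper.

```python
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day


def bunch_count(window_ms, timeframe_bunch_ms):
    """Number of bunches needed to cover a window (rounded up)."""
    return -(-window_ms // timeframe_bunch_ms)  # integer ceiling division


for label, window_ms in (("30 days", 30 * DAY_MS), ("5 years", 5 * 365 * DAY_MS)):
    documented = bunch_count(window_ms, 60000)      # value from the documentation
    recommended = bunch_count(window_ms, 3600000)   # 1 hour value from this page
    print(label, documented, recommended)
```

Under that assumption, a 30 day window is 43,200 bunches at 60000 milliseconds versus 720 at 3600000, and a 5 year window is 2,628,000 versus 43,800.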

But wait! This property is actually broken in TADDM v7.3, including FP1 and FP2. If the agent is run with this property set, a bunch of exceptions will be thrown and the agent will end unsuccessfully. This issue has been resolved in v7.3 FP3. However, in FP4, setting this property causes CDA2 to hang. You must contact IBM support to receive an efix for this issue before setting this property. The FP5 rebuild has resolved the hanging issue.

Skip Processing of Logical Connections

There might be a scenario where an environment is experiencing a performance issue and you just need the topology builder to continue, even if it means that some dependencies won't be created from discovered logical connections. If this is the case, it is possible to manually update the lastRunTime value in the database table for the topology agents so that the agent won't attempt to process so much data. This is not supported and you do so at your own risk. Do not do this in a production environment.

ignoreLoopbackProcesses Property

To help avoid this problem in the future, ensure that the value for com.collation.platform.os.ignoreLoopbackProcesses is set to the default of true in collation.properties. If set to false, the number of logical connections to the loopback will build up in the database.
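As a convenience, the settings discussed on this page can be checked with a short script. The following is a minimal sketch, not a TADDM tool: the functions `parse_properties` and `review` are hypothetical, the parser only understands simple `key=value` lines (no line continuations or `:` separators), and the 3600000 threshold is just the 1 hour value recommended above.

```python
LOOPBACK_KEY = "com.collation.platform.os.ignoreLoopbackProcesses"
BUNCH_KEY = ("com.ibm.cdb.topomgr.topobuilder.agents."
             "ConnectionDependencyAgent2.timeframeBunch")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def review(props):
    """Return a list of warnings for the CDA2-related settings."""
    warnings = []
    # The page states the default is true, so a missing key is fine.
    if props.get(LOOPBACK_KEY, "true").lower() != "true":
        warnings.append("ignoreLoopbackProcesses is not true")
    bunch = props.get(BUNCH_KEY)
    if bunch is not None and bunch.isdigit() and int(bunch) < 3600000:
        warnings.append("timeframeBunch is smaller than 1 hour")
    return warnings


sample = LOOPBACK_KEY + "=false\n" + BUNCH_KEY + "=60000\n"
for warning in review(parse_properties(sample)):
    print(warning)
```

To use it against a real installation, read the contents of collation.properties into a string and pass it to `parse_properties`.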

Healthcheck module

Download the checkCDA2.py healthcheck module and place it under dist/lib/healthcheck in your TADDM installation. You can then run bin/healthcheck checkCDA2 to report the last run time of the CDA2 agent. Example below.

$ healthcheck -u administrator -p collation checkCDA2
GROUP:  performance

****************************************************************************************************
**                                        Begin checkCDA2                                         **
**                                        ---------------                                         **
**                                                                                                **
**         This check runs a test to see if the ConnectionDependencyAgent2 (CDA2) is completing   **
**         It performs the following:                                                             **
**             - connects to taddm db                                                             **
**             - checks last run time for CDA2 agent                                              **
**                                                                                                **
**         The result is the last run time for the CDA2 agent                                     **
**                                                                                                **
**                                                                                                **
****************************************************************************************************
Label                                   LastRunTime
ConnectionDependencyAgent2 Mon Dec 17 12:00:38 2018
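If you want to alert on this output, the last line can be parsed with a few lines of Python. This is a sketch under assumptions: it expects the output format shown in the example above, an English locale for the day and month names, and a 7 day staleness threshold chosen only to match the weekly restart example on this page. `parse_last_run` and `is_stale` are hypothetical helpers, not part of the healthcheck module.

```python
from datetime import datetime, timedelta


def parse_last_run(output, label="ConnectionDependencyAgent2"):
    """Find the line starting with the agent label and parse its timestamp."""
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == label:
            return datetime.strptime(parts[1].strip(), "%a %b %d %H:%M:%S %Y")
    return None  # label not found in the output


def is_stale(last_run, now, max_age=timedelta(days=7)):
    """True if the agent has not completed within max_age (assumed threshold)."""
    return now - last_run > max_age


sample = (
    "Label                                   LastRunTime\n"
    "ConnectionDependencyAgent2 Mon Dec 17 12:00:38 2018\n"
)
last_run = parse_last_run(sample)
print(last_run, is_stale(last_run, datetime.now()))
```

A stale last run time suggests that the agent is not completing and that the sections above are worth working through.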