AdvancedConfiguration - googlegsa/manager.v3 GitHub Wiki

Advanced Configuration for Connector Manager v2.4.4 and Later

Connector Manager version 2.4.4 moves most of the configuration options that a Connector administrator may wish to set out of the web application bean definition file and into a separate application properties file.

This change makes the upgrade process much less painful for administrators who have tailored the Connector Manager deployment to their enterprise. The web application bean definition file is redeployed during the upgrade process, overwriting any modifications that an administrator may have made. However, the web application properties file is not overwritten during an upgrade, so setting common configuration properties in that file preserves those customizations through future upgrades.

It is strongly suggested that the administrator set only those advanced properties whose values differ from the defaults described below. This allows Google to tune the defaults in later releases, to the benefit of all administrators who have not explicitly overridden them.

Setting or Modifying Advanced Configuration Properties

To set or modify Connector Manager Advanced Configuration Properties, you must edit its applicationContext.properties file. This Java Properties file is a plain-text file in ISO-8859-1 character encoding, and must remain so when modified. The syntax for setting property values must conform to the Java Properties specification. The Connector Manager supports only plain-text properties files, not XML-formatted properties files.
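Two consequences of the Java Properties rules are easy to miss: Properties.load(InputStream) always decodes the bytes as ISO-8859-1, and a # starts a comment only when it is the first non-blank character on a line. The sketch below assumes the file is read with java.util.Properties; the class and method names are hypothetical and are not part of the Connector Manager.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the Connector Manager.
final class PropertiesFileSketch {
    // Properties.load(InputStream) always decodes the stream as ISO-8859-1,
    // which is why the file must keep that encoding.
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Convenience for experiments: encode text as ISO-8859-1, then load it.
    static Properties loadText(String text) {
        return load(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

For example, loading the two lines "# comment" and "feed.timezone=GMT+10 # note" yields a single property whose value is "GMT+10 # note", because the trailing # is not treated as a comment.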

  1. Shut down the Connector's Tomcat server.

  2. Make a backup copy of the file:
    $TOMCAT_HOME/webapps/connector-manager/WEB-INF/applicationContext.properties

  3. Edit the file:
    $TOMCAT_HOME/webapps/connector-manager/WEB-INF/applicationContext.properties

  4. Make the necessary modifications (see below) and save the file.

  5. Restart the Connector's Tomcat server.

  6. Examine the logs in $TOMCAT_HOME/logs, looking for any errors that may have been generated by misconfigured properties.

Note: $TOMCAT_HOME represents the Apache Tomcat installation directory. For Connectors installed using the Google Connector Installer (GCI), this would be the Tomcat directory in the Connector Installation.


Feed Connection Properties

gsa.feed.protocol

The gsa.feed.protocol property specifies the URL protocol for
the feed host on the GSA. The supported values are http and
https.
For example:
gsa.feed.protocol=http

Since: 2.8.0

gsa.feed.host

The gsa.feed.host property specifies the host IP address for the
feed host on the Google Search Appliance.
For example:
gsa.feed.host=172.24.2.0

gsa.feed.port

The gsa.feed.port property specifies the HTTP host port for the
feed host on the GSA.
For example:
gsa.feed.port=19900

gsa.feed.securePort

The gsa.feed.securePort property specifies the HTTPS host port
for the feed host on the GSA. This port will be used if the
gsa.feed.protocol property is set to https.
For example:
gsa.feed.securePort=19902

Since: 2.8.0
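The four Feed Connection properties above combine into a single feed address: the protocol selects which port property applies. The following is a minimal sketch under stated assumptions; the helper name is hypothetical, the fallback ports simply reuse the example values above, and the "/xmlfeed" path is an assumption for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper: assembles a feed URI from the Feed Connection properties.
final class FeedUrlSketch {
    static URI feedUri(Properties props) {
        String protocol = props.getProperty("gsa.feed.protocol", "http").trim().toLowerCase(Locale.ROOT);
        boolean secure = protocol.equals("https");
        if (!secure && !protocol.equals("http")) {
            throw new IllegalArgumentException("unsupported gsa.feed.protocol: " + protocol);
        }
        String host = props.getProperty("gsa.feed.host", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("gsa.feed.host is not set");
        }
        // securePort applies to https feeds, port to http feeds.
        // Fallbacks 19902/19900 are assumptions borrowed from the examples above.
        String port = secure
                ? props.getProperty("gsa.feed.securePort", "19902")
                : props.getProperty("gsa.feed.port", "19900");
        try {
            // "/xmlfeed" is an assumed feed path, used only for illustration.
            return new URI(protocol, null, host, Integer.parseInt(port.trim()), "/xmlfeed", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With gsa.feed.protocol=https, gsa.feed.host=172.24.2.0 and gsa.feed.securePort=19902, the sketch yields https://172.24.2.0:19902/xmlfeed; gsa.feed.port is ignored in that case.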

gsa.feed.validateCertificate

The gsa.feed.validateCertificate property specifies whether to
validate the GSA certificate when sending SSL feeds. If the GSA
certificate is installed in the Tomcat keystore, this should be
set to true; otherwise, it must be set to false.
For example:
gsa.feed.validateCertificate=false

Since: 2.8.0

manager.locked

The manager.locked property is used to lock out the Admin Servlet
and prevent it from making further changes to the Feed Connection properties.
If it is set to true or is missing, the Servlet will
not be allowed to update the Feed Connection properties.

NOTE: This property will automatically be changed to true upon
successful update of the gsa.feed.host and gsa.feed.port when registering
a Connector Manager with a Google Search Appliance. Therefore, once the
Feed Connection properties are successfully updated by the Admin Servlet,
subsequent updates will be locked out until the flag is manually
reset to false. For more information, see
Changing the GSA Feed Host.

For example:
manager.locked=false
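The locking rule amounts to a single check on the loaded properties. This is a hypothetical helper, not the Admin Servlet's code; treating any value other than false as locked is an assumption that errs on the safe side, since the text above only covers true and missing.

```java
import java.util.Properties;

// Hypothetical helper; the Admin Servlet's actual check may differ.
final class ManagerLockSketch {
    // Missing or "true" means locked. Any other value except "false" is also
    // treated as locked here, an assumption that errs on the side of safety.
    static boolean isLocked(Properties props) {
        String value = props.getProperty("manager.locked");
        return value == null || !value.trim().equalsIgnoreCase("false");
    }
}
```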

Feed Logging Properties

feedLoggingLevel

The feedLoggingLevel property controls the logging of the feed
record to a log file. The log record will contain the feed XML
without the content data. Set this property to ALL to enable feed
logging, or to OFF to disable it. Customers and developers can use this
functionality to observe the feed record and metadata information
the connector manager sends to the Google Search Appliance.
The feed log contains most of the information fed to the Search
Appliance, but does not log the Document content.
For example:
feedLoggingLevel=ALL

feed.logging.FileHandler.pattern

The feed.logging.FileHandler.pattern property specifies the
location and naming convention used when generating feed logs.
The feed log filename pattern follows the
java.util.logging.FileHandler rules. The default pattern places feed logs in the logs directory of
the Tomcat installation for the Connector Manager, in files named
google-connectors.feed*.log.
For example:
feed.logging.FileHandler.pattern=/var/logs/connectors/acme-connectors.feed%g.log

feed.logging.FileHandler.limit

The feed.logging.FileHandler.limit property specifies an approximate
maximum size, in bytes, of any one feed log file before a new
feed log file is started. If this is zero, there is no limit. The default limit is 50MB.
For example:
feed.logging.FileHandler.limit=0

feed.logging.FileHandler.count

The feed.logging.FileHandler.count property specifies how many feed
log files to cycle through. No more than this number of feed logs will be
maintained, with older logs being discarded as needed. The default feed
log count is 10.
For example:
feed.logging.FileHandler.count=30
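Because the pattern, limit and count follow java.util.logging.FileHandler rules, the feed logging properties map naturally onto that class. The sketch below is a hypothetical wiring, not the Connector Manager's own code: the logger name is invented, %t (the system temp directory) stands in for the real default of Tomcat's logs directory, and 50MB is assumed to mean 50 × 1024 × 1024 bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wiring of the feed logging properties onto java.util.logging.
final class FeedLogSketch {
    static Logger configure(Properties props) {
        // Level.parse accepts level names such as ALL and OFF.
        Level level = Level.parse(props.getProperty("feedLoggingLevel", "OFF").trim());
        // Assumed fallback: %t is the system temp directory, %g the rotation number.
        String pattern = props.getProperty("feed.logging.FileHandler.pattern", "%t/google-connectors.feed%g.log");
        // Assumed fallback: 50MB as 50 * 1024 * 1024 bytes.
        int limit = Integer.parseInt(props.getProperty("feed.logging.FileHandler.limit", "52428800").trim());
        int count = Integer.parseInt(props.getProperty("feed.logging.FileHandler.count", "10").trim());

        Logger logger = Logger.getLogger("feed-log-sketch"); // hypothetical logger name
        logger.setUseParentHandlers(false);
        logger.setLevel(level);
        if (!Level.OFF.equals(level)) {
            try {
                // limit 0 means no size limit; count is the number of files rotated through.
                FileHandler handler = new FileHandler(pattern, limit, count);
                handler.setLevel(level);
                logger.addHandler(handler);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return logger;
    }
}
```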

teedFeedFile

If you set the teedFeedFile property to the name of an existing
file, whenever the connector manager feeds content to the Search Appliance,
it will write a duplicate copy of the feed XML to the file specified by
the teedFeedFile property. Google Search Appliance customers and
third-party developers can use this functionality to observe the content
the connector manager sends to the Search Appliance and reproduce any
issue which may arise.
For example:
teedFeedFile=/tmp/connector/CMTeedFeedFile
NOTE: The teedFeedFile will contain all feed data sent to the
Search Appliance, including document content and metadata.
The teedFeedFile can therefore grow quite large very quickly.
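Conceptually, a teed feed is an output stream that writes every byte both to the Search Appliance connection and to a copy. The class below is a hypothetical sketch of that idea, not the Connector Manager's implementation; opening the copy in append mode (for example new FileOutputStream("/tmp/connector/CMTeedFeedFile", true)) is an assumption, though it would explain why the file grows so quickly.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: duplicates everything written to the feed into a second stream.
final class FeedTeeStream extends OutputStream {
    private final OutputStream feed;
    private final OutputStream copy;

    FeedTeeStream(OutputStream feed, OutputStream copy) {
        this.feed = feed;
        this.copy = copy;
    }

    @Override
    public void write(int b) throws IOException {
        feed.write(b);
        copy.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        feed.write(b, off, len);
        copy.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        feed.flush();
        copy.flush();
    }

    @Override
    public void close() throws IOException {
        try {
            feed.close();
        } finally {
            copy.close(); // close the copy even if closing the feed fails
        }
    }
}
```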

Feed Content Properties

feed.timezone

The feed.timezone property defines the default time zone used
for Date metadata values for Documents. A null or empty string
indicates that the local time zone of the machine running the
Connector Manager should be used; this is the default. Standard
Java TimeZone identifiers may be used. For example:
  feed.timezone=America/Los_Angeles
If a standard TimeZone identifier is unavailable, then a custom
TimeZone identifier can be constructed as GMT followed by a
+/-hours[minutes] offset. Set only one of the following; note that
in a properties file, # begins a comment only at the start of a line:
  # GMT + 10 hours
  feed.timezone=GMT+10
  # GMT + 6 hours, 30 minutes
  feed.timezone=GMT+0630
  # GMT - 8 hours, 0 minutes
  feed.timezone=GMT-0800
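Both kinds of identifier are resolved by java.util.TimeZone, which also explains one pitfall: TimeZone.getTimeZone returns GMT, not an error, when it cannot understand an identifier. The helper below is a hypothetical sketch of how a feed.timezone value could be resolved, assuming the Connector Manager delegates to TimeZone.

```java
import java.util.TimeZone;

// Hypothetical helper showing how a feed.timezone value maps onto java.util.TimeZone.
final class FeedTimeZoneSketch {
    static TimeZone resolve(String value) {
        if (value == null || value.trim().isEmpty()) {
            // Null or empty: use the local time zone of this machine.
            return TimeZone.getDefault();
        }
        // Accepts IDs such as America/Los_Angeles and custom IDs such as GMT+0630.
        // Caution: an unrecognized ID silently yields GMT rather than an error.
        return TimeZone.getTimeZone(value.trim());
    }
}
```

For instance, GMT+0630 resolves to a raw offset of 6 × 3600000 + 30 × 60000 = 23400000 milliseconds, and a misspelled zone name quietly becomes GMT, so it is worth checking the logs after changing this property.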

feed.file.size

The feed.file.size property sets the target size, in bytes, of
an accumulated feed file. The Connector Manager tries to collect
many feed Documents into a single feed file to improve the
efficiency of sending data to the Google Search Appliance.
Specifying too small a value may result in many small feeds which
might overrun the GSA's feed processor. However, specifying too
large a feed size reduces concurrency and may result in OutOfMemory
errors in the Java VM, especially if using multiple Connector instances.
The default target feed size is 10MB, which will typically hold
50-100 fed documents.
feed.file.size=10485760
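The accumulation idea can be sketched as grouping documents until the running size reaches the target. The rule "close a feed once it reaches or exceeds the target" is an assumption, and the class is hypothetical. As a point of arithmetic, 10485760 bytes holding 50-100 documents implies an average of roughly 100-200KB of feed data per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of accumulating documents into feed files of a target size.
final class FeedBatchSketch {
    // Groups document sizes (in bytes) into feeds; a feed is closed as soon as
    // its accumulated size reaches targetBytes (an assumed rule).
    static List<List<Long>> group(List<Long> docSizes, long targetBytes) {
        List<List<Long>> feeds = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : docSizes) {
            current.add(size);
            currentBytes += size;
            if (currentBytes >= targetBytes) {
                feeds.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            feeds.add(current); // the final, partially filled feed
        }
        return feeds;
    }
}
```

A smaller target produces more, smaller feeds; a larger one holds more document data in memory at once, which is the OutOfMemory trade-off described above.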

feed.document.size.limit

The feed.document.size.limit property defines the maximum
allowed size in bytes of a Document's content. Documents whose
content exceeds this size will still have metadata indexed;
however, the content itself will not be fed. The default value is
30MB, the maximum file size accepted by the Google Search Appliance.
feed.document.size.limit=31457280
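The default value is simply 30MB expressed in bytes (30 × 1024 × 1024 = 31457280), and the rule reduces to one comparison. A hypothetical sketch, assuming a document exactly at the limit is still fed since only content that exceeds the limit is dropped:

```java
// Hypothetical helper illustrating the feed.document.size.limit rule.
final class DocumentSizeLimitSketch {
    // 30MB in bytes: 30 * 1024 * 1024 = 31457280, the documented default.
    static final long DEFAULT_LIMIT_BYTES = 30L * 1024 * 1024;

    // True if the content may be fed; false means only the metadata is sent.
    static boolean feedContent(long contentLengthBytes, long limitBytes) {
        return contentLengthBytes <= limitBytes;
    }
}
```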

feed.backlog.floor, feed.backlog.ceiling, feed.backlog.interval

The Feed Backlog properties are used to throttle back the
document feed if the Search Appliance has fallen behind processing
outstanding feed items. The Connector Manager periodically polls the Search Appliance,
fetching the count of unprocessed feed items (the backlog count).
If the backlog count exceeds the ceiling value, feeding is paused.
Once the backlog count drops back down below the floor value, feeding
resumes.
# Stop feeding the GSA if its backlog exceeds this value.
feed.backlog.ceiling=10000
# Resume feeding the GSA if its backlog falls below this value.
feed.backlog.floor=1000
# How often to check for feed backlog (in seconds).
feed.backlog.interval=900
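The floor and ceiling form a hysteresis band: feeding pauses above the ceiling and stays paused until the backlog drops below the floor. The class below is a hypothetical sketch of that state machine, not the Connector Manager's code; fetching the backlog count from the Search Appliance is left out.

```java
// Hypothetical sketch of the floor/ceiling hysteresis; not the Connector Manager's code.
final class BacklogThrottleSketch {
    private final long floor;
    private final long ceiling;
    private boolean paused;

    BacklogThrottleSketch(long floor, long ceiling) {
        if (floor > ceiling) {
            throw new IllegalArgumentException("floor must not exceed ceiling");
        }
        this.floor = floor;
        this.ceiling = ceiling;
    }

    // Call once per poll (every feed.backlog.interval seconds) with the GSA's
    // unprocessed feed count; returns true while feeding should stay paused.
    boolean update(long backlogCount) {
        if (!paused && backlogCount > ceiling) {
            paused = true;
        } else if (paused && backlogCount < floor) {
            paused = false;
        }
        return paused;
    }
}
```

With the defaults above (floor 1000, ceiling 10000), a backlog of 5000 keeps feeding paused if it was already paused, which is the point of the gap: feeding does not flap on and off around a single threshold.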

Traversal Properties

traversal.batch.size

The traversal.batch.size property defines the optimal number
of items to return in each repository traversal batch. The batch
size represents the size of the roll-back that occurs during a
failure condition. Batch sizes that are too small may incur
excessive processing overhead. Batch sizes that are too large
may produce OutOfMemory conditions within a Connector or result
in early termination of the batch if processing time exceeds the
traversal.time.limit. The default traversal batch size is 500 items.
For example:
traversal.batch.size=1000
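The interaction between batch size and time limit can be sketched as a loop that stops at whichever comes first. This is a hypothetical illustration with an injected clock so it can be tested; the real traversal loop and its roll-back handling live inside the Connector Manager and connectors.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch; not the Connector Manager's traversal loop.
final class TraversalBatchSketch {
    // Collects up to batchSize items, stopping early (but cleanly) once the
    // time limit has elapsed. The clock is injected so the sketch is testable.
    static <T> List<T> nextBatch(Iterator<T> repository, int batchSize,
                                 long timeLimitMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + timeLimitMillis;
        List<T> batch = new ArrayList<>();
        while (batch.size() < batchSize && repository.hasNext()) {
            if (clockMillis.getAsLong() >= deadline) {
                break; // return what we have rather than risk cancellation
            }
            batch.add(repository.next());
        }
        return batch;
    }
}
```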

traversal.poll.interval

The traversal.poll.interval property defines the number of
seconds to wait after a traversal of the repository finds no new
content before looking again. Short intervals allow new content
to be readily available for search, at the cost of increased
repository access. Long intervals add latency before new
content becomes available for search. By default, the Connector
Manager waits 5 minutes (300 seconds) before retraversing the
repository if no new content was found on the last traversal.
For example:
traversal.poll.interval=900
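The scheduling decision reduces to choosing a delay before the next traversal. The sketch below assumes that a traversal which found content is followed immediately by another; that assumption and the helper name are hypothetical.

```java
// Hypothetical helper; assumes a traversal that found content is followed immediately by another.
final class PollIntervalSketch {
    // Seconds to wait before the next traversal.
    static long secondsUntilNextTraversal(int itemsFoundLastTraversal, long pollIntervalSeconds) {
        return itemsFoundLastTraversal > 0 ? 0 : pollIntervalSeconds;
    }
}
```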

traversal.time.limit

The traversal.time.limit property defines the number of
seconds a traversal batch should run before gracefully exiting.
Traversals that exceed this time period risk cancelation.
The default time limit is 30 minutes (1800 seconds).
For example:
traversal.time.limit=3600
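One way to picture "risk cancellation" is a batch run on a worker thread whose result is awaited for at most the time limit; an overrunning batch is then interrupted. This is a hypothetical sketch using java.util.concurrent, not the Connector Manager's actual mechanism.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs one traversal batch and cancels it if it overruns the limit.
final class TraversalTimeLimitSketch {
    // Returns the batch result, or null if the batch exceeded limitMillis and was cancelled.
    static <T> T runWithLimit(Callable<T> batch, long limitMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(batch);
        try {
            return future.get(limitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the overrunning batch
            return null;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("batch failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A batch that checks the clock itself and exits gracefully, as in the batch-size discussion above, never reaches this forced path.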

traversal.enabled

The traversal.enabled property is used to enable or disable
Traversals and Feeds for all connector instances in this
Connector Manager. Disabling Traversal would be desirable if
configuring a Connector Manager deployment that only authorizes
search results. Traversals are enabled by default.
traversal.enabled=false

Since: 2.6.0


Miscellaneous Properties

config.change.detect.interval

The config.change.detect.interval property specifies how often
(in seconds) to look for asynchronous configuration changes.
Values <= 0 imply never. For stand-alone deployments, long
intervals or never are probably sufficient. For clustered
deployments with a shared configuration store, 60 to 300 seconds
is probably sufficient. The default configuration change
detection interval is 0 (never).
config.change.detect.interval=60

Since: 2.8.0
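Periodic change detection maps onto a scheduled task that is simply not started when the interval is zero or negative. The sketch is hypothetical: checkForChanges stands in for whatever re-reads the shared configuration store, and the choice of a fixed delay (rather than a fixed rate) is an assumption that keeps a slow check from overlapping the next one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of config.change.detect.interval handling.
final class ConfigChangeSketch {
    // Schedules checkForChanges every intervalSeconds; returns null when the
    // interval is <= 0, meaning detection never runs (the documented default).
    static ScheduledExecutorService start(long intervalSeconds, Runnable checkForChanges) {
        if (intervalSeconds <= 0) {
            return null;
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(checkForChanges, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```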
