Engage Rallypoints - rallytac/pub GitHub Wiki
Engage uses the multicast capabilities inherent in your network to provide a transport mechanism between Engines (and therefore between the users of those Engines). However, what if your network doesn't support multicast or, as is often the case, you need to communicate over someone else's network - such as the Internet?
That's where Engage Rallypoints come in.
Engage's Rallypoints are small, super-fast packet routers designed to securely forward packets between Engage Engines that are unable to communicate with each other over multicast. So, when Engage users need to speak with each other over something like the Internet, Rallypoints provide the means to do so.
- Prerequisites
- Installation
- Operation
- Configuration
- Meshing
- Multicast Reflection
- Whitelisting And Blacklisting Using Restrictions
- Monitoring
- Troubleshooting
Rallypoints run on Linux, MacOSX, and Microsoft Windows at this time - with Linux being the preferred platform. For Linux, you'll need either a branch of Red Hat or Debian. For Red Hat, we recommend CentOS 7 or higher while for Debian, we recommend Ubuntu 18 or higher or Debian 9 or higher.
The default inbound TCP port is 7443 and uses TLS v1.3. Make sure this port is open for inbound connections from Engage clients and other Rallypoints through your firewalls and other network infrastructure. (We mention that this is TLS so that, if your infrastructure conducts deep packet inspection, TLS passthrough, or other such operations for purposes of DoS attack detection and the like, you can configure it accordingly.)
Also, for environments where load balancers or other network infrastructure systems check on the availability or health of the process by opening TCP connections, the Rallypoint may be configured to listen for inbound connections on a dedicated port. If you are operating in such an environment, make sure that the port you configure for this purpose is opened. This "health check port" does not typically have traffic going back and forth - most health checkers simply open the connection and either close it right away or keep it open for a period of time. If you enable this, be aware that DoS detection logic in firewalls and/or your operating system may need to be tuned to handle fast connect/disconnect operations from the health checker.
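To see what such a health checker does on the wire, here is a minimal Python sketch of a TCP connect-and-close probe. It is not part of the Rallypoint distribution; the function name `tcp_health_check` and the default timeout are assumptions made for illustration.

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The connection is closed right away, which mirrors the behaviour of
    many load-balancer health checkers described above.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `tcp_health_check("rp.example.com", 8080)` (both values are placeholders) against the health check port you configured should return `True` while the Rallypoint is running.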
While a Rallypoint fundamentally serves to route packets between entities using TCP, it also supports UDP over multicast. This functionality is provided with the intent of Rallypoints forwarding traffic from unicast TCP to multicast UDP. This capability can be used to create a multicast backbone link between Rallypoints and/or route traffic from non-Rallypoint entities (including multicasting Engage Engines) operating on multicast to Engage-based entities using unicast. If you are going to be forwarding multicast traffic over unicast (and vice versa), make sure your Rallypoint machine has its firewall set up for multicast RX and TX and that the necessary UDP ports are opened for inbound and outbound traffic.
A Rallypoint is most easily installed through your operating system's package manager, using the appropriate installation package provided by Rally Tactical Systems. These packages will install the necessary binaries, factory default certificates, and a baseline configuration. They will also set up the Rallypoint to operate as a daemon (background service) that starts at operating system boot time. This is done using systemd on Linux platforms and launchd on OSX.
For Red Hat distributions:
sudo yum install <rallypoint_package_file>.rpm
For Debian-based distributions:
sudo apt install <rallypoint_package_file>.deb
NOTE: In the above examples we're telling `yum` or `apt` to run the installation from a file. To ensure that these tools use the file and not a named package from a repository, you need to tell them you're referring to a file. Do this by changing to the directory where the file is located and then preceding the file name with `./`. For example:
sudo yum install ./rallypointd-1.189.9026-0.x86_64.rpm
For OSX, open the <rallypoint_package_file>.dmg file and double-click the install link icon.
If you need to conduct a more sophisticated installation procedure, need to run the Rallypoint process manually (not as a background daemon for example), or generally just have more complex needs for your Rallypoint setup, you will need to install the relevant items manually. This is a pretty straightforward process so it shouldn't be too difficult.
Let's get going by assuming we're not yet configuring (or perhaps never configuring) to run as a background service.
- Place the `rallypointd` executable anywhere you'd like. This can be in a custom directory or in a standard executable location such as /usr/sbin. As long as the code can be executed from that location, you're good to go.
- Place the security-related certificate and key files in a location where the Rallypoint can read them. These include the file containing the Rallypoint's certificate, and the file containing that certificate's private key. (Be sure that this location is strictly only accessible to the Rallypoint and any other authorized applications and/or users.) You will also need to place CA certificates used to verify client and peer Rallypoint certificates in a location accessible to the Rallypoint.
- Finally, place your configuration file in a location where the Rallypoint can read it. By default, the Rallypoint looks for `/etc/rallypointd/rallypointd_conf.json` for its configuration.
Note: If you place your configuration file in a different location or give it a different name, you'll need to tell the Rallypoint to use that file. Do this with the `-cfg` command-line parameter. For example:
rallypointd -cfg:my_custom_configuration.json
Now that you've got everything installed manually, you may still want to set up the Rallypoint to run as a daemon on startup and avail yourself of the services offered by the operating system. In a Linux environment, this is easily done by setting up the required configuration for systemd or the older `init.rc` method. Refer to your operating system instructions on how to do this.
Setting up the service for systemd-like operation on Mac OSX systems is a little trickier. Your best bet is to refer to Apple's documentation.
Once the code is installed and configured (more on that below), your Rallypoint should just start up and begin accepting connections from clients and/or other Rallypoint peers. If the code is running as a daemon under `systemd`, you can use the standard `systemd`-related methods of interacting with your daemon, such as:
Operation | Command Line |
---|---|
Starting the service | sudo systemctl start rallypointd |
Stopping the service | sudo systemctl stop rallypointd |
Restarting the service | sudo systemctl restart rallypointd |
Query service status | sudo systemctl status rallypointd |
Watch the log | sudo journalctl -f -u rallypointd |
If you are running `rallypointd` from the command line, you will see the log output in the terminal window. To stop the process, simply press `Ctrl-C` or use the `kill` command.
You can monitor the Rallypoint in a variety of ways.
The simplest is by viewing the output log displayed in a terminal window either directly from the process or, if running as a daemon, using journalctl as described above.
The output log is also sent to the standard operating system logging subsystem. This would be `syslog` on Linux systems and Apple's high-performance logger on OSX systems.
All of these log messages follow the `syslog` standard format, including the timestamp of the message and severity level. These outputs can then be analyzed by log-processing tools such as SolarWinds, Papertrail, and so on for purposes of generating alerts to administrative personnel or automated systems.
Note: When viewing the log directly in a terminal that has ANSI color-coding capabilities, the log lines are colorized to make them easier to scan.
In addition to the log, the Rallypoint can be configured to produce a status report on a periodic basis. This report is in JSON format and is written to a file specified in the configuration at an interval you decide (we recommend 30-second intervals, with 5 seconds as the minimum). This JSON file can then be analyzed to determine the health of the Rallypoint.
Here's an example:
{
"id":"demorp0001", // The Rallypoint's instance identifier
"ts":119790397, // UTC UNIX timestamp (number of seconds
// since Jan 1, 1970) of when this report
// was produced
"uptime": 149663, // Number of seconds the process has been
// up
"systemCpuLoad":4.86, // Percentage CPU load of the machine instance
// hosting the Rallypoint
"connections":
{
"active":1, // Number of active client connections
"denied":0, // Number of connection requests denied
"total":3 // Total process lifetime count client
// connections*
},
"healthChecks":
{
"count":0, // Number of TCP health checks made by a
// load-balancer or other network
// infrastructure entity
"rate":0.0, // Health checks per second
"rateEma":0.0 // Exponential moving average of health
// checks per second
},
"peers":
{
"configuredConnectedCount":0, // Number of configured peer
// connections that are connected
"configuredCount":0, // Number of configured peer connections
"count":0, // Number of connected Rallypoint
// peers (inbound and outbound)
"leafConnectedCount":0, // Number of peers that are inbound leaf
// peer nodes
"list":[] // List of peers
},
"queue":
{
"avgExecNanos":60567, // Average number of nanoseconds a queue
// operation takes to execute
"maxExecNanos":7733634, // Longest number of nanoseconds a queue
// operation took to execute
"minExecNanos":0, // Least number of nanoseconds a queue
// operation took to execute
"lowPriorityQueueDepth":0, // Current number of operations waiting in the
// low-priority queue
"lowPriorityQueueMaxDepth":0, // Maximum number of operations in the low-
// priority queue
"lowPriorityQueueFailures":0, // Number of operations denied entrance to
// the low-priority queue due to load
"normalPriorityQueueDepth":0, // Current number of operations waiting in the
// normal-priority queue
"normalPriorityQueueMaxDepth":1, // Maximum number of operations in the normal-
// priority queue
"normalPriorityQueueFailures":0, // Number of operations denied entrance to
// the normal-priority queue due to load
"ops":
{
"count":3360, // Total number of process
// lifetime operations
"rate":145.0, // Operations per second
"rateEma":15.002 // Exponential moving average
// of operations per second
},
"spuriousWakeups":114, // Number of queue wakeups that
// resulted in a no-op
"wakeUps":3346 // Total number of queue wakeups
},
"routing":
{
"blobs":
{
"rx":
{
"bytes":293895, // Number of received blob bytes
"packets":3118 // Number of received blob packets
},
"tx":
{
"bytes":292495, // Number of transmitted blob bytes
"packets":3116 // Number of transmitted blob packets
}
},
"paths":28, // Number of potential uni-directional
// media stream pathways defined by the
// routing table
"streams":5 // Number of registered streams
},
"rx":
{
"bytes":303209, // Number of received bytes
"packets":3133, // Number of received packets
"rate":106643.2, // Received bytes per second
"rateEma":9613.413 // Exponential moving average of
// received bytes per second
},
"tx":
{
"bytes":297647, // Number of transmitted bytes
"packets":3131, // Number of transmitted packets
"rate":106643.2, // Transmitted bytes per second
"rateEma":9397.57 // Exponential moving average of
// transmitted bytes per second
},
"throughput":
{
"rate":0, // Active network I/O throughput
// in bits per second
"rateEma":0 // Exponential moving average of
// network I/O throughput (bps)
}
}
By far, the most important element in the report that indicates performance and, therefore, user experience is the `queue/normalPriorityQueueDepth` value. Values greater than zero for a prolonged period of time indicate that the Rallypoint is falling behind in processing packets - resulting in degraded audio quality and high latencies. This could be due to CPU pressure, memory overload, or simply I/O backup. This can be addressed by scaling up the performance of the machine/VM hosting the Rallypoint or by bringing additional Rallypoints online if you are operating a meshed Rallypoint cloud.
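A monitoring script can watch this value across successive reports. The following Python sketch is one possible approach, not a tool shipped with the Rallypoint. It assumes the report file on disk is plain JSON (without the `//` annotations shown in the example above), and the helper names and the three-sample threshold are arbitrary choices for illustration.

```python
import json
from typing import Sequence


def load_report(path: str) -> dict:
    """Parse a status report file written by the Rallypoint."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def queue_backlog(report: dict) -> int:
    """Return queue/normalPriorityQueueDepth, or 0 if it is absent."""
    return int(report.get("queue", {}).get("normalPriorityQueueDepth", 0))


def is_falling_behind(depth_samples: Sequence[int], min_consecutive: int = 3) -> bool:
    """True if the most recent `min_consecutive` samples are all above zero.

    A single non-zero reading is normal under bursty load; only a sustained
    backlog suggests the Rallypoint cannot keep up.
    """
    if len(depth_samples) < min_consecutive:
        return False
    return all(depth > 0 for depth in depth_samples[-min_consecutive:])
```

A caller would append `queue_backlog(load_report(path))` to a list once per reporting interval and raise an alert when `is_falling_behind` returns `True`.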
Also of interest is the `queue/ops/rate` element. This indicates the number of operations per second the Rallypoint is carrying out. (An "operation" being something like routing a packet, handling a request from a client, and so on.) On a reasonably powerful server-class machine, the Rallypoint comfortably processes in excess of 500,000 operations per second through its queue. So the example of 145.0 per second in the above JSON means that this particular Rallypoint is basically doing nothing :).
Now that you've seen how to install the software, operate it, and monitor it, it's a good idea to get into how you actually configure it. Here goes ...
We want to take a moment to stress something.
It is vitally important that you safeguard access to your X.509 certificate files and, in particular, private keys. While certificates are ultimately public, private keys are just that - PRIVATE. Make sure that only your Rallypoint and other authorized users and entities have access to these files.
As you've probably guessed by now, a Rallypoint is configured using a JSON file that it looks for at `/etc/rallypointd/rallypointd_conf.json` or one that is specified at the command line with the `-cfg` argument.
Here's an example:
{
"id":"rp0001",
"listenPort":7443,
"interfaceName":"en0",
"multicastInterfaceName":"en0",
"allowMulticastForwarding":false,
"ioPools":-1,
"allowPeerForwarding":false,
"isMeshLeaf":false,
"certStoreFileName":"/etc/rallypointd/rallypointd.certstore",
"certStorePasswordHex":"",
"peeringConfigurationFileName": "",
"peeringConfigurationFileCommand":"",
"peeringConfigurationFileCheckSecs":30,
"limits":
{
"maxClients":0,
"maxPeers":0,
"maxMulticastReflectors":0,
"maxRegisteredStreams":0,
"maxStreamPaths":0,
"maxRxPacketsPerSec":0,
"maxTxPacketsPerSec":0,
"maxRxBytesPerSec":0,
"maxTxBytesPerSec":0,
"maxQOpsPerSec":0
},
"statusReport":
{
"enabled":true,
"fileName":"/tmp/${id}_status.json",
"intervalSecs":30,
"includeLinks":true,
"includePeerLinkDetails":true,
"includeClientLinkDetails":false
},
"linkGraph":
{
"enabled":true,
"fileName":"/tmp/${id}_links.dot",
"minRefreshSecs":5,
"includeDigraphEnclosure":true,
"includeClients":false,
"coreRpStyling":"[shape=hexagon color=firebrick style=filled]",
"leafRpStyling":"[shape=box color=gray style=filled]",
"clientStyling":"[dir=none]"
},
"externalHealthCheckResponder":
{
"listenPort":0,
"immediateClose":true
},
"certificate":
{
"certificate":"@certstore://rtsFactoryDefaultRpSrv",
"key":"@certstore://rtsFactoryDefaultRpSrv"
},
"tls":
{
"verifyPeers":true,
"allowSelfSignedCertificates":false,
"caCertificates":
[
"@certstore://rtsCA"
],
"crlSerials":
[
"ad:de:61:33:99:67:21:e1",
"6B:D6:13:51:42:F5:04:31"
]
},
"fipsCrypto":{
"enabled":false,
"path":"/etc/rallypointd",
"curves":"secp521r1",
"ciphers":"TLS_AES_256_GCM_SHA384"
},
"txOptions":
{
"priority":4,
"ttl":128
},
"multicastTxOptions":
{
"priority":4,
"ttl":1
},
"multicastRestrictions":
{
"type":1,
"elements":
[
{
"rx":
{
"address":"234.1.2.3",
"port":25000
},
"tx":
{
"address":"234.1.2.3",
"port":25000
}
},
{
"rx":
{
"address":"234.5.6.7",
"port":17222
},
"tx":
{
"address":"234.5.6.7",
"port":17222
}
}
]
},
"groupRestrictionAccessPolicyType": 0,
"groupRestrictions":
{
"type":1,
"elements":
[
"{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
"{b7694a4f-9724-44a6-ae57-c63232ad1f57}"
]
},
"extendedGroupRestrictions": [
{
"id": "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
"restrictions": [
{
"type": 1,
"elementsType": 2,
"elements": [
"-educators",
"-teachers",
"-instructors"
]
},
{
"type": 2,
"elementsType": 5,
"elements": [
"ST=WA",
"ST=ID"
]
}
]
},
{
"id": "{b7694a4f-9724-44a6-ae57-c63232ad1f57}",
"restrictions": [
{
"type": 1,
"elementsType": 6,
"elements": [
"O=My Fictional Issuing Organization"
]
}
]
}
],
"staticReflectors":
[
{
"id":"{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
"rx":
{
"address":"234.1.2.3",
"port":25000
},
"tx":
{
"address":"234.1.2.3",
"port":25000
}
},
{
"id":"{b7694a4f-9724-44a6-ae57-c63232ad1f57}",
"rx":
{
"address":"234.5.6.7",
"port":17222
},
"tx":
{
"address":"234.5.6.7",
"port":17222
}
}
]
}
Let's go through these in detail...
The root of the configuration has a number of elements that are key to the operation of the Rallypoint.
- `id` is a unique string you assign that identifies this instance of the Rallypoint. It's very important that this string be unique because it has meaning in situations where multiple Rallypoints are meshed together. If you leave this element blank, the Rallypoint will generate a value automatically, but that won't work terribly well for meshing, so we recommend you always set this value. The computer's host name works well here or, in situations like Docker containers or cloud instances, the ID assigned by the container/cloud system.
- `listenPort` is the TCP port that the Rallypoint listens on for TLS connections from Engage clients and other Rallypoints. As described earlier, this port runs TLS so make sure that firewalls and such allow inbound connections to this port for TLS - and TLS only! The default is `7443` but you can assign any valid TCP port.
- `interfaceName` is the name of the operating system network interface card (NIC) used by the `listenPort`. If you don't assign a name, the Rallypoint will bind on all NICs for the `listenPort`.
- `multicastInterfaceName` is the name of the NIC used for receiving and sending multicast UDP traffic. This can be the same as `interfaceName` or different if you want to forward onto a backbone via another interface. Due to security concerns, you must set the multicast interface name. If you leave it blank, multicast forwarding will be disabled.
- `allowMulticastForwarding` indicates whether forwarding of traffic to multicast is allowed. The default is `false`. If enabled, endpoints that register streams containing multicast addressing information will cause the Rallypoint to automatically forward traffic. No other local configuration is required. Also note that when forwarding unicast TLS traffic to multicast, the TLS security envelope (the "TRANSEC") is removed and only the contents of the TLS payload are forwarded. You should ensure that the Engage groups your clients are using are encrypted (known as the "COMSEC") to prevent unauthorized access to that traffic. That said, if you are setting up multicast forwarding so that group traffic is relayed to third-party entities (such as LMR gateways) that do not support encryption, your Engage groups will have to be unencrypted.
- `ioPools` indicates to the Rallypoint how many threads of parallel operation should be set up for network I/O. If you leave this blank or set it to `-1`, the Rallypoint will set up 1 I/O pool per CPU. (It's best to leave this element at -1 unless you have a very specific requirement to change it.)
- `allowPeerForwarding` indicates whether unicast traffic received from Rallypoint peers in a mesh should be forwarded to other Rallypoints. This is an experimental feature at this time and should be left disabled unless you're fully aware of the implications of nasty things such as packet loops being created on your network.
- `isMeshLeaf` indicates whether the Rallypoint is a "leaf" hanging off a mesh or is part of the core mesh. More on this further down in Connecting Into A Mesh.
- `certStoreFileName` is the path to the certificate store to be used. See Engage Security for more information.
- `certStorePasswordHex` is the hex representation of the password/passphrase protecting the certificate store. See Engage Security for more information.
- `peeringConfigurationFileName` is the name of the file from which the Rallypoint should load details about the mesh of which it forms a part. If there is no mesh, then you can leave this element blank. (More on this later.)
- `peeringConfigurationFileCommand` is an operating system command to run instead of polling the file named by `peeringConfigurationFileName`. It's important that this command send its output to STDOUT, formatted as JSON as per the mesh configuration file. Also, this command must execute and complete as quickly as possible (less than 30 seconds) or the Rallypoint will attempt to terminate that process.
IMPORTANT: `peeringConfigurationFileName` and `peeringConfigurationFileCommand` are mutually exclusive. If you set values for both elements, the Rallypoint will fail to configure and abort operation.
- `peeringConfigurationFileCheckSecs` is the interval (in seconds) at which the Rallypoint will check the file defined by `peeringConfigurationFileName` for changes or, if `peeringConfigurationFileCommand` is defined, the interval at which to run the command.
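Several of the rules above can be checked before the Rallypoint is ever started. The Python sketch below is a hypothetical pre-flight check, not something provided by Rally Tactical Systems. It covers only rules stated in the text, and the function name and message wording are assumptions.

```python
def check_root_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a parsed config."""
    problems = []

    # The two peering sources are documented as mutually exclusive.
    if cfg.get("peeringConfigurationFileName") and cfg.get("peeringConfigurationFileCommand"):
        problems.append(
            "peeringConfigurationFileName and peeringConfigurationFileCommand are both set"
        )

    # Multicast forwarding is disabled when no multicast NIC is named.
    if cfg.get("allowMulticastForwarding") and not cfg.get("multicastInterfaceName"):
        problems.append("allowMulticastForwarding is true but multicastInterfaceName is blank")

    # A blank id is auto-generated, which is discouraged for meshing.
    if not cfg.get("id"):
        problems.append("id is blank")

    # listenPort defaults to 7443 and must be a valid TCP port.
    port = cfg.get("listenPort", 7443)
    if not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append("listenPort is not a valid TCP port")

    return problems
```

Passing the result of `json.load` on your configuration file to `check_root_config` returns an empty list when none of these rules is violated.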
The `limits` object defines boundaries and limits for optimal operation and provides a means for the Rallypoint to report metrics to monitoring systems.
- `maxClients` sets the maximum number of client connections allowed. Set to 0 to disable the limit.
- `maxPeers` sets the maximum number of peer connections allowed. Set to 0 to disable the limit.
- `maxMulticastReflectors` sets the maximum number of multicast reflectors allowed. Set to 0 to disable the limit.
- `maxRegisteredStreams` sets the maximum number of streams that may be registered. Set to 0 to disable the limit.
- `maxStreamPaths` sets the maximum number of stream pathways that may exist. Set to 0 to disable the limit.
- `maxRxPacketsPerSec` sets the maximum number of received packets per second allowed. Set to 0 to disable the limit.
- `maxTxPacketsPerSec` sets the maximum number of transmitted packets per second allowed. Set to 0 to disable the limit.
- `maxRxBytesPerSec` sets the maximum number of received bytes per second allowed. Set to 0 to disable the limit.
- `maxTxBytesPerSec` sets the maximum number of transmitted bytes per second allowed. Set to 0 to disable the limit.
- `maxQOpsPerSec` sets the maximum number of queue operations per second allowed. Set to 0 to disable the limit.
The `statusReport` object contains details about the status report generation described earlier.
- `enabled`: set to true to enable - default is false.
- `fileName` is the full path name of the JSON file where the status report should be written. If you leave this element blank, no status report will be produced. Include "`${id}`" to have the Rallypoint ID inserted into the name.
- `intervalSecs` is the interval (in seconds) at which the status report should be produced. While you can set this as low as 1 second, we recommend no lower than 5 seconds. 30 seconds is generally a reasonable value.
- `includeLinks` indicates whether link information is to be included - default is false.
- `includePeerLinkDetails`: set to true to include details about peer links if `includeLinks` is true - default is false.
- `includeClientLinkDetails`: set to true to include details about client links if `includeLinks` is true - default is false.
The `linkGraph` object contains details for creation of a Graphviz file with a graphical representation of links.
- `enabled`: set to true to enable - default is false.
- `fileName` is the full path name of the Graphviz file to be written. If you leave this element blank, no file will be produced. Include "`${id}`" to have the Rallypoint ID inserted into the name.
- `minRefreshSecs` is the minimum time (in seconds) between updates to the file.
- `includeDigraphEnclosure`: set to true to surround the output in a "`strict digraph`" enclosure - default is true.
- `includeClients`: set to true to include client links - default is false.
- `coreRpStyling` is the Graphviz styling to be used to represent a core Rallypoint node.
- `leafRpStyling` is the Graphviz styling to be used to represent a leaf Rallypoint node.
- `clientStyling` is the Graphviz styling to be used to represent a client node.
The `externalHealthCheckResponder` object contains information for external health checker interoperability. "Health checkers" are generally network entities such as load balancers and network management systems that monitor the Rallypoint to determine its health and availability. In most setups, health checkers do simple things like open a TCP connection to the process to verify that it's operational, and they generally close that connection right away.
- `listenPort` is the TCP port that the Rallypoint should listen on for health check TCP connections. Set it to whatever is required/desired by your health checker.
- `immediateClose` indicates whether to immediately close the connection. (Some health checkers require this.)
Security is fundamental to Rallypoints, and that security is ensured largely thanks to X.509 certificates. The certificate (and associated key) in the `certificate` section are vital to the operation of the Rallypoint.
- `certificate` is the PEM content of the X.509 certificate to be used. If the value of this element starts with `@` followed by a file name, the Rallypoint will, instead, read the PEM contents from the file path following the `@` sign. If the content starts with "@certstore://", the Rallypoint will load the content from the certificate store element by that name.
- `key` is much the same as the PEM content of the X.509 certificate but, in this case, holds the private key associated with the certificate. The same logic with the `@` sign applies.
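The three forms a `certificate` or `key` value can take are easy to tell apart in code. The Python sketch below illustrates that decision only; it is not the Rallypoint's actual implementation. The function name and returned labels are invented, and checking the `@certstore://` prefix before the plain `@` prefix is an assumption that follows from one being a prefix of the other.

```python
from typing import Tuple

CERTSTORE_PREFIX = "@certstore://"


def classify_pem_reference(value: str) -> Tuple[str, str]:
    """Classify a certificate/key setting as described in the text.

    Returns (kind, target) where kind is one of:
      "certstore" - target is the certificate store element name
      "file"      - target is a file path to read PEM content from
      "inline"    - target is the PEM text itself
    """
    # Check the longer prefix first, since it also starts with "@".
    if value.startswith(CERTSTORE_PREFIX):
        return "certstore", value[len(CERTSTORE_PREFIX):]
    if value.startswith("@"):
        return "file", value[1:]
    return "inline", value
```

For the example configuration above, `classify_pem_reference("@certstore://rtsFactoryDefaultRpSrv")` yields the kind `"certstore"` and the element name `rtsFactoryDefaultRpSrv`.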
Further to security, the `tls` section deals with TLS connections with entities such as clients and Rallypoint peers.
- `verifyPeers` indicates whether the Rallypoint should ask for, and verify, the far-end's X.509 certificate - ensuring mutual authentication/verification as well as certificate-based message-signing. It's a REALLY good idea to enable this.
- `allowSelfSignedCertificates` indicates whether the Rallypoint will accept certificates that have been self-signed - i.e. not issued by a known certificate authority. You should generally leave this disabled.
- `caCertificates` is an array of CA certificates that the Rallypoint will use to verify client certificates. As with other certificate settings, these strings can be the actual PEM text or references to files or certificate store elements - both using the `@` nomenclature.
- `crlSerials` is an array of strings - each representing the serial number of a revoked certificate that the Rallypoint should deny (in essence, a Rallypoint-specific Certificate Revocation List). The strings are not case-sensitive but must be in the format "xx:xx:xx:xx:xx:xx:xx:xx" - i.e. a `:` separating the hexadecimal representation of each byte of the serial number. Please note, the examples shown are not necessarily real serial numbers - they are for demonstration purposes only.
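Serial numbers often arrive as an integer or an unbroken hex string, so a small helper can produce the colon-separated form expected by `crlSerials`. This Python sketch is illustrative only. The function name is hypothetical, lowercase output is an arbitrary choice (the text says case does not matter), and the sketch does not enforce a particular number of bytes.

```python
import re


def format_crl_serial(serial) -> str:
    """Format a certificate serial number as colon-separated hex bytes."""
    if isinstance(serial, int):
        if serial < 0:
            raise ValueError("serial number must not be negative")
        digits = format(serial, "x")
    else:
        # Accept hex strings with or without ':' separators.
        digits = serial.replace(":", "").strip()
        if not re.fullmatch(r"[0-9a-fA-F]+", digits):
            raise ValueError("serial number is not a hex string")

    # Pad to a whole number of bytes before splitting into pairs.
    if len(digits) % 2:
        digits = "0" + digits
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return ":".join(pairs).lower()
```

For example, `format_crl_serial("adde6133996721e1")` returns `ad:de:61:33:99:67:21:e1`, matching the first entry in the example configuration.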
The `fipsCrypto` object is used to configure FIPS 140-2 settings.
- `enabled` activates FIPS mode when set to `true`.
- `path` is the absolute path name to the directory where the `rts-fips` FIPS module is located. Note that this value is for the directory, not the actual file name itself.
- `curves` specifies a list of elliptic curves to be supported in FIPS mode. The singular default is the currently highest level NIST-approved `secp521r1` curve, although `secp384r1` and `secp256r1` may also be used. For example, to specify all of these curves, this setting would be `secp521r1:secp384r1:secp256r1`.
- `ciphers` specifies a list of ciphers to be supported in FIPS mode. The singular default is the NIST-approved `TLS_AES_256_GCM_SHA384` cipher, although `TLS_AES_128_GCM_SHA256` may also be used. For example, to specify both of these ciphers, this setting would be `TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256`.
The `txOptions` settings have to do with how packets are sent from the Rallypoint to the far-end.
- `priority` sets the QoS-related priority for transmitted packets. See Engage and Network Quality Of Service for more information.
- `ttl` sets the IP Time-To-Live value for packets. This may have differing effects on different operating systems.
The `multicastTxOptions` settings have to do with how packets are sent from the Rallypoint over multicast and are only applicable for situations like multicast reflecting.
- `priority` sets the QoS-related priority for transmitted packets. See Engage and Network Quality Of Service for more information.
- `ttl` sets the IP Time-To-Live value for packets. Be sure that you understand what changing the TTL value means for your multicast environment.
The `multicastRestrictions` object describes how multicast addresses are restricted. You can either restrict multicast to only a set of elements, or exclude a set of elements. More on this later in the Multicast Reflection section.
- `type` indicates how the elements are to be treated: `1` as a whitelist, `2` as a blacklist.
- `elements` is an array of objects, each describing a multicast address and port pairing for RX and TX.
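Because each element pairs an address with a port for both directions, a quick validation pass can catch typos before deployment. The Python sketch below is a hypothetical helper, not part of the product. It assumes every element carries both an `rx` and a `tx` object, as in the example configuration, and treats anything outside the IP multicast range as invalid.

```python
import ipaddress


def invalid_multicast_elements(restrictions: dict) -> list:
    """Return (element index, direction) pairs whose address or port looks wrong."""
    bad = []
    for index, element in enumerate(restrictions.get("elements", [])):
        for direction in ("rx", "tx"):
            endpoint = element.get(direction, {})
            try:
                # is_multicast is True for 224.0.0.0/4 (and ff00::/8 for IPv6).
                address_ok = ipaddress.ip_address(endpoint.get("address", "")).is_multicast
            except ValueError:
                address_ok = False
            port = endpoint.get("port")
            port_ok = isinstance(port, int) and 1 <= port <= 65535
            if not (address_ok and port_ok):
                bad.append((index, direction))
    return bad
```

An empty list means every `rx` and `tx` entry names a multicast address and a valid UDP port.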
The `groupRestrictionAccessPolicyType` setting (values of `0` or `1`) has an important impact on the method by which registrations for groups are allowed on the Rallypoint. A value of `0` (the default) indicates that, unless otherwise specified in `groupRestrictions` or `extendedGroupRestrictions` (see below), registration for that group will be allowed. In other words, a value of `0` denotes a permissive access policy. A value of `1`, on the other hand, enforces a strict access policy whereby registration for a group is denied by default unless it is, at minimum, listed in `groupRestrictions` and, if desired, further extended in `extendedGroupRestrictions`.
The `groupRestrictions` object describes how group identifiers are restricted. You can either restrict groups to only a set of elements, or exclude a set of elements.
- `type` indicates how the elements are to be treated: `1` as a whitelist, `2` as a blacklist.
- `elements` is an array of strings - each being a valid Engage group ID.
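The interaction between the access policy type and `groupRestrictions` can be written as a small decision function. The Python sketch below is one reading of the text, not the Rallypoint's actual logic. It ignores `extendedGroupRestrictions`, and its handling of a blacklist under the strict policy (deny everything, since nothing is explicitly listed as allowed) is an assumption the text does not spell out.

```python
PERMISSIVE, STRICT = 0, 1      # groupRestrictionAccessPolicyType values
WHITELIST, BLACKLIST = 1, 2    # groupRestrictions "type" values


def group_allowed(group_id: str, policy: int, restrictions: dict) -> bool:
    """Decide whether registration for group_id would be allowed."""
    elements = set(restrictions.get("elements", []))
    listed = group_id in elements
    rtype = restrictions.get("type", WHITELIST)

    if rtype == BLACKLIST:
        # Assumption: a blacklisted group is always denied, and under the
        # strict policy an unlisted group is denied too.
        return (not listed) and policy == PERMISSIVE

    if policy == STRICT:
        # Strict: the group must be explicitly whitelisted.
        return listed

    # Permissive: an empty whitelist restricts nothing.
    return listed or not elements
```

With the example configuration (policy `0`, whitelist of two groups), only those two group IDs pass. With no `groupRestrictions` at all, the permissive policy allows every group and the strict policy allows none.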
The `extendedGroupRestrictions` object extends the basic group restrictions that are defined in `groupRestrictions`. The basic idea here is that if `groupRestrictions` is viewed as a sort of blunt instrument that simply allows or denies access to groups, the `extendedGroupRestrictions` object places more fine-grained control within those groups. For example, while you may want to allow registration of `{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}`, you may want to restrict access only to certain sets of users - ideally based on the X.509 certificates they present when connecting. See below for a more detailed discussion on this subject.
- `id` is the ID of the group we're working with.
- `restrictions` is the container object for the group's specialized restriction set.
- `type` indicates how the elements are to be treated: `1` as a whitelist, `2` as a blacklist.
- `elementsType` indicates how the Rallypoint should interpret the list of elements. See below.
- `elements` is an array of strings - each interpreted as per `elementsType`.
The `staticReflectors` element is an array of multicast reflectors that need to be maintained for the lifetime of the Rallypoint process. More on this later in the Multicast Reflection section.
OK great, you've set up a Rallypoint and you have lots of users connecting to it. But you're running out of CPU because you have thousands of users talking away like crazy. Or, you have groups of people in different parts of the world that need to connect to their own Rallypoints - but they all still want to communicate with each other. Or, you need to provide additional Rallypoints for failover and redundancy purposes. Or, something else ...
Well, the solution here is generally to just add more Rallypoints.
But, you want all those Rallypoints to interconnect with each other. That's where Rallypoint meshing comes in.
For this example we're going to assume we want to set up a bunch of Rallypoints in a cloud environment such as Amazon Web Services (AWS). We'd like to make those Rallypoints available to our users spread around the world. And we don't want to have those users connect to particular Rallypoints. Rather, we want any user to connect to any available Rallypoint and have the Rallypoints take care of forwarding traffic amongst themselves to create what looks like a big, centralized, cloud-based Rallypoint to our users.
First, we're going to need something that front-ends all these Rallypoints, offering a single DNS name (or single IP address) that all our users connect to. Let's call it `cloudrp.example.com`. Also, for purposes of this discussion, we'll say we're going to do all this in Amazon Web Services, taking advantage of Amazon's Elastic Load Balancer.
In an ideal situation, we'd use Rallypoints' ability to forward traffic to a multicast backbone to create something like below. (Notice how clients 1, 2, and 3 connect to the load balancer which, in turn, passes those IP connections onwards to one of the three Rallypoints.)
                                                            |
                           (c1)       +------+              |
                      +-------------> |rp0001| <------>     |
                      |               +------+              |
                      |                                     |
                      |                                     |
               +-------------+                              | multicast
c1 ----------> |             | (c2)   +------+              | backbone
c2 ----------> |Load Balancer| -----> |rp0002| <------>     |
c3 ----------> |             |        +------+              |
               +-------------+                              |
                      |                                     |
                      |                                     |
                      |    (c3)       +------+              |
                      +-------------> |rp0003| <------>     |
                                      +------+              |
                                                            |
But, sadly, our cloud provider does not support IP multicast (and that is, in fact, true of AWS as well as most cloud providers). Also, multicast networks can be difficult to set up and maintain so sometimes it's just easier to go with unicast.
So, we'd still like to have the logical setup we described above but we have to figure out a way to use something other than a multicast backbone. The answer is to create a Rallypoint Mesh.
A mesh is simply a configuration where Rallypoints connect directly to each other for purposes of traffic forwarding. This is not too much different from the way in which IP multicast works anyway - just that the Rallypoints themselves are doing "multicasting" rather than utilizing the IP network for that purpose.
As you can see below, what we've done is to have each Rallypoint in our cloud connect to every other Rallypoint. Now, when a client connects (indirectly) to a Rallypoint, its traffic is forwarded to other Rallypoints - just like multicast. It's actually rather straightforward.
                           (c1)       +------+      (pc)
                      +-------------> |rp0001| <---------------+
                      |               +------+                 |
                      |                   ^                    |
                      |                   | (pc)               |
               +-------------+            |                    |
c1 ----------> |             | (c2)       +-------> +------+   |
c2 ----------> |Load Balancer| -------------------> |rp0002|   | (pc)
c3 ----------> |             |            +-------> +------+   |
               +-------------+            |                    |
                      |                   | (pc)               |
                      |                   v                    |
                      |    (c3)       +------+                 |
                      +-------------> |rp0003| <---------------+
                                      +------+      (pc)
I bet you have some questions though ...
- You might be wondering if all traffic from all Rallypoints is forwarded to all other Rallypoints - right? Well, that would be kinda silly and inefficient, so what Rallypoints do is "subscribe" to each other for traffic for individual streams. So if client 1 connected to RP1 and client 3 connected to RP3 are registered/subscribed for the same stream, that stream's traffic only flows between RP1 and RP3 - bypassing RP2.
- What about security on these links between Rallypoints? Well, they're TLS connections just like from clients to Rallypoints and are subject to the same level of X.509 mutual authentication and TLS encryption.
- But latency must increase - correct? Well, yes, but only by a minuscule amount. Remember that Rallypoints are packet routers - and just that! They do not process traffic payloads, and therefore the only latency introduced by traffic going through a Rallypoint is on the order of microseconds.
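The per-stream subscription idea from the first bullet can be pictured with a toy model. This is only a sketch and not how the Rallypoint is implemented internally: the Mesh class, its method names, and the stream names are all hypothetical.

```python
from collections import defaultdict


class Mesh:
    """Toy model of per-stream subscriptions between Rallypoints (hypothetical)."""

    def __init__(self):
        # stream ID -> set of Rallypoint IDs with at least one registered client
        self.subscribers = defaultdict(set)

    def register(self, rallypoint, stream):
        """Record that a client on 'rallypoint' registered for 'stream'."""
        self.subscribers[stream].add(rallypoint)

    def forward_targets(self, source, stream):
        """Rallypoints that should receive 'stream' traffic arriving at 'source'."""
        return sorted(self.subscribers[stream] - {source})


mesh = Mesh()
mesh.register("rp0001", "stream-a")  # client 1 on RP1
mesh.register("rp0003", "stream-a")  # client 3 on RP3
mesh.register("rp0002", "stream-b")  # an unrelated stream on RP2

print(mesh.forward_targets("rp0001", "stream-a"))  # ['rp0003']
```

Traffic for stream-a entering at rp0001 goes only to rp0003; rp0002 never sees it because nobody connected to it asked for that stream.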
Configuration for meshing is pretty straightforward. All that the Rallypoints need is some certificate information and information about all the Rallypoints in the mesh.
Here's an example
{
"peers":[
{
"id": "cloudrp0001",
"enabled":true,
"host":
{
"address": "cloudrp0001.example.com",
"port": 7443
},
"certificate":
{
"certificate":"@certstore://someOtherCert",
"key":"@certstore://someOtherCert"
}
},
{
"id": "cloudrp0002",
"enabled":false,
"host":
{
"address": "cloudrp0002.example.com",
"port": 7443
}
}
]
}
This is an array of peer objects, each describing an individual peer in the mesh as follows:
- id is the unique ID of the peer and should match the id field in the Rallypoint's configuration.
- enabled indicates whether this peer is enabled - i.e. whether other Rallypoints should connect to it. This element is useful if you want to disable the connection to a peer without removing the information from the mesh file.
- host/address is the DNS name or IP address of the peer.
- host/port is the peer's TCP port to connect to and must match the listenPort in that peer's configuration.
- certificate allows you to specify a particular X.509 certificate and private key for this peer - i.e. not use the default.
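To make the field descriptions concrete, here is a minimal sketch that reads a mesh file like the example above and works out which peers a given Rallypoint would try to connect to. The function name enabled_peers is hypothetical, and three behaviours are assumptions rather than documented facts: a missing enabled field is treated as enabled, a missing port falls back to 7443, and a Rallypoint skips its own entry.

```python
import json

MESH_JSON = """
{
  "peers": [
    {"id": "cloudrp0001", "enabled": true,
     "host": {"address": "cloudrp0001.example.com", "port": 7443}},
    {"id": "cloudrp0002", "enabled": false,
     "host": {"address": "cloudrp0002.example.com", "port": 7443}}
  ]
}
"""


def enabled_peers(mesh_text, own_id):
    """Return (id, address, port) for every enabled peer other than ourselves."""
    mesh = json.loads(mesh_text)
    result = []
    for peer in mesh.get("peers", []):
        if not peer.get("enabled", True):  # assumption: absent means enabled
            continue
        if peer["id"] == own_id:  # assumption: never connect to ourselves
            continue
        host = peer["host"]
        result.append((peer["id"], host["address"], host.get("port", 7443)))
    return result


print(enabled_peers(MESH_JSON, "cloudrp0003"))
```

Run for a node called cloudrp0003, only cloudrp0001 is returned, because cloudrp0002 is marked as disabled.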
As we saw above in Meshing, a Rallypoint has the ability to connect to the local multicast network to use that network as a backbone for passing traffic between the nodes in the mesh. Well, that's not where it ends. In fact, the Rallypoint can forward any type of traffic on multicast. And this is especially useful when you want to exchange traffic between unicast endpoints (such as Engage Engines and other Rallypoints) and multicast endpoints.
But before we get into that, let's quickly remind ourselves that Engage entities (such as mobile and desktop apps) can exchange multicast voice traffic with non-Engage entities - provided those entities support industry standard protocols such as RTP and CODECS such as G.711, AMR, and so on.
A classic use-case is when you have an entity such as a two-way radio gateway that "speaks" multicast and you need Engage entities to talk to that system. This is easily accomplished by setting up a group on Engage to use the codec the gateway is configured for and to use the same multicast IP address and port configured on the gateway. Assuming the multicast flows cleanly between the gateway and the clients (generally because they're all on the same network), all works fine.
So, picture something like the following, where we have three Engage "clients" (c1, c2, and c3) on the same multicast network as the gateway. The gateway is configured to forward a single talk group/channel/frequency on the radio system to bi-directional multicast at 234.5.6.7 with a port number of 15000 over standard RTP. And the gateway is configured to use a CODEC that Engage supports - in this case, G.711 ulaw. (We'll also assume that the gateway is not performing encryption of the RTP traffic, which is pretty typical these days.)
          multicast network
-------------------------------------
     ^            ^     ^     ^
     |            |     |     |
     v            v     v     v
+---------+       c1    c2    c3
| gateway |
+---------+
     ^
     |
     v
+--------------+
| radio system |
+--------------+
Alright, that's pretty straightforward - we're really just matching CODECs and multicast addressing with the gateway. All's good.
Something to be aware of, though, is that under the covers Engage is using a group ID associated with that group. Keep this in mind for later on.
Let's contrive this example a little more and say that we DON'T want our Engage users (c1,c2,c3) on the multicast. Rather, we want them to connect over unicast to a Rallypoint which is on the multicast network with the gateway.
Actually, while seemingly contrived, this is not entirely unlikely as these client devices may be unable to join multicast networks (even though they're local) because of an older operating system, administrative policy, etc.
So, we want something like this:
          multicast network
-------------------------------------
     ^                   ^
     |                   |
     v                   v
+---------+         +---------+
| gateway |         |         | <---------> c1
+---------+         |   rp    | <---------> c2
     ^              |         | <---------> c3
     |              +---------+
     v
+--------------+
| radio system |
+--------------+
Also, pretty straightforward except for ... How does the Rallypoint know what multicast address and port to use? And what Engage group does that map to?
Well, the Rallypoint can "learn" from the clients! That's right - when a client connects to the Rallypoint and registers for a group, it passes multicast info along with that registration. (Remember the client was configured earlier with the multicast info.) If the Rallypoint has been configured to allow multicast forwarding, and forwarding is permitted for the address and port pairing (see below), the Rallypoint will set up a "reflector". The reflector is simply a construct on the Rallypoint that "reflects" unicast traffic from clients (and other Rallypoints) to the multicast - and vice versa.
So ... when the first client connects (let's say c1), the Rallypoint automatically sets up a reflector to the local multicast and keeps it going until all references to it go away. Subsequent registrations (say from c2 and c3) will not result in additional reflectors being set up - the Rallypoint will already have it. Once all clients that are "interested" in that group (and by association, the multicast) have disconnected, the reflector is stopped.
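The lifecycle described above is essentially reference counting. The sketch below models just that bookkeeping; it is not the Rallypoint's code, and the ReflectorTable class and its method names are hypothetical.

```python
class ReflectorTable:
    """Toy reference-counted table of learned reflectors (hypothetical)."""

    def __init__(self):
        # (group_id, address, port) -> number of interested registrations
        self._refs = {}

    def register(self, group_id, address, port):
        """Add a reference; return True if this call started the reflector."""
        key = (group_id, address, port)
        started = key not in self._refs
        self._refs[key] = self._refs.get(key, 0) + 1
        return started

    def unregister(self, group_id, address, port):
        """Drop a reference; return True if this call stopped the reflector."""
        key = (group_id, address, port)
        self._refs[key] -= 1
        if self._refs[key] == 0:
            del self._refs[key]
            return True
        return False


table = ReflectorTable()
group = ("{1e351ac4-7915-4144-9545-82d60c9cfe4e}", "234.5.6.7", 15000)
print(table.register(*group))    # c1 connects: reflector started
print(table.register(*group))    # c2 connects: existing reflector reused
print(table.unregister(*group))  # c1 leaves: still in use
print(table.unregister(*group))  # c2 leaves: reflector stopped
```

The four calls print True, False, False, True: only the first registration starts the reflector and only the last departure stops it.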
Pretty cool huh!? You don't really need to do too much on the Rallypoint to make this magic happen - other than to allow multicast reflection (which is disabled by default due to security and bandwidth utilization considerations).
But ... bear in mind that you do need to be using the same group ID on all the clients. You cannot have different groups that use the same multicast addresses and ports. Each of those may be able to talk to the radio system, but they won't talk to each other!
What's that you say? Your Engage clients are OUTSIDE the corporate network on the Internet but need to talk to your radio system? Hmm, how could we possibly make that happen ... !?
Pretty easy in fact. Have exactly the same setup as before but this time make sure that your clients outside the corporate network have access to the Rallypoint through your firewall. You can do this by putting your Rallypoint in a DMZ with only the inbound TCP port for Rallypoint connections allowed from the outside world, assigning the NIC that carries that inbound traffic to the interfaceName setting in your Rallypoint configuration file, and setting multicastInterfaceName in the configuration to bind to a different NIC. Or you could have them on the same NIC, or you can set up your Rallypoint to connect to an external Rallypoint that clients connect to and are then proxied to the Rallypoint on the multicast. Basically, there are a number of ways to do this. But, right now we'll go with the first one where we assume the clients have access through the firewall to the Rallypoint as depicted below.
                                    firewall
          internal network              |
          multicast network             |   internet
-------------------------------------   |
     ^                   ^              |
     |                   |              |
     v                   v              |
+---------+         +---------+         |
| gateway |         |         | <--------> c1
+---------+         |   rp    | <--------> c2
     ^              |         | <--------> c3
     |              +---------+         |
     v                                  |
+--------------+                        |
| radio system |                        |
+--------------+                        |
                                        |
Now, in the same way that things worked when the clients were inside the corporate network, it works when the clients are outside - no changes necessary from what we learnt before.
We're doing well but now we want to get away from the clients knowing anything about multicast addressing on the LMR network. That's a real pain to manage - particularly if we have tons of these gateway setups all over the world. What we want here is for only the Rallypoint to know about multicast addressing and leave the clients to simply configure a group that the Rallypoint maps to the local multicast. So ... we have to "teach" the Rallypoint a little. And we do that using the staticReflectors section of the Rallypoint configuration.
We're going to need the multicast address and port (obviously) as well as the ID that the group has been assigned by the person or entity that creates the group configuration distributed to our clients. This is a little tricky and varies with different vendors' implementations of the Engage system. For our purposes, though, we'll assume we know that the group ID is {1e351ac4-7915-4144-9545-82d60c9cfe4e}.
Once we have this information, we edit the Rallypoint's configuration file and modify the staticReflectors section as follows:
.
.
"staticReflectors":
[
{
"id":"{1e351ac4-7915-4144-9545-82d60c9cfe4e}",
"rx":
{
"address":"234.5.6.7",
"port":15000
},
"tx":
{
"address":"234.5.6.7",
"port":15000
}
}
]
.
.
Now, we'll restart the Rallypoint. The reflector will be set up for the group identified as {1e351ac4-7915-4144-9545-82d60c9cfe4e} and will stay active for the lifetime of the Rallypoint. Any clients registering for {1e351ac4-7915-4144-9545-82d60c9cfe4e} will receive traffic for that group from other clients as well as from the multicast. Also, traffic from clients will be sent to the multicast as well.
Architecturally, this looks exactly the same as the previous diagram. The only difference is that the Rallypoint knows about the multicast at startup time, rather than only when "learning" about it when the first client connects.
All this is well and good and works fine for simple implementations. But what if we want to get even more sophisticated and connect our statically reflecting Rallypoint into a core Rallypoint mesh and make the multicast traffic available to clients connecting into that mesh?
Well, that functionality comes free with the setup we've just made. Simply point the Rallypoint to the mesh and connect the clients to the mesh instead. Your Engage Engines and Rallypoints will work together to deliver the traffic to where it's needed - and nobody except the "local" Rallypoint needs to know anything about the gateway's multicast setup.
You can easily build something like this where you have two (or more) radio systems, each with their own gateway on their own network and their own multicast addressing. Each radio system (radio 01 and radio 02) would be a different group as far as the clients (c1, c2, and c3) go - with a separate group ID for each of course. On each Rallypoint (rp01 and rp02) you'd set up a static reflector just for that local radio system and mark each Rallypoint as being a "mesh leaf" by setting its isMeshLeaf setting to true. You'd peer each Rallypoint into the cloud mesh in much the same way that you'd set up the core peers in the mesh as described above. But, instead of your Rallypoint peering to each of your core mesh Rallypoints, you'd simply peer it to your load balancer fronting the core mesh. Everything else will be automatically taken care of.
---------------------------------
     ^                   ^
     |                   |
     v                   v
+---------+         +---------+
| gateway |         |  leaf   |
+---------+         |  rp01   |<--------------+
     ^              |         |               |
     |              +---------+               |
     v                                        |
+--------------+                              |
|   radio 01   |                              v
+--------------+                         +---------+
                                         |         | <----> c1
                                         |  cloud  | <----> c2
                                         |  mesh   | <----> c3
                                         |         |
                                         +---------+
                                              ^
---------------------------------             |
     ^                   ^                    |
     |                   |                    |
     v                   v                    |
+---------+         +---------+               |
| gateway |         |  leaf   |               |
+---------+         |  rp02   | <-------------+
     ^              |         |
     |              +---------+
     v
+--------------+
|   radio 02   |
+--------------+
It's very important that you set your "local" Rallypoint as a mesh leaf using the isMeshLeaf property. If you don't, the core mesh will not forward traffic as expected and you may experience one-way traffic flow.
To further optimize operation, reduce bandwidth utilization, and guard against potential security threats, Rallypoints have the notion of "restrictions". A restriction is simply a rule that governs how the Rallypoint is to allow (or disallow) access to something.
In the case of multicasts, we can specify which multicast addresses and ports are to be allowed for use - by setting type in the JSON object to 1. That way, the Rallypoint will only allow those multicasts to operate. Conversely, if we want to allow all multicasts except for some very specific ones, we set type to 2 in the JSON and list the multicasts we want excluded.
In the same way, we can place restrictions on which group IDs are either specifically allowed on a Rallypoint, or not allowed.
In our example JSON above, we had the following:
.
.
"multicastRestrictions":
{
"type":1,
"elements":
[
{
"rx":
{
"address":"234.1.2.3",
"port":25000
},
"tx":
{
"address":"234.1.2.3",
"port":25000
}
},
{
"rx":
{
"address":"234.5.6.7",
"port":17222
},
"tx":
{
"address":"234.5.6.7",
"port":17222
}
}
]
},
.
.
What we've done here is to tell the Rallypoint that we ONLY want to allow multicasting for 234.1.2.3:25000 and 234.5.6.7:17222. All other multicasting attempts will fail. We did this by setting type to 1. Pretty simple - huh! Now, if we rather wanted to allow all multicasting EXCEPT for these, we'd simply set type to 2.
NOTE: If you set type to 1 and do not provide any multicasts, you are effectively turning off all multicasting. The Rallypoint will log a warning for this at startup but, as this is not a critical issue, will continue execution.
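The whitelist/blacklist logic can be sketched in a few lines. This is an illustration of the rule as described, not the Rallypoint's implementation: the function name multicast_allowed is hypothetical, and treating a request as "listed" when it matches either the rx or the tx pair of an element is an assumption.

```python
def multicast_allowed(restrictions, address, port):
    """Decide whether address:port may be used under a multicastRestrictions object."""
    pairs = set()
    for element in restrictions.get("elements", []):
        for side in ("rx", "tx"):  # assumption: either direction counts as listed
            pairs.add((element[side]["address"], element[side]["port"]))
    listed = (address, port) in pairs
    if restrictions["type"] == 1:  # whitelist: only listed multicasts are allowed
        return listed
    if restrictions["type"] == 2:  # blacklist: everything except listed multicasts
        return not listed
    raise ValueError("this sketch only understands type 1 and type 2")


whitelist = {
    "type": 1,
    "elements": [
        {"rx": {"address": "234.1.2.3", "port": 25000},
         "tx": {"address": "234.1.2.3", "port": 25000}},
        {"rx": {"address": "234.5.6.7", "port": 17222},
         "tx": {"address": "234.5.6.7", "port": 17222}},
    ],
}

print(multicast_allowed(whitelist, "234.1.2.3", 25000))  # True
print(multicast_allowed(whitelist, "234.5.6.7", 15000))  # False
print(multicast_allowed({"type": 1, "elements": []}, "234.1.2.3", 25000))  # False
```

The last call shows the situation from the NOTE: a whitelist with no elements denies every multicast.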
In much the same way as we define restrictions for multicasting, we can do the same with group IDs. As per the example:
.
.
"groupRestrictions":
{
"type":1,
"elements":
[
"{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
"{b7694a4f-9724-44a6-ae57-c63232ad1f57}"
]
},
.
.
Here we're instructing the Rallypoint that the ONLY group IDs allowed on this Rallypoint are {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} and {b7694a4f-9724-44a6-ae57-c63232ad1f57}. Attempts by Engage clients to register for any other stream will be denied. Also, any other Rallypoint peer attempting to register for any group ID other than the ones specified will also be denied.
Plus ... there's a cool bonus here ... When Rallypoints first connect to each other, they include their group restrictions in the initial handshake. This allows these peers to filter the registrations they convey across a Rallypoint mesh - greatly reducing bandwidth utilization and operational overhead.
And, of course, we can turn this upside down and allow all groups except some specific ones by simply setting type to 2 - thereby blacklisting those group identifiers.
NOTE: If you set type to 1 and do not provide any group identifiers, you are effectively turning off all group-related operations. And, because the Rallypoint's function is to provide packet routing for groups, this will basically make it a dead entity. Hence, if this issue is encountered at startup, the Rallypoint will log a fatal error and abort.
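The handshake bonus mentioned above can be pictured as each Rallypoint filtering its registrations against what a peer has said it will accept. The following is a rough sketch of that idea only; the function names are hypothetical and the real handshake format is not described here.

```python
def group_allowed(restrictions, group_id):
    """type 1 = whitelist, type 2 = blacklist, applied to a single group ID."""
    listed = group_id in restrictions.get("elements", [])
    return listed if restrictions["type"] == 1 else not listed


def registrations_to_convey(local_groups, peer_restrictions):
    """Keep only the registrations the peer has said it is willing to accept."""
    return [g for g in local_groups if group_allowed(peer_restrictions, g)]


# Restrictions a peer might advertise during the initial handshake
peer_restrictions = {
    "type": 1,
    "elements": [
        "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
        "{b7694a4f-9724-44a6-ae57-c63232ad1f57}",
    ],
}

# Groups our own clients are registered for
local_groups = [
    "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
    "{1e351ac4-7915-4144-9545-82d60c9cfe4e}",
]

print(registrations_to_convey(local_groups, peer_restrictions))
```

Only the first group is conveyed; the second would have been denied by the peer anyway, so not sending it saves the round trip.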
Also ... this stuff has bearing on staticReflectors as well. If you set up static reflection on a Rallypoint, and you're using restrictions, make sure that the static reflector's multicast addressing is included in your multicastRestrictions (or at least not excluded from your multicastRestrictions). Similarly, make sure that you'll be allowing the group IDs in groupRestrictions.
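A quick consistency check along those lines might look like the sketch below. It is not a tool that ships with the Rallypoint: the function names are hypothetical, and it assumes that a missing restrictions object means "no restriction".

```python
def is_permitted(restriction, value):
    """Apply a whitelist (type 1) or blacklist (type 2) to a single value."""
    if restriction is None:  # assumption: no restriction object means allowed
        return True
    listed = value in restriction["elements"]
    return listed if restriction["type"] == 1 else not listed


def check_static_reflectors(config):
    """Return problems for static reflectors that the restrictions would block."""
    group_rules = config.get("groupRestrictions")
    mc_rules = config.get("multicastRestrictions")
    if mc_rules is not None:
        # Flatten rx/tx objects into (address, port) pairs for direct comparison
        mc_rules = {
            "type": mc_rules["type"],
            "elements": [
                (e[side]["address"], e[side]["port"])
                for e in mc_rules["elements"]
                for side in ("rx", "tx")
            ],
        }
    problems = []
    for reflector in config.get("staticReflectors", []):
        if not is_permitted(group_rules, reflector["id"]):
            problems.append(f"group {reflector['id']} is not permitted by groupRestrictions")
        for side in ("rx", "tx"):
            pair = (reflector[side]["address"], reflector[side]["port"])
            if not is_permitted(mc_rules, pair):
                problems.append(f"{side} {pair[0]}:{pair[1]} is not permitted by multicastRestrictions")
    return problems


config = {
    "staticReflectors": [{
        "id": "{1e351ac4-7915-4144-9545-82d60c9cfe4e}",
        "rx": {"address": "234.5.6.7", "port": 15000},
        "tx": {"address": "234.5.6.7", "port": 15000},
    }],
    "multicastRestrictions": {
        "type": 1,
        "elements": [{
            "rx": {"address": "234.5.6.7", "port": 17222},
            "tx": {"address": "234.5.6.7", "port": 17222},
        }],
    },
}

for problem in check_static_reflectors(config):
    print(problem)
```

With the values used in the examples on this page, the check reports both the rx and tx side: the static reflector uses port 15000 while the whitelist only covers port 17222.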
Group restrictions are a simple way for your Rallypoint to allow or deny access to groups. But with simplicity comes lack of flexibility. For example, by whitelisting {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} and {b7694a4f-9724-44a6-ae57-c63232ad1f57} above, we allow ANYONE to access those groups (assuming their X.509 certificate checks out when they connect). But what if we want to specialize a little here and place some restrictions on WHO can access those groups?
Well, the Rallypoint doesn't support the notion of user IDs, passwords, or other such old-school stuff. Rather, it uses X.509 extensively for all kinds of security-related concerns. So we figured we'd use those certificates for the notion of "group-level firewalling". Alright, let's imagine that group {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} is a group used in the education sector - say in schools. We don't want just anyone with an Engage client and a valid certificate getting onto a group where there are discussions about kids, their schedules, interests, and so on. Rather, what we want is only for people - say like teachers - to participate in that group. So we'll place a set of restrictions on the group that only allows educators, teachers, and instructors to access it. With something like this:
"extendedGroupRestrictions": [
{
"id": "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
"restrictions": [
{
"type": 1,
"elementsType": 2,
"elements": [
"-educators",
"-teachers",
"-instructors"
]
}
]
},
And how do we know that a user is an educator, teacher, or instructor ... ? Well, we're going to look for that "role" as a tag in the X.509 certificate that their Engage client gives to the RP when it connects. Later on we'll talk about IANA Object Identifiers and other such things that give us the ability to embed these "tags" in the client certificate. For now, just assume that the client's certificate contains one or more of those tags - being -educators, -teachers, or -instructors. If it does, the Rallypoint will allow the client access to the group. If none of the tags exist in the client certificate, that client will not be allowed to register for that group (but it may well be able to register for others of course).
But, let's say that there's a secret discussion going on regarding schools in Washington or Idaho (maybe there's a surprise party being planned - just go with it ... OK!). And we want to exclude folks in Washington and Idaho. Those folks would already have one or more of the -educators, -teachers, or -instructors tags in their certificates - which would grant them access. But we don't want to do that right now. So, we add another rule that specifically excludes them - in this case based on the State that their certificate was issued for (this is in the Subject field of their certificates in the ST item). Our extendedGroupRestrictions for {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} now looks as follows:
"extendedGroupRestrictions": [
{
"id": "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
"restrictions": [
{
"type": 1,
"elementsType": 2,
"elements": [
"-educators",
"-teachers",
"-instructors"
]
},
{
"type": 2,
"elementsType": 5,
"elements": [
"(ST=WA)|(ST=ID)"
]
}
]
},
The first object initially grants (type=1) access for any certificate containing the tags -educators, -teachers, or -instructors. The second object denies (type=2) access to certificates whose ST item in the Subject (elementsType=5) is WA or ID.
The elementsType field is really important here because it tells the Rallypoint how to interpret the strings in the elements list. The values are as follows:
- 0 - the elements are literal group IDs
- 1 - the elements are group ID regex patterns
- 2 - the elements are regex patterns for generic access tags in the client certificate
- 3 - the elements are X.509 serial number regex patterns for the client certificate
- 4 - the elements are X.509 fingerprint regex patterns for the client certificate
- 5 - the elements are regex patterns for the client certificate's subject field
- 6 - the elements are regex patterns for the subject field of the CA certificate that issued the client certificate
Except for an elementsType of 0, all of these are processed as regular expressions - or "regex" - which are immensely powerful search criteria for processing data. However, regex is a black art and seen by many (this author included) as too difficult to comprehend. Nonetheless, you can place regex in the elements array and the Rallypoint will dutifully process them.
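To get a feel for how the two rules in the example combine, here is a small simulation. It is a sketch under stated assumptions, not the Rallypoint's evaluation logic: it assumes every rule must pass (a type 1 rule passes when something matches, a type 2 rule passes when nothing matches), it only models elementsType 2 and 5, it represents a certificate as a plain dictionary with hypothetical tags and subject keys, and it uses Python's re module rather than PCRE2.

```python
import re

EXTENDED_RESTRICTIONS = [
    {"type": 1, "elementsType": 2,
     "elements": ["-educators", "-teachers", "-instructors"]},
    {"type": 2, "elementsType": 5,
     "elements": ["(ST=WA)|(ST=ID)"]},
]


def rule_passes(rule, cert):
    """Evaluate one restriction object against a simplified certificate."""
    if rule["elementsType"] == 2:
        candidates = cert["tags"]        # access tags found in the certificate
    elif rule["elementsType"] == 5:
        candidates = [cert["subject"]]   # the certificate's subject string
    else:
        raise NotImplementedError("only elementsType 2 and 5 are modelled here")
    matched = any(
        re.search(pattern, text)
        for pattern in rule["elements"]
        for text in candidates
    )
    return matched if rule["type"] == 1 else not matched


def access_allowed(rules, cert):
    """Assumption: access requires every rule in the list to pass."""
    return all(rule_passes(rule, cert) for rule in rules)


teacher_in_oregon = {"tags": ["-teachers"], "subject": "C=US, ST=OR, O=Example School, CN=alice"}
teacher_in_idaho = {"tags": ["-teachers"], "subject": "C=US, ST=ID, O=Example School, CN=bob"}
visitor = {"tags": [], "subject": "C=US, ST=OR, CN=carol"}

print(access_allowed(EXTENDED_RESTRICTIONS, teacher_in_oregon))  # True
print(access_allowed(EXTENDED_RESTRICTIONS, teacher_in_idaho))   # False
print(access_allowed(EXTENDED_RESTRICTIONS, visitor))            # False
```

The Oregon teacher gets in, the Idaho teacher is excluded by the second rule, and the visitor never passes the first rule because the certificate carries no matching tag.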
If you want to know more about regex - and have about 6 months of available time - start at Wikipedia and go from there.
NOTE: As if regex wasn't already complicated enough in general, there are different implementations of regex that have sufficient subtle differences to make it hard to keep up. So, we decided to use regex as implemented in the PCRE2 (Perl Compatible Regular Expressions) library used by many popular applications. Check it out on Wikipedia.
If you look at the JSON above and pay attention to the section regarding tags (like -educators, -teachers, and -instructors), your geeky self may see a bug (and you'd be right).
Let's take the tag -educators for example ... This will certainly match -educators but it will also match -educatorsInPhysics and -educatorsWhoDriveCars. If we really want to match just -educators then we must tell the Rallypoint that the tag must be EXACT - and we do that with regex by indicating a word boundary with the special \b notation. So, we'd specify -educators\b (notice the \b in there) rather than just -educators.
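The difference is easy to try out. The snippet below uses Python's re module as a stand-in for PCRE2 (the two agree on \b for patterns like these). It also shows one thing to watch for when the pattern lives in a JSON file: in standard JSON, \b inside a string is the escape for a backspace character, so the backslash has to be doubled. That the Rallypoint's configuration parser follows standard JSON escaping is an assumption here.

```python
import json
import re

tags = ["-educators", "-educatorsInPhysics", "-educatorsWhoDriveCars"]

# Without a word boundary, the pattern matches all three tags
loose = [t for t in tags if re.search(r"-educators", t)]

# With \b, the pattern only matches where "educators" ends at a word boundary
exact = [t for t in tags if re.search(r"-educators\b", t)]

print(loose)  # ['-educators', '-educatorsInPhysics', '-educatorsWhoDriveCars']
print(exact)  # ['-educators']

# In JSON text the backslash must be escaped to survive parsing
as_written_in_json = r'"-educators\\b"'
print(json.loads(as_written_in_json) == r"-educators\b")  # True

# A single backslash in JSON yields a backspace character instead
print(json.loads(r'"-educators\b"') == "-educators\x08")  # True
```

Note that a word boundary is not quite the same as an exact match: a tag such as -educators-emeritus would still match -educators\b because the hyphen is not a word character.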
REMAINDER OF DISCUSSION OF EXTENDEDGROUPRESTRICTIONS IS PENDING
Every organization has policies that dictate how services such as Rallypoints should be monitored. We've tried to be as flexible as possible to allow for that, and we will provide examples for different environments as they come to light.
Here's a script we developed for a Rallypoint mesh residing in Amazon Web Services. Feel free to use it in your environment.
#!/bin/bash
#-------------------------------------------------------------------------
# Rallypoint Status Display
# Copyright (c) 2020 Rally Tactical Systems, Inc.
#
# This script periodically reads a status file produced by a Rallypoint
# and displays useful statistics. We're making use of JQ in this
# script so if you don't already have JQ, install it.
#
# This script has been tested on RedHat-style distros and works quite
# well; your mileage may vary (slightly) on other distros. Feel free
# to modify accordingly.
#
# In fact, this script was developed for RTS' internal use for a Rallypoint
# mesh hosted in Amazon Web Services. So there's a little bias toward AWS
# here. Mostly, though, this script should work on most Linux distros.
#
# Here's an example of a JSON status file. We will be using some of the
# elements from it in our script.
#
#{
# "connections": {
# "active": 49, <-- Number of active network connections
# "total": 574 <-- Total network connections for the lifetime of the process
# },
# "healthChecks": {
# "count": 10823, <--- Number of healthcheck connections we've had from the load balancer
# "rate": 0.834, <--- Instantaneous rate of health check connections per second
# "rateEma": 1.353 <--- Exponential moving average of the rate of health check connections
# },
# "id": "i-07ff7082e6969e259", <--- The ID of the Rallypoint
# "links": {
# "clients": {
# "count": 48 <--- Number of active client connections
# },
# "peers": {
# "configuredConnectedCount": 1, <--- Number of peer (mesh) connections the RP has connected that were statically configured
# "configuredCount": 1, <--- Number of peer (mesh) connections the RP is statically configured for
# "count": 1, <--- Number of active peer (mesh) connections
# "leafConnectedCount": 0, <--- Number of connected leaf RP nodes
# "list": [ <--
# {
# "address": "172.31.11.22:7443", <--- Address of a peer
# "id": "i-0aaa5cefaa5cbd870", <--- ID of a peer
# "state": 2, <--- Connection state of the peer (0=NotLinked, 1=InProgress, 2=LinkedOutbound, 3=LinkedInbound)
# "type": 1 <--- Connection type of the peer (1=Core, 2=Leaf)
# }
# ]
# }
# },
# "queue": {
# "avgExecNanos": 0, <--- Average operation execution time in nanoseconds
# "depth": 0, <--- Number of operations pending in the queue (should be 0 or close to it)
# "maxDepth": 172, <--- Maximum number of operations that backed up in the queue during the process' lifetime
# "maxExecNanos": 0, <--- Maximum operation execution time in nanoseconds
# "minExecNanos": 0, <--- Minimum operation execution time in nanoseconds (should always be 0)
# "ops": {
# "count": 168145, <--- Number of operations processed during the process' lifetime
# "rate": 38.867, <--- Instantaneous rate of operations per second
# "rateEma": 5.035 <--- Exponential moving average of rate of operations per second
# },
# "spuriousWakeups": 16243, <--- Number of times the queue woke up with no work to do
# "wakeUps": 155164 <--- Total number of times the queue woke up
# },
# "routing": {
# "blobs": {
# "rx": {
# "bytes": 41455816, <--- Incoming bytes of routed streamed data
# "packets": 123354 <--- Incoming packets of routed streamed data
# },
# "tx": {
# "bytes": 1252696896, <--- Outgoing bytes of routed streamed data
# "packets": 3815596 <--- Outgoing packets of routed streamed data
# }
# },
# "streams": 7 <--- Number of streams/groups registered for routing
# },
# "rx": {
# "bytes": 43615050, <--- Overall incoming bytes of data (routed and otherwise)
# "packets": 139700, <--- Overall incoming packets of data (routed and otherwise)
# "rate": 59306.667, <--- Overall RX byte rate/second
# "rateEma": 3011.565 <--- Exponential moving average of overall RX byte rate/second
# },
# "ts": 1584653616,
# "tx": {
# "bytes": 1256490445, <--- Overall outgoing bytes of data (routed and otherwise)
# "packets": 3831874, <--- Overall outgoing packets of data (routed and otherwise)
# "rate": 2679872, <--- Overall TX byte rate/second
# "rateEma": 65829.295 <--- Exponential moving average of overall TX byte rate/second
# }
#}
#-------------------------------------------------------------------------
# This script is running on Amazon Web Services, so we use this magic little curl
# call to retrieve this instance's ID. This same instance ID is used by the Rallypoint
# to identify itself in our mesh. For your environment, your Rallypoint ID may be something
# else - like a Kubernetes cluster instance ID, a host name, an IP address, or
# some other unique identifier
RP_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
if [ "${RP_ID}" == "" ]; then
echo "ERROR: Cannot determine Rallypoint instance ID"
exit 1
fi
# Our Rallypoint periodically writes its status to the /tmp directory into a JSON
# file named with the instance ID followed by "_status.json". The default interval
# is 30 seconds but your configuration may be different.
FN="/tmp/${RP_ID}_status.json"
# We will check the JSON file every so often for changes
UPDATE_CHECK_SECS=10
# Just some internal variables
SHOW_TABLE_HEADER=1
LAST_TS=""
# Load our values
function loadVals()
{
# Determine how long the process has been running
UPTIME=`ps axo etime,%cpu,%mem,cmd | grep 'rallypointd -id' | grep -v grep | awk '{split($0,a); print a[1];}'`
# Parse our JSON into an array of strings
VALUE_ARRAY=(`jq '.ts,.connections.active,.connections.total,.healthChecks.rate,.links.clients.count,.links.peers.count,.routing.streams,.routing.blobs.rx.packets,.routing.blobs.tx.packets,.queue.depth,.queue.maxDepth,.queue.ops.count,.queue.ops.rate,.queue.ops.rateEma' "${FN}"`)
# Grab the elements from the array
TS=${VALUE_ARRAY[0]}
CONN_ACTIVE=${VALUE_ARRAY[1]}
CONN_TOTAL=${VALUE_ARRAY[2]}
CONN_HC_RATE=${VALUE_ARRAY[3]}
CONN_CLIENTS=${VALUE_ARRAY[4]}
CONN_PEERS=${VALUE_ARRAY[5]}
RT_STREAMS=${VALUE_ARRAY[6]}
RT_RX_BLOBS=${VALUE_ARRAY[7]}
RT_TX_BLOBS=${VALUE_ARRAY[8]}
Q_DEPTH=${VALUE_ARRAY[9]}
Q_MAX_DEPTH=${VALUE_ARRAY[10]}
Q_OP_COUNT=${VALUE_ARRAY[11]}
Q_OP_RATE=${VALUE_ARRAY[12]}
Q_OP_RATE_EMA=${VALUE_ARRAY[13]}
}
# Display our pretty table
function showTable()
{
# Only show the table header the first time this function is called
if [ "${SHOW_TABLE_HEADER}" == "1" ]; then
SHOW_TABLE_HEADER=0
echo "Rallypoint Status Display for node ${RP_ID}"
echo "Copyright (c) 2020 Rally Tactical Systems, Inc."
echo "------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------"
echo "                     Timestamp       Uptime Actv Conns  Tot Conns    HC Rate    Clients      Peers    Streams   RX Blobs   TX Blobs    Q Depth      Q Max      Q Ops     Q Rate Q Rate EMA"
echo "------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------"
fi
# Make a human readable as-of timestamp
ASOF=`date -d@${TS}`
printf "%30s %12s %10d %10d %10.2f %10d %10d %10d %10d %10d %10d %10d %10d %10.2f %10.2f\n" \
"${ASOF}" \
"${UPTIME}" \
"${CONN_ACTIVE}" \
"${CONN_TOTAL}" \
"${CONN_HC_RATE}" \
"${CONN_CLIENTS}" \
"${CONN_PEERS}" \
"${RT_STREAMS}" \
"${RT_RX_BLOBS}" \
"${RT_TX_BLOBS}" \
"${Q_DEPTH}" \
"${Q_MAX_DEPTH}" \
"${Q_OP_COUNT}" \
"${Q_OP_RATE}" \
"${Q_OP_RATE_EMA}"
}
# We'll go round and round here
while [ true ]; do
# Load values from the JSON file
loadVals
# Is the timestamp different from the last time we read it? If not,
# there's no need to print a new line in the table
if [ "${TS}" != "${LAST_TS}" ]; then
LAST_TS="${TS}"
# Show the table (actually just one line in the table)
showTable
fi
# Go to sleep for a little while
sleep ${UPDATE_CHECK_SECS}
done
Example output from this script looks as follows (in this case Engage clients were connecting and disconnecting frequently and only two peers were configured in the mesh):
$ /mnt/efs/shared/rallypointd/rpstatus.sh
Rallypoint Status Display for node i-07ff7082e6969e259
Copyright (c) 2020 Rally Tactical Systems, Inc.
------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Timestamp Uptime Actv Conns Tot Conns HC Rate Clients Peers Streams RX Blobs TX Blobs Q Depth Q Max Q Ops Q Rate Q Rate EMA
------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Fri Mar 20 02:15:13 UTC 2020 08:08:58 40 1921 0.93 39 1 5 691490 26112909 0 172 847366 73.10 6.91
Fri Mar 20 02:15:43 UTC 2020 08:09:18 46 1927 0.80 45 1 5 693859 26214356 0 172 849964 86.60 6.91
Fri Mar 20 02:16:13 UTC 2020 08:09:49 2 1934 0.90 1 1 5 695355 26285940 0 172 851725 58.70 6.91
Fri Mar 20 02:16:43 UTC 2020 08:10:19 11 1943 0.97 10 1 5 695834 26288993 0 172 852348 20.77 6.92
Fri Mar 20 02:17:13 UTC 2020 08:10:49 16 1948 0.60 15 1 5 696703 26299841 0 172 853350 33.40 6.92
Fri Mar 20 02:17:43 UTC 2020 08:11:19 21 1953 0.97 20 1 5 697811 26318936 0 172 854607 41.90 6.92
Fri Mar 20 02:18:13 UTC 2020 08:11:49 28 1960 0.97 27 1 5 699118 26350193 0 172 856093 49.53 6.92
Fri Mar 20 02:18:43 UTC 2020 08:12:19 38 1970 0.67 37 1 5 700926 26409198 0 172 858123 67.67 6.93
Fri Mar 20 02:19:13 UTC 2020 08:12:49 46 1978 1.03 45 1 5 703316 26507774 0 172 860747 87.47 6.93
Fri Mar 20 02:19:43 UTC 2020 08:13:19 3 1985 0.83 2 1 5 704682 26573009 0 172 862369 54.07 6.94
Fri Mar 20 02:20:13 UTC 2020 08:13:49 12 1994 0.93 11 1 5 705205 26576656 0 172 863034 22.17 6.94
Fri Mar 20 02:20:43 UTC 2020 08:14:19 21 2003 0.77 20 1 5 706156 26591777 0 172 864154 37.33 6.94
Fri Mar 20 02:21:14 UTC 2020 08:14:49 28 2010 0.80 27 1 5 707295 26618490 1 172 865472 43.93 6.94
Fri Mar 20 02:21:45 UTC 2020 08:15:19 36 2018 1.03 35 1 5 708779 26663941 0 172 867168 56.53 6.95
Fri Mar 20 02:22:15 UTC 2020 08:15:49 40 2022 0.73 39 1 5 710849 26739301 0 172 869438 75.67 6.95
Fri Mar 20 02:22:45 UTC 2020 08:16:19 45 2027 0.87 44 1 5 713398 26844522 0 172 872205 92.23 6.95
Fri Mar 20 02:23:15 UTC 2020 08:16:49 3 2036 0.83 2 1 5 714856 26912651 0 172 873919 57.13 6.96
Fri Mar 20 02:23:45 UTC 2020 08:17:19 8 2041 0.93 7 1 5 715374 26915291 0 172 874548 20.97 6.96
Fri Mar 20 02:24:15 UTC 2020 08:17:49 15 2048 0.87 14 1 5 716251 26925560 0 172 875570 34.07 6.96
Fri Mar 20 02:24:45 UTC 2020 08:18:20 23 2056 0.83 22 1 5 717385 26946167 0 172 876875 43.50 6.96
Fri Mar 20 02:25:15 UTC 2020 08:18:50 33 2066 1.00 32 1 5 718847 26984451 0 172 878544 55.63 6.97
Fri Mar 20 02:25:45 UTC 2020 08:19:20 39 2072 0.87 38 1 5 720672 27048113 0 172 880577 67.77 6.97
Fri Mar 20 02:26:15 UTC 2020 08:19:50 46 2079 0.77 45 1 5 723055 27146906 0 172 883192 87.17 6.98
Fri Mar 20 02:26:45 UTC 2020 08:20:20 2 2084 1.00 1 1 5 724372 27208731 0 172 884746 51.80 6.98
Fri Mar 20 02:27:15 UTC 2020 08:20:50 10 2092 0.93 9 1 5 724956 27212535 0 172 885465 23.97 6.98
Fri Mar 20 02:27:45 UTC 2020 08:21:20 18 2100 0.90 17 1 5 725827 27224674 0 172 886493 34.27 6.99
Fri Mar 20 02:28:16 UTC 2020 08:21:50 25 2107 0.87 24 1 5 727009 27249273 0 172 887845 45.07 6.99
Fri Mar 20 02:28:46 UTC 2020 08:22:20 32 2114 0.97 31 1 5 728467 27291089 0 172 889499 55.13 7.00
Fri Mar 20 02:29:16 UTC 2020 08:22:50 38 2120 0.93 37 1 5 730261 27353807 0 172 891498 66.63 7.01
If your Engage client (or peering Rallypoint) is having trouble reaching a Rallypoint, a quick check of the connection is helpful. Now, you can certainly comb through logs and such to track down the problem but the first thing to do is to see if you can even get to the far end. The simplest way to do this is to open a TLS connection to the Rallypoint in question.
We can do this by using a tool that uses TLS connections such as a web browser or the command-line openssl tool.
For example: Let's say we're trying to reach rp.example.com. Unless you've told the RP to listen for incoming connections on a port other than the default, it'll listen on port 7443. So we'll specify 7443 as the port.
Enter the URL as follows (be sure to specify https to tell the browser to use TLS):
https://rp.example.com:7443
Ideally we'd get something like this (Chrome in this case):
This site can't provide a secure connection
rp.example.com uses an unsupported protocol.
ERR_SSL_VERSION_OR_CIPHER_MISMATCH
Unsupported protocol
The client and server don't support a common SSL protocol version or cipher suite.
This means that the browser resolved the host name (rp.example.com), managed to open a TCP connection to port 7443, and began TLS negotiation. But because the RP is not a web server (it really isn't), the browser isn't going to be able to complete the TLS negotiation. That's OK - we've proven that the RP can be reached.
openssl s_client -host rp.example.com -port 7443
Here we'd like to see something like this:
CONNECTED(00000006)
depth=0 C = US, ST = Washington, L = Seattle, O = "Rally Tactical Systems, Inc.", OU = "(c) 2019 Rally Tactical Systems, Inc. - For authorized use only", CN = Rallypoint Factory Default Certificate, emailAddress = [email protected]
verify error:num=20:unable to get local issuer certificate
.
.
.
Just like with the browser test, openssl will yell about not being able to verify certificates and such. But that's OK - we're not expecting the TLS connection to actually be set up and finalized. We just want to make sure we can reach the RP.
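The openssl check can be wrapped in a small script. What follows is a minimal sketch under a few assumptions: the function names are made up for illustration, GNU `timeout` is assumed to be installed, and `-connect host:port` is used in place of separate host and port options. It treats the `CONNECTED(` line that openssl prints as proof that the TCP connection opened, no matter how the TLS negotiation ends.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Rallypoint distribution.

# Succeeds if openssl s_client output (read from stdin) contains a
# CONNECTED(...) line, which openssl prints once the TCP connection opens.
s_client_connected() {
    grep -q '^CONNECTED('
}

# Tries to open a TLS connection to an RP; the port defaults to 7443.
# stdin comes from /dev/null so openssl exits instead of waiting for input.
rp_reachable() {
    local host="$1" port="${2:-7443}"
    if timeout 10 openssl s_client -connect "${host}:${port}" </dev/null 2>&1 \
        | s_client_connected; then
        echo "${host}:${port} is reachable"
    else
        echo "${host}:${port} is NOT reachable"
        return 1
    fi
}
```

Calling `rp_reachable rp.example.com` would run the check against the default port; pass a second argument if your RP listens elsewhere.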
If you don't get any joy with the above then you're going to need to check the error the browser or openssl reports. Some of the following are possible.
- The RP may not be running on the machine. Check to see if the rallypointd process is actually up.
- The RP process may be hung or in some other suspended state. Try restarting the process.
- DNS may not be resolving the host name. Try using the host's IP address.
- The machine where the RP resides may be blocking incoming TCP connections to port 7443. Open that port on the RP's firewall.
- The network path between your machine and the RP may be blocking TCP and/or port 7443 traffic. Check things like routers, switches, VLANs, proxies, and the like. (Yeah, that's a pain and it's often the daily experience of network admins everywhere. So give them all a thought while you're at it.)
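A few of these checks can be scripted. The sketch below assumes a Linux host with `pgrep`, `getent`, and GNU `timeout` installed, plus bash for its `/dev/tcp` pseudo-device; the function names are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Rallypoint distribution.

# Run on the RP machine: is a process with this exact name up?
rp_process_running() {
    pgrep -x "${1:-rallypointd}" >/dev/null 2>&1
}

# Does the host name resolve through the system resolver?
rp_dns_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Can a plain TCP connection be opened within 5 seconds? Uses bash's /dev/tcp.
# The port defaults to 7443.
rp_tcp_port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-7443}" 2>/dev/null
}
```

Each function only returns a status, so the checks can be chained, for example `rp_dns_resolves rp.example.com && rp_tcp_port_open rp.example.com`. A failure of the second check with a success of the first points at a firewall or the network path rather than DNS.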
If your client (or peering RP) still can't connect after all this, then we need to delve a little deeper.
What we mean by a failure to CONNECT is that a TCP connection is established but drops almost right away.
- There may be a security issue such as the RP not accepting the connecting entity's X.509 certificate, or the connecting entity not accepting the RP's X.509 certificate. Check the logs on both sides to see what gives. Also, using that openssl example from above, have a look at the certificate the RP is presenting. You should know what your client-side will accept, so check out that RP's certificate.
- The RP may not be allowing connections because it's experiencing a load beyond the limits that it's been given. Check the limits section of your rallypointd_conf.json file as well as the RP's logs and status file.
- Run a Wireshark or tcpdump capture to see at what point the link is dropped.
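To look at the certificate the RP presents, and to narrow a capture down to the RP's traffic, something like the following may help. It is only a sketch: the function names are invented, `openssl` is assumed to be installed, and the capture itself needs to be run with sufficient privileges.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Rallypoint distribution.

# Prints subject, issuer and validity dates of the certificate the RP presents.
# The port defaults to 7443.
rp_show_cert() {
    local host="$1" port="${2:-7443}"
    openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -issuer -dates
}

# Prints a capture filter limited to traffic between this machine and the RP.
rp_capture_filter() {
    echo "host $1 and tcp port ${2:-7443}"
}
```

The filter can be handed to tcpdump, for example `sudo tcpdump -i any -nn "$(rp_capture_filter rp.example.com)"`, and the same expression works as a capture filter in Wireshark.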
Alright, we've confirmed that we can actually reach the RP (networking is good!). We've also confirmed that we can connect to the RP (the link stays up!). This is good.
But, some (or maybe even all) of our groups are not getting connected.
There are two possible reasons for this.
- Registration of a group may exceed the limits that have been set by the RP's administrator. Check the limits section of your rallypointd_conf.json file as well as the RP's logs and status file.
- The RP may have had restrictions placed on what group identifiers can be registered. Check the groupRestrictions and extendedGroupRestrictions sections of your rallypointd_conf.json file.
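As a first look, it can help to see which of these sections your configuration file actually contains. The sketch below uses an invented function name and a plain text match on the key names rather than a real JSON parse, so a hit only tells you the key is there, not what it is set to.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Rallypoint distribution.

# Reports whether each section named in the text appears as a key in the file.
# This is a plain text match, not a JSON parse, so treat it as a first look.
rp_conf_sections() {
    local file="$1" name
    for name in limits groupRestrictions extendedGroupRestrictions; do
        if grep -q "\"${name}\"[[:space:]]*:" "${file}"; then
            echo "${name}: present"
        else
            echo "${name}: absent"
        fi
    done
}
```

Running `rp_conf_sections rallypointd_conf.json` prints one line per section. If `jq` happens to be installed, `jq '.limits' rallypointd_conf.json` would print the section itself, assuming `limits` sits at the top level of the file.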