Engage Rallypoints - rallytac/pub GitHub Wiki

Engage uses the multicast capabilities inherent in your network to provide a transport mechanism between Engines (and, therefore, the users of those Engines). However, what if your network doesn't support multicast or, as is often the case, you need to communicate over someone else's network, such as the Internet?

That's where Engage Rallypoints come in.

Engage's Rallypoints are small, super-fast packet routers designed to securely forward packets between Engage Engines that are unable to communicate with each other over multicast. So, when Engage users need to speak with each other over something like the Internet, Rallypoints provide the means to do so.

Prerequisites

Operating System

Rallypoints currently run on Linux, MacOSX, and Microsoft Windows, with Linux being the preferred platform. On Linux, you'll need either a Red Hat-based or a Debian-based distribution. For Red Hat, we recommend CentOS 7 or higher; for Debian, we recommend Ubuntu 18 or higher or Debian 9 or higher.

IP Ports

TCP

The default inbound TCP port is 7443, which uses TLS v1.3. Make sure this port is open for inbound connections from Engage clients and other Rallypoints through your firewalls and other network infrastructure. (We mention that this is TLS so that, if your infrastructure conducts deep packet inspection, TLS passthrough, or similar operations for purposes such as DoS attack detection, you can configure it accordingly.)

Also, for environments where load balancers or other network infrastructure systems check the availability or health of the process by opening TCP connections, the Rallypoint may be configured to listen for these inbound connections on a separate "health check port". If you are operating in such an environment, make sure that the port you configure for this purpose is open. This port does not typically carry traffic back and forth: most health checkers simply open the connection and either close it right away or keep it open for a period of time. If you enable this, be aware that DoS detection logic in firewalls and/or your operating system may need to be tuned to handle the fast connect/disconnect operations of the health checker.

UDP

While a Rallypoint fundamentally serves to route packets between entities using TCP, it also supports UDP over multicast. This functionality is intended for Rallypoints forwarding traffic between unicast TCP and multicast UDP. It can be used to create a multicast backbone link between Rallypoints and/or to route traffic from non-Rallypoint entities operating on multicast (including multicasting Engage Engines) to Engage-based entities using unicast. If you are going to forward multicast traffic over unicast (and vice versa), make sure your Rallypoint machine's firewall is set up for multicast RX and TX and that the necessary UDP ports are open for inbound and outbound traffic.

Installation

Prepackaged

A Rallypoint is most easily installed with your operating system's package manager, using the appropriate installation package provided by Rally Tactical Systems. These packages install the necessary binaries, factory default certificates, and a baseline configuration. They also set up the Rallypoint to operate as a daemon (background service) that starts at operating system boot time, using systemd on Linux platforms and launchd on OSX.

For Red Hat distributions:

sudo yum install <rallypoint_package_file>.rpm

For Debian-based distributions:

sudo apt install <rallypoint_package_file>.deb

NOTE: In the above examples we're telling yum or apt to run the installation from a file. So, to ensure that these tools will try to use the file and not a named package from a repository, you need to tell them you're referring to a file. Do this by changing to the directory where the file is located and then preceding the file name with ./. For example:

sudo yum install ./rallypointd-1.189.9026-0.x86_64.rpm

For OSX, open the <rallypoint_package_file>.dmg file and double-click the install link icon.

Manual Installation

If you need a more sophisticated installation procedure, need to run the Rallypoint process manually (for example, not as a background daemon), or generally have more complex needs for your Rallypoint setup, you will need to install the relevant items manually. This is a straightforward process.

Let's start by assuming the Rallypoint is not (yet, or perhaps ever) going to be configured to run as a background service.

  • Place the rallypointd executable anywhere you'd like. This can be in a custom directory or in a standard executable location such as /usr/sbin. As long as the code can be executed from that location, you're good to go.
  • Place the security-related certificate and key files in a location where the Rallypoint can read them. These include the file containing the Rallypoint's certificate, and the file containing that certificate's private key. (Be sure that this location is strictly only accessible to the Rallypoint and any other authorized applications and/or users.) You will also need to place CA certificates used to verify client and peer Rallypoint certificates in a location accessible to the Rallypoint.
  • Finally, place your configuration file in a location where the Rallypoint can read it. By default, the Rallypoint looks for /etc/rallypointd/rallypointd_conf.json for its configuration.

Note: If you do place your configuration file in a different location or give it a different name, then you'll need to tell the Rallypoint to use that file. Do this with the -cfg command-line parameter. For example:

rallypointd -cfg:my_custom_configuration.json

Manual Daemon Configuration

Once you've got everything installed manually, you may still want to set up the Rallypoint to run as a daemon on startup and take advantage of the services offered by the operating system. In a Linux environment, this is easily done by creating the required configuration for systemd or for the older init.rc method. Refer to your operating system's documentation for how to do this.

Setting up the service for systemd-like operation on Mac OSX systems is a little trickier. Your best bet is to refer to Apple's documentation at:

https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html

Operation

Once the code is installed and configured (more on configuration below), your Rallypoint should simply start up and begin accepting connections from clients and/or other Rallypoint peers. If it is running as a daemon under systemd, you can use the standard systemd methods of interacting with your daemon, such as:

Operation               Command Line
Starting the service    sudo systemctl start rallypointd
Stopping the service    sudo systemctl stop rallypointd
Restarting the service  sudo systemctl restart rallypointd
Query service status    sudo systemctl status rallypointd
Watch the log           sudo journalctl -f -u rallypointd

If you are running rallypointd from the command line, you will see the log output in the terminal window. To stop the process, press Ctrl-C or use the kill command.

Monitoring

You can monitor the Rallypoint in a variety of ways.

Logging

The simplest is by viewing the output log displayed in a terminal window either directly from the process or, if running as a daemon, using journalctl as described above.

The output log is also sent to the standard operating system logging subsystem. This would be syslog on Linux systems and Apple's high-performance logger on OSX systems.

All of these log messages follow the standard syslog format, including the timestamp and severity level of each message. These outputs can then be analyzed by log-processing tools such as SolarWinds, Papertrail, and so on to generate alerts for administrative personnel or automated systems.

Note: When viewed directly in a terminal with ANSI color-coding capabilities, the log lines are colorized to make important messages easier to spot.

Status Report

In addition to the log, the Rallypoint can be configured to produce a periodic status report. This report is in JSON format and is written to a file specified in the configuration at an interval you choose (we recommend 30 seconds, and no less than 5 seconds). This JSON file can then be analyzed to determine the health of the Rallypoint.

Here's an example:

{
   "id":"demorp0001",            // The Rallypoint's instance identifier
   "ts":119790397,               // UTC UNIX timestamp (number of seconds 
                                 // since Jan 1, 1970) of when this report 
                                 // was produced
   "uptime": 149663,             // Number of seconds the process has been
                                 // up
   "systemCpuLoad":4.86,         // Percentage CPU load of the machine instance 
                                 // hosting the Rallypoint

   "connections":
   {      
      "active":1,                // Number of active client connections
      "denied":0,                // Number of connection requests denied
      "total":3                  // Total number of client connections
                                 // over the process lifetime
   },

   "healthChecks":
   {
      "count":0,                 // Number of TCP health checks made by a 
                                 // load-balancer or other network 
                                 // infrastructure entity
      "rate":0.0,                // Health checks per second 
      "rateEma":0.0              // Exponential moving average of health 
                                 // checks per second 
   },

   "peers":
   {
      "configuredConnectedCount":0,    // Number of configured peer 
                                       // connections that are connected
      "configuredCount":0,             // Number of configured peer connections
      "count":0,                       // Number of connected Rallypoint 
                                       // peers (inbound and outbound)
      "leafConnectedCount":0,          // Number of peers that are inbound leaf
                                       // peer nodes
      "list":[]                        // List of peers
   },

   "queue":
   {
      "avgExecNanos":60567,               // Average number of nanoseconds a queue 
                                          // operation takes to execute
      "maxExecNanos":7733634,             // Longest number of nanoseconds a queue
                                          // operation took to execute
      "minExecNanos":0,                   // Least number of nanoseconds a queue
                                          // operation took to execute
      "lowPriorityQueueDepth":0,          // Current number of operations waiting in the 
                                          // low priority queue
      "lowPriorityQueueMaxDepth":0,       // Maximum number of operations in the
                                          // low-priority queue
      "lowPriorityQueueFailures":0,       // Number of operations denied entrance to
                                          // the low-priority queue due to load
      "normalPriorityQueueDepth":0,       // Current number of operations waiting in the 
                                          // normal-priority queue
      "normalPriorityQueueMaxDepth":1,    // Maximum number of operations in the normal-
                                          // priority queue
      "normalPriorityQueueFailures":0,    // Number of operations denied entrance to
                                          // the normal-priority queue due to load

      "ops":
      {
         "count":3360,           // Total number of process 
                                 // lifetime operations
         "rate":145.0,           // Operations per second
         "rateEma":15.002        // Exponential moving average 
                                 // of operations per second
      },
      "spuriousWakeups":114,     // Number of queue wakeups that
                                 // resulted in a no-op
      "wakeUps":3346             // Total number of queue wakeups
   },

   "routing":
   {
      "blobs":
      {
         "rx":
         {
            "bytes":293895,      // Number of received blob bytes
            "packets":3118       // Number of received blob packets
         },

         "tx":
         {
            "bytes":292495,      // Number of transmitted blob bytes
            "packets":3116       // Number of transmitted blob packets
         }
      },
      "paths":28,                // Number of potential uni-directional 
                                 // media stream pathways defined by the 
                                 // routing table
      "streams":5                // Number of registered streams
   },

   "rx":
   {
      "bytes":303209,            // Number of received bytes
      "packets":3133,            // Number of received packets
      "rate":106643.2,           // Received bytes per second
      "rateEma":9613.413         // Exponential moving average of
                                 // received bytes per second
   },

   "tx":
   {
      "bytes":297647,            // Number of transmitted bytes
      "packets":3131,            // Number of transmitted packets
      "rate":106643.2,           // Transmitted bytes per second
      "rateEma":9397.57          // Exponential moving average of
                                 // transmitted bytes per second
   },

   "throughput":
   {
      "rate":0,                  // Active network I/O throughput
                                 // in bits per second
      "rateEma":0                // Exponential moving average of
                                 // network I/O throughput (bps)
   }
}

By far, the most important element in the report for gauging performance, and therefore user experience, is queue/normalPriorityQueueDepth. Values greater than zero for a prolonged period of time indicate that the Rallypoint is falling behind in processing packets, resulting in degraded audio quality and high latency. This could be due to CPU pressure, memory overload, or simply an I/O backlog. You can address it by scaling up the machine/VM hosting the Rallypoint or by bringing additional Rallypoints online if you are operating a meshed Rallypoint cloud.

Also of interest is the queue/ops/rate element, which indicates how many operations per second the Rallypoint is carrying out. (An "operation" is something like routing a packet, handling a request from a client, and so on.) On a reasonably powerful server-class machine, the Rallypoint comfortably processes in excess of 500,000 operations per second through its queue. So the example of 145.0 per second in the above JSON means that this particular Rallypoint is basically doing nothing :).
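A monitoring script can watch for the sustained queue-depth condition described above and report the operation rate alongside it. The following is a minimal sketch, assuming the status file on disk is plain JSON (the // comments in the example are annotations only) and using a hypothetical rule of three consecutive non-zero samples as "a prolonged period"; the threshold, class, and function names are illustrative and not part of the Rallypoint.

```python
import json
from collections import deque

# Hypothetical alert rule: this many consecutive reports with a non-zero
# normal-priority queue depth counts as "a prolonged period".
SUSTAINED_SAMPLES = 3

# Rough capacity figure quoted above for a server-class machine.
REFERENCE_OPS_PER_SEC = 500_000


def load_report(path: str) -> dict:
    """Read one status report (assumed to be plain JSON, without comments)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_report(report: dict) -> dict:
    """Extract the two queue metrics discussed above."""
    queue = report.get("queue", {})
    return {
        "id": report.get("id"),
        "depth": queue.get("normalPriorityQueueDepth", 0),
        "ops_rate": queue.get("ops", {}).get("rate", 0.0),
    }


class QueueDepthWatcher:
    """Flags a Rallypoint whose normal-priority queue stays non-empty."""

    def __init__(self, sustained_samples: int = SUSTAINED_SAMPLES):
        self.recent = deque(maxlen=sustained_samples)

    def observe(self, report: dict) -> bool:
        self.recent.append(summarize_report(report)["depth"])
        # Alert only once the window is full and every sample in it is > 0.
        return len(self.recent) == self.recent.maxlen and all(d > 0 for d in self.recent)


if __name__ == "__main__":
    watcher = QueueDepthWatcher()
    for depth in (0, 2, 4, 1):
        sample = {"id": "demorp0001",
                  "queue": {"normalPriorityQueueDepth": depth, "ops": {"rate": 145.0}}}
        s = summarize_report(sample)
        load_pct = 100.0 * s["ops_rate"] / REFERENCE_OPS_PER_SEC
        print(f"depth={s['depth']} ops/s={s['ops_rate']} "
              f"(~{load_pct:.2f}% of reference) alert={watcher.observe(sample)}")
```

In practice you would call load_report() on the configured status report file every intervalSecs and pass each result to observe(). A sliding window keeps a single busy sample from triggering an alert, which matches the "prolonged period" wording above.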

Configuration

Now that you've seen how to install, operate, and monitor the software, it's time to get into how you actually configure it. Here goes ...

We want to take a moment to stress something.

It is vitally important that you safeguard access to your X.509 certificate files and, in particular, private keys. While certificates are ultimately public, private keys are just that - PRIVATE. Make sure that only your Rallypoint and other authorized users and entities have access to these files.

As you've probably guessed by now, a Rallypoint is configured using a JSON file, which it looks for at /etc/rallypointd/rallypointd_conf.json unless a different file is specified on the command line with the -cfg argument.

Here's an example:

{
   "id":"rp0001",                         
   "listenPort":7443,
   "interfaceName":"en0",
   "multicastInterfaceName":"en0",
   "allowMulticastForwarding":false,
   "ioPools":-1,
   "allowPeerForwarding":false,
   "isMeshLeaf":false,
   
   "certStoreFileName":"/etc/rallypointd/rallypointd.certstore",
   "certStorePasswordHex":"",

   "peeringConfigurationFileName": "",
   "peeringConfigurationFileCommand":"",
   "peeringConfigurationFileCheckSecs":30,

   "limits":
   {
      "maxClients":0,
      "maxPeers":0,
      "maxMulticastReflectors":0,
      "maxRegisteredStreams":0,
      "maxStreamPaths":0,
      "maxRxPacketsPerSec":0,
      "maxTxPacketsPerSec":0,
      "maxRxBytesPerSec":0,
      "maxTxBytesPerSec":0,
      "maxQOpsPerSec":0
   },

   "statusReport":
   {
      "enabled":true,
      "fileName":"/tmp/${id}_status.json",
      "intervalSecs":30,
      "includeLinks":true,
      "includePeerLinkDetails":true,
      "includeClientLinkDetails":false
   },

   "linkGraph":
   {
      "enabled":true,
      "fileName":"/tmp/${id}_links.dot",
      "minRefreshSecs":5,
      "includeDigraphEnclosure":true,
      "includeClients":false,
      "coreRpStyling":"[shape=hexagon color=firebrick style=filled]",
      "leafRpStyling":"[shape=box color=gray style=filled]",
      "clientStyling":"[dir=none]"
   },        

   "externalHealthCheckResponder":
   {
      "listenPort":0,
      "immediateClose":true
   },
        
   "certificate":
   {
      "certificate":"@certstore://rtsFactoryDefaultRpSrv",
      "key":"@certstore://rtsFactoryDefaultRpSrv"
   },

   "tls":
   {
      "verifyPeers":true,
      "allowSelfSignedCertificates":false,
      "caCertificates":
      [
         "@certstore://rtsCA"
      ],
      "crlSerials":
      [
         "ad:de:61:33:99:67:21:e1",
         "6B:D6:13:51:42:F5:04:31"
      ]
   },

   "fipsCrypto":{
      "enabled":false,
      "path":"/etc/rallypointd",
      "curves":"secp521r1",
      "ciphers":"TLS_AES_256_GCM_SHA384"
   },

   "txOptions":
   {
      "priority":4,
      "ttl":128
   },

   "multicastTxOptions":
   {
      "priority":4,
      "ttl":1
   },

   "multicastRestrictions": 
   {
      "type":1,
      "elements":
      [
         {
            "rx": 
            {
               "address":"234.1.2.3",
               "port":25000
            },
            "tx": 
            {
               "address":"234.1.2.3",
               "port":25000
            }
         },

         {
            "rx": 
            {
               "address":"234.5.6.7",
               "port":17222
            },
            "tx": 
            {
               "address":"234.5.6.7",
               "port":17222
            }
         }
      ]
   },

   "groupRestrictionAccessPolicyType": 0,

   "groupRestrictions":
   {
      "type":1,
      "elements":
      [
         "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
         "{b7694a4f-9724-44a6-ae57-c63232ad1f57}"
      ]
   },

   "extendedGroupRestrictions": [
      {
         "id": "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
         "restrictions": [
            {
               "type": 1,
               "elementsType": 2,
               "elements": [
                  "-educators",
                  "-teachers",
                  "-instructors"
               ]
            },
            {
               "type": 2,
               "elementsType": 5,
               "elements": [
                  "ST=WA",
                  "ST=ID"
               ]
            }
         ]
      },
      {
         "id": "{b7694a4f-9724-44a6-ae57-c63232ad1f57}",
         "restrictions": [
            {
               "type": 1,
               "elementsType": 6,
               "elements": [
                  "O=My Fictional Issuing Organization"
               ]
            }
         ]
      }
   ],
      
   "staticReflectors":
   [
      {
         "id":"{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
         "rx":
         {
            "address":"234.1.2.3",
            "port":25000
         },
         "tx":
         {
            "address":"234.1.2.3",
            "port":25000
         }
      },
      {
         "id":"{b7694a4f-9724-44a6-ae57-c63232ad1f57}",
         "rx":
         {
            "address":"234.5.6.7",
            "port":17222
         },
         "tx":
         {
            "address":"234.5.6.7",
            "port":17222
         }
      }
   ]
}

Let's go through these in detail...

root

The root of the configuration has a number of elements that are key to the operation of the Rallypoint.

  • id is a unique string you assign that identifies this instance of the Rallypoint. It's very important that this string be unique because it has meaning in situations where multiple Rallypoints are meshed together. If you leave this element blank, the Rallypoint will generate a value automatically, but that won't work terribly well for meshing, so we recommend you always set this value. The computer's host name works well here or, in situations like Docker containers or cloud instances, the ID assigned by the container/cloud system.

  • listenPort is the TCP port that the Rallypoint listens on for TLS connections from Engage clients and other Rallypoints. As described earlier, this port runs TLS so make sure that firewalls and such allow inbound connections to this port for TLS - and TLS only! The default is 7443 but you can assign any valid TCP port.

  • interfaceName is the name of the operating system network interface card (NIC) used by the listenPort. If you don't assign a name, the Rallypoint will bind on all NICs for the listenPort.

  • multicastInterfaceName is the name of the NIC used for receiving and sending multicast UDP traffic. This can be the same as interfaceName or different if you want to forward onto a backbone via another interface. Due to security concerns, you must set the multicast interface name. If you leave it blank, multicast forwarding will be disabled.

  • allowMulticastForwarding indicates whether forwarding of traffic to multicast is allowed. The default is false. If enabled, endpoints that register streams containing multicast addressing information will cause the Rallypoint to automatically forward traffic; no other local configuration is required. Also note that when forwarding unicast TLS traffic to multicast, the TLS security envelope (the "TRANSEC") is removed and only the contents of the TLS payload are forwarded. You should ensure that the Engage groups your clients are using are encrypted (known as the "COMSEC") to prevent unauthorized access to that traffic. That said, if you are setting up multicast forwarding so that group traffic is relayed to third-party entities (such as LMR gateways) that do not support encryption, your Engage groups will have to be unencrypted.

  • ioPools tells the Rallypoint how many threads of parallel operation should be set up for network I/O. If you leave this blank or set it to -1, the Rallypoint will set up 1 I/O pool per CPU. (It's best to leave this element at -1 unless you have a very specific requirement to change it.)

  • allowPeerForwarding indicates whether unicast traffic received from Rallypoint peers in a mesh should be forwarded to other Rallypoints. This is an experimental feature at this time and should be left disabled unless you're fully aware of the implications of nasty things such as packet loops being created on your network.

  • isMeshLeaf indicates whether the Rallypoint is a "leaf" hanging off a mesh or is part of the core mesh. More on this further down in Connecting Into A Mesh.

  • certStoreFileName is the path to the certificate store to be used. See Engage Security for more information.

  • certStorePasswordHex is the hex representation of the password/passphrase protecting the certificate store. See Engage Security for more information.

  • peeringConfigurationFileName is the name of the file from which the Rallypoint should load details about the mesh of which it forms a part. If there is no mesh, you can leave this element blank. (More on this later.)

  • peeringConfigurationFileCommand is an operating system command to run instead of polling the file named by peeringConfigurationFileName. It's important that this command write its output to STDOUT, formatted as JSON in the same way as the mesh configuration file. Also, this command must complete as quickly as possible (in less than 30 seconds) or the Rallypoint will attempt to terminate that process.

IMPORTANT: peeringConfigurationFileName and peeringConfigurationFileCommand are mutually exclusive. If you set values for both elements, the Rallypoint will fail to configure and abort operation.

  • peeringConfigurationFileCheckSecs is the interval (in seconds) that the Rallypoint will check the file defined by peeringConfigurationFileName for changes or, if peeringConfigurationFileCommand is defined, the interval at which to run the command.
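The rules above lend themselves to a quick pre-flight check before (re)starting the daemon. Here is a minimal sketch in Python; the function names are hypothetical, it reads the configuration as strict JSON, and it covers only a few of the documented rules: the mutual exclusivity of the two peering settings, the blank-id and multicast-interface caveats, and the meaning of -1 for ioPools.

```python
import json
import os
import sys


def check_root_settings(cfg: dict) -> list:
    """Return problems found in root-level settings, per the rules above.

    Illustrative pre-flight check only: it is not part of rallypointd and
    covers just a few of the documented rules.
    """
    problems = []
    if cfg.get("peeringConfigurationFileName") and cfg.get("peeringConfigurationFileCommand"):
        problems.append("error: peeringConfigurationFileName and "
                        "peeringConfigurationFileCommand are mutually exclusive")
    if not cfg.get("id"):
        problems.append("warning: id is blank; an auto-generated id works poorly for meshing")
    if cfg.get("allowMulticastForwarding") and not cfg.get("multicastInterfaceName"):
        problems.append("warning: multicastInterfaceName is blank, "
                        "so multicast forwarding will be disabled")
    return problems


def effective_io_pools(cfg: dict) -> int:
    """A blank or -1 ioPools means one I/O pool per CPU."""
    pools = cfg.get("ioPools", -1)
    if pools in (-1, None, ""):
        return os.cpu_count() or 1
    return int(pools)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        config = json.load(f)
    for problem in check_root_settings(config):
        print(problem)
    print("I/O pools:", effective_io_pools(config))
```

You could run it against /etc/rallypointd/rallypointd_conf.json (or your -cfg file) before restarting the service; an "error:" line corresponds to the case where the Rallypoint itself would refuse to start.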

limits

This object defines boundaries and limits for optimal operation and also provides a means for the Rallypoint to report metrics to monitoring systems.

  • maxClients sets the maximum number of client connections allowed. Set to 0 to disable this limit.
  • maxPeers sets the maximum number of peer connections allowed. Set to 0 to disable this limit.
  • maxMulticastReflectors sets the maximum number of multicast reflectors allowed. Set to 0 to disable this limit.
  • maxRegisteredStreams sets the maximum number of streams that may be registered. Set to 0 to disable this limit.
  • maxStreamPaths sets the maximum number of stream pathways that may exist. Set to 0 to disable this limit.
  • maxRxPacketsPerSec sets the maximum number of received packets per second allowed. Set to 0 to disable this limit.
  • maxTxPacketsPerSec sets the maximum number of transmitted packets per second allowed. Set to 0 to disable this limit.
  • maxRxBytesPerSec sets the maximum number of received bytes per second allowed. Set to 0 to disable this limit.
  • maxTxBytesPerSec sets the maximum number of transmitted bytes per second allowed. Set to 0 to disable this limit.
  • maxQOpsPerSec sets the maximum number of queue operations per second allowed. Set to 0 to disable this limit.
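Because 0 means "no limit", any tooling that compares live metrics against these settings has to special-case it. The sketch below does that and, as an assumption, pairs some limits with the status-report fields shown earlier (for example, maxClients with connections/active); the mapping and the function names are illustrative, not a documented Rallypoint interface.

```python
def within_limit(current: float, limit: float) -> bool:
    """A limit of 0 disables the check; otherwise current must not exceed it.

    Treating current == limit as acceptable is an assumption of this sketch.
    """
    return limit == 0 or current <= limit


# Hypothetical pairing of "limits" keys with the status-report field that,
# we assume, measures the same quantity (see the status report example).
LIMIT_TO_REPORT_FIELD = {
    "maxClients": ("connections", "active"),
    "maxPeers": ("peers", "count"),
    "maxRegisteredStreams": ("routing", "streams"),
    "maxStreamPaths": ("routing", "paths"),
    "maxRxBytesPerSec": ("rx", "rate"),
    "maxTxBytesPerSec": ("tx", "rate"),
}


def limits_exceeded(limits: dict, report: dict) -> list:
    """List every configured limit that the given status report exceeds."""
    exceeded = []
    for key, (section, field) in LIMIT_TO_REPORT_FIELD.items():
        limit = limits.get(key, 0)
        current = report.get(section, {}).get(field, 0)
        if not within_limit(current, limit):
            exceeded.append(f"{key}: {current} > {limit}")
    return exceeded
```

With the example configuration, where every limit is 0, limits_exceeded() always returns an empty list, which is exactly the "disabled" behavior described above.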

statusReport

This object contains details about the status report generation as described earlier.

  • enabled set to true to enable - default is false.
  • fileName is the full path name of the JSON file where the status report should be written. If you leave this element blank, no status report will be produced. Include "${id}" to have the Rallypoint ID inserted into the name.
  • intervalSecs is the interval (in seconds) at which the status report should be produced. While you can set this as low as 1 second, we recommend no lower than 5 seconds. 30 seconds is generally a reasonable value.
  • includeLinks indicates whether link information is to be included - default is false.
  • includePeerLinkDetails set to true to include details about peer links if includeLinks is true - default is false.
  • includeClientLinkDetails set to true to include details about client links if includeLinks is true - default is false.
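The "${id}" placeholder happens to use the same syntax as Python's string.Template, which makes it easy to predict where the report will land. A minimal sketch with hypothetical helper names; it only expands ${id} (we do not know whether the Rallypoint supports other placeholders) and flags settings that would suppress the report or poll faster than the recommended minimum.

```python
from string import Template

# Lower bound recommended above for intervalSecs.
MIN_RECOMMENDED_INTERVAL_SECS = 5


def expand_file_name(file_name: str, rp_id: str) -> str:
    """Replace ${id} with the Rallypoint ID.

    safe_substitute leaves any other $-placeholders untouched instead of
    raising an error.
    """
    return Template(file_name).safe_substitute(id=rp_id)


def status_report_warnings(status_cfg: dict) -> list:
    """Flag settings that would suppress the report or poll too aggressively."""
    warnings = []
    if not status_cfg.get("enabled", False):  # documented default is false
        warnings.append("statusReport is disabled")
    if not status_cfg.get("fileName"):
        warnings.append("fileName is blank, so no report will be produced")
    interval = status_cfg.get("intervalSecs")
    if interval is not None and interval < MIN_RECOMMENDED_INTERVAL_SECS:
        warnings.append(f"intervalSecs={interval} is below the recommended "
                        f"{MIN_RECOMMENDED_INTERVAL_SECS} seconds")
    return warnings
```

For the example configuration (id "rp0001", fileName "/tmp/${id}_status.json"), expand_file_name() gives /tmp/rp0001_status.json, the file a monitoring script would read.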

linkGraph

Contains details for creation of a Graphviz file with a graphical representation of links.

  • enabled set to true to enable - default is false.
  • fileName is the full path name of the Graphviz file to be written. If you leave this element blank, no file will be produced. Include "${id}" to have the Rallypoint ID inserted into the name.
  • minRefreshSecs is the minimum time (in seconds) between updates to the file.
  • includeDigraphEnclosure set to true to surround the output in a "strict digraph" enclosure - default is true.
  • includeClients set to true to include client links - default is false.
  • coreRpStyling is the Graphviz styling used to represent a core Rallypoint node.
  • leafRpStyling is the Graphviz styling used to represent a leaf Rallypoint node.
  • clientStyling is the Graphviz styling used to represent a client node.
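To give a feel for how these settings combine, here is a sketch that emits a Graphviz file of the same general shape. It is an approximation under assumptions, not the Rallypoint's actual output: we assume the core and leaf styling strings are applied as node attribute lists and clientStyling to the client edges, and the function name and link tuples are hypothetical. The default styles are copied from the example configuration.

```python
def link_graph_dot(core, leaves, peer_links, client_links,
                   core_style="[shape=hexagon color=firebrick style=filled]",
                   leaf_style="[shape=box color=gray style=filled]",
                   client_style="[dir=none]",
                   include_enclosure=True, include_clients=False):
    """Build DOT text for core/leaf Rallypoints and (optionally) clients.

    core, leaves: iterables of Rallypoint IDs (assumed free of double quotes).
    peer_links: (from_id, to_id) pairs; client_links: (rp_id, client_id) pairs.
    """
    lines = [f'"{node}" {core_style};' for node in core]
    lines += [f'"{node}" {leaf_style};' for node in leaves]
    lines += [f'"{a}" -> "{b}";' for a, b in peer_links]
    if include_clients:
        lines += [f'"{client}" -> "{rp}" {client_style};' for rp, client in client_links]
    body = "\n".join(lines)
    if include_enclosure:
        # Mirrors includeDigraphEnclosure: wrap the statements in a complete graph.
        return "strict digraph {\n" + body + "\n}\n"
    return body + "\n"
```

A file with the enclosure can be rendered directly with the Graphviz command-line tools, for example dot -Tpng rp0001_links.dot -o links.png. Without the enclosure, the output is only a fragment meant to be embedded in a larger graph.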

externalHealthCheckResponder

Contains information for external health checker interoperability. "Health checkers" are generally network entities such as load balancers and network management systems that monitor the Rallypoint to determine its health and availability. In most setups, though, health checkers do simple things like open a TCP connection to the process to verify that it's operational, and they generally close that connection right away.

  • listenPort is the TCP port that the Rallypoint should listen on for health check TCP connections. Set it to whatever is required/desired by your health checker.

  • immediateClose indicates whether the Rallypoint should immediately close the connection. (Some health checkers require this.)
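When tuning a load balancer or firewall for this port, it can help to reproduce what a health checker does. The following minimal sketch uses only Python's standard socket module; the function name is hypothetical, and the host and port you pass are up to you (the example configuration sets listenPort to 0).

```python
import socket


def probe_health_port(host: str, port: int, timeout: float = 2.0) -> dict:
    """Behave like a simple load-balancer health check.

    Connects over TCP, then waits up to `timeout` seconds to see whether the
    far end closes the connection (expected when immediateClose is true).
    """
    result = {"reachable": False, "closed_by_peer": False}
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["reachable"] = True
            try:
                # An empty read means the far end performed an orderly close.
                result["closed_by_peer"] = sock.recv(1) == b""
            except ConnectionResetError:
                result["closed_by_peer"] = True  # an abrupt close also counts
            except socket.timeout:
                pass  # connection was held open past our timeout
    except OSError:
        pass  # refused, unreachable, or timed out while connecting
    return result
```

Against a health-check port with immediateClose enabled, you would expect both reachable and closed_by_peer to be True; with it disabled, closed_by_peer should stay False until your own timeout expires.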

certificate

Security is fundamental to Rallypoints, and that security is ensured largely thanks to X.509 certificates. The certificate (and associated key) in this section are vital to the operation of the Rallypoint.

  • certificate is the PEM content of the X.509 certificate to be used. If the value of this element starts with @ followed by a file name, the Rallypoint will instead read the PEM contents from the file path following the @ sign. If the value starts with "@certstore://", the Rallypoint will load the content from the certificate store element with that name.

  • key works the same way as certificate but holds the PEM content of the private key associated with the certificate. The same logic with the @ sign applies.
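The order in which these prefixes are checked matters, because "@certstore://" also begins with "@". Here is a minimal sketch of the classification, with hypothetical function names; resolving certificate store entries themselves is out of scope here (see Engage Security).

```python
from pathlib import Path

CERTSTORE_PREFIX = "@certstore://"


def classify_cert_value(value: str) -> tuple:
    """Classify a certificate/key setting per the @ rules described above.

    Returns (kind, payload) with kind one of "certstore", "file", or "pem".
    The certstore prefix is tested first because it also begins with "@".
    """
    if value.startswith(CERTSTORE_PREFIX):
        return "certstore", value[len(CERTSTORE_PREFIX):]
    if value.startswith("@"):
        return "file", value[1:]
    return "pem", value


def read_pem(value: str) -> str:
    """Return inline or file-based PEM text; cert-store entries need the store."""
    kind, payload = classify_cert_value(value)
    if kind == "file":
        return Path(payload).read_text(encoding="ascii")
    if kind == "pem":
        return payload
    raise LookupError(f"{payload!r} lives in the certificate store; resolve it there")
```

For the example configuration, "@certstore://rtsFactoryDefaultRpSrv" classifies as the cert-store entry rtsFactoryDefaultRpSrv. The same rules apply to the caCertificates entries in the tls section.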

tls

Further to security, this section deals with TLS connections with entities such as clients and Rallypoint peers.

  • verifyPeers indicates whether the Rallypoint should ask for, and verify, the far-end's X.509 certificate - ensuring mutual authentication/verification as well as certificate-based message-signing. It's a REALLY good idea to enable this.

  • allowSelfSignedCertificates indicates whether the Rallypoint will accept certificates from entities that have been self-signed - i.e. not issued by a known certificate authority. You should generally leave this disabled.

  • caCertificates is an array of CA certificates that the Rallypoint will use to verify client certificates. As with other certificate settings, these strings can be the actual PEM text or references to files or certificate store elements, both using the @ nomenclature.

  • crlSerials is an array of strings, each representing the serial number of a revoked certificate that the Rallypoint should deny (in essence, a Rallypoint-specific Certificate Revocation List). The strings are not case-sensitive but must be in the format "xx:xx:xx:xx:xx:xx:xx:xx", i.e. a : separating the hexadecimal representation of each byte of the serial number. Please note that the examples shown are not necessarily real serial numbers; they are for demonstration purposes only.
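If you generate crlSerials entries from certificates programmatically, the formatting is easy to get wrong. The sketch below (hypothetical helper names, standard library only) converts an integer serial number to the colon-separated form and checks membership numerically, which sidesteps case differences and any leading 00 byte. Whether the Rallypoint itself normalizes leading zeros is not stated here.

```python
import re

# Colon-separated hexadecimal byte pairs, e.g. "ad:de:61:33:99:67:21:e1".
SERIAL_PATTERN = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2})*")


def serial_to_crl_string(serial: int) -> str:
    """Format an integer serial number as lowercase xx:xx:... byte pairs."""
    length = max(1, (serial.bit_length() + 7) // 8)
    return ":".join(f"{b:02x}" for b in serial.to_bytes(length, "big"))


def parse_crl_serial(entry: str) -> int:
    """Parse an entry case-insensitively; reject anything not in xx:xx form."""
    if not SERIAL_PATTERN.fullmatch(entry):
        raise ValueError(f"not in xx:xx:... form: {entry!r}")
    return int(entry.replace(":", ""), 16)


def is_revoked(serial: int, crl_serials: list) -> bool:
    """Compare numerically so that case and leading zero bytes do not matter."""
    return serial in {parse_crl_serial(entry) for entry in crl_serials}
```

The integer serial can come from, for example, the serial_number attribute of a certificate loaded with the Python cryptography package, or from the hex string printed by openssl x509 -noout -serial.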

fipsCrypto

Used to configure FIPS 140-2 settings.

  • enabled activates FIPS mode when set to true.

  • path is the absolute path name to the directory where the rts-fips FIPS module is located. Note that this value is for the directory, not the actual file name itself.

  • curves specifies a list of elliptic curves to be supported in FIPS mode. The default is the single curve secp521r1, currently the highest-level NIST-approved curve, although secp384r1 and secp256r1 may also be used. For example, to specify all NIST-approved curves, this setting would be secp521r1:secp384r1:secp256r1.

  • ciphers specifies a list of ciphers to be supported in FIPS mode. The default is the single NIST-approved cipher TLS_AES_256_GCM_SHA384, although TLS_AES_128_GCM_SHA256 may also be used. For example, to specify all NIST-approved ciphers, this setting would be TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256.
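Both settings are colon-separated lists, so a typo in one name is easy to miss. A minimal sketch that checks them against the curve and cipher names given above; the sets reflect only what this page lists and are not an authoritative statement of FIPS approval, and the function names are hypothetical.

```python
# Names listed above as usable in FIPS mode (not an exhaustive NIST list).
FIPS_CURVES = {"secp521r1", "secp384r1", "secp256r1"}
FIPS_CIPHERS = {"TLS_AES_256_GCM_SHA384", "TLS_AES_128_GCM_SHA256"}


def split_colon_list(value: str) -> list:
    """Split "a:b:c" into ["a", "b", "c"], ignoring empty items."""
    return [item.strip() for item in value.split(":") if item.strip()]


def unsupported_fips_entries(curves: str, ciphers: str) -> list:
    """Return every curve or cipher name not in the lists above."""
    bad = [c for c in split_colon_list(curves) if c not in FIPS_CURVES]
    bad += [c for c in split_colon_list(ciphers) if c not in FIPS_CIPHERS]
    return bad
```

An empty result for the values in your fipsCrypto section means every name matches one of the options described above.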

txOptions

These settings have to do with how packets are sent from the Rallypoint to the far-end.

  • priority sets the QoS-related priority for transmitted packets. See Engage and Network Quality Of Service for more information.

  • ttl sets the IP Time-To-Live value for packets. This may have differing effects on different operating systems.

multicastTxOptions

These settings have to do with how packets are sent from the Rallypoint over multicast and are only applicable for situations like multicast reflecting.

  • priority sets the QoS-related priority for transmitted packets. See Engage and Network Quality Of Service for more information.

  • ttl sets the IP Time-To-Live value for packets. Be sure that you understand what changing the TTL value means for your multicast environment.

multicastRestrictions

This object describes how multicast addresses are restricted. You can either restrict multicast to only a set of elements, or exclude a set of elements. More on this later in the Multicast Restrictions section.

  • type indicates how the elements are to be treated: 1 as a "whitelist", 2 as a "blacklist".
  • elements is an array of objects, each describing a multicast address and port pairing for RX and TX.

groupRestrictionAccessPolicyType

This setting (values of 0 or 1) has an important impact on the method by which registrations for groups are allowed on the Rallypoint. A value of 0 (the default) indicates that, unless otherwise specified in groupRestrictions or extendedGroupRestrictions (see below), registration for that group will be allowed. In other words, a value of 0 denotes a permissive access policy. A value of 1 on the other hand, enforces a strict access policy whereby registration for a group is denied by default unless it is, at minimum, listed in groupRestrictions and, if desired, further extended in extendedGroupRestrictions.

groupRestrictions

This object describes how group identifiers are restricted. You can either restrict groups to only a set of elements, or exclude a set of elements.

  • type indicates how the elements are to be treated: 1 as a "whitelist", 2 as a "blacklist".
  • elements is an array of strings - each being a valid Engage group ID.
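
Putting groupRestrictionAccessPolicyType and groupRestrictions together, the basic admission decision can be sketched in Python as below. This is our reading of the rules, not the Rallypoint's code. In particular, we assume that under the strict policy (1) only a whitelist (type 1) entry grants access. extendedGroupRestrictions, which would be evaluated on top of this, is left out. The function and constant names are hypothetical.

```python
WHITELIST, BLACKLIST = 1, 2   # groupRestrictions "type" values
PERMISSIVE, STRICT = 0, 1     # groupRestrictionAccessPolicyType values


def group_allowed(group_id, policy_type=PERMISSIVE, group_restrictions=None):
    """Basic admission check for a group registration (extended rules not modeled).

    group_restrictions mirrors the JSON object: {"type": 1 or 2, "elements": [...]}.
    """
    listed = bool(group_restrictions) and group_id in group_restrictions.get("elements", [])
    rtype = group_restrictions.get("type") if group_restrictions else None

    if policy_type == STRICT:
        # Assumption: under the strict policy only an explicit whitelist entry grants access.
        return rtype == WHITELIST and listed

    # Permissive policy: allowed unless groupRestrictions says otherwise.
    if rtype == WHITELIST:
        return listed
    if rtype == BLACKLIST:
        return not listed
    return True
```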

extendedGroupRestrictions

This object extends the basic group restrictions that are defined in groupRestrictions. The basic idea here is that if groupRestrictions is viewed as sort of a blunt instrument that simply allows or denies access to groups, the extendedGroupRestrictions object places more fine-grained control within those groups. For example, while you may want to allow registration of {58e3e468-c0e1-4ad8-86d2-9931251e6ea0}, you may want to restrict access to certain sets of users - ideally based on the X.509 certificates they present when connecting. See below for a more detailed discussion on this subject.

  • id is the ID of the group we're working with.
  • restrictions is the container object for the group's specialized restriction set.
  • type indicates how the elements are to be treated: 1 as a "whitelist", 2 as a "blacklist".
  • elementsType indicates how the Rallypoint should interpret the list of elements. See below.
  • elements is an array of strings - each interpreted as per elementsType.

staticReflectors

This is an array of multicast reflectors that need to be maintained for the lifetime of the Rallypoint process. More on this later in the Multicast Reflection section.

Meshing

OK great, you've set up a Rallypoint and you have lots of users connecting to it. But you're running out of CPU because you have thousands of users talking away like crazy. Or, you have groups of people in different parts of the world that need to connect to their own Rallypoints - but they all still want to communicate with each other. Or, you need additional Rallypoints for failover and redundancy purposes. Or, something else ...

Well, the solution here is generally to just add more Rallypoints.

But, you want all those Rallypoints to interconnect with each other. That's where Rallypoint meshing comes in.

For this example we're going to assume we want to setup a bunch of Rallypoints in a cloud environment such as Amazon Web Services (AWS). We'd like to make those Rallypoints available to our users spread around the world. And we don't want to have those users connect to particular Rallypoints. Rather, we want any user to connect to any available Rallypoint and have the Rallypoints take care of forwarding traffic amongst themselves to create what looks like a big, centralized, cloud-based Rallypoint to our users.

First, we're going to need something that front-ends all these Rallypoints; offering a single DNS name (or single IP address) that all our users connect to. Let's call it cloudrp.example.com. Also, for purposes of this discussion, we'll say we're going to do all this in Amazon Web Services, taking advantage of Amazon's Elastic Load Balancer.

In an ideal situation, we'd use Rallypoints' ability to forward traffic to a multicast backbone to create something like below. (Notice how clients 1, 2, and 3 connect to the load balancer which, in turn, passes those IP connections onwards to one of the three Rallypoints.)

                                                          |
                              (c1)      +------+          |
                        +-------------> |rp0001| <------> |
                        |               +------+          |
                        |                                 |
                        |                                 |
                 +-------------+                          | multicast
  c1 ----------> |             |  (c2)  +------+          | backbone
  c2 ----------> |Load Balancer| -----> |rp0002| <------> |
  c3 ----------> |             |        +------+          |
                 +-------------+                          |
                        |                                 |
                        |                                 |
                        |      (c3)     +------+          |
                        +-------------> |rp0003| <------> |
                                        +------+          |
                                                          |

But, sadly, our cloud provider does not support IP multicast (and that is, in fact, true of AWS as well as most cloud providers). Also, multicast networks can be difficult to set up and maintain, so sometimes it's just easier to go with unicast.

So, we'd still like to have the logical setup we described above, but we have to find a way to use something other than a multicast backbone. The answer is to create a Rallypoint Mesh.

A mesh is simply a configuration where Rallypoints connect directly to each other for purposes of traffic forwarding. This is not too different from the way in which IP multicast works anyway - it's just that the Rallypoints themselves are doing the "multicasting" rather than relying on the IP network for that purpose.

As you can see below, what we've done is to have each Rallypoint in our cloud connect to every other Rallypoint. Now, when a client connects (indirectly) to a Rallypoint, its traffic is forwarded to other Rallypoints - just like multicast. It's actually rather straightforward.


                              (c1)      +------+         (pc)
                        +-------------> |rp0001| <---------------+
                        |               +------+                 |
                        |                   ^                    |
                        |                   | (pc)               |
                 +-------------+            |                    |
  c1 ----------> |             |  (c2)      +------->  +------+  |
  c2 ----------> |Load Balancer| ------------------->  |rp0002|  | (pc)
  c3 ----------> |             |            +------->  +------+  |
                 +-------------+            |                    |
                        |                   | (pc)               |
                        |                   v                    | 
                        |      (c3)     +------+                 |
                        +-------------> |rp0003| <---------------+
                                        +------+         (pc)

I bet you have some questions though ...

  • You might be wondering whether all traffic from every Rallypoint is forwarded to every other Rallypoint. That would be kinda silly and inefficient, so what Rallypoints actually do is "subscribe" to each other for traffic on individual streams. So if client 1 (connected to RP1) and client 3 (connected to RP3) are registered/subscribed for the same stream, that stream's traffic only flows between RP1 and RP3 - bypassing RP2.
  • What about security on these links between Rallypoints? Well, they're TLS connections just like those from clients to Rallypoints, and are subject to the same level of X.509 mutual authentication and TLS encryption.
  • But latency must increase - correct? Well, yes, but only by a minuscule amount. Remember that Rallypoints are packet routers - and just that! They do not process traffic payloads, so the latency introduced by traffic passing through a Rallypoint is on the order of microseconds.
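
The subscription idea in the first bullet can be modeled with a tiny Python class. It's a toy and not the Rallypoint's implementation. Here a "link" is a hypothetical name for a client or peer connection. A packet for a group is copied only to links subscribed to that group, and it's never echoed back to the link it arrived on. Loop prevention across the mesh is not modeled.

```python
from collections import defaultdict


class MeshForwarder:
    """Toy model of per-stream subscriptions on one Rallypoint."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # group ID -> set of link names

    def subscribe(self, link, group_id):
        self.subscribers[group_id].add(link)

    def unsubscribe(self, link, group_id):
        self.subscribers[group_id].discard(link)
        if not self.subscribers[group_id]:
            del self.subscribers[group_id]

    def forward(self, from_link, group_id):
        """Links that a packet for group_id arriving on from_link is copied to."""
        return sorted(self.subscribers.get(group_id, set()) - {from_link})


if __name__ == "__main__":
    rp0001 = MeshForwarder()
    rp0001.subscribe("c1", "G")          # local client registered for group G
    rp0001.subscribe("rp0003", "G")      # peer rp0003 has a client registered for G
    print(rp0001.forward("c1", "G"))     # ['rp0003']
    print(rp0001.forward("rp0003", "G")) # ['c1']
```

Because rp0002 never subscribed to G, nothing for that stream is sent to it, which is the "bypassing RP2" behavior described above.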

Configuration

Configuration for meshing is pretty straightforward. All that the Rallypoints need is some certificate information and information about all the Rallypoints in the mesh.

Here's an example:

{
   "peers":[
      {
         "id": "cloudrp0001",
         "enabled":true,
         "host": 
         {
            "address": "cloudrp0001.example.com",
            "port": 7443
         },

         "certificate": 
         {
            "certificate":"@certstore://someOtherCert",
            "key":"@certstore://someOtherCert"
         }
      },

      {
         "id": "cloudrp0002",
         "enabled":false,
         "host": 
         {
            "address": "cloudrp0002.example.com",
            "port": 7443
         }
      }
   ]
}

peers

This is an array of peer objects, each describing an individual peer in the mesh as follows:

  • id is the unique ID of the peer and should match the id field in the Rallypoint's configuration.

  • enabled indicates whether this peer is enabled - i.e. whether other Rallypoints should connect to it. This element is useful if you want to disable the connection to a peer without removing its information from the mesh file.

  • host/address is the DNS name or IP address of the peer.

  • host/port is the peer's TCP port to connect to and must match the listenPort in that peer's configuration.

  • certificate allows you to specify a particular X.509 certificate and private key for this peer - i.e. not use the default.
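
With more than a handful of Rallypoints, writing the peers array by hand gets tedious. Here's a minimal Python sketch that generates it from a list of (id, address) pairs, in the same shape as the example above. certificate is left out, so each peer uses the default. The helper names are hypothetical. It also counts the peer pairs in a full mesh, n(n-1)/2, as a reminder of how quickly a full mesh grows.

```python
import json


def mesh_peers(hosts, port=7443, disabled=()):
    """Build a peers array like the example above from (id, address) pairs.

    Peers whose id appears in `disabled` are kept but marked "enabled": false.
    """
    return {
        "peers": [
            {
                "id": peer_id,
                "enabled": peer_id not in disabled,
                "host": {"address": address, "port": port},
            }
            for peer_id, address in hosts
        ]
    }


def full_mesh_pairs(n):
    """Number of distinct Rallypoint pairs when every node peers with every other."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    hosts = [(f"cloudrp{i:04d}", f"cloudrp{i:04d}.example.com") for i in range(1, 4)]
    print(json.dumps(mesh_peers(hosts), indent=3))
    print(full_mesh_pairs(len(hosts)))  # 3
```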

Multicast Reflection

As we saw above in Meshing, a Rallypoint has the ability to connect to the local multicast network to use that network as a backbone for passing traffic between the nodes in the mesh. Well, that's not where it ends. In fact, the Rallypoint can forward any type of traffic on multicast. And this is especially useful when you want to exchange traffic between unicast endpoints (such as Engage Engines and other Rallypoints) and multicast endpoints.

But before we get into that, let's quickly remind ourselves that Engage entities (such as mobile and desktop apps) can exchange multicast voice traffic with non-Engage entities - provided those entities support industry standard protocols such as RTP and CODECs such as G.711, AMR, and so on.

A classic use-case is when you have an entity such as a two-way radio gateway that "speaks" multicast and you need Engage entities to talk to that system. This is easily accomplished by setting up a group on Engage to use the codec the gateway is configured for and to use the same multicast IP address and port configured on the gateway. Assuming the multicast flows cleanly between the gateway and the clients (generally because they're all on the same network), all works fine.

So, something like the following, where we have three Engage "clients" (c1, c2, and c3) on the same multicast network as the gateway. The gateway is configured to forward a single talk group/channel/frequency on the radio system to bi-directional multicast at 234.5.6.7 with a port number of 15000 over standard RTP. It's also configured to use a CODEC that Engage supports - in this case, G.711 ulaw. (We'll also assume that the gateway is not encrypting the RTP traffic, which is pretty typical these days.)

           multicast network
-------------------------------------
        ^              ^   ^    ^  
        |              |   |    |
        v              v   v    v
   +---------+        c1   c2   c3  
   | gateway |          
   +---------+          
        ^               
        |               
        v
 +--------------+
 | radio system |
 +--------------+

Alright, that's pretty straightforward - we're really just matching CODECs and multicast addressing with the gateway. All's good.

Something to be aware of, though, is that under the covers Engage is using a group ID associated with that group. Keep this in mind for later on.

Simple, Local Reflection

Let's contrive this example a little more and say that we DON'T want our Engage users (c1,c2,c3) on the multicast. Rather, we want them to connect over unicast to a Rallypoint which is on the multicast network with the gateway.

Actually, while seemingly contrived, this is not entirely unlikely as these client devices may be unable to join multicast networks (even though they're local) because of an older operating system, administrative policy, etc.

So, we want something like this:

           multicast network
-------------------------------------
        ^                    ^
        |                    |
        v                    v
   +---------+          +---------+
   | gateway |          |         | <---------> c1
   +---------+          |   rp    | <---------> c2
        ^               |         | <---------> c3
        |               +---------+
        v
 +--------------+
 | radio system |
 +--------------+

Also pretty straightforward, except for ... How does the Rallypoint know which multicast address and port to use? And what Engage group does that map to?

Well, the Rallypoint can "learn" from the clients! That's right - when a client connects to the Rallypoint and registers for a group, it passes multicast info along with that registration. (Remember the client was configured earlier with the multicast info.) If the Rallypoint has been configured to allow multicast forwarding, and forwarding is permitted for the address and port pairing (see below), the Rallypoint will set up a "reflector". The reflector is simply a construct on the Rallypoint that "reflects" unicast traffic from clients (and other Rallypoints) to the multicast - and vice versa.

So ... when the first client connects (let's say c1), the Rallypoint automatically sets up a reflector to the local multicast and keeps it going until all references to it go away. Subsequent registrations (say from c2 and c3) will not result in additional reflectors being set up - the Rallypoint already has one. Once all clients that are "interested" in that group (and, by association, the multicast) have disconnected, the reflector is stopped.
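
The reflector lifecycle described above is essentially reference counting. Here's a minimal Python model of it, assuming reflectors are keyed by group ID. Static reflectors, covered later in Static Reflection, are simply pinned so they never go away. The class and method names are hypothetical.

```python
class ReflectorTable:
    """Toy model of reflector lifetime on a Rallypoint, keyed by group ID.

    A learned reflector is set up on the first registration for a group and
    stopped when the last registration goes away. Static reflectors are pinned
    for the life of the process.
    """

    def __init__(self, static_groups=()):
        self.static = set(static_groups)
        self.refs = {}  # group ID -> number of registrations referencing it

    def active(self):
        """Group IDs that currently have a reflector running."""
        return self.static | set(self.refs)

    def register(self, group_id):
        """Count a registration; True if it caused a new reflector to be set up."""
        self.refs[group_id] = self.refs.get(group_id, 0) + 1
        return group_id not in self.static and self.refs[group_id] == 1

    def unregister(self, group_id):
        """Drop a registration; True if that stopped the reflector."""
        count = self.refs.get(group_id, 0)
        if count > 1:
            self.refs[group_id] = count - 1
            return False
        self.refs.pop(group_id, None)
        return count == 1 and group_id not in self.static
```

Registrations from c1, c2, and c3 for the same group ID produce one reflector, and only the third unregister stops it. This is also why clients using different group IDs for the same multicast won't hear each other: each ID gets its own entry.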

Pretty cool huh!? You don't really need to do too much on the Rallypoint to make this magic happen - other than to allow multicast reflection (which is disabled by default due to security and bandwidth utilization considerations).

But ... bear in mind that you do need to be using the same group ID on all the clients. You cannot have different groups that use the same multicast addresses and ports. Each of those may be able to talk to the radio system, but they won't talk to each other!

Getting A Little More Sophisticated

What's that you say? Your Engage clients are OUTSIDE the corporate network on the Internet but need to talk to your radio system? Hmm, how could we possibly make that happen ... !?

Pretty easy, in fact. Use exactly the same setup as before, but this time make sure that your clients outside the corporate network can reach the Rallypoint through your firewall. One way is to put the Rallypoint in a DMZ with only its inbound TCP port for Rallypoint connections open to the outside world. You'd bind that inbound side to one NIC with the interfaceName setting in your Rallypoint configuration file, and set multicastInterfaceName to bind multicast to a different NIC. Alternatively, you could use the same NIC for both. Or you could set up your Rallypoint to connect to an external Rallypoint that the clients connect to, so that their traffic is proxied to the Rallypoint on the multicast network. Basically, there are a number of ways to do this. For now, we'll go with the first one, where the clients have access through the firewall to the Rallypoint as depicted below.

                                     firewall
      internal network                  |
           multicast network            |  internet
-------------------------------------   |
        ^                    ^          |
        |                    |          |
        v                    v          |
   +---------+          +---------+     |
   | gateway |          |         | <--------> c1
   +---------+          |   rp    | <--------> c2
        ^               |         | <--------> c3
        |               +---------+     |
        v                               |
 +--------------+                       |
 | radio system |                       |
 +--------------+                       |
                                        |

Now, in the same way that things worked when the clients were inside the corporate network, it works when the clients are outside - no changes necessary from what we learnt before.

Static Reflection

We're doing well, but now we want to get away from the clients knowing anything about multicast addressing on the LMR network. That's a real pain to manage - particularly if we have tons of these gateway setups all over the world. What we want here is for only the Rallypoint to know about multicast addressing and leave the clients to simply configure a group that the Rallypoint maps to the local multicast. So ... we have to "teach" the Rallypoint a little. And we do that using the staticReflectors section of the Rallypoint configuration.

We're going to need the multicast address and port (obviously) as well as the ID that the group has been assigned by the person or entity that creates the group configuration distributed to our clients. This is a little tricky and varies with different vendors' implementations of the Engage system. For our purposes, though, we'll assume we know that the group ID is {1e351ac4-7915-4144-9545-82d60c9cfe4e}.

Once we have this information, we edit the Rallypoint's configuration file and modify the staticReflectors section as follows:

   .
   .
   "staticReflectors": 
   [
      {
         "id":"{1e351ac4-7915-4144-9545-82d60c9cfe4e}",
         "rx":
         {
            "address":"234.5.6.7",
            "port":15000
         },
         "tx":
         {
            "address":"234.5.6.7",
            "port":15000
         }
      }
   ]
   .
   .

Now, we'll restart the Rallypoint. The reflector will be setup for the group identified as {1e351ac4-7915-4144-9545-82d60c9cfe4e} and will stay active for the lifetime of the Rallypoint. Any clients registering for {1e351ac4-7915-4144-9545-82d60c9cfe4e} will receive traffic for that group from other clients as well as from the multicast. Also, traffic from clients will be sent to the multicast as well.

Architecturally, this looks exactly the same as the previous diagram. The only difference is that the Rallypoint knows about the multicast at startup time, rather than "learning" about it when the first client connects.

Connecting Into A Mesh

All this is well and good, and works fine for simple implementations. But what if we want to get even more sophisticated, connecting our statically reflecting Rallypoint into a core Rallypoint mesh and making the multicast traffic available to clients connecting into that mesh?

Well, that functionality comes free with the setup we've just made. Simply point the Rallypoint to the mesh and connect the clients to the mesh instead. Your Engage Engines and Rallypoints will work together to deliver the traffic to where it's needed - and nobody except the "local" Rallypoint needs to know anything about the gateway's multicast setup.

You can easily build something like this, where you have two (or more) radio systems, each with its own gateway on its own network and its own multicast addressing. Each radio system (radio 01 and radio 02) would be a different group as far as the clients (c1, c2, and c3) are concerned - with a separate group ID for each, of course. On each Rallypoint (rp01 and rp02) you'd set up a static reflector just for that local radio system and mark the Rallypoint as a "mesh leaf" by setting its isMeshLeaf setting to true. You'd then peer each Rallypoint into the cloud mesh in much the same way that you'd set up the core peers in the mesh as described above. But instead of peering your Rallypoint to each of your core mesh Rallypoints, you'd simply peer it to the load balancer fronting the core mesh. Everything else will be taken care of automatically.

---------------------------------
        ^               ^
        |               |
        v               v
   +---------+     +---------+
   | gateway |     |   leaf  |
   +---------+     |   rp01  |<--------------+
        ^          |         |               |
        |          +---------+               |
        v                                    |
 +--------------+                            |
 | radio 01     |                            v
 +--------------+                       +---------+ 
                                        |         | <----> c1
                                        | cloud   | <----> c2
                                        | mesh    | <----> c3
                                        |         |
                                        +---------+
                                             ^
---------------------------------            |
        ^               ^                    |
        |               |                    |
        v               v                    |
   +---------+     +---------+               |
   | gateway |     |   leaf  |               |
   +---------+     |   rp02  | <-------------+
        ^          |         |
        |          +---------+
        v                       
 +--------------+
 | radio 02     |
 +--------------+

It's very important that you set your "local" Rallypoint as a mesh leaf using the isMeshLeaf property. If you don't, the core mesh will not forward traffic as expected and you may experience one-way traffic flow.

Whitelisting And Blacklisting Using Restrictions

To further optimize operation, reduce bandwidth utilization, and guard against potential security threats, Rallypoints have the notion of "restrictions". A restriction is simply a rule that governs how the Rallypoint is to allow (or disallow) access to something.

In the case of multicasts, we can specify which multicast addresses and ports are to be allowed for use - by setting type in the JSON object to 1. That way, the Rallypoint will only allow those multicasts to operate. Conversely, if we want to allow all multicasts except for some very specific ones, we set type to 2 in the JSON and list the multicasts we want excluded.

In the same way, we can place restrictions on which group IDs are either specifically allowed on a Rallypoint, or not allowed.

Multicast Restrictions

In our example JSON above, we had the following:

   .
   .
   "multicastRestrictions": 
   {
      "type":1,
      "elements":
      [
         {
            "rx": 
            {
               "address":"234.1.2.3",
               "port":25000
            },
            "tx": 
            {
               "address":"234.1.2.3",
               "port":25000
            }
         },

         {
            "rx": 
            {
               "address":"234.5.6.7",
               "port":17222
            },
            "tx": 
            {
               "address":"234.5.6.7",
               "port":17222
            }
         }
      ]
   },
   .
   .

What we've done here is to tell the Rallypoint that we ONLY want to allow multicasting for 234.1.2.3:25000 and 234.5.6.7:17222. All other multicasting attempts will fail. We did this by setting type to 1. Pretty simple - huh! Now, if we rather wanted to allow all multicasting EXCEPT for these, we'd simply set type to 2.

NOTE: If you set type to 1 and do not provide any multicasts, you are effectively turning off all multicasting. The Rallypoint will log a warning for this at startup but, as this is not a critical issue, will continue execution.
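
Here's a minimal Python sketch of how a type 1 or type 2 multicastRestrictions object could be applied to a requested rx/tx pairing. Two assumptions are built in. First, an element matches only when both its rx and tx address/port agree with the request. Second, a missing restrictions object filters nothing, although multicast reflection itself still has to be enabled. The function names are hypothetical.

```python
def _pair(entry):
    """Turn {"rx": {...}, "tx": {...}} into a hashable (rx addr, rx port, tx addr, tx port)."""
    return (entry["rx"]["address"], entry["rx"]["port"],
            entry["tx"]["address"], entry["tx"]["port"])


def multicast_allowed(request, restrictions=None):
    """Apply a multicastRestrictions object to one requested rx/tx pairing."""
    if not restrictions:
        return True  # assumption: no restrictions object means nothing is filtered
    listed = _pair(request) in {_pair(e) for e in restrictions.get("elements", [])}
    if restrictions["type"] == 1:  # whitelist: only listed pairings may be used
        return listed
    if restrictions["type"] == 2:  # blacklist: listed pairings are refused
        return not listed
    raise ValueError(f"unknown restriction type: {restrictions['type']}")


if __name__ == "__main__":
    def mc(addr, port):
        return {"rx": {"address": addr, "port": port}, "tx": {"address": addr, "port": port}}

    restrictions = {"type": 1, "elements": [mc("234.1.2.3", 25000), mc("234.5.6.7", 17222)]}
    print(multicast_allowed(mc("234.5.6.7", 17222), restrictions))  # True
    print(multicast_allowed(mc("234.5.6.7", 15000), restrictions))  # False
```

Note that the gateway multicast from the reflection examples (234.5.6.7:15000) would be refused by this whitelist, because only the port differs.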

Group Restrictions

In much the same way as we define restrictions for multicasting, we can do the same with group IDs. As per the example:

   .
   .
    "groupRestrictions":
   {
      "type":1,
      "elements":
      [
         "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
         "{b7694a4f-9724-44a6-ae57-c63232ad1f57}"
      ]
   },
   .
   .

Here we're instructing the Rallypoint that the ONLY group IDs allowed on this Rallypoint are {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} and {b7694a4f-9724-44a6-ae57-c63232ad1f57}. Attempts by Engage clients to register for any other stream will be denied. Also, any other Rallypoint peer attempting to register for any group ID other than the ones specified will also be denied.

Plus ... there's a cool bonus here ... When Rallypoints first connect to each other, they include their group restrictions in the initial handshake. This allows these peers to filter the registrations they convey across a Rallypoint mesh - greatly reducing bandwidth utilization and operational overhead.

And, of course, we can turn this upside down and allow all groups except some specific ones by simply setting type to 2 - thereby blacklisting those group identifiers.

NOTE: If you set type to 1 and do not provide any group identifiers, you are effectively turning off all group-related operations. And, because the Rallypoint's function is to provide packet routing for groups, this will basically make it a dead entity. Hence, if this issue is encountered at startup, the Rallypoint will log a fatal error and abort.
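
The handshake bonus mentioned above boils down to a filter. Before relaying its registrations to a peer, a Rallypoint can drop the ones that peer has said it won't accept. Here's a minimal Python sketch. It assumes the peer's groupRestrictions arrives as the same type/elements object used in the configuration. The actual handshake format isn't documented here, and the function name is hypothetical.

```python
def registrations_for_peer(local_groups, peer_restrictions):
    """Group IDs worth relaying to a peer, given the peer's advertised restrictions.

    peer_restrictions is assumed to look like a groupRestrictions object
    ({"type": 1 or 2, "elements": [...]}); None means the peer advertised none.
    """
    if not peer_restrictions:
        return sorted(local_groups)
    listed = set(peer_restrictions.get("elements", []))
    if peer_restrictions["type"] == 1:  # peer accepts only these groups
        return sorted(g for g in local_groups if g in listed)
    if peer_restrictions["type"] == 2:  # peer refuses these groups
        return sorted(g for g in local_groups if g not in listed)
    raise ValueError(f"unknown restriction type: {peer_restrictions['type']}")
```

Every registration dropped here is one that would otherwise have crossed the mesh only to be denied at the far end.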

Also ... this stuff has bearing on staticReflectors as well. If you set up static reflection on a Rallypoint and you're using restrictions, make sure that the static reflector's multicast addressing is included in your multicastRestrictions (or at least not excluded from them). Similarly, make sure that groupRestrictions allows the static reflector's group ID.
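
A quick pre-flight check can catch these mismatches before a restart. The Python sketch below is hypothetical tooling, not a Rallypoint feature. It flags an empty group whitelist (which the Rallypoint treats as fatal) and an empty multicast whitelist (a warning). It also flags any staticReflectors entry whose group ID or rx/tx addressing the restrictions would block. Labeling those as "error" is our choice.

```python
def _pair(entry):
    """(rx address, rx port, tx address, tx port) for a reflector or restriction entry."""
    return (entry["rx"]["address"], entry["rx"]["port"],
            entry["tx"]["address"], entry["tx"]["port"])


def _allowed(key, restrictions, key_of):
    """Apply a type 1 (whitelist) or type 2 (blacklist) restriction object to key."""
    if not restrictions:
        return True
    listed = key in {key_of(e) for e in restrictions.get("elements", [])}
    return listed if restrictions["type"] == 1 else not listed


def validate_config(cfg):
    """Return (severity, message) tuples for restriction problems in a config dict."""
    problems = []
    groups = cfg.get("groupRestrictions")
    mcast = cfg.get("multicastRestrictions")

    # Empty whitelists: fatal for groups, only a warning for multicast.
    if groups and groups.get("type") == 1 and not groups.get("elements"):
        problems.append(("fatal", "groupRestrictions whitelist is empty"))
    if mcast and mcast.get("type") == 1 and not mcast.get("elements"):
        problems.append(("warning", "multicastRestrictions whitelist is empty"))

    # Every static reflector must survive both sets of restrictions.
    for r in cfg.get("staticReflectors", []):
        if not _allowed(r["id"], groups, lambda e: e):
            problems.append(("error", f"group {r['id']} is blocked by groupRestrictions"))
        if not _allowed(_pair(r), mcast, _pair):
            problems.append(("error", f"multicast for {r['id']} is blocked by multicastRestrictions"))
    return problems
```

You could run it against your configuration with something like `validate_config(json.load(f))` on an open file handle, assuming the file is plain JSON.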

Extended Group Restrictions

Group restrictions are a simple way for your Rallypoint to allow or deny access to groups. But with simplicity comes lack of flexibility. For example, by whitelisting {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} and {b7694a4f-9724-44a6-ae57-c63232ad1f57} above, we allow ANYONE to access those groups (assuming their X.509 certificate checks out when they connect). But what if we want to specialize a little here and place some restrictions on WHO can access those groups?

Well, the Rallypoint doesn't support the notion of user IDs, passwords, or other such old-school stuff. Rather, it uses X.509 extensively for all kinds of security-related concerns. So we figured we'd use those certificates for the notion of "group-level firewalling". Alright, let's imagine that group {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} is a group used in the education sector - say in schools. We don't want just anyone with an Engage client and a valid certificate getting onto a group where there are discussions about kids, their schedules, interests, and so on. Rather, we want only certain people - say, teachers - to participate in that group. So we'll place a set of restrictions on the group that only allows educators, teachers, and instructors to access it. With something like this:

"extendedGroupRestrictions": [
{
   "id": "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
   "restrictions": [
      {
         "type": 1,
         "elementsType": 2,
         "elements": [
            "-educators",
            "-teachers",
            "-instructors"
         ]
      }
   ]
},

And how do we know that a user is an educator, teacher, or instructor ...? Well, we're going to look for that "role" as a tag in the X.509 certificate that their Engage client presents to the RP when it connects. Later on we'll talk about IANA Object Identifiers and other such things that give us the ability to embed these "tags" in the client certificate. For now, just assume that the client's certificate contains one or more of those tags - being -educators, -teachers, or -instructors. If it does, the Rallypoint will allow the client access to the group. If none of the tags exist in the client certificate, that client will not be allowed to register for that group (but it may well be able to register for others, of course).

But, let's say that there's a secret discussion going on regarding schools in Washington or Idaho (maybe there's a surprise party being planned - just go with it ... OK!). And we want to exclude folks in Washington and Idaho. Those folks would already have one or more of -educators, -teachers, or -instructors tags in their certificates - which would grant them access. But we don't want to do that right now. So, we add another rule that specifically excludes them - in this case based on the State that their certificate was issued for (this is in the Subject field of their certificates in the ST item). Our extendedGroupRestrictions for {58e3e468-c0e1-4ad8-86d2-9931251e6ea0} now looks as follows:

"extendedGroupRestrictions": [
{
   "id": "{58e3e468-c0e1-4ad8-86d2-9931251e6ea0}",
   "restrictions": [
      {
         "type": 1,
         "elementsType": 2,
         "elements": [
            "-educators",
            "-teachers",
            "-instructors"
         ]
      },
      {
         "type": 2,
         "elementsType": 5,
         "elements": [
               "(ST=WA)|(ST=ID)"
         ]
      }
   ]
},

The first object initially grants (type=1) access for any certificate containing the tags -educators, -teachers, or -instructors. The second object denies (type=2) access to certificates whose ST item in the Subject (elementsType=5) is WA or ID.

The elementsType field is really important here because it tells the Rallypoint how to interpret the strings in the elements list. The values are as follows:

  • 0 - the elements are literal group IDs
  • 1 - the elements are group ID regex patterns
  • 2 - the elements are regex patterns for generic access tags in the client certificate
  • 3 - the elements are regex patterns for the client certificate's X.509 serial number
  • 4 - the elements are regex patterns for the client certificate's X.509 fingerprint
  • 5 - the elements are regex patterns for the client certificate's subject field
  • 6 - the elements are regex patterns for the subject field of the CA certificate that issued the client certificate
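
To make the evaluation concrete, here's a minimal Python sketch of how a list of extended restrictions could be applied to a client certificate. Several things are assumptions:

- Every rule must pass: a type 1 rule needs at least one matching element, and a type 2 rule must have none.
- The certificate is represented by a hypothetical dict whose field names and subject string format we made up.
- Python's re module stands in for PCRE2.

Element types 0 and 1 match group IDs rather than certificate fields, so they're left out.

```python
import re

# elementsType -> field of the (hypothetical) certificate dict it is matched against.
FIELD_FOR_ELEMENTS_TYPE = {
    2: "tags",            # generic access tags
    3: "serial",          # X.509 serial number
    4: "fingerprint",     # X.509 fingerprint
    5: "subject",         # client certificate subject
    6: "issuer_subject",  # subject of the issuing CA certificate
}


def rule_matches(rule, cert):
    """True if any element regex matches the certificate field selected by elementsType."""
    values = cert.get(FIELD_FOR_ELEMENTS_TYPE[rule["elementsType"]], [])
    if isinstance(values, str):
        values = [values]
    # re.search finds a match anywhere in the string, not just at the start.
    return any(re.search(p, v) for p in rule["elements"] for v in values)


def extended_allowed(cert, restrictions):
    """Every rule must pass: type 1 requires a match, type 2 forbids one."""
    for rule in restrictions:
        matched = rule_matches(rule, cert)
        if rule["type"] == 1 and not matched:
            return False
        if rule["type"] == 2 and matched:
            return False
    return True


if __name__ == "__main__":
    rules = [
        {"type": 1, "elementsType": 2, "elements": ["-educators", "-teachers", "-instructors"]},
        {"type": 2, "elementsType": 5, "elements": ["(ST=WA)|(ST=ID)"]},
    ]
    print(extended_allowed({"tags": ["-teachers"], "subject": "C=US, ST=OR, CN=alice"}, rules))  # True
    print(extended_allowed({"tags": ["-teachers"], "subject": "C=US, ST=WA, CN=bob"}, rules))    # False
```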

Except for an elementsType of 0, all of these are processed as regular expressions - or "regex" - which are immensely powerful search criteria for processing data. However, regex is a black art and seen by many (this author included) as too difficult to comprehend. Nonetheless, you can place regex in the elements array and the Rallypoint will dutifully process them.

If you want to know more about regex - and have about 6 months of available time - start at Wikipedia and go from there.

NOTE: As if regex wasn't already complicated enough in general, there are different implementations of regex with enough subtle differences to make it hard to keep up. So, we decided to use regex as implemented in the PCRE2 (Perl Compatible Regular Expressions) library used by many popular applications. Check it out on Wikipedia.

Regex Gotchas

If you look at the JSON above and pay attention to the section regarding tags (like -educators, -teachers, and -instructors), your geeky self may see a bug (and you'd be right).

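Let's take the tag -educators for example ... This will certainly match -educators, but it will also match -educatorsInPhysics and -educatorsWhoDriveCars. If we really want to match just -educators, then we must tell the Rallypoint that the tag must be EXACT - and we do that with regex by indicating a word boundary with the special \b notation. So, we'd specify -educators\b (notice the \b in there) rather than just -educators. Keep in mind that inside the JSON configuration file the backslash itself must be escaped, so the element is written as "-educators\\b". In JSON, a bare \b inside a string is the escape for a backspace character, not a regex word boundary.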
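
The Python snippet below demonstrates both the prefix match and the JSON escaping behavior. Python's re module isn't PCRE2, but \b behaves the same way for cases like these. json.loads shows what a standard JSON parser does with a single versus a doubled backslash.

```python
import json
import re

tag = "-educatorsInPhysics"

# Without a word boundary the pattern happily matches as a prefix.
print(bool(re.search(r"-educators", tag)))             # True
# \b needs a boundary right after "educators", so the longer tag no longer matches.
print(bool(re.search(r"-educators\b", tag)))           # False
print(bool(re.search(r"-educators\b", "-educators")))  # True

# In JSON text, \b is the backspace escape; \\b yields a literal backslash followed by b.
print(json.loads(r'"-educators\b"') == "-educators\x08")  # True
print(json.loads(r'"-educators\\b"') == r"-educators\b")  # True
```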

REMAINDER OF DISCUSSION OF EXTENDEDGROUPRESTRICTIONS IS PENDING

Monitoring

Every organization has policies that dictate how services such as Rallypoints should be monitored. We've tried to be as flexible as possible to allow for that, and we'll provide examples for different environments as they come to light.

Bash script for periodic status display

Here's a script we developed for a Rallypoint mesh residing in Amazon Web Services. Feel free to use it in your environment.

#!/bin/bash

#-------------------------------------------------------------------------
# Rallypoint Status Display
# Copyright (c) 2020 Rally Tactical Systems, Inc.
#
# This script periodically reads a status file produced by a Rallypoint
# and displays useful statistics.  We're making use of JQ in this
# script so if you don't already have JQ, install it.
#
# This script has been tested on RedHat-style distros and works quite
# well; your mileage may vary (slightly) on other distros.  Feel free
# to modify accordingly.
#
# In fact, this script was developed for RTS' internal use for a Rallypoint
# mesh hosted in Amazon Web Services.  So there's a little bias toward AWS
# here.  Mostly, though, this script should work on most Linux distros.
#
# Here's an example of a JSON status file.  We will be using some of the
# elements from it in our script.
#
#{
#  "connections": {
#    "active": 49,                              <-- Number of active network connections
#    "total": 574                               <-- Total network connections for the lifetime of the process
#  },
#  "healthChecks": {
#    "count": 10823,                            <--- Number of healthcheck connections we've had from the load balancer
#    "rate": 0.834,                             <--- Instantaneous rate of health check connections per second
#    "rateEma": 1.353                           <--- Exponential moving average of the rate of health check connections
#  },
#  "id": "i-07ff7082e6969e259",                 <--- The ID of the Rallypoint
#  "links": {
#    "clients": {
#      "count": 48                              <--- Number of active client connections
#    },
#    "peers": {
#      "configuredConnectedCount": 1,           <--- Number of peer (mesh) connections the RP has connected that were statically configured
#      "configuredCount": 1,                    <--- Number of peer (mesh) connections the RP is statically configured for
#      "count": 1,                              <--- Number of active peer (mesh) connections
#      "leafConnectedCount": 0,                 <--- Number of connected leaf RP nodes
#      "list": [                                <--- List of peer (mesh) connections
#        {
#          "address": "172.31.11.22:7443",      <--- Address of a peer
#          "id": "i-0aaa5cefaa5cbd870",         <--- ID of a peer
#          "state": 2,                          <--- Connection state of the peer (0=NotLinked, 1=InProgress, 2=LinkedOutbound, 3=LinkedInbound)
#          "type": 1                            <--- Connection type of the peer (1=Core, 2=Leaf)
#        }
#      ]
#    }
#  },
#  "queue": {
#    "avgExecNanos": 0,                         <--- Average operation execution time in nanoseconds
#    "depth": 0,                                <--- Number of operations pending in the queue (should be 0 or close to it)
#    "maxDepth": 172,                           <--- Maximum number of operations that backed up in the queue during the process' lifetime
#    "maxExecNanos": 0,                         <--- Maximum operation execution time in nanoseconds
#    "minExecNanos": 0,                         <--- Minimum operation execution time in nanoseconds (should always be 0)
#    "ops": {
#      "count": 168145,                         <--- Number of operations processed during the process' lifetime
#      "rate": 38.867,                          <--- Instantaneous rate of operations per second
#      "rateEma": 5.035                         <--- Exponential moving average of rate of operations per second
#    },
#    "spuriousWakeups": 16243,                  <--- Number of times the queue woke up with no work to do
#    "wakeUps": 155164                          <--- Total number of times the queue woke up
#  },
#  "routing": {
#    "blobs": {
#      "rx": {
#        "bytes": 41455816,                     <--- Incoming bytes of routed streamed data
#        "packets": 123354                      <--- Incoming packets of routed streamed data
#      },
#      "tx": {
#        "bytes": 1252696896,                   <--- Outgoing bytes of routed streamed data
#        "packets": 3815596                     <--- Outgoing packets of routed streamed data
#      }
#    },
#    "streams": 7                               <--- Number of streams/groups registered for routing
#  },
#  "rx": {
#    "bytes": 43615050,                         <--- Overall incoming bytes of data (routed and otherwise)
#    "packets": 139700,                         <--- Overall incoming packets of data (routed and otherwise)
#    "rate": 59306.667,                         <--- Overall RX byte rate/second
#    "rateEma": 3011.565                        <--- Exponential moving average of overall RX byte rate/second
#  },
#  "ts": 1584653616,                            <--- Time the status file was written (seconds since the Unix epoch)
#  "tx": {
#    "bytes": 1256490445,                       <--- Overall outgoing bytes of data (routed and otherwise)
#    "packets": 3831874,                        <--- Overall outgoing packets of data (routed and otherwise)
#    "rate": 2679872,                           <--- Overall TX byte rate/second
#    "rateEma": 65829.295                       <--- Exponential moving average of overall TX byte rate/second
#  }
#}   
#-------------------------------------------------------------------------   


# This script runs on Amazon Web Services, so we use this magic little curl
# call to retrieve this instance's ID.  This same instance ID is used by the Rallypoint
# to identify itself in our mesh.  For your environment, your Rallypoint ID may be something
# else - like a Kubernetes cluster instance ID, a host name, an IP address, or
# some other unique identifier.
RP_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
if [ "${RP_ID}" == "" ]; then
        echo "ERROR: Cannot determine Rallypoint instance ID"
        exit 1
fi

# Our Rallypoint periodically writes its status to the /tmp directory into a JSON
# file named with the instance ID followed by "_status.json".  The default interval
# is 30 seconds but your configuration may be different.
FN="/tmp/${RP_ID}_status.json"

# We will check the JSON file every so often for changes
UPDATE_CHECK_SECS=10

# Just some internal variables
SHOW_TABLE_HEADER=1
LAST_TS=""

# Load our values
function loadVals()
{
        # Determine how long the process has been running
        UPTIME=`ps axo etime,%cpu,%mem,cmd | grep 'rallypointd -id' | grep -v grep | awk '{split($0,a); print a[1];}'`

        # Parse our JSON into an array of strings
        VALUE_ARRAY=(`jq '.ts,.connections.active,.connections.total,.healthChecks.rate,.links.clients.count,.links.peers.count,.routing.streams,.routing.blobs.rx.packets,.routing.blobs.tx.packets,.queue.depth,.queue.maxDepth,.queue.ops.count,.queue.ops.rate,.queue.ops.rateEma' "${FN}"`)

        # Grab the elements from the array
        TS=${VALUE_ARRAY[0]}
        CONN_ACTIVE=${VALUE_ARRAY[1]}
        CONN_TOTAL=${VALUE_ARRAY[2]}
        CONN_HC_RATE=${VALUE_ARRAY[3]}
        CONN_CLIENTS=${VALUE_ARRAY[4]}
        CONN_PEERS=${VALUE_ARRAY[5]}

        RT_STREAMS=${VALUE_ARRAY[6]}
        RT_RX_BLOBS=${VALUE_ARRAY[7]}
        RT_TX_BLOBS=${VALUE_ARRAY[8]}

        Q_DEPTH=${VALUE_ARRAY[9]}
        Q_MAX_DEPTH=${VALUE_ARRAY[10]}
        Q_OP_COUNT=${VALUE_ARRAY[11]}
        Q_OP_RATE=${VALUE_ARRAY[12]}
        Q_OP_RATE_EMA=${VALUE_ARRAY[13]}
}

# Display our pretty table
function showTable()
{
        # Only show the table header the first time this function is called
        if [ "${SHOW_TABLE_HEADER}" == "1" ]; then
                SHOW_TABLE_HEADER=0
                echo "Rallypoint Status Display for node ${RP_ID}"
                echo "Copyright (c) 2020 Rally Tactical Systems, Inc."

                echo "------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------"
                echo "                     Timestamp       Uptime Actv Conns  Tot Conns    HC Rate    Clients      Peers    Streams   RX Blobs   TX Blobs    Q Depth      Q Max      Q Ops     Q Rate Q Rate EMA"
                echo "------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------"
        fi

        # Make a human readable as-of timestamp
        ASOF=`date -d@${TS}`

        printf "%30s %12s %10d %10d %10.2f %10d %10d %10d %10d %10d %10d %10d %10d %10.2f %10.2f\n" \
                "${ASOF}" \
                "${UPTIME}" \
                "${CONN_ACTIVE}" \
                "${CONN_TOTAL}" \
                "${CONN_HC_RATE}" \
                "${CONN_CLIENTS}" \
                "${CONN_PEERS}" \
                "${RT_STREAMS}" \
                "${RT_RX_BLOBS}" \
                "${RT_TX_BLOBS}" \
                "${Q_DEPTH}" \
                "${Q_MAX_DEPTH}" \
                "${Q_OP_COUNT}" \
                "${Q_OP_RATE}" \
                "${Q_OP_RATE_EMA}"
}

# We'll go round and round here
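while true; do
        # Wait for the Rallypoint to write its status file before trying to read it
        if [ ! -f "${FN}" ]; then
                sleep ${UPDATE_CHECK_SECS}
                continue
        fi

        # Load values from the JSON file
        loadVals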

        # Is the timestamp different from the last time we read it?  If not,
        # there's no need to print a new line in the table
        if [ "${TS}" != "${LAST_TS}" ]; then
                LAST_TS="${TS}"

                # Show the table (actually just one line in the table)
                showTable
        fi

        # Go to sleep for a little while
        sleep ${UPDATE_CHECK_SECS}
done

Example output from this script looks as follows (in this case, Engage clients were connecting and disconnecting frequently, and only two peers were configured in the mesh):

$ /mnt/efs/shared/rallypointd/rpstatus.sh

Rallypoint Status Display for node i-07ff7082e6969e259
Copyright (c) 2020 Rally Tactical Systems, Inc.
------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
                     Timestamp       Uptime Actv Conns  Tot Conns    HC Rate    Clients      Peers    Streams   RX Blobs   TX Blobs    Q Depth      Q Max      Q Ops     Q Rate Q Rate EMA
------------------------------ ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
  Fri Mar 20 02:15:13 UTC 2020     08:08:58         40       1921       0.93         39          1          5     691490   26112909          0        172     847366      73.10       6.91
  Fri Mar 20 02:15:43 UTC 2020     08:09:18         46       1927       0.80         45          1          5     693859   26214356          0        172     849964      86.60       6.91
  Fri Mar 20 02:16:13 UTC 2020     08:09:49          2       1934       0.90          1          1          5     695355   26285940          0        172     851725      58.70       6.91
  Fri Mar 20 02:16:43 UTC 2020     08:10:19         11       1943       0.97         10          1          5     695834   26288993          0        172     852348      20.77       6.92
  Fri Mar 20 02:17:13 UTC 2020     08:10:49         16       1948       0.60         15          1          5     696703   26299841          0        172     853350      33.40       6.92
  Fri Mar 20 02:17:43 UTC 2020     08:11:19         21       1953       0.97         20          1          5     697811   26318936          0        172     854607      41.90       6.92
  Fri Mar 20 02:18:13 UTC 2020     08:11:49         28       1960       0.97         27          1          5     699118   26350193          0        172     856093      49.53       6.92
  Fri Mar 20 02:18:43 UTC 2020     08:12:19         38       1970       0.67         37          1          5     700926   26409198          0        172     858123      67.67       6.93
  Fri Mar 20 02:19:13 UTC 2020     08:12:49         46       1978       1.03         45          1          5     703316   26507774          0        172     860747      87.47       6.93
  Fri Mar 20 02:19:43 UTC 2020     08:13:19          3       1985       0.83          2          1          5     704682   26573009          0        172     862369      54.07       6.94
  Fri Mar 20 02:20:13 UTC 2020     08:13:49         12       1994       0.93         11          1          5     705205   26576656          0        172     863034      22.17       6.94
  Fri Mar 20 02:20:43 UTC 2020     08:14:19         21       2003       0.77         20          1          5     706156   26591777          0        172     864154      37.33       6.94
  Fri Mar 20 02:21:14 UTC 2020     08:14:49         28       2010       0.80         27          1          5     707295   26618490          1        172     865472      43.93       6.94
  Fri Mar 20 02:21:45 UTC 2020     08:15:19         36       2018       1.03         35          1          5     708779   26663941          0        172     867168      56.53       6.95
  Fri Mar 20 02:22:15 UTC 2020     08:15:49         40       2022       0.73         39          1          5     710849   26739301          0        172     869438      75.67       6.95
  Fri Mar 20 02:22:45 UTC 2020     08:16:19         45       2027       0.87         44          1          5     713398   26844522          0        172     872205      92.23       6.95
  Fri Mar 20 02:23:15 UTC 2020     08:16:49          3       2036       0.83          2          1          5     714856   26912651          0        172     873919      57.13       6.96
  Fri Mar 20 02:23:45 UTC 2020     08:17:19          8       2041       0.93          7          1          5     715374   26915291          0        172     874548      20.97       6.96
  Fri Mar 20 02:24:15 UTC 2020     08:17:49         15       2048       0.87         14          1          5     716251   26925560          0        172     875570      34.07       6.96
  Fri Mar 20 02:24:45 UTC 2020     08:18:20         23       2056       0.83         22          1          5     717385   26946167          0        172     876875      43.50       6.96
  Fri Mar 20 02:25:15 UTC 2020     08:18:50         33       2066       1.00         32          1          5     718847   26984451          0        172     878544      55.63       6.97
  Fri Mar 20 02:25:45 UTC 2020     08:19:20         39       2072       0.87         38          1          5     720672   27048113          0        172     880577      67.77       6.97
  Fri Mar 20 02:26:15 UTC 2020     08:19:50         46       2079       0.77         45          1          5     723055   27146906          0        172     883192      87.17       6.98
  Fri Mar 20 02:26:45 UTC 2020     08:20:20          2       2084       1.00          1          1          5     724372   27208731          0        172     884746      51.80       6.98
  Fri Mar 20 02:27:15 UTC 2020     08:20:50         10       2092       0.93          9          1          5     724956   27212535          0        172     885465      23.97       6.98
  Fri Mar 20 02:27:45 UTC 2020     08:21:20         18       2100       0.90         17          1          5     725827   27224674          0        172     886493      34.27       6.99
  Fri Mar 20 02:28:16 UTC 2020     08:21:50         25       2107       0.87         24          1          5     727009   27249273          0        172     887845      45.07       6.99
  Fri Mar 20 02:28:46 UTC 2020     08:22:20         32       2114       0.97         31          1          5     728467   27291089          0        172     889499      55.13       7.00
  Fri Mar 20 02:29:16 UTC 2020     08:22:50         38       2120       0.93         37          1          5     730261   27353807          0        172     891498      66.63       7.01
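The same status file works for automated checks too. Below is a sketch of a one-shot check you could call from a monitoring agent that understands Nagios-style exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). It reads the fields described in the script header above with jq. The function names, the 120-second staleness limit (a few multiples of the 30-second default write interval), and the queue-depth threshold of 10 are our own assumptions; tune them to your environment.

```shell
#!/bin/bash
# Sketch: one-shot Rallypoint health check with Nagios-style exit codes.
# Thresholds and function names are assumptions, not Rallypoint features.

# check_rp_values QUEUE_DEPTH CONFIGURED_PEERS CONNECTED_PEERS MAX_DEPTH
check_rp_values() {
        local depth=$1 configured=$2 connected=$3 max_depth=$4

        # Every statically-configured peer should be linked
        if (( connected < configured )); then
                echo "WARNING: ${connected} of ${configured} configured peers connected"
                return 1
        fi

        # The queue should sit at or near 0; a backlog deserves a look
        if (( depth > max_depth )); then
                echo "WARNING: queue depth ${depth} exceeds ${max_depth}"
                return 1
        fi

        echo "OK: ${connected}/${configured} peers, queue depth ${depth}"
        return 0
}

# check_rp_status STATUS_FILE [MAX_DEPTH] [MAX_AGE_SECS]  (requires jq)
check_rp_status() {
        local fn=$1 max_depth=${2:-10} max_age=${3:-120}
        local out ts depth configured connected now

        if [ ! -f "${fn}" ]; then
                echo "CRITICAL: status file ${fn} not found"
                return 2
        fi

        # Pull the fields we need onto one space-separated line
        out=$(jq -r '[.ts, .queue.depth, .links.peers.configuredCount, .links.peers.configuredConnectedCount] | map(tostring) | join(" ")' "${fn}") || {
                echo "CRITICAL: cannot parse ${fn}"
                return 2
        }
        read -r ts depth configured connected <<< "${out}"

        # A stale file usually means the Rallypoint stopped updating it
        now=$(date +%s)
        if (( now - ts > max_age )); then
                echo "CRITICAL: status file is $(( now - ts )) seconds old"
                return 2
        fi

        check_rp_values "${depth}" "${configured}" "${connected}" "${max_depth}"
}
```

For example, `check_rp_status "/tmp/${RP_ID}_status.json"; echo $?` prints a one-line summary followed by the exit code. Splitting out check_rp_values keeps the threshold logic testable without a live status file.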

Troubleshooting

Connections & Registrations

If your Engage client (or peering Rallypoint) is having trouble reaching a Rallypoint, a quick check of the connection is helpful. Now, you can certainly comb through logs and such to track down the problem but the first thing to do is to see if you can even get to the far end. The simplest way to do this is to open a TLS connection to the Rallypoint in question.

We can do this by using a tool that uses TLS connections such as a web browser or the command-line openssl tool.

For example, let's say we're trying to reach rp.example.com. Unless you've told the RP to listen for incoming connections on a port other than the default, it'll listen on port 7443. So we'll specify 7443 as the port.

Use a web browser

Enter the URL as follows (be sure to specify https to tell the browser to use TLS):

https://rp.example.com:7443

Ideally we'd get something like this (Chrome in this case):

This site can't provide a secure connection

rp.example.com uses an unsupported protocol.

ERR_SSL_VERSION_OR_CIPHER_MISMATCH

Unsupported protocol

The client and server don't support a common SSL protocol version or cipher suite.

This means that the browser resolved the host name (rp.example.com), managed to open a TCP connection to port 7443, and began TLS negotiation. But because the RP is not a web server (it really isn't), the browser isn't going to be able to complete the TLS negotiation. That's OK - we've proven that the RP can be reached.

Use the OpenSSL command-line

openssl s_client -host rp.example.com -port 7443

Here we'd like to see something like this:

CONNECTED(00000006)
depth=0 C = US, ST = Washington, L = Seattle, O = "Rally Tactical Systems, Inc.", OU = "(c) 2019 Rally Tactical Systems, Inc. - For authorized use only", CN = Rallypoint Factory Default Certificate, emailAddress = [email protected]
verify error:num=20:unable to get local issuer certificate
.
.
.

Just like with the browser test, openssl will yell about not being able to verify certificates and such. But that's OK - we're not expecting the TLS connection to actually be set up and finalized. We just want to make sure we can reach the RP.
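If you'd rather script this check (or run it from a machine without a browser), here's a sketch that wraps the openssl command above. It treats the CONNECTED line as proof that the TCP connection was made, then prints the subject, issuer, serial number, SHA-256 fingerprint, and validity dates of the certificate the RP presented, which comes in handy later when chasing certificate problems. It assumes OpenSSL 1.1.1 or later and GNU coreutils' timeout; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: TLS reachability probe for a Rallypoint, plus a look at its certificate.
# Assumes OpenSSL 1.1.1+ and GNU coreutils' timeout.

# rp_tls_probe HOST [PORT] -> 0 if a TCP connection was made, 1 otherwise
rp_tls_probe() {
        local host=$1 port=${2:-7443} out

        # </dev/null makes s_client exit right after the handshake instead of
        # waiting for input; timeout guards against silently dropped packets.
        # s_client may exit non-zero even when it connected, so ignore its status.
        out=$(timeout 10 openssl s_client -connect "${host}:${port}" </dev/null 2>&1) || true

        if ! grep -q '^CONNECTED' <<< "${out}"; then
                echo "Could not connect to ${host}:${port}"
                return 1
        fi

        echo "Reached ${host}:${port}"

        # s_client prints the server certificate in PEM form; x509 skips the
        # surrounding text and shows the fields worth comparing against what
        # your client side will accept
        openssl x509 -noout -subject -issuer -serial -fingerprint -sha256 -dates <<< "${out}" 2>/dev/null
        return 0
}
```

Run it as `rp_tls_probe rp.example.com 7443`. A "Reached" line with no certificate details means the TCP connection worked but the RP did not get as far as presenting a certificate.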

If it didn't work...

If you don't get any joy with the above, check the error the browser or openssl reports. Possible causes include the following.

Process-related

  • The RP may not be running on the machine. Check to see if the rallypointd process is actually up.

  • The RP process may be hung or in some other suspended state. Try restarting the process.
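A quick way to check both of those from a shell on the RP's machine is sketched below, assuming the procps tools (pgrep and ps) are installed and that the process name is rallypointd. A STAT value starting with T means the process has been stopped, and D means it's stuck in an uninterruptible wait; either is a good reason for the restart suggested above. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: is rallypointd running, and in what state? Assumes procps (pgrep, ps).

# rp_process_check -> 0 if at least one rallypointd process exists, 1 otherwise
rp_process_check() {
        local pids pid

        # -x requires an exact match on the process name
        pids=$(pgrep -x rallypointd) || {
                echo "rallypointd is not running"
                return 1
        }

        for pid in ${pids}; do
                # STAT: R/S = running/sleeping (normal), T = stopped,
                # D = uninterruptible wait, Z = zombie
                ps -o pid=,stat=,etime=,args= -p "${pid}"
        done
        return 0
}
```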

Network-related

  • DNS may not be resolving the host name. Try using the host's IP address.

  • The machine where the RP resides may be blocking incoming TCP connections to port 7443. Open that port on the RP's firewall.

  • The TCP pathway between your machine and the RP may be blocking TCP and/or port 7443 traffic. Check things like routers, switches, VLANs, proxies, and the like. (Yeah, that's a pain, and it's often the daily experience of network admins everywhere. So give them all a thought while you're at it.)
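To tell those network causes apart, you can test name resolution and a plain TCP connection separately. The sketch below uses getent (glibc), GNU timeout, and bash's built-in /dev/tcp redirection. It does no TLS at all, so success only means the port is reachable. The function name and the 5-second timeout are our own choices.

```shell
#!/bin/bash
# Sketch: split "can't reach the RP" into a DNS problem or a TCP problem.
# Assumes glibc's getent, GNU coreutils' timeout, and bash's /dev/tcp support.

# rp_net_check HOST [PORT] -> 0 = reachable, 1 = DNS failure, 2 = TCP failure
rp_net_check() {
        local host=$1 port=${2:-7443}
        local ipv4_re='^[0-9]+(\.[0-9]+){3}$'

        # Skip the DNS step when we were handed an IPv4 address
        if [[ ! ${host} =~ ${ipv4_re} ]]; then
                if ! getent hosts "${host}" >/dev/null; then
                        echo "DNS: cannot resolve ${host}; try the RP's IP address instead"
                        return 1
                fi
        fi

        # Open a TCP connection in a child bash (bash treats /dev/tcp/HOST/PORT
        # specially in redirections); the connection closes when the child exits
        if ! timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "${host}" "${port}" 2>/dev/null; then
                echo "TCP: cannot connect to ${host}:${port}; check the RP's firewall and the network path"
                return 2
        fi

        echo "OK: TCP connection to ${host}:${port} succeeded"
        return 0
}
```

Running `rp_net_check rp.example.com 7443` from the client's machine, and again from a machine on the RP's own network, helps narrow down where along the path the traffic is being blocked.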

That worked but there's still no connection

If your client (or peering RP) still can't connect after all this, then we need to delve a little deeper.

By "can't connect" we mean that a TCP connection is established but drops almost right away.

  • There may be a security issue such as the RP not accepting the connecting entity's X.509 certificate, or the connecting entity not accepting the RP's X.509 certificate. Check the logs on both sides to see what gives. Also, using that openssl example from above, have a look at the certificate the RP is presenting. You should know what your client-side will accept, so check out that RP's certificate.

  • The RP may not be allowing connections because it's experiencing a load beyond the limits that it's been given. Check the limits section of your rallypointd_conf.json file as well as the RP's logs and status file.

  • Run a Wireshark or tcpdump capture to see at what point the link is dropped.
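For the capture, it often helps to look only at connection setup and teardown first, so you can see which side sends the FIN or RST and how soon after the handshake it happens. The sketch below assumes tcpdump is installed and that you have the privileges to run it (usually root); the function names are hypothetical, and the "any" interface default is Linux-specific. Passing a file name switches to a full capture of the port that you can open in Wireshark later.

```shell
#!/bin/bash
# Sketch: capture Rallypoint connection setup/teardown with tcpdump.
# Assumes tcpdump is installed and run with sufficient privileges.

# rp_capture_filter [PORT] -> pcap filter matching only SYN, FIN, and RST packets
rp_capture_filter() {
        local port=${1:-7443}
        echo "tcp port ${port} and (tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0)"
}

# rp_capture [PORT] [INTERFACE] [OUTFILE]
rp_capture() {
        local port=${1:-7443} iface=${2:-any} outfile=${3:-}

        if [ -n "${outfile}" ]; then
                # Full capture of the port for later analysis in Wireshark
                tcpdump -i "${iface}" -nn -w "${outfile}" "tcp port ${port}"
        else
                # Console view: one line per SYN, FIN, or RST; -nn skips name/port lookups
                tcpdump -i "${iface}" -nn "$(rp_capture_filter "${port}")"
        fi
}
```

For instance, run `rp_capture 7443 eth0` on the RP host while the client retries, then look for the first FIN or RST and which address sent it.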

The RP link does come up but some (or all) groups don't connect

Alright, we've confirmed that we can actually reach the RP (networking is good!). We've also confirmed that we can connect to the RP (the link stays up!). This is good.

But some (or maybe even all) of our groups are not getting connected.

There are two likely reasons for this.

  • Registration of a group may exceed the limits that have been set by the RP's administrator. Check the limits section of your rallypointd_conf.json file as well as the RP's logs and status file.

  • The RP may have had restrictions placed on what group identifiers can be registered. Check the groupRestrictions and extendedGroupRestrictions sections of your rallypointd_conf.json file.
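Here's a sketch for dumping those sections with jq. It assumes limits, groupRestrictions, and extendedGroupRestrictions are top-level keys in rallypointd_conf.json and that jq is installed. The default path is a placeholder, so pass the location of your actual configuration file; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: show the rallypointd_conf.json sections that govern limits and
# group restrictions. Assumes top-level keys and that jq is installed.

# rp_show_conf_sections [CONF_FILE]
rp_show_conf_sections() {
        # Placeholder default path; point this at your real configuration
        local conf=${1:-./rallypointd_conf.json}

        if [ ! -f "${conf}" ]; then
                echo "Configuration file ${conf} not found"
                return 1
        fi

        # `// "not set"` prints a marker instead of null for absent sections
        jq '{limits: (.limits // "not set"), groupRestrictions: (.groupRestrictions // "not set"), extendedGroupRestrictions: (.extendedGroupRestrictions // "not set")}' "${conf}"
}
```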

โš ๏ธ **GitHub.com Fallback** โš ๏ธ