How to set up a Grafana instance to monitor multiple IBM Storage Scale clusters running in a cloud or mixed environment
It is possible to manage the connection configuration for multiple IBM Storage Scale clusters (data sources) in Grafana by adding a YAML config file to the provisioning/datasources directory. See Provision an IBM Storage Scale data source for details on managing the connection configuration of multiple IBM Storage Scale clusters running on bare metal in Grafana.
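For the file-based approach, a minimal provisioning file could look like the following sketch; the data source names, bridge host names, and port are placeholders that must be adapted to your environment, and the jsonData settings mirror the data source settings used later on this page:

apiVersion: 1
datasources:
  # first IBM Storage Scale cluster (name and URL are examples)
  - name: scale-cluster-1
    type: opentsdb
    access: proxy
    url: https://scale-cluster-1-bridge.example.com:8443
    jsonData:
      tsdbVersion: '2.3'
      timeInterval: 5s
  # second IBM Storage Scale cluster (name and URL are examples)
  - name: scale-cluster-2
    type: opentsdb
    access: proxy
    url: https://scale-cluster-2-bridge.example.com:8443
    jsonData:
      tsdbVersion: '2.3'
      timeInterval: 5s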
In a cloud environment, data source provisioning can be performed by deploying the GrafanaDatasource CR maintained by the Red Hat Grafana Operator. Starting with version 5, the Red Hat Grafana Operator supports the management of cross-namespace data source instances. This feature makes it possible to monitor multiple systems running in a cloud environment with a single Grafana instance. One example of using this feature is the CNSA AFM regional DR setup.
/images/Openshift/multiple_ocps_management.png
The same configuration options can be used for a mixed environment.
/images/Openshift/mixed_environment_management.png
The placement and number of managed Grafana instances in each particular environment depend on the business strategy:
1. Centralized or distributed monitoring
2. Grafana instance running in a container environment or running externally
Connecting a grafana-bridge running in a remote OpenShift cluster to a Grafana instance running outside this cluster
Let's assume that the OpenShift cluster where the Grafana instance will be deployed and running is called "local-ocp". The other OpenShift cluster, where the remote grafana-bridge instance is already deployed and running, is called "remote-ocp".
We start with the grafana-bridge running in remote-ocp. A Route to the grafana-bridge service must be deployed to enable external access to the grafana-bridge.
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: grafanabridge
  namespace: ibm-spectrum-scale
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale
    app.kubernetes.io/name: grafanabridge
  annotations:
    openshift.io/balance: source
spec:
  to:
    kind: Service
    name: ibm-spectrum-scale-grafana-bridge
    weight: 100
  port:
    targetPort: https
  tls:
    termination: passthrough
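For example, assuming the manifest above is saved as grafanabridge-route.yaml (the file name is an example), it can be applied and the externally reachable host name can be read back as follows; the host, prefixed with https://, is the route URL referenced later in the GrafanaDatasource:

# oc apply -f grafanabridge-route.yaml -n ibm-spectrum-scale
# oc get route grafanabridge -n ibm-spectrum-scale -o jsonpath='{.spec.host}'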
In the local-ocp cluster, deploy a Grafana kind resource if not already done.
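A minimal Grafana resource for Grafana Operator v5 could look like the sketch below; the resource name, namespace, admin credentials, and the dashboards: my-dashboards label (matched by the instanceSelector of the GrafanaDatasource further down) are assumptions to be adapted:

apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana-for-cnsa
  namespace: grafana-for-cnsa
  labels:
    # label matched by the instanceSelector of the GrafanaDatasource resources
    dashboards: my-dashboards
spec:
  config:
    security:
      # example credentials, replace with your own
      admin_user: admin
      admin_password: secret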
The Grafana instance requires SSL connection data to communicate with a grafana-bridge. With Grafana Operator v5 it is possible to apply the SSL/TLS connection data dynamically from a TLS secret when the data source connection is created.
In the local-ocp cluster, create a Secret for storing the SSL certificate and key of the grafana-bridge running in the remote-ocp cluster.
apiVersion: v1
data:
  tls.crt: ''
  tls.key: ''
kind: Secret
metadata:
  name: grafana-bridge-tls-cert-remote
type: kubernetes.io/tls
From "remote-ocp" copy and temporarily store the grafana-bridge SSL connection data available in the ibm-spectrum-scale-grafana-bridge-service-cert secret.
TLS_CERT=$(oc get secret ibm-spectrum-scale-grafana-bridge-service-cert -n ibm-spectrum-scale -o json | jq -r '.data["tls.crt"]')
TLS_KEY=$(oc get secret ibm-spectrum-scale-grafana-bridge-service-cert -n ibm-spectrum-scale -o json | jq -r '.data["tls.key"]')
Update the grafana-bridge-tls-cert-remote secret with the contents of the TLS_CERT and TLS_KEY variables.
# oc get secrets grafana-bridge-tls-cert-remote -n $NAMESPACE -o json | jq ".data[\"tls.key\"] |= \"$TLS_KEY\"" | jq ".data[\"tls.crt\"] |= \"$TLS_CERT\""| oc apply -f -
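Optionally, verify that the copied certificate decodes correctly, for example:

# oc get secret grafana-bridge-tls-cert-remote -n $NAMESPACE -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -enddate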
In the "local-ocp", create a GrafanaDatasource object that references the grafana-bridge-tls-cert-remote secret and the grafana-bridge route URL.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: bridge-grafanadatasource-remote
spec:
  valuesFrom:
    - targetPath: "secureJsonData.tlsClientCert"
      valueFrom:
        secretKeyRef:
          key: "tls.crt"
          name: "grafana-bridge-tls-cert-remote"
    - targetPath: "secureJsonData.tlsClientKey"
      valueFrom:
        secretKeyRef:
          key: "tls.key"
          name: "grafana-bridge-tls-cert-remote"
  datasource:
    access: proxy
    editable: true
    isDefault: true
    jsonData:
      httpHeaderName1: Authorization
      timeInterval: 5s
      tlsAuth: true
      tlsSkipVerify: true
      tsdbVersion: '2.3'
    name: grafana-bridge-remote
    secureJsonData:
      tlsClientCert: ${tls.crt}
      tlsClientKey: ${tls.key}
    type: opentsdb
    url: < grafana-bridge service route url >
  instanceSelector:
    matchLabels:
      dashboards: my-dashboards
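Assuming the manifest is saved as bridge-grafanadatasource-remote.yaml (the file name is an example), apply it in the namespace where the Grafana instance is running:

# oc apply -f bridge-grafanadatasource-remote.yaml -n grafana-for-cnsa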
Verify the status of the freshly deployed GrafanaDatasource instance.
# oc get GrafanaDataSource bridge-grafanadatasource-remote -n grafana-for-cnsa -o json | jq '.status'
{
"hash": "a209926ee6db83847d16e0ac08fcf71542c298064a5575ca0d25a519d7d2900d",
"lastResync": "2023-11-20T19:18:53Z",
"uid": "504a1eed-a6c0-4a7a-b4ec-fc650d4fd4a4"
}
Monitoring Performance Data from multiple CNSA clusters
To monitor performance data from multiple CNSA clusters, deploy a Grafana instance and a GrafanaDatasource resource on the OpenShift cluster running CNSA ("local-ocp").
On the other OpenShift cluster running CNSA, deploy the grafana-bridge route. Once that is done, switch back to the "local-ocp" and create a TLS data secret and a GrafanaDatasource instance pointing to the grafana-bridge route of the "remote-ocp". For more details, please check the section above.
Verify both GrafanaDatasource instances are up and running.
# oc get GrafanaDatasource -n grafana-for-cnsa
NAME                              NO MATCHING INSTANCES   LAST RESYNC   AGE
bridge-grafanadatasource                                  62s           52d
bridge-grafanadatasource-remote                           62s           50d
Now, you should be able to see both data sources in Grafana web explorer.
/images/Openshift/openshift_multiple_bridge_conn.png
Next, you can provision a GrafanaDashboard kind instance from the example dashboards or create your own.
Performance Monitoring of an AFM RegionalDR setup configured over 2 CNSA clusters (primary and secondary side)
RegionalDR provides asynchronous replication of PVs between two CNSA clusters:
- A primary (cache) cluster that hosts the application
- A secondary (home) cluster that is a passive standby for the application
Data is transferred between the primary and the secondary via the NFSv4.1 protocol. The secondary site runs one or more NFS servers that export the independent filesets representing the replication targets. The gateway nodes on the primary site mount those exports and write the data to them.
The end-to-end AFM RegionalDR performance monitoring requires metrics observation on both sides, primary and secondary. The health of the AFM filesets is mapped to the AsyncReplication CR Healthy condition on the primary side. The health of the NFS servers is mapped to the RegionalDR CR Healthy condition on the secondary side.
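As a rough sketch, assuming both CRs expose a standard conditions list in their status, the Healthy conditions could be queried with oc; the resource kinds and namespace below are taken from this setup but may need to be adapted. The first command is meant for the primary side, the second for the secondary side:

# oc get asyncreplication -n ibm-spectrum-scale -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.conditions[?(@.type=="Healthy")].status}{"\n"}{end}'
# oc get regionaldr -n ibm-spectrum-scale -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.conditions[?(@.type=="Healthy")].status}{"\n"}{end}'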
Provision the "cnsa-afm-over-nfs-dashboard" GrafanaDashboard:
# oc apply -f https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-bridge-for-grafana/master/examples/openshift_deployment_scripts/examples_for_grafana-operator_v5/provision_dashboard/cnsa-afm-over-nfs-dashboard-v5.yml -n grafana-for-cnsa
grafanadashboard.grafana.integreatly.org/cnsa-afm-nfs-dashboard created
# oc get GrafanaDashboard -n grafana-for-cnsa
NAME                     NO MATCHING INSTANCES   LAST RESYNC   AGE
cnsa-afm-nfs-dashboard                           23s           23s
Open the "cnsa-afm-over-nfs-dashboard" from dashboards view in Grafana web explorer.
/images/Openshift/afm_in_dashboards_view.png
You should be able to select your primary side CNSA as cacheCluster source, and secondary side CNSA as homeCluster source.
/images/Openshift/cnsa_afm_over_nfs_collapsed.png
In the CACHE CLUSTER section you can check the total number of bytes written to the remote system as a result of cache updates, the number of messages that are currently enqueued, or the memory in bytes used by the enqueued messages.
In parallel, you can check NFS throughput/IO Rate in the HOME CLUSTER section.