Addon Framework Operator - stolostron/search-v2-operator GitHub Wiki

Certificate Authorization in Operator Addon Framework

Addon framework part of the Search operator

When the search-operator pod runs, it also starts the addon (the addon-framework part), which is responsible for approving the CertificateSigningRequests (CSRs) of the search-collectors on the managed clusters.

https://github.com/stolostron/search-v2-operator/blob/3d9731858bc288564a506abcb868648d49dcd145/addon/addon.go#L260C12-L260C33

When the addon starts, the operator logs the following message:
2024/06/27 12:01:38 [INFO] Starting Search Addon

The search collector pod (klusterlet-addon-search-xxxx-xxxx) runs in the open-cluster-management-agent-addon namespace on the managed cluster and uses the search-collector-hub-kubeconfig secret in that namespace to connect to the hub. To check the certificate's validity dates (notBefore and notAfter), run the following on the managed cluster:
kubectl get secret search-collector-hub-kubeconfig -n open-cluster-management-agent-addon -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates
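The check above can be wrapped in a small helper that answers "is the certificate still valid?" directly. This is a sketch, not part of the search-operator: cert_valid_for and check_collector_cert are hypothetical names, and the check relies on openssl x509 -checkend, which exits non-zero if the certificate expires within the given number of seconds. Run check_collector_cert with a kubeconfig pointing at the managed cluster.

```shell
# Namespace of the search collector on the managed cluster (from this page).
COLLECTOR_NS="open-cluster-management-agent-addon"

# cert_valid_for SECONDS
# Reads a PEM certificate on stdin. Returns 0 if it will still be valid
# SECONDS from now, non-zero otherwise (openssl x509 -checkend).
cert_valid_for() {
  openssl x509 -noout -checkend "$1" >/dev/null
}

# check_collector_cert [SECONDS]
# Hypothetical helper, run against the managed cluster. Defaults to 0,
# i.e. "has the certificate already expired?".
check_collector_cert() {
  local window="${1:-0}"
  local b64
  # Read the base64-encoded certificate first so a kubectl failure is not
  # misreported as an expired certificate.
  b64=$(kubectl get secret search-collector-hub-kubeconfig -n "$COLLECTOR_NS" \
        -o jsonpath='{.data.tls\.crt}') || return 2
  if printf '%s' "$b64" | base64 --decode | cert_valid_for "$window"; then
    echo "certificate valid for at least ${window}s"
  else
    echo "certificate expired or expiring within ${window}s"
    return 1
  fi
}
```

For example, check_collector_cert 86400 reports whether the certificate will still be valid one day from now, which gives some warning before the collector starts failing.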

Managed-cluster-to-hub connection certificate expiration

When the certificate expires, the klusterlet-agent on the managed cluster creates a CertificateSigningRequest (CSR) named addon-xxx-search-collector-xxxx on the hub to request a new hub kubeconfig. The search controller (the addon-framework part) approves the CSR on the hub cluster and logs:
I0627 16:55:45.237106 1 csr_helpers.go:174] CSR approved
After the CSR is approved, the klusterlet-agent regenerates the secret with a new certificate.
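To see whether a renewal request is stuck waiting for approval, the CSRs can be listed on the hub. This is a sketch under assumptions: the function names are hypothetical, and it treats a CSR with no conditions as pending (neither Approved nor Denied), which kubectl custom-columns output shows as <none>.

```shell
# pending_search_csrs
# Reads "NAME CONDITIONS" rows on stdin and prints the names of
# search-collector CSRs whose conditions column is <none> (still pending).
pending_search_csrs() {
  awk '$1 ~ /search-collector/ && $2 == "<none>" { print $1 }'
}

# list_pending_search_csrs
# Hypothetical helper, run against the hub. CSRs are cluster-scoped,
# so no namespace is needed.
list_pending_search_csrs() {
  kubectl get csr --no-headers \
    -o custom-columns='NAME:.metadata.name,CONDITIONS:.status.conditions[*].type' |
    pending_search_csrs
}
```

A search-collector CSR that stays pending may indicate that the addon part of the search-operator is not running, since that is the component that approves these requests.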

If certificate rotation fails, the error is captured by the ManagedClusterAddOn in the managed cluster's namespace on the hub:

kind: ManagedClusterAddOn
metadata:
  name: search-collector
  namespace: <managed cluster name>

The failure is recorded in the ClusterCertificateRotated condition of the ManagedClusterAddOn status:

status:
  addOnConfiguration: {}
  addOnMeta: {}
  conditions:
  - lastTransitionTime: "2023-12-01T23:08:05Z"
    message: Failed to rotated client certificate unable to get csr "addon-xxxx-search-collector-xxxx".
      It might have already been deleted
    reason: ClientCertificateUpdateFailed
    status: "False"
    type: ClusterCertificateRotated
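To check this condition across all managed clusters at once, the following sketch queries every ManagedClusterAddOn named search-collector on the hub and prints the clusters where ClusterCertificateRotated is False. The function names are hypothetical; the field selector on metadata.name and the JSONPath filter expression are standard kubectl features.

```shell
# failed_rotations
# Reads "NAMESPACE<TAB>STATUS" rows on stdin and prints the namespaces
# (managed cluster names) whose ClusterCertificateRotated status is False.
failed_rotations() {
  awk -F '\t' '$2 == "False" { print $1 }'
}

# list_failed_rotations
# Hypothetical helper, run against the hub. Each search-collector
# ManagedClusterAddOn lives in its managed cluster's namespace.
list_failed_rotations() {
  kubectl get managedclusteraddon -A \
    --field-selector metadata.name=search-collector \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.status.conditions[?(@.type=="ClusterCertificateRotated")].status}{"\n"}{end}' |
    failed_rotations
}
```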

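Here, the ClusterCertificateRotated condition has status "False" with reason ClientCertificateUpdateFailed, which indicates that the certificate rotation failed.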

As a result, the search-collector on the managed cluster cannot connect to the hub, and its logs show errors such as:

2024-06-21T10:13:38.173260502Z E0621 10:13:38.173247       1 sender.go:256] Sync sender error. 401 Unauthorized
2024-06-21T10:13:38.173260502Z E0621 10:13:38.173255       1 sender.go:308] SEND ERROR: 401 Unauthorized
2024-06-21T10:13:38.173302350Z W0621 10:13:38.173265       1 sender.go:321] Error during last sync. Resending in 10m0s.
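A quick way to tell whether a collector is affected is to count the 401 send errors in its recent logs. This sketch is hypothetical: count_unauthorized matches the SEND ERROR line shown above, collector_401s takes the collector pod name as its first argument, and the one-hour default window is an arbitrary choice.

```shell
# count_unauthorized
# Reads collector log lines on stdin and prints how many
# "SEND ERROR: 401 Unauthorized" lines they contain (0 if none).
count_unauthorized() {
  # grep -c prints 0 but exits 1 when nothing matches; keep status 0.
  grep -c 'SEND ERROR: 401 Unauthorized' || true
}

# collector_401s POD [SINCE]
# Hypothetical helper, run against the managed cluster.
collector_401s() {
  kubectl logs -n open-cluster-management-agent-addon "$1" --since="${2:-1h}" |
    count_unauthorized
}
```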

Because the managed cluster cannot send updates to the hub, Search results for that cluster show stale data.

Solution:

If the Search CR is paused, it is recommended to remove the pause by deleting the search-pause annotation:

oc annotate search search-v2-operator search-pause-

Once the pause is removed, check that the search-operator logs contain the message Starting Search Addon. After confirming that the addon part of the search-operator has started, delete the search-collector-hub-kubeconfig secret on the managed cluster; this should cause the certificate to be regenerated. The collector should then reload the new certificate and resend the latest payload.
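The steps above can be sketched as shell functions. Assumptions not stated on this page: the Search CR and the search-operator pod live in the namespace held in SEARCH_NS (open-cluster-management is assumed as the default), you look up and pass the search-operator pod name yourself, and each function is run with a kubeconfig pointing at the cluster named in its comment. All function names are hypothetical.

```shell
# Assumed namespace of the Search CR and the search-operator (hub side).
SEARCH_NS="${SEARCH_NS:-open-cluster-management}"
# Namespace of the search collector on the managed cluster (from this page).
COLLECTOR_NS="open-cluster-management-agent-addon"

# addon_started: reads search-operator logs on stdin and succeeds if the
# addon part has logged its startup message.
addon_started() {
  grep -q 'Starting Search Addon'
}

# Step 1 (hub): remove the search-pause annotation from the Search CR.
unpause_search() {
  kubectl -n "$SEARCH_NS" annotate search search-v2-operator search-pause-
}

# Step 2 (hub): confirm the addon started. $1 is the search-operator pod name.
check_addon_started() {
  if kubectl -n "$SEARCH_NS" logs "$1" | addon_started; then
    echo "addon started"
  else
    echo "addon not started yet; do not delete the secret" >&2
    return 1
  fi
}

# Step 3 (managed cluster): delete the hub kubeconfig secret so that the
# certificate is regenerated.
reset_collector_cert() {
  kubectl -n "$COLLECTOR_NS" delete secret search-collector-hub-kubeconfig
}
```

Keeping step 3 separate from step 2 mirrors the order on this page: the secret is only deleted after the addon is confirmed to be running, since the addon is what approves the new CSR.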

Once the certificate is regenerated, check the addon-search-xxx-xxx pod logs to confirm that the collector connects to the hub successfully. A successful sync logs a line such as:

sender.go:195] Sending Resources { request: 136247, add:  0, update:  3, delete:  0, edge add:  0, edge delete:  0 }
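To confirm recovery without reading the whole log, a sketch like the following reports whether the most recent sender line is a Sending Resources line or a 401 send error. It assumes, as the log examples on this page suggest, that a failed send is followed by a SEND ERROR line; last_sync_result is a hypothetical name.

```shell
# last_sync_result
# Reads collector logs on stdin and prints "ok" if the latest matching
# sender line is "Sending Resources", "unauthorized" if it is a 401 send
# error, or "unknown" if neither appears.
# Example (managed cluster):
#   kubectl logs -n open-cluster-management-agent-addon <collector pod> --tail=200 | last_sync_result
last_sync_result() {
  awk '
    /sender\.go.*Sending Resources/ { last = "ok" }
    /sender\.go.*SEND ERROR: 401/   { last = "unauthorized" }
    END { print (last == "" ? "unknown" : last) }
  '
}
```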