Notes:Mat:2024‐09‐es‐grafana‐upgrade - Safecast/reporting GitHub Wiki

Latest is 11.2.0; Rob upgraded to 11.1.1 in https://github.com/Safecast/reporting/blob/main/Dockerfile.grafana#L1

https://github.com/grafana-toolbox/panodata-map-panel is no longer maintained and has only been tested up to Grafana 9. We may need to switch panels or fork panodata to support Grafana 11.

Unclear whether the new ES is connected. I can't find the data source configuration in the new Grafana.

It looks like the new Grafana no longer recognizes us as org admins. I can still get into the old admin account, so I'm using that to update the ES connection.

Opened https://github.com/Safecast/reporting/pull/13 for dev updates.

The new Grafana requires the origin to be passed through on POST requests, so I added that to the nginx config.

Also per https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/configure-authentication/github/#map-roles-using-github-teams it looks like the admin assignment syntax has changed a bit. Hopefully this fixes admin access.
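For reference, the team-based mapping on the linked docs page chains JMESPath clauses of the form `contains(groups[*], '@org/team') && 'Role' || 'Fallback'` in `role_attribute_path`. As a minimal sketch of how such a chain resolves, the Python below mimics its first-match-wins evaluation and builds the matching expression string. The team slugs are hypothetical, and the `@org/team` group format is my reading of the docs, not something verified against our config.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical team slugs; the real ones belong in the Grafana GitHub OAuth config.
ROLE_RULES: Sequence[Tuple[str, str]] = (
    ("@Safecast/grafana-admins", "Admin"),
    ("@Safecast/grafana-editors", "Editor"),
)
DEFAULT_ROLE = "Viewer"


def resolve_role(groups: Iterable[str],
                 rules: Sequence[Tuple[str, str]] = ROLE_RULES,
                 default: str = DEFAULT_ROLE) -> str:
    """Mimic `contains(groups[*], team) && 'Role' || ...`: rules are checked
    in order and the first team the user belongs to decides the role."""
    member_of = set(groups)
    for team, role in rules:
        if team in member_of:
            return role
    return default


def role_attribute_path(rules: Sequence[Tuple[str, str]] = ROLE_RULES,
                        default: str = DEFAULT_ROLE) -> str:
    """Build the equivalent role_attribute_path expression from the same rules."""
    clauses = [f"contains(groups[*], '{team}') && '{role}'" for team, role in rules]
    return " || ".join(clauses + [f"'{default}'"])
```

Keeping the rules in one ordered list means the precedence (admin before editor) is explicit, which is the part that is easy to get wrong when hand-writing the `&&`/`||` chain.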

Comment from Rob (2024-09-23): I did not update Grafana; I tried many ways but was not successful. Eito-san did the upgrade. I propose we switch to the Geomap panel, which is maintained. It works fine for me.

Sept 30

Switching focus to ingest and getting data into ES.

It looks like the platform on the ingest workers may have been upgraded to Ruby 3.2 running on 64bit Amazon Linux 2023/4.0.12. The instance has been rebooted a couple of times but still seems to be offline, so I'm terminating it to get a fresh startup and see what might be missing.

Looks like this is failing:

```
2024/09/30 05:37:28.954951 [INFO] Running command: /opt/aws/bin/cfn-init -s arn:aws:cloudformation:us-west-2:985752656544:stack/awseb-e-sss3qhsn6h-stack/a863e520-ffd5-11ee-b62a-028434d2a9b3 -r AWSEBAutoScalingGroup --region us-west-2 --configsets Infra-EmbeddedPreBuild
2024/09/30 05:37:30.189094 [INFO] Error occurred during build: Command parameters_cache failed
```

Opened https://github.com/Safecast/ingest/pull/120 to fix this, but it failed on build. CircleCI had lost access to the repo, apparently because the deploy key had been removed, so I re-added the key to the repo.

The updated env is also using Ruby 3.2.2, so we might need to upgrade ingest to handle that as well.

Next, I hit this:

```
Could not find a version that satisfies the requirement botocore<1.36.0,>=1.35.0 (from awsebcli)
```

The pip in the container won't install a botocore version that awsebcli accepts: the CircleCI image for Ruby 2.6.6 only ships Python 3.6, which botocore dropped support for last year. I'll see if I can upgrade ingest to a recent Ruby and maybe move the build to GitHub Actions.
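To illustrate why pip gives up instead of falling back to an older botocore, here is a minimal sketch of the filtering it effectively does: releases whose declared minimum Python is newer than the running interpreter are skipped, and whatever remains must satisfy `botocore<1.36.0,>=1.35.0`. This is a simplified stand-in for pip's resolver and the `packaging` library (plain numeric versions only), and the candidate list is hypothetical, not real botocore release metadata.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a plain numeric version like '1.35.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.strip().split("."))


_OPS: Dict[str, Callable[[Version, Version], bool]] = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def satisfies(version: str, spec: str) -> bool:
    """True if `version` meets every comma-separated clause in `spec`."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators must be tried before '<' and '>'.
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                if not _OPS[op](v, parse_version(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def pick(candidates: Sequence[Tuple[str, Tuple[int, int]]], spec: str,
         python: Tuple[int, int]) -> Optional[str]:
    """Newest candidate matching `spec` whose minimum Python is <= `python`.

    Releases the interpreter cannot run are skipped entirely, which is why
    pip reports 'Could not find a version that satisfies the requirement'.
    """
    usable = [ver for ver, min_python in candidates
              if python >= min_python and satisfies(ver, spec)]
    return max(usable, key=parse_version) if usable else None


# Hypothetical (version, minimum Python) pairs, for illustration only.
CANDIDATES = [("1.26.10", (3, 6)), ("1.35.0", (3, 8)), ("1.35.5", (3, 8))]
```

With these made-up candidates, `pick(CANDIDATES, "<1.36.0,>=1.35.0", (3, 6))` finds nothing, while the same call with `(3, 8)` returns `"1.35.5"`, which matches the experience of getting unstuck once Python 3.8 was on the image.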

Got Python 3.8 working on the Ruby image, so maybe we don't need to upgrade. Next up is a mismatch between the latest ebcli and package.py. Might need to try https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb3-appversion.html

Oct 16

Still need to fix the ingest build. Will work on an alternative build process using an updated ebcli.

I have a command in https://github.com/Safecast/ingest/pull/120 that should work, but CircleCI's IAM credentials don't have the right access. Updating operations/terraform for that, but the locked Terraform version isn't available on Apple Silicon, so I need to update Terraform as well.

Opened, applied, and merged https://github.com/Safecast/infrastructure/pull/45 to update Terraform to the latest version. This allowed creating the app version, but I can't deploy it to the new env because of a Ruby version mismatch, so the ingest app also needs a Ruby update.

I added a Ruby version bump to https://github.com/Safecast/ingest/pull/120, but so far the tests won't run locally, so there's more work to do to make the app compatible with the latest available Ruby platform on Beanstalk.

Nov 29

The dev Postgres for ingest is still on 9, but safecastingest-prd is on 11.22, so we'll need a new dev image to upgrade ingest. The RDS instance is still running PostGIS 2.5.1, which doesn't appear to be available in the postgres:11 image. Will try what's available from apt.

```
[ec2-user@ip-172-31-12-53 ~]$ psql
psql (11.5, server 11.22)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

safecast=> \dx
                                     List of installed extensions
  Name   | Version |   Schema   |                             Description
---------+---------+------------+---------------------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgis | 2.5.1   | public     | PostGIS geometry, geography, and raster spatial types and functions
```

Per https://serverfault.com/a/1130167, because the postgres:11 image is based on Debian 9, we currently can't build a dev container for 11 with PostGIS. I'll try the latest Postgres versions; if that works, I can use this as a chance to also update the DB.
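A small check like the one below could flag major-version drift between the dev image and RDS (dev on 9 vs. prd on 11.22) before a build gets that far. It is a minimal sketch: `pg_major`, `banner_majors` and `same_major` are hypothetical helpers, and the banner regex assumes the English-locale `psql (client, server X)` format. Before PostgreSQL 10 the major version was the first two numbers (9.6); from 10 onward it is just the first (11, 16).

```python
import re
from typing import Tuple

Major = Tuple[int, ...]


def pg_major(version: str) -> Major:
    """Major version of a PostgreSQL release string.

    From 10 onward the major version is the first number (11.22 -> (11,));
    before 10 it was the first two numbers (9.6.24 -> (9, 6)).
    """
    parts = tuple(int(p) for p in version.split("."))
    return parts[:1] if parts[0] >= 10 else parts[:2]


# English-locale psql banner; psql only prints "server X" when the
# server version differs from the client's.
_BANNER = re.compile(r"psql \((?P<client>[\d.]+)(?:, server (?P<server>[\d.]+))?")


def banner_majors(banner: str) -> Tuple[Major, Major]:
    """Return (client_major, server_major) parsed from a psql banner."""
    m = _BANNER.search(banner)
    if m is None:
        raise ValueError(f"unrecognized psql banner: {banner!r}")
    client = m.group("client")
    server = m.group("server") or client
    return pg_major(client), pg_major(server)


def same_major(a: str, b: str) -> bool:
    """True when two version strings share a major version."""
    return pg_major(a) == pg_major(b)
```

The psql session above would parse to `((11,), (11,))`: a minor-version gap between client and server, which is harmless, unlike the 9 vs. 11 gap between dev and prd.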

Made some progress on 16.1, which is available on RDS. If I can get everything passing, I'll try deploying, and if needed I'll also upgrade RDS.

Nov 30

Build passed https://app.circleci.com/pipelines/github/Safecast/ingest/225/workflows/6e5680a1-80a9-4e06-9bda-5725a5a842f0/jobs/235

Attempting to build a new worker env.

Dec 1

The client libs on EB were still trying to install 11, so I'm updating those to 16.1 as well.

```
2024/11/30 21:48:38.863122 [INFO] CommandService Response: {"status":"SUCCESS","api_version":"1.0","results":[{"status":"SUCCESS","msg":"Engine execution has succeeded.","returncode":0,"events":[{"msg":"Instance deployment completed successfully.","timestamp":1733003318863,"severity":"INFO"}]}]}
2024/11/30 21:48:38.864220 [INFO] Platform Engine finished execution on command: app-deploy
```

So that's a healthy worker 4. The latest AMI uses systemd, so the Upstart services for SQS reading will have to be converted to systemd units.
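As a starting point for the Upstart-to-systemd move, here is a sketch that renders a unit file for one SQS-reading worker. Everything in it is an assumption rather than the current setup: the `webapp` user, `/var/app/current` as the app directory, the `ExecStart` command, and that `/opt/elasticbeanstalk/deployment/env` is plain KEY=VALUE lines that systemd's `EnvironmentFile=` can read. `Restart=always` stands in for Upstart's `respawn`.

```python
from typing import Optional


def render_unit(description: str, exec_start: str,
                working_dir: str = "/var/app/current",
                env_file: str = "/opt/elasticbeanstalk/deployment/env",
                user: Optional[str] = "webapp") -> str:
    """Render a systemd service unit for a long-running worker process."""
    lines = [
        "[Unit]",
        f"Description={description}",
        "After=network-online.target",
        "Wants=network-online.target",
        "",
        "[Service]",
        "Type=simple",
    ]
    if user:
        lines.append(f"User={user}")
    lines += [
        f"WorkingDirectory={working_dir}",
        f"EnvironmentFile={env_file}",
        f"ExecStart={exec_start}",
        # Restart=always plays the role of Upstart's `respawn`.
        "Restart=always",
        "RestartSec=5",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker command; substitute the real SQS reader entry point.
    print(render_unit("Ingest SQS to Elastic Cloud worker",
                      "/usr/bin/env bundle exec ruby bin/elastic_cloud_worker.rb"))
```

The output would go to `/etc/systemd/system/<name>.service`, followed by `systemctl daemon-reload` and `systemctl enable --now <name>`.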

Additionally, env vars don't seem to be populated in the SSH environment, which is surprising. Not sure how db migrate worked without them.

Got what should be the basics going by pulling env vars from /opt/elasticbeanstalk/deployment/env, and put a raise into the worker script so it doesn't actually write data or clear queues for now.
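The env-var workaround and the temporary `raise` could look roughly like this. The worker itself is Ruby; this Python sketch only illustrates the approach, assuming the env file is simple KEY=VALUE lines. `INGEST_WORKER_WRITES` is a hypothetical flag that would replace the hard-coded `raise` with an explicit opt-in.

```python
from typing import Dict, Mapping


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments and malformed lines."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue
        # Drop one layer of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def require_writes_enabled(env: Mapping[str, str]) -> None:
    """Hypothetical guard in place of the hard-coded raise: refuse to write
    to ES or delete SQS messages unless writes are explicitly enabled."""
    if env.get("INGEST_WORKER_WRITES") != "enabled":
        raise RuntimeError("dry run: worker writes are disabled")
```

For a one-off run, something like `subprocess.run(cmd, env={**os.environ, **parse_env_file(Path("/opt/elasticbeanstalk/deployment/env").read_text())})` would pass the values through to a child process (reading the file may need root).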

The next issue is the ES version change. Pretty sure `_template` isn't supported on 8 (the PUT below returns 410 Gone), and we'll have to switch to an ingest pipeline or a data stream.

```
I, [2024-11-30T22:11:22.532888 #8509]  INFO -- Workers::ElasticCloud: Starting real-time worker from https://sqs.us-west-2.amazonaws.com/985752656544/ingest-measurements-to-elasticcloud-prd to 40ad140d461d810ac41ed710b5c7a5b6.us-west-2.aws.found.io
2024-11-30 22:11:22 +0000: PUT https://ingest:**************@40ad140d461d810ac41ed710b5c7a5b6.us-west-2.aws.found.io:9243/_template/ingest-measurements [status:410, request:0.039s, query:N/A]
```
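The replacement for the legacy `_template` call is most likely the composable `_index_template` API (available since Elasticsearch 7.8), and a data stream would also need a matching index template with `data_stream` enabled. Below is a minimal sketch of building that request with only the standard library. The index pattern, priority and field names are hypothetical; the real mapping should come from the existing ingest-measurements template. Auth is left out.

```python
import json
import urllib.request
from typing import Any, Dict, List


def index_template_body(patterns: List[str], mappings: Dict[str, Any],
                        data_stream: bool = False,
                        priority: int = 100) -> Dict[str, Any]:
    """Body for a composable index template (PUT _index_template/<name>)."""
    body: Dict[str, Any] = {
        "index_patterns": patterns,
        "priority": priority,
        "template": {"mappings": mappings},
    }
    if data_stream:
        # Matching names become data streams; every document then needs
        # an @timestamp field.
        body["data_stream"] = {}
    return body


def build_put_request(base_url: str, name: str,
                      body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the PUT request; auth is left to the caller."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_index_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical field names, not the real ingest-measurements mapping.
MAPPINGS = {
    "properties": {
        "captured_at": {"type": "date"},
        "location": {"type": "geo_point"},
        "value": {"type": "float"},
    }
}
```

Building the request separately from sending it makes it easy to diff the body against the old `_template` payload before pointing it at the Elastic Cloud deployment.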