CouchDB Diagram - dmwm/WMCore GitHub Wiki

[Figure: CouchDB diagram]

CMSWEB CouchDB

  • reqmgr_workload_cache (P) - request (workflow) data (inserted/updated by reqmgr, updated by GQ (global workqueue) and wmagent)
  • reqmgr_config_cache (P) - config cache data used by requests (inserted by McM)
  • reqmgr_auxiliary (P) - metadata used in reqmgr (users, CMSSW versions, etc.) (inserted/updated by reqmgr)
  • workqueue_inbox (T) - request (workflow) data inserted/updated by workqueue (data pulled down from reqmgr_workload_cache)
  • workqueue (T) - request (workflow) data inserted/updated by workqueue (data pulled down from workqueue_inbox and DBS) (partially replicated to and from the local workqueue_inbox)
  • wmstats (T) - job and agent information (replicated from agents)
  • workloadsummary (P) - summary of a request (workflow) once it is done (inserted/updated by agents)
  • acdcserver (P) - information about failed jobs/files (inserted/updated by agents)
  • wmdatamining (P) - temporal summary of wmstats (inserted/updated by reqmgr)
  • t0_request (P) - T0 equivalent of reqmgr_workload_cache
  • tier0_wmstats (T) - T0 equivalent of wmstats
  • t0_workloadsummary (P) - T0 equivalent of workloadsummary

WMAgent CouchDB

  • wmagent_jobdump/fwjrs - framework job report (FWJR) information inserted/updated by agent
  • wmagent_jobdump/jobs - job status information inserted/updated by agent
  • stat_summary - summary snapshot of wmagent_jobdump/fwjrs inserted/updated by agent
  • wmagent_summary - summary information from wmagent_jobdump/jobs and stat_summary inserted/updated by agent (replicated to wmstats)
  • workqueue_inbox - partially replicated from and to global workqueue (updated by agent (WorkQueueManager))
  • workqueue - copy of workqueue_inbox (updated by agent (WorkQueueManager))
  • alert_soft - error log message inserted by agent (replicated to alertscollector)
  • alert_critical - error log message inserted by agent (replicated to alertscollector)

In the diagram, a solid line indicates replication and a dotted line indicates data flow. (P) marks permanent data and (T) marks temporary data.

Couch databases organization in CMSWEB

Our current central CouchDB setup relies on 4 different backends, each of which hosts all the databases used in the workload management ecosystem. Even though every backend has every database, only a few databases on each backend hold real data; which backend serves which database is decided by the CMSWEB frontend rules.

Full information of CMSWEB services and CouchDB databases can be found in the CMSWEB documentation: https://gitlab.cern.ch/cms-http-group/doc/blob/master/doc/activity.md

After the recent VM migration to more powerful nodes with SSDs, the databases are organized as follows:

Backend     Databases
vocms0841   workqueue, workqueue_inbox
vocms0842   reqmgr_workload_cache, reqmgr_config_cache, acdcserver, reqmgr_auxiliary
vocms0843   wmstats, workloadsummary, wmstats_logdb
vocms0844   tier0_wmstats, t0_workloadsummary, t0_request, t0_logdb
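To make the routing in the table above concrete, here is a minimal sketch that encodes the post-migration layout as a Python dict and inverts it into a database-to-backend lookup. BACKEND_DATABASES, DATABASE_BACKEND and backend_for are hypothetical names used only for illustration; the real routing is done by the CMSWEB frontend rules, not by WMCore code.

```python
# Hypothetical routing table mirroring the post-migration layout above.
# It is only an illustration: the authoritative mapping lives in the
# CMSWEB frontend rules, not in WMCore.
BACKEND_DATABASES = {
    "vocms0841": ["workqueue", "workqueue_inbox"],
    "vocms0842": ["reqmgr_workload_cache", "reqmgr_config_cache",
                  "acdcserver", "reqmgr_auxiliary"],
    "vocms0843": ["wmstats", "workloadsummary", "wmstats_logdb"],
    "vocms0844": ["tier0_wmstats", "t0_workloadsummary",
                  "t0_request", "t0_logdb"],
}

# Invert the table: database name -> backend holding its real data.
DATABASE_BACKEND = {
    db: backend
    for backend, dbs in BACKEND_DATABASES.items()
    for db in dbs
}


def backend_for(db_name):
    """Return the backend that holds the real data for db_name.

    Raises KeyError for databases missing from the table, such as
    wmdatamining, which only appears in the pre-migration layout.
    """
    return DATABASE_BACKEND[db_name]
```

For example, backend_for("wmstats") returns "vocms0843". Each database appears on exactly one backend, which is why the inversion is a plain dict.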

Before the VM migration, the databases were distributed as follows:

Backend     Databases
vocms0740   workqueue, workqueue_inbox
vocms0742   wmdatamining, reqmgr_workload_cache, reqmgr_config_cache, acdcserver, reqmgr_auxiliary
vocms0743   wmstats, workloadsummary, wmstats_logdb
vocms0744   tier0_wmstats, t0_workloadsummary, t0_request, t0_logdb

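When the CouchDB Futon interface is accessed through CMSWEB, it still shows the correct data size and number of documents for each database. Presumably this works because Futon makes a separate request for each database returned by _all_dbs, and the frontend rules route each of those requests to the backend that holds the real data.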
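The per-database requests that Futon appears to make can be sketched as follows: one call to /_all_dbs, then one GET /{db} per name, reading doc_count and the file size. This is a sketch under assumptions, not Futon's actual code. http_get_json and summarize_databases are hypothetical helpers; http_get_json sends no credentials, so against CouchDB 3.x (where /_all_dbs requires admin access, as noted in the changes below) it would need authentication added. The fetch function is injectable so the logic can be exercised without a server.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(base_url, path):
    """GET base_url + path and decode the JSON body (no authentication)."""
    with urllib.request.urlopen(base_url.rstrip("/") + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_databases(base_url, get_json=http_get_json):
    """Return {db_name: (doc_count, file_size_bytes)} for every database.

    One request to /_all_dbs, then one GET /{db} per database name,
    which is what Futon appears to do.
    """
    summary = {}
    for db in get_json(base_url, "/_all_dbs"):
        # Names such as wmagent_jobdump/fwjrs contain '/', which must be
        # percent-encoded (%2F) in the URL path.
        info = get_json(base_url, "/" + urllib.parse.quote(db, safe=""))
        # CouchDB 1.6.1 reports disk_size; 2.x and 3.x report sizes.file.
        size = info.get("sizes", {}).get("file", info.get("disk_size"))
        summary[db] = (info["doc_count"], size)
    return summary
```

Reading sizes.file first and falling back to disk_size keeps the same helper usable on both the old 1.6.1 backends and CouchDB 3.x, where disk_size has been retired.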

Relevant changes in CouchDB 3.x

This section highlights some changes in CouchDB 3.x compared to CouchDB 1.6.1. It is not an exhaustive list, but these features definitely need to be carefully validated and/or re-integrated within WMCore.

  1. The update sequences returned by the /db/_changes feed are no longer integers. They can be any JSON value. Applications should treat them as opaque values and return them to CouchDB as-is.

  2. Temporary views are no longer supported.

  3. The all_or_nothing option is no longer supported by the _bulk_docs API.

  4. The stale parameter for views and _find has been deprecated in favour of two new parameters: stable and update. The old stale=ok behaviour is equivalent to stable=true&update=false, and the old stale=update_after behaviour is equivalent to stable=true&update=lazy. The deprecated stale parameter was announced for removal in CouchDB 3.0, so stable and update should be used instead.

  5. A new [httpd] max_http_request_size configuration parameter was added. It has the same behavior as the old [couchdb] max_document_size configuration parameter, which had unfortunately been misnamed and has now been updated to behave as its name suggests. Both are documented in the shipped default.ini file. Note that the default for this new parameter is 64MB instead of 4GB. If PUT or POST requests fail and the CouchDB logs show HTTP 413 return codes, this could be the culprit. This can affect couchup in-place upgrades as well.

  6. The default maximum document size has been reduced to 8MB. This means that databases with larger documents will not be able to replicate into CouchDB 3.0 correctly without modification. This change has been made in preparation for anticipated hard upper limits on document size imposed by CouchDB 4.0. For 3.x, the max document size setting can be relaxed via the [couchdb] max_document_size config setting.

  7. CouchDB 3.0 now requires admin-level access for the /_all_dbs endpoint.

  8. All databases are now created as admin-only by default. That is, the default _security object of a new database is now:

{
  "members" : { "roles" : [ "_admin" ] },
  "admins" : { "roles" : [ "_admin" ] }
}
  9. After upgrading all nodes in a cluster to 3.0, add [rexi] use_kill_all = true to local.ini to save some intra-cluster network bandwidth.

  10. Local endpoints for replication targets, which never functioned as expected in CouchDB 2.x, have been completely removed. When replicating databases, always specify a full URL for the source and target. In addition, the node local _replicator database is no longer automatically created.

  11. The disk_size and data_size fields have been retired from the database info object returned by GET /{db}/. These were deprecated in CouchDB 2.x and replaced by the sizes object, which contains the improved file, active and external size metrics. Fauxton has been updated to match.
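Regarding the update sequences of the _changes feed no longer being integers, a minimal sketch of a consumer that treats the checkpoint as opaque is shown below: it never parses, compares or increments the value, and simply hands last_seq back as the next since. http_get_json and follow_changes are hypothetical helpers, not WMCore's CouchDB client; pagination via limit and stopping on an empty page are choices made for this sketch.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(url):
    """GET url and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def follow_changes(db_url, since="0", limit=100, get_json=http_get_json):
    """Yield change rows from db_url/_changes, starting after `since`.

    The checkpoint is treated as opaque: it is never parsed, compared or
    incremented, only handed back to CouchDB as it was received.
    """
    while True:
        # Strings (2.x/3.x) are sent verbatim; any other JSON value
        # (e.g. the 1.6.1 integers) is sent in its JSON encoding.
        token = since if isinstance(since, str) else json.dumps(since)
        query = urllib.parse.urlencode({"since": token, "limit": limit})
        result = get_json(db_url.rstrip("/") + "/_changes?" + query)
        if not result["results"]:
            return
        yield from result["results"]
        since = result["last_seq"]
```

A caller that needs to persist a checkpoint can store each row's seq value exactly as received, for example as a string, and pass it back later as since.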
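For the deprecation of the stale view parameter, the equivalences listed above can be captured in a small translation step applied to view query parameters before they are sent. translate_stale is a hypothetical helper written for this page; how WMCore actually builds its view queries is not shown here and may differ.

```python
# Equivalences taken from the CouchDB notes quoted above.
STALE_EQUIVALENTS = {
    "ok": {"stable": "true", "update": "false"},
    "update_after": {"stable": "true", "update": "lazy"},
}


def translate_stale(params):
    """Return a copy of view query params with `stale` replaced.

    All other parameters are passed through untouched; an unknown
    stale value raises ValueError instead of being silently dropped.
    """
    params = dict(params)
    stale = params.pop("stale", None)
    if stale is None:
        return params
    try:
        params.update(STALE_EQUIVALENTS[stale])
    except KeyError:
        raise ValueError("unsupported stale value: %r" % stale) from None
    return params
```

For example, translate_stale({"stale": "ok", "reduce": "false"}) returns {"reduce": "false", "stable": "true", "update": "false"}. Working on a copy keeps the caller's original dict unchanged.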
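Because the default maximum document size dropped to 8MB, it may be worth scanning existing documents before replicating them into CouchDB 3.x. The sketch below flags documents whose compact JSON encoding exceeds a limit. The value 8000000 bytes is an assumption about how "8MB" is expressed in the [couchdb] max_document_size setting, and oversized_documents is a hypothetical helper; CouchDB measures its own internal representation, so the result is only an approximation.

```python
import json

# Assumption: "8MB" expressed as 8000000 bytes for [couchdb]
# max_document_size. Check the deployed configuration for the real value.
DEFAULT_MAX_DOCUMENT_SIZE = 8000000


def oversized_documents(docs, limit=DEFAULT_MAX_DOCUMENT_SIZE):
    """Return the _id of every document whose JSON encoding exceeds limit.

    Only an approximation: CouchDB measures the size of its internal
    representation, which can differ slightly from json.dumps output.
    """
    too_big = []
    for doc in docs:
        # Compact separators avoid counting cosmetic whitespace.
        size = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        if size > limit:
            too_big.append(doc.get("_id"))
    return too_big
```

Documents flagged this way would either need to be split or the max_document_size setting relaxed, as described above.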
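Since local replication endpoints are gone, replication documents such as the agent-to-central wmagent_summary to wmstats replication described at the top of this page need full URLs on both ends. The sketch below builds a _replicator document and rejects bare database names. replication_doc is a hypothetical helper, and the hostnames in the usage example are placeholders, not real CMSWEB endpoints. Because the node-local _replicator database is no longer created automatically, it may have to be created before such a document can be saved.

```python
def replication_doc(source_url, target_url, doc_id=None, continuous=True):
    """Build a _replicator document with full URLs for source and target.

    Bare local database names (e.g. "wmagent_summary") are rejected,
    since CouchDB 3.x no longer supports local replication endpoints.
    """
    for url in (source_url, target_url):
        if not url.startswith(("http://", "https://")):
            raise ValueError("replication endpoint must be a full URL: %r" % url)
    doc = {"source": source_url, "target": target_url, "continuous": continuous}
    if doc_id is not None:
        doc["_id"] = doc_id
    return doc
```

For example, replication_doc("http://localhost:5984/wmagent_summary", "https://central.example.org/couchdb/wmstats") yields a continuous replication document, while passing "wmagent_summary" as the source raises ValueError.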

A complete reference to the breaking changes and upgrade notes can be found at the following links:

Last but not least, note the deprecation warnings for the upcoming CouchDB 4.x, HERE