Synchronization - NCIOCPL/cgov-digital-platform GitHub Wiki

This was my initial statement of the problem:

We will need to decide how we will handle the situation in which either a non-production CDR tier or Drupal CMS instance is refreshed from the production system, or the CDR on a lower tier is linked to a different CMS instance than it had been. Will the CDR replace all of the PDQ content in the CMS to force it to reflect the upstream documents? If so, what happens to incoming links to PDQ content? Will refreshes of the CDR and the CMS from production be required to happen in tandem?

Digging deeper, the problem appears to be more complicated. At least for the initial launch, it has been decided that some of the content will continue to be served as it has been in the existing system. For example, media files (images, audio clips, etc.) will eventually be stored in and served from Akamai, but in the short run we will point to them where they have been hosted in the past.

Let's consider some possible scenarios.

Scenario 1 - re-used PDQ content ID

The CDR DEV tier was refreshed from CDR PROD six months ago, and is pointed to a Drupal CMS instance which was refreshed from the production CMS last month. PDQ Summary CDR800001 was created on CDR DEV three months ago for the Toenail Cancer Treatment Summary. The linked CMS has CDR800001 as the Kneecap Cancer Treatment Summary, and there is non-PDQ content in the CMS which links to this summary. During the first publishing job since CDR DEV was pointed to this CMS instance, the CDR will need some mechanism to detect that it is no longer talking to the same CMS instance it connected to for the previous publishing job. Even assuming such a mechanism is in place, if the CDR client simply wipes out all of the PDQ content on the CMS and does a fresh full PDQ content load, the non-PDQ content that linked to CDR800001 as the Kneecap summary will silently point to the Toenail summary instead, with nothing to indicate that the link is now wrong.
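One possible detection mechanism, sketched in Python under stated assumptions: at the end of each successful publishing job, the CDR writes a fresh random token to the CMS and keeps its own copy; at the start of the next job it compares the two. The classes `CmsInstance` and `CdrTier`, the `sync_token`/`last_sync_token` fields, and both helper functions are hypothetical stand-ins, not part of either system. Refreshing either side from production replaces its token with production's, so the comparison fails; refreshing both from the same production state leaves the tokens matched.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class CmsInstance:
    """Hypothetical stand-in for a Drupal CMS instance."""
    name: str
    # Hypothetical marker the CDR writes to the CMS after each publishing job.
    sync_token: Optional[str] = None


@dataclass
class CdrTier:
    """Hypothetical stand-in for a CDR tier."""
    name: str
    # Copy of the token this tier wrote to its CMS at the end of its last job.
    last_sync_token: Optional[str] = None


def same_cms_as_last_job(cdr: CdrTier, cms: CmsInstance) -> bool:
    """True only if both sides still hold the token from the previous job.

    A refresh of either side from production (or pointing the CDR at a
    different CMS) copies in or loses the token, so the values no longer match.
    """
    return cdr.last_sync_token is not None and cdr.last_sync_token == cms.sync_token


def record_successful_job(cdr: CdrTier, cms: CmsInstance) -> None:
    """Write a fresh token to both sides once a publishing job completes."""
    token = str(uuid.uuid4())
    cms.sync_token = token
    cdr.last_sync_token = token


# Scenario 1, modeled: the DEV CMS is refreshed from the production CMS.
prod_cms, cdr_prod = CmsInstance("PROD"), CdrTier("PROD")
record_successful_job(cdr_prod, prod_cms)
dev_cms, cdr_dev = CmsInstance("DEV"), CdrTier("DEV")
record_successful_job(cdr_dev, dev_cms)
dev_cms.sync_token = prod_cms.sync_token  # the refresh copies production's data
if not same_cms_as_last_job(cdr_dev, dev_cms):
    print("CMS instance changed since the last publishing job")
```

This only detects the change; it deliberately leaves open what the CDR should do next, which is the question the scenarios here are raising.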

Scenario 2 - broken link

A variation on the previous scenario might have the same summary in both the CDR and the Drupal CMS, but under different CDR IDs. Once we have implemented software to recognize and track links from non-PDQ content items to PDQ content, the attempt to replace the PDQ content on the CMS would fail, because deletion of the linked content would be blocked. Until such impact analysis software is in place, the PDQ content replacement would either fail (if PDQ node deletion has been forbidden) or leave behind links pointing to a CDR ID that no longer exists on the CMS (in the absence of such a prohibition).
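A minimal sketch of the impact-analysis check described above, assuming the link-tracking software records, for each PDQ CDR ID, which non-PDQ nodes link to it. `LinkRegistry`, `impact_analysis`, `replace_all_pdq`, and `DeletionBlocked` are hypothetical names, and `delete_node` stands in for whatever actually removes a PDQ node from the CMS.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


class LinkRegistry:
    """Hypothetical record of links from non-PDQ content to PDQ documents."""

    def __init__(self) -> None:
        # PDQ CDR ID -> IDs of the non-PDQ nodes that link to it.
        self._incoming: Dict[str, Set[str]] = defaultdict(set)

    def add_link(self, source_node: str, target_cdr_id: str) -> None:
        self._incoming[target_cdr_id].add(source_node)

    def referrers(self, cdr_id: str) -> Set[str]:
        # Use .get so that a lookup never creates an empty entry.
        return set(self._incoming.get(cdr_id, set()))


class DeletionBlocked(Exception):
    """Raised when PDQ documents slated for deletion still have incoming links."""

    def __init__(self, blocked: Dict[str, Set[str]]) -> None:
        super().__init__(f"PDQ documents still linked: {sorted(blocked)}")
        self.blocked = blocked


def impact_analysis(registry: LinkRegistry, cdr_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each CDR ID that has incoming links to the nodes linking to it."""
    blocked: Dict[str, Set[str]] = {}
    for cdr_id in cdr_ids:
        sources = registry.referrers(cdr_id)
        if sources:
            blocked[cdr_id] = sources
    return blocked


def replace_all_pdq(registry: LinkRegistry, existing_ids: List[str],
                    delete_node: Callable[[str], None]) -> None:
    """Delete every existing PDQ node, but only if none of them is linked."""
    blocked = impact_analysis(registry, existing_ids)
    if blocked:
        raise DeletionBlocked(blocked)
    for cdr_id in existing_ids:
        delete_node(cdr_id)
```

Checking every document before deleting any of them is a design choice in this sketch: a blocked replacement then fails cleanly instead of leaving the CMS with only part of its PDQ content removed.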

Scenario 3 - links to other CDR content

I don't really know where the other CDR documents (media documents such as audio clips and images, terminology, dictionary entries, and so forth) will be coming from, so I don't know whether we will have as many separate sources for them as we have tiers in the CDR. If we don't, the PDQ summaries on a given tier will be unable to point to the right versions of that content.