Content Flow Diagram Review - dedreval/xdps-docs GitHub Wiki

CDSR Flow Diagram Comparison

Comparison of GitHub Wiki Diagrams vs Actual Codebase Implementation

This review compares both the "New Review Flow" and "Updated Review Flow" diagrams against the actual codebase.


✅ CORRECT Elements in Diagrams

Main Processing Flow

  • ✅ Downloading package from vendor's server - Correct (Archie/Aries/Import)
  • ✅ Unpacking content from package - Correct (Process 1004, CDSRPackageUnpackHandler)
  • ✅ Validating content - Partially correct (happens during unpack and conversion)
  • ✅ Converting JATS to Wiley ML 3G - Correct (Process 1005, JatsConversionHandler)
  • ✅ Validating result Wiley ML XML - Correct (Process 1008/1014, Wml3gValidationHandler)
  • ✅ Rendering XML to PDF - Correct (Process 1006, RenderFOPHandler)

Publishing Flow

  • ✅ Sending DS package - Correct (DSGenerator, DSSender)
  • ✅ Building DS package - Correct (part of DS generation)
  • ✅ Sending HW package - Correct (SemanticoGenerator, SemanticoSender)
  • ✅ Compiling package for HW - Correct (part of Semantico package generation)
  • ✅ Sending to WOLLIT - Correct (LiteratumGenerator, LiteratumSender, WolLoaderSender)

State Management

  • ✅ Waiting for response from HW - Correct (JMS message-based, HwResponder)
  • ✅ Waiting for response from WOLLIT - Correct (HTTP response-based)
  • ✅ Reprocess/Cancel loops - Correct (error handling mechanisms exist)

❌ MISSING or INCORRECT Elements

1. Missing Critical Steps

❌ Copy to Entire Database - MISSING

Actual Implementation:

  • After rendering completion, DeliveringService.finishUpload() is called
  • copyFromCurrentIssueToEntire() copies all content:
    • Source JATS → /opt/efs/cochrane/clsysrev/entire/src/
    • ML3G → /opt/efs/cochrane/clsysrev/entire/ml3g/
    • Rendered PDF → /opt/efs/cochrane/clsysrev/entire/rnd_pdf_fop/
    • Rendered HTML → /opt/efs/cochrane/clsysrev/entire/rnd_html/

Impact: This is a critical step that should be shown between rendering and publishing.
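
The copy step above can be illustrated with a minimal sketch. It is not the actual copyFromCurrentIssueToEntire() implementation: the function name copy_issue_to_entire is hypothetical, and it assumes the issue-side directory uses the same subdirectory names (src, ml3g, rnd_pdf_fop, rnd_html) as the entire database, which this review does not confirm.

```python
from pathlib import Path
import shutil

# Subdirectory names taken from the "entire" paths listed above.
# Assumption: the issue directory mirrors the same layout.
CONTENT_DIRS = ("src", "ml3g", "rnd_pdf_fop", "rnd_html")


def copy_issue_to_entire(issue_root: Path, entire_root: Path) -> list[Path]:
    """Copy each content type from an issue directory into the entire database.

    Returns the target directories that were written.
    """
    copied = []
    for name in CONTENT_DIRS:
        source = issue_root / name
        if not source.is_dir():
            continue  # This content type was not produced for the issue.
        target = entire_root / name
        # dirs_exist_ok lets a later issue overwrite earlier versions in place.
        shutil.copytree(source, target, dirs_exist_ok=True)
        copied.append(target)
    return copied
```

In a diagram, this would appear as a single box between "Rendering" and any publishing step, with one outgoing arrow per content type.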

❌ Backup Creation - MISSING

Actual Implementation:

  • DeliveringService.makeBackup() creates backup copies in copy/ directory
  • Happens before processing records
  • Critical for recovery/rollback

❌ Metadata Extraction - NOT EXPLICITLY SHOWN

Actual Implementation:

  • Metadata (CDSRMetaVO) extracted during unpack step
  • Used throughout workflow for validation and processing

❌ Stats Data Handling - MISSING

Actual Implementation:

  • Stats data files processed during unpack
  • Stored separately from main content

2. Incorrect Flow Sequence

❌ "Compiling WOL package" in Main Flow - INCORRECT

Diagram Shows: Rendering → Compiling WOL package → Sending to WOLLIT

Actual Implementation:

  • "Compiling WOL package" is part of publishing workflow, not main processing
  • Publishing happens separately via Process 120 (SendToPublishHandler)
  • Publishing is optional and not automatic after rendering
  • Main flow ends after rendering completion and copy to entire

Correct Flow Should Be:

Rendering → Copy to Entire → [Optional Publishing: Compile Package → Send]

❌ Publishing Steps Timing - INCORRECT

Diagram Shows: Publishing steps (DS, HW, WOLLIT) as part of main flow

Actual Implementation:

  • Publishing is a separate workflow triggered independently
  • Can happen:
    • After rendering (manual trigger)
    • Via Process 120 (SendToPublishHandler)
    • Via bulk publishing operations
    • Via WhenReady workflows
  • Publishing steps should be shown as parallel optional paths, not sequential
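
The separation between the two workflows can be sketched as two independent step lists. This is only an illustration of the sequencing argued for above: the step names and both functions are hypothetical and do not correspond to identifiers in the codebase, and response waiting is left out for brevity.

```python
# Main processing steps, ending at the copy into the entire database.
MAIN_FLOW = (
    "download",
    "unpack",
    "convert_jats_to_ml3g",
    "validate_ml3g",
    "render_pdf",
    "copy_to_entire",
)

# Publishing targets named in this review.
PUBLISH_TARGETS = ("DS", "HW", "WOLLIT")


def main_flow_steps() -> list[str]:
    """Steps that always run; publishing is never part of this list."""
    return list(MAIN_FLOW)


def publishing_steps(targets) -> list[str]:
    """Steps for a separately triggered publishing run.

    Each target gets its own generate/send pair, so targets can be
    drawn as parallel optional paths rather than one sequence.
    """
    steps = []
    for target in targets:
        if target not in PUBLISH_TARGETS:
            raise ValueError(f"unknown publishing target: {target}")
        steps.append(f"generate_package:{target}")
        steps.append(f"send_package:{target}")
    return steps
```

Calling publishing_steps with an empty list returns no steps, which models a record that is rendered and copied but never published.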

3. Missing Process Details

❌ Process IDs Not Shown

Should Include:

  • Process 114: UploadCDSR_JATS (main JATS flow)
  • Process 115: UploadCDSR_MeSH (MeSH update flow)
  • Process 116: ImportJATS (import flow)
  • Process 117: UpdateCDSR_Ml3G (ML3G update flow)
  • Process 120: SendToPublish (publishing workflow)

❌ Handler Classes Not Shown

Should Include:

  • CDSRPackageUnpackHandler (Process 1004)
  • JatsConversionHandler (Process 1005)
  • Wml3gValidationHandler (Process 1008)
  • RenderFOPHandler (Process 1006)
  • SendToPublishHandler (Process 120)
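
One way to keep diagram labels consistent is to generate them from a single lookup table. The sketch below uses the process IDs and handler names listed above; the HANDLERS table and the diagram_label function are hypothetical helpers, not part of the codebase. Validation is keyed on 1008 only, although this review also mentions 1014 for the same handler.

```python
# Process IDs and handler class names as listed in this review.
HANDLERS = {
    1004: "CDSRPackageUnpackHandler",
    1005: "JatsConversionHandler",
    1008: "Wml3gValidationHandler",
    1006: "RenderFOPHandler",
    120: "SendToPublishHandler",
}


def diagram_label(step_name: str, process_id: int) -> str:
    """Build a diagram label that names the process ID and, if known, the handler."""
    handler = HANDLERS.get(process_id)
    if handler is None:
        return f"{step_name} (Process {process_id})"
    return f"{step_name} (Process {process_id}, {handler})"
```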

❌ Queue Types Not Shown

Should Include:

  • CMSProcessPartQueue (serial unpack)
  • CMSProcessPartBGQueue (parallel conversion/rendering)
  • CMSAcceptProcessPartQueue (rendering acceptance)
  • Publishing queues (JMS-based)

4. State Code Mismatches

❌ State Numbers Don't Match

Diagram Shows: state = 0, 2, 4, 6, 16, 1000, 1002, 1004, 1006

Actual Implementation:

  • Delivery File Statuses:

    • STATUS_UNZIPPED = 22
    • STATUS_QAS_STARTED = 16 ✅ (matches)
    • STATUS_RENDERING_STARTED = 18
    • STATUS_RND_FINISHED_SUCCESS = 10
    • STATUS_PUBLISHING_STARTED = 39
  • Record States:

    • STATE_WR_PUBLISHING = 2 ✅ (matches)
    • STATE_WR_PUBLISHED = 4 ✅ (matches)
    • STATE_CCH_PUBLISHED = 6 ✅ (matches)
    • STATE_HW_PUBLISHING = 10
  • Process IDs:

    • Process 1004 = PackageUnpack ✅ (matches)
    • Process 1006 = RenderFOP ✅ (matches)

Issue: State codes in diagram appear to mix delivery file statuses, record states, and process IDs, which is confusing.
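
To show why a bare number is ambiguous, the sketch below places the values listed above into three separate enumerations and looks a code up in all of them. The Python class names and the interpretations function are hypothetical; only the numeric values and constant names come from this review.

```python
from enum import IntEnum


class DeliveryFileStatus(IntEnum):
    UNZIPPED = 22
    QAS_STARTED = 16
    RENDERING_STARTED = 18
    RND_FINISHED_SUCCESS = 10
    PUBLISHING_STARTED = 39


class RecordState(IntEnum):
    WR_PUBLISHING = 2
    WR_PUBLISHED = 4
    CCH_PUBLISHED = 6
    HW_PUBLISHING = 10


class ProcessId(IntEnum):
    PACKAGE_UNPACK = 1004
    RENDER_FOP = 1006


def interpretations(code: int) -> list[str]:
    """Return every meaning a bare numeric code could have across the three systems."""
    found = []
    for namespace in (DeliveryFileStatus, RecordState, ProcessId):
        try:
            found.append(f"{namespace.__name__}.{namespace(code).name}")
        except ValueError:
            pass  # The code is not defined in this namespace.
    return found
```

With these values, the code 10 resolves to both a delivery file status and a record state, so a diagram label such as "state = 10" would need to name its code system to be unambiguous.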

5. Missing Storage Locations

❌ Storage Duplication Not Shown

Should Show:

  • Content stored in both issue-specific and entire database locations
  • Issue-specific: /opt/efs/cochrane/{issueId}/clsysrev/
  • Entire database: /opt/efs/cochrane/clsysrev/entire/
  • Repository rendering: /opt/efs/repository_rendering/

6. Missing Parallel Processing Indicators

❌ Batch Processing Not Shown

Actual Implementation:

  • JATS conversion: batch=5, capacity=4 (parallel processing)
  • Rendering: batch=5, capacity=4 (parallel processing)
  • Should show multiple records being processed simultaneously
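
A minimal sketch of what the diagram should convey, under one loudly stated assumption: "batch=5" is read here as the number of records per batch and "capacity=4" as the number of batches handled concurrently. This review does not define either setting, so the real semantics may differ. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_in_batches(records, handle_batch, batch_size=5, capacity=4):
    """Run handle_batch over batches of records, up to `capacity` at a time.

    Assumption: batch_size maps to "batch" and capacity to "capacity"
    in the configuration quoted above.
    """
    batches = chunked(list(records), batch_size)
    with ThreadPoolExecutor(max_workers=capacity) as pool:
        # pool.map preserves batch order in the returned results.
        return list(pool.map(handle_batch, batches))
```

For example, twelve records with the default settings yield three batches of 5, 5, and 2 records, all of which can be in flight at once.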

7. Missing Error Handling

❌ Error Paths Not Complete

Should Show:

  • Package deletion on failure (delete-on-fail="true")
  • Failed record handling
  • Retry mechanisms
  • Error notifications

8. Missing External Service Interactions

❌ Rendering Service Not Shown

Actual Implementation:

  • Rendering is done by external rendering service
  • RenderFOPHandler initiates rendering via RenderingHelper.startRendering()
  • Results received via AcceptRenderQueue (JMS message)
  • Should show external service interaction
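
The request/response shape described above can be sketched with two in-memory queues. This is a single-process stand-in, not the JMS-based implementation: start_rendering, external_renderer, and accept_results are hypothetical names that only mirror the roles of RenderingHelper.startRendering(), the external rendering service, and the AcceptRenderQueue consumer.

```python
import queue


def start_rendering(record_id, request_queue):
    """Handler side: hand the record to the external service and return immediately."""
    request_queue.put({"record": record_id})


def external_renderer(request_queue, accept_queue):
    """Stand-in for the external service: render each request, then post a result."""
    while not request_queue.empty():
        request = request_queue.get()
        accept_queue.put({"record": request["record"], "success": True})


def accept_results(accept_queue):
    """Acceptance side: collect the records whose rendering results have arrived."""
    completed = []
    while not accept_queue.empty():
        result = accept_queue.get()
        if result["success"]:
            completed.append(result["record"])
    return completed
```

The point for the diagram is that rendering is asynchronous: the handler returns after the request is sent, and completion is only known once a result message arrives.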

🔄 Flow Sequence Corrections

Current Diagram Flow (INCORRECT):

Download → Unpack → Validate → Convert → Validate ML3G → Render → 
Compile WOL → Send WOLLIT
[Parallel: DS package, HW package, Notifications]

Correct Flow Should Be:

Download → Unpack → Convert JATS→ML3G → Validate ML3G → Render PDF → 
Copy to Entire Database
[Separate/Optional: Publishing Workflow]
  → Generate Packages (DS/HW/WOLLIT) → Send Packages → Wait for Responses

📋 Recommended Diagram Improvements

1. Separate Main Flow from Publishing Flow

  • Main Processing Flow: Download → Unpack → Convert → Validate → Render → Copy to Entire
  • Publishing Flow: Separate diagram or clearly marked as optional/parallel

2. Add Missing Steps

  • Copy to Entire Database (critical step)
  • Backup Creation
  • Metadata Extraction
  • Stats Data Handling

3. Clarify State Codes

  • Use consistent state code system
  • Distinguish between:
    • Delivery File Statuses (IDeliveryFileStatus)
    • Record States (RecordEntity.STATE_*)
    • Process IDs (process.xml)

4. Show Process IDs and Handlers

  • Add process IDs (114, 115, 116, 117, 120)
  • Add handler class names
  • Add queue names

5. Show Storage Locations

  • Indicate where files are stored at each step
  • Show duplication (issue-specific vs entire)

6. Show Parallel Processing

  • Indicate batch processing
  • Show multiple records in parallel

7. Show External Services

  • Rendering service interaction
  • External system interactions (HW, WOLLIT)

8. Improve Error Handling

  • Show error paths
  • Show retry mechanisms
  • Show failure notifications

🎯 Specific Issues by Diagram

"New Review Flow" Diagram Issues:

  1. ❌ Shows "Compiling WOL package" in main flow (should be in publishing)
  2. ❌ Shows "Sending to WOLLIT" in main flow (should be optional publishing)
  3. ❌ Missing "Copy to Entire Database" step
  4. ❌ Publishing steps shown as sequential, not optional/parallel
  5. ❌ State codes don't clearly map to actual status codes

"Updated Review Flow" Diagram Issues:

  1. ❌ Same issues as "New Review Flow"
  2. ❌ Shows "Compiling package for HW" in main flow (should be in publishing)
  3. ❌ Missing "Copy to Entire Database" step
  4. ❌ Doesn't show that publishing is optional/separate workflow

✅ What Diagrams Do Well

  1. ✅ Show parallel paths for DS, HW, and notifications
  2. ✅ Show waiting states for external responses
  3. ✅ Show reprocess/cancel mechanisms
  4. ✅ Show trigger types (scheduler, Process Manager, UI, JMS)
  5. ✅ Show main processing sequence correctly (up to rendering)

📝 Summary

Overall Assessment: The diagrams capture the high-level flow reasonably well but have critical gaps and incorrect sequencing:

  1. Missing Critical Step: Copy to Entire Database (happens after rendering)
  2. Incorrect Sequencing: Publishing steps shown as part of main flow (they're separate/optional)
  3. Missing Details: Process IDs, handlers, queues, storage locations
  4. State Code Confusion: Mixed use of different state code systems
  5. Missing Parallel Processing: Batch processing not indicated

Recommendation:

  • Separate main processing flow from publishing workflow
  • Add missing "Copy to Entire Database" step
  • Clarify that publishing is optional and separate
  • Add process IDs, handlers, and storage locations
  • Fix state code mappings