Content Flow Diagram Review - dedreval/xdps-docs GitHub Wiki
CDSR Flow Diagram Comparison
Comparison of GitHub Wiki Diagrams vs Actual Codebase Implementation
This review compares the "New Review Flow" and "Updated Review Flow" diagrams against the actual codebase.
✅ CORRECT Elements in Diagrams
Main Processing Flow
- ✅ Downloading package from vendor's server - Correct (Archie/Aries/Import)
- ✅ Unpacking content from package - Correct (Process 1004, `CDSRPackageUnpackHandler`)
- ⚠️ Validating content - Partially correct (happens during unpack and conversion)
- ✅ Converting JATS to Wiley ML 3G - Correct (Process 1005, `JatsConversionHandler`)
- ✅ Validating result Wiley ML XML - Correct (Process 1008/1014, `Wml3gValidationHandler`)
- ✅ Rendering XML to PDF - Correct (Process 1006, `RenderFOPHandler`)
Publishing Flow
- ✅ Sending DS package - Correct (`DSGenerator`, `DSSender`)
- ✅ Building DS package - Correct (part of DS generation)
- ✅ Sending HW package - Correct (`SemanticoGenerator`, `SemanticoSender`)
- ✅ Compiling package for HW - Correct (part of Semantico package generation)
- ✅ Sending to WOLLIT - Correct (`LiteratumGenerator`, `LiteratumSender`, `WolLoaderSender`)
State Management
- ✅ Waiting for response from HW - Correct (JMS message-based, `HwResponder`)
- ✅ Waiting for response from WOLLIT - Correct (HTTP response-based)
- ✅ Reprocess/Cancel loops - Correct (error handling mechanisms exist)
❌ MISSING or INCORRECT Elements
1. Missing Critical Steps
❌ Copy to Entire Database - MISSING
Actual Implementation:
- After rendering completion, `DeliveringService.finishUpload()` is called
- `copyFromCurrentIssueToEntire()` copies all content:
  - Source JATS → `/opt/efs/cochrane/clsysrev/entire/src/`
  - ML3G → `/opt/efs/cochrane/clsysrev/entire/ml3g/`
  - Rendered PDF → `/opt/efs/cochrane/clsysrev/entire/rnd_pdf_fop/`
  - Rendered HTML → `/opt/efs/cochrane/clsysrev/entire/rnd_html/`
Impact: This is a critical step that should be shown between rendering and publishing.
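The copy step described above can be sketched as follows. This is a minimal illustration, not the actual `copyFromCurrentIssueToEntire()` code: the function name `copy_issue_to_entire`, the `root` parameter, and the assumption that the issue-specific tree uses the same subdirectory names as the entire tree are all hypothetical.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Subdirectory names taken from the target paths listed in this review.
# ASSUMPTION: the issue-specific tree uses the same names.
CONTENT_DIRS = ("src", "ml3g", "rnd_pdf_fop", "rnd_html")


def copy_issue_to_entire(root: Path, issue_id: str, db: str = "clsysrev") -> list[Path]:
    """Copy each content directory from the issue tree into the entire tree."""
    issue_dir = root / issue_id / db        # e.g. <root>/{issueId}/clsysrev
    entire_dir = root / db / "entire"       # e.g. <root>/clsysrev/entire
    copied = []
    for name in CONTENT_DIRS:
        source = issue_dir / name
        if not source.is_dir():
            continue  # nothing of this kind was produced for the issue
        target = entire_dir / name
        # dirs_exist_ok lets later issues add to or overwrite existing content
        shutil.copytree(source, target, dirs_exist_ok=True)
        copied.append(target)
    return copied
```

In a diagram, this corresponds to a single box between "Render" and any publishing step, with one outgoing arrow per content type.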
❌ Backup Creation - MISSING
Actual Implementation:
- `DeliveringService.makeBackup()` creates backup copies in the `copy/` directory
- Happens before processing records
- Critical for recovery/rollback
⚠️ Metadata Extraction - NOT EXPLICITLY SHOWN
Actual Implementation:
- Metadata (CDSRMetaVO) extracted during unpack step
- Used throughout workflow for validation and processing
❌ Stats Data Handling - MISSING
Actual Implementation:
- Stats data files processed during unpack
- Stored separately from main content
2. Incorrect Flow Sequence
❌ "Compiling WOL package" in Main Flow - INCORRECT
Diagram Shows: Rendering → Compiling WOL package → Sending to WOLLIT
Actual Implementation:
- "Compiling WOL package" is part of publishing workflow, not main processing
- Publishing happens separately via Process 120 (`SendToPublishHandler`)
- Publishing is optional and not automatic after rendering
- Main flow ends after rendering completion and copy to entire
Correct Flow Should Be:
Rendering → Copy to Entire → [Optional Publishing: Compile Package → Send]
❌ Publishing Steps Timing - INCORRECT
Diagram Shows: Publishing steps (DS, HW, WOLLIT) as part of main flow
Actual Implementation:
- Publishing is a separate workflow triggered independently
- Can happen:
  - After rendering (manual trigger)
  - Via Process 120 (`SendToPublishHandler`)
  - Via bulk publishing operations
  - Via WhenReady workflows
- Publishing steps should be shown as parallel optional paths, not sequential
3. Missing Process Details
❌ Process IDs Not Shown
Should Include:
- Process 114: UploadCDSR_JATS (main JATS flow)
- Process 115: UploadCDSR_MeSH (MeSH update flow)
- Process 116: ImportJATS (import flow)
- Process 117: UpdateCDSR_Ml3G (ML3G update flow)
- Process 120: SendToPublish (publishing workflow)
❌ Handler Classes Not Shown
Should Include:
- `CDSRPackageUnpackHandler` (Process 1004)
- `JatsConversionHandler` (Process 1005)
- `Wml3gValidationHandler` (Process 1008)
- `RenderFOPHandler` (Process 1006)
- `SendToPublishHandler` (Process 120)
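One way to keep diagram labels consistent with the code is to generate them from a single lookup of process IDs to handler names. The sketch below uses the IDs and handler names listed in this review; the `HANDLERS` table and the `label` function are hypothetical helpers and do not reflect how `process.xml` is actually loaded.

```python
# Process IDs and handler class names as listed in this review.
# ASSUMPTION: this table is illustrative; the real mapping lives in process.xml.
HANDLERS = {
    1004: "CDSRPackageUnpackHandler",
    1005: "JatsConversionHandler",
    1008: "Wml3gValidationHandler",
    1006: "RenderFOPHandler",
    120: "SendToPublishHandler",
}


def label(process_id: int) -> str:
    """Return a diagram label combining the process ID and its handler."""
    handler = HANDLERS.get(process_id)
    if handler is None:
        return f"Process {process_id} (handler not listed)"
    return f"Process {process_id}: {handler}"


for pid in (1004, 1005, 1008, 1006, 120):
    print(label(pid))
```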
❌ Queue Types Not Shown
Should Include:
- `CMSProcessPartQueue` (serial unpack)
- `CMSProcessPartBGQueue` (parallel conversion/rendering)
- `CMSAcceptProcessPartQueue` (rendering acceptance)
- Publishing queues (JMS-based)
4. State Code Mismatches
❌ State Numbers Don't Match
Diagram Shows: state = 0, 2, 4, 6, 16, 1000, 1002, 1004, 1006
Actual Implementation:
- Delivery File Statuses:
  - STATUS_UNZIPPED = 22
  - STATUS_QAS_STARTED = 16 ✅ (matches)
  - STATUS_RENDERING_STARTED = 18
  - STATUS_RND_FINISHED_SUCCESS = 10
  - STATUS_PUBLISHING_STARTED = 39
- Record States:
  - STATE_WR_PUBLISHING = 2 ✅ (matches)
  - STATE_WR_PUBLISHED = 4 ✅ (matches)
  - STATE_CCH_PUBLISHED = 6 ✅ (matches)
  - STATE_HW_PUBLISHING = 10
- Process IDs:
  - Process 1004 = PackageUnpack ✅ (matches)
  - Process 1006 = RenderFOP ✅ (matches)
Issue: State codes in diagram appear to mix delivery file statuses, record states, and process IDs, which is confusing.
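To illustrate why a bare number is ambiguous, the sketch below puts the codes listed above into three separate enumerations and looks a number up in all of them. The enum classes and the `meanings` helper are hypothetical and only mirror the constants named in this review; they are not the codebase's own types.

```python
from enum import IntEnum


class DeliveryFileStatus(IntEnum):
    STATUS_RND_FINISHED_SUCCESS = 10
    STATUS_QAS_STARTED = 16
    STATUS_RENDERING_STARTED = 18
    STATUS_UNZIPPED = 22
    STATUS_PUBLISHING_STARTED = 39


class RecordState(IntEnum):
    STATE_WR_PUBLISHING = 2
    STATE_WR_PUBLISHED = 4
    STATE_CCH_PUBLISHED = 6
    STATE_HW_PUBLISHING = 10


class ProcessId(IntEnum):
    PackageUnpack = 1004
    RenderFOP = 1006


def meanings(code: int) -> list:
    """List every namespace in which the bare number has a meaning."""
    found = []
    for namespace in (DeliveryFileStatus, RecordState, ProcessId):
        try:
            found.append(f"{namespace.__name__}.{namespace(code).name}")
        except ValueError:
            pass  # the code is not defined in this namespace
    return found


print(meanings(10))
print(meanings(1004))
```

The value 10 resolves to both a delivery file status and a record state, which is why a diagram label such as "state = 10" needs its namespace spelled out.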
5. Missing Storage Locations
❌ Storage Duplication Not Shown
Should Show:
- Content stored in both issue-specific and entire database locations
- Issue-specific: `/opt/efs/cochrane/{issueId}/clsysrev/`
- Entire database: `/opt/efs/cochrane/clsysrev/entire/`
- Repository rendering: `/opt/efs/repository_rendering/`
6. Missing Parallel Processing Indicators
❌ Batch Processing Not Shown
Actual Implementation:
- JATS conversion: batch=5, capacity=4 (parallel processing)
- Rendering: batch=5, capacity=4 (parallel processing)
- Should show multiple records being processed simultaneously
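The batch settings above can be pictured with a small worker-pool sketch. It assumes that "batch=5" means five records are handed over per batch and that "capacity=4" means up to four batches are processed concurrently; both readings are assumptions, and `convert_batch` is a hypothetical placeholder for the real conversion or rendering work.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 5  # ASSUMPTION: "batch=5" = records handed over per batch
CAPACITY = 4    # ASSUMPTION: "capacity=4" = batches processed concurrently


def split_into_batches(records, size=BATCH_SIZE):
    """Cut the record list into consecutive batches of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def convert_batch(batch):
    """Hypothetical placeholder for converting one batch of records."""
    return [f"{record}.ml3g" for record in batch]


def convert_all(records):
    """Process all batches with at most CAPACITY batches in flight."""
    batches = split_into_batches(records)
    with ThreadPoolExecutor(max_workers=CAPACITY) as pool:
        results = pool.map(convert_batch, batches)  # map keeps input order
        return [item for batch in results for item in batch]


records = [f"CD{n:06d}" for n in range(12)]
print(len(split_into_batches(records)), "batches")
print(convert_all(records)[:3])
```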
7. Missing Error Handling
❌ Error Paths Not Complete
Should Show:
- Package deletion on failure (`delete-on-fail="true"`)
- Failed record handling
- Retry mechanisms
- Error notifications
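The error paths listed above could be drawn along the lines of the following sketch, which retries a processing step and deletes the package once all attempts fail. The function name, the retry count, and the reading of `delete-on-fail` as "remove the package file after the final failure" are assumptions made for illustration, not the codebase's confirmed behavior.

```python
from pathlib import Path


def process_with_cleanup(package: Path, process, max_attempts: int = 3,
                         delete_on_fail: bool = True) -> bool:
    """Run `process(package)`, retrying on error; clean up if it never succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            process(package)
            return True
        except Exception as error:
            # Stand-in for an error notification
            print(f"attempt {attempt} failed for {package.name}: {error}")
    # ASSUMPTION: delete-on-fail removes the package after the last failed attempt
    if delete_on_fail and package.exists():
        package.unlink()
    return False
```

Each branch in the function (retry, notify, delete, give up) corresponds to an error path the diagrams currently omit.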
8. Missing External Service Interactions
❌ Rendering Service Not Shown
Actual Implementation:
- Rendering is done by an external rendering service
- `RenderFOPHandler` initiates rendering via `RenderingHelper.startRendering()`
- Results received via `AcceptRenderQueue` (JMS message)
- Should show external service interaction
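The request/response pattern described above can be modeled with two in-process queues standing in for the JMS messaging. This is only a sketch of the interaction shape: `rendering_service`, `render_records`, and the message fields are hypothetical, and the real system uses an external service rather than a local thread.

```python
import queue
import threading


def rendering_service(requests: queue.Queue, accept_queue: queue.Queue) -> None:
    """Stand-in for the external renderer: read requests, post results."""
    while True:
        record = requests.get()
        if record is None:  # sentinel: shut the service down
            break
        accept_queue.put({"record": record, "status": "rendered"})


def render_records(records):
    """Send render requests, then collect results from the accept queue."""
    requests, accept_queue = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=rendering_service, args=(requests, accept_queue), daemon=True
    )
    worker.start()
    for record in records:  # "start rendering": fire and continue
        requests.put(record)
    # "accept rendering": wait for one result message per request
    results = [accept_queue.get(timeout=5) for _ in records]
    requests.put(None)
    worker.join()
    return results


print(render_records(["CD000001", "CD000002"]))
```

The point for the diagram is that starting a render and accepting its result are two separate steps connected by a message, not a single synchronous call.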
Flow Sequence Corrections
Current Diagram Flow (INCORRECT):
Download → Unpack → Validate → Convert → Validate ML3G → Render →
Compile WOL → Send WOLLIT
[Parallel: DS package, HW package, Notifications]
Correct Flow Should Be:
Download → Unpack → Convert JATS→ML3G → Validate ML3G → Render PDF →
Copy to Entire Database
[Separate/Optional: Publishing Workflow]
→ Generate Packages (DS/HW/WOLLIT) → Send Packages → Wait for Responses
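The corrected sequence can also be written down as two independent step lists, which makes the separation between main processing and publishing explicit. The step names come from the flow above; the function names and the per-target expansion are hypothetical.

```python
MAIN_FLOW = [
    "Download",
    "Unpack",
    "Convert JATS to ML3G",
    "Validate ML3G",
    "Render PDF",
    "Copy to Entire Database",
]

PUBLISHING_STEPS = ["Generate Package", "Send Package", "Wait for Response"]


def main_flow() -> list:
    """Steps that always run for an uploaded package."""
    return list(MAIN_FLOW)


def publishing_flow(targets=("DS", "HW", "WOLLIT")) -> list:
    """Steps of the separately triggered publishing workflow, per target."""
    return [f"{step} ({target})" for target in targets for step in PUBLISHING_STEPS]


print(" -> ".join(main_flow()))
print(" -> ".join(publishing_flow(targets=("HW",))))
```

Because `publishing_flow` is never called from `main_flow`, the sketch mirrors the finding that publishing is optional and triggered on its own.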
Recommended Diagram Improvements
1. Separate Main Flow from Publishing Flow
- Main Processing Flow: Download → Unpack → Convert → Validate → Render → Copy to Entire
- Publishing Flow: Separate diagram or clearly marked as optional/parallel
2. Add Missing Steps
- Copy to Entire Database (critical step)
- Backup Creation
- Metadata Extraction
- Stats Data Handling
3. Clarify State Codes
- Use consistent state code system
- Distinguish between:
  - Delivery File Statuses (IDeliveryFileStatus)
  - Record States (RecordEntity.STATE_*)
  - Process IDs (process.xml)
4. Show Process IDs and Handlers
- Add process IDs (114, 115, 116, 117, 120)
- Add handler class names
- Add queue names
5. Show Storage Locations
- Indicate where files are stored at each step
- Show duplication (issue-specific vs entire)
6. Show Parallel Processing
- Indicate batch processing
- Show multiple records in parallel
7. Show External Services
- Rendering service interaction
- External system interactions (HW, WOLLIT)
8. Improve Error Handling
- Show error paths
- Show retry mechanisms
- Show failure notifications
Specific Issues by Diagram
"New Review Flow" Diagram Issues:
- ❌ Shows "Compiling WOL package" in main flow (should be in publishing)
- ❌ Shows "Sending to WOLLIT" in main flow (should be optional publishing)
- ❌ Missing "Copy to Entire Database" step
- ❌ Publishing steps shown as sequential, not optional/parallel
- ❌ State codes don't clearly map to actual status codes
"Updated Review Flow" Diagram Issues:
- ❌ Same issues as "New Review Flow"
- ❌ Shows "Compiling package for HW" in main flow (should be in publishing)
- ❌ Missing "Copy to Entire Database" step
- ❌ Doesn't show that publishing is an optional, separate workflow
✅ What Diagrams Do Well
- ✅ Show parallel paths for DS, HW, and notifications
- ✅ Show waiting states for external responses
- ✅ Show reprocess/cancel mechanisms
- ✅ Show trigger types (scheduler, Process Manager, UI, JMS)
- ✅ Show main processing sequence correctly (up to rendering)
Summary
Overall Assessment: The diagrams capture the high-level flow reasonably well but have critical gaps and incorrect sequencing:
- Missing Critical Step: Copy to Entire Database (happens after rendering)
- Incorrect Sequencing: Publishing steps shown as part of main flow (they're separate/optional)
- Missing Details: Process IDs, handlers, queues, storage locations
- State Code Confusion: Mixed use of different state code systems
- Missing Parallel Processing: Batch processing not indicated
Recommendation:
- Separate main processing flow from publishing workflow
- Add missing "Copy to Entire Database" step
- Clarify that publishing is optional and separate
- Add process IDs, handlers, and storage locations
- Fix state code mappings