Pega Case Archival - dmcphail/rw-pega-knowledge GitHub Wiki

Pega Case Archival Setup Guide (InMemory Pipeline Method)

🧭 Overview

This guide provides step-by-step instructions for configuring case archival in Pega using the InMemory Pipeline archival method. This approach offers a high-performance, resilient, and simplified archival process by executing all stages within a single job.


🚀 Benefits of InMemory Pipeline Archival

  • Simplified Setup: A single job handles all archival stages—Crawler, Copier, Indexer, and Purger—eliminating the need for multiple job schedulers.
  • Improved Resiliency: Reduces database fragmentation by minimizing updates to the pr_metadata table and truncating it at the end of each cycle.
  • Enhanced Performance: InMemory processing replaces most accesses to the pr_metadata table, leading to faster archival operations.
  • Easier Adoption: Fewer parameters to configure, with most settings optimized by default.
  • Fail-Safe Mechanisms: If archival failures persist after several retries, the job exits gracefully and generates a PDC alert to prevent runaway scenarios.

📋 Prerequisites

  • Pega Platform Version: Compatible with Pega 8.7.3, 8.7.6, 8.8.3, and Pega Infinity '23.
  • Archival Licensing: Ensure that case archival is licensed and enabled in your environment.
  • External Storage Configuration: Set up an external repository (e.g., S3, Azure Blob, GCP Cloud Storage) for storing archived cases.
  • System Settings: Ensure background processing nodes are properly configured.
  • Access Role: Admin privileges or appropriate Dev Studio access are required.

⚙️ Configuration Steps

1. Disable Legacy Archival Jobs

To prevent conflicts, disable the following legacy job schedulers:

  • pyPegaArchiver
  • pyPegaIndexer
  • pyPegaPurger

2. Enable InMemory Pipeline Archival

Set the following Dynamic System Setting (DSS) in the Pega-Engine class:

  • dataarchival/batchPipelineEnabled = true

3. Schedule the InMemory Pipeline Job

Configure the pyPegaArchiverUsingPipeline job scheduler:

  • Start Time: Set to run during off-peak hours.
  • Pipeline Duration: Specify the duration (in minutes) for each archival run.
  • Frequency: Schedule the job to run regularly based on your archival needs.

4. Adjust Performance Parameters (Optional)

If needed, tune archival throughput with the following Dynamic System Settings:

  • maxCrawlerRequestors
  • maxCopierRequestors

✅ Testing Case Archival

🔎 Key Classes and Reports

| Class | Description |
|---|---|
| Data-Retention-Policy | Contains the policy criteria for archival; configured under Case Type > Settings |
| Log-ArchivalSummary | Runtime metadata from archival jobs; query with the pyInstanceList report definition |

Key properties to monitor:

  • .pyTaskName, .pyTaskStartTime, .pyTaskEndTime
  • .pyCaseType, .pyDuration, .pyCasesProcessed, .pyCasesUnsuccessful
  • .pyRecordsProcessed, .pyRecordsUnsuccessful
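As a quick sanity check outside Pega, rows exported from the pyInstanceList report can be summarized with a short script. This is a sketch only: the CSV export format and column layout are assumptions, but the column names mirror the properties listed above.

```python
import csv
import io

def summarize_archival(csv_text: str) -> dict:
    """Summarize an exported Log-ArchivalSummary report.

    Assumes a CSV export whose headers match the property names
    above (pyTaskName, pyCasesProcessed, pyCasesUnsuccessful).
    """
    totals = {"processed": 0, "unsuccessful": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals["processed"] += int(row["pyCasesProcessed"])
        totals["unsuccessful"] += int(row["pyCasesUnsuccessful"])
    # Fraction of processed cases that archived successfully.
    totals["success_rate"] = (
        (totals["processed"] - totals["unsuccessful"]) / totals["processed"]
        if totals["processed"] else 1.0
    )
    return totals

sample = """pyTaskName,pyCasesProcessed,pyCasesUnsuccessful
Crawler,100,2
Copier,98,0
"""
print(summarize_archival(sample))
```

A nonzero unsuccessful count is the cue to inspect the corresponding task's logs before widening the retention policy.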

🧪 Refined Steps to Test Case Archival

  1. Create a dedicated branch ruleset and include it in the application stack.
  2. Check out the case type rule to be archived.
  3. Enable archival under Case Type → Settings → Archival, and set the retention period in minutes so test cases qualify quickly.
  4. Check in the case type changes.
  5. Check out pyPegaArchiverUsingPipeline (Job Scheduler) and set its run interval to a few minutes for testing.
  6. Check in scheduler changes and deploy branch.
  7. Create and resolve test cases.
  8. Run Log-ArchivalSummary.pyInstanceList to confirm archival status.
  9. Verify results:
    • Case is removed from main table.
    • Repository path: repository/archive/archivedclasses/{class}/YYYY/MM/DD/ArchivalFile*.zip
  10. (Optional): Restore a case to confirm end-to-end flow.
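When verifying step 9, it helps to compute the repository folder you expect for a given case class and run date. The helper below is a sketch that only encodes the path convention shown above; the class name used here is hypothetical, and the zip file suffix varies, so match it with a wildcard when listing the repository.

```python
from datetime import date

def expected_archive_prefix(case_class: str, run_date: date) -> str:
    """Build the expected repository folder for archived cases,
    following repository/archive/archivedclasses/{class}/YYYY/MM/DD/.
    The archive itself is named ArchivalFile*.zip within this folder.
    """
    return (
        f"repository/archive/archivedclasses/{case_class}/"
        f"{run_date:%Y/%m/%d}/"
    )

# Hypothetical case class, for illustration only.
print(expected_archive_prefix("MyOrg-MyApp-Work-Loan", date(2024, 3, 7)))
# repository/archive/archivedclasses/MyOrg-MyApp-Work-Loan/2024/03/07/
```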

▶️ Manual Execution of Archival Pipeline

Activity: Data-ArchivalMetadata.pzPerformArchiveUsingPipeline

Use this to trigger archival on-demand.

| Parameter | Type | Description |
|---|---|---|
| Pipelineduration | Integer (minutes) | How long to run the archival pipeline (e.g., 5) |
| Sleepduration | Integer (ms) | Delay between cycles (e.g., 5000 for 5 seconds) |
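The two parameters interact as a simple timed loop: cycles repeat until Pipelineduration minutes elapse, with a Sleepduration pause between cycles. The sketch below models that semantics only; it is not Pega's implementation, and the clock is injected so the behavior can be checked without real sleeps.

```python
def run_pipeline(pipeline_duration_min: int, sleep_duration_ms: int,
                 run_cycle, clock) -> int:
    """Toy model of the pipeline loop: repeat archival cycles until
    the configured duration elapses, pausing between cycles.
    `run_cycle` and `clock` are injected for testability."""
    deadline = clock.now_ms() + pipeline_duration_min * 60_000
    cycles = 0
    while clock.now_ms() < deadline:
        run_cycle()          # crawl -> copy -> index -> purge
        cycles += 1
        clock.sleep_ms(sleep_duration_ms)
    return cycles

class FakeClock:
    """Deterministic clock: 'sleeping' just advances time."""
    def __init__(self):
        self.t = 0
    def now_ms(self):
        return self.t
    def sleep_ms(self, ms):
        self.t += ms

# With a 5-minute duration and 5-second sleeps (and instant cycles),
# the loop runs 60 times.
print(run_pipeline(5, 5000, run_cycle=lambda: None, clock=FakeClock()))
```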

🗑️ Data Expunging and Retention Policies

Pega supports configuring data retention and expunging policies to manage how long archived cases are stored before permanent deletion.

🔧 How to Configure Expunging Policies

  • Retention Period: Defined in Case Type settings.
  • Expunge Timeline: Specify a secondary timeline to delete archived cases after a defined number of days.
  • Setup Location: Access via Case Type → Settings → Archival
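The retention and expunge timelines compose: a resolved case first waits out its retention period, is archived, and the archived copy is later deleted after the expunge period. A worked example of that arithmetic follows; the 90-day and 365-day values are illustrative, not Pega defaults.

```python
from datetime import date, timedelta

def lifecycle_dates(resolved_on: date, retention_days: int,
                    expunge_after_days: int):
    """Illustrative timeline: archival eligibility after the
    retention period, deletion from the repository after the
    expunge period."""
    archive_on = resolved_on + timedelta(days=retention_days)
    expunge_on = archive_on + timedelta(days=expunge_after_days)
    return archive_on, expunge_on

# A case resolved on 2024-01-01 with 90-day retention and a
# 365-day expunge timeline:
a, e = lifecycle_dates(date(2024, 1, 1), 90, 365)
print(a, e)  # 2024-03-31 2025-03-31
```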

📌 Key Notes

  • Archived content is deleted from the repository when expunge triggers.
  • Logs and metrics are generated for tracking.
  • Default behavior can be customized via policies.

🛡️ Compliance Best Practices

  • Align policies with regulations like GDPR, CCPA, and HIPAA.
  • Document retention/expunging timelines for audit.
  • Periodically validate enforcement of retention rules.

🔄 Restoring Archived Cases

  1. Set DSS: archival/enableRestore = true
  2. Use Restore in Admin or Dev Studio.
  3. (Optional) Schedule Restore-Case job for automated restores.

🛠️ Troubleshooting

| Issue | Possible Cause | Resolution |
|---|---|---|
| Permission denied error on storage | Misconfigured storage repository | Validate repository credentials and access policies. |
| Cases not being archived | Job scheduler misconfigured | Check retention rules and job settings. |
| Archive data missing | Incorrect class mapping or serialization | Verify case type setup and check tracer logs. |
| Restore fails silently | Restore DSS not enabled or archive corrupted | Enable archival/enableRestore; validate logs. |
| Runaway archival process | Repeated failures without job halt | Check PDC alerts; manually stop or fix the root cause. |
