Pega Case Archival - dmcphail/rw-pega-knowledge GitHub Wiki
Pega Case Archival Setup Guide (InMemory Pipeline Method)
🧭 Overview
This guide provides step-by-step instructions for configuring case archival in Pega using the InMemory Pipeline archival method. This approach offers a high-performance, resilient, and simplified archival process by executing all stages within a single job.
🚀 Benefits of InMemory Pipeline Archival
- Simplified Setup: A single job handles all archival stages—Crawler, Copier, Indexer, and Purger—eliminating the need for multiple job schedulers.
- Improved Resiliency: Reduces database fragmentation by minimizing updates to the `pr_metadata` table and truncating it at the end of each cycle.
- Enhanced Performance: InMemory processing replaces most accesses to the `pr_metadata` table, leading to faster archival operations.
- Easier Adoption: Fewer parameters to configure, with most settings optimized by default.
- Fail-Safe Mechanisms: If archival failures persist after several retries, the job exits gracefully and generates a PDC alert to prevent runaway scenarios.
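To make the single-job and fail-safe ideas concrete, here is a minimal Python sketch of a pipeline that runs its stages in order and stops with an alert message once a stage has failed a fixed number of times. It is an illustration only, not Pega's implementation: `run_pipeline`, `PipelineResult`, and `max_retries` are hypothetical names, and the retry count of 3 is an assumed value.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PipelineResult:
    completed_stages: List[str] = field(default_factory=list)
    alert: str = ""  # non-empty when the fail-safe tripped


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]],
                 max_retries: int = 3) -> PipelineResult:
    """Run stages in order; stop and record an alert if a stage keeps failing."""
    result = PipelineResult()
    for name, stage in stages:
        last_error = None
        for _attempt in range(max_retries):
            try:
                stage()
                result.completed_stages.append(name)
                break
            except Exception as exc:  # illustrative catch-all
                last_error = exc
        else:
            # Every attempt failed: exit gracefully instead of looping forever.
            result.alert = f"{name} failed after {max_retries} attempts: {last_error}"
            return result
    return result
```

Because all stages run inside one call, a failure in an early stage prevents later stages (for example, the Purger) from running on incomplete work.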
📋 Prerequisites
- Pega Platform Version: Compatible with Pega 8.7.3, 8.7.6, 8.8.3, and Pega Infinity '23.
- Archival Licensing: Ensure that case archival is licensed and enabled in your environment.
- External Storage Configuration: Set up an external repository (e.g., S3, Azure Blob, GCP Cloud Storage) for storing archived cases.
- System Settings: Ensure background processing nodes are properly configured.
- Access Role: Admin privileges or appropriate Dev Studio access are required.
⚙️ Configuration Steps
1. Disable Legacy Archival Jobs
To prevent conflicts, disable the following legacy job schedulers:
- `pyPegaArchiver`
- `pyPegaIndexer`
- `pyPegaPurger`
2. Enable InMemory Pipeline Archival
Set the following Dynamic System Setting (DSS), with `Pega-Engine` as the owning ruleset:
`dataarchival/batchPipelineEnabled` = `true`
3. Schedule the InMemory Pipeline Job
Configure the `pyPegaArchiverUsingPipeline` job scheduler:
- Start Time: Set to run during off-peak hours.
- Pipeline Duration: Specify the duration (in minutes) for each archival run.
- Frequency: Schedule the job to run regularly based on your archival needs.
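When choosing a start time and pipeline duration, it can help to check that a run finishes before peak hours begin. The following Python sketch does that arithmetic. The function name `fits_before_peak` and the idea of a single daily peak start time are assumptions made for illustration; they are not Pega settings.

```python
from datetime import date, datetime, time, timedelta


def fits_before_peak(start: time, duration_minutes: int, peak_start: time) -> bool:
    """Return True if a run starting at `start` ends before the next peak_start."""
    day = date(2000, 1, 1)  # arbitrary anchor date; only the times matter
    begin = datetime.combine(day, start)
    end = begin + timedelta(minutes=duration_minutes)
    peak = datetime.combine(day, peak_start)
    if peak <= begin:
        peak += timedelta(days=1)  # the next peak falls on the following day
    return end <= peak
```

For example, `fits_before_peak(time(1, 0), 120, time(7, 0))` is `True`, while a 180-minute run starting at 05:00 would overrun a 07:00 peak.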
4. Adjust Performance Parameters (Optional)
The following parameters can be adjusted if the default values do not suit your workload:
- `maxCrawlerRequestors`
- `maxCopierRequestors`
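Steps 1 to 3 above can be summarized as a pre-flight checklist. The Python sketch below checks a hand-collected description of an environment against those steps. It does not talk to Pega: the inputs (`enabled_jobs`, `dss`) are hypothetical data you would fill in yourself, and only the job scheduler and DSS names come from this guide.

```python
LEGACY_JOBS = ("pyPegaArchiver", "pyPegaIndexer", "pyPegaPurger")
PIPELINE_JOB = "pyPegaArchiverUsingPipeline"
PIPELINE_DSS = "dataarchival/batchPipelineEnabled"


def preflight_problems(enabled_jobs, dss):
    """Return a list of problems; an empty list means the checks passed.

    enabled_jobs: set of job scheduler names currently enabled (hypothetical input)
    dss: mapping of DSS name to its value (hypothetical input)
    """
    problems = []
    for job in LEGACY_JOBS:
        if job in enabled_jobs:
            problems.append(f"legacy job scheduler still enabled: {job}")
    if str(dss.get(PIPELINE_DSS, "")).lower() != "true":
        problems.append(f"DSS {PIPELINE_DSS} is not set to true")
    if PIPELINE_JOB not in enabled_jobs:
        problems.append(f"job scheduler not enabled: {PIPELINE_JOB}")
    return problems
```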
✅ Testing Case Archival
🔎 Key Classes and Reports
Class | Description |
---|---|
`Data-Retention-Policy` | Contains policy criteria for archival; configured in Case Type > Settings |
`Log-ArchivalSummary` | Runtime metadata from archival jobs; use the `pyInstanceList` report definition |
Key properties to monitor:
- `.pyTaskName`, `.pyTaskStartTime`, `.pyTaskEndTime`
- `.pyCaseType`, `.pyDuration`, `.pyCasesProcessed`, `.pyCasesUnsuccessful`
- `.pyRecordsProcessed`, `.pyRecordsUnsuccessful`
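If you export the `Log-ArchivalSummary` report results, a short script can highlight which tasks reported failures. The Python sketch below assumes each exported row is a dictionary keyed by the property names listed above (without the leading dot). The export format and the function name `failing_tasks` are assumptions, not part of Pega.

```python
def failing_tasks(rows):
    """Return {task name: (unsuccessful cases, unsuccessful records)} for tasks with failures."""
    totals = {}
    for row in rows:
        cases = int(row.get("pyCasesUnsuccessful", 0) or 0)
        records = int(row.get("pyRecordsUnsuccessful", 0) or 0)
        if cases or records:
            prev_cases, prev_records = totals.get(row["pyTaskName"], (0, 0))
            totals[row["pyTaskName"]] = (prev_cases + cases, prev_records + records)
    return totals
```

Tasks with no unsuccessful cases or records are omitted from the result, so an empty dictionary indicates a clean set of runs.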
🧪 Refined Steps to Test Case Archival
- Create a dedicated branch ruleset and include it in the application stack.
- Check out the case type rule to be archived.
- Enable archival under Case Type → Settings → Archival, set retention in minutes.
- Check in the case type changes.
- Check out and modify `pyPegaArchiverUsingPipeline` (Job Scheduler), set interval to minutes.
- Check in scheduler changes and deploy branch.
- Create and resolve test cases.
- Run `Log-ArchivalSummary.pyInstanceList` to confirm archival status.
- Verify results:
  - Case is removed from main table.
  - Repository path: `repository/archive/archivedclasses/{class}/YYYY/MM/DD/ArchivalFile*.zip`
- (Optional): Restore a case to confirm end-to-end flow.
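When verifying the repository, the path template from the steps above can be turned into a pattern and matched against a file listing. The Python sketch below does this with the standard library. The function names and the example class name are hypothetical, and this guide does not state which date the `YYYY/MM/DD` segments refer to, so treat the `day` argument as an assumption to confirm in your environment.

```python
from datetime import date
from fnmatch import fnmatchcase


def archive_pattern(case_class: str, day: date) -> str:
    """Build the wildcard pattern for archive files of one class on one day."""
    return (
        f"repository/archive/archivedclasses/{case_class}/"
        f"{day:%Y/%m/%d}/ArchivalFile*.zip"
    )


def find_archives(paths, case_class: str, day: date):
    """Return the paths from a repository listing that match the pattern."""
    pattern = archive_pattern(case_class, day)
    return [p for p in paths if fnmatchcase(p, pattern)]
```

For a hypothetical class `Org-App-Work-Claim` and `date(2024, 3, 5)`, the pattern ends in `Org-App-Work-Claim/2024/03/05/ArchivalFile*.zip`.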
▶️ Manual Execution of Archival Pipeline
Activity: Data-ArchivalMetadata.pzPerformArchiveUsingPipeline
Use this to trigger archival on-demand.
Parameter | Type | Description |
---|---|---|
`Pipelineduration` | Integer (min) | How long to run the archival pipeline (e.g., `5`) |
`Sleepduration` | Integer (ms) | Delay between cycles (e.g., `5000` for 5 seconds) |
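One way to picture how the two parameters interact is a loop that repeats archival cycles until the pipeline duration has elapsed, pausing for the sleep duration after each cycle. The Python sketch below models that reading. It is an assumption about the behavior, not the activity's actual logic, and `run_cycles`, `cycle`, `clock`, and `sleep` are hypothetical names.

```python
import time


def run_cycles(cycle, pipeline_duration_min, sleep_duration_ms,
               clock=time.monotonic, sleep=time.sleep):
    """Call `cycle` repeatedly until the duration elapses; return the cycle count."""
    deadline = clock() + pipeline_duration_min * 60  # minutes -> seconds
    cycles = 0
    while clock() < deadline:
        cycle()
        cycles += 1
        sleep(sleep_duration_ms / 1000)  # milliseconds -> seconds
    return cycles
```

The `clock` and `sleep` arguments are injectable so the loop can be exercised with a fake clock instead of waiting in real time.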
🗑️ Data Expunging and Retention Policies
Pega supports configuring data retention and expunging policies to manage how long archived cases are stored before permanent deletion.
🔧 How to Configure Expunging Policies
- Retention Period: Defined in Case Type settings.
- Expunge Timeline: Specify a secondary timeline to delete archived cases after a defined number of days.
- Setup Location: Access via Case Type → Settings → Archival
📌 Key Notes
- Archived content is deleted from the repository when expunge triggers.
- Logs and metrics are generated for tracking.
- Default behavior can be customized via policies.
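The two timelines can be worked through with simple date arithmetic. The Python sketch below assumes the retention period is counted in days from case resolution and the expunge timeline is counted in days from the point a case becomes eligible for archival. This guide does not specify either reference point, so verify both assumptions against your policy configuration. `lifecycle_dates` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def lifecycle_dates(resolved_at: datetime, retention_days: int, expunge_days: int):
    """Return (archive_eligible_at, expunge_eligible_at) under the stated assumptions."""
    archive_at = resolved_at + timedelta(days=retention_days)
    expunge_at = archive_at + timedelta(days=expunge_days)
    return archive_at, expunge_at
```

Documenting the computed dates for a few sample cases is one way to support the audit trail described under compliance.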
🛡️ Compliance Best Practices
- Align policies with regulations like GDPR, CCPA, HIPAA.
- Document retention/expunging timelines for audit.
- Periodically validate enforcement of retention rules.
🔄 Restoring Archived Cases
- Set DSS: `archival/enableRestore = true`
- Use Restore in Admin or Dev Studio.
- (Optional) Schedule the `Restore-Case` job for automated restores.
🛠️ Troubleshooting
Issue | Possible Cause | Resolution |
---|---|---|
Permission denied error on storage | Misconfigured storage repository | Validate repository credentials and access policies. |
Cases not being archived | Job scheduler misconfigured | Check retention rules and job settings. |
Archive data missing | Incorrect class mapping or serialization | Verify case type setup and check tracer logs. |
Restore fails silently | Restore DSS not enabled or archive corrupted | Enable `archival/enableRestore`; validate logs. |
Runaway archival process | Repeated failures without job halt | Check PDC alerts; manually stop or fix root cause. |