Pega Case Archival - dmcphail/rw-pega-knowledge GitHub Wiki

Pega Case Archival Setup Guide (InMemory Pipeline Method)

🧭 Overview

This guide provides step-by-step instructions for configuring case archival in Pega using the InMemory Pipeline archival method. This approach offers a high-performance, resilient, and simplified archival process by executing all stages within a single job.


🚀 Benefits of InMemory Pipeline Archival

  • Simplified Setup: A single job handles all archival stages—Crawler, Copier, Indexer, and Purger—eliminating the need for multiple job schedulers.
  • Improved Resiliency: Reduces database fragmentation by minimizing updates to the pr_metadata table and truncating it at the end of each cycle.
  • Enhanced Performance: InMemory processing replaces most accesses to the pr_metadata table, leading to faster archival operations.
  • Easier Adoption: Fewer parameters to configure, with most settings optimized by default.
  • Fail-Safe Mechanisms: If archival failures persist after several retries, the job exits gracefully and generates a PDC alert to prevent runaway scenarios.

📋 Prerequisites

  • Pega Platform Version: Compatible with Pega 8.7.3, 8.7.6, 8.8.3, and Pega Infinity '23.
  • Archival Licensing: Ensure that case archival is licensed and enabled in your environment.
  • External Storage Configuration: Set up an external repository (e.g., S3, Azure Blob, GCP Cloud Storage) for storing archived cases.
  • System Settings: Ensure background processing nodes are properly configured.
  • Access Role: Admin privileges or appropriate Dev Studio access are required.

⚙️ Configuration Steps

1. Disable Legacy Archival Jobs

To prevent conflicts, disable the following legacy job schedulers:

  • pyPegaArchiver
  • pyPegaIndexer
  • pyPegaPurger

2. Enable InMemory Pipeline Archival

Set the following Dynamic System Setting (DSS) in the Pega-Engine class:

  • dataarchival/batchPipelineEnabled = true

3. Schedule the InMemory Pipeline Job

Configure the pyPegaArchiverUsingPipeline job scheduler:

  • Start Time: Set to run during off-peak hours.
  • Pipeline Duration: Specify the duration (in minutes) for each archival run.
  • Frequency: Schedule the job to run regularly based on your archival needs.

4. Adjust Performance Parameters (Optional)

If needed, tune archival throughput with the following Dynamic System Settings:

  • maxCrawlerRequestors
  • maxCopierRequestors

✅ Testing Case Archival

🔎 Key Classes and Reports

| Class | Description |
|---|---|
| Data-Retention-Policy | Contains the policy criteria for archival; configured under Case Type > Settings |
| Log-ArchivalSummary | Runtime metadata from archival jobs; query with the pyInstanceList report definition |

Key properties to monitor:

  • .pyTaskName, .pyTaskStartTime, .pyTaskEndTime
  • .pyCaseType, .pyDuration, .pyCasesProcessed, .pyCasesUnsuccessful
  • .pyRecordsProcessed, .pyRecordsUnsuccessful
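As a quick sanity check outside Pega, rows exported from the pyInstanceList report can be summarized with a short script. This is a sketch only: the CSV export format and column layout are assumptions, but the column names mirror the properties listed above.

```python
import csv
import io

def summarize_archival(csv_text: str) -> dict:
    """Summarize an exported Log-ArchivalSummary report.

    Assumes a CSV export whose headers match the property names
    above (pyTaskName, pyCasesProcessed, pyCasesUnsuccessful).
    """
    totals = {"processed": 0, "unsuccessful": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals["processed"] += int(row["pyCasesProcessed"])
        totals["unsuccessful"] += int(row["pyCasesUnsuccessful"])
    # Fraction of processed cases that archived successfully.
    totals["success_rate"] = (
        (totals["processed"] - totals["unsuccessful"]) / totals["processed"]
        if totals["processed"] else 1.0
    )
    return totals

sample = """pyTaskName,pyCasesProcessed,pyCasesUnsuccessful
Crawler,100,2
Copier,98,0
"""
print(summarize_archival(sample))
```

A nonzero unsuccessful count is the cue to inspect the corresponding task's logs before widening the retention policy.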

🧪 Refined Steps to Test Case Archival

  1. Create a dedicated branch ruleset and include it in the application stack.
  2. Check out the case type rule to be archived.
  3. Enable archival under Case Type → Settings → Archival, and set the retention period in minutes so test cases qualify quickly.
  4. Check in the case type changes.
  5. Check out pyPegaArchiverUsingPipeline (Job Scheduler) and set its run interval to a few minutes for testing.
  6. Check in scheduler changes and deploy branch.
  7. Create and resolve test cases.
  8. Run Log-ArchivalSummary.pyInstanceList to confirm archival status.
  9. Verify results:
    • Case is removed from main table.
    • Repository path: repository/archive/archivedclasses/{class}/YYYY/MM/DD/ArchivalFile*.zip
  10. (Optional): Restore a case to confirm end-to-end flow.
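When verifying step 9, it helps to compute the repository folder you expect for a given case class and run date. The helper below is a sketch that only encodes the path convention shown above; the class name used here is hypothetical, and the zip file suffix varies, so match it with a wildcard when listing the repository.

```python
from datetime import date

def expected_archive_prefix(case_class: str, run_date: date) -> str:
    """Build the expected repository folder for archived cases,
    following repository/archive/archivedclasses/{class}/YYYY/MM/DD/.
    The archive itself is named ArchivalFile*.zip within this folder.
    """
    return (
        f"repository/archive/archivedclasses/{case_class}/"
        f"{run_date:%Y/%m/%d}/"
    )

# Hypothetical case class, for illustration only.
print(expected_archive_prefix("MyOrg-MyApp-Work-Loan", date(2024, 3, 7)))
# repository/archive/archivedclasses/MyOrg-MyApp-Work-Loan/2024/03/07/
```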

▶️ Manual Execution of Archival Pipeline

Activity: Data-ArchivalMetadata.pzPerformArchiveUsingPipeline

Use this to trigger archival on-demand.

| Parameter | Type | Description |
|---|---|---|
| Pipelineduration | Integer (minutes) | How long to run the archival pipeline (e.g., 5) |
| Sleepduration | Integer (ms) | Delay between cycles (e.g., 5000 for 5 seconds) |
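The two parameters interact as a simple timed loop: cycles repeat until Pipelineduration minutes elapse, with a Sleepduration pause between cycles. The sketch below models that semantics only; it is not Pega's implementation, and the clock is injected so the behavior can be checked without real sleeps.

```python
def run_pipeline(pipeline_duration_min: int, sleep_duration_ms: int,
                 run_cycle, clock) -> int:
    """Toy model of the pipeline loop: repeat archival cycles until
    the configured duration elapses, pausing between cycles.
    `run_cycle` and `clock` are injected for testability."""
    deadline = clock.now_ms() + pipeline_duration_min * 60_000
    cycles = 0
    while clock.now_ms() < deadline:
        run_cycle()          # crawl -> copy -> index -> purge
        cycles += 1
        clock.sleep_ms(sleep_duration_ms)
    return cycles

class FakeClock:
    """Deterministic clock: 'sleeping' just advances time."""
    def __init__(self):
        self.t = 0
    def now_ms(self):
        return self.t
    def sleep_ms(self, ms):
        self.t += ms

# With a 5-minute duration and 5-second sleeps (and instant cycles),
# the loop runs 60 times.
print(run_pipeline(5, 5000, run_cycle=lambda: None, clock=FakeClock()))
```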

🗑️ Data Expunging and Retention Policies

Pega supports configuring data retention and expunging policies to manage how long archived cases are stored before permanent deletion.

🔧 How to Configure Expunging Policies

  • Retention Period: Defined in Case Type settings.
  • Expunge Timeline: Specify a secondary timeline to delete archived cases after a defined number of days.
  • Setup Location: Access via Case Type → Settings → Archival
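The retention and expunge timelines compose: a resolved case first waits out its retention period, is archived, and the archived copy is later deleted after the expunge period. A worked example of that arithmetic follows; the 90-day and 365-day values are illustrative, not Pega defaults.

```python
from datetime import date, timedelta

def lifecycle_dates(resolved_on: date, retention_days: int,
                    expunge_after_days: int):
    """Illustrative timeline: archival eligibility after the
    retention period, deletion from the repository after the
    expunge period."""
    archive_on = resolved_on + timedelta(days=retention_days)
    expunge_on = archive_on + timedelta(days=expunge_after_days)
    return archive_on, expunge_on

# A case resolved on 2024-01-01 with 90-day retention and a
# 365-day expunge timeline:
a, e = lifecycle_dates(date(2024, 1, 1), 90, 365)
print(a, e)  # 2024-03-31 2025-03-31
```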

📌 Key Notes

  • Archived content is deleted from the repository when expunge triggers.
  • Logs and metrics are generated for tracking.
  • Default behavior can be customized via policies.

🛡️ Compliance Best Practices

  • Align policies with regulations like GDPR, CCPA, and HIPAA.
  • Document retention/expunging timelines for audit.
  • Periodically validate enforcement of retention rules.

🔄 Restoring Archived Cases

  1. Set DSS: archival/enableRestore = true
  2. Use Restore in Admin or Dev Studio.
  3. (Optional) Schedule Restore-Case job for automated restores.

🛠️ Troubleshooting

| Issue | Possible Cause | Resolution |
|---|---|---|
| Permission denied error on storage | Misconfigured storage repository | Validate repository credentials and access policies. |
| Cases not being archived | Job scheduler misconfigured | Check retention rules and job settings. |
| Archive data missing | Incorrect class mapping or serialization | Verify case type setup and check tracer logs. |
| Restore fails silently | Restore DSS not enabled or archive corrupted | Enable archival/enableRestore; validate logs. |
| Runaway archival process | Repeated failures without job halt | Check PDC alerts; manually stop or fix the root cause. |
