Business continuity and disaster recovery policy

Overview

The Business Continuity and Disaster Recovery (BCDR) policy establishes procedures that enable Nevercode to restore business operations promptly after disruptions such as cyber incidents, system failures, or other unforeseen events. This policy is maintained by the Nevercode Security Officer (Mar - [email protected]) and Privacy Officer (Martin Remmelgas - [email protected]).

Policy Statements

BCDR procedures are maintained and updated at least annually, or following major changes or events.
The BCDR plan is reviewed and tested at least once per year. Each test must be documented so that areas of improvement to the plan can be identified.
The BCDR plan must be executed in alignment with existing security controls and procedures to prevent secondary security incidents. In critical situations, exceptions may be necessary to restore service, provided they are documented and reviewed post-incident.
Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) are determined and reviewed annually during the Disaster Recovery Testing period to ensure alignment with business requirements and technical capabilities.

Controls and Procedures

BCDR Objectives

The BCDR plan has the following high-level objectives:

Identify the impact of a disruption on company resources, systems, and operations.
Identify the activities and procedures needed to keep the business operating during the disruption.
Assign recovery and documentation responsibilities to relevant company personnel.
Ensure all affected customers and clients are kept up-to-date with the recovery process.
Ensure that the recovery processes meet the defined Recovery Time Objective (RTO) and Recovery Point Objective (RPO) as determined during the annual Disaster Recovery Testing period.
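
To make the last objective concrete, the following is a minimal sketch of how a test or incident could be checked against the RTO and RPO. It assumes the RTO is measured from outage start to service restoration and the RPO from the last good backup to outage start. The function name, field names, target values, and timestamps are all hypothetical and are not Nevercode's actual figures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable window of data loss


def evaluate_recovery(objectives, outage_start, service_restored, last_good_backup):
    """Compare measured downtime and data-loss window against RTO and RPO."""
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical targets and timestamps, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = evaluate_recovery(
    objectives,
    outage_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 12, 30),
    last_good_backup=datetime(2024, 1, 10, 8, 15),
)
print(result["rto_met"], result["rpo_met"])  # True True
```

Recording the measured downtime and data-loss window alongside the pass/fail flags gives the annual review concrete numbers to compare against the targets.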

BCDR Plan

Notification/Activation Phase:

a. Incident Detection:
- Implement comprehensive monitoring systems to detect anomalies in real-time, including network activity, system performance, and application behavior.
b. Assessment:
- Once an incident is detected, immediately convene a response team to assess the severity and potential impact and to categorize the incident type (e.g., hardware failure, cyber-attack, natural disaster).
- Communicate the incident assessment to all relevant stakeholders.
c. Plan Activation:
- Based on the assessment, activate the appropriate disaster recovery procedure, specifying roles and responsibilities, and initiate the response protocol.
- Notify customers, if necessary, based on the incident's impact, adhering to any legal or regulatory requirements regarding breach notification.
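
As one illustration of the Incident Detection step above, the sketch below flags a metric sample that deviates sharply from its recent baseline. It is a simple z-score check, not a description of the monitoring Nevercode actually runs. The function name, the threshold of three standard deviations, and the sample values are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` lies more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat baseline: any change is a deviation
    return abs(latest - mu) / sigma > threshold


# Hypothetical response-time samples in milliseconds.
baseline = [120, 118, 125, 122, 119, 121, 123, 120]
print(is_anomalous(baseline, 124))  # False
print(is_anomalous(baseline, 480))  # True
```

A real deployment would normally rely on the alerting rules of an existing monitoring system. The point here is only that detection needs a defined baseline and an explicit threshold.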

Recovery Phase:

a. Establish Temporary IT Operations:
- Redirect network traffic to backup systems or utilize cloud-based services to maintain business operations.
- Retrieve data from off-site backups or cloud storage to facilitate temporary operations.
- Establish a temporary helpdesk or communication channel for users experiencing issues.
b. Damage Recovery:
- Initiate damage control measures to prevent further loss, such as isolating affected systems, applying emergency patches, or shutting down specific services.
- Start system restoration processes using backups, ensuring data integrity and security.
- Continuously communicate progress updates to stakeholders and customers as appropriate.
- Prioritize system restoration based on the predefined RTO and RPO, focusing on critical systems and data first.
- Continuously monitor the recovery progress against the RTO and RPO, and adjust the recovery strategy if necessary to meet the objectives.
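
The prioritization and monitoring steps above can be sketched as follows, assuming each system has its own RTO and that a tighter RTO means higher restoration priority. The system names, RTO values, and the 30-minute warning margin are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class System:
    name: str
    rto: timedelta
    restored: bool = False


def restoration_order(systems):
    """Order unrestored systems so the tightest RTO is handled first."""
    return sorted((s for s in systems if not s.restored), key=lambda s: s.rto)


def at_risk(systems, elapsed, margin=timedelta(minutes=30)):
    """Names of unrestored systems whose RTO has passed or expires within `margin`."""
    return [s.name for s in systems if not s.restored and elapsed + margin >= s.rto]


# Hypothetical inventory, for illustration only.
systems = [
    System("docs-site", timedelta(hours=24)),
    System("build-api", timedelta(hours=2)),
    System("billing", timedelta(hours=8)),
]
print([s.name for s in restoration_order(systems)])
# ['build-api', 'billing', 'docs-site']
print(at_risk(systems, elapsed=timedelta(hours=1, minutes=45)))
# ['build-api']
```

Calling `at_risk` periodically during recovery gives the response team an early signal to adjust the strategy before an objective is missed.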

Reconstitution Phase:

a. System Restoration:
- Once the disruption is mitigated, begin moving services from temporary operations back to the primary systems and restore full service.
- Conduct a thorough system audit and verification to ensure all systems are secure and fully operational.
- Confirm the integrity and confidentiality of data, ensuring no unauthorized alterations were made.
b. Return to Normal Operations:
- Formally document the return to normal operations and communicate this to all stakeholders.
- Gradually phase out temporary measures and ensure users experience a smooth transition back to regular operations.
c. Post-Incident Review:
- Conduct a post-mortem analysis to understand the root cause of the incident, the effectiveness of the BCDR plan, and areas that require improvement.
- Update the BCDR plan based on lessons learned and improve training, resources, and protocols as necessary.
- Share relevant incident details and lessons learned with stakeholders and, where appropriate, the broader community to help prevent future incidents.
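
For the data integrity confirmation under System Restoration, one common technique is to compare restored files against checksums recorded before the incident. The sketch below assumes a manifest that maps relative file paths to SHA-256 digests exists from a known-good state. The manifest format and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_altered(manifest, root):
    """Return the relative paths that are missing under `root` or whose
    digest differs from the one recorded in `manifest`."""
    altered = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            altered.append(rel_path)
    return sorted(altered)
```

An empty result from `find_altered` shows only that the listed files match the manifest. It does not detect files added after the manifest was written, and the manifest itself has to be stored where an attacker could not have modified it.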