Business continuity & disaster recovery plan

Overview

Business continuity and disaster recovery is a process that enables an organization to proactively identify the risks that could affect its objectives, operations and infrastructure, and to plan to minimize their impact, so that it is prepared for both expected and unexpected situations.

Business continuity scenarios

Functional scenarios

  • JIRA users access the plugin

  • Users interact with the plugin and its configuration

  • In the case of DEISER’s Exporter for JIRA, the user can download a file with the exported data

Business continuity organization

To coordinate the efforts to resolve any incident and ensure the continuity of the systems that support the business, several teams are involved, each with its own leader:

  • Support Team

  • Development Team

  • QA Team

Preventive measures

System monitoring and alerting

Every server is monitored so that any issue is detected as soon as possible. The measurements are centralized in a dashboard that presents an organized view of the information. An email alert can be set on each of the following statistics, measurements and checks:

  • Total processor utilization %

  • Memory available bytes

  • Memory page faults per second

  • Memory cache faults per second

  • Free disk space %

  • Processor temperature

  • Disk I/O usage

Each of these metrics can trigger an alert when it reaches a specific value. In addition, the level of these notifications can be configured so that only the responsible team, leader or employee is alerted.

Furthermore, the sampling period is configurable, so a different period can be set for each metric.
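
The threshold, notification level and sampling period settings described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the metric names, threshold values, recipient levels, and the `MetricRule` and `evaluate` names are hypothetical and do not reflect the actual monitoring tool or its configuration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricRule:
    """Hypothetical alert rule for one monitored metric."""
    name: str
    breached: Callable[[float], bool]  # True when a sample should raise an alert
    notify: str                        # assumed levels: "team", "leader", "employee"
    sampling_seconds: int              # per-metric sampling period


# Illustrative values only; real thresholds would be set in the monitoring dashboard.
RULES = [
    MetricRule("cpu_utilization_pct", lambda v: v >= 90.0, "team", 60),
    MetricRule("free_disk_space_pct", lambda v: v <= 10.0, "leader", 300),
    MetricRule("cpu_temperature_c", lambda v: v >= 85.0, "employee", 30),
]


def evaluate(samples: dict[str, float]) -> list[tuple[str, str]]:
    """Return (metric, recipient level) pairs for every breached rule."""
    alerts = []
    for rule in RULES:
        value = samples.get(rule.name)
        if value is not None and rule.breached(value):
            alerts.append((rule.name, rule.notify))
    return alerts


if __name__ == "__main__":
    # Only the CPU rule is breached in this sample.
    print(evaluate({"cpu_utilization_pct": 95.0, "free_disk_space_pct": 40.0}))
```

Keeping the threshold test, the recipient level and the sampling period together in one rule mirrors the idea that each metric is configured independently.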

Disaster recovery

Recovery plan

Because the platform is designed so that all information is constantly copied or replicated, no loss of information is expected. However, in the exceptional case of force majeure in which a loss of information is detected, the following procedure will be implemented in relation to the loss of information of players:

Procedure for action in case of loss of information concerning the obligations to players:

In case of a disruptive event (natural disaster, war or civil unrest), the following procedures shall apply:

  1. The Support Leader (hereafter SL) will be informed immediately.

  2. Exporter and Projectrak use a cloud-based infrastructure on Google Cloud Platform (Belgium); all the problems that a disruptive event can cause are covered by the platform.

In case of loss of information, the following procedures shall apply:

  1. Atlassian and the clients shall be informed as soon as possible, with an estimate of the impact and the actions taken.

  2. DEISER will inform the users who have been affected by these losses in order to:

  • Evaluate the data loss (type of data lost, number of users affected, and total period of information lost).

  • Define possible resolutions.

  • Inform them of their right to complain.

Contingency plan phases

Detection of a problem that potentially requires the activation of the contingency plan

  • Depending on the scope of the disaster and its impact on the system, the problem will be detected by the monitoring system.

  • Physical threats (natural: fire, flood) can be detected by equipment located in the data center or by the security officials responsible for the availability of the building.

  • Logical threats will be detected by our monitoring system. Each component of the infrastructure is constantly monitored: communications, storage, safety devices, servers and others.

Analysis of the problem

  • The operators responsible for security or monitoring the system detect a problem and do a quick analysis to determine the potential impact on availability.

    • Facilities.

    • Hardware.

    • Software.

    • Availability.

    • Information.

  • They will contact those responsible for servers, storage and backup, and networks.

  • They ensure the resumption of services in case of unavailability, implementing measures whenever possible.

Notification to everybody involved about the necessity to start the recovery process

If, after analysis of the incident, it is decided to activate the backup environment, this is urgently communicated to the affected clients and the Regulator, together with a loss assessment, the plan of measures to be applied, an estimate of the impact and the estimated recovery time.

Business continuity plan testing and improvement

Checklist

Determine whether the plan is current, backup systems are adequate and up to date, correct telephone numbers are available, emergency forms are available, and copies of the plan and any supplemental documentation are present and accessible to everyone involved.

Theoretical simulation of different situations in order to detect issues on the BC plan.

These tests simulate possible scenarios and apply the BC plan by following its steps theoretically. Inconsistencies in the documentation and missing steps are detected, and the procedures are updated.

  • Execution period: once per year.

Testing of automated parts of the disaster recovery process

These tests cover the scripts, software and tools used in the recovery process. Their focus is to find as many defects (things that are wrong with the automated parts of the plan) as possible. They are also useful for detecting changes in behavior.

  • Execution period: during a release process.
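
One kind of automated check that fits this category is confirming that a restored file is identical to the backup it came from. The sketch below is a hypothetical example, not the actual recovery tooling: the `sha256_of` and `restore_matches` names and the file names are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(backup: Path, restored: Path) -> bool:
    """Return True when the restored file has the same digest as the backup."""
    return sha256_of(backup) == sha256_of(restored)


if __name__ == "__main__":
    # Stand-in files; a real test would point at the output of the restore script.
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "backup.bin"
        restored = Path(tmp) / "restored.bin"
        backup.write_bytes(b"exported data")
        restored.write_bytes(b"exported data")
        print(restore_matches(backup, restored))
```

A check like this could run as part of the release process, so that a change in the behavior of a recovery script is noticed before it is needed in a real incident.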

Review BC plan after each disaster recovery.

After a disaster, once the situation has been recovered, it is necessary to review the whole process.

  • Execution period: in case of disaster.