What is an IT Disaster Recovery Runbook?
A DR runbook is a working document, unique to every organization, which outlines the necessary steps to recover from a disaster or service interruption. It includes primary and escalation contacts along with infrastructure and process instructions for team members to follow.
It is an essential component of your company’s business continuity plan.
What is the Difference Between Business Continuity and Disaster Recovery?
While the terms Business Continuity and Disaster Recovery are sometimes used synonymously, they are different. Business Continuity (BC) refers to the ability of an entire organization to continue critical functions and processes in the event of a disaster. Disaster Recovery (DR) is a documented and tested process to not only restore systems to production, but also to return the business’s IT functions to their pre-disaster state. As such, DR is an essential component of your company’s business continuity plan.
When should I create a DR Runbook?
Well, now. Before a disaster strikes, every organization should have documented processes to maintain their IT environment.
The COVID-19 crisis made many businesses painfully aware of their readiness to react to public disasters, and many businesses hit with ransomware have learned the necessity of testing multiple backups and setting processes and timelines for failover and failback.
At the very least, you should create and/or update your DR Runbook when you:
Our Top 10 Runbook Requirements
At the outset, your DR runbook should provide your organization with a clear plan of action for recovering from a disaster.
While every organization is different, the DR Runbook should include:
Make sure to include screenshots, diagrams, graphs, and/or tables to support the written documentation.
While it’s great to have the big picture spelled out with your governing IT methodology and marketing strategy, your DR Runbook should read more like a checklist than a white paper.
At its core, your DR Runbook is a list of tasks directed toward achieving a very specific goal. Each task should be clear, discrete and executable. Tasks that don’t serve that goal should be removed.
As much as possible, the runbook should be written for an end user instead of an IT specialist. Don’t assume everyone knows the right directories, scripts, or servers where certain functions live. Spell out each task as one bullet point or line.
That said, sometimes a sentence or two of context next to the task can also be helpful.
You need to ensure your DR Runbook is easily accessible. It should be protected, but it can’t be so secure that an end user can’t find it when it is needed. Prepare both soft and hard copies of your runbook, and give the hard copy a prominent position near the equipment that it serves.
Soft copies of your DR Runbook should be accessible through a password-protected cloud portal. Make sure to provide links to it in the opening pages of the portal so it can be reached from a cellphone.
Review your team authorizations to make sure each team member has permission to access the runbooks they need.
Note: for cyber security reasons, never store network credentials in a runbook. That information should be tied to a contact responsible for the device.
As much as possible, add searchable metadata to your runbook sections for reference. For example, each section of your runbook might have:
– Purpose or description (for an incident, scheduled maintenance, development)
– Creation date and time
– Latest update date and time
– Major systems referenced
Another good practice is to identify sections based on the type of alert so that your least-technical team member is able to find the correct tasks without having to review the whole runbook.
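As a rough illustration of the metadata and alert-based lookup described above, here is a minimal Python sketch. The `RunbookSection` class, its field names, the `find_sections` helper, and the sample sections are all hypothetical and only show one way such metadata could be recorded and searched; they are not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RunbookSection:
    """Searchable metadata for one runbook section (illustrative field names)."""
    title: str
    purpose: str        # e.g. "incident", "scheduled maintenance", "development"
    created: datetime   # creation date and time
    updated: datetime   # latest update date and time
    systems: list[str] = field(default_factory=list)      # major systems referenced
    alert_types: list[str] = field(default_factory=list)  # alerts this section answers


def find_sections(sections, *, alert_type=None, system=None):
    """Return sections matching the alert type and/or system, ignoring case."""
    def matches(values, wanted):
        return wanted is None or wanted.lower() in (v.lower() for v in values)

    return [
        s for s in sections
        if matches(s.alert_types, alert_type) and matches(s.systems, system)
    ]


# Hypothetical sample data.
sections = [
    RunbookSection(
        title="Restore file server from backup",
        purpose="incident",
        created=datetime(2023, 1, 10, 9, 0),
        updated=datetime(2024, 3, 2, 14, 30),
        systems=["file server", "backup appliance"],
        alert_types=["disk failure", "ransomware"],
    ),
    RunbookSection(
        title="Fail over to secondary site",
        purpose="incident",
        created=datetime(2023, 2, 1, 9, 0),
        updated=datetime(2024, 1, 15, 11, 0),
        systems=["hypervisor cluster"],
        alert_types=["power outage"],
    ),
]

for section in find_sections(sections, alert_type="Ransomware"):
    print(section.title)
```

With the sample data, the lookup for a "Ransomware" alert lists only the file server restore section, so a team member can jump straight to the relevant tasks.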
If your DR Runbook is out-of-date, your team members might experience the following during disaster recovery:
To make sure your DR Runbook is accurate, review it regularly. This sounds obvious, but as organizations struggle to achieve a “zero-downtime” environment, they tend to focus all of their available time on confirming that updates or added services function correctly, and even back up correctly, without ever considering the impact on their DR processes.
On a consistent rotation, test your DR Runbook with an end user and allow them to give feedback on its accuracy.
Keep track of when a runbook was last updated and, if possible, when it was last run.
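One way to act on those dates is a small check that flags runbooks overdue for a review or a test. The sketch below is only an example: the `stale_runbooks` function, the runbook names, and the 180-day threshold are assumptions, and the right review interval depends on your organization.

```python
from datetime import date, timedelta


def stale_runbooks(records, today, max_age_days=180):
    """Flag runbooks not updated or run within max_age_days.

    records maps a runbook name to (last_updated, last_run), where last_run
    is None if the runbook has never been executed.
    The 180-day default is an arbitrary example threshold.
    """
    limit = timedelta(days=max_age_days)
    flagged = []
    for name, (last_updated, last_run) in sorted(records.items()):
        reasons = []
        if today - last_updated > limit:
            reasons.append("not updated recently")
        if last_run is None:
            reasons.append("never run")
        elif today - last_run > limit:
            reasons.append("not run recently")
        if reasons:
            flagged.append((name, reasons))
    return flagged


# Hypothetical tracking data: name -> (last updated, last run).
records = {
    "file-server-restore": (date(2024, 5, 1), date(2024, 5, 20)),
    "site-failover": (date(2023, 6, 1), None),
}

for name, reasons in stale_runbooks(records, today=date(2024, 6, 1)):
    print(name, "->", ", ".join(reasons))
```

Here only the hypothetical "site-failover" runbook is flagged, because it is more than 180 days old and has never been run.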
Remember, your DR Runbook should not become the gathering point for a variety of processes. It should remain focused on its single goal. Don’t be afraid to split runbooks and then insert references if required.
Make sure there is one runbook, and only one, for each process. Instead of adding post-its or written edits on hard copies, update the official DR Runbook and reprint with the correct “Last update.” Discard the older version.
As needed, reference and link to other runbooks for certain processes. However, if there are multiple runbooks for a given scenario, you’ll want to combine them into one and make sure the other is archived.
All software and team roles change over time, so your runbook must change too. Otherwise it will be neither accurate nor actionable. Ways to encourage adaptability include:
Your runbook should not only include all of your IT resources, but it should also consider a variety of disaster scenarios. Many organizations that were prepared for an internal technical crisis, like a disk failure, were caught flat-footed by the COVID-19 pandemic, which forced employees to work remotely.
Disasters could be man-made (ransomware or other cybercrime, intentional or unintentional sabotage by an employee), as well as natural or environmental (water main break, power outage, earthquake, etc.).
7. Compliant and Audit Ready
Your organization may need to be compliant with one or more industry standards, such as NIST, HIPAA, FINRA, PCI, CCPA, CSET/DHS or others. Make sure you refer to the latest documentation releases for specifics.
AllConnected can help guide your organization’s DR strategy toward compliance.
Several industry standards also have independent auditing requirements. Every organization needs to maintain adequate records, including lists of contacts, hardware and software vendors, dates reflecting upgrades and changing business practices.
Make sure copies of your DR Runbook are stored both onsite and offsite and are made available to those who require them.
A third-party audit can:
– examine records, billings, and contracts to verify that your organization is legally compliant.
– test your procedures to determine their effectiveness and make sure they meet your company objectives.
Auditing your Business Continuity and Disaster Recovery plans through a third party also provides validation to stakeholders that your documentation is complete and accurate.
8. Able to Delegate
Just as your DR Runbook should be comprehensive enough to consider multiple disaster scenarios, it should be clear and accessible so that subordinates can execute the plan in the event that your primary resources are not available.
Consider both escalation levels (for more difficult or widespread crises) and delegation levels (for key team members). These levels should be reflected in your department training as well.
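To make the idea of escalation and delegation levels concrete, the following Python sketch picks the first available contact, starting at a given escalation level and moving up if no one at that level can respond. The `Contact` class, the `who_to_call` helper, the role names, and the level numbers are all hypothetical; a real runbook would list your own contacts and rules.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    available: bool = True


def who_to_call(levels, start_level):
    """Return (level, contact) for the first available contact.

    levels maps an escalation level number to an ordered list of contacts:
    the primary first, then delegates. The search starts at start_level and
    escalates to higher levels when nobody at the current level is available.
    Returns None if no one can be reached.
    """
    for level in sorted(lvl for lvl in levels if lvl >= start_level):
        for contact in levels[level]:
            if contact.available:
                return level, contact
    return None


# Hypothetical contact tree: level 1 handles routine incidents,
# level 2 handles more difficult or widespread crises.
escalation = {
    1: [Contact("Primary on-call engineer", available=False),
        Contact("Backup engineer")],
    2: [Contact("IT manager"), Contact("Deputy IT manager")],
}

level, contact = who_to_call(escalation, start_level=1)
print(f"Level {level}: {contact.name}")
```

In this example the primary level 1 contact is unavailable, so the task falls to the level 1 delegate before anything escalates to level 2.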
Your DR Runbook is of little use if it is not tested. AllConnected’s “validated recovery” service for DR enables your organization to not only confirm the integrity of backups, but run comprehensive tests to ensure that your entire IT environment can failover to a secondary site if your production environment is compromised.
10. Integrated into Your Corporate Culture
Your runbook should not be a one-time report that grows stale. It should be an integral part of your business processes.
Your DR Runbook should be revisited at every major corporate acquisition, at every new product launch, and at every system improvement procedure.
Once your DR Runbook has passed the above requirements, you can have peace of mind knowing that it has been vetted, approved, and delivers on the promise of disaster recovery.
If you want to learn more, contact us through the form below to receive a free, customized IT Disaster Recovery template for your own DR Runbook.