In this post I want to clarify the difference between High Availability and Disaster Recovery from a DBA point of view. If you current DR plan looks like Dilbert’s best you read this post.
Things we will cover:
· What is disaster recovery
· Why do we always need to plan for a disaster
· Tools we can use to make disaster recovery easier
· High availability tools SQL provides us
· Recovery facilities
What is disaster recovery?
Disaster Recovery is the process you follow to recover your environment from scratch. A disaster can be anything from a flood, fire, power outage, explosion to a human error that affects your primary data centre and forces your organisation to run on minimum resources (not fun. Been there done that… hope not to do it soon again).
Why we always need to plan for a disaster
Ask yourself: Do you want to stay in business?
If your answer is yes then you understand why a DR and HA plan is important to the organisation. A plan doesn’t only allow others to understand your thought process but helps you recover your systems faster as you know exactly what needs to be done and can relay your message faster.
Recovery from a disaster can take a while depending on the processes your organisation has in place. As the DBA you are responsible for your company’s SQL Servers and as a DBA you always have to think ahead. Here are a few things to help you in setting up a DR plan
A few things to help aid your recovery:
Talk to your business users first. Why, you may ask? They are the ones who will give you an indication of which systems are important and what they require in order to function with minimal resources. Sit with your team and visualise your tier 1 systems (for purpose of this blog we focus on tier 1 servers). By this I mean draw a picture and put it up against the walls of your cubicle.
This way you will not forget how the databases and applications fit together. You also need to include the SQL jobs (please script this out or useBiDocumenter), script out all your logins, I would recommend Idera’s SQL permissions extractor.
One of the key things you require is to keep your documentation up to date. Use a third party documentation tool like BiDocumenter by Pragmatic Works. It documents the entire SQL Server stack. If all hell breaks loose at least you will be able to recover your database structures as the documentation includes all the structures and a sample of the data. Below you see such a sample. If you cannot use this software I would recommend using Red-gate‘s virtual restore. It allows you to restore databases to your local machine using minimal space (I think it was 10mb), the database is readable. I used it a few times when our production restores to our support server were running late. It allowed me to buy some time until the restores completed.
Always test your backups
Test your backups regularly, whether they are disk or tape backups. This ensures that you have a valid backup set. Remember, a DBA is only as good as his/her last backup.
This will be your first step when your start the recovery process.
Most organisations test the DR plan twice a year (at least the clients I worked for). This allows you to always improve the plan and to ensure that the plan fullfills all audit requirements set by your organisation’s risk management division.
From my own experience: Back in 2009 our data centre got flooded and almost all our servers were taken out. Luckily my colleague and I did not panic and recovered our tier 1 database servers in a few hours. We had tested our DR strategy twice a year and for a few years we had no worries until the disaster arrived.
Over the last 13 years I have performed and designed a few Data Recovery (DR) plans as well as recover systems based on those same DR plans. High availability is your top priority. As a DBA you are responsible for the design and its implementation. Many organisations use a combination of hardware and software technologies to ensure that they meet their SLA’s.
What is high availability?
There are still some people who believe that HA and DR are the same thing. They are not!
In a nutshell, High Availability is the mechanism used via various designs and technologies that ensures that your applications are up when your server(s) malfunctions. It also helps IT departments to better adhere to their SLA’s.
There are lots of ways to ensure the 99.9% uptime that most SLA’s demand.
Hardware vendors like HP provide SAN replication software allowing you to copy the .ldf and .mdf files to a certain point in time. A couple of years ago we used the software to successfully restore one of our production systems to our DR site. It took 15 minutes to restore a system that usually took me 8 hours to restore without using the software (Happy dance).
From a server side you can also look at Hyper-V, VMware, Windows failover clustering or a combination of them. This you will need to discuss with the system stakeholders and figure out which option suits you best. The best way to find out which option that is, is to play around with the various technologies and find your flavour.
SQL Server provides us with some built-in HA features
Always ON High Availability groups: leverages off WSFC (Windows Failover Clustering) and allows you to add your precious cargo (databases) into HA groups and fails over automatically (faster than traditional cluster). Add an AG listener and your users won’t even know which server they are using.
Database Mirroring: is a solution for increasing the availability of a SQL Server database. Mirroring is used on a per-database basis and works only with databases that use the full recovery model.
LogShipping: log shipping operates at a database level. You can use log shipping to maintain one or more warm standby databases (referred to as secondary databases) for a single production database that is referred to as the primary database.
Replication: is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency. Using replication, you can distribute data to different locations and to remote or mobile users over local and wide area networks, dial-up connections, wireless connections, and the Internet.
For more information on SQL Server HA options please get in touch with us.
So what is the big difference?
Disaster recovery is the process we follow when having to recover systems from scratch in the event of a catastrophe. High availability can best be described as the business continuity process we follow when (one of) our server(s) goes down and we need to failover automatically with minimum impact to the business users.