When Superstorm Sandy slammed into New Jersey and flooded lower Manhattan with a record-breaking tidal surge, many companies suddenly discovered they had woefully underestimated how much a storm like this could hurt them. The storm was large enough, for example, that some companies had both their headquarters’ data center and their backup emergency data center knocked out.
This “Frankenstorm” has exposed the hidden costs of disaster recovery and disaster planning — costs companies either didn’t think through in their planning, found too difficult to quantify, or dismissed as being intangible. But now, some of these intangible or difficult-to-quantify costs are very real and very large.
In the aftermath of the storm, Evolve IP has been highly focused on servicing customers that have been impacted. We are also talking to non-customers, trying to help restore their businesses because they didn’t plan properly. Many of these non-customers have fallen victim to the hidden costs of disaster recovery.
The unexpected costs these companies are incurring can be a guide for all of us as we assess our own disaster readiness. This is the second hurricane to hit the East Coast in a year, and the insurance industry tells us that, although these disasters might or might not be getting more frequent, they are certainly getting more expensive.
The top four hidden costs that we are seeing among the companies we are working with in the aftermath of Sandy are:
- Longer than expected recovery time for the restoration of production data from traditional tape backup
- Higher than expected recovery costs
- Downtime per employee per hour
- Lost revenue and sales
Some of these hidden costs might appear obvious on the surface, but they become troubling when you drill into them and see how damaging they really are. In this post, I’ll lay out how you can most accurately uncover and predict these hidden costs, and how working with a cloud provider like Evolve IP can help to minimize them.
Start with the two most important factors in disaster planning. First, what are your target Recovery Time Objectives (RTO) on an application-by-application basis? Second, what are your target Recovery Point Objectives (RPO) on an application-by-application basis? The recovery time is simply how long it takes you to get each application back up and running in a usable production environment.
A typical target might be to have everything back online in 24 to 48 hours. For critical applications, the window is almost always much shorter: in some cases, an outage of less than one hour starts to hurt the business or its revenue.
The recovery point deals with how much of your up-to-the-moment information you can afford to lose. In other words, what is the maximum age of the data? This also must be measured by application as the requirements will be very different for a transactional database or email than for a development web server or simple file storage.
For example, if your disaster recovery depends on restoring from a backup tape, then anything that happened after your most recent tape backup will be lost. The hidden costs of disaster recovery all relate back to those two decisions and the steps you take (or fail to take) in planning to achieve those objectives.
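To make the two objectives concrete, here is a minimal Python sketch of checking a recovery strategy against per-application targets. The application names, the target values, and the assumed tape figures (a 48-hour restore and data up to 24 hours old) are illustrative placeholders, not measurements from any real environment.

```python
from dataclasses import dataclass


@dataclass
class AppTarget:
    name: str
    rto_hours: float  # longest tolerable time to get the application back in production
    rpo_hours: float  # oldest tolerable age of the recovered data


def meets_targets(app: AppTarget, est_recovery_hours: float, data_age_hours: float) -> bool:
    """True only if both the recovery time and the data age fit the app's targets."""
    return est_recovery_hours <= app.rto_hours and data_age_hours <= app.rpo_hours


# Hypothetical applications and targets, for illustration only.
apps = [
    AppTarget("transactional-db", rto_hours=1, rpo_hours=0.25),
    AppTarget("dev-web-server", rto_hours=48, rpo_hours=24),
]

# Assumed strategy: nightly tape backup, restored in about 48 hours.
for app in apps:
    ok = meets_targets(app, est_recovery_hours=48, data_age_hours=24)
    print(f"{app.name}: {'meets targets' if ok else 'misses targets'}")
```

Under these assumptions the same backup strategy is acceptable for the development server and far short of what the transactional database needs, which is why the targets have to be set per application.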
Hidden cost #1: Longer than expected recovery time for restoring production data from traditional tape backup
What it is: Companies are suddenly discovering that it is taking them longer to recover than they had planned. This is particularly true of those who are relying on tape backup. For disaster recovery purposes, tape backup has a number of built-in delays:
- You have to get the tape from wherever it was being held off-site and to the location where it can be restored into production. This alone can cause a delay of 24 to 36 hours, depending on the location of the storage. It takes longer when the highways are closed.
- You need a tape drive and auto-loader to recover the data off of the tapes. Some companies had both their primary and emergency tape drive systems knocked out by the storm, which meant they had to find, purchase, and install a tape drive and auto-loader at an off-site location before they could begin recovering the data. Add another 24 to 36 hours.
- Tape drives can only go so fast. There is a reason tape backups are performed at night: their data transfer rates are much slower. The same is true when you try to recover from tape. What happens to your overall recovery time (RTO) if 12 hours of recovering data from tape has only restored 10 percent of your data due to the volume?
- Tapes are almost always out of date. Tape backup is a point-in-time data protection strategy, so the restore point of tape backup is never up-to-the-minute or what many would consider near real-time. Anything that happened after the last backup is lost.
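The delays in the list above add up quickly. The following Python sketch totals them, using the 24-to-36-hour ranges quoted above and extrapolating the "12 hours for 10 percent" restore example. It assumes the steps happen one after another and that the tape restore rate stays constant, which is a simplification; the function names are made up for this illustration.

```python
def estimate_full_restore_hours(hours_elapsed: float, percent_restored: float) -> float:
    """Extrapolate total restore time, assuming a constant restore rate."""
    if not 0 < percent_restored <= 100:
        raise ValueError("percent_restored must be in (0, 100]")
    return hours_elapsed * 100 / percent_restored


def estimate_tape_recovery_hours(transport: float, procurement: float, restore: float) -> float:
    """Total recovery time when the steps run one after another."""
    return transport + procurement + restore


# 12 hours of restoring has recovered 10 percent of the data.
restore_hours = estimate_full_restore_hours(12, 10)

# Best and worst cases from the 24-to-36-hour ranges for transport and procurement.
best_case = estimate_tape_recovery_hours(24, 24, restore_hours)
worst_case = estimate_tape_recovery_hours(36, 36, restore_hours)

print(f"Restore alone: {restore_hours:.0f} hours")
print(f"Total: {best_case:.0f} to {worst_case:.0f} hours "
      f"({best_case / 24:.0f} to {worst_case / 24:.0f} days)")
```

With these inputs the restore alone takes about 120 hours, and the full recovery lands at roughly seven to eight days, well beyond a 24-to-48-hour target.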
Why it matters: The time it takes to recover data and restore critical applications into production is critical. Data equals business, and 99 percent of companies can’t operate without their data. Any delay in recovery ends up having a cascading effect, causing a lot of other unplanned costs. Backing up data to any media (tape, disk, etc.) is necessary to maintain historical copies of data and is best suited for recovering files, corrupt databases, or even full machine images. But backups are only copies from a particular point in time. Backing up is a data protection strategy; it is only a small part of meeting business continuity objectives in the face of a disaster. In a disaster, restoring from backups is a method of last resort, not the method of first resort.
How to address it: Compare the vagaries of on-premise, tape-based recovery to the cloud alternatives, where you have multiple data centers all interconnected with fast network connections. The restoration process in the cloud is generally measured in minutes to a couple of hours, rather than days. If the connection is fast enough, the data can be mirrored or replicated from an on-site data center to a cloud data center in near real-time, so that the restore point is far more current. With Evolve IP, these restoration processes are built into our service offering as either general DR procedures, data protection options, or as continuity service offerings to meet these types of requirements. We can simplify, and in some cases automate, many of the restoration processes for companies with more stringent needs.
Take into account: The potential risk that on-premise technology is obsolete. It’s not that you can’t keep using older equipment, but if you can’t read your backup tapes because your tape unit is destroyed and you can’t find a replacement without ordering one from the manufacturer, then this delay will extend the amount of time you are out of business.
Hidden cost #2: Higher than expected recovery costs
What it is: The costs of the third-party services you rely on to get yourself back in business.
Why it matters: When disaster strikes, a lot of the essential resources you use every day are no longer available to you. Even if they are, those daily resources are sized for the normal course of business. Most companies do not have excess resources sitting around waiting to jump into disaster mode. As a result, you often have to rely on third parties to supplement your needs, especially for application work on critical systems. Not only are you paying employees, but you are also now paying for the time of the third-party staff and outside vendors to get back in business. Those rates aren’t cheap to start with, and you are now paying overtime and perhaps even premium rates.
Another potential cost is the hike in insurance premiums the insurance companies will impose on those companies that don’t plan ahead. Up to 30 percent of the estimated $7 billion to $20 billion in claims expected for Sandy will be for business interruption claims from companies forced to shut down because of wind damage, flooding, or power outages.
How to address it: Have a good, fully informed disaster recovery plan in place that accounts for every contingency and every potential cost. For applications that are deemed critical to the business, ensure a continuity or avoidance strategy is employed. If you have 20 applications running, it is likely the business can continue to operate with perhaps three or four of those applications still in production. Trying to protect the entire business is a big undertaking. Focus on the applications that are most critical and figure out the best way to ensure the maximum downtime does not exceed your recovery time objectives.
Once you understand your true costs and your true risks, you can make good decisions about the resources you need. Even more importantly, begin thinking less in terms of disaster recovery and more in terms of business availability. It is probably far cheaper to ensure that your business remains available than it is to try to recover it once it has been taken down for an extended period of time by a disaster. With Evolve IP, disaster recovery and business continuity are considerations that are factored into the available service options when the customer signs with us.
Take into account: Everything is more difficult and will take longer under disaster conditions. If the disaster is widespread, then everyone else will be competing for the same scarce resources.
Hidden cost #3: Downtime per employee per hour
What it is: The money it costs you to pay employees while the business is closed. The median salary for an employee in Manhattan is $50,000 per year ($23,000 for the U.S. median). If you have a company of 500 employees earning that Manhattan median and sitting around waiting for you, that’s over $12,000 an hour in employee downtime, at a time when the company isn’t generating revenue.
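The arithmetic behind that hourly figure can be reproduced with a short Python sketch. It assumes a 2,080-hour work year (40 hours a week for 52 weeks) and counts salary only, so benefits and payroll taxes would push the real number higher. The function name is hypothetical.

```python
WORK_HOURS_PER_YEAR = 40 * 52  # assumed full-time schedule: 2,080 hours


def downtime_cost_per_hour(headcount: int, annual_salary: float,
                           work_hours_per_year: int = WORK_HOURS_PER_YEAR) -> float:
    """Salary paid per hour of downtime across all idle employees (salary only)."""
    return headcount * annual_salary / work_hours_per_year


# 500 employees at the $50,000 Manhattan median cited above.
hourly_cost = downtime_cost_per_hour(500, 50_000)
print(f"Employee downtime: ${hourly_cost:,.0f} per hour")
print(f"Over an 8-hour day: ${hourly_cost * 8:,.0f}")
```

Swap in your own headcount and salary data to get a number you can defend in a planning meeting.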
Why it matters: Even though an organization can’t do business, the employees are still on payroll. (You can force salaried employees to use their vacation days for the time you are closed, but that’s not good for employee relations — and just imagine how popular that will make the IT department.)
How to address it: Again, work closely with others in the organization to understand the true cost of employee downtime so that it is fully accounted for in your disaster recovery planning. Cloud, again, can help you avoid this issue altogether because the right cloud provider (with redundant data centers) and comprehensive recovery options can ensure that your business infrastructure stays available no matter what is happening to your company headquarters.
Take into account: Technologies, such as virtual desktops and cloud-based call centers, enable businesses to continue as if nothing has happened, all while employees stay safely at home with their families.
Hidden cost #4: Lost revenue and sales
What it is: This one’s obvious of course, but I’ll say it anyway: The money you can’t make because you can’t be open for business.
Why it matters: The lifeblood of a business is its revenue. Every hour that you are not open for business is costing your company money.
How to address it: These are not numbers that the IT department controls, so they are often overlooked or underestimated when it comes to IT disaster planning. Work with your CFO and your finance department to get a solid understanding of the revenue impact per head of each hour of downtime. This can help you make more informed decisions about the appropriate recovery time and the investment required to ensure that you meet that deadline.
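Once finance gives you a revenue-per-head figure, a rough Python sketch like the one below shows how lost revenue scales with the recovery time you choose. The $100 per head per hour and the 500-person headcount are placeholders for illustration only; replace them with the numbers your CFO provides.

```python
def lost_revenue(revenue_per_head_per_hour: float, headcount: int,
                 downtime_hours: float) -> float:
    """Revenue lost over an outage, assuming revenue accrues evenly per head per hour."""
    return revenue_per_head_per_hour * headcount * downtime_hours


# Placeholder inputs, not real figures.
REVENUE_PER_HEAD_PER_HOUR = 100.0
HEADCOUNT = 500

# Compare candidate recovery time objectives, in hours.
for rto_hours in (1, 24, 48):
    loss = lost_revenue(REVENUE_PER_HEAD_PER_HOUR, HEADCOUNT, rto_hours)
    print(f"RTO of {rto_hours:>2} hours: ${loss:,.0f} in lost revenue")
```

Setting the loss at each recovery time next to the cost of achieving it gives you a simple basis for deciding how much to invest.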
We are getting a lot of calls from potential customers who, before the storm, had been talking with us about financial justification for moving to the cloud. Now, when the disaster is causing them to lose revenue, they realize that being in the cloud would’ve avoided this loss of business. The financial justification of the cloud almost becomes a non-issue.
Take into account: When it comes to avoiding the hidden costs of disaster recovery, it’s better to be in the cloud than on-premise during a disaster like Sandy. But it’s also important for more than just a storm like Sandy; there are disaster risks every day in fire hazards, pipes bursting, rolling blackouts in heat waves during the summer, you name it.
In the cloud, as long as you have power and Internet, you can be up and running. Essentially, your disaster recovery plan becomes your business continuity, or “disaster avoidance,” plan. You should never be down, period.
The major benefits of being in the cloud during a disaster are availability, peace of mind, and cost savings. Or at least, those should be the benefits, if you have chosen your cloud provider carefully. Many cloud providers lack the redundancy in their facilities or have poor organization and control processes around things like disaster recovery and continuity. Anyone can put together a piece of marketing material claiming that they are “highly available.” Events like Sandy separate the rookies from the pros.
As for the businesses that were down for a week or more without power, perhaps flooded, with employees stuck at home, they lost more than $100,000 per hour. Had they enabled their workforce to be mobile and centralized their infrastructure and communications in a protected, cloud-based data center, they would have been in business making money, not losing it.