Scheduled Recovery Testing
The CD Forum > After Hours > Rest of the World
singchung
I work at a company where IT auditing is very stringent, and there is a requirement for scheduled recovery testing. Testing the recovery of data is not much of an issue, but testing the recovery of the system partition (the OS plus installed software) is. The requirement specifies that recovered systems must be fully functional and tested to work just like the original server. Recovery testing obviously cannot restore a system partition image (made with Ghost or ImageX in my case) onto the live server it was backed up from, so it has to go onto test servers. Since there are 20 live servers and an image has to be restored onto hardware with the same HAL, I presume at least 10 test servers will be needed (some servers share similar hardware configurations). Moreover, a test server restored as an exact replica of an existing server will cause conflicts (IP address, domain membership), and if the restored server is a DC it will not function as a DC, since it has been rolled back some months in time. I wonder how best to implement this kind of full-functionality recovery testing without affecting the operation of the existing servers. When I asked my peers, the answer was that recovery testing is simply not done. Any ideas?
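The test-hardware estimate comes down to counting distinct hardware profiles: if an image only restores cleanly onto hardware with the same HAL, one test machine per profile is the minimum. A small Python sketch of that count, assuming a hypothetical inventory that maps each server to a profile string (the server names and profiles are placeholders, not real data):

```python
from collections import defaultdict


def group_by_profile(inventory: dict[str, str]) -> dict[str, list[str]]:
    """Group servers sharing a hardware profile (HAL, chipset, storage controller).

    Assumption: one test machine per profile is enough, because an image
    restores onto any box with the same profile.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for server, profile in sorted(inventory.items()):
        groups[profile].append(server)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical inventory; replace with the real 20 servers and their profiles.
    inventory = {"srv01": "HAL-A", "srv02": "HAL-A", "srv03": "HAL-B", "srv04": "HAL-C"}
    groups = group_by_profile(inventory)
    # prints: 4 servers need 3 test machines
    print(f"{len(inventory)} servers need {len(groups)} test machines")
    for profile, servers in groups.items():
        print(profile, servers)
```

The count is a lower bound; testing servers one group at a time on shared test hardware is what keeps it that low.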
Nuno Brito
A method for crash recovery that I've seen at work involves virtual machines.

Not on the scale of 10 or 20 machines, but a friend of mine is responsible for a server that powers an application used by the other offices spread around the country.

This machine can't go offline, otherwise people simply can't work, since there would be no way to add more data.

At the time there was a lot of discussion about building an array of machines all working together, as we hear of in bigger workgroups, but this added so much confusion, along with their complete lack of experience administering such a "beast", that another solution had to be found.

The solution was fairly simple compared to running multiple machines together.

Running VirtualBox on a quad-core machine with 8 GB of RAM and a fiber link gave more than enough horsepower to handle all the external data requests on its own.

Timed backups of the virtual disk image keep the copy from going stale, and in case of a hardware malfunction another machine, running the emulated environment under the same conditions from the latest image snapshot, takes its place.
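The timed-backup step could look something like the Python sketch below, run from a scheduler such as cron or Task Scheduler. It assumes the VM is stopped or paused (or a snapshot has been taken) while the disk image is copied, since the thread doesn't say how consistency was handled; the file names and the `keep` retention count are illustrative assumptions.

```python
import shutil
import time
from pathlib import Path


def backup_image(image: Path, backup_dir: Path, keep: int = 3) -> Path:
    """Copy a disk image to a timestamped file and keep only the newest `keep` copies.

    Assumes the image is not being written to during the copy (VM stopped,
    paused, or snapshotted); copying a live, changing image is not safe.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{image.stem}-{stamp}{image.suffix}"
    shutil.copy2(image, target)  # copies data and file timestamps
    # The timestamp format sorts lexicographically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{image.stem}-*{image.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Keeping several generations rather than one means a corrupted or infected image does not immediately overwrite the last good copy.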

VirtualBox presents exactly the same virtual hardware, MAC/IP address, etc., so Windows on an AD network doesn't complain about the SID or anything similar, as long as we don't try to run two copies of the image at the same time (obviously). The emulated server runs at a fairly good speed and saves the cost of buying and supporting more than one machine for this task.

-----

I was very interested in learning more about this way of working and bothered their team with a lot of questions. It's not fully automatic, but I think automation could be added so that a secondary machine takes over as primary, running the emulated server, whenever the other machine is detected as offline for some reason. There was no need for such automation in this case, so they didn't go further down this road.
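The automatic takeover mostly comes down to deciding when the primary is really offline. A hypothetical Python sketch: the standby counts consecutive failed health checks and signals takeover exactly once, after `max_missed` misses in a row. How the health check itself is done (ping, TCP connect, application query) is left as an assumption. Note that a standby which merely loses contact with a still-running primary would start a second copy of the image, the "two images at the same time" case mentioned earlier, so a real setup would need a way to make sure the primary is really down first.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Decide when a standby host should start its copy of the VM.

    Requiring `max_missed` consecutive failures means a single dropped
    packet does not trigger a takeover.
    """
    max_missed: int = 3
    missed: int = 0
    taken_over: bool = False

    def record_check(self, primary_alive: bool) -> bool:
        """Feed one health-check result; return True only on the check that triggers takeover."""
        if self.taken_over:
            return False  # already took over; never signal twice
        if primary_alive:
            self.missed = 0  # any success resets the streak
            return False
        self.missed += 1
        if self.missed >= self.max_missed:
            self.taken_over = True
            return True
        return False
```

The caller would start the standby VM when `record_check` returns True; everything before that point only counts misses.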

singchung
So there is no established methodology available to fit the auditing requirement of "scheduled recovery testing to fully functional server onto test hardware"?
Nuno Brito
I guess most people prefer to use their own "established methodology", because the same shoe doesn't fit every foot.


Making something properly "established" requires documentation written by your own staff to suit your company's requirements (or, in this case, the audit inspection).

You should write the documentation yourself for several reasons.

- Regional language differences (English is sometimes hard to understand, and some companies may want everything in their own language)
- Different network environments / equipment with "special" features
- Write the appropriate list of steps to help your staff members replicate the same results
- Predict the overall impact of downtime and data corruption
- Fill in the checklist of requirements for the auditors
- etc., etc.
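For the auditors' checklist, one option is to record each recovery test as structured data and print a pass/fail line per server. A minimal Python sketch; the record layout, check names and dates are hypothetical examples, and the real checks would come from your own documented procedure:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RecoveryTest:
    """One scheduled recovery test of one server (hypothetical record layout)."""
    server: str
    tested_on: date
    # Check name -> whether it passed, e.g. "boots", "app responds".
    checks: dict[str, bool] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # A test with no recorded checks does not count as a pass.
        return bool(self.checks) and all(self.checks.values())


def report(tests: list[RecoveryTest]) -> list[str]:
    """Return one audit line per test, listing any failed checks."""
    lines = []
    for t in tests:
        failed = [name for name, ok in t.checks.items() if not ok]
        status = "PASS" if t.passed else "FAIL"
        if not t.checks:
            detail = " (no checks recorded)"
        elif failed:
            detail = f" (failed: {', '.join(failed)})"
        else:
            detail = ""
        lines.append(f"{t.tested_on.isoformat()} {t.server}: {status}{detail}")
    return lines


if __name__ == "__main__":
    tests = [
        RecoveryTest("srv01", date(2008, 6, 5), {"boots": True, "app responds": True}),
        RecoveryTest("srv02", date(2008, 6, 5), {"boots": True, "app responds": False}),
    ]
    for line in report(tests):
        print(line)
```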


The method I've described above is very simple to document and is based on free tools, which is a plus where I work.

I've seen (and tested) a few attempts by big companies to create a standard way of handling this sort of event, but beware: no situation should be treated as "standard", and you end up getting your hands "dirty" anyway to make things work the way you need.

Read here for a joint MS / Acronis product:
http://download.microsoft.com/download/c/7...8.0%20FINAL.doc

There may be many others, but I stumbled on this one some time ago.

-----

If you don't have enough time to properly test, implement and document all the procedures required for the audit, then I strongly suggest hiring someone capable of doing this for you.


Ed_P
QUOTE (singchung @ Jun 5 2008, 05:17 AM)
I work at a company where IT auditing is very stringent, and there is a requirement for scheduled recovery testing. Testing the recovery of data is not much of an issue, but testing the recovery of the system partition (the OS plus installed software) is. The requirement specifies that recovered systems must be fully functional and tested to work just like the original server.

Sounds like a bank. And the concept really is good business practice: you don't want to have a disaster only to find your recovery procedure doesn't work. The requirement has already exposed one issue, matching hardware requirements.

Rather than testing in-house, find a recovery site and test there. In-house testing has limited value anyway in a disaster that takes out the room or building the servers are in; the disaster recovery plan needs to account for a new location. And a professional disaster recovery site will supply you with exactly the equipment you need.
singchung
Not a bank, just a trading house. The company is publicly listed, so it is subject to auditing policies.
Ed_P
QUOTE (singchung @ Jun 11 2008, 09:15 PM)
Not a bank, just a trading house.

Same difference, imo.


Nuno Brito's suggestion to set up virtual machines for the systems you need is a good one. A VM backup should be able to run anywhere you can install the virtual machine software and load your backups into it.

Please keep us posted on which direction you go in, and how the audit goes.