A method for crash recovery that I've seen at work involves virtual machines.
This isn't on the scale of 10 or 20 machines: a friend of mine is responsible for a single server that powers an application used by the other offices spread around the country.
This machine can't go offline; otherwise people simply can't work, since there would be no way to add more data.
At the time there was a lot of discussion about creating an array of machines all working together, like we hear about in bigger work groups, but this added so much confusion, on top of the team's complete lack of experience in administering such a "beast", that another solution needed to be found.
The solution was fairly simple compared to running multiple machines together.
VirtualBox running on a quad-core, 8 GB machine with a fiber link had more than enough horsepower to handle all the external data requests by itself.
Timed backups of the virtual disk image ensure the copy never gets too outdated, and in case of a hardware malfunction, another machine that runs the emulated environment under the same conditions with the latest image snapshot takes its place.
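I don't know how their backup job was actually written, but the idea of a timed backup can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name `backup_image`, the file names, and the rule of keeping only the newest three copies. It also assumes the image is in a consistent state when copied (VM powered off or saved, for example), since copying the disk of a running VM can give a corrupt image.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_image(image: Path, backup_dir: Path, keep: int = 3,
                 now: Optional[datetime] = None) -> Path:
    """Copy `image` into `backup_dir` under a timestamped name, then prune old copies.

    Hypothetical helper: assumes the image file is not being written to while copied.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    now = now or datetime.now()
    backup_dir.mkdir(parents=True, exist_ok=True)

    # e.g. server.vdi -> server-20240131-020000.vdi
    target = backup_dir / f"{image.stem}-{now:%Y%m%d-%H%M%S}{image.suffix}"
    shutil.copy2(image, target)

    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{image.stem}-*{image.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Run from cron or the Windows Task Scheduler, something like `backup_image(Path("server.vdi"), Path("/mnt/standby/backups"))` (both paths made up) would leave the standby machine with the last few images to boot from.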
VirtualBox replicates the exact same hardware conditions (MAC / IP address, etc.), and Windows on an AD network doesn't complain about the SID or anything like that, as long as we don't try to run two of these images at the same time (obviously). This emulated server works at a fairly good speed and saves the cost of buying and supporting more than one machine for this task.
-----
I was very interested in learning more about this way of working and bothered their team with a lot of questions. It's not fully automatic, but I think that could be added too, so that a secondary machine would take over the primary role and run the emulated server if the other machine were detected as offline for some reason. There was no need for such automation in this case, though, and they didn't go any further down this road.
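To make the automation idea concrete, here is a rough sketch of what a watcher on the secondary machine might look like. This is my own guess, not something their team built: the function names, the threshold of three failed checks, and the VM name are all hypothetical. The only VirtualBox-specific part is the `VBoxManage startvm <name> --type headless` command. Requiring several consecutive failures is meant to lower the risk of starting the copy while the original is still alive, which is exactly the "two images at the same time" situation to avoid.

```python
import socket
import subprocess
import time


def primary_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the primary's service port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_standby(vm_name: str) -> None:
    """Boot the local copy of the VM without a GUI window."""
    subprocess.run(["VBoxManage", "startvm", vm_name, "--type", "headless"],
                   check=True)


def watch(probe, start, threshold: int = 3, interval: float = 10.0,
          sleep=time.sleep) -> int:
    """Poll `probe` until it fails `threshold` times in a row, then call `start`.

    Returns the number of checks performed. A single successful check
    resets the failure counter.
    """
    failures = 0
    checks = 0
    while True:
        checks += 1
        failures = 0 if probe() else failures + 1
        if failures >= threshold:
            start()
            return checks
        sleep(interval)
```

It would be started on the secondary machine with something like `watch(lambda: primary_is_up("192.0.2.10", 443), lambda: start_standby("app-server"))`, where the address, port and VM name are placeholders. A real setup would also need a way to make sure the failed primary stays down before the copy comes up.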