An important part of IT service delivery is IT service continuity management, also known as disaster recovery. (One place to look for guidance on IT service continuity management is ITIL’s Service Design book.) Service continuity is all about ensuring you have working plans to restore service in the event of a significant service interruption: a server rack catching fire, an earthquake, a flood, or even a hard-to-diagnose, long-lasting firewall issue that disrupts the entire server room.
A disaster recovery plan should list the steps IT staff should take to restore service. In a disaster people may be panicking, and that’s not the best time to have to think on your feet. Having a written plan also helps in case key staff are not available to restore service, and can address complex dependencies: DNS must be restored before DHCP, for example.
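One way to keep those dependencies straight is to write them down as data and let a script work out a safe restore order. The sketch below is only an illustration: the `DEPENDS_ON` map and the `restore_order` helper are hypothetical names, and apart from "DNS before DHCP" the services listed are made up for the example. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be up before it.
# Only "DNS before DHCP" comes from the text; the other entries are illustrative.
DEPENDS_ON = {
    "dns": set(),
    "dhcp": {"dns"},
    "itsm_tool": {"dns"},
    "file_server": {"dns", "dhcp"},
}


def restore_order(depends_on):
    """Return the services in an order that respects every dependency."""
    # static_order() yields each service only after all of its prerequisites.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, service in enumerate(restore_order(DEPENDS_ON), start=1):
        print(f"{step}. restore {service}")
```

If two services accidentally depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful thing to discover while writing the plan rather than during the disaster.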
In building your disaster recovery plan, where in the restore order have you put your IT service management (ITSM) tool? This is the tool where you track service requests, incidents, and many other things, notably changes.
In the event of a disaster, many IT staff are going to be making lots of changes. They may stay up all night or work for several days, exhausted and unable to apply their usual level of thoughtfulness to solving complex problems. Some of the changes they make may be irreversible: deleting old data to free up space on a server, upgrading a network switch on the fly to get new functionality they feel is necessary to restore service, or changing the name of the tape backup server.
Somewhere all these changes need to be tracked. They need to be written down.
I’m not saying you need to follow a formal change management process for everything, but there should be version control on code changes and a paper trail for every server reconfiguration. Heck, even server restarts should be logged: knowing when a server rebooted can explain why other servers stopped working.
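If the ITSM tool itself is down, even a plain text file beats nothing. Here is a minimal sketch of an append-only change log, assuming one JSON record per line. The `log_change` function, the file name, and the field names are all hypothetical, not part of any particular ITSM product.

```python
import json
from datetime import datetime, timezone


def log_change(path, who, target, action, reversible=True):
    """Append one change record to a JSON-lines file and return the record."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),  # UTC avoids time zone confusion
        "who": who,
        "target": target,          # the server, switch, or service that was touched
        "action": action,          # free-text description of what was done
        "reversible": reversible,  # flag changes that cannot be undone
    }
    # Mode "a" only ever adds to the end, so earlier records are never overwritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Hypothetical example entries.
    log_change("dr_changes.jsonl", "alice", "backup01", "rebooted server")
    log_change("dr_changes.jsonl", "bob", "fileserver02",
               "deleted old data to free up space", reversible=False)
```

The point of the design is that logging a change takes one line and no thought, which is about all an exhausted person at 3 a.m. can spare.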
Or, if you have a more advanced capability in place such as IT service and asset management, you may need to refer to your “map” of service components to understand dependencies as part of restoring service. You may need to update that map, too.
So in your disaster recovery plan, make sure you prioritize your ITSM tool. Hopefully it’s one of the first tools you restore, and you have a simple process for at least logging changes so people can pull a report later of what happened.
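Pulling that report later can be equally simple. The following sketch assumes each logged change is a record with `time`, `who`, `target`, `action`, and an optional `reversible` field; these field names, the helper functions, and the sample entries are all hypothetical. It also assumes the timestamps are ISO 8601 strings in a single time zone, so sorting them as text puts them in chronological order.

```python
from collections import defaultdict


def changes_by_target(entries):
    """Group change records by target, each group sorted oldest first."""
    grouped = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e["time"]):
        grouped[entry["target"]].append(entry)
    return dict(grouped)


def format_report(entries):
    """Build a plain-text report, flagging changes marked as irreversible."""
    lines = []
    for target, changes in sorted(changes_by_target(entries).items()):
        lines.append(target)
        for c in changes:
            flag = "" if c.get("reversible", True) else " [IRREVERSIBLE]"
            lines.append(f"  {c['time']}  {c['who']}: {c['action']}{flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"time": "2024-01-01T03:00:00+00:00", "who": "bob",
         "target": "switch01", "action": "upgraded firmware", "reversible": False},
        {"time": "2024-01-01T01:00:00+00:00", "who": "alice",
         "target": "backup01", "action": "rebooted"},
    ]
    print(format_report(sample))
```

Grouping by target makes it easy to answer the question people actually ask afterward: what happened to this particular server, and in what order?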