Disaster Recovery For Complex Automation

Software Assists in Recovery of Complex Automation Systems

15.04.2010

Big Bucks: When disasters strike, a typical refinery may lose as much as €35,000 for every hour of lost production. As plants become more complex and rely more heavily on automation, a failure in the automation systems can have as much impact on production as a physical disaster. This growing complexity also makes recovering from disasters a moving target. PAS has combined vendor-neutral consulting with new software that provides comprehensive disaster recovery for automation systems.
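To put the hourly figure in perspective, the short Python sketch below multiplies the article's upper-bound loss of €35,000 per hour by a few outage lengths. The chosen durations (one shift, one day, one week) are illustrative assumptions, and the result is only a rough upper bound for a typical refinery, not a figure for any particular plant.

```python
# Illustrative downtime-cost arithmetic using the article's upper-bound
# figure of EUR 35,000 per hour of lost production for a typical refinery.
HOURLY_LOSS_EUR = 35_000


def downtime_cost_eur(hours: float, hourly_loss: float = HOURLY_LOSS_EUR) -> float:
    """Return the production loss for an outage lasting the given number of hours."""
    if hours < 0:
        raise ValueError("outage duration cannot be negative")
    return hours * hourly_loss


if __name__ == "__main__":
    # Example durations are assumptions chosen only for illustration.
    for label, hours in [("one shift (8 h)", 8), ("one day", 24), ("one week", 7 * 24)]:
        print(f"{label}: EUR {downtime_cost_eur(hours):,.0f}")
```

At that rate a one-day outage comes to €840,000 and a one-week outage to €5,880,000.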

Modern plant automation comprises multiple, disparate systems, such as distributed control systems (DCS), programmable logic controllers (PLCs), safety instrumented systems (SIS), and historians. Each was designed independently to accomplish a specific mission, but they are integrated to enable seamless operations. Because these systems serve as platforms for continuous process improvement, they are in a constant state of change. As a result, the value of the engineering time and effort spent configuring them and managing changes often exceeds the physical cost of the system. Capturing this embedded knowledge is crucial to a timely restart after a disaster.
Disaster recovery is often complicated by the fact that these individual systems may be the responsibility of different organizations within the plant. Backup schedules and procedures vary among those organizations, which makes it difficult to ensure that synchronized restoration images exist for all of the individual systems in the event of a disaster. Because the systems are highly integrated and interoperable, restoring each one individually from unsynchronized images can cause restart problems that lead to costly delays.
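To make the synchronization problem concrete, here is a minimal Python sketch (not taken from any PAS product) that checks whether the newest backup image of each system lies within an agreed tolerance of the newest image overall. The system names, timestamps and the 24-hour tolerance are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical example: timestamp of the most recent backup image held
# for each automation system. Names and times are illustrative only.
latest_images = {
    "DCS": datetime(2010, 4, 12, 2, 0),
    "PLC": datetime(2010, 4, 12, 3, 30),
    "SIS": datetime(2010, 3, 1, 22, 0),
    "Historian": datetime(2010, 4, 11, 23, 45),
}


def image_spread(images: dict[str, datetime]) -> timedelta:
    """Time between the oldest and the newest image in the set."""
    stamps = list(images.values())
    return max(stamps) - min(stamps)


def out_of_sync(images: dict[str, datetime], tolerance: timedelta) -> list[str]:
    """Systems whose newest image lags the newest image overall by more than tolerance."""
    newest = max(images.values())
    return sorted(name for name, ts in images.items() if newest - ts > tolerance)


if __name__ == "__main__":
    tolerance = timedelta(hours=24)  # assumed acceptable gap between images
    print("spread:", image_spread(latest_images))
    print("out of sync:", out_of_sync(latest_images, tolerance))
```

With these sample values only the SIS image falls outside the window, so restoring all four systems together would mix configurations roughly six weeks apart.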

Real Life Example

In response to a hurricane, the on-site team at one major North American petrochemical company decided to power down the DCS to minimize damage in case of flooding. Unknown to the team, the DCS backup mechanism had been disabled for the past year. During plant restart, they discovered that every change made to the DCS over that year had been lost, and site personnel spent hundreds of hours going through management of change (MOC) documentation to reconstruct the DCS database. The result was more than a week of lost production.

Planning For Disaster Recovery

If you agree that automation disaster recovery is a valid business concern, how do you plan for it? Automation disaster recovery must cover a full range of scenarios, including natural disasters such as the one described above, but the most likely ones are quite mundane. Leaky roofs, lightning strikes, physical accidents and human errors are all far more likely to occur and affect automation. Proper planning for the worst-case scenario will also handle these lesser events.
One might think that the principles of disaster planning for IT assets (servers, workstations, mainframes) would fully address DCSs and similar automation assets. This is not the case. While major portions of a DCS may appear to be based on commercial computer technology, the underlying structure and configurations differ significantly and require special attention. Modern industrial control systems, although most run on Windows, still contain proprietary configurations that vary greatly from one brand to the next. These systems therefore require specialized methods and interfaces for communication, data backup, and system recovery in real time. Older-generation automation systems present an even greater challenge, as many of them run on proprietary, closed operating systems.
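One common way to cope with brand-specific configurations is to hide each vendor's export and restore steps behind a shared interface, so that backup scheduling and validation can treat every system alike. The Python sketch below illustrates that idea under stated assumptions: `BackupAdapter`, `DemoAdapter`, `back_up_all` and `demo_round_trip` are hypothetical names, and the demo adapter merely writes a text file where a real one would have to use the vendor's own tools.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BackupAdapter(ABC):
    """Hypothetical vendor-neutral interface. Each control-system brand would
    need its own subclass that knows that vendor's configuration formats."""

    @abstractmethod
    def export_configuration(self, destination: Path) -> Path:
        """Write the system configuration under destination; return the image path."""

    @abstractmethod
    def restore_configuration(self, image: Path) -> None:
        """Load a previously exported configuration image back into the system."""


class DemoAdapter(BackupAdapter):
    """Placeholder adapter that keeps its 'configuration' as text in memory.
    A real adapter would call the vendor's own export and load tools instead."""

    def __init__(self, name: str, config_text: str) -> None:
        self.name = name
        self.config_text = config_text

    def export_configuration(self, destination: Path) -> Path:
        destination.mkdir(parents=True, exist_ok=True)
        image = destination / f"{self.name}.cfg"
        image.write_text(self.config_text, encoding="utf-8")
        return image

    def restore_configuration(self, image: Path) -> None:
        self.config_text = image.read_text(encoding="utf-8")


def back_up_all(adapters: list[BackupAdapter], destination: Path) -> list[Path]:
    """Back up every system through the same interface, whatever its vendor."""
    return [adapter.export_configuration(destination) for adapter in adapters]


def demo_round_trip() -> str:
    """Export, wipe and restore a demo configuration; return the restored text."""
    adapter = DemoAdapter("unit1_dcs", "FIC101 SP=42.0")
    with tempfile.TemporaryDirectory() as tmp:
        (image,) = back_up_all([adapter], Path(tmp))
        adapter.config_text = ""  # simulate losing the live configuration
        adapter.restore_configuration(image)
    return adapter.config_text


if __name__ == "__main__":
    print(demo_round_trip())
```

The vendor-specific knowledge the article emphasizes would live entirely inside each adapter; the interface only fixes what every adapter must be able to do.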
In most companies, IT is responsible for disaster recovery and backups of conventional computer assets. However, due to its complexity and specialized nature, automation asset disaster recovery is usually the domain of the control engineers. Therefore, there is usually an accountability line differentiating IT and control assets, sometimes drawn at the interface between the process control network and the business network. It is important to identify this line for disaster planning.

DCS: The Devil Is In the Details

At first glance, all modern control systems share many attributes regardless of manufacturer, such as a similar system architecture and operating system. It is in the details that the differences become apparent and significant. A detailed disaster recovery plan therefore requires thorough knowledge of the specific control systems involved. While the steps in creating a disaster recovery plan for automation assets are straightforward and can be adapted to any kind of control system, developing a proper plan requires specialized skills.

Disaster Recovery Plan

PAS, an industrial automation software company in Houston, TX, has developed a comprehensive five-step methodology that provides a robust roadmap for planning and implementing a disaster recovery system:

  1. Site Assessment and Automation Asset Inventory
  2. Design the Data Capture and Recovery Process
  3. Configure and Test the Data Capture and Recovery Process 
  4. Implement the Data Capture and Recovery Process
  5. Monitor, Maintain and Control the System

Plan Implementation

PAS' experienced professionals help plants around the world implement disaster recovery solutions. First, they survey all automation assets within the plant and create a site-specific recovery plan. This plan details the backup and restoration processes and includes a comprehensive restoration procedure for all of the plant's automation systems. Data collection and archiving are then scheduled for each system according to its individual needs, using the company's Integrity Software.
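Scheduling backups per system, based on an asset inventory, can be pictured with a small data model. The Python sketch below is an assumption-laden illustration, not a description of how Integrity Software works: the `AutomationAsset` fields, the per-system intervals and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AutomationAsset:
    """One entry in a hypothetical site asset inventory."""

    name: str
    system_type: str            # e.g. "DCS", "PLC", "SIS", "Historian"
    backup_interval: timedelta  # set per system, e.g. by how often it changes
    last_backup: datetime

    def next_due(self) -> datetime:
        return self.last_backup + self.backup_interval


def overdue(inventory: list[AutomationAsset], now: datetime) -> list[str]:
    """Names of assets whose next scheduled backup is at or before `now`."""
    return [asset.name for asset in inventory if asset.next_due() <= now]


# Illustrative inventory; names, intervals and dates are made up.
inventory = [
    AutomationAsset("Unit 1 DCS", "DCS", timedelta(days=1), datetime(2010, 4, 14, 1, 0)),
    AutomationAsset("Packaging PLC", "PLC", timedelta(days=7), datetime(2010, 4, 1)),
    AutomationAsset("Unit 1 SIS", "SIS", timedelta(days=30), datetime(2010, 4, 1)),
]

if __name__ == "__main__":
    now = datetime(2010, 4, 15, 12, 0)
    for asset in inventory:
        print(f"{asset.name}: next backup due {asset.next_due():%Y-%m-%d %H:%M}")
    print("overdue:", overdue(inventory, now))
```

A frequently modified DCS gets a short interval while a rarely changed SIS gets a long one, which is one way to read "based upon its individual need".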
Essential measures are also put in place as part of the recovery plan to protect and validate recovery data as the systems evolve. A disaster recovery change management process is established, together with recurring validations of the procedure to ensure that it can be executed successfully on demand. The Integrity Software stores backup images both onsite and at secure remote facilities and performs regular health checks on them to ensure their viability. This enables plants to be restarted as quickly as possible and in the same state as before the emergency.
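A health check on stored backup images can combine two simple tests: does each copy still match the checksum recorded when it was taken, and is it recent enough to be useful? An age check of this kind would flag a backup mechanism that had silently stopped, as in the hurricane example. The Python sketch below shows the idea using SHA-256 from the standard `hashlib` module; the `BackupCopy` structure, location labels and seven-day limit are assumptions for illustration, not PAS's actual checks.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupCopy:
    """One stored copy of a backup image (hypothetical structure)."""

    location: str      # illustrative labels such as "onsite" or "remote"
    data: bytes        # image contents; in practice read from the storage location
    created: datetime


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def health_check(copies: list[BackupCopy], expected_digest: str,
                 now: datetime, max_age: timedelta) -> list[str]:
    """List problems found across all copies; an empty list means every copy passed."""
    problems = []
    for backup in copies:
        if sha256_hex(backup.data) != expected_digest:
            problems.append(f"{backup.location}: checksum mismatch")
        if now - backup.created > max_age:
            problems.append(f"{backup.location}: older than {max_age.days} days")
    return problems


image = b"DCS configuration export"
digest = sha256_hex(image)  # recorded when the image was first taken
copies = [
    BackupCopy("onsite", image, datetime(2010, 4, 14)),
    BackupCopy("remote", image[:-3], datetime(2009, 4, 1)),  # truncated and stale
]

if __name__ == "__main__":
    print(health_check(copies, digest, datetime(2010, 4, 15), timedelta(days=7)))
```

Here the onsite copy passes, while the remote copy is reported both for a checksum mismatch and for being older than the seven-day limit.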
Implementing a comprehensive automation disaster recovery plan, and ensuring that all the necessary data is captured and kept up to date, can be a complex and challenging process. Fortunately, disasters on the scale of hurricanes and floods are rare, but mistakes and mishaps are common. So be prepared.

Contact

PAS Inc.

6055 Space Center Blvd
Houston, TX 77062
USA

+1 281 2040885
+1 281 2866767