Accendo Reliability

Your Reliability Engineering Professional Development Site


by Greg Hutchins

Incident Investigations

Guest Post by Bill Pomfret (first posted on CERM® Risk Insights; reposted here with permission)

Investigations of industrial accidents have found that a large number occurred during an interruption of production while an operator was trying to maintain or restart production. In each case the dangerous situation was created by a desire to save time and ease operations. In each case, the company’s safety rules were violated.

The best and most redundant safety layers can be defeated by poor or conflicting management practices. Numerous examples have been documented in the chemical industry. One accident in a polymer processing plant occurred after operations bypassed all alarms and interlocks to increase production by 5%. In another, interlocks and alarms failed—at a normal rate—but this was not known because management had decided to eliminate regular maintenance checks of the safety instrumentation.

James Reason [Ref. 1] has described how organizational accidents happen when multiple safety layers fail. His model shows the design intent of multiple layers: if all the layers are effective (i.e., solid and strong), a failure will not propagate through them. However, the layers are not solid. They are more like Swiss cheese. The holes are caused by flaws due to management, engineering, operations, maintenance, and other errors. Not only are there holes in each layer, but the holes are also constantly moving, growing, and shrinking, as well as appearing and disappearing. It is easy to visualize how, if the holes line up (Figure 1), a failure can propagate through all of them.
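As a rough illustration of the Swiss cheese idea, the sketch below multiplies per-layer failure probabilities to estimate the chance that a failure passes through every layer. It assumes the layers fail independently, which is a simplification: the examples above show that a single management decision can open holes in several layers at once. The layer names and probabilities are hypothetical and are not taken from Reason or from any plant data.

```python
from math import prod


def propagation_probability(hole_probabilities):
    """Chance that a failure passes through every layer.

    Assumes the layers fail independently (an assumption of this sketch).
    """
    probabilities = list(hole_probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    return prod(probabilities)


# Hypothetical probabilities that each layer fails when called upon.
layers = {
    "alarm and operator response": 0.1,
    "safety interlock": 0.01,
    "relief device": 0.01,
}
as_designed = propagation_probability(layers.values())

# Same layers with alarms and interlocks bypassed: a bypassed layer is
# "all hole", so its failure probability is 1.0.
bypassed_layers = {
    **layers,
    "alarm and operator response": 1.0,
    "safety interlock": 1.0,
}
bypassed = propagation_probability(bypassed_layers.values())

print(f"as designed: {as_designed:.2e}")
print(f"with bypasses: {bypassed:.2e}")
```

Under these illustrative numbers, bypassing two layers leaves the last remaining layer to carry the whole load, which is the situation the polymer plant example describes.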

Fig 1 shows how the generic Swiss cheese model can be adapted 


Risk is a function of the probability (or frequency, or likelihood) of an event and its severity (or consequences). Multiple safety layers in any facility are designed to reduce one or the other. Prevention layers are implemented to reduce the probability of a hazardous event occurring. Mitigation layers are implemented to reduce the consequences once the event has already happened. The following discussions on prevention and mitigation layers are examples only. The listing is not intended to cover all possible layers that may be implemented or should be used in any one facility. [Ref. 2]

Prevention Layers

Prevention layers are implemented to reduce the probability or likelihood of a hazardous event occurring.
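The difference between prevention and mitigation can be sketched numerically. The example below treats risk as frequency multiplied by consequence, which is one common simplification and an assumption here, since the article only says risk is a function of the two. All figures are hypothetical.

```python
def risk(frequency_per_year, consequence_cost):
    """Expected loss per year, assuming risk = frequency x consequence."""
    if frequency_per_year < 0 or consequence_cost < 0:
        raise ValueError("frequency and consequence must be non-negative")
    return frequency_per_year * consequence_cost


# Hypothetical baseline: one event every 10 years costing 1,000,000.
baseline = risk(0.1, 1_000_000)

# A prevention layer lowers how often the event happens.
with_prevention = risk(0.01, 1_000_000)

# A mitigation layer lowers the loss once the event has happened.
with_mitigation = risk(0.1, 100_000)

print(f"baseline:        {baseline:,.0f} per year")
print(f"with prevention: {with_prevention:,.0f} per year")
print(f"with mitigation: {with_mitigation:,.0f} per year")
```

In this sketch both layers reduce the expected loss by the same factor, but they act on different terms: prevention on the frequency, mitigation on the consequence.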

Figure 2, often referred to as “the onion diagram,” appears in several different formats in most safety documents. It shows how there are various safety layers, some of which are prevention layers and others mitigation layers. The basic concept is simple: “don’t put all your eggs in one basket.” Some refer to this as “defense in depth.”

People have been, and will continue to be, directly responsible for some accidents. Some in the process industry have done an excellent job documenting such case histories. However, as a professional safety auditor who has reviewed hundreds of investigations, I have found that 80% never identified the underlying causes. Hopefully, the rest of the industry will learn from these examples and not repeat them. Still, I believe “Accidents are not due to a lack of knowledge, but a failure to use the knowledge we already have.” Unfortunately, history has shown that many accidents recur.

For example, there have been cases where the operators saw the alarm, knew what it meant, and still took no action. Either the alarm was considered a nuisance alarm (“Oh, we see that all the time,” which was one of many problems at Bhopal) or they waited to see if anything else would happen (sometimes with catastrophic results). The following four principles need to be remembered:

  1. He who ignores the past is condemned to repeat it.
  2. Success in preventing a loss is in anticipating the future.
  3. You are not in control if a loss must occur before you measure it.
  4. The opportunity for loss is great, but so is the opportunity to prevent that loss.

When things do go wrong, they tend to cascade and escalate. The author recalls one plant where there was a shutdown and the DCS printed out 7,000 alarm messages! Overwhelming the operators with this much information is obviously detrimental. Too much information is not a good thing.

When faced with life-threatening situations requiring decisions within one minute, people tend to make the wrong decisions 99% of the time. This was determined from actual studies done by the military. In other words, during emergencies, people are about the worst thing to rely on, no matter how well trained they may be.

BIO:

Dr Bill Pomfret; MSc; FIOSH; RSP. FRSH;
Founder & President.
Safety Projects International Inc, &
Dr. Bill Pomfret & Associates.
26 Drysdale Street, Kanata, Ontario.K2K 3L3.
www.spi5star.com      pomfretb@spi5star.com
Tel 613-2549233


About Greg Hutchins

Greg Hutchins PE CERM is the evangelist of Future of Quality: Risk®. He has been involved in quality since 1985, when he set up the first quality program in North America based on Mil Q 9858 for the natural gas industry. Mil Q became ISO 9001 in 1987.

He is the author of more than 30 books. ISO 31000: ERM is the best-selling and highest-rated ISO risk book on Amazon (4.8 stars). Value Added Auditing (4th edition) is the first ISO risk-based auditing book.

© 2025 FMS Reliability