Accendo Reliability

Your Reliability Engineering Professional Development Site


by André-Michel Ferrari

Concurrent Failure Analysis and Prevention


Concurrent, or simultaneous, failures can occur in redundant or spared systems: both units of a spared pair fail at the same time, leaving the operator with no production output. Take, for example, two alternating pumps installed in a parallel configuration. Each one acts as a spare for the other and can take over at any time if the other fails. This article is based on a question I was asked during a recent industry presentation. I thought the example was interesting and informative enough to share with the Maintenance and Reliability community.

The Question and Problem Statement

The conference attendee’s question was as follows:

I have two pumps working in parallel. They are set up as standby pumps. This means that I need only one pump to run at any one time for my process to work. The standby pump automatically takes over if the running pump fails. If both pumps are down at the same time, my process is down, and I lose money.

How should I organize pump running time so that they don’t fail at the same time? I fear that if I run them both for equal periods of time they will fail concurrently. What time intervals should I run each one? Should those be different?

Using a RAM Model to Gauge the Probability of Concurrent Failures

Diagram 1 below illustrates the Reliability, Availability and Maintainability (RAM) model for the standby pump system. It was built in the Reliability Block Diagram software Raptor 7.0™. For the system to function, at least one pump must be operational at any one time. This is known as a “k out of n” configuration, here with k=1 and n=2.

Diagram 1 – 2 Standby Pumps configuration layout with a k=1 out of n=2 set up
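
As a point of reference, the “k out of n” logic can be written down for the static case. The sketch below assumes independent, identical components that are all running, with no repair and no standby switching, so it illustrates the configuration rather than the Raptor model. The component reliability of 0.90 is a hypothetical value chosen only for the example.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent, identical components work.

    r is the reliability of a single component over the period of interest.
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical component reliability, for illustration only.
r_single = 0.90
print("1 out of 2:", k_out_of_n_reliability(1, 2, r_single))
print("1 out of 3:", k_out_of_n_reliability(1, 3, r_single))
```

With k=1 the sum reduces to one minus the probability that every component is down, which is why each added spare shrinks the system failure probability so quickly.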

The life characteristics of each pump are set up as follows:

  • Failure distribution governing unplanned failures of each pump: 2-parameter Weibull with Beta (Shape Parameter) = 1.2 and Eta (Scale Parameter) = 35,000 hours
  • Repair distribution governing the restoration task once a pump has failed: Triangular distribution with Minimum = 4 hours, Most Likely (Mode) = 8 hours, and Maximum = 24 hours
  • The pump is a repairable system. It is NOT “as good as new” after a repair; it is only 50% rejuvenated, or renewed, each time. Its condition therefore degrades progressively over time.
  • Neither production output nor cost is modeled in this example, as neither is required to answer the question.
  • This is essentially a “run to failure” model. No preventive maintenance task is included.
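
The two input distributions above can be explored with a few lines of standard-library Python. This is a minimal sketch, not part of the Raptor model. It treats the 8-hour repair figure as the most likely value (mode) of the triangular distribution, which is an assumption, since a triangular distribution between 4 and 24 hours cannot have a mean of 8 hours. The function names are illustrative.

```python
import math
import random

BETA, ETA = 1.2, 35_000.0  # Weibull shape and scale (hours), from the list above


def weibull_reliability(t_hours: float) -> float:
    """Probability that a new pump survives t_hours without an unplanned failure."""
    return math.exp(-((t_hours / ETA) ** BETA))


def weibull_mean_life() -> float:
    """Mean time to first failure of a new pump, in hours."""
    return ETA * math.gamma(1.0 + 1.0 / BETA)


def sample_repair_hours(rng: random.Random) -> float:
    """One repair duration: triangular with low=4, high=24, mode=8 (mode assumed)."""
    return rng.triangular(4.0, 24.0, 8.0)


rng = random.Random(42)
print(f"R(1 year)      = {weibull_reliability(8_760):.3f}")
print(f"Mean life      = {weibull_mean_life():,.0f} hours")
print(f"Mean repair    = {(4.0 + 8.0 + 24.0) / 3.0:.1f} hours (triangular mean)")
print(f"Sampled repair = {sample_repair_hours(rng):.1f} hours")
```

A useful check on any Weibull setup is that the reliability at t = Eta is always exp(-1), about 36.8%, whatever the value of Beta.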

When we run the model for 1,000 lifecycles with a mission time of 175,200 hours (20 years), we get the following results:

  • Mean System Reliability is 99.70%. This means that the probability of a concurrent pump failure over 20 years is 0.30%.
  • Mean Number of System Failures is 0.003 per lifecycle. A system failure occurs when both pumps are down at the same time (concurrent failure).

In essence, even without maintenance, there is a very low chance of concurrent failures.
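
For readers without access to RBD software, the same kind of experiment can be approximated with a small Monte Carlo simulation. The sketch below is not Raptor's algorithm and its numbers should not be expected to match the results above. It rests on several assumptions that are not stated in the model description: cold standby with perfect switching, a standby pump that does not age while idle, “50% rejuvenated” interpreted as a Kijima Type II virtual-age factor of 0.5, and 8 hours taken as the mode of the repair distribution. All names are illustrative.

```python
import math
import random

BETA, ETA = 1.2, 35_000.0   # Weibull shape and scale (hours)
REPAIR = (4.0, 24.0, 8.0)   # triangular low, high, mode (hours); mode is assumed
MISSION = 175_200.0         # 20 years, in hours
Q = 0.5                     # assumed Kijima Type II factor for "50% rejuvenated"


def time_to_failure(virtual_age: float, rng: random.Random) -> float:
    """Sample running hours until the next failure, given the pump's virtual age."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    x = ETA * ((virtual_age / ETA) ** BETA - math.log(u)) ** (1.0 / BETA) - virtual_age
    return max(0.0, x)


def system_failures(n_pumps: int, rng: random.Random) -> int:
    """Count the events in one mission where every pump is down at the same time."""
    age = [0.0] * n_pumps       # virtual age of each pump, in running hours
    ready_at = [0.0] * n_pumps  # clock time at which each pump is available again
    t, running, outages = 0.0, 0, 0
    while True:
        x = time_to_failure(age[running], rng)
        t += x
        if t >= MISSION:
            return outages
        age[running] = Q * (age[running] + x)              # imperfect repair
        ready_at[running] = t + rng.triangular(*REPAIR)    # failed pump goes to repair
        spares = [i for i in range(n_pumps) if ready_at[i] <= t]
        if spares:
            running = spares[0]                            # standby pump takes over
        else:
            outages += 1                                   # concurrent failure
            running = min(range(n_pumps), key=ready_at.__getitem__)
            t = ready_at[running]                          # wait for the first repair
            if t >= MISSION:
                return outages


def mission_reliability(n_pumps: int, lifecycles: int = 1_000, seed: int = 1) -> float:
    """Fraction of simulated lifecycles with no concurrent failure."""
    rng = random.Random(seed)
    clean = sum(system_failures(n_pumps, rng) == 0 for _ in range(lifecycles))
    return clean / lifecycles


print("2 pumps:", mission_reliability(2))
```

Because concurrent failures are rare events, an estimate from 1,000 lifecycles is noisy; changing the seed or increasing `lifecycles` gives a feel for how much the result moves.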

Sensitivity Test based on Pump Characteristic Life Value (Eta)

If we reduce the characteristic life of the pump, we increase its failure frequency. In other words, we are testing how much a standby pump arrangement with higher individual failure rates reduces the reliability of the system. Successive models are run with decreasing Eta values, and the results are illustrated in Graph 1 below.

Graph 1 – Sensitivity Test for system failures based on Pump Characteristic Life variation.
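
A back-of-the-envelope calculation gives some intuition for why the system is so insensitive to Eta. The sketch below is a rough approximation and not the simulation behind Graph 1. It assumes “as good as new” repairs, independent pumps, and steady-state conditions, and it reports the fraction of time both pumps would be down rather than the number of system failures. The Eta values in the sweep are hypothetical, and the mean repair time assumes 8 hours is the mode of the triangular distribution.

```python
import math

BETA = 1.2
MEAN_REPAIR_H = (4.0 + 8.0 + 24.0) / 3.0  # triangular mean, assuming 8 h is the mode


def pump_unavailability(eta_hours: float) -> float:
    """Steady-state fraction of time a single pump spends under repair."""
    mtbf = eta_hours * math.gamma(1.0 + 1.0 / BETA)  # Weibull mean life
    return MEAN_REPAIR_H / (mtbf + MEAN_REPAIR_H)


for eta in (35_000, 25_000, 15_000, 5_000):  # hypothetical sweep values
    u = pump_unavailability(eta)
    print(f"Eta = {eta:>6} h   one pump down: {u:.2e}   both down: {u * u:.2e}")
```

Repairs last hours while lives last thousands of hours, so a single pump is rarely unavailable, and the chance that both are unavailable together scales roughly with the square of that small number.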

These results show that the number of system failures over twenty years stays low even as individual pump reliability decreases. The conference attendee can therefore be reassured that concurrent failures in their system are very unlikely, though the risk is never zero.

Improving Reliability

Sometimes a given reliability value is not enough for an operator. Suppose the conference attendee was operating an expensive satellite in space, where 99.70% system reliability over 20 years was inadequate. In scenarios such as satellite or airline operations, additional redundancy is a reliability improvement option. Here we take the first example (i.e. Beta = 1.2 and Eta = 35,000 hours) and add a 3rd pump, as illustrated in Diagram 2 below. The system still needs only one pump to run at any one time.

Diagram 2 – 3 Standby Pumps configuration layout with a k=1 out of n=3 set up


With this configuration, the model returns a System Reliability of 100% and zero system failures over the 20 years of operation.
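
A simulated result of 100% should be read as “no failures observed in the runs performed” rather than as a true probability of zero. The sketch below computes the upper confidence bound on the failure probability when zero failures are seen in a given number of independent runs. It assumes the 3-pump model was also run for 1,000 lifecycles, which is not stated explicitly, and the function name is illustrative.

```python
def zero_failure_upper_bound(n_runs: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on failure probability after 0 failures in n_runs."""
    # Solve (1 - p) ** n_runs = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_runs)


bound = zero_failure_upper_bound(1_000)
print(f"95% upper bound on failure probability: {bound:.4%}")
```

For 1,000 runs the bound is roughly 0.3%, close to the familiar “rule of three” approximation of 3 divided by the number of runs. Demonstrating a much smaller failure probability by simulation alone would require many more lifecycles.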

In a normal industrial setting, adding redundancy can be very expensive and may not be economically viable. In that case, we turn to maintenance strategies to better predict and avoid pump failures. These include advanced Condition-Based Maintenance (CBM) strategies and a thorough analysis of the potential failure modes and their life characteristics. However, every action taken needs to be justified financially, as maintenance carries a cost as well as the risk of “maintenance induced failures”.

In summary, a RAM model is an excellent tool for answering the conference attendee’s question, and the correct answer will largely depend on the context and environment they are operating in.


About André-Michel Ferrari

André-Michel Ferrari is a Reliability Engineer who specializes in Reliability Analytics and Modeling which are fundamental to improving asset performance and output in industrial operations.

André-Michel has approximately 30 years of industrial experience mainly in Reliability Engineering, Maintenance Engineering, and Quality Systems Implementation. His experience includes world-class companies in the Brewing, Semiconductor, and Oil & Gas industries in Africa, Europe and North America.

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy