Accendo Reliability

Your Reliability Engineering Professional Development Site


by Bryan Christiansen

Reliability Techniques For Analyzing And Improving Fault Tolerance


When designing equipment and processes, engineers build in a safety margin so that an asset remains functional when a fault or defect affects it partially or wholly. Minor defects in production assets should not cause immediate breakdowns. A fault-tolerant system keeps operating for a predetermined interval after a fault appears, until corrective measures are carried out. Faults rarely come from a single source, so the design has to account for several possible origins.

A fault-tolerant system is beneficial in many ways. Companies enjoy predictable operations and can reorganize processes depending on the type and intensity of the fault. The main benefits of fault-tolerant systems are:

  • They are highly reliable: They remain operational despite defects or faults.
  • They are safer: A fault-tolerant system can avert workplace accidents. Processes continue with minimal risk to the asset and its operators. Advanced systems de-energize, stopping hazardous part movements.
  • They cost less to own: The system has sufficient protective measures to limit the effects of faults.

Fault-tolerant systems share a set of defining characteristics. These characteristics determine how a system detects, displays, and compensates for underlying faults. A fault-tolerant system should have:

  • Defect detection and display capabilities
  • Fault diagnosis and control mechanisms
  • Fault compensation measures (Is there a redundant system? Does the system's peak operating level drop while the fault persists?)

System designers and engineers must evaluate systems and equipment comprehensively to estimate the most probable types of faults, their impacts, and the most effective rectification measures.

Failure mode and effects analysis (FMEA)

FMEA is a reliable fault tolerance analysis technique that enables engineering designers to:

  • Identify how faults and failures occur, the causes of asset failures, and their effects on reliability
  • Evaluate the severity of asset failures
  • Prioritize failure correction measures

Engineers rely on FMEA to streamline system and equipment design. They examine systems and the flow of processes to predict what could go wrong during the asset’s life cycle. They estimate the probability of different failures occurring and the severity of each. Insights from these predictions enable the engineers to identify potential sources of faults and rectify them at the design stage. Improving system design reduces the probability of some defects occurring when an asset is operational. 

Engineers cannot always eliminate the possibility of faults. In such situations, they devise ways to detect and compensate for defects, for example through redundant systems, fault alert mechanisms, and safety switches. They also list probable corrective measures and prioritize each according to its effectiveness and its potential to address failure modes and effects.
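The article does not name a scoring scheme for prioritizing failure modes. One widely used convention, assumed here, is the risk priority number (RPN): the product of severity, occurrence, and detection ratings, each on a 1 to 10 scale. The sketch below ranks a few hypothetical pump failure modes by RPN; the `FailureMode` class, the failure mode names, and all ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a simplified FMEA worksheet (illustrative, not a standard schema)."""
    name: str
    severity: int    # 1 (negligible effect) .. 10 (catastrophic effect)
    occurrence: int  # 1 (remote likelihood) .. 10 (almost certain)
    detection: int   # 1 (almost always detected) .. 10 (practically undetectable)

    @property
    def rpn(self) -> int:
        # Risk priority number: a common, but not universal, ranking heuristic
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical ratings for a pump; real values come from the FMEA team
modes = [
    FailureMode("seal leak", severity=6, occurrence=5, detection=4),
    FailureMode("bearing seizure", severity=9, occurrence=3, detection=6),
    FailureMode("sensor drift", severity=4, occurrence=6, detection=7),
]

ranked = prioritize(modes)
for mode in ranked:
    print(f"{mode.name:16s} RPN = {mode.rpn}")
```

With these made-up ratings, the hard-to-detect "sensor drift" outranks the more severe "bearing seizure". That is one reason teams often review high-severity modes separately rather than relying on RPN alone.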

Fault tree analysis (FTA)

FTA is an effective technique for analyzing and improving the fault tolerance of equipment. Design engineers use it to represent a system mathematically or graphically and to estimate the probability and frequency of failures. They establish how a fault in a particular component affects the reliability of other elements in the system. In other words, they create graphical models that describe how defects and failures propagate from a single part to the equipment and then to the entire system.

Engineers rely on FTA results to build more reliable and fault-tolerant systems. They use the results to optimize the redundancy of assets, strengthen fault containment measures, and limit the spread of minor defects to other system elements. That way, they ensure that small faults do not cause catastrophic asset failures. The technique lets engineers focus on resolving one defect at a time. It also supports a comprehensive assessment of system interdependencies, giving a better understanding of how the mechanisms that limit the escalation of minor defects and faults operate and how reliable they are.
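As a rough illustration of how a fault tree turns component-level probabilities into a system-level estimate, the sketch below evaluates a small tree with AND and OR gates. It assumes that basic events are statistically independent, which real systems with common-cause failures may violate. The cooling system, its event names, and all probabilities are hypothetical.

```python
from math import prod


def and_gate(*probs):
    """Output event occurs only if all independent input events occur."""
    return prod(probs)


def or_gate(*probs):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities over one mission
p_pump_a = 0.05
p_pump_b = 0.05
p_power = 0.01
p_valve = 0.02

# Intermediate event: pumping is lost only if both redundant pumps fail
p_pumping_lost = and_gate(p_pump_a, p_pump_b)

# Top event: cooling is lost if pumping, power, or the valve fails
p_top = or_gate(p_pumping_lost, p_power, p_valve)

# Same tree with a single pump, to show what the redundancy buys
p_top_no_redundancy = or_gate(p_pump_a, p_power, p_valve)

print(f"P(loss of cooling), redundant pumps: {p_top:.4f}")
print(f"P(loss of cooling), single pump:     {p_top_no_redundancy:.4f}")
```

Comparing the two top-event probabilities is one simple way to judge whether added redundancy is worth its cost, since the remaining risk is then dominated by the non-redundant power supply and valve.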

Markov processes 

Design engineers can use Markov models to predict the reliability of different systems over time. The model assumes that the future state of an asset depends only on its current condition and not on the events that led to it. An existing fault or defect therefore affects the future availability of the asset. The Markov process is a powerful technique for predictive modeling that estimates the impact of equipment faults over time. It enables the designer to identify interdependencies between events and quantify how equipment defects evolve.

Markov processes represent system models as graphs or equations. In the graphical representation, design engineers define the current and potential future states of an asset. Arrows between the states show the possible transitions, and each arrow is labeled with the probability of the asset moving from one state to the other.
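The state-and-arrow picture can be written as a transition matrix. The sketch below is a minimal discrete-time version with three assumed states (healthy, degraded, failed) and made-up transition probabilities; continuous-time models with failure and repair rates are also common, and nothing here is taken from a specific asset. It propagates the state distribution step by step and reads off the long-run availability.

```python
STATES = ("healthy", "degraded", "failed")

# P[i][j] = probability of moving from state i to state j in one time step.
# All values are hypothetical; each row must sum to 1.
P = [
    [0.95, 0.04, 0.01],  # healthy
    [0.10, 0.80, 0.10],  # degraded: may be restored, stay degraded, or fail
    [0.50, 0.00, 0.50],  # failed: repaired with probability 0.5 per step
]


def step(dist, matrix):
    """Advance a state distribution by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(steps, start, matrix):
    """Return the state distribution after the given number of steps."""
    dist = list(start)
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


start = [1.0, 0.0, 0.0]  # the asset begins in the healthy state
after_one = distribution_after(1, start, P)
after_many = distribution_after(200, start, P)

# Treat the asset as available while healthy or degraded (an assumption)
availability = after_many[0] + after_many[1]

for name, prob in zip(STATES, after_many):
    print(f"{name:9s} {prob:.4f}")
print(f"long-run availability: {availability:.4f}")
```

Because each step uses only the current distribution, the code mirrors the Markov assumption directly: no history beyond the present state enters the calculation.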

Boolean system theory

Boolean system theory uses Boolean logic to analyze systems. Design engineers establish a set of rules describing how asset faults or defects occur and propagate. They optimize system reliability and redundancy by focusing on operational variables and validating interdependencies between events. The result is a mathematical model that combines algebraic rules and logic gates to determine the flow of events and their effect on the escalation of asset defects.
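One way to make this concrete is to write the system as a Boolean structure function and search it for minimal cut sets, meaning the smallest groups of component failures that bring the system down. The sketch below assumes a hypothetical system that needs power, a valve, and at least one of two pumps. The component names and the logic are illustrative only, and the brute-force search is practical only for small systems.

```python
from itertools import combinations

COMPONENTS = ("power", "pump_a", "pump_b", "valve")


def system_works(up):
    """Structure function: True if the system functions for the given component states."""
    return up["power"] and (up["pump_a"] or up["pump_b"]) and up["valve"]


def fails_when_down(failed):
    """True if the system fails when exactly the given components are down."""
    up = {c: c not in failed for c in COMPONENTS}
    return not system_works(up)


def minimal_cut_sets():
    """Enumerate minimal sets of failed components that cause system failure."""
    cuts = []
    # Search by increasing size so that smaller cut sets are found first
    for size in range(1, len(COMPONENTS) + 1):
        for combo in combinations(COMPONENTS, size):
            candidate = set(combo)
            # A superset of a known cut set cannot be minimal
            if any(cut <= candidate for cut in cuts):
                continue
            if fails_when_down(candidate):
                cuts.append(candidate)
    return cuts


cuts = minimal_cut_sets()
for cut in cuts:
    print(sorted(cut))
```

Cut sets of size one are single points of failure. In this hypothetical layout they are the power supply and the valve, while the pumps only fail the system together, which shows where additional redundancy would matter most.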

Concluding Remarks

Designers should develop critical equipment with fault tolerance in mind. At the design stage, they should predict the type, frequency, and impact of likely defects, then apply engineering best practices to optimize the design and incorporate fault tolerance measures. Doing so improves overall equipment effectiveness, enhances operational safety, and contributes to lower maintenance costs. Designers can choose among the available reliability techniques to evaluate fault tolerance, depending on the complexity of the system. These techniques are graphical, mathematical, or a combination of both.

Filed Under: Articles, CMMS and Reliability, on Maintenance Reliability Tagged With: Failure Mode and Effects Analysis (FMEA), Fault tolerance, Fault/Success Tree Analysis (FTA/STA)

About Bryan Christiansen

Bryan Christiansen is the founder and CEO of Limble CMMS. Limble is a modern, easy-to-use mobile CMMS software that takes the stress and chaos out of maintenance by helping managers organize, automate, and streamline their maintenance operations. While his primary experience is in software engineering, developing Limble required him to gain a deep understanding of the maintenance industry.


© 2025 FMS Reliability