Accendo Reliability

Your Reliability Engineering Professional Development Site


by JD Solomon

Why Effectively Communicating System Redundancy to Decision Makers Is So Important


Effectively communicating system redundancy is important because redundancy touches system performance, risk management, disaster recovery, regulatory compliance, and customer & owner confidence. Getting the redundancy communication wrong produces blind spots and surprises. Getting it right produces a well-oiled, predictable machine. This article provides proven tips for effectively communicating system redundancy.

Redundancy and How to Apply It

Redundancy is the existence of more than one means for accomplishing a given function. Each means of accomplishing the function need not necessarily be identical.

Understanding How to Apply Redundancy to Facilities and Critical Infrastructure 

The FINESSE Fishbone Diagram as One Approach

Strategic, operational, and emergency situations require different communication approaches. FINESSE is a strategic communication approach for getting senior management and decision makers to understand.

Trial and error is not advisable for big decisions. 

Confirm Redundancy (Don’t Assume It)

If we rely on redundancy to minimize risk, we must confirm that it works. In other words, tolerating failure only works when the redundancy itself has been verified. These are some key reasons for redundancy failures that I have encountered throughout my career.

  • Poor switching
  • Missing equipment
  • Oversold designs
  • Non-documented modifications
  • Poorly understood systems 
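
To make the first item concrete, here is a minimal sketch of how an unconfirmed switchover erodes the benefit of a standby unit. The model and the numbers are illustrative assumptions, not taken from the article: the function is delivered if the primary works, or if the primary fails, the switch succeeds, and the backup works. The name `standby_reliability` is hypothetical.

```python
def standby_reliability(r_primary: float, r_backup: float, p_switch: float) -> float:
    """Probability that a primary unit with one standby backup delivers the function.

    Simplified mission-level model (an assumption for illustration):
    success = primary works, OR primary fails AND switchover succeeds AND backup works.
    """
    return r_primary + (1.0 - r_primary) * p_switch * r_backup


if __name__ == "__main__":
    # Illustrative values only: both units 90% reliable, switch success varies.
    for p_switch in (1.0, 0.8, 0.0):
        r = standby_reliability(0.90, 0.90, p_switch)
        print(f"switch success {p_switch:.1f} -> system reliability {r:.3f}")
```

With a switch that never works, the system is no better than the primary alone, which is one reason to test the switchover rather than assume it.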

Is there an alternative to redundancy? Yes, fault avoidance is the other branch of fault management. Fault avoidance includes simplicity, better parts, lower stresses, and training.

The reality is that good fault tolerance (redundancy or robustness) starts with good fault avoidance. 

Communicating with The Four Horsemen of Redundancy

The original Four Horsemen are:

  • Conquest, who rides a white horse and carries a bow.
  • War, who rides a red horse and wields a large sword.
  • Famine, who rides a black horse and holds a pair of scales.
  • Death, who rides a pale horse and is followed by Hades. 

The Four Horsemen of Redundancy are:

  • Complexity
  • Independence
  • Propagation
  • Human Error 

1. Complexity

Extra elements require additional management systems to detect, indicate, and mediate failures. Redundancy can increase to the point where it is the primary source of unreliability. 

Key Points for Communicating Complexity

  1. More elements equal more points of failure.
  2. Too little or too much redundancy can make a system more fragile and unreliable.
  3. More training, support systems, and management processes come with redundancy. 
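
The points above can be sketched numerically. This is a simplified model under stated assumptions, not the author's method: identical, independent units in parallel, plus one overhead element (switching, monitoring, or management) in series for every unit added beyond the first. The function names and the reliability values are hypothetical.

```python
def redundant_system_reliability(n_units: int, r_unit: float, r_overhead: float) -> float:
    """Reliability of n parallel units where each added unit brings a series overhead element.

    Assumptions (for illustration only): units are identical and independent,
    and every unit beyond the first adds one overhead element that must also work.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    parallel = 1.0 - (1.0 - r_unit) ** n_units   # at least one unit works
    overhead = r_overhead ** (n_units - 1)       # all overhead elements work
    return parallel * overhead


def best_unit_count(max_units: int, r_unit: float, r_overhead: float) -> int:
    """Return the unit count (1..max_units) with the highest modeled reliability."""
    return max(
        range(1, max_units + 1),
        key=lambda n: redundant_system_reliability(n, r_unit, r_overhead),
    )


if __name__ == "__main__":
    for n in range(1, 8):
        r = redundant_system_reliability(n, r_unit=0.90, r_overhead=0.98)
        print(f"{n} unit(s): {r:.4f}")
```

Under these assumed values the modeled reliability peaks at two units and then declines, which is the kind of simple curve that can be shown to decision makers as a visual.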

Key Approach

Show them the visuals! 

2. Independence

“Independent” means that the chances of one failing are not linked in any way to the chances of the other failing. Most things do not work or fail independently. Independence is a simplifying assumption in analysis (breaking things into parts) and design. 

Key Points for Communicating Independence

  1. Many redundancy calculations assume that redundant elements behave independently.
  2. Identical elements will likely wear in similar ways.
  3. Identical elements will likely fail at similar times. 
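
One way to show what the independence assumption hides is a simplified beta-factor style calculation, sketched below. The model is an assumption chosen for illustration: a fraction `beta` of each unit's failure probability `q` is attributed to a shared cause that takes out both units at once. The function name and the values are hypothetical.

```python
def pair_failure_probability(q: float, beta: float) -> float:
    """Probability that both units of a redundant pair are failed.

    Simplified beta-factor sketch (an assumption, not from the article):
    a fraction beta of each unit's failure probability q comes from a
    common cause that fails both units; the rest is independent.
    """
    q_common = beta * q                      # shared-cause failure of both units
    q_independent = (1.0 - beta) * q         # per-unit independent failure
    both_fail_independently = q_independent ** 2
    # The pair is failed if the common cause occurs OR both fail independently.
    return 1.0 - (1.0 - q_common) * (1.0 - both_fail_independently)


if __name__ == "__main__":
    q = 0.01  # illustrative per-unit failure probability
    for beta in (0.0, 0.05, 0.10):
        p = pair_failure_probability(q, beta)
        print(f"beta={beta:.2f}: pair failure probability {p:.6f}")
```

Setting `beta` to zero recovers the fully independent result, so the comparison isolates how much of the claimed benefit depends on that assumption.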

Key Approaches for Separation and Single Points of Failure

  • Use a simple visual.
  • Have a concise message.
  • Block diagrams and line drawings are good tools. 

Key Approaches for Diversification

  • Keep discussion as simple as possible (diversification is not easily accepted).
  • Use an example from the relevant industry.
  • Make the logical argument that if you use redundancy, you should use it to the fullest extent possible.

3. Propagation

An unexpected failure mode or effect in an upstream system may unexpectedly impact (over-stress) the performance of downstream systems. The unexpected catastrophic failure of an upstream system may wipe out a downstream system. This is commonly called a cascading failure.

Another failure mode involves an adjacent system that is not otherwise connected to the main system, yet whose partial or complete failure causes a failure of the main system. This behavior is referred to as system-of-systems (SoS) failure. 

Key Points for Communicating Propagation (cascading failures)

  1. Redundancy and robustness are not the same.
  2. Robustness is a system’s ability to handle a wide range of inputs, stresses, and unexpected conditions.
  3. Capacity erosion (loss of design capacity) is a real consequence of using redundancy to offset stresses from overloaded upstream systems. 
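
Capacity erosion can be illustrated with a short sketch. The scenario is hypothetical: a bank of identical units sized with one spare, where growing demand quietly consumes the spare. The function name, unit capacity, and demand figures are assumptions for illustration only.

```python
import math


def spare_units(n_units: int, unit_capacity: float, demand: float) -> int:
    """Number of installed units left over after meeting demand.

    A negative result means installed capacity falls short even with
    every unit running. Assumes identical units (illustrative only).
    """
    units_needed = math.ceil(demand / unit_capacity)
    return n_units - units_needed


if __name__ == "__main__":
    # Hypothetical station: three units of 50 capacity units each.
    for demand in (100, 130, 160):
        spare = spare_units(n_units=3, unit_capacity=50, demand=demand)
        print(f"demand {demand}: spare units = {spare}")
```

In this example the installed equipment never changes, but once demand requires all three units there is no longer any redundancy to tolerate a failure.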

Key Approach

The problem frame, including system boundaries and definitions, must be clear. 

4. Human Error (Human Performance)

Failures of all kinds instigate human action. When redundancy is present, human intervention readily occurs in response to the anticipated failure of a primary unit.

Technological failures in monitoring or control systems open the door to human error, even if the system is actually functioning as designed.

Errors are not automatically detected in many cases. Even worse, latent human errors can become ‘normalized’ over time, resulting in an unconscious reliance on redundancy. 

Key Points for Communicating Human Error

  1. Systems include people, equipment, processes, inputs, outputs, and feedback loops.
  2. Communicating that people are part of systems, especially those that tolerate failure, is part of an upfront message related to systems thinking.
  3. More often than not, chaos follows when redundancy is called upon. That’s largely because humans intervene when something fails catastrophically, whether or not it’s part of the plan. People are an integral part of any system.
  4. Frontline staff are often incorrectly blamed in crises and poor failure evaluations. That’s unfortunate because more standards, training, and management systems are needed when the plan is to tolerate failure. 

Key Approach

Communicate systems, not blame. The root causes usually relate to management systems, training, and standards. Those aren’t ever perfect, but neither are humans. 

Tying Tips to a Communication Approach for Redundancy

Communicating to Senior Management

Communicating with FINESSE focuses on getting the boss’s boss to understand. In this case, tying tips into a communication approach is about communicating up to decision makers. 

FINESSE and the Seven Bones

Effective communication requires doing all seven bones well, but not necessarily perfectly. Here, we boil it down to one tip per bone for communicating system redundancy. Obviously, there is more than one tip per bone.

Frame: Explain the frame, including the system boundaries and key definitions.

Illustrate: Use block diagrams and line diagrams.

Noise reduction: Too many reliability calculations produce noise; keep the discussion focused on interfaces and switches.

Empathy: No one wants failure, yet we design to have it. Senior management must understand that redundancy is a form of failure tolerance.

Structure: Discuss the weak points first (single points of failure or need for separation).

Synergy: Have one-on-one discussions before having a group meeting.

Ethics: Redundancy still requires high levels of safety, specification, and rigorous testing (validation). 

[Figure: The seven cause-and-effect bones of the FINESSE Fishbone Diagram.]

Communicating System Redundancy

There you have it! How to communicate system redundancy in fewer than 1,200 words! You could make a whole webinar or short course out of this (wait a minute, we do). 

Why 1,200 words? Because that’s about as much time (less than 10 minutes) as senior management has for you. 

Are you Communicating with FINESSE?

This article first appeared on www.communicatingwithfinesse.com


Founded by JD Solomon, Communicating with FINESSE is a not-for-profit community of technical professionals dedicated to being highly effective communicators as trusted advisors to senior management. Learn more about our publications, webinars, and workshops. Join the community for free.


About JD Solomon

JD Solomon, PE, CRE, CMRP provides facilitation, business case evaluation, root cause analysis, and risk management. His roles as a senior leader in two Fortune 500 companies, as a town manager, and as chairman of a state regulatory board provide him with a first-hand perspective of how senior decision-makers think. His technical expertise in systems engineering and risk & uncertainty analysis using Monte Carlo simulation provides him with practical perspectives on the strengths and limitations of advanced technical approaches. In practice, JD works with front-line staff and executive leaders to create workable solutions for facilities, infrastructure, and business processes.


© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy