Accendo Reliability

Your Reliability Engineering Professional Development Site


by Fred Schenkelberg

The Meaning of a Failure

Every failure provides information. It may reveal the time to failure, the stress-strength relationship, process stability, or the design margin. This holds in every case, even for failures directly related to human error.

An intermittent hardware failure observed by a firmware engineer should not be dismissed. Rather, it should be recorded, explored, and examined.

A single intermittent failure, or glitch, may indicate nothing more than a truly random event, or it may indicate a design error that degrades over time and causes 50% of units to fail within the first three months.

For complex systems expected to be highly reliable, every failure seen during the prototype phase provides essential information, if it is examined.

Statistics of a single failure

Every failure has meaning. If we have 100 prototypes and one fails, that indicates a 1% defect rate. If we have only 10 prototypes, that one failure may indicate a 10% failure rate, and with two prototypes, a 50% failure rate. Sure, more prototypes would be nice, along with the budget and time to thoroughly test each one.

That often doesn’t happen.
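The 1%, 10%, and 50% figures above are point estimates: failures divided by units tested. To show how little a single failure pins down, here is a minimal Python sketch (not from the article) that puts an exact binomial (Clopper-Pearson) confidence interval around each estimate. The 90% two-sided confidence level and the names `binom_cdf`, `_solve`, and `exact_interval` are illustrative assumptions, and the sketch assumes units fail independently with a common probability.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); an empty sum (k < 0) gives 0."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target: float) -> float:
    """Bisection on [0, 1] for p with f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(failures: int, n: int, confidence: float = 0.90) -> tuple[float, float]:
    """Two-sided exact (Clopper-Pearson) interval for the true failure proportion."""
    tail = (1.0 - confidence) / 2
    # Lower bound: p at which seeing >= `failures` failures has probability `tail`.
    lower = 0.0 if failures == 0 else _solve(
        lambda p: 1.0 - binom_cdf(failures - 1, n, p), tail)
    # Upper bound: p at which seeing <= `failures` failures has probability `tail`.
    upper = 1.0 if failures == n else _solve(
        lambda p: 1.0 - binom_cdf(failures, n, p), 1.0 - tail)
    return lower, upper


for n in (100, 10, 2):
    lo, hi = exact_interval(1, n)
    print(f"1 failure in {n:3d} units: point estimate {1 / n:.1%}, "
          f"90% interval {lo:.1%} to {hi:.1%}")
```

With one failure in 10 units, the interval runs from roughly 0.5% to about 39%: a single failure is consistent with both a rare and a common problem.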

And even with a single prototype, we see failures. Each one teaches us something about the design or manufacturing process, and the team adjusts to minimize or eliminate the failure mechanism.

Without more samples or failures, a single failure may indicate either a very rare event or a very common one. Detailed failure analysis helps us understand both how often the failure might occur and its root cause, and we should know the expected failure rate before going to production.

Thus, every failure, even one that is easy to explain away, requires investigation.

Here is another way to look at failures on a limited number of prototypes. When only a few units are tested, the chance that a rare or isolated event causes one of them to fail is very small. To have a reasonable chance of finding a 1% defect rate, we need to evaluate more than 100 samples. So if we test only 10 samples and still see a failure, it is likely that the failure will occur in 10% or more of the production units.
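The sample-size argument above follows from a simple calculation. Assuming each unit fails independently with the same probability p, the chance that n units show at least one failure is 1 - (1 - p)^n. The Python sketch below (function names are illustrative, not from the article) evaluates that expression and inverts it to find how many units are needed for a chosen detection probability.

```python
import math


def prob_at_least_one_failure(p: float, n: int) -> float:
    """Chance that n independent units include at least one failure,
    when each unit fails with probability p."""
    return 1.0 - (1.0 - p) ** n


def units_needed(p: float, detect_prob: float) -> int:
    """Smallest n for which prob_at_least_one_failure(p, n) >= detect_prob."""
    return math.ceil(math.log(1.0 - detect_prob) / math.log(1.0 - p))


for p in (0.01, 0.10):
    for n in (10, 100):
        print(f"p = {p:.0%}, n = {n:3d}: "
              f"P(at least one failure) = {prob_at_least_one_failure(p, n):.1%}")

print("Units for a 90% chance of seeing a 1% failure mode:", units_needed(0.01, 0.90))
```

With p = 1%, testing 100 units gives only about a 63% chance of seeing any failure, and 10 units gives under 10%. A failure that does appear in 10 units is therefore more consistent with a failure fraction around 10% or more, which 10 units would reveal about 65% of the time.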

Do not ignore any failure during the prototype phase, especially when you have very few prototypes with which to find failures.

Indication of Variability

Single failure

A single failure in a complex system indicates that something is not right somewhere in the design or manufacturing process.

It is difficult to know from a single failure whether it reflects nominal behavior, a rare event, or something in between. We also may not know whether the issue lies with the centering of the nominal performance or with the process variability.

Two failures

Two failures involving the same component provide information on the nature of the problem.

The difference between the two items' dimensions or performance values gives the range of the expected variation. Two values that are both below or both above the specification indicate a very likely need to adjust the process to center the performance within the specification.

Two failures also indicate this was clearly not a one-time event.
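The two checks described for a pair of failures, the range between them and whether both fall on the same side outside the specification, can be sketched in Python as below. The function name, the spec limits, and the measurement values are hypothetical examples, not data from the article.

```python
from dataclasses import dataclass


@dataclass
class TwoFailureSummary:
    spread: float                # range between the two failed items' values
    offset_from_center: float    # midpoint of the two values minus the spec midpoint
    same_side_out_of_spec: bool  # True if both are below the LSL or both above the USL


def summarize_two_failures(x1: float, x2: float, lsl: float, usl: float) -> TwoFailureSummary:
    """Summarize two failed items' measurements against spec limits [lsl, usl]."""
    both_low = x1 < lsl and x2 < lsl
    both_high = x1 > usl and x2 > usl
    return TwoFailureSummary(
        spread=abs(x1 - x2),
        offset_from_center=(x1 + x2) / 2 - (lsl + usl) / 2,
        same_side_out_of_spec=both_low or both_high,
    )


# Hypothetical example: a diameter specified at 9.95 to 10.05 mm,
# with the two failed parts measuring 10.07 mm and 10.09 mm.
summary = summarize_two_failures(10.07, 10.09, lsl=9.95, usl=10.05)
print(summary)
if summary.same_side_out_of_spec:
    print("Both failures sit on the same side of the spec: check process centering.")
```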

Three or more failures

Three or more failures of an item allow us to calculate the standard deviation, and with it the expected spread, of the item's dimension or performance.

Of course, the more measurements we have, both of items that led to failure and of items within specification, the better. One tactic is to focus on gathering information about the items that led to a failure, to gauge the nature and magnitude of the issue.
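With three or more measurements, a sample standard deviation becomes available. The Python sketch below computes the mean and sample standard deviation with the standard library's `statistics` module and compares a mean ± 3 standard deviation band with the spec limits. The ±3 band, the function name, and the numbers are illustrative assumptions. Because the values come only from failed items, the estimate may not represent the whole production population, which is why measurements of in-spec items help as well.

```python
from statistics import mean, stdev


def spread_from_failures(values: list[float], lsl: float, usl: float,
                         k: float = 3.0) -> tuple[float, float, float, float, bool]:
    """Return (mean, sample std dev, band low, band high, band fits in spec),
    where the band is mean +/- k standard deviations (k = 3 is an assumed choice)."""
    if len(values) < 3:
        raise ValueError("need at least three measurements")
    mu = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    low, high = mu - k * s, mu + k * s
    return mu, s, low, high, (lsl <= low and high <= usl)


# Hypothetical measurements (mm) from three failed parts, spec 9.95 to 10.05 mm.
mu, s, low, high, fits = spread_from_failures([10.07, 10.09, 10.04], lsl=9.95, usl=10.05)
print(f"mean {mu:.4f}, std dev {s:.4f}, +/-3s band {low:.4f} to {high:.4f}, fits spec: {fits}")
```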

Indication of Importance

If a failure occurs and shuts down a system or device for a customer, that failure is important. A single prototype failure is likewise important, because investigating it is a clear way to avoid failures that a customer may experience.

A prototype failure indicates that the design margin or stress-strength relationship is not adequate. Of the one thousand parts in the system, this one failed, which marks the failed item as important. It may not be the only part important to product reliability, and it may not even have been suspected of being important, yet it is now on the important list.
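The stress-strength relationship mentioned above is often quantified with a stress-strength interference model. As a minimal sketch, assuming stress and strength are independent and normally distributed (an assumption, not something the article specifies), the probability of failure is the chance that stress exceeds strength: P(fail) = Φ(-(μ_strength - μ_stress) / √(σ_strength² + σ_stress²)). The function name and values below are hypothetical.

```python
import math
from statistics import NormalDist


def interference_failure_prob(strength_mean: float, strength_sd: float,
                              stress_mean: float, stress_sd: float) -> float:
    """P(stress > strength), assuming independent, normally distributed stress and strength."""
    margin_mean = strength_mean - stress_mean        # average design margin
    margin_sd = math.hypot(strength_sd, stress_sd)   # spread of the margin
    return NormalDist().cdf(-margin_mean / margin_sd)


# Hypothetical part: strength mean 100 (sd 5), applied stress mean 80 (sd 10), same units.
print(f"{interference_failure_prob(100, 5, 80, 10):.2%}")
# Raising mean strength to 110 widens the margin and lowers the failure probability.
print(f"{interference_failure_prob(110, 5, 80, 10):.2%}")
```

A margin that is small relative to the combined spread gives a noticeable failure probability (about 3.7% in the first case), which is the kind of inadequate margin a single prototype failure can expose.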

Every failure should draw attention to the item that failed, the root cause of the failure, and specific ongoing steps to prevent that failure going forward.

Filed Under: Articles, Musings on Reliability and Maintenance Topics, on Product Reliability Tagged With: Failure analysis (FA), Failure Mode and Effects Analysis (FMEA)

About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy