Accendo Reliability

Your Reliability Engineering Professional Development Site


by Fred Schenkelberg 8 Comments

Futility of Using MTBF to Design an ALT

Let’s say we want to characterize the reliability performance of a vendor’s device. We’re considering including the device within our system if, and only if, it will survive 5 years reasonably well.

The vendor’s data sheet lists an MTBF value of 200,000 hours. A call to the vendor and a search of their site don’t reveal any additional reliability information. MTBF is all we have.

We don’t trust it. Which is wise.

Now we want to run an ALT to estimate a time to failure distribution for the device. The intent is to use an acceleration model to accelerate the testing and a time to failure model to adjust to our various expected use conditions.

Given the device, a small interface module with a few buttons, electronics, a display and enclosure, and the data sheet with MTBF, how can we design a meaningful ALT?

What to Measure

The data sheet, together with the system functions that rely on this device, defines a range of possible elements to measure. We could measure display brightness, button functionality, response times, life of the electronics, etc.

Before selecting what to measure in the ALT, we need to stop and ask: what will limit the life of the device in our application? The provided reliability information doesn’t say. It just says the device has a suspiciously round MTBF value of 200k hours.

An FMEA, risk analysis, or discussion with the development engineers may narrow down the elements of the device that are likely to fail first. If time and resources permit, running HALT to find weaknesses (that is, to identify failure mechanisms) may be in order. Again, just having MTBF doesn’t help.

Which Stress to Apply

Knowing the failure mechanism most likely to cause the device to fail is an essential first step in selecting the appropriate stress (temperature, vibration, power cycling, etc.) to accelerate that failure mechanism.

Not every failure mechanism responds to an increase in temperature. Applying the wrong stress will lead to poor results.

The data sheet might list some environmental or operating limits (power, voltage, temperature, etc.). Those may be clues to which stresses are important and worth exploring for how they lead to failures.

As when determining what to measure, we need to sort out which stress, or stresses, provide a means to accelerate the failure mechanism of interest.

Acceleration Model

Let’s say we estimate a rubber seal around the display is likely to fail and could be accelerated using higher temperatures.

Instead of the normal operating temperature of 25°C, let’s double it to 50°C. Ok, so? How much of an acceleration does that change in temperature cause? That is why we need an acceleration model.

The temperature increase might increase the rate of the chemical reaction between the material and oxygen, and we can use the Arrhenius model if we know or can estimate the activation energy.

Or, the temperature increase may increase the compression of the seal creating a mechanical deformation and damage over time. Here I’m not sure what model to use, yet the Arrhenius model would likely not be useful.
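If the Arrhenius model does apply to the seal, the acceleration from 25°C to 50°C depends heavily on the activation energy. The sketch below is only an illustration: the function name and the activation energies in the loop are assumptions for the example, not values for any real seal material.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor of the stress temperature relative to use."""
    t_use_k = use_temp_c + 273.15        # convert Celsius to kelvin
    t_stress_k = stress_temp_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Hypothetical activation energies (eV); the right value depends on the
# actual material and failure mechanism and has to be found or estimated.
for ea in (0.3, 0.5, 0.7, 1.0):
    af = arrhenius_acceleration_factor(ea, use_temp_c=25.0, stress_temp_c=50.0)
    print(f"Ea = {ea:.1f} eV -> acceleration factor of about {af:.1f}")
```

Across these assumed activation energies the same 25°C increase gives an acceleration factor anywhere from roughly 2 to roughly 20, which is why the activation energy has to come from knowledge of the failure mechanism and cannot be read off an MTBF value.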

Of course, knowing MTBF provides no information on failure mechanisms other than to suggest the failures are repairable to keep the system running.

Time to Failure Model

Given MTBF we may assume the system has a constant failure rate, or not. Remember all life distributions have a mean value. Knowing the MTBF value doesn’t automatically imply a constant failure rate.

Therefore, if we assume an exponential distribution describes the time to failure pattern, we may be wrong, and most likely would be wrong.
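To see how little the mean alone says, here is a minimal sketch comparing three Weibull distributions that all share a 200,000 hour mean. The shape values, the use of the Weibull family, and the assumption of continuous operation for 5 years are illustrative choices, not information from the data sheet.

```python
import math

MEAN_LIFE_HOURS = 200_000.0    # data sheet MTBF, treated as the distribution mean
MISSION_HOURS = 5 * 8760.0     # 5 years, assuming continuous operation


def weibull_scale_for_mean(mean, shape):
    """Scale (eta) that gives a Weibull distribution the requested mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def weibull_reliability(t, shape, scale):
    """Probability of surviving to time t."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical shapes: decreasing, constant, and increasing failure rate
for shape in (0.5, 1.0, 3.0):
    scale = weibull_scale_for_mean(MEAN_LIFE_HOURS, shape)
    r = weibull_reliability(MISSION_HOURS, shape, scale)
    print(f"shape = {shape:.1f}: 5-year reliability of about {r:.3f}")
```

With the same mean, the 5-year reliability comes out near 0.52, 0.80, and 0.99 for the three assumed shapes. Only the middle case (shape of 1) is the exponential distribution.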

Is the failure arrival pattern decreasing or increasing? We can’t tell from MTBF alone.

Knowing the failure mechanism and how an appropriate stress changes the failure rate is a great start. The design of the ALT includes sample sizes and how and when to make measurements. Knowing the expected pattern of failures given our samples allows us to monitor for failures at appropriate times.

Knowing the inverse of the average failure rate doesn’t really help us know when to expect failures to occur. This hampers our ability to design an efficient ALT.

Problems with MTBF Based Reliability Testing Formulas

An astute reader would probably wonder why we’re not using either time-truncated or failure-truncated test planning and analysis. We have MTBF, and that is all we need to design such life tests.

Well, the MTBF value is given and defines the testing. It doesn’t allow us to estimate the time to failure distribution. It may reveal if a system has poorer reliability than expected, yet not if it is better. Nor does such testing permit evaluation or understanding of the pattern of failures.

The MTBF based testing also assumes a constant failure rate. This means running 1,000 units for 20 hours or running 20 units for 1,000 hours gives the same result. If the failure mechanism is wear out or a chemical degradation, then we are more likely to have failures in the units that run longer, and no or few failures in the group that runs for only a few hours.
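The two test plans can be compared numerically. This is a rough sketch under stated assumptions: a Weibull model with the same 200,000 hour mean, a shape of 1 for the constant failure rate case, and a shape of 3 as a stand-in for a wear out mechanism. None of these values come from the vendor.

```python
import math

MEAN_LIFE_HOURS = 200_000.0  # data sheet MTBF, treated as the mean life


def weibull_cdf(t, shape, scale):
    """Probability of failing by time t; expm1 keeps tiny values accurate."""
    return -math.expm1(-((t / scale) ** shape))


def expected_failures(units, hours, shape):
    """Expected failures when `units` each run for `hours`, with no replacement."""
    scale = MEAN_LIFE_HOURS / math.gamma(1.0 + 1.0 / shape)
    return units * weibull_cdf(hours, shape, scale)


# Both plans accumulate the same 20,000 unit-hours
plans = {"1,000 units x 20 h": (1000, 20.0), "20 units x 1,000 h": (20, 1000.0)}

for shape in (1.0, 3.0):  # hypothetical: constant rate vs. wear out
    for label, (units, hours) in plans.items():
        n = expected_failures(units, hours, shape)
        print(f"shape = {shape:.0f}, {label}: expected failures = {n:.3g}")
```

With a shape of 1 the two plans expect essentially the same number of failures. With the assumed wear out shape of 3, the longer-running plan expects about 2,500 times as many failures as the short one, even though the unit-hours are identical. In both cases the expected count is far below one, since 20,000 unit-hours is small next to a 200,000 hour mean.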

This approach is only appropriate if you know, without doubt, the dominant failure mechanism is best described by an exponential distribution and has an equal chance of failure every single hour of operation. If this is not a certainty, then running 20 or 1,000 units till you have sufficient failures to estimate the time to failure distribution is prudent.

Summary

Running an ALT is expensive. Let’s get the design of the ALT right. That starts by ignoring MTBF claims by vendors, and getting to know the failure mechanisms.

Filed Under: Articles, NoMTBF

About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.


Comments

  1. Tim Gaens says

    December 29, 2017 at 11:37 AM

    Next question would be:
    How many MTBF did you proof with your ALT?

    • Fred says

      December 29, 2017 at 11:52 AM

      Not sure I understand the question, Tim. Given that ALTs tend to examine wear out type failure mechanisms, MTBF would not be a suitable metric to use.

      • Tim Gaens says

        December 29, 2017 at 1:59 PM

        Sorry Fred,

        Was just playing the manager role here.
        I’m still having a hard time getting people away from MTBF.

        I agree on the article. Thanks for sharing.

        For a manager it is easier to understand MTBF, it simplifies stuff, but it is wrong.

        I like to see more examples how it should be done as common practices.
        e.g. in your article are still a lot of assumptions to be made for the Acceleration Model (and probably always need to be made, after FMEA, product history, field data from similar products, …)
        e.g. should we ask failure mechanism from our suppliers? (and failure rate for each mechanism?)

        • Fred says

          December 29, 2017 at 2:21 PM

          No worries Tim, yes ask for failure mechanisms and models (not failure rates) to estimate failure rates given your particular set of environmental and use stresses. cheers, Fred

          • Tim Gaens says

            December 29, 2017 at 2:25 PM

            Do you know component suppliers that can provide this?

  2. Tim Gaens says

    December 29, 2017 at 2:25 PM

    Rephrase, “willing to” provide this.

    • Fred says

      December 29, 2017 at 2:35 PM

      Hi Tim,

      Over the years I’ve worked with many vendors that can and did supply detailed failure mechanism and associated models. Fans, bearings, memory, IGBTs, etc.

      If you don’t ask, you will probably only get MTBF… so ask.

      Cheers,

      Fred

  3. Larry George says

    October 15, 2024 at 9:16 PM

    Thanks to Fred for raising a valid question. Here is a starting suggestion.
    Want to list failure modes in order of criticality? Not RPNs or other armchair guesses.
    Use library of failure rates from MIL-HDBK-217F or MIL-HDBK-217G (George, from your field data) plus library of failure mode probabilities from MIL-HDBK-338B or your own experience? Combine it all with FMERD (workbook), “Failure Modes and Effects Reliability Diagnostics” latest revision May 2016. It ranks alternative failure modes on the basis of criticality, using your field experience if available supplemented with handbook values.
    Available from pstlarry@yahoo.com.


© 2025 FMS Reliability