
Your Reliability Engineering Professional Development Site
A series of articles devoted to the eradication of the misuse of MTBF.
ISSN 2168-4375
Plus, we explore other commonly misused or misunderstood reliability-related topics and what one should do instead. A little understanding will help you get better results with your efforts.
Note: This is a reposting with editing, updating, etc. of the articles that first appeared at NoMTBF.com.
by Fred Schenkelberg
Just a short post to point to a paper on the accuracy of part count prediction techniques. A few years ago, I recalled seeing a paper that studied the difference between various parts count methods and actual results.
Historically, Reliability Engineering of Electronics has been dominated by the belief that 1) The life or percentage of complex hardware failures that occur over time can be estimated, predicted, or modeled, and 2) the Reliability of electronic systems can be calculated or estimated through statistical and probabilistic methods to improve hardware reliability. The amazing thing about this is that during the many decades that reliability
by Fred Schenkelberg
With the kind permission of Wayne Nelson and Robert Abernethy, we are posting an article on the analysis of repair data. As you may know, the assumptions made when using simple time-to-failure analysis of repairable systems may provide misleading results. Using the analysis method outlined by Wayne is one way to avoid those costly mistakes.
by Pete Stuart
When conducting a Human Reliability Assessment (HRA), we use the terminology errors of commission or errors of omission. It behooves every professional to question why we focus on one metric in preference to all others in an objective and constructive manner in order to discern whether we are exposing our organization to errors of professional omission or commission. The other conclusion is that we are doing the right thing and this is also an empowering piece of knowledge.
by Fred Schenkelberg
While giving a presentation last week, I asked if anyone uses an 85%RH/85°C type test, and a couple indicated they did. I then asked why.
The response was – just because. We have always done it, or it’s a standard, or customers expected it. The most honest response was, ‘I don’t know’.
Why is the test being done? Who is using the information for a decision? What is the value of the test results? If ‘just because’ is the best you can say about a test, why do it?
by Fred Schenkelberg
I’ve often railed on and on about the inappropriate use of MTBF over Reliability. The often cited rationale is, “it is simpler”. And, I agree, making simplifications is often necessary for any engineering analysis.
It goes too far when there isn’t any reason to simplify, or when we knowingly simplify and the results are misleading, inaccurate, or simply wrong. The cost of making a poor decision based on faulty analysis is inexcusable.
by Fred Schenkelberg
During RAMS this year, Wayne Nelson made the point that language matters. One specific example was the substitution of ‘convincing’ for ‘statistically significant’ in an effort to clearly convey the ability of a test result to sway the reader. For example ‘the test data clearly demonstrates…’
As reliability professionals let’s say what we mean in a clear and unambiguous manner.
Thus, you may suspect, this topic is related to MTBF.
by Fred Schenkelberg
I am a rock climber. Climbing relies on skill, strength, knowledge, luck, and sound gear. Falling is a part of the sport, and with the right gear, the sport is safe. So far, I’ve enjoyed no equipment failures.
I do not know, nor want to know, the MTBF (or MTTF) of any of my climbing gear. I’m not even sure this information would be available. And, all the gear I use has a finite chance of failing every time the equipment is in use. Part of my confidence is that the probability of failure is really low.
by Fred Schenkelberg
The classic formula for availability is MTBF divided by MTBF plus MTTR. Standard. And pretty much wrong most of the time.
Recently, working for a bottling plant design team, we pursued design options to improve the availability and throughput of the new line. The equipment would remain the same: filler, capper, labeler, etc. So we decided to gather the last six months or so of operating data, which included up and down time. Furthermore, the data included time to failure and time to repair information.
by Fred Schenkelberg
Let’s look at the characteristics of a sound reliability metric and how MTBF is not true or beneficial. A metric should be true, beneficial, and timely. We’ll start with a rock climbing analogy.
A bolted hanger along a rock climbing route is often a welcome sight. It provides the climber safety (clipping the rope to the bolt), direction (this is the way), and confidence. Does MTBF as a metric do the same for your organization?
As climbers, we count on the bolts to provide support in case something goes wrong or we need to rest along the route.
A reliability metric is often used in the same way as a climbing bolt. The measure, whether MTBF, Reliability, or Failure Rate, assures that the product’s reliability performance is as expected.
The organization’s profits are or will be safe. The development team uses the measures to guide design and supply chain decisions. The measure provides confidence to the organization regarding meeting customer expectations around reliability.
by Fred Schenkelberg
Note: This first article in the NoMTBF campaign was published on April 1st, 2009. Thus, we’ve been at this for a long time, making progress, and have come a long way since starting the NoMTBF campaign. I am looking forward to your comments, contributions, and suggestions.
Fred
At first, MTBF seems like a commonly used and valuable measure of reliability. Trained as a statistician and understanding the use of the expected value that MTBF represented, I thought, ‘Cool, this is useful.’
Then, the discussions with engineers, technical sales folks, and other professionals about reliability using MTBF started. With them came the awareness that not everyone, and at times it seems very few, truly understood MTBF and how to properly use the measure.
by Fred Schenkelberg
Let’s think of this as a crowdsourced project. The first version of this book is a compilation of NoMTBF.com articles. It lays out why we do not want to use MTBF and what to do instead (to some extent).
With your input on success stories, how to make progress using better metrics, and input of examples, stories, case studies, etc., the next version of the book will be much better and much more practical.
The calculation of MTBF results in a larger number if we make a series of MTBF assumptions. We just need more time in the operating hours and fewer failures in the count of failures.
While we really want to understand the reliability performance of field units, we often make a series of small assumptions that impact the accuracy of MTBF estimates.
Here are just a few of the MTBF assumptions that I’ve seen, in some cases nearly all of them with one team. Reliability data has useful information if we gather and treat it well.
In college, Mechanics was a required class from the civil engineering department. This included differential equations.
Luckily for me, I also enjoyed a required course called analytical mechanics for my physics degree. This included using Lagrange and Hamiltonian equations to derive a wide range of formulas to solve mechanics problems.
In the civil engineering course, the professor did the derivations during the lectures, then expected us to use the right formula to solve a problem. He even gave us a ‘cheat sheet’ with an assortment of derived equations. We just had to identify which equation to use for a particular problem and ‘plug-and-chug’ or just work out the math. It was boring.
Failure rate and probability are similar. They are slightly different, too.
One of the problems with reliability engineering is so many terms and concepts are not commonly understood.
Reliability, for example, is commonly defined as dependable or trustworthy, as in you can count on him to bring the bagels. Reliability engineers, on the other hand, define reliability as the probability of successful operation/function within a specific environment over a defined duration.
The same for failure rate and probability of failure. We often have specific data-driven or business-related goals behind the terms. Others do not.
If we do not state over which time period either term applies, that is left to the imagination of the listener. Which is rarely good.
There are at least two failure rates that we may encounter: the instantaneous failure rate and the average failure rate. The trouble starts when you ask for, or are asked about, an item’s failure rate. Which failure rate are you both talking about?
The instantaneous failure rate is also known as the hazard rate, h(t):
$$h(t)=\frac{f(t)}{R(t)}$$
Where f(t) is the probability density function and R(t) is the reliability function, which is one minus the cumulative distribution function. The hazard rate, failure rate, or instantaneous failure rate is the failures per unit time when the time interval is very small at some point in time, t. Thus, if a unit has been operating for a year, this calculation would provide the chance of failure in the next instant of time.
This is not useful for the calculation of the number of failures over that year, only the chance of a failure in the next moment.
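As a rough illustration of h(t) = f(t) / R(t), here is a minimal Python sketch. It assumes a two-parameter Weibull life distribution, which is my choice for the example and not something the definition requires. The shape `beta` and scale `eta` values are hypothetical.

```python
import math

def weibull_pdf(t, beta, eta):
    """Probability density f(t) of a two-parameter Weibull distribution."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))

def weibull_reliability(t, beta, eta):
    """Reliability R(t) = 1 - F(t), the fraction still surviving at time t."""
    return math.exp(-((t / eta) ** beta))

def hazard_rate(t, beta, eta):
    """Instantaneous failure rate h(t) = f(t) / R(t)."""
    return weibull_pdf(t, beta, eta) / weibull_reliability(t, beta, eta)

# Hypothetical parameters: shape beta = 2 (wear-out), scale eta = 5 years.
h_one_year = hazard_rate(1.0, beta=2.0, eta=5.0)
print(f"h(1 year) = {h_one_year:.3f} failures per unit per year")
```

With these assumed parameters the hazard rate after one year works out to 0.08 per year. That is the rate at that instant only, not the fraction expected to fail during the year.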
The probability density function provides the fraction of the original population failing per unit time. A histogram of the count of failures per month would roughly describe a PDF, or f(t). Dividing the PDF by the fraction still surviving, R(t), at each point in time traces out the instantaneous failure rate.
Sometimes, we are interested in the average failure rate, AFR. The AFR over a time interval, t1 to t2, is found by integrating the instantaneous failure rate over the interval and dividing by t2 – t1. When we set t1 to 0, we have
$$AFR(T)=\frac{H(T)}{T}=\frac{-\ln R(T)}{T}$$
Where H(T) is the integral of the hazard rate, h(t), from time zero to time T,
T is the time of interest, which defines a time period from zero to T,
And, R(T) is the reliability function or probability of successful operation from time zero to T.
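To see the two forms of the AFR agree, here is a small Python sketch. It again assumes a Weibull distribution with hypothetical parameters. The numeric integration is only a cross-check that H(T)/T matches -ln R(T)/T.

```python
import math

def reliability(t, beta, eta):
    """Weibull reliability R(t)."""
    return math.exp(-((t / eta) ** beta))

def hazard(t, beta, eta):
    """Weibull hazard rate h(t) = f(t) / R(t), written in closed form."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def average_failure_rate(T, beta, eta):
    """AFR(T) = -ln R(T) / T over the interval from zero to T."""
    return -math.log(reliability(T, beta, eta)) / T

def cumulative_hazard(T, beta, eta, steps=10_000):
    """H(T): midpoint-rule integral of h(t) from zero to T."""
    dt = T / steps
    return sum(hazard((i + 0.5) * dt, beta, eta) * dt for i in range(steps))

# Hypothetical parameters: shape beta = 2, scale eta = 5 years, T = 1 year.
beta, eta, T = 2.0, 5.0, 1.0
afr = average_failure_rate(T, beta, eta)
print(f"AFR over 0..T     = {afr:.4f} per year")
print(f"H(T)/T (numeric)  = {cumulative_hazard(T, beta, eta) / T:.4f} per year")
print(f"h(T) at the end   = {hazard(T, beta, eta):.4f} per year")
```

For these assumed values the average rate over the first year is 0.04 per year, while the instantaneous rate at the end of the year is 0.08 per year. The two only coincide when the hazard rate is constant.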
A very common understanding of the rate of failure is the calculation of the count of failures over some time period divided by the number of hours of operation. This results in the fraction expected to fail on average per hour. I’m not sure which definition of failure rate above this fits, and yet I find this is how most people think of failure rate.
If we have 1,000 resistors that each operate for 1,000 hours, and then a failure occurs, we have 1 / (1,000 x 1,000 ) = 0.000001 failures per hour.
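The resistor arithmetic above can be written as a short Python sketch. The function name is hypothetical and is used here only for illustration.

```python
def observed_failure_rate(failures, units, hours_per_unit):
    """Count of failures divided by the total unit-hours of operation."""
    total_hours = units * hours_per_unit
    return failures / total_hours

# 1,000 resistors, each operating 1,000 hours, with one failure observed.
rate = observed_failure_rate(failures=1, units=1_000, hours_per_unit=1_000)
print(f"{rate:.6f} failures per hour")
```

Note that this calculation treats every operating hour as equivalent, so it says nothing about when during the 1,000 hours the failure is likely to occur.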
Let’s save for another time the discussion about the many ways to report failure rates: AFR (two methods, at least), FIT, PPM/K, etc.
I thought the definition of failure rate would be straightforward until I went looking for a definition. It is with trepidation that I start this section on the probability of failure definition.
To my surprise, it is actually rather simple: the definition in common use and the mathematical definition are the same. There are two equivalent ways to phrase the definition:
We can talk about individual items or all of them concerning the probability of failure. If we have a 1 in 100 chance of failure over a year, then that means we have about a 1% chance that the unit we’re using will fail before the end of the year. Or it means if we have 100 units placed into operation, we would expect one of them to fail by the end of the year.
The probability of failure for a segment of time is defined by the cumulative distribution function or CDF.
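A minimal Python sketch of the probability of failure over an interval follows. The 0.99 reliability matches the 1 in 100 example above. The helper names, and the conversion from an average failure rate, are my additions for illustration.

```python
import math

def probability_of_failure(reliability):
    """CDF value F(T) = 1 - R(T) for the interval from zero to T."""
    return 1.0 - reliability

def expected_failures(n_units, p_fail):
    """Expected number of failed units out of n_units by time T."""
    return n_units * p_fail

def probability_from_afr(afr, duration):
    """F(T) = 1 - exp(-AFR * T), which follows from AFR(T) = -ln R(T) / T."""
    return 1.0 - math.exp(-afr * duration)

# One unit with a 99% chance of surviving the year.
p = probability_of_failure(0.99)
print(f"Chance this unit fails within the year: {p:.2%}")
print(f"Expected failures among 100 units: {expected_failures(100, p):.1f}")

# An average failure rate of 0.01 per year is close to, but not equal to,
# a 1% probability of failure over that year.
print(f"F(1 year) for AFR = 0.01 per year: {probability_from_afr(0.01, 1.0):.5f}")
```

The last line shows how a failure rate and a probability of failure can be numerically similar yet slightly different, even over the same one-year period.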
Which one should you use? This depends on the situation. Are you talking about the chance of failure in the next instant or the chance of failing over a time interval? Use failure rate for the former, and probability of failure for the latter.
In either case, be clear with your audience which definition (and assumptions) you are using. If you know of other failure rate or probability of failure definitions, or if you know of a great way to keep all these definitions clearly sorted, please leave a comment below.