
Although MTBF stands for ‘mean time between failures’, it does not represent a duration. It carries units of hours (or months, cycles, etc.), yet it is not a duration-related metric.
This little misunderstanding seems to cause major problems.
MTBF Calculation From Data
Suppose I have ten pieces of equipment, each of which has run for a year (8,760 hours). During that year we enjoyed five failures, each quickly repaired. What is the MTBF of that equipment?
Ten units running for 8,760 hours each give a total operating time of 87,600 hours. The five failures are the only other piece of information needed for the calculation: 87,600 divided by 5 is 17,520 hours MTBF.
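The arithmetic above can be written as a small helper. This is only a sketch: the function name `mtbf_from_fleet` and its parameters are made up for illustration, and it assumes every unit ran the same number of hours and that repair time is negligible.

```python
def mtbf_from_fleet(units: int, hours_per_unit: float, failures: int) -> float:
    """Point estimate of MTBF: total operating hours divided by failures."""
    if failures <= 0:
        # With zero failures the simple ratio is undefined.
        raise ValueError("need at least one failure for this point estimate")
    total_hours = units * hours_per_unit
    return total_hours / failures


# Ten units, one year each (8,760 hours), five failures.
mtbf = mtbf_from_fleet(units=10, hours_per_unit=8_760, failures=5)
failure_rate = 1 / mtbf  # failures per operating hour

print(f"MTBF: {mtbf:,.0f} hours")  # MTBF: 17,520 hours
print(f"Failure rate: {failure_rate:.3e} per hour")
```

Note that the only inputs are total hours and a failure count. Nothing in the calculation says when any failure occurred.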
MTBF, Duration, and Confusion
Of the ten pieces of equipment that each operated for a year and experienced five failures, how does the mean time between failures of 17,520 hours remain consistent with the idea (mistaken idea) that we should only have one failure every 17,520 hours for each piece of equipment?
It is consistent if we read the figure as a rate: one failure per 17,520 operating hours. A year is 8,760 hours, half of 17,520, so we expect 0.5 failures per unit per year, or, loosely, a 50% chance of failure yearly for each piece of equipment. Ten times 0.5 is 5, which is what we experienced.
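To see how five failures in a year fits with a 17,520-hour MTBF, here is a rough sketch of the expected counts. It assumes a constant failure rate and quick repairs, so that the number of failures across the fleet follows a Poisson distribution. That assumption and all the names are mine, not something the one year of data establishes.

```python
import math

MTBF_HOURS = 17_520
HOURS_PER_YEAR = 8_760
UNITS = 10

# Expected failures are operating hours divided by MTBF.
expected_per_unit = HOURS_PER_YEAR / MTBF_HOURS  # 0.5 per unit per year
expected_fleet = UNITS * expected_per_unit       # 5.0 for the fleet per year


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k failures when the expected count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


# Chance of seeing exactly k failures across the fleet in one year.
for k in range(0, 11):
    print(f"{k:2d} failures: {poisson_pmf(k, expected_fleet):.3f}")
```

Under these assumptions, exactly five failures is the expected result but happens only about 18% of the time. Counts of three to seven are all quite plausible for the same MTBF.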
The confusion occurs when some expect all ten units to run for two years and have only one failure, or expect each unit to operate for 17,520 hours and then fail. (Treating MTBF as a failure-free period is less common, yet it occurs.)
MTBF is an Inverse Failure Rate
Keep in mind that we can consider MTBF to express a probability of failure. Unit-wise, it is the inverse of a failure rate, that is, of the chance of failure per hour.
In the example above, we have a 1 in 17,520 chance of failure every hour. Of course, this ignores early life and wear-out patterns, which is something one should never do in practice. The more hours the equipment runs, the more times we face that 1 in 17,520 chance of failure. Run all ten units for two years, and you are pretty much certain to have at least one failure.
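The way a 1 in 17,520 chance per hour accumulates can be sketched directly. This treats every operating hour as an independent trial with the same chance of failure, which is the constant failure rate assumption and not something the data guarantees. The function name is hypothetical.

```python
MTBF_HOURS = 17_520
HOURS_PER_YEAR = 8_760
p_hour = 1 / MTBF_HOURS  # chance of failure in any single operating hour


def prob_at_least_one_failure(hours: int, units: int = 1) -> float:
    """Chance of one or more failures over `hours` for each of `units` units."""
    # Surviving means no failure in any of the unit-hours.
    survive_all = (1 - p_hour) ** (hours * units)
    return 1 - survive_all


for years in (1, 2, 5):
    hours = years * HOURS_PER_YEAR
    one = prob_at_least_one_failure(hours)
    ten = prob_at_least_one_failure(hours, units=10)
    print(f"{years} yr: one unit {one:.3f}, ten units {ten:.5f}")
```

For a single unit the chance of at least one failure is roughly 39% after one year and 63% after two. For the fleet of ten it is already above 99% after one year.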
MTBF does provide a chance of failure per unit of time (in many cases, per hour), but that does not mean the failure rate is accurate or fixed over any period of time we want to use.
In the example above, we have data for one year of operation for the ten units. We do not have information over two years (17,520 hours) nor over 10 years. The MTBF value we calculated only represents a failure rate that is valid for that one year. As the equipment breaks in or wears out, the value will most likely become less accurate.
Summary
First, MTBF is not all that helpful, as we rarely encounter a constant failure rate pattern with equipment. Second, MTBF is just a fancy way of representing a failure rate. It does provide information on the chance of failure per hour per piece of equipment. It does not suggest the equipment will have a two-year life with no failures or that the equipment will run for two years with only one failure.
MTBF is also unhelpful for many other reasons; one is that we often work with people who do not understand what MTBF is or is not. MTBF is not a duration; it is a probability of failure, that is all.
The CRE BOK has many examples of probability of failure without duration.
This is why it annoys me when I see the definition that reliability is the probability of success over a period of time, which, of course, it is not.
It is the probability of success for a given scenario, which may involve time, but might not.
Much confusion in the calculation… For a constant failure rate, the probability of failure after 1 year is
1 - R(1 year), or 1 - exp(-0.5) = 0.39,
so 39% and not 50%. And it decreases the following year…
Hi Marie,
Remember the exponential distribution is not a normal distribution. We learned in school that, for a normal distribution, the average or mean sits at the 50th percentile. Other distributions, especially skewed distributions, do not have their average at the 50th percentile; it will vary.
This is a common confusion with the constant failure rate assumption coupled with our stats knowledge based on the normal distribution.
The math you did is right, and if the population has a 50% failure rate over a year, then the probability of failure is as you calculated. Failure rate (or MTBF) is not the same as the probability of failure.
Cheers,
Fred
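The numbers in this exchange can be checked with a short sketch. It assumes a constant failure rate of 0.5 per unit per year, which is the exponential model used in the comment above, and the function names are only illustrative. It also separates two readings of "the following year": the chance that the first failure lands in year two, and the chance of failing in year two for a unit that survived year one.

```python
import math

RATE_PER_YEAR = 0.5  # 8,760 hours / 17,520 hours MTBF


def reliability(years: float) -> float:
    """R(t): chance a unit has not failed by `years` under a constant rate."""
    return math.exp(-RATE_PER_YEAR * years)


def prob_first_failure_in_year(k: int) -> float:
    """Chance the first failure falls in year k (survive k-1 years, then fail)."""
    return reliability(k - 1) - reliability(k)


def prob_failure_in_year_given_survival(k: int) -> float:
    """Chance of failing during year k for a unit still running at its start."""
    return prob_first_failure_in_year(k) / reliability(k - 1)


for k in (1, 2, 3):
    print(
        f"year {k}: first failure here {prob_first_failure_in_year(k):.3f}, "
        f"given survival so far {prob_failure_in_year_given_survival(k):.3f}"
    )
```

The chance that the first failure lands in a given year does shrink year over year (about 39%, then 24%), because fewer units reach each later year without failing. For a unit that is still running, the chance of failing in the coming year stays at about 39%. That is what a constant failure rate means.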
Hi all.
To reiterate: there are a few good examples of non-duration reliability in O'Connor. He uses cable strength expressed as a mean and standard deviation versus known loads. The stress/strength interference gives the probability of success.
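As a rough illustration of that kind of calculation, the sketch below assumes strength and load are independent and normally distributed. The cable numbers are invented for the example and are not taken from O'Connor.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_strength_exceeds_load(
    mean_strength: float, sd_strength: float, mean_load: float, sd_load: float
) -> float:
    """P(strength > load) for independent normal strength and load."""
    # The margin (strength minus load) is itself normal.
    margin_mean = mean_strength - mean_load
    margin_sd = math.hypot(sd_strength, sd_load)  # sqrt(sd_s**2 + sd_l**2)
    return normal_cdf(margin_mean / margin_sd)


# Hypothetical cable: strength 1000 +/- 50, applied load 800 +/- 60 (same units).
r = prob_strength_exceeds_load(1000, 50, 800, 60)
print(f"Probability of success: {r:.4f}")
```

No time or duration appears anywhere in this calculation. The result is a probability of success per load application.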
If we consider even basic statistics, is the mean, or ‘average’, the most robust measure of ‘expectation’ (or the ‘middle tendency’) of a distribution of continuous data? The mean is very sensitive to outliers and is not the most robust measure for non-symmetrical or skewed probability distributions. The median is more often appropriate. Notwithstanding that, if we are provided with only an MTBF figure and no other data, we have to assume the underlying distribution is exponential, in which case the mean corresponds to a 63.2% probability of failure, not the 50% that many people assume. For a symmetrical normal distribution, both the median and the mean sit at the 50th percentile. Why do we persist in using the mean when the median is a better choice? Even the median by itself is not enough; we need to know how the data is distributed to understand reliability.
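The 63.2% figure, and where the median falls instead, can be checked with a few lines. This assumes an exponential time-to-failure distribution with the 17,520-hour MTBF from the article; the names are illustrative only.

```python
import math

MTBF_HOURS = 17_520


def prob_failed_by(hours: float) -> float:
    """Exponential CDF: chance a unit has failed by `hours`."""
    return 1 - math.exp(-hours / MTBF_HOURS)


# For an exponential distribution the median is MTBF * ln(2).
median_hours = MTBF_HOURS * math.log(2)

print(f"Failed by the mean (MTBF):  {prob_failed_by(MTBF_HOURS):.3f}")
print(f"Median time to failure:     {median_hours:,.0f} hours")
print(f"Failed by the median:       {prob_failed_by(median_hours):.3f}")
```

Under this assumption, half the units have failed by roughly 12,144 hours, well before the 17,520-hour mean.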