Let’s say we have a product that most often fails because of one major component: a fan. (It could be anything, and while I don’t have anything against fans, a fan failure is easy to picture.)
OK, this fan has a data sheet with the classic reliability claim of 50,000 hours MTBF. For those who know about my disdain for MTBF (www.nomtbf.com), rest assured I’m not going to get into it here. The basic approach for estimating the number of failures during any period of time requires a few pieces of information. MTBF is common on data sheets, so in this case that’s where we start.
Without any other information about the life distribution and given only MTBF, we will have to use the exponential distribution. The cumulative distribution function is
$$ \large\displaystyle F\left( t \right)=1-e^{-t/\theta} $$
where F(t) is the probability of failure up to time t, and theta, θ, is the MTBF.
The next piece of information we need is the warranty period or the period of time of interest. In this case, let’s say it’s three years. And, since the fan is the primary concern in this simple example, we can consider the duty cycle of the fan within the product. For the sake of ease in this example, let’s say the fan is working full time (in a server product, for example). That means the fan will operate for 365 days x 24 hours x 3 years = 26,280 hours.
Now we’re ready to do the calculation.
t = 26,280 hours
θ = 50,000 hours
Using the equation above, we find F(26,280) = 1 – e^(–26,280/50,000) ≈ 0.41, so we would expect about 41% of the fans to fail by three years. The time is related to the age of the individual units, not production time. In short, a lot would fail. How many?
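The arithmetic is easy to check in a few lines of Python. This is only a minimal sketch under the same exponential assumption; the function name `exp_cdf` is made up for illustration.

```python
import math

def exp_cdf(t_hours: float, mtbf_hours: float) -> float:
    """Probability of failure by time t under the exponential distribution."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)

t = 365 * 24 * 3   # three years of continuous operation, in hours
theta = 50_000     # MTBF from the data sheet, in hours

print(t)                             # 26280
print(round(exp_cdf(t, theta), 2))   # 0.41
```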
We need to know how many units are shipped or expected to ship. Let’s say we expect to produce 10,250 of these products. How many will come back under warranty due to fan failure?
10,250 x 0.41 = 4202.5 or just over 4,000 fan failures.
Multiply the number of warranty failures by the cost of a warranty return to find the amount of warranty reserve to set aside.
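As a rough sketch, the whole chain from MTBF to warranty reserve looks like this in Python. The cost per return is a made-up number for illustration only; substitute your own. Carrying the unrounded probability through gives roughly 4,190 expected returns rather than the 4,202.5 found with the rounded 0.41.

```python
import math

units_shipped = 10_250
theta = 50_000             # MTBF, hours
t = 365 * 24 * 3           # warranty period at 24/7 operation, hours
cost_per_return = 120.00   # ASSUMPTION: hypothetical cost of one warranty return

p_fail = 1.0 - math.exp(-t / theta)         # exponential CDF at end of warranty
expected_returns = units_shipped * p_fail   # units expected to fail in warranty
warranty_reserve = expected_returns * cost_per_return

print(f"P(fail by end of warranty) = {p_fail:.4f}")
print(f"Expected returns           = {expected_returns:.0f}")
print(f"Warranty reserve           = {warranty_reserve:,.0f}")
```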
If you have any questions or would like to see other examples, please leave a comment. If you do have better data and are able to fit a distribution, such as Weibull, then take a look at a short tutorial that steps through the analysis and how to estimate future warranty returns.
Hilaire Perera says
MTBF/MTTF as single-point estimates are “risky”. Better to use the lower confidence limit of these numbers when calculating reliability or allocating spares.
Michael Li says
Hi,
By following the formula, exp(-t/MTBF) = 0.59, and 1 minus 0.59 equals 0.41. So 0.41 would be the probability of failure. Is that true?
Regards,
Michael
Fred Schenkelberg says
Yes, Michael, that is true, the reliability function is as you describe it, and 1 – R(t) is the CDF which provides the probability of failure over the duration t. I forgot to subtract the reliability function (probability of success) from one. It’s updated now.
Michael Li says
This is a good article that helped me work out the relationship between MTBF and warranty.
KESAVA says
How would I calculate warranty cost for repairable products if I have MTBF and mission time?
Thanks,
Kesava.
Fred Schenkelberg says
Hi Kesava,
Pretty much the same way as in the article. If you have one piece of equipment, then skip the last part about how many units are running.
The hard part is that with only MTBF you can only estimate the expected number of failures or the probability of failure over some duration. You need to be sure the MTBF value is valid over the time period of interest. If the value is based on the first year of operation, it may not be accurate for the second year, and very inaccurate for the 10th year.
Another way to think of the problem is that MTBF is just the inverse of the failure rate. Given the failure rate per hour and how many hours you expect to run, calculate the number of expected failures.
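A minimal sketch of that failure-rate view for a single repairable unit, assuming the rate really is constant over the period. The operating hours and the repair cost below are hypothetical values chosen only to show the arithmetic.

```python
mtbf_hours = 50_000        # example MTBF from the article
operating_hours = 8_760    # ASSUMPTION: one year of continuous operation
cost_per_repair = 300.00   # ASSUMPTION: hypothetical repair or replacement cost

failure_rate = 1.0 / mtbf_hours                   # failures per operating hour
expected_failures = failure_rate * operating_hours
expected_cost = expected_failures * cost_per_repair

print(f"{expected_failures:.3f}")   # 0.175
print(f"{expected_cost:.2f}")
```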
You also need the cost of repair – or replacement.
If you really want to estimate the warranty cost of a repairable system, you really should understand the failure distributions for the repairable items and the overall system (a reliability block diagram comes to mind here) and then estimate the costs based on which element of the system fails. A bit more complicated, yet a whole lot more accurate.
Cheers,
Fred
Asfour says
Hi Fred,
First, thanks for the effort and follow-up in answering others’ queries. My issue is related to device warranty calculations. The devices vary from DDC controllers to various types of sensors, active and passive. One of the painful arguments is how much spares cost should be considered during the warranty phase, which varies from 1 to 3 years.
MTBF for the devices is known, but when I try to use the available formulas, and I have tried a lot, the result is not logical, since this is not what actually happens. By failure I mean the device needs to be changed or replaced, not maintained. Can you help here?
Fred Schenkelberg says
Hi Asfour,
With actual field data, shipments and returns, better if you know the date of shipment or installation, and date of the return for specific serial numbers, you can sort out the time to failure distribution. I often start with Weibull and see how well that works. With that data, you have a representation of the actual rate of field failures and can estimate future failures as well.
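As a sketch of that first data-preparation step, here is one way to turn shipment and return dates into time-to-failure and censored times in Python. The serial numbers, dates, and the function name `to_time_to_event` are all hypothetical.

```python
from datetime import date

# ASSUMPTION: hypothetical field records (serial, ship date, return date or None)
records = [
    ("SN001", date(2023, 1, 10), date(2023, 6, 2)),
    ("SN002", date(2023, 1, 10), None),
    ("SN003", date(2023, 2, 14), date(2023, 3, 1)),
    ("SN004", date(2023, 3, 20), None),
]
analysis_date = date(2023, 12, 31)   # ASSUMPTION: date the data were pulled

def to_time_to_event(records, analysis_date):
    """Return (serial, days_in_service, failed) for every unit.

    Returned units get their time to failure; units still in the field
    are right-censored at the analysis date.
    """
    rows = []
    for serial, shipped, returned in records:
        end = returned if returned is not None else analysis_date
        rows.append((serial, (end - shipped).days, returned is not None))
    return rows

rows = to_time_to_event(records, analysis_date)
for row in rows:
    print(row)
```

The censored rows matter as much as the failures: dropping the units that are still running would make the product look far worse than it is.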
Using MTBF or MTTF of components, or any parts-count type estimate of reliability, is rarely, and only by luck, going to represent the actual field reliability performance. Likewise, calculating MTTF or MTBF from field data will provide a crude estimate that does not include the changing nature of the failure rate as the item ages.
So, do not use MTBF. Use the field data you have.
Cheers,
Fred
Tom Nolan says
I am looking for the best way to calculate parts replaced and returned from the field. We currently use Predicted Annual Failure Rate (PAFR). Is there any other method to do the calculation? I have had a request to do calculations on return rate. Do you know if that is possible?
Failure Rate (PAFR) = the expected qty of returned parts to the OEM that are actually defective. This excludes NTFs. Again, expressed as a percentage of the component IB, annualised
Return Rate = the expected qty of parts returned from the field from Veritas’ service partner to our OEM, expressed as a percentage of the component IB, annualised
Fred Schenkelberg says
Hi Tom,
First off keep in mind that the annualized failure rate is an average and thus not informative on any changes to the rate of returns.
Second, always count NTFs. A very easy way to help improve the return rate is to classify more returns as NTF. Besides, if you have NTFs there is still something to solve; otherwise customers would not be returning them to you.
Third, it is better to use the field return data directly to fit a Weibull or other appropriate distribution, then use that information to predict returns each month going forward. Weibull++ has a handy tool to analyze and predict.
Fourth, before shipping, you can use the development reliability block diagram and current reliability estimates to estimate warranty returns. You’ll need an estimate of weekly or monthly shipments as well.
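A rough sketch of that kind of forecast in Python: combine a shipment plan with a life distribution to get expected returns per calendar month. The Weibull parameters, the shipment plan, and the function names are hypothetical, and the sketch assumes failed units are not replaced with units that can fail again.

```python
import math

def weibull_cdf(t, beta, eta):
    """Probability of failure by age t (same time units as eta)."""
    return 1.0 - math.exp(-((t / eta) ** beta)) if t > 0 else 0.0

def forecast_returns(shipments, beta, eta, horizon, warranty_months):
    """Expected warranty returns in each calendar month.

    shipments[s] is the number of units shipped at the start of month s.
    Only failures at ages below warranty_months are counted.
    """
    forecast = []
    for month in range(horizon):
        expected = 0.0
        for ship_month, qty in enumerate(shipments):
            age = month - ship_month
            if age < 0 or age >= warranty_months:
                continue   # not shipped yet, or already out of warranty
            # Fraction of this cohort expected to fail during this month
            expected += qty * (weibull_cdf(age + 1, beta, eta)
                               - weibull_cdf(age, beta, eta))
        forecast.append(expected)
    return forecast

shipments = [600] * 6    # ASSUMPTION: hypothetical plan, 600 units/month
beta, eta = 1.3, 60.0    # ASSUMPTION: hypothetical Weibull shape and scale (months)
monthly = forecast_returns(shipments, beta, eta, horizon=12, warranty_months=12)

for month, value in enumerate(monthly):
    print(month, round(value, 1))
```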
Cheers,
Fred
Vijay says
Hi Fred,
Thanks for the example.
You arrived at 4202.5 failures based on CDF*number of fans.
What if we approach this from an expected number of failures view?
For a component having a constant failure rate, the expected number of failures follows a Poisson process with a mean of n*λ*t.
Therefore, the expected number of failures over time (26,280 hrs) = 10,250*1/50,000*26,280 = 5387.4, which is vastly different from 4202.5.
Which one is the correct methodology?
Thanks
Fred Schenkelberg says
Hi Vijay,
I do not think either is appropriate nor very good (accurate) as very few if anything follows a constant failure rate. Better to understand the driving failure mechanism and model the time to failure behavior.
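For what it is worth, the two numbers answer different questions, and both lean on the constant failure rate assumption. The CDF approach counts original units that fail at least once; the n*λ*t approach counts failure events when every failed unit is immediately replaced or repaired and keeps running. A small sketch of the difference (the article’s 4202.5 comes from using the rounded 0.41):

```python
import math

n_units = 10_250
theta = 50_000   # MTBF, hours
t = 26_280       # hours in service

# Original units expected to have failed at least once by t (no replacement)
units_failed = n_units * (1.0 - math.exp(-t / theta))

# Expected failure events if each failure is immediately replaced or repaired
failure_events = n_units * t / theta

print(round(units_failed, 1))
print(round(failure_events, 1))   # 5387.4
```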
Cheers,
Fred
William Thorlay says
Hi Fred,
Considering a duty cycle of 12 h/day, should I use only these 12 h and calculate F(t) over 3 years? If I am a maintenance engineer, should I include the downtime hours to calculate F(t), or assume that the downtime is not representative and just use the period of time for which I want to know this particular F(t)?
Fred Schenkelberg says
Hi William, both good questions. Yes, adjust the time element to reflect the duty cycle and be clear about what 3 years represents – i.e. not 24/7 operation. For the maintenance example, downtime is fine, yet you most likely will want to know more than just an average. As with any set of data, adjust the analysis to help you learn or understand what is happening – the analysis should lead to better questions as you explore ways to make improvements or changes. cheers, Fred
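A small sketch of the duty cycle adjustment, using the article’s 50,000 hour MTBF and assuming the fan only accumulates life while it is running. The helper name `operating_hours` is made up for illustration.

```python
import math

def operating_hours(years, hours_per_day):
    """Hours actually run over the period at the given duty cycle."""
    return years * 365 * hours_per_day

theta = 50_000   # MTBF, hours

for hours_per_day in (24, 12):
    t = operating_hours(3, hours_per_day)
    f = 1.0 - math.exp(-t / theta)
    print(hours_per_day, t, round(f, 3))
# 24 26280 0.409
# 12 13140 0.231
```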
Mark fiedeldey says
Fred,
I bet this was difficult for you to force yourself to write. MTBF is such a substandard metric. But thanks for the example.
Happy Easter,
Mark
Fred Schenkelberg says
Hi Mark, thanks for the note – many of my short tutorials are for those preparing for the ASQ CRE – yet, you know how I feel about using MTBF in any situation. cheers, Fred
Srinivas GS says
Hi Fred,
How can I predict the failure rate and future warranty claims if I have field failures of returned products for months 0 to 6? Assume the quantity sold is 600 per month. What will be the failure rate for the 5th year, or the 60th month?
Months Failures Qty sold
0 3 600
1 11 600
2 17 600
3 23 600
4 23 600
5 21 600
6 3 600
Fred Schenkelberg says
Hi Srinivas,
It seems you have consistent shipments or items sold. The number of failed units in the table does not seem to be related to how old each unit was when it failed. Of the 17 that failed in month two, were those from month zero, one, or two? This matters because what you need is time-to-failure information for each failure, which also allows you to sort out the time to censoring for those still operating. With the ‘time to’ data you’re ready for what we commonly call Weibull analysis (regression analysis fitting a distribution to the data).
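To show what the ‘time to’ data feeds into, here is a minimal sketch of a two-parameter Weibull fit with right-censored units. Note that it uses maximum likelihood rather than the regression (probability plotting) approach mentioned above, it relies only on the Python standard library, and the data are synthetic values generated purely for illustration.

```python
import math
import random

def fit_weibull(times, failed, tol=1e-10):
    """Maximum likelihood fit of a 2-parameter Weibull with right censoring.

    times  : time in service for every unit (failed or still running), > 0
    failed : True where the time is a failure, False where it is censored
    Returns (beta, eta), the shape and scale estimates.
    """
    scale = max(times)
    s = [t / scale for t in times]   # rescale so powers cannot overflow
    r = sum(failed)                  # number of observed failures
    mean_log_fail = sum(math.log(x) for x, f in zip(s, failed) if f) / r

    def score(beta):
        # Profile likelihood equation for the shape; its root is the MLE.
        num = sum(x ** beta * math.log(x) for x in s)
        den = sum(x ** beta for x in s)
        return num / den - 1.0 / beta - mean_log_fail

    lo, hi = 1e-3, 100.0             # score() increases with beta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = scale * (sum(x ** beta for x in s) / r) ** (1.0 / beta)
    return beta, eta

# ASSUMPTION: synthetic lifetimes, observation stops (censors) at 800 hours
rng = random.Random(0)
true_beta, true_eta, cutoff = 1.5, 1000.0, 800.0
lifetimes = [rng.weibullvariate(true_eta, true_beta) for _ in range(2000)]
times = [min(t, cutoff) for t in lifetimes]
failed = [t <= cutoff for t in lifetimes]

beta_hat, eta_hat = fit_weibull(times, failed)
print(round(beta_hat, 2), round(eta_hat, 1))
```

With real field data, `times` and `failed` would come from the ship and return dates of each serial number rather than from a random generator.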
Enjoy the day (and the entire year) and best wishes.
Cheers,
Fred
Bernadeth De Belen says
Hi Sir, can you help me with this one? It is required to produce a device having a reliability of at least 95% over a period of 500 hr. Estimate the maximum permissible failure rate and minimum MTBF.
Fred Schenkelberg says
Hi Bernadeth,
Given a minimum reliability of 95%, or 0.95, and given that the probability of failure over the time period (500 hrs) is related to reliability as R(t) = 1 – F(t), we know that over the 500 hrs no more than 5% of items can fail if you are to achieve the 95% reliability.
Now, MTBF: first, we really should not use it, for many reasons. If the underlying time to failure distribution is well described by the exponential distribution, you can use the first formula in the article and simply solve for theta (which is MTBF, in this case). If it is not an exponential distribution, then you’ll need a bit more information than just a desired reliability and duration. Oh, R(t) here is the given 0.95, so F(t) is 0.05, and t is 500 hours.
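Solving the exponential formula for theta takes one line. A minimal sketch, taking the required reliability R(t) = 0.95 from the question and assuming the exponential distribution applies:

```python
import math

required_reliability = 0.95   # R(t) from the question
t = 500                       # hours

# R(t) = exp(-t / theta)  =>  theta = -t / ln(R(t))
min_mtbf = -t / math.log(required_reliability)
max_failure_rate = 1.0 / min_mtbf

print(round(min_mtbf))             # 9748
print(f"{max_failure_rate:.3e}")   # 1.026e-04
```

So the MTBF has to be at least about 9,748 hours, or equivalently the failure rate can be at most about 1.03 failures per 10,000 hours.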
cheers,
Fred
Dustin says
Great article Fred. I stumbled upon this when looking for other examples of how to perform a warranty risk calculation. The way I did my calculations was not using the cumulative distribution function, but assuming a constant probability of failure over time. By running my calculation and yours for a 3 year warranty period, our numbers come out pretty close. I found that interesting. I think it makes sense to use the cumulative distribution as it assumes you would have a lower failure rate when the part is just installed, however I don’t think this accounts for infant mortality. For that reason I wonder if using a constant failure rate would be better? In any case, as I said our numbers actually came out pretty close when I totaled the cost of 200 different vehicle parts over a 3 year period.
Fred Schenkelberg says
Hi Dustin, thanks for the note/question and for reading through the article. Be certain that the distribution fitted to the data actually is appropriate. If there is a mix of distributions due to differing dominant failure mechanisms, you may need two or more distributions to fit elements of the data.
Using a poorly fitted distribution, or assuming it’s close enough to constant, leads to under- or overestimating reliability or failure rates over different selected time periods. It also provides a false model of what is actually happening.
cheers,
Fred
Erik Johannes says
Fred, Thank you for your article. I have a system that is repairable. For each system component I know the MTBF. From your article I understand I can use the cumulative distribution function F(t)=1 – e^(-t/MTBF) to calculate the probability a component will fail before time t. My First Question: If I add all system component F(t) values for a given time t will the result be the probability of a failure of at least one component within the system before time t? My Second Question: If I add all the products of multiplying F(t) for a component by its Component Repair Cost will the result be an estimate of the repair cost at time t? – thx, Erik J
Fred Schenkelberg says
Hi Erik,
It’s easier to add the failure rates (1/MTBF values), then convert back to MTBF for use in the formula you mention. You can add the lambdas, not the MTBFs.
Using the CDF with that combined value, you get the probability of the first failure, for any reason, by time t.
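A short sketch of why the rates are added rather than the F(t) values. It assumes a series system (any component failure counts as a system failure), independent components, and constant failure rates; the component list is hypothetical apart from the 50,000 hour fan.

```python
import math

# ASSUMPTION: hypothetical component MTBFs in hours (only the fan is from the article)
component_mtbf = {"fan": 50_000, "power_supply": 120_000, "controller": 200_000}
t = 26_280   # hours

system_rate = sum(1.0 / m for m in component_mtbf.values())   # add the lambdas
system_mtbf = 1.0 / system_rate
p_any_failure = 1.0 - math.exp(-t * system_rate)

# Adding the individual F(t) values instead overstates the probability
naive_sum = sum(1.0 - math.exp(-t / m) for m in component_mtbf.values())

print(round(system_mtbf))        # 30000
print(round(p_any_failure, 3))
print(round(naive_sum, 3))
```

The simple sum of F(t) values double counts the cases where more than one component fails, and with enough components it can even exceed 1.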
Not sure about the component repair costs… best to run a simulation, which includes time and repair costs to get a better answer – a reliability block diagram approach may work well.
cheers,
Fred
Erik Johannes says
Thanks for your time.
Sachin says
Hi Fred.
What are your thoughts on warranty for a product having an MTBF of just 50,000 hrs?
Can we establish a simple correlation for how to set up a warranty based on MTBF?
Fred Schenkelberg says
Hi Sachin,
You may already know my opinion of MTBF – and I suggest you get more information (or at least some information concerning the reliability of your product – which is not MTBF). Considering there is an infinite number of failure patterns that may calculate out to 50K hours, you have less than useful information if only using such a metric.
A warranty policy is more than the product’s expected reliability performance – it is part marketing and part customer expectation. For some product categories warranty duration and basic terms are mandated by local laws/regulations.
There is some correlation possible between field failures and cost to service those failures covered by warranty. What is often not covered by warranty for a product that has a higher than expected failure rate or costly failures for the consumer is the loss of market share.
cheers,
Fred
PS: please avoid using MTBF, as it is less than useful for understanding product reliability.
Miguel says
Hi Fred!
Thanks for the great tutorial.
I have a question regarding estimation of warranty failures in a population of devices with mixed lifetime.
Whereas for new products the failure estimate can be calculated directly from the CDF, I would imagine that for products already in use for some time, we would have to perform some sort of adjustment based on the device’s elapsed lifetime, correct?
How would we then be able to predict the number of failures in the future while taking into account both new and previously functioning devices? Would we calculate the CDF value for each individual device and then compute a final prediction for the whole population based on them?
Thanks for your time.
Best,
Miguel
Fred Schenkelberg says
Hi Miguel,
Thanks for your question.
Dealing with data with products that have different durations in service is common when shipping new products each month. I use a Nevada chart to gather that data, and some reliability stats packages use the Nevada chart to allow easy data input.
You can fit a distribution and get the CDF with such data. There will often be shorter duration in service products, yet they tend, if all is going well, to have fewer failures. Be sure to include all the censored data – that which hasn’t failed yet.
Use all the data to estimate the CDF – it doesn’t make sense, nor do I think it is possible to get a fitted distribution and the CDF from just one device. Now, if the product is repairable, then the use of Weibull or Lognormal isn’t appropriate, and you should be using recurrent data analysis.
So, assuming you are shipping new products on an ongoing basis, use the shipped units and the returned/failed unit information to estimate the fitted distribution (Weibull analysis). Then use the CDF to estimate how many you expect to fail in the coming month or over the duration of interest.
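One way to sketch that last step in Python: for units that have already survived to some age, the chance of failing in the next period is the conditional probability (F(age + period) – F(age)) / (1 – F(age)). The Weibull parameters and unit counts below are hypothetical, and the function names are made up for illustration.

```python
import math

def weibull_cdf(t, beta, eta):
    """Probability of failure by age t (same time units as eta)."""
    return 1.0 - math.exp(-((t / eta) ** beta)) if t > 0 else 0.0

def expected_failures_next_period(survivors_by_age, beta, eta, period=1.0):
    """Expected failures in the next period for a mixed-age population.

    survivors_by_age maps current age -> number of units still working.
    """
    total = 0.0
    for age, n_units in survivors_by_age.items():
        surviving = 1.0 - weibull_cdf(age, beta, eta)
        # Probability of failing in the next period, given survival to 'age'
        p_fail = (weibull_cdf(age + period, beta, eta)
                  - weibull_cdf(age, beta, eta)) / surviving
        total += n_units * p_fail
    return total

beta, eta = 1.3, 60.0   # ASSUMPTION: hypothetical fitted shape and scale (months)
survivors_by_age = {0: 600, 1: 597, 2: 590, 3: 580}   # ASSUMPTION: hypothetical counts

print(round(expected_failures_next_period(survivors_by_age, beta, eta), 1))
```

With a shape parameter of exactly 1 the result no longer depends on age, which is the constant failure rate case from the article.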
See https://fred-schenkelberg-project.prev01.rmkr.net/field-data-analysis-first-look/ for an overview of doing this analysis and future estimate – I’m using Weibull++, yet there are many packages out there that would also work.
cheers,
Fred
Miguel says
Thanks for your quick reply and additional information. Indeed, this seems to be what I am looking for. I had previously built a Kaplan-Meier curve on our data (using Python) in order to understand the survival probability of our product, but I was missing the method to integrate the continuously increasing number of products shipped/in use.
I have a follow-up question regarding the Nevada table. Let’s say that in the example you shared the product had a warranty of 6 months, after which period we wouldn’t care anymore about the products failing as they would no more represent a replacement cost. Would we simply leave those months after warranty expiration, for each of the shipments, empty?
Another question, and maybe I am over-complicating things here. I assume one can select the start of this analysis at any given point in time. But if that is the case, how to treat data from products that have been shipped before and have gotten some use? If for instance I start my analysis from January 2024, in that month, but also in upcoming months until the warranty expires, I should expect some returns from products that have been shipped in December 2023. Is there a way to account for those?
Best,
Miguel
Fred Schenkelberg says
Hi Miguel,
Happy to help.
On the Nevada table – first off, I would not dismiss the data about returns after the warranty period. While not useful for warranty future estimates or monitoring, it is useful for customer satisfaction and design improvements. Also, it may not be as complete as the data within the warranty period.
If you can start on any date for the analysis, I would back up to when units that are still under warranty began shipping. So back up at least 6 months and, as best as possible, collect the necessary data. Otherwise, if that information is not available and you get a return from a unit shipped before you began tallying shipments, I would not use that data in the analysis, which means it may take 6 months before you have a clear picture of what’s happening.
If only interested in the warranty period and failures are reported after that device’s warranty expires – I would not add that failure to the analysis, yet would track those separately to spot any major issues, a change of failure mechanisms that would shorten the expected life, etc. Often, customers buy a product with, say, a 6-month warranty yet fully expect the device to work as expected for 5 years. A failure after 6 months is damaging to brand loyalty, customer satisfaction, etc. So, it is something to pay attention to.
hope that helps.
Also, if using Python, check out https://fred-schenkelberg-project.prev01.rmkr.net/articles/on-tools-techniques/reliability-engineering-using-python/, which points to a site with plenty of reliability-related content based on Python.
cheers,
Fred
Miguel says
Hi Fred.
Really helpful follow-up! And the Python link is extremely relevant. Thanks for taking the time to help out.
Best,
Miguel
Fred Schenkelberg says
You are very welcome. cheers, Fred