MTBF Correlation vs. Causation: MIL-HDBK-217G

People claim poor correlation of predicted and observed MTBFs. That is understandable because handbook failure rates and fudge factors for quality and environment were derived from unknown populations or samples. People also claim there is no basis for applying statistics or probability to MTBF predictions. MTBF predictions use failure rate averages that lack statistical causation. Why not incorporate Paretos in MTBF predictions?

Paretos are fractions of equipment failures caused by each type of part or subsystem. They represent what really happens. Incorporating Paretos requires statistics to adjust MTBF predictions. That causes Paretos in MTBF predictions to match field Paretos. A 1992 ASQ Reliability Review article “MIL-HDBK-217G” proposed using observed Paretos to adjust handbook MTBF predictions with a “Reality” factor.

Correlation of MTBF Prediction and Observed MTBF?

The RAMS article by Jais, Werner, and Das says, “The ratio of [MTBF] predictions to demonstrated [MTBF] values ranges from 1.2:1 to 218:1. This shows that original contractor [MTBF] predictions for DoD systems greatly exceed the demonstrated results. In addition, statistical analysis of the data using Spearman’s Rank Order Correlation Coefficient show that MIL-HDBK-217 based predictions cannot support comparisons between systems.”

Kirk Gray’s April 2024 “Accendo Weekly Update” article claimed about electronics: “…there is little if any empirical field data from the vast majority of verified failures that shows any correlation with calculated predictions of failure rates.” “…actual field failure data, and the root causes of those failures can never be shared… Reliability data is some of the most confidential and sensitive data a manufacturer has.”

Ancient History?

In December of 2000 Kirk Gray and Wayne Tustin wrote, “It is no longer possible or reasonable to even attempt statistical estimates of reliability [MTBF] based on a summation of components’ reliability [failure rates], even if accurate data on current components was available.” [“Don’t Let the Cost of HALT Stop You,” by Kirk Gray and Wayne Tustin, 2001.]

I wrote back, “Wayne [Tustin] kindly sends me his newsletters. I had the impertinence to dispute a few statements in his article with Kirk Gray.” Wayne invited me to expound at greater length. His tolerance is commendable. Kirk and Wayne advocate HALT and HASS. Their December 2000 criticism was representative.

I replied that reliability statistics convert available data required by GAAP, even without lifetime data, into actionable information; information that helps decide whether to do anything and what to do, to what, when, and how much.

“It is with great sadness that we announce the passing of Wayne Tustin, founder and former president of ERI. Wayne passed away on May 10, 2018 at the age of 95. Wayne, a Fellow of the IEST, was involved in teaching of Vibration and Shock testing for 70 years.” [equipment-reliability.com/]

Kirk replied, “Please tell him [Larry George] to go read the article on the web from University of Maryland’s CALCE titled ‘Why the traditional reliability prediction models do not work – is there an alternative?’ by Michael Pecht.” Pecht proposed Physics of Failures (PoF) [“Long-Term Overstressing of Computers,” Nov. 2011 by Kirk Gray and Mike Pecht].

I replied to Kirk and Wayne… I agree with their criticism of MTBF predictions, and PoF is fine for failure modes where the physics is well known. I’ll never forget a Nevil Shute novel [No Highway, 1948] about planes with wings that would fall off after exactly 11,296 hours. However, PoF is deterministic and won’t give you reliability, because reliability is probability of failure as a function of age. [See US DoT FAA AC No: 23.1309-1E reference for life limits based on fatigue life.] Milton Ohring’s book about PoF is comprehensive but largely deterministic. Handbooks that use PoF provide fudge factors based on unknown populations or samples; the same problem as with using handbook failure rates.

MTBF Prediction with Causation?

Why not incorporate causation into MTBF prediction? Generations of products have similar components, undergo similar manufacturing processes, and have similar customers and environments. Use observed Paretos [Schenkelberg, July 2024] to adjust handbook failure rates and MTBF predictions. MIL-HDBK-217G [George, June 1992] incorporates causation by making MTBF predictions have the same Paretos as observed Paretos. Adjust MTBF predictions with a reality factor π_R for each type of component in the MTBF prediction.

The MIL-HDBK-217F MTBF prediction is 1/∑N_i(λ_G*π_Q)_i, where N_i is the number of components of type i, λ_G is the generic failure rate for component i, and π_Q is its quality factor. The sum is from 1 to n, the number of generic part categories.

The MIL-HDBK-217G MTBF prediction is 1/∑N_i(λ_G*π_Q*π_R)_i, where π_R is a part reality factor based on Paretos! The reality factor adjusts the equipment reliability predictions by adjusting some parts’ failure fractions in the direction of field failure Paretos.

The MIL-HDBK-217G MTBF prediction is related to the real MTBF by construction, because it applies the observed Paretos of past products to the parts that the new product has in common with them. That causation helps bring the MIL-HDBK-217G MTBF prediction closer to the real MTBF, because generations of products have similar parts’ Paretos.

Figure 1. Predicted and observed Paretos for two switches: The MIL-HDBK-217G MTBF predictions for the 2400 and 2800 products were close except for motherboard.

Math

Compute each part’s predicted failure fraction, N_i(λ_G*π_Q)_i/λ_EQUIP (“Percent” in table 2), and compare it with the observed fraction of equipment failures caused by that part, its Pareto (“P(i)” in table 2), for the equipment observed in the field. If Percent exceeds P(i), then π_R = 1. If Percent is less than P(i), then the part’s reality factor is greater than 1. Reorder the parts so that the parts whose Paretos exceed their predicted failure fractions come first. Let k denote the number of parts for which this is true, and let n denote the total number of parts on the equipment observed in the field, k < n.

The reality factor for the i-th part type is

$$ \displaystyle \pi_{R}=\left[A^{-1}b\right]_{i}\diagup\left(\lambda_{G}\pi_{Q}\right)_{i} $$

where [A^-1 b]_i is the i-th element of the vector A^-1 b, A is the matrix in table 1, and b is the vector [P(1),P(2),…,P(k)] transposed times ∑(λ_G*π_Q)_i, where the sum is from k+1 to n. The reality factor adjusts the equipment reliability prediction by adjusting some parts’ failure fractions upward, in the direction of field failure experience. Naturally this increases the predicted equipment failure rate and reduces the predicted MTBF.

Table 1. The A matrix and b vector, built from the observed Paretos P(i) of the k parts whose Paretos exceed their predicted failure fractions N_i(λ_Gπ_Q)_i/λ_EQUIP. The sum in b is from k+1 to n.

A-matrix				b
1-P(1)	-P(1)	Etc.	-P(1)	P(1)∑(λ_Gπ_Q)_i
-P(2)	1-P(2)	-P(2)	-P(2)	P(2)∑(λ_Gπ_Q)_i
Etc.	Etc.	Etc.	Etc.	Etc.
-P(k)	-P(k)	Etc.	1-P(k)	P(k)∑(λ_Gπ_Q)_i
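
One way to see where the A matrix and b vector come from is sketched here. The symbols x_i (the adjusted failure rate of part type i, that is, N_i(λ_G*π_Q*π_R)_i) and S (the sum in b over parts k+1 to n) are notation introduced only for this illustration; they are not from the 1992 article. Requiring each adjusted failure fraction to equal its observed Pareto gives

$$ \displaystyle \frac{x_{i}}{\sum_{j=1}^{k}x_{j}+S}=P(i),\quad i=1,\ldots,k $$

Multiplying through by the denominator and collecting terms gives row i of Ax = b:

$$ \displaystyle x_{i}-P(i)\sum_{j=1}^{k}x_{j}=P(i)\,S $$

Summing the rows over i gives the total of the x_j. Assuming the first k Paretos sum to less than 1, that leads to a closed form that should agree with [A^-1 b]_i:

$$ \displaystyle x_{i}=\frac{P(i)\,S}{1-\sum_{j=1}^{k}P(j)} $$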

Table 2. Spreadsheet implementation. This is the data before MTBF adjustment. List parts in decreasing order of their Paretos P(i). MTBF is the inverse of the sum of the parts’ N_i(λ_G*π_Q)_i. Percent is each N_i(λ_G*π_Q)_i divided by their sum.

Part	Count	Part failure rate λ_G	N_i(λ_G π_Q)_i	Percent	Pareto P(i)
1	1	1	1	35.09%	40.00%
2	1	0.2	0.2	7.02%	10.00%
3	1	0.1	0.1	3.51%	10.00%
4	1	0.2	0.2	7.02%	10.00%
5	1	0.15	0.15	5.26%	10.00%
6	1	0.25	0.25	8.77%	4.00%
7	1	0.3	0.3	10.53%	4.00%
8	1	0.25	0.25	8.77%	4.00%
9	1	0.15	0.15	5.26%	4.00%
10	1	0.25	0.25	8.77%	4.00%
Sum			2.85	100%	100%
		MTBF	350.9
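
The Table 2 arithmetic can be sketched in a few lines of Python. This is only an illustration, not the author's spreadsheet. The variable names are made up here, and the factor of 1000 is an assumption: the article does not state the failure-rate units, and 1000 is simply what reproduces the tabulated MTBF of 350.9.

```python
# Table 2 inputs per part: (count N_i, failure rate lambda_G*pi_Q, observed Pareto P(i)).
parts = [
    (1, 1.00, 0.40), (1, 0.20, 0.10), (1, 0.10, 0.10), (1, 0.20, 0.10),
    (1, 0.15, 0.10), (1, 0.25, 0.04), (1, 0.30, 0.04), (1, 0.25, 0.04),
    (1, 0.15, 0.04), (1, 0.25, 0.04),
]

# ASSUMPTION: failure rates are per 1000 hours (units are not given in the article).
UNIT_HOURS = 1000.0

rates = [count * rate for count, rate, _ in parts]   # N_i(lambda_G*pi_Q)_i
total_rate = sum(rates)                              # equipment failure rate
percent = [r / total_rate for r in rates]            # predicted failure fractions
mtbf = UNIT_HOURS / total_rate                       # unadjusted MTBF prediction

# Parts whose observed Pareto exceeds the predicted fraction need a reality factor > 1.
needs_adjustment = [
    number for number, (frac, (_, _, pareto)) in enumerate(zip(percent, parts), start=1)
    if pareto > frac
]

print(f"MTBF = {mtbf:.1f}")
for number, frac in enumerate(percent, start=1):
    print(f"part {number}: Percent = {frac:.2%}")
print("parts needing adjustment:", needs_adjustment)
```

With these inputs the parts flagged for adjustment are the first five, which is the k = 5 used for the A matrix.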

Table 3. Spreadsheet implementation. Compute the reality factors and consequent adjusted MTBF prediction. The Reality Factor for part-type i is [A^-1*b]_i/(λ_G*π_Q)_i, or 1.0 if the part’s Pareto(i)/Percent(i) is less than 1.0. The Excel formulas for the “Reality Factors” are MMULT(A27:E27,b)/C6, MMULT(A28:E28,b)/C7, etc., where A27:E27 is the first row of A^-1 and C6 is the part 1 failure rate.

Part	λ_G	N_i(λ_G*π_Q)_i	Reality Factor	New total
1	1	1	2.40	2.4
2	0.2	0.2	3.00	0.6
3	0.1	0.1	6.00	0.6
4	0.2	0.2	3.00	0.6
5	0.15	0.15	4.00	0.6
6	0.25	0.25	1.00	0.25
7	0.3	0.3	1.00	0.3
8	0.25	0.25	1.00	0.25
9	0.15	0.15	1.00	0.15
10	0.25	0.25	1.00	0.25
Fail rate	2.85	2.85		6
MTBF	Old	350.9	Adjusted	166.7

Table 4. The A matrix and the b vector. The A matrix is built from the parts whose Paretos exceed their predicted failure fractions (Percent in table 2).

A					b
60%	-40%	-40%	-40%	-40%	0.48
-10%	90%	-10%	-10%	-10%	0.12
-10%	-10%	90%	-10%	-10%	0.12
-10%	-10%	-10%	90%	-10%	0.12
-10%	-10%	-10%	-10%	90%	0.12
A^-1					A^-1 b
3	2	2	2	2	2.4
0.5	1.5	0.5	0.5	0.5	0.6
0.5	0.5	1.5	0.5	0.5	0.6
0.5	0.5	0.5	1.5	0.5	0.6
0.5	0.5	0.5	0.5	1.5	0.6
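
For readers without the spreadsheet, the reality-factor computation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation. The helper `solve` is a hypothetical stand-in for Excel's matrix inversion and MMULT, all counts N_i are 1 as in the tables, and the factor of 1000 is assumed only because it reproduces the tabulated MTBFs.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# N_i(lambda_G*pi_Q)_i and observed Paretos P(i) from the tables.
rates = [1.0, 0.2, 0.1, 0.2, 0.15, 0.25, 0.3, 0.25, 0.15, 0.25]
paretos = [0.40, 0.10, 0.10, 0.10, 0.10, 0.04, 0.04, 0.04, 0.04, 0.04]
k = 5                      # parts whose Pareto exceeds the predicted fraction
tail = sum(rates[k:])      # sum of unadjusted rates for parts k+1..n

# Row i of A is 1 - P(i) on the diagonal and -P(i) elsewhere; b_i = P(i) * tail.
a = [[(1.0 if i == j else 0.0) - paretos[i] for j in range(k)] for i in range(k)]
b = [paretos[i] * tail for i in range(k)]

adjusted = solve(a, b)     # equivalent to A^-1 b
reality = [adjusted[i] / rates[i] for i in range(k)] + [1.0] * (len(rates) - k)
new_rates = [r * f for r, f in zip(rates, reality)]

UNIT_HOURS = 1000.0        # ASSUMPTION: failure rates are per 1000 hours
adjusted_mtbf = UNIT_HOURS / sum(new_rates)

print("reality factors:", [round(f, 2) for f in reality])
print(f"adjusted MTBF = {adjusted_mtbf:.1f}")
```

Solving the system directly avoids forming A^-1 explicitly; the result is the same vector that the spreadsheet obtains with MMULT.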

If the formula for the reality factor seems daunting, ask pstlarry@yahoo.com for the spreadsheet. Or send your field Paretos, parts count and reliability predictions for the parts on the equipment. I’ll compute the reality factors and send them back.

Correlation vs. Causation?

Correlation is between random variables such as times to failures, not between an MTBF prediction (number) and an observed MTBF (random variable). An MTBF prediction is not a random variable, even though a sample MTBF is a random variable. (Jais, Werner, and Das gathered predicted and test sample MTBFs to compute the correlation reported in their RAMS article. They did not explain how they computed test sample MTBF.)

The MIL-HDBK-217G MTBF prediction using Paretos is a random variable that depends on the distributions of the Paretos (proportions). It is legitimate to ask for the correlation of the MIL-HDBK-217G MTBF prediction and the subsequent field MTBF. It depends on how well the old Paretos match field Paretos (new) and how much the predicted and field MTBFs depend on known Paretos.

For a one-part product, the correlation of two Binomial distributions is Corr(old Pareto, field Pareto)=0. For a two-part product, Corr(Beta(a1, a2), Beta(b1, b2)) can be computed from the BetaBinomial(n2, a, b) distribution, where the old Pareto has a Beta[a, b] distribution. For products with more than two parts, the joint distribution of the Paretos is a Dirichlet distribution.

Field MTBF = TTT/r, where the Total Time on Test is TTT = ∑(Y_i) + (n-r)Y_r, the sum is over the r observed failure times, Y_r is the end-of-test time, and n-r is the number of survivors [Teyim Karibo, LinkedIn ASQ RRD post July 2, 2024]. Field MTBF is limited by the life of the oldest failure and is biased low. The same is true of test MTBF.
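
As a small illustration of the total-time-on-test estimate, here is a sketch in Python. The function name `ttt_mtbf` and the sample data are hypothetical, and the sketch assumes the test ends at the last observed failure, so the survivors are all credited with that age.

```python
def ttt_mtbf(failure_times, n_units):
    """Estimate MTBF as total time on test divided by the number of failures.

    Assumes the test ends at the last observed failure, so each of the
    n_units - r survivors accumulates time equal to that last failure age.
    """
    times = sorted(failure_times)
    r = len(times)
    if r == 0:
        raise ValueError("at least one failure is required")
    if n_units < r:
        raise ValueError("n_units cannot be smaller than the number of failures")
    end_of_test = times[-1]                         # Y_r
    ttt = sum(times) + (n_units - r) * end_of_test  # sum(Y_i) + (n - r) * Y_r
    return ttt / r


# Hypothetical data: 10 units on test, 4 failures at these ages (hours).
print(ttt_mtbf([100, 250, 400, 600], n_units=10))
```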

Reliability Prediction is not MTBF Prediction

Reliability is a function of age, not a number like MTBF. Credible reliability predictions are harder to make than MTBF predictions. Fortunately, credible reliability predictions that incorporate causation resemble age-specific field reliability, because many products, parts, designs, production, shipping, installations, customers, and environments are the same generation after generation.

I proposed using nonparametric, age-specific, field failure rates (not MTBFs) scaled by the ratios of MTBF(new)/MTBF(old) for reliability predictions, such as for the products in figure 1. Nonparametric estimates of field reliability and failure rate functions are available, without lifetime data, even for recurrent processes, from ships and returns counts required by GAAP!

Harold Williams, ASQ Reliability Review editor, nicknamed my “Credible Reliability Prediction” monograph “CRP”. CRP doesn’t predict MTBF; it predicts new-product reliability functions, using observed field reliabilities of parts, ratios of old and new MTBFs, and the new-product reliability block diagram or “structure function”. Fred Schenkelberg invited me to HP to explain CRP to his colleagues.

CRP predicts the age-specific failure rate functions for each component or subsystem: λ(t; i)= λ(t; i; old)*EXP[MTBF(old)/MTBF(new)], because all that is known at the time of reliability prediction is λ(t; i; old), observed MTBF(old), and predicted MTBF(new). (Don’t use Paretos in MTBF(old).) This is known as a proportional hazards model. The Credible MTBF prediction is

CRP MTBF = 1/∑N_i(λ(t; i; old)*EXP[MTBF(old)/MTBF(new)]*π_Q)_i

Causation is not correlation, but incorporating Paretos helps make credible MTBF and reliability predictions closer to reality, by incorporating causation.

References

L. L. George, “MIL-HDBK-217G (George),” ASQ Reliability Review, Vol. 12, no. 3, June 1992

L. L. George, “Credible Reliability Prediction,” 2nd edition, https://drive.google.com/file/d/1vxzrQUQKciZ1uyB1ZF_O-m4VcK6oVZe8/view/, 2023

L. L. George, “User Manual for Credible Reliability Prediction,” https://drive.google.com/file/d/1za5KT_qsF2sCSzGO7xi2EoHONBz2PwtZ/view/, 2023

Kirk Gray, “No Evidence of Correlation: Field failures and Traditional Reliability Engineering,” https://nomtbf.com/2012/02/no-evidence-of-correlation-field-failures-and-traditional-reliability-engineering/, Feb. 2012

Kirk Gray, “No Evidence of Correlation: Field failures and Traditional Reliability Engineering,” Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/no-evidence-of-correlation-field-failures-and-traditional-reliability-engineering/, April 21, 2024

Christopher Jais, Benjamin Werner, and Diganta Das, “Reliability Predictions – Continued Reliance on a Misleading Approach,” 2013 Proceedings Annual Reliability and Maintainability Symposium (RAMS), Orlando, FL, USA, pp. 1-6, Jan. 2013

Milton Ohring, Reliability and Failure of Electronic Materials and Devices; 2nd Edition – October 14, 2014

Fred Schenkelberg, “Field Data and Reliability,” Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/field-data-reliability/

Fred Schenkelberg, “Fundamentals of Pareto Analysis,” Webinar, https://fred-schenkelberg-project.prev01.rmkr.net/accendo-courses/accendo-reliability-webinar-series/lessons/quality/topic/fundamentals-of-pareto-analysis/, July 2024

Fred Schenkelberg, “Who are you Fooling with MTBF Predictions?” Accendo Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/reliabilty-predictions/, August 2024

US DoT FAA, “System Safety Analysis and Assessment for Part 23 Airplanes,” AC No: 23.1309-1E, 11/17/2011