Accendo Reliability

Your Reliability Engineering Professional Development Site


by Larry George

Statistical Software Problem?


When a system first fails in one mode at time t, that time is a right-censored observation for every other failure mode! How do you estimate reliability functions for all failure modes from first-failure data?

Google AI says, “‘Competing risks’ refers to a statistical scenario where a subject can experience failure from multiple possible causes, but once one failure occurs, it prevents the observation of any other potential failures, essentially creating ‘multiple failure modes’ that compete with each other to be the first event observed; this means analyzing the probability of a specific failure type needs to account for the possibility of other competing failures happening first.” “Use appropriate statistical methods: Employ statistical models specifically designed for competing risks analysis…”

Minitab uses the Kaplan-Meier nonparametric reliability estimator for each failure mode [Kaplan and Meier, Schenkelberg]. Using only the first-failure times for each mode ignores the other failure modes’ survival beyond the age at which the first part’s failure censored them. As a result, the Kaplan-Meier reliability estimates for each failure mode are biased low. Others have noticed this problem [Mailman School of Public Health].

History of Minitab?

Minitab started in 1972 as a lighter version of OMNITAB, a NIST computer program. “The documentation for the latest version of OMNITAB, OMNITAB 80, was last published in 1986, and there has been no significant development since then,” [https://en.wikipedia.org/wiki/Minitab/]. Cofounder and mathematician Barbara Ryan is still CEO of Minitab. Minitab’s scope has broadened into non-statistical subjects such as ERP. Minitab has a nice user interface, which may mislead users doing censored, multiple-mode reliability estimation. Minitab is advertising for a “Statistical/Analytics Consultant” in Coventry, UK.

Jonathan, of Minitab Technical Support, wrote to me, “Minitab does provide a way to analyze multiple failure modes as part of Parametric Distribution Analysis.  You can click the FMode button to enter the failure mode information.  Also, you can click on the Results button and check the box for Display analyses for individual failure modes according to display of results to display full results for each failure mode.” I suspect Minitab has estimation problems with both its parametric and nonparametric multiple-mode distribution choices.

Here’s Some Data!

I estimated nonparametric reliability functions simultaneously for each failure mode from the table 1 data. I accounted for censoring by using maximum likelihood, and for dependence by constraining the maximization to yield the proportions of first failures in table 2. I compared those maximum likelihood estimates with the independent Kaplan-Meier estimates for each failure mode from the first-failure time and mode data.

Table 1. Data are ages (hours) at first failure, with the failure mode, detected at times of overhaul.

Unit   Overhaul Hours   Failure Mode
80     31995            Gears
79     31677            Camshaft
78     31597            Head
77     31205            PC
76     31087            Bearings
Etc.
6      10394            PC
5      10317            Head
4      8970             PC
3      8900             Camshaft
2      8553             Gears
1      8042             Gears

Table 2. Proportions of each failure mode as first failure out of 80 first failures

Bearings   Camshaft   Gears    Head     PC       Crankshaft
0.0500     0.2125     0.2250   0.1625   0.3125   0.0375
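As a quick arithmetic check, here is a small Python sketch (not part of the original spreadsheet) that converts the table 2 proportions back into counts out of 80 first failures. The variable names are illustrative assumptions.

```python
# Proportions of first failures by mode, copied from table 2.
proportions = {
    "Bearings": 0.0500,
    "Camshaft": 0.2125,
    "Gears": 0.2250,
    "Head": 0.1625,
    "PC": 0.3125,
    "Crankshaft": 0.0375,
}
n_first_failures = 80

# Convert each proportion back to a whole number of first failures.
counts = {mode: round(p * n_first_failures) for mode, p in proportions.items()}

for mode, count in counts.items():
    print(f"{mode:<10} {count:>2}")
print("Total", sum(counts.values()))
```

The proportions sum to 1, so the six modes together account for all 80 first failures.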

What’s Wrong?

The Kaplan-Meier estimator assumes the lifetime data are independent, identically distributed samples of failures and survivors’ ages. The Kaplan-Meier likelihood function is 

$$\prod_{t} a(t)^{d(t)}\,\bigl(1-a(t)\bigr)^{n(t)-d(t)} \binom{n(t)}{d(t)} = \prod_{t} a(t)^{d(t)}\,\bigl(1-a(t)\bigr)^{n(t)-d(t)}\,\frac{n(t)!}{d(t)!\,\bigl(n(t)-d(t)\bigr)!},$$

where a(t) are actuarial failure rates conditional on survival to age t, d(t) counts the single-mode failures at age t, and n(t) counts the units surviving to age t. The product is over all observed failure ages t=1,2,…,80. Each term of this likelihood function is the binomial probability of d(t) failures at age t out of n(t) survivors to age t. It does not account for multiple failure modes and first failures.
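For reference, the likelihood above is maximized by a(t) = d(t)/n(t), which gives the familiar product-limit estimate. The following Python sketch applies it to one mode while treating first failures in other modes as right-censored. It uses only the 11 rows visible in table 1, so its numbers are illustrative and do not reproduce the article's figures. Whether this matches Minitab's internal calculation exactly is an assumption.

```python
def kaplan_meier(ages, is_failure):
    """Product-limit reliability estimate.

    ages: age at first failure for each unit.
    is_failure: True if the mode of interest failed, False if censored
    (that is, another mode failed first).
    Returns [(age, reliability)] at each age with at least one failure.
    """
    data = sorted(zip(ages, is_failure))
    n_at_risk = len(data)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        failures = 0   # d(t): failures of this mode at this age
        leaving = 0    # all units leaving the risk set at this age
        while i < len(data) and data[i][0] == age:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            reliability *= 1.0 - failures / n_at_risk  # 1 - d(t)/n(t)
            curve.append((age, reliability))
        n_at_risk -= leaving
    return curve


# Only the rows shown in table 1 (the "Etc." rows are not available here).
ages = [31995, 31677, 31597, 31205, 31087, 10394, 10317, 8970, 8900, 8553, 8042]
modes = ["Gears", "Camshaft", "Head", "PC", "Bearings", "PC",
         "Head", "PC", "Camshaft", "Gears", "Gears"]

gears_curve = kaplan_meier(ages, [m == "Gears" for m in modes])
for age, r in gears_curve:
    print(age, round(r, 4))
```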

With multiple failure modes, first-failure time data are statistically dependent. The likelihood function for multiple-mode first failures at age t (table 1, Bearings for example) is the product of terms such as

$$p(t;\ \text{Bearings}) \prod_{\text{mode} \ne \text{Bearings}} \bigl(1 - F(t;\ \text{mode})\bigr),$$

where p(t; Bearings) denotes the probability density function of the bearings’ age at failure, evaluated at failure time t, 1-F(t; mode) denotes the survival of another part past age t, and the product is over all other failure modes (Gears, Camshaft, Head, PC, and Crankshaft). I.e., the likelihood function is

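$$\prod_{t=1}^{80} \Bigl[\, p(t;\ \text{mode } k) \prod_{j \ne k} \bigl(1 - F(t;\ \text{mode } j)\bigr) \Bigr],$$

where k is the mode of the first failure observed at time t, j runs over the other modes, and the outer product is over the first-failure times t=1,2,…,80.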

Each multiplicand of the likelihood function is the probability that the first part fails at age t AND all other parts survive past age t. The difference from the Kaplan-Meier likelihood is that this multiple-mode likelihood represents all other parts’ failures, or failure modes, as occurring at ages older than t.
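The multiple-mode likelihood described above can be sketched in Python as a log likelihood over discrete probability masses. The function name, the two-mode example, and the probability masses are made-up assumptions chosen so the result can be checked by hand; they are not values from the article's data.

```python
import math


def log_likelihood(p, first_modes):
    """Log of prod_t [ p(t; failed mode) * prod_{other modes} (1 - F(t; mode)) ].

    p: {mode: [p(t_1; mode), ..., p(t_n; mode)]}, masses at the ordered
       first-failure ages.
    first_modes: first_modes[i] is the mode that failed first at age t_i.
    """
    cumulative = dict.fromkeys(p, 0.0)  # running F(t; mode)
    total = 0.0
    for i, failed in enumerate(first_modes):
        for mode in p:
            cumulative[mode] += p[mode][i]
            if mode == failed:
                total += math.log(p[mode][i])             # first failure at t_i
            else:
                total += math.log(1.0 - cumulative[mode])  # survives past t_i
    return total


# Hypothetical masses for two modes at three first-failure ages.
toy_p = {"Gears": [0.2, 0.1, 0.2], "PC": [0.1, 0.2, 0.1]}
toy_first_modes = ["Gears", "PC", "Gears"]

toy_ll = log_likelihood(toy_p, toy_first_modes)
print(toy_ll)
```

By hand, the three row terms are 0.2*0.9, 0.2*0.7, and 0.2*0.6, so the log likelihood is log(0.18*0.14*0.12).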

Excel Solver Maximizes Log Likelihood

I made an Excel spreadsheet with columns for each failure mode (Gears, Bearings, Camshaft, Head, PC, and Crankshaft) and rows for each of the 80 failure times in table 1. Each cell entry contains, for example

p(t; Bearings)*(1-F(t; Gears))*(1-F(t; Camshaft))*(1-F(t; Head))*(1-F(t; PC))*(1-F(t; Crankshaft)).

Then I multiplied the row cells together in another column. The likelihood is the product of the rows’ products. Its value was too small for Excel to compute, so I summed the log likelihoods instead and maximized the sum by varying the cell contents p(t; mode) (and consequently F(t; mode)=∑p(s; mode), s≤t). I constrained the Excel Solver maximization so that the sum of the p(t; mode) estimates for each mode equals that mode’s observed proportion of first failures in table 2, to represent the dependence among failure modes. Figure 1 shows the maximum likelihood reliability function estimates for each failure mode.
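The constrained maximization described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the spreadsheet itself and not Excel Solver's own algorithm: it uses plain gradient ascent with numerical gradients, enforces each mode's sum constraint through a softmax-style reparameterization, and runs only on the 11 rows visible in table 1, with proportions computed from those 11 rows rather than taken from table 2. All function and variable names are hypothetical.

```python
import math


def masses(theta, total):
    """Map unconstrained theta to positive masses that sum to `total`."""
    peak = max(theta)
    weights = [math.exp(x - peak) for x in theta]
    scale = total / sum(weights)
    return [w * scale for w in weights]


def log_likelihood(theta, first_modes, proportions):
    """Sum of log[p(t; failed mode) * prod over other modes of (1 - F(t; mode))]."""
    p = {m: masses(theta[m], proportions[m]) for m in proportions}
    cum = dict.fromkeys(proportions, 0.0)  # running F(t; mode)
    total = 0.0
    for i, failed in enumerate(first_modes):
        for m in proportions:
            cum[m] += p[m][i]
            term = p[m][i] if m == failed else 1.0 - cum[m]
            if term <= 0.0:
                return float("-inf")
            total += math.log(term)
    return total


def maximize(first_modes, proportions, iterations=100, h=1e-6):
    """Gradient ascent with numerical gradients and a step-halving line search."""
    n = len(first_modes)
    theta = {m: [0.0] * n for m in proportions}
    best = log_likelihood(theta, first_modes, proportions)
    for _ in range(iterations):
        grad = {}
        for m in proportions:
            grad[m] = []
            for i in range(n):
                old = theta[m][i]
                theta[m][i] = old + h
                up = log_likelihood(theta, first_modes, proportions)
                theta[m][i] = old - h
                down = log_likelihood(theta, first_modes, proportions)
                theta[m][i] = old
                grad[m].append((up - down) / (2 * h))
        step = 1.0
        while step > 1e-8:
            trial = {m: [t + step * g for t, g in zip(theta[m], grad[m])]
                     for m in proportions}
            value = log_likelihood(trial, first_modes, proportions)
            if value > best:
                theta, best = trial, value
                break
            step /= 2
        else:
            break  # no improving step found, stop early
    return {m: masses(theta[m], proportions[m]) for m in proportions}, best


# First-failure modes of the 11 visible table 1 rows, ordered by age.
first_modes = ["Gears", "Gears", "Camshaft", "PC", "Head", "PC",
               "Bearings", "PC", "Head", "Camshaft", "Gears"]
# Assumption: proportions from these 11 rows only, not the table 2 values.
proportions = {m: first_modes.count(m) / len(first_modes)
               for m in sorted(set(first_modes))}

start_ll = log_likelihood({m: [0.0] * len(first_modes) for m in proportions},
                          first_modes, proportions)
fitted_p, fitted_ll = maximize(first_modes, proportions)
print(start_ll, fitted_ll)
```

The reliability estimate for a mode at the i-th first-failure age is 1 minus the running sum of `fitted_p[mode]` up to that age. The reparameterization keeps each mode's masses positive and summing to its proportion, so no separate constraint handling is needed.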

Plot of maximum likelihood reliability curves for each of the six components individually

Figure 1. Maximum likelihood reliability estimates account for multiple failure modes. 

Notice the approximately annual changes in the reliability estimates near 11,000, 22,000, and 33,000 hours? (An average year is 8766 hours, and a year plus one quarter is 10,957.5 hours.) These changes may be failures detected at overhauls. (I confess these estimators in figure 1 should be step functions changing at observed first-failure times. It’s awkward to make Excel show step functions instead of sloping lines between data points.)

Compare with Kaplan-Meier?

Kaplan-Meier failure mode reliability estimates do not show this approximately annual behavior, and they underestimate reliability because they presume independence and ignore survival after the first failure. The 80 failure times in the original data are the 80 units’ first-failure ages; at each one, all the other parts survive. Figures 2-6 show that the Kaplan-Meier estimates are pessimistic compared with the maximum likelihood estimates that include terms for first-failure times and survival of all other failure modes.

Minitab’s use of the Kaplan-Meier estimator ignores the dependence among the failure modes represented by their proportions of the data (table 2), even though it counts survivors out of 80 in other failure modes. The distributions of ages at first failures are dependent, and ignoring that dependence biases reliability estimates, whether parametric or nonparametric [Reliawiki]. Others have used failure mode as a factor in proportional hazards models using the Kaplan-Meier proportional hazards estimator in the R package “survival” [Rao].

Plot of reliability for gears using maximum likelihood and Kaplan-Meier approaches; the KM approach shows lower reliability values than MLE

Figure 2. Gears maximum likelihood and GearsKM Kaplan-Meier reliability estimates

reliability plot of Camshaft using MLE and KM - again showing KM with lower reliability for the same data.

Figure 3. Camshaft maximum likelihood and CamshaftKM Kaplan-Meier reliability estimates. Vertical axis reliability scale is from 0 to 1.

reliability plot of Bearings using MLE and KM, again KM has lower reliability

Figure 4. Bearings maximum likelihood and BearingsKM Kaplan-Meier reliability estimates

reliability plot of PC using MLE and KM, again with KM showing less reliability

Figure 5. PC maximum likelihood and PCKM Kaplan-Meier reliability estimates. Vertical axis reliability scale is now 0 to 1.0.

reliability plot of head data using MLE and KM, and KM results in lower reliability values

Figure 6. Head maximum likelihood and HeadKM Kaplan-Meier reliability estimates. Vertical axis reliability scale is now 0 to 1.0.

Recommendations

Don’t use the Kaplan-Meier estimator from cohorts of installed base and grouped failure counts when cohort sizes vary over time. Don’t use Greenwood’s variance estimate [Greenwood, George 2022-2024].  

Don’t use Minitab’s Kaplan-Meier estimator for nonparametric multiple-mode, censored reliability estimation. The Kaplan-Meier estimator is pessimistic for first-failure times in multiple-mode reliability estimation, because it does not account for dependence and the subsequent survival in other failure modes.  

Prevent innocent misuse of Minitab. Techsupport@minitab.com says, “We appreciate your feedback and are always interested in ideas for improving the software!” Maybe I can get that job in Coventry.

References by George

“Fred’s Bicycles and Kaplan-Meier Error,” Nov. 2024

“Kaplan-Meier Ignores Cohort Variability,” Oct. 2024

“What Price Kaplan-Meier?” Jan. 2022

“Variance of the Kaplan-Meier Estimator?” March 2023

“Covariance of Kaplan-Meier Estimators?” March 2023

References

Greenwood, M., “The natural duration of cancer,” Reports on Public Health and Medical Subjects, Vol. 33, pp. 1–26, His Majesty’s Stationery Office, London, 1926

E. L. Kaplan and P. Meier, “Nonparametric Estimation from Incomplete Observations,” J. Amer. Statist. Assn., Vol. 53, pp. 457-481, 1958

Mailman School of Public Health, “Competing Risk Analysis,” Columbia U. Irving Medical Center, https://www.publichealth.columbia.edu/research/population-health-methods/competing-risk-analysis#/

Reliawiki, “Competing Failure Modes (CFM) Analysis,” https://www.reliawiki.com/index.php/Competing_Failure_Modes_Analysis/, Sept. 2023

Shishir Rao, “Competing Risks in Failure Time Data,” https://fred-schenkelberg-project.prev01.rmkr.net/competing-risks-in-failure-time-data/, Nov. 2024

Fred Schenkelberg, “Kaplan-Meier Reliability Estimator,” Accendo Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/kaplan-meier-reliability-estimator/#more-25994/, 2019


About Larry George

UCLA engineer and MBA, UC Berkeley Ph.D. in Industrial Engineering and Operations Research with minor in statistics. I taught for 11+ years, worked for Lawrence Livermore Lab for 11 years, and have worked in the real world solving problems ever since for anyone who asks. Employed by or contracted to Apple Computer, Applied Materials, Abbott Diagnostics, EPRI, Triad Systems (now http://www.epicor.com), and many others. Now working on actuarial forecasting, survival analysis, transient Markov, epidemiology, and their applications: epidemics, randomized clinical trials, availability, risk-based inspection, Statistical Reliability Control, and DoE for risk equity.


Comments

  1. André-Michel Ferra says

    December 31, 2024 at 8:55 AM

    Always a pleasure to read from you Larry. Thanks for your generosity and will to share your knowledge! Happy new year!

    • Larry George says

      December 31, 2024 at 10:49 AM

      Thanks. I dislike being critical of statistical software, but… Jonathan, of Minitab Technical Support, today emailed me that he forwarded the article to developers for their consideration. Let me know if you would like the spreadsheet example that I will describe to readers, if Shishir Rao, Suncor, Calgary, allows.

  2. Larry George says

    January 10, 2025 at 1:57 PM

    “Understanding underlying principles is critical to be able to interpret results and troubleshoot issues…If the only way you know how to deal with data science problems is to use a piece of software, then you’re going to be replaced with a piece of software.” Brian Conrad, Stanford Mathematician, in “Algebra Strikes Back,” California magazine, Fall-Winter 2024


© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy