Accendo Reliability

Your Reliability Engineering Professional Development Site

by Larry George

What Price Kaplan-Meier Reliability?

The Kaplan-Meier estimator is the nonparametric maximum likelihood reliability estimator for censored, grouped lifetime data. It’s traditional. It’s in statistical software. Greenwood’s variance formula is well known. Could Kaplan-Meier be improved, with smaller variance, better actuarial forecasts, seasonality, or separation of cohort variability from reliability? Could you estimate reliability without life data and preserve privacy?

The title comes from Rupert Miller’s article “What Price Kaplan-Meier?”, in which Miller compared the nonparametric Kaplan-Meier estimator with some parametric survival function estimators. His article spawned similar comparisons. This article makes a different one: nonparametric Kaplan-Meier vs. nonparametric reliability estimators that need no lifetime data!

Lifetime Data for Kaplan-Meier Estimator?

Government and industry standards require lifetime data: 21 CFR 821.25, 21 CFR 821.30, ISO 14224 [https://fred-schenkelberg-project.prev01.rmkr.net/?s=ISO+14224/], DOD “Guide for Achieving RAM”, and more. Censored lifetime reliability data often come in cohorts, e.g., periodic cases, ships, or sales, with their corresponding death or failure counts by age at failure. These data can be arranged in a “Nevada” table such as table 1. The Kaplan-Meier (K-M) reliability estimator from table 1 includes cohort randomness as well as randomness in times to failures [Kaplan and Meier].

Table 1. Cohort ships and failure data in a Nevada table (18 periods) 

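| Cohort | Ships | Period 1 | Period 2 | Period 3 | Period 4 | Period 5 | Period 6 | Etc. |
|---|---|---|---|---|---|---|---|---|
| 1 | 47 | 1 | 3 | 7 | 8 | 13 | 5 | |
| 2 | 41 | | 4 | 3 | 4 | 7 | 6 | |
| 3 | 45 | | | 2 | 4 | 9 | 6 | |
| 4 | 39 | | | | 1 | 6 | 4 | |
| 5 | 43 | | | | | 2 | 6 | |
| 6 | 41 | | | | | | 0 | |
| Etc. | | | | | | | | |
| Sums | 711 | 1 | 7 | 12 | 17 | 37 | 28 | |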

Cohort ships and the bottom row of table 1, sums of periodic failure counts, are statistically sufficient to make a nonparametric maximum likelihood reliability estimate (npmle) [George and Agrawal 1973, George 1999]. Ships and returns counts (S&R) are available from sales revenue and service cost data or even spares sales required by generally accepted accounting principles (GAAP). They’re population data. They help separate cohort randomness from randomness in times-to-failures. 

If you have failure counts from 18 cohorts as in table 1, why not make 18 Kaplan-Meier reliability estimates, one from each cohort? They provide more information than a single Kaplan-Meier estimate, and better variance estimates than the Greenwood formula. Compare the likelihoods, information, and variances from:

  • NPMLE from periodic ships (cohorts) and returns (S&R) counts (bottom row of table 1)
  • K-M estimator from survivors at risk and grouped failure counts of same age from all cohorts
  • K-M cohort estimator from each cohort’s size and failure counts by row of table 1

Likelihoods?

The likelihood functions “L(probability|data)” for alternative estimators are:

  • S&R Npmle: L=𝚷Poisson(λ)Poisson(λG(t)) or 𝚷Poisson(λ(t))Poisson(λ(t)G(t)). Poisson(λ) is the probability distribution of cohort sizes; Poisson(λG(t)) is that of the output of an M(t)/G/infinity service system with lifetime cumulative distribution function G(t) and Poisson cohorts M(t). This is for periodic data: cohort sizes and total returns in period t.
  • K-M: L=𝚷BINOM.DIST(d(t),r(t),a(t),FALSE), where BINOM.DIST() is an Excel function and a(t) is the actuarial, age-specific failure rate function to be estimated. This is for observed failure ages t, with r(t) the number of survivors at risk and d(t) the failures or deaths at age t.
  • K-M Cohort: L=𝚷𝚷BINOM.DIST(d(t;j),r(t;j),a(t;j),FALSE), the same as K-M, for each cohort j.
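The three likelihoods above are easiest to evaluate on a log scale. The following is a minimal Python sketch, not the author's spreadsheet: `log_binom_pmf` mirrors ln BINOM.DIST(d, r, a, FALSE) using `math.lgamma`, and `sr_log_likelihood` assumes, for illustration only, that λ is estimated by the mean of the ships and that the Poisson mean of returns in period t is the convolution Σ n(s)p(t−s+1) of ships with the age-at-failure probabilities p, i.e., the mean output of an M(t)/G/infinity system. All function names are hypothetical.

```python
import math

def log_binom_pmf(d, r, a):
    """ln BINOM.DIST(d, r, a, FALSE): d failures among r at risk, failure rate a."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    log_choose = math.lgamma(r + 1) - math.lgamma(d + 1) - math.lgamma(r - d + 1)
    return log_choose + d * math.log(a) + (r - d) * math.log(1.0 - a)

def log_poisson_pmf(k, mean):
    """ln of the Poisson(mean) probability of observing k counts."""
    if mean < 0.0:
        raise ValueError("mean must be non-negative")
    if mean == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def km_log_likelihood(deaths, at_risk, rates):
    """K-M: sum over ages t of ln BINOM.DIST(d(t), r(t), a(t), FALSE)."""
    return sum(log_binom_pmf(d, r, a) for d, r, a in zip(deaths, at_risk, rates))

def sr_log_likelihood(ships, returns, pmf):
    """S&R (sketch): Poisson log-likelihood of ships and period returns counts.

    Assumes ships n(s) ~ Poisson(lam), lam estimated by the mean ships, and
    returns in period t ~ Poisson(sum_s n(s) * p(t - s + 1)), where pmf[k]
    is the probability of failure at age k + 1 (an illustrative assumption).
    """
    lam = sum(ships) / len(ships)
    total = sum(log_poisson_pmf(n, lam) for n in ships)
    for t, y in enumerate(returns):
        mean_t = sum(ships[s] * pmf[t - s] for s in range(t + 1))
        total += log_poisson_pmf(y, mean_t)
    return total

# Example: K-M log-likelihood at a(t) = d(t)/r(t) for two ages
print(km_log_likelihood([10, 22], [256, 205], [10 / 256, 22 / 205]))
```

Summing logarithms avoids multiplying many small probabilities, which matters for likelihoods as small as those in table 2.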

The Npmle S&R likelihood function is from ships and returns counts. “Ships” are the cohort sizes in column two of table 1. “Returns” are period sums of failure counts regardless of cohort, from the bottom row of table 1. The likelihood assumes ships (cohort sizes) have Poisson distribution(s), and consequently so do period sums of failure counts [Mirasol, George and Agrawal]. It’s hard to disprove that each cohort has a nonstationary Poisson(λ(t)) distribution from a sample of only one cohort per period (table 1, column 2) [Nelson and Leemis].

The K-M estimator does not use all the information in the Nevada table. It uses cohort sizes and the failure counts of the same ages, which lie on the diagonals of table 1 (the “traces” of the Nevada table matrix). The Kaplan-Meier reliability estimator is R(t) = Π(1-a(s)), s=1,2,…,t, where a(s) = d(s)/r(s) is the actuarial failure rate at age s, d(s) counts the deaths at age s summed along a diagonal, and r(s) is the number of survivors to age s (at risk). The death counts have binomial distributions with failure probability a(s), conditional on survival to age s.
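The diagonal bookkeeping can be sketched in Python as below. The sketch assumes each Nevada-table row has already been shifted so its first entry is age 1, and that a cohort is censored after its last entry; `km_from_nevada` is a hypothetical name. The example data are only the first six cohorts and periods of table 1, so the result is illustrative and will not match figure 1.

```python
def km_from_nevada(ships, failures):
    """Kaplan-Meier reliability estimate from a Nevada table.

    ships[j]:    size of cohort j
    failures[j]: cohort j's failure counts by age 1, 2, ..., i.e. its table
                 row shifted so the first entry is age 1; the row length is
                 the number of ages observed before censoring
    Returns deaths d(s), survivors at risk r(s), and reliability R(s) by age.
    """
    max_age = max(len(row) for row in failures)
    d = [0] * max_age
    r = [0] * max_age
    for n, row in zip(ships, failures):
        survivors = n
        for s, deaths in enumerate(row):    # s = age - 1
            r[s] += survivors               # cohort is at risk at this age
            d[s] += deaths                  # same-age counts lie on a diagonal
            survivors -= deaths
    R, rel = [], 1.0
    for deaths, at_risk in zip(d, r):
        if at_risk == 0:                    # nobody left at risk: stop
            break
        rel *= 1.0 - deaths / at_risk       # factor 1 - a(s), a(s) = d(s)/r(s)
        R.append(rel)
    return d, r, R

# First six cohorts and periods of table 1, rows shifted to start at age 1
ships = [47, 41, 45, 39, 43, 41]
failures = [[1, 3, 7, 8, 13, 5], [4, 3, 4, 7, 6], [2, 4, 9, 6],
            [1, 6, 4], [2, 6], [0]]
d, r, R = km_from_nevada(ships, failures)
print(d, r)
```

For this excerpt, age 1 has d = 10 deaths among r = 256 at risk, so R(1) = 246/256.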

The K-M Cohort likelihood is the product of Kaplan-Meier likelihoods, one for each cohort j, each using that cohort’s failure counts. The K-M estimates by cohort use all the information in the Nevada table.

An Excel spreadsheet and a VBA program compute the maximum likelihood reliability estimator from ships and returns counts [George and Agrawal, George 1999, George 2019]. Excel’s Solver maximizes likelihoods as functions of probability distributions. The K-M estimate from maximizing likelihood with Solver was the same as the estimate from the K-M formula, and the S&R estimate from maximizing likelihood with Solver was the same as the spreadsheet estimate.
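Excel’s Solver is one way to maximize the S&R likelihood. As a swapped-in alternative, not the author’s method, the sketch below uses the EM iteration for Poisson linear inverse problems (Shepp-Vardi, also known as Richardson-Lucy), assuming returns in period t are Poisson with mean Σ n(s)p(t−s+1). The Poisson(λ) factor for ships does not involve p, so it is omitted. The sketch does not enforce Σp(t) ≤ 1, which a real implementation would need; `sr_npmle_em` is a hypothetical name.

```python
import math

def sr_npmle_em(ships, returns, n_iter=500):
    """Sketch of a nonparametric ML fit to ships and returns counts.

    Assumes returns[t] ~ Poisson(mu[t]), mu[t] = sum_s ships[s] * p[t - s],
    where p[k] is the probability of failure at age k + 1, and
    len(ships) >= len(returns). Uses the EM (Shepp-Vardi / Richardson-Lucy)
    update instead of Excel Solver. Does not enforce sum(p) <= 1.
    """
    T = len(returns)
    p = [0.5 / T] * T                                    # positive start
    exposure = [sum(ships[: T - k]) for k in range(T)]   # ships observed to age k+1
    loglik = []
    for _ in range(n_iter):
        mu = [sum(ships[s] * p[t - s] for s in range(t + 1)) for t in range(T)]
        # Poisson log-likelihood with constant terms dropped
        loglik.append(sum(y * math.log(m) - m if y > 0 else -m
                          for y, m in zip(returns, mu)))
        ratio = [y / m if y > 0 else 0.0 for y, m in zip(returns, mu)]
        p = [p[k] / exposure[k] * sum(ships[t - k] * ratio[t] for t in range(k, T))
             for k in range(T)]
    R, rel = [], 1.0
    for pk in p:                                         # R(t) = 1 - sum of p to t
        rel -= pk
        R.append(rel)
    return p, R, loglik

ships = [47, 41, 45, 39, 43, 41]     # table 1 cohorts 1-6
returns = [1, 7, 12, 17, 37, 28]     # bottom row of table 1, periods 1-6
p, R, loglik = sr_npmle_em(ships, returns)
print([round(x, 4) for x in R])
```

Each EM step should not decrease the log-likelihood, so `loglik` doubles as a convergence check.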

Figure 1. Nonparametric maximum likelihood reliability estimates from ships and returns (S&R) and K-M grouped failure counts. Discrete reliability function estimates are constant in between ages at failures. 

 

The nonparametric maximum likelihood reliability estimates from ships and returns (S&R) were the same whether cohort variability was included or not (table 2). Their likelihoods differ because cohort variability was included or excluded.

The K-M estimator by cohort yields 18 nonparametric reliability and actuarial failure rate function estimates for cohorts 1, 2, …, 18, for ages 1 to 18, 17, …, 1 respectively. Excel’s Solver complained “Too many variables” when asked to maximize the likelihood for all 18+17+…+1 = 171 cohort failure rates, so I maximized likelihoods in batches of 10 and 8 cohorts and rechecked. I had to constrain the rates, 1E-12<=a(t)<=0.999999, to prevent probability formulas from blowing up.
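A minimal sketch of the K-M by cohort calculation follows, applying the same bounds 1E-12 ≤ a(t) ≤ 0.999999 by clamping rather than by Solver constraints (an assumption about how to mimic them). Each cohort’s binomial likelihood is maximized in closed form by a(t;j) = d(t;j)/r(t;j), so no optimizer is needed. The total log-likelihood is the sum over cohorts, which is why the product of 18 likelihoods is so small. `km_by_cohort` is a hypothetical name, and the example uses only the first six cohorts of table 1.

```python
import math

A_MIN, A_MAX = 1e-12, 0.999999   # rate bounds quoted in the text

def km_by_cohort(ships, failures):
    """One K-M estimate per cohort (Nevada-table rows shifted to start at age 1).

    a(t; j) = d(t; j) / r(t; j), clamped to [A_MIN, A_MAX] so the binomial
    log-likelihood stays finite when a cohort has 0 failures or all fail.
    Returns a list of (rates, reliability, log_likelihood), one per cohort.
    """
    results = []
    for n, row in zip(ships, failures):
        at_risk, rel = n, 1.0
        rates, R, loglik = [], [], 0.0
        for d in row:
            if at_risk == 0:
                break
            a = min(max(d / at_risk, A_MIN), A_MAX)
            rates.append(a)
            rel *= 1.0 - a
            R.append(rel)
            loglik += (math.lgamma(at_risk + 1) - math.lgamma(d + 1)
                       - math.lgamma(at_risk - d + 1)
                       + d * math.log(a) + (at_risk - d) * math.log(1.0 - a))
            at_risk -= d
        results.append((rates, R, loglik))
    return results

ships = [47, 41, 45, 39, 43, 41]
failures = [[1, 3, 7, 8, 13, 5], [4, 3, 4, 7, 6], [2, 4, 9, 6],
            [1, 6, 4], [2, 6], [0]]
results = km_by_cohort(ships, failures)
total_loglik = sum(ll for _, _, ll in results)   # ln of the product of likelihoods
print(total_loglik)
```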

Table 2. Maximum likelihoods, entropies, “K-L Div.” (Kullback-Leibler divergence from the K-M estimate), and “AIC” (Akaike Information Criterion = 2*|number of parameters| - ln(L)).

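| Method | Likelihood | Entropy | K-L Div. from K-M | AIC |
|---|---|---|---|---|
| K-M npmle | 7.88E-108 | 2.1691 | 0 | 282.6 |
| S&R Npmle | 3.31E-48 | 2.5800 | 0.6276 | 145.2 |
| K-M by cohort | 1.27E-220 | ~3.1908 | 0 | 668.3 |
| S&R wo cohort | 1.27E-23 | 2.5800 | 0.6276 | 536.7 |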
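The AIC column can be checked against the likelihood column. The sketch below implements the caption’s formula alongside Akaike’s usual definition, 2k − 2ln(L). It assumes 18 parameters (one failure rate per age) for the K-M row; that count is my assumption. For likelihoods as small as 1.27E-220, which approach the smallest normal double (about 2.2E-308), it is safer to carry ln(L) as a sum of log-likelihood terms than to multiply probabilities.

```python
import math

def aic_table2(likelihood, n_params):
    """AIC as written in the Table 2 caption: 2*(number of parameters) - ln(L)."""
    return 2 * n_params - math.log(likelihood)

def aic_akaike(log_likelihood, n_params):
    """Akaike's usual definition, 2k - 2 ln(L), taking ln(L) directly."""
    return 2 * n_params - 2 * log_likelihood

# K-M row of Table 2, assuming one failure-rate parameter per age (18 ages)
print(round(aic_table2(7.88e-108, 18), 1))   # 282.6
```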

Information?

The results differ from what I expected: I expected K-M to be a better estimator than the reliability estimator from S&R. The likelihood from ships and returns, 3.31E-48, is larger than the K-M likelihood, 7.88E-108, which in turn is larger than the K-M by cohort likelihood, 1.27E-220. The K-M by cohort likelihood is the product of 18 likelihoods, which explains why it is so small. The entropy of the S&R Npmle is greater than that of the Kaplan-Meier estimator! I.e., it provides more information. The K-L divergence confirms that observation. The K-M Cohort entropy, ~3.190, is that of the single semi-nonparametric, proportional hazards reliability function estimate constructed from the 18 individual cohort reliability function estimates in a previous article about seasonal reliability estimation [George 2024].

Ships and returns counts summarize the Nevada table, so they should contain less information than the cohort sizes and grouped failure counts used by K-M, and K-M’s grouped failure counts by age contain less information than failure counts by cohort. But the S&R npmle makes the Poisson assumption about the cohort size distribution. That’s why the S&R npmle likelihood is greater than the K-M likelihood.

Entropy = −Σp(t)ln(p(t)) measures the information in a reliability function estimate, where p(t) = R(t−1) − R(t) is the probability of failure at age t. Using the natural logarithm ln yields information in “nats”; log base 2 would give information in bits. The K-M Cohort entropy is computed from the semi-nonparametric seasonal failure rate function a0(t)EXP[Z1β1+Z2β2], where Z1 = 1, 2, or 3 represents quarterly cohort variation, Z2 = Period mod 12, and the β-values are regression coefficients [George April 2024].
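A sketch of the entropy and K-L divergence calculations, in nats (divide by ln 2 for bits). Two assumptions: p(t) is taken as R(t−1) − R(t) with R(0) = 1, and any probability R(T) remaining after the last estimated age is kept as one extra cell so that p sums to 1. Function names are hypothetical.

```python
import math

def failure_probabilities(R):
    """p(t) = R(t-1) - R(t) with R(0) = 1. Any mass R(T) left after the last
    age is kept as one extra 'survived past T' cell (an assumption)."""
    p, prev = [], 1.0
    for rel in R:
        p.append(prev - rel)
        prev = rel
    p.append(prev)
    return p

def entropy_nats(p):
    """-sum p ln p, with 0 ln 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence_nats(p, q):
    """sum p ln(p / q); requires q > 0 wherever p > 0."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y <= 0:
                raise ValueError("q must be positive wherever p is positive")
            total += x * math.log(x / y)
    return total

p = failure_probabilities([0.5, 0.0])
print(entropy_nats(p) / math.log(2))   # entropy in bits
```

Here p = [0.5, 0.5, 0.0], so the entropy is ln 2 nats, i.e., 1 bit.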

The npmle from S&R (ships and returns) includes cohort variability as well as variability in times to failures and censoring. If cohort variability is removed by using the average cohort size, 39.5, then the npmle likelihood, 3.53E-219, is a little larger than the K-M Cohort likelihood but smaller than the likelihood of the Kaplan-Meier estimator.

Variances of Alternative Estimators?

Entropy and AIC measure the information contained in reliability estimates (in nats when natural logarithms are used). The Kullback-Leibler divergence, Σp(t)ln(p(t)/q(t)), measures the additional information provided by one probability distribution estimate p(t) relative to another, q(t). Entropy, K-L divergence, and AIC compare alternative data inputs for marginal analysis: dbits/d$$$.

Statisticians compare variances of alternative estimators; they define estimator efficiency as the ratio of the Cramer-Rao variance bound to VAR(estimator). The Cramer-Rao (C-R) bound is a lower bound on the variance of unbiased estimators, and maximum likelihood estimators attain it asymptotically. Greenwood’s formula is a Cramer-Rao bound. The variance of the S&R Npmle is also a Cramer-Rao bound, excluding cohort variability. The “K-M cohort” variances are those of the 18 cohort reliability function estimates.
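For reference, Greenwood’s formula is Var R(t) ≈ R(t)² Σ_{s≤t} d(s)/(r(s)(r(s) − d(s))). A minimal sketch follows, taking deaths d(s) and survivors at risk r(s) by age as inputs; reporting variance 0 once everyone at risk has failed is a convention chosen for the sketch, and `km_greenwood` is a hypothetical name. The example inputs are the age totals from only the first six cohorts of table 1, so the numbers are not comparable with table 3.

```python
def km_greenwood(deaths, at_risk):
    """K-M reliability R(t) and Greenwood's variance
    Var R(t) ~= R(t)^2 * sum_{s<=t} d(s) / (r(s) * (r(s) - d(s)))."""
    rel, acc = 1.0, 0.0
    R, var = [], []
    for d, r in zip(deaths, at_risk):
        if r == 0:              # no one left at risk: stop
            break
        rel *= 1.0 - d / r
        if rel == 0.0:          # everyone at risk has failed: report variance 0
            R.append(0.0)
            var.append(0.0)
            continue
        acc += d / (r * (r - d))
        R.append(rel)
        var.append(rel * rel * acc)
    return R, var

R, var = km_greenwood([10, 22, 24, 21, 19, 5], [256, 205, 148, 96, 51, 15])
print([f"{v:.7f}" for v in var])
```

At age 1 this gives 2460/256³ ≈ 0.000147.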

Table 3. Variance of reliability estimates: K-M Greenwood, K-M Cohort, and S&R Npmle reliability estimates. K-M reliability estimate was 0 after age 15 periods; the S&R estimate was zero after age 9 periods. 

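| Age, periods | K-M Greenwood | K-M Cohort | S&R npmle |
|---|---|---|---|
| 1 | 0.0000477 | 0.0005858 | 0.0004340 |
| 2 | 0.0001824 | 0.0019554 | 0.008396 |
| 3 | 0.0003263 | 0.0032693 | 0.003333 |
| 4 | 0.0003983 | 0.004271 | 0.05882 |
| 5 | 0.0003831 | 0.004108 | 0.0001941 |
| 6 | 0.0002964 | 0.00444 | 0.05880 |
| 7 | 0.0002088 | 0.00162 | 0.03030 |
| 8 | 0.0000986 | 0.002255 | 0.11025 |
| 9 | 0.0000504 | 0.001001 | 0.01110 |
| 10 | 0.0000233 | 0.000261 | |
| 11 | 0.0000233 | 0.000168 | |
| 12 | 0.00009865 | 0.000149 | |
| 13 | 0.00002329 | 0.000167 | |
| 14 | 0.00001503 | 0.000119 | |
| 15 | 0.00001503 | 0.000119 | |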

Greenwood’s formula drastically underestimates the variance of the K-M reliability estimator for the data in table 1. The K-M cohort variances are close to the S&R Npmle Cramer-Rao bound on the reliability variances from ships and returns counts, without lifetime data.

Conclusions?

One example does not prove a theorem, but it should get your attention. Ships and returns counts, derived from data required by GAAP, are essentially free, and they can be used to estimate reliability for products and parts not tracked by name and serial number from first use to failure.

Wikipedia says, “Periodic cases and deaths counts are statistically sufficient to make non-parametric maximum likelihood and least squares estimates of survival functions, without lifetime data.” [https://en.wikipedia.org/wiki/Survival_function/]. Cases could be ships, sales, or cohorts, and deaths could be complaints, failures, returns, recoveries, etc. Wikipedia doesn’t say how to do it.

Periodic cases or ships (cohorts) and failure or death counts preserve privacy, because they do not include peoples’ names or products’ or parts’ serial numbers [NIST]. Kaplan-Meier software usually costs $$$; I give it away. Lifetime data costs $$$. ISO 14224 requires lifetime data and OREDA sells CMMS software to collect it (300+ Euro/year). Auditors make sure you collect required data, but no more than required. 

This article shows ships (cohorts) and returns (failure counts) yield more information than the Kaplan-Meier reliability estimator, with approximately the same variance as K-M Cohort estimators. GAAP require data that can be used to compute ship cohorts and returns counts by age, product, and service part. (Use BoMs and gozinto theory to convert product installed base by age into part installed base by age.) That requires work up front, but ships and returns counts provide sufficient data for population reliability estimates, without lifetime data. Commercial statistics programs don’t do it. Excel (VBA) and R do [George 2019]. Mark Felthauser helped me with the R-scripts to do it.
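The BoM step can be sketched as a gozinto explosion: recursively multiply quantities per parent down a multi-level bill of materials, then scale each product’s installed base by age. The BoM structure, item names, and numbers below are hypothetical, and the sketch assumes parts share their product’s age (no part replacements).

```python
from collections import defaultdict

def explode(bom, item, qty=1, totals=None):
    """Total quantity of every component in `qty` units of `item`,
    recursing through a multi-level bill of materials (gozinto explosion)."""
    if totals is None:
        totals = defaultdict(int)
    for child, per_parent in bom.get(item, {}).items():
        totals[child] += qty * per_parent
        explode(bom, child, qty * per_parent, totals)
    return totals

def part_installed_base(bom, product_base_by_age, part):
    """Part installed base by age, summed over products:
    product installed base at each age times parts per product."""
    max_age = max(len(v) for v in product_base_by_age.values())
    base = [0] * max_age
    for product, by_age in product_base_by_age.items():
        per_product = explode(bom, product)[part]
        for age, n in enumerate(by_age):
            base[age] += per_product * n
    return base

# Hypothetical BoM: each product has 2 boards and 1 fan; each board, 10 capacitors
bom = {"product": {"board": 2, "fan": 1}, "board": {"capacitor": 10}}
print(part_installed_base(bom, {"product": [100, 80]}, "capacitor"))  # [2000, 1600]
```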

The Kaplan-Meier estimator is not for recurrent process failure counts [George 2023]. It’s for failure counts of products or parts that stay dead. If you want reliability estimates for recurrent processes where failures may have prior failures, send ships and returns data to pstlarry@yahoo.com and describe it. 

References

21 CFR 821.25, Subpart B “Tracking Requirements,” “Device Tracking System and Content Requirements: Manufacturer Requirements,” Aug. 2017

21 CFR 821.30 “Subchapter H,” “Medical Devices, Medical Device Tracking Requirements,” April 2016

“DOD Guide for Achieving Reliability, Availability, and Maintainability,” Aug. 2004

Greenwood, M., “The Errors of Sampling of the Survivorship Tables,” Reports on Public Health and Statistical Subjects, no. 33, London: HMSO. Appendix 1, 1926

Kaplan, E. L. and Paul Meier, “Non–Parametric Estimation From Incomplete Data”. Jour. Amer. Statist. Assn., Vol. 53, pp. 457–481, 1958

Miller, R. G. Jr. “What price Kaplan-Meier?” Biometrics; vol. 39, no. 4, pp. 1077-81, PMID: 6671119, Dec. 1983

Mirasol, Noel M., “The Output of an M/G/infinity Queuing System is Poisson,” Operations Research, vol. 11, pp. 282-284, 1963

Nelson, Barry L. and Lawrence M. Leemis, “The Ease of Fitting but Futility of Testing a Nonstationary Poisson Process From One Sample Path,” Proceedings of the 2020 Winter Simulation Conference, IEEE, 2020

NIST, Paul A. Grassi, Michael E. Garcia, and James L. Fenton, “Digital Identity Guidelines,” 800-63-3, https://doi.org/10.6028/NIST.SP.800-63-3/, June 2017

References by George

George, L. L. and A. Agrawal, “Estimation of a Hidden Service Time Distribution for an M/G/Infinity Service System,” Nav. Res. Log. Quart., vol. 20, no. 3, pp. 549-555, 1973

George, L. L., “Field Reliability Estimation Without Life Data,” ASA SPES Newsletter, Dec. 1999

George, L. L., “Random-Tandem Queues and Reliability Estimation Without Life Data,” Field Reliability (google.com), 2019

George, L. L., “Kaplan-Meier Estimator for Renewal Processes?”, Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/kaplan-meier-estimator-for-renewal-processes/#more-531665/, Nov. 2023

George, L. L., “What if Ships Cohorts were Random?” Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/what-if-ships-cohorts-were-random/#more-534951, Jan. 2024 

George, L. L., “Do the Best You Can With Available Data?” Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/?s=Do+the+best+you+can/, March 2024

George, L. L., “Semi-Nonparametric Reliability Estimation and Seasonal Forecasts,” Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/semi-nonparametric-reliability-estimation-and-seasonal-forecasts/#more-546910/, April 2024

Filed Under: Articles, on Tools & Techniques, Progress in Field Reliability? Tagged With: Field data analysis

About Larry George

UCLA engineer and MBA, UC Berkeley Ph.D. in Industrial Engineering and Operations Research with minor in statistics. I taught for 11+ years, worked for Lawrence Livermore Lab for 11 years, and have worked in the real world solving problems ever since for anyone who asks. Employed by or contracted to Apple Computer, Applied Materials, Abbott Diagnostics, EPRI, Triad Systems (now http://www.epicor.com), and many others. Now working on actuarial forecasting, survival analysis, transient Markov, epidemiology, and their applications: epidemics, randomized clinical trials, availability, risk-based inspection, Statistical Reliability Control, and DoE for risk equity.


Comments

  1. Mai Zhou says

    January 6, 2025 at 1:28 AM

    I am a retired Statistics Professor from University of Kentucky.
    I wrote a paper investigating if the Kaplan-Meier and Nelson-Aalen estimator
    (mean integrated) reach the CR lower bound. The tricky part is to calculate
    the CR lower bound.

    You can see the paper at
    https://www.ms.uky.edu/~mai/research/KMInfoJNS.pdf

    • Larry George says

      January 6, 2025 at 10:41 AM

      Thank you for your paper. I saved it and will think about the difference between iid failure times and censoring times vs. periodic cohorts and their failure and censoring times.
      Being an engineer, I computed the cohort variance-covariance and noticed the differences from Greenwood.
      PS I taught at University of Louisville, Speed School of Engineering from 1973 to 1975


© 2025 FMS Reliability