Accendo Reliability

Your Reliability Engineering Professional Development Site

  • Home
  • About
    • Contributors
    • About Us
    • Colophon
    • Survey
  • Reliability.fm
  • Articles
    • CRE Preparation Notes
    • NoMTBF
    • on Leadership & Career
      • Advanced Engineering Culture
      • ASQR&R
      • Engineering Leadership
      • Managing in the 2000s
      • Product Development and Process Improvement
    • on Maintenance Reliability
      • Aasan Asset Management
      • AI & Predictive Maintenance
      • Asset Management in the Mining Industry
      • CMMS and Maintenance Management
      • CMMS and Reliability
      • Conscious Asset
      • EAM & CMMS
      • Everyday RCM
      • History of Maintenance Management
      • Life Cycle Asset Management
      • Maintenance and Reliability
      • Maintenance Management
      • Plant Maintenance
      • Process Plant Reliability Engineering
      • RCM Blitz®
      • ReliabilityXperience
      • Rob’s Reliability Project
      • The Intelligent Transformer Blog
      • The People Side of Maintenance
      • The Reliability Mindset
    • on Product Reliability
      • Accelerated Reliability
      • Achieving the Benefits of Reliability
      • Apex Ridge
      • Field Reliability Data Analysis
      • Metals Engineering and Product Reliability
      • Musings on Reliability and Maintenance Topics
      • Product Validation
      • Reliability by Design
      • Reliability Competence
      • Reliability Engineering Insights
      • Reliability in Emerging Technology
      • Reliability Knowledge
    • on Risk & Safety
      • CERM® Risk Insights
      • Equipment Risk and Reliability in Downhole Applications
      • Operational Risk Process Safety
    • on Systems Thinking
      • Communicating with FINESSE
      • The RCA
    • on Tools & Techniques
      • Big Data & Analytics
      • Experimental Design for NPD
      • Innovative Thinking in Reliability and Durability
      • Inside and Beyond HALT
      • Inside FMEA
      • Institute of Quality & Reliability
      • Integral Concepts
      • Learning from Failures
      • Progress in Field Reliability?
      • R for Engineering
      • Reliability Engineering Using Python
      • Reliability Reflections
      • Statistical Methods for Failure-Time Data
      • Testing 1 2 3
      • The Manufacturing Academy
  • eBooks
  • Resources
    • Accendo Authors
    • FMEA Resources
    • Glossary
    • Feed Forward Publications
    • Openings
    • Books
    • Webinar Sources
    • Podcasts
  • Courses
    • Your Courses
    • Live Courses
      • Introduction to Reliability Engineering & Accelerated Testings Course Landing Page
      • Advanced Accelerated Testing Course Landing Page
    • Integral Concepts Courses
      • Reliability Analysis Methods Course Landing Page
      • Applied Reliability Analysis Course Landing Page
      • Statistics, Hypothesis Testing, & Regression Modeling Course Landing Page
      • Measurement System Assessment Course Landing Page
      • SPC & Process Capability Course Landing Page
      • Design of Experiments Course Landing Page
    • The Manufacturing Academy Courses
      • An Introduction to Reliability Engineering
      • Reliability Engineering Statistics
      • An Introduction to Quality Engineering
      • Quality Engineering Statistics
      • FMEA in Practice
      • Process Capability Analysis course
      • Root Cause Analysis and the 8D Corrective Action Process course
      • Return on Investment online course
    • Industrial Metallurgist Courses
    • FMEA courses Powered by The Luminous Group
    • Foundations of RCM online course
    • Reliability Engineering for Heavy Industry
    • How to be an Online Student
    • Quondam Courses
  • Calendar
    • Call for Papers Listing
    • Upcoming Webinars
    • Webinar Calendar
  • Login
    • Member Home
  • Barringer Process Reliability Introduction Course Landing Page
  • Upcoming Live Events
You are here: Home / Articles / Kaplan-Meier Ignores Cohort Variability!

by Larry George Leave a Comment

Kaplan-Meier Ignores Cohort Variability!

Kaplan-Meier Ignores Cohort Variability!

The Kaplan-Meier reliability estimator is the nonparametric, maximum likelihood estimator from right-censored, grouped lifetime data. It has been used since publication, most statistics programs do it, and it has been taught since I was in school. I give away a spreadsheet version. 

Lifetime data requires tracking individual subjects or units from their start to failure, death, or censoring. Data may be collected periodically grouped by cohorts: monthly sales, ships, or other collections of individuals, subjects, or units and each cohort’s lifetimes. Data could be displayed in a “Nevada” table with random cohorts in one column, and each cohort’s lifetimes grouped in periodic age-at-failure intervals in columns to the right [Schenkelberg].

The Kaplan-Meier estimator ignores cohort variability, as if the only variability is due to random ages-at-failures. You need to quantify cohort variability too, if you want to forecast future cohorts and failures or construct legitimate confidence intervals or bands on reliability estimates. 

Maximizing likelihood produces estimates of the ships cohort rate λ and the Kaplan-Meier actuarial failure rates a(t) (conditional on survival to age t). These estimates are statistically dependent, because the ships cohorts appear in the denominators of the actuarial rate estimates! Dependence appears in the distribution of actuarial forecasts, ∑a(t)n(t-s), s=1,2,…,t,  where n(t-s) is the number of survivors to age t-s. Dependence also explains why the Greenwood variance estimator of the Kaplan-Meier reliability estimates is so bad when cohorts vary in size.

You may download the two CSV files, KMCohSR.csv and KMTabl1.csv, containing all Nevada Table 1 data and ships (column 1) and returns (bottom row). Readers could use the files to:

  1. Verify the nonparametric reliability estimates from all 18 periods of Nevada Table 1 data, with and without cohort variability, and from ships and returns data. 
  2. Estimate the variance-covariance matrixes of the Kaplan-Meier estimator with and without cohorts, and the variance-covariance matrix from the ships and returns estimator.
  3. Construct pointwise reliability confidence interval on reliability at age 19, R(19). Construct confidence band on R(t), t=1,2,…,18. 
  4. Make actuarial forecasts for period 19 from table 1 and ships and returns and estimate their distributions with and without cohort variability.
  5. Use time series analysis or artificial intelligence to do 1, 2, 3 and 4. Compare.  

Separately estimate cohort variability?

The Kaplan-Meier likelihood function is the product of the probabilities of periodic failures and survivors,

∏a(t)d(t)(1-a(t))n(t)-d(t)

where d(t) is the number of failures (deaths) at age t (or in age-interval t). The product is over all periods up to the age of the oldest survivors or failures. The Kaplan-Meier reliability estimate is R(t)=EXP[-∑a(s)], s=1,2,…,t. 

Table 1. Nevada table of periodic ships (cohorts) and grouped lifetime counts in each row from each cohort, whether failed or censored. It called a Nevada table because lifetime counts in the body of the table look like Nevada on its side. Data continued through 18 periods.

Period Ships Period 1 Period 2 Period 3 Period 4 Period 5 Period 6 Etc.
1 47 1 3 7 8 13 5  
2 41   4 3 4 7 6  
3 45     2 4 9 6  
4 39       1 6 4  
5 43         2 6  
6 48           1  
Etc.                
Sums   1 7 12 17 37 28  

Table 1 ships column seems that cohorts could be random. Imagine that a product’s or part’s lifetime is the service time in a self-service system. Imagine periodically observing inputs to and outputs from an M/G/infinity (self-service queuing system). “M” denotes Poisson inputs at rate λ (“ships”), “G” denotes the service time cumulative distribution function G(t), and “infinity” means self-service.

The likelihood function for the table 1 data, with ships cohorts that have the same stationary Poisson probability distribution and the same actuarial failure rates a(j), is 

∏∏(λs(i)Exp[-λ]/s(i)!)a(t)d(t)(1-a(t))n(t)-d(t)

where λ is the ships rate per period, and s(i) is the size of cohort i. Notice the Poisson probability density function left of the Kaplan-Meier likelihood terms? The products are over all cohorts i and all periods t up to the ages of the oldest survivors or failures from each cohort. If the ships cohort rates could be different Poisson processes for each cohort, then the likelihood function is

∏∏(λ(i)s(i)Exp[-λ(i)]/s(i)!)a(i; t)d(i; t)(1-a(i; t))n(i; t)-d(i; t).

“It’s nearly impossible to disprove the arrivals of cohorts to an M(t)/G/∞ service system) are a nonstationary Poisson processes, from samples of size one in each period.” [Nelson and Leemis]

The maximum likelihood reliability estimates from the likelihood function that includes stationary Poisson ships cohorts differs slightly (“ML R(t) Poisson”) from the Kaplan-Meier estimator (“K-M R(t)”) in figure 1. It’s from an Excel workbook Solver to maximize likelihood. Their difference could be attributed to Excel or Solver math. The maximum likelihood estimator of λ is the average of ships cohorts: 39.5 vs. Excel Solver’s 39.81 for both the stationary λ and the nonstationary λ(t).

Figure 1. Compare the Kaplan-Meier estimate (“K-M R(t)”) vs. the nonparametric maximum (log(2)) likelihood reliability estimate with Table 1 cohort data (“ML R(t) Poisson”)) vs. the reliability estimate from the bottom row returns (“S&R T(t)”). Technically nonparametric estimates are step functions, but it’s easier to see differences in smooth functions. 

The S&R R(t) reliability estimate in figure 1 is the nonparametric maximum likelihood estimate from cohort sizes and the sums of periodic returns in the bottom row of table 1. These row sums and the cohort sizes are field reliability data required by GAAP (revenue and service costs) and they are population data. The Kaplan-Meier estimator may be using sample data. 

The likelihood function is 

∏(λs(i)Exp[-λ]/s(i)!)λG(i))d(i)Exp[-λG(i)]/d(i)!

Where G(i)=1-R(i) (1-reliability) and d(i) are the deaths (outputs) in period i. The S&R R(t) estimate follows both the K-M R(t) and ML Poisson R(t) estimates for some ages. It accounts for cohort variability and reliability separately. 

The Kaplan-Meier reliability estimator is OK if all you want is a nonparametric reliability estimate. But, if you want to account for all random sources in forecasts and confidence limits, then you need to account for cohort variability too. 

Simple example: two period cohorts

The likelihood for two period cohorts using Nevada table data is 

(λs1Exp[-λ]/s1!)(a1)d1(1-a1)n1-d1(λs2Exp[-λ]/s2!)(a2)d2(1-a2)n2-d1

To maximize likelihood and estimate unknown λ, a1, and a2, set derivatives with respect to the unknowns equal to zero and solve for the unknowns. Equivalently, maximize log likelihood,

s1Ln(λ)-λ-Ln(s1!)+d1Ln(a1)+(s1-d1)(Ln(1-a1)+s2Ln(λ)-λ-Ln(s2!)+d2Ln(a2)+(n2-d2)Ln(1-a2),

where n2 = s1+s2-d1.

The maximum likelihood estimators are: λ=(s1+s2)/2 (average cohort size), a1=d1/s1, and a2=d2/(s1+s2-d1). Those are the Kaplan-Meier estimators. Notice the formulas depend on cohort sizes s1 and s2?

If the ships cohort rates differ, then the maximum likelihood estimators of λ(1) and λ(2) are s1 and s2. The actuarial rate estimates of a1 and a2 remain are the Kaplan-Meier actuarial rate estimates. The likelihood with random cohorts is a little greater than that of the Kaplan-Meier likelihood.

The variance of a sum of two random variables, Var[X+Y], is Var[X]+Var[Y]+2*Cov[X,Y]. So the variance of an actuarial forecast involves covariances of actuarial rate estimates a(t) as well as the variances of survivors n(t-s). I compute the variance-covariance matrix of the actuarial rate estimators from individual cohort reliability estimates, the variances of n(t-s) from the Poisson distributions of ships s(i), i=1,2,…,t-s, and the variance of hindcasts, ∑a(t)n(t-s), s=1,2,..,t, from the Poisson ships and the binomial distributions of failures or deaths. 

Failure forecasts’ distributions require extrapolating ships cohorts s(t) and computing or simulating the distribution of ∑a(t)n(t-s) for future t beyond oldest censored or failed unit. Confidence bands also require accounting for covariances [Hall and Wellner]. AFM 66-1 assumes Poisson distributions, others assume normal distributions. It’s easier to extrapolate time series or use artificial intelligence, if you ignore some information contained in the Nevada table!

Conclusions?

Use the Kaplan-Meier reliability estimate with grouped ships cohorts and their failures even if cohort sizes are random. Random cohort sizes affects actuarial forecasts and reliability confidence limits. Don’t trust the Greenwood variance estimator and don’t trust the confidence limits or bands from statistical software. Some confidence bands may be pointwise confidence limits based on Greenwoods’ variance. 

If cohort sizes are a sample from a stationary Poisson process, then the Poisson rate parameter estimate is the average cohort size. If cohort sizes are non-stationary, then the maximum likelihood cohort rate estimates are the observed ships cohort sizes, and the maximum likelihood reliability estimator is the Kaplan-Meier estimator. But the reliability estimator variance is not the Greenwood estimator, because the Kaplan-Meier actuarial rate estimator ignores random cohort sizes!

I used Mathematica to verify the Kaplan-Meier formulas for the maximum likelihood. I used Excel’s Solver and Mathematica to maximize likelihood functions: Excel Solver is not perfect and Mathematica would only maximize one or two period problems! 

I estimate the Kaplan-Meier variance-covariance matrix from each cohort’s Kaplan-Meier estimates. (“Variance of the Kaplan-Meier Estimator”, https://fred-schenkelberg-project.prev01.rmkr.net/variance-of-the-kaplan-meier-estimator/#more-508828/, and “Covariance of the Kaplan-Meier Estimators”, https://fred-schenkelberg-project.prev01.rmkr.net/covariance-of-the-kaplan-meier-estimators/#more-509950/). If you want to account for cohort variability in forecasts and reliability confidence intervals, send your Nevada table data to pstlarry@yahoo.com.

References

AFM 66-1, “Maintenance Management, Vol. 1,” Washington, DC, August 1972 (US Air Force Logistics Command) 

L. L. George, “Kaplan-Meier Reliability Estimation Spreadsheet,” ASQ Reliability Review, Vol. 25, No. 2, pp. 6-12, June 2005, (2017 revision available from pstlarry@yahoo.com)

L. L. George and A. Agrawal, “Estimation of a Hidden Service Distribution of an M/G/∞ Service System,” Naval Research Logistics Quarterly, vol. 20, pp. 549-555, https://doi.org/10.1002%2Fnav.3800200314, 1973

Greenwood, M., “The natural duration of cancer. Reports on Public Health and Medical Subjects,” Vol. 33, pp. 1–26, His Majesty’s Stationery Office, London, 1926

W. J. Hall and Jon A. Wellner, “Confidence Bands for a Survival Curve from Censored Data,” Biometrika, Vol. 67 No. 1, pp.133-143, Apr. 1980

E. L. Kaplan and P. Meier, ”Nonparametric Estimator From Incomplete Observations,” J. Amer. Statist. Assn.,Vol. 53, pp. 457-481, 1958

Nelson, Barry L. and Lawrence M. Leemis, “The Ease of Fitting but Futility of Testing a Nonstationary Poisson Processes From One Sample Path,” Proceedings of the 2020 Winter Simulation Conference, IEEE, 2020 

Fred Schenkelberg, “Nevada Charts to Gather Data,” https://fred-schenkelberg-project.prev01.rmkr.net/nevada-charts-gather-data/, March 2016

Jon A. Wellner, “Notes on Greenwood’s Variance Estimator for the Kaplan-Meier Estimator,” Greenwood.pdf (washington.edu), Jan. 2010

Filed Under: Articles, on Tools & Techniques, Progress in Field Reliability?

About Larry George

UCLA engineer and MBA, UC Berkeley Ph.D. in Industrial Engineering and Operations Research with minor in statistics. I taught for 11+ years, worked for Lawrence Livermore Lab for 11 years, and have worked in the real world solving problems ever since for anyone who asks. Employed by or contracted to Apple Computer, Applied Materials, Abbott Diagnostics, EPRI, Triad Systems (now http://www.epicor.com), and many others. Now working on actuarial forecasting, survival analysis, transient Markov, epidemiology, and their applications: epidemics, randomized clinical trials, availability, risk-based inspection, Statistical Reliability Control, and DoE for risk equity.

« How to Succeed at RCM (Part 3): Successful Planning
Can Starting with Waterfall Lead to Better Agile? »

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Articles by Larry George
in the Progress in Field Reliability? article series

Join Accendo

Receive information and updates about articles and many other resources offered by Accendo Reliability by becoming a member.

It’s free and only takes a minute.

Join Today

Recent Articles

  • Gremlins today
  • The Power of Vision in Leadership and Organizational Success
  • 3 Types of MTBF Stories
  • ALT: An in Depth Description
  • Project Email Economics

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy