Accendo Reliability

Your Reliability Engineering Professional Development Site

by Larry George

Help! They Lost the Data

What can we do without reliability function estimates? FMEA? FTA? RCA? RCM? Argue about MTBFs and availability? Weibull? Keep a low profile? Run Admirals’ tests? Look for a new, well-funded project far from the deliverable stage? 

Ask for field data; there should be enough to estimate reliability and make reliability-based decisions, even if some data are missing. Field data might even be population data!

Data Saga

I wanted to estimate the reliability and failure rate functions, for reasonable ages t, for all the hematology business unit’s products and parts. Textbooks say, estimate reliability functions from random samples of ages-at-failures {T(1), T(2),…,T(r)} and survivors’ lives. As usual, I didn’t have ages-at-failures. 

I could estimate failure rate functions from ships (shipment counts) and parts’ failure counts, both of which GAAP requires, using the methods in “How Can You Estimate Reliability Functions Without Life Data?”, https://fred-schenkelberg-project.prev01.rmkr.net/?s=tribus/, and https://sites.google.com/site/fieldreliability/.

In February I submitted a request to the MIS department for the hematology business unit’s products’ and parts’ installed base and failure counts. The MIS department “prioritized” my request. 

In July Eric from MIS said he’d start working on my request. He asked if it would be OK to give numbers failed at each age, in months 1-24. That’s grouped age-at-failure data. I thought, “Why work so hard? I could use the Kaplan-Meier nonparametric reliability estimate on ages-at-failures, at least up to age 24 months.” I thanked Eric, grateful for anything.

In October, Frank from MIS offered data by “PRODCODE”, “SER#”, “PN”, “DESC”, “TRANSDATE”, “FLAGS”. In addition to failure counts at ages 1-24, Frank offered total failures at all ages greater than 24 months, grouped into the 25th month. “PRODCODE” and “TRANSDATE” indicated many products had been in service longer than 24 months, with some parts’ failures, usually for the first time. (Automotive aftermarket stores save parts’ sales data for two years, without parts’ ages-at-failures. They’re renewals or replacement parts https://fred-schenkelberg-project.prev01.rmkr.net/renewal-process-estimation-without-life-data/.)

Reliability Estimation from Grouped Life Data is Easy 

The installed base and failure data for months 1-24 go into a “Nevada” table for grouped failure data, https://fred-schenkelberg-project.prev01.rmkr.net/nevada-charts-gather-data/. I used the Kaplan-Meier nonparametric reliability estimator for ages 1-24, and Greenwood’s formula for variances (covariances are approximately zero!). I could forecast replacement requirements, recommend parts’ stock levels, do diagnostics, and make credible reliability predictions for new products from similar, old parts’ reliability estimates, for all 2537 of the hematology business unit’s parts.
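As a minimal sketch of that calculation, the Python below collapses a Nevada table into at-risk and failure counts by age, then applies the Kaplan-Meier product and Greenwood’s formula. The function names, the list layout, and the toy numbers are assumptions for illustration; they are not the author’s spreadsheet or data. It assumes every age has at least one unit at risk.

```python
def grouped_counts(ships, nevada):
    """Collapse a Nevada table into (at_risk, failed) lists indexed by age in months.

    ships[i]     : units shipped in cohort i (hypothetical layout)
    nevada[i][k] : failures of cohort i at age k+1; row length = ages observed so far
    """
    max_age = max(len(row) for row in nevada)
    at_risk, failed = [], []
    for age in range(1, max_age + 1):
        n = d = 0
        for shipped, row in zip(ships, nevada):
            if len(row) >= age:                    # cohort has reached this age
                n += shipped - sum(row[:age - 1])  # survivors entering this age
                d += row[age - 1]
        at_risk.append(n)
        failed.append(d)
    return at_risk, failed


def kaplan_meier(at_risk, failed):
    """Return Kaplan-Meier reliability and Greenwood variance for each age."""
    reliability, variance = [], []
    r, g = 1.0, 0.0
    for n, d in zip(at_risk, failed):
        r *= 1.0 - d / n
        if n > d:
            g += d / (n * (n - d))   # Greenwood's running sum
        reliability.append(r)
        variance.append(r * r * g)
    return reliability, variance


# Toy data (made up): two monthly cohorts, the older one observed for two months.
at_risk, failed = grouped_counts([100, 50], [[10, 9], [5]])
rel, var = kaplan_meier(at_risk, failed)
```

At age 1 the Greenwood variance reduces to the familiar binomial p(1–p)/n, which is a quick sanity check on any implementation.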

What should I do with the failures grouped into month 25, from products or parts older than 24 months? Who cares? Me! Why? That’s additional information! I wanted to detect premature wearout, which indicates a possible design defect. (Failure rate function increases.) I also wanted to detect retirement so I don’t get stuck with obsolescent spares. (Failure rate function decreases.)

To forecast replacement requirements, I needed to estimate or extrapolate the failure rate function for ages greater than 24 months, because some products and their parts have ages greater than 24 months. 
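A rough sketch of why the older ages matter: a one-month-ahead forecast sums, over every age in the installed base, the units of that age times the probability of failing in the next month. The function name and inputs are hypothetical, and the sketch counts first failures only (no renewals of replaced parts).

```python
def expected_failures_next_month(ships_by_age, reliability):
    """Expected first failures in the coming month.

    ships_by_age[s] : units that are s months old now (hypothetical input)
    reliability[t]  : R(t), with reliability[0] = 1.0
    """
    total = 0.0
    for s, n in enumerate(ships_by_age):
        if s + 1 >= len(reliability):
            # The forecast cannot be computed without R at the older ages.
            raise ValueError(f"R({s + 1}) is needed for units aged {s} months")
        p_next = reliability[s] - reliability[s + 1]   # P[age at failure = s + 1]
        total += n * p_next
    return total


# Made-up example: 100 new units, 200 units one month old.
forecast = expected_failures_next_month([100, 200], [1.0, 0.9, 0.81])
```

The `ValueError` branch is the point of the example: any unit older than the last estimated age leaves a hole in the forecast.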

Failure Rate Function Extrapolations

When I have had no information about older failures, I have extrapolated failure rate function estimates by regression. But Frank told me how many failures occurred after age 24 months, just not when. Why not extrapolate by maximizing likelihood, 
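For comparison, extrapolation by regression might look like the sketch below: an ordinary least squares line through recent failure rate estimates, evaluated at older ages. The choice of a straight line, the fitting window, and the function name are assumptions; the article does not say which regression was used.

```python
def extrapolate_linear(ages, rates, future_ages):
    """Ordinary least squares line through (age, failure rate), evaluated at future ages."""
    n = len(ages)
    mean_t = sum(ages) / n
    mean_a = sum(rates) / n
    sxx = sum((t - mean_t) ** 2 for t in ages)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(ages, rates))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_t
    # A monthly failure rate is a probability, so clip the line to [0, 1].
    return [min(1.0, max(0.0, intercept + slope * t)) for t in future_ages]


# Made-up rates for ages 13..24 that happen to lie on a line.
ages = list(range(13, 25))
rates = [0.001 * t for t in ages]
older = extrapolate_linear(ages, rates, [25, 26])
```

Regression uses none of the information in the reported count of failures past 24 months, which is what motivates the likelihood approach.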

PRODUCT[(1-R(t))^r(t) * R(t)^(n(t)-r(t)); t=1,2,…,oldest],

where R(t) is the reliability function, r(t) is the number of failures at age t, and n(t) is the installed base of age t, including ages beyond 24 months? That’s what the Kaplan-Meier estimator does, except that all I know is n(t), t=1,2,…,oldest, and r(?), the sum of all failures at ages greater than 24 months.

How to model failures older than 24 months? Constant failure rate? Linear? Other? The choice should depend on how the failure rate function looks before age 24 months, the number of failures older than 24 months, and your experience. Wait, you say! Couldn’t the failure counts older than 24 months change the earlier reliability estimates, ages 1-24? Nope, maximizing log likelihood maximizes a sum by maximizing each summand. I checked reliability estimates; no difference. That’s enough proofiness for me. 
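One way to see the separability, sketched under an assumption of mine: write the likelihood in terms of monthly actuarial rates a(t), so each age contributes its own term r(t)*ln a(t) + (n(t)–r(t))*ln(1–a(t)). Each term contains only its own a(t), so its maximizer is r(t)/n(t) whatever is assumed about other ages. The counts are made up.

```python
import math


def summand(a, n, r):
    """Log-likelihood contribution of one age: r failures among n units at risk."""
    return r * math.log(a) + (n - r) * math.log(1.0 - a)


def best_rate_on_grid(n, r, steps=10000):
    """Brute-force maximizer of one summand over a grid of rates in (0, 1)."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda a: summand(a, n, r))


# Made-up counts for three early ages: (units at risk, failures).
early = [(1000, 12), (900, 9), (800, 4)]
estimates = [best_rate_on_grid(n, r) for n, r in early]
closed_form = [r / n for n, r in early]
```

The grid search and the closed form agree age by age, and nothing about ages beyond 24 months enters either calculation.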

Constant Failure Rate? For older parts, make expected deaths older than 24 months equal the observed and reported sum r(?) of failures at ages greater than 24 months, by choice of a constant (actuarial) failure rate “a(25)” estimate = failures per month/number exposed; i.e.,

a(25) = r(?)/SUM[(t–24)*(N(t)–a(25)*E[N(t)]); t=25, 26,…,oldest],

where N(t) is the ships in month t=25,26,…,oldest, and E[N(t)] is the average ships per month. Expected failures are

 SUM[N(s)*p(s)]*PRODUCT[(1–SUM[p(t)])/R(24)],

where the sum and the product run from s and t = 25 to the age of the oldest product, N(s) is the number shipped s months ago, p(s) is the probability age at failure is s months, and R(24) = P[life > 24] = 1 – SUM[p(t); t = 0, 1,…,24]. Set Expected failures equal to observed with a(t) = a(25) for all ages t > 24, where a(t) = p(t)/R(t), the conditional probability of failure in the next month given survival to age t.

Table 1 Example: Constant failure rate for parts in a product 32 months old. The E[deaths] column is the actuarial failure rate a(25) times the numbers of survivors, and the survivors column is Ships N(t) minus E[failures] r(t). The last column a(25) is r(?) divided by the sum of the t*sum[N(t)] column.

Table 1
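The idea of matching expected to observed failures can also be sketched numerically. The code below is not a transcription of the a(25) formula or of Table 1; it assumes each cohort survives to 24 months with probability R(24) and then fails at a constant monthly rate a, and it finds a by bisection. Function names and the example numbers are hypothetical.

```python
def expected_old_failures(a, ships_by_age, r24):
    """Expected failures past 24 months under a constant monthly failure rate a.

    ships_by_age maps age s (> 24 months) to units shipped s months ago;
    r24 is the estimated reliability at 24 months.
    """
    return sum(n * r24 * (1.0 - (1.0 - a) ** (s - 24))
               for s, n in ships_by_age.items())


def solve_constant_rate(observed, ships_by_age, r24, tol=1e-12):
    """Bisection for the rate a that makes expected failures equal the observed count."""
    lo, hi = 0.0, 1.0
    if observed > expected_old_failures(hi, ships_by_age, r24):
        raise ValueError("observed failures exceed the units surviving past 24 months")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_old_failures(mid, ships_by_age, r24) < observed:
            lo = mid   # rate too small: not enough expected failures
        else:
            hi = mid
    return (lo + hi) / 2.0


# Round trip on made-up data: generate the expected count at a = 0.01, then recover a.
ships = {25: 100, 26: 100}
target = expected_old_failures(0.01, ships, 0.9)
a25 = solve_constant_rate(target, ships, 0.9)
```

Bisection works here because expected failures increase monotonically with a, so there is at most one rate that reproduces the reported count.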

Other Failure Rate Models: Maximum likelihood chooses fractional failures after 24 months of age, constrained to sum to the reported failure count after 24 months r(?), to make nonparametric estimates of the reliability and failure rate functions for ages up to the oldest unit in the installed base.

I used Excel Solver to maximize likelihood; Excel blew up for the “Unconstrained” alternative, so I manually entered 1 failure in month 30 or “Limited” the failure rate to prevent a #NUM! error. The maximum likelihood (lnL in Table 2) was achieved by the “Unconstrained” alternative with one failure in month 30. The failure rates indicate there was wearout, because the “Limited” and “Linear” alternatives also showed increasing failure rates.

Table 2 Example. Data are from some US postal service machines. There was 1 failure in months 25-30. Alternative failure rate models are: unconstrained, constant, limited, and linear. The alternatives postulate fractional failures at ages 25-30, and Solver maximizes log-likelihood (lnL) for reliability and failure rate function estimates. The constrained maximum likelihood failure rate estimates are in the last four columns.

Table 2
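A sketch of the comparison behind a table like this, under my own assumptions: each age past 24 months contributes a binomial log-likelihood term with rate a(t) = x(t)/n(t), where x(t) is the fractional failure count assigned to that age and the x(t) sum to r(?). The units at risk are made up, not the postal service data, and the code only evaluates lnL for three candidate allocations instead of running an optimizer.

```python
import math


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0 (the case a spreadsheet flags as #NUM!)."""
    return 0.0 if x == 0 else x * math.log(y)


def log_likelihood(at_risk, fractional_failures):
    """Sum of binomial log-likelihood terms with a(t) = x(t) / n(t) at each age."""
    total = 0.0
    for n, x in zip(at_risk, fractional_failures):
        a = x / n
        total += xlogy(x, a) + xlogy(n - x, 1.0 - a)
    return total


# Made-up units at risk at ages 25..30 and one reported failure somewhere in that range.
at_risk = [60, 50, 40, 30, 20, 10]
r_unknown = 1.0

# "Constant": same rate at every age, so failures spread in proportion to units at risk.
constant = [r_unknown * n / sum(at_risk) for n in at_risk]
# "Linear": rate grows in proportion to (age - 24).
weights = [n * k for k, n in enumerate(at_risk, start=1)]
linear = [r_unknown * w / sum(weights) for w in weights]
# The whole failure placed in the oldest month.
concentrated = [0.0, 0.0, 0.0, 0.0, 0.0, r_unknown]

lnl = {name: log_likelihood(at_risk, x)
       for name, x in [("constant", constant), ("linear", linear),
                       ("concentrated", concentrated)]}
```

With these made-up counts the concentrated allocation has the highest lnL, because the binomial terms reward placing the failure where the fewest units are at risk. The zero-failure ages are where a naive LN(0) would fail, which `xlogy` sidesteps.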

Free offer

These examples are not the only problem I’ve seen with grouped data. A sterile glove company’s [Terumo] customers batch failures and send them back whenever they feel like it. Imagine grouped failure counts with reporting delays so that the most recent counts are obviously under-reported [ReliaSoft]. Imagine sell-through time, the time from reported sale until first use [hematology business unit].   

If you have a problem with grouped failure counts, send pstlarry@yahoo.com your installed base by age and grouped ages at replacements, and I’ll send back the Kaplan-Meier estimate of reliability function, Greenwood’s estimator of its variance, estimate of the failure rate function, and alternative maximum likelihood estimators for the older, grouped data. 

Filed Under: Articles, on Tools & Techniques, Progress in Field Reliability?

About Larry George

UCLA engineer and MBA, UC Berkeley Ph.D. in Industrial Engineering and Operations Research with minor in statistics. I taught for 11+ years, worked for Lawrence Livermore Lab for 11 years, and have worked in the real world solving problems ever since for anyone who asks. Employed by or contracted to Apple Computer, Applied Materials, Abbott Diagnostics, EPRI, Triad Systems (now http://www.epicor.com), and many others. Now working on actuarial forecasting, survival analysis, transient Markov, epidemiology, and their applications: epidemics, randomized clinical trials, availability, risk-based inspection, Statistical Reliability Control, and DoE for risk equity.

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy