Accendo Reliability

Your Reliability Engineering Professional Development Site

  • Home
  • About
    • Contributors
    • About Us
    • Colophon
    • Survey
  • Reliability.fm
  • Articles
    • CRE Preparation Notes
    • NoMTBF
    • on Leadership & Career
      • Advanced Engineering Culture
      • ASQR&R
      • Engineering Leadership
      • Managing in the 2000s
      • Product Development and Process Improvement
    • on Maintenance Reliability
      • Aasan Asset Management
      • AI & Predictive Maintenance
      • Asset Management in the Mining Industry
      • CMMS and Maintenance Management
      • CMMS and Reliability
      • Conscious Asset
      • EAM & CMMS
      • Everyday RCM
      • History of Maintenance Management
      • Life Cycle Asset Management
      • Maintenance and Reliability
      • Maintenance Management
      • Plant Maintenance
      • Process Plant Reliability Engineering
      • RCM Blitz®
      • ReliabilityXperience
      • Rob’s Reliability Project
      • The Intelligent Transformer Blog
      • The People Side of Maintenance
      • The Reliability Mindset
    • on Product Reliability
      • Accelerated Reliability
      • Achieving the Benefits of Reliability
      • Apex Ridge
      • Field Reliability Data Analysis
      • Metals Engineering and Product Reliability
      • Musings on Reliability and Maintenance Topics
      • Product Validation
      • Reliability by Design
      • Reliability Competence
      • Reliability Engineering Insights
      • Reliability in Emerging Technology
      • Reliability Knowledge
    • on Risk & Safety
      • CERM® Risk Insights
      • Equipment Risk and Reliability in Downhole Applications
      • Operational Risk Process Safety
    • on Systems Thinking
      • Communicating with FINESSE
      • The RCA
    • on Tools & Techniques
      • Big Data & Analytics
      • Experimental Design for NPD
      • Innovative Thinking in Reliability and Durability
      • Inside and Beyond HALT
      • Inside FMEA
      • Institute of Quality & Reliability
      • Integral Concepts
      • Learning from Failures
      • Progress in Field Reliability?
      • R for Engineering
      • Reliability Engineering Using Python
      • Reliability Reflections
      • Statistical Methods for Failure-Time Data
      • Testing 1 2 3
      • The Manufacturing Academy
  • eBooks
  • Resources
    • Accendo Authors
    • FMEA Resources
    • Glossary
    • Feed Forward Publications
    • Openings
    • Books
    • Webinar Sources
    • Podcasts
  • Courses
    • Your Courses
    • Live Courses
      • Introduction to Reliability Engineering & Accelerated Testings Course Landing Page
      • Advanced Accelerated Testing Course Landing Page
    • Integral Concepts Courses
      • Reliability Analysis Methods Course Landing Page
      • Applied Reliability Analysis Course Landing Page
      • Statistics, Hypothesis Testing, & Regression Modeling Course Landing Page
      • Measurement System Assessment Course Landing Page
      • SPC & Process Capability Course Landing Page
      • Design of Experiments Course Landing Page
    • The Manufacturing Academy Courses
      • An Introduction to Reliability Engineering
      • Reliability Engineering Statistics
      • An Introduction to Quality Engineering
      • Quality Engineering Statistics
      • FMEA in Practice
      • Process Capability Analysis course
      • Root Cause Analysis and the 8D Corrective Action Process course
      • Return on Investment online course
    • Industrial Metallurgist Courses
    • FMEA courses Powered by The Luminous Group
    • Foundations of RCM online course
    • Reliability Engineering for Heavy Industry
    • How to be an Online Student
    • Quondam Courses
  • Calendar
    • Call for Papers Listing
    • Upcoming Webinars
    • Webinar Calendar
  • Login
    • Member Home
  • Barringer Process Reliability Introduction Course Landing Page
  • Upcoming Live Events
You are here: Home / Articles / Statistical Reliability Control?

by Larry George Leave a Comment

Statistical Reliability Control?

Statistical Reliability Control?

The (age-specific or actuarial) force of mortality drives the demand for spares, service parts, and most products. The actuarial demand forecast is Σd(t‑s)*n(s), where d(t-s) is (age-specific) actuarial demand rate and n(s) is the installed base of age s, s=0,1,2,…,t. Ulpian, 220 AD, made actuarial forecasts of pension costs for Roman Legionnaires. (Imagine computing actuarial demand forecasts with Roman numerals.) Actuarial demand rates are functions of reliability. What if reliability changes? We Need Statistical Reliability Control (SRC).

Actuarial demand forecasts require updating as installed base and field reliability data accumulates. Actuarial failure rate function, a(t), is related to reliability function, R(t), by a(t) = (R(t)-R(t-1))/R(t-1), t=1,2,… If products or parts are renewable or repairable, then actuarial demand rate function, d(t), depends on the number of prior renewals or repairs by age t [George, Sept. 2021].

Actuarial demand forecasts are credible because actuarial rates are based on field data. Grouped failure inputs make actuarial rates (derived from the Kaplan (KM) estimator) credible because KM estimator is a maximum likelihood reliability estimator. Nonparametric actuarial rate estimators from ships and returns counts are credible, because they are derived from data required by Generally Accepted Accounting Principles, and because the estimates are either maximum likelihood or least squares population estimators [George, 1993 and April 2021]. … credible as long as the actuarial rates don’t change too much.

A Modest Proposal

Statistical process control (SPC) or statistical quality control (SQC) should be applied to reliability function estimates (Weibull from lifetime data, KM estimator from grouped cohort failure counts, or nonparametric estimators from population ships and returns counts). Statistical Reliability Control (SRC) naturally follows SPC and SQC but SPC and SQC apply to proportions failing to meet specifications, not to reliability functions.

Imagine Statistical Reliability Control (SRC) on successive reliability function estimates such as broom charts. Broom charts are graphs of successive reliability estimates as cohorts of ships add to the population. Broom charts could come from each cohort of Kaplan-Meier (KM) grouped failure counts or from nonparametric reliability estimates from successively larger data sets as ships and returns accumulate. 

Kullback-Leibler (KL) divergence is the relative entropy difference between two successive reliability function estimates [Wikipedia]. KL divergence converts differences in functions into a single numbers for SRC. 

Imagine the KL divergence jumps from one period or subset to the next? What happened? There’s information in that jump; where did the jump come from? There are other divergence measures. I prefer KL divergence, because it may be related to the reliability change. Imagine if the product was software; the KL divergence may be related to code changes. I would also compare the KL divergence to the entropies in successive reliability estimates before and after suspicious KL divergence. What else could be done? Look at high-entropy reliability or failure rate functions for infant mortality, warranty returns or PM glitches, premature wearout, seasonality, and retirement. 

Time Series Analysis (TSA) demand forecasting extrapolates past demand data such as for commodities. TSA searches for the times series model that wins a “tournament”, a contest between different models. The winner depends on the training data or subset of the data. TSA forecasting doesn’t account for changes in installed base, attrition, or force of mortality, even though it models past demand data. 

Actuarial forecasts have no such ambiguity. Actuarial forecasts use all information in available data: life data, grouped lifetimes, or ships and returns counts. Walter Shewhart Rule number one says, “Original data should be presented in a way that will preserve the evidence in the original data for all the predictions assumed to be useful.”  Nonparametric estimates preserve all information in data. Maximum likelihood estimates yield the most likely actuarial failure rate function estimates. Least squares yields the estimates that minimize the squared difference between observed and expected returns. Gauss Markov theorem says least squares yields the least sample variance among linear unbiased estimators. (An actuarial forecast is a linear function of installed base n(s).) 

Actuarial demand forecasting works with other business processes. Accurate spares sales and service forecasts are fundamental inputs to a manufacturing company’s production planning, inventory, and production maintenance processes. Designers, process engineers, trainers, customers, and service people welcome feedback from reliability and failure rate function estimates and forecasts. Warranty reserves are functions of failure or returns forecasts times costs of returns. Recalls, warranty, and end-of-life plans depend on reliability forecasts. Corporate valuations depend on forecasts of the future streams of sales revenue and warranty, service, recall, and end-of-life costs.

Eyeball SRC on EEPROM Reliability Functions?

Figure 1. EEPROM reliability broom chart by ships cohort. I.e., 99-1 is ships in first quarter of 1999.
Figure 1. EEPROM reliability broom chart by ships cohort. I.e., 99-1 is ships in first quarter of 1999.

Figure 1 is computed from each cohort of grouped ships and failure counts, inputs to the Kaplan-Meier reliability estimator. Power generation and distribution controls’ EEPROMS were losing their little memories! Obviously something went wrong in 1999-1. In late 1999, we asked the vendor, Catalyst Semiconductor, and they admitted they shrunk the die size and claimed to have fixed it. Claim seems true for subsequent quarters. That’s because the figure 1 reliability functions are shorter than 99-1 and have higher reliabilities than 99-1. That’s clearly evident in the K-L divergence in figure 2. 

Figure 2. Kullback-Leibler divergence from one quarter to the next. Units are bits.
Figure 2. Kullback-Leibler divergence from one quarter to the next. Units are bits.

The KL divergence quantifies the differences between pairs of subsequent cohorts’ reliability functions. The units of divergences are relative entropies in bits. Wikipedia calls that “expected excess surprise,” from one quarter to the next. [See also https://en.wikipedia.org/wiki/Variation_of_information.] Consider comparing the KL divergence with the entropies in each reliability function of pairs of successive reliability function estimates [George, Sept. 2021]. 

Table 1. Entropy, KL Divergence, and the ratio KL Div/Entropy for EEPROMs by calendar quarter cohorts.  Logarithms are to the base 2, so entropy is in bits.

Quarter Entropy KL Div Ratio
97–1 33.92 0 0
97–2 17.98 0 0
97–3 8.69 0 0
97–4 19.05 0 0
98–1 37.55 -3E-04 0.00%
98–2 9.16 0 0
98–3 18.16 0 0
98–4 9.76 -0.002 0.02%
99–1 75.05 0.0045 0.01%
99–2 19.23 0 0
99–3 0.00 0 0
99–4 0.00 0 0

Clearly from figure 1 and from the 99-1 entropy of 75.05 bits, the 99-1 cohort was different. However the ratio of Kl divergence to 99-1 entropy, 0.01%, was not as large as that from 98-4, 0.02%. That’s what makes me wonder if the EEPROM problem started in 98-4.

EEPROM problems might have started in 1998-4 although not serious, because reliability was high and entropy in 98-4 was below average. However 99-1 shows large entropy because reliability deteriorated and startling difference even from 98-4 shown by KL divergence between 98-4 and 99-1. Reliability of subsequent quarterly cohorts returned to normal. 

Do we need SRC on reliability function entropies to update actuarial demand rates for actuarial forecasts? Actuarial forecasts and their distributions depend on the entropy (information) in the reliability function. I haven’t figured out how entropy affects actuarial forecasts, the rate of change per unit of entropy δSUM[a(t‑s)*n(s)]/δentropy and their distributions, but I will try.

Do we need SRC on KL divergence? KL divergence quantifies the expected surprise from one period to the next. How much divergence calls for action? Look at past actions and compare them with past KL divergences. I will use the methods from SPC to construct control limits and rules similar to SPC and SQC. 

The figure 1 EEPROM broom chart clearly showed a problem. What if problem(s) were more subtle or different? 

  • Do SRC on many products or parts? 
  • Suppose some parts’ reliabilities are statistically dependent?
  • Data are samples of lifetimes. How large samples are needed ?
  • Compare samples of grouped failure counts by cohorts (KM reliability estimate) vs. individual sample lifetimes estimates? Lifetime data costs more than grouped failure counts.
  • Compare population periodic ships (cohorts) and returns counts instead of grouped failure counts by cohorts? Grouped failure counts by cohort (KM) cost more than ships and returns from data required by GAAP.
  • Link entropy bits or KL divergence bits to design, process, or program changes, especially software.

Want SRC on your data? Send periodic cohort field reliability data to pstlarry@yahoo.com. 

References by George

Please see Wikipedia for entropy and Kullback-Leibler divergence.

“Estimate Reliability Functions Without Life Data,” Reliability Review, ASQC, Vol. 13, No. 1, pp 21-26, March 1993

“Want Field Reliability Without Life Data?” Weekly Update, /want-field-reliability-without-life-data/, April 2021

“Renewal Process Estimation, Without Life Data, Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/renewal-process-estimation-without-life-data/, Sept. 2021

“How can you Estimate Reliability Without Life Data?” Weekly Update, https://fred-schenkelberg-project.prev01.rmkr.net/can-estimate-reliability-without-life-data/, May 2022

Filed Under: Articles, on Tools & Techniques, Progress in Field Reliability?

About Larry George

UCLA engineer and MBA, UC Berkeley Ph.D. in Industrial Engineering and Operations Research with minor in statistics. I taught for 11+ years, worked for Lawrence Livermore Lab for 11 years, and have worked in the real world solving problems ever since for anyone who asks. Employed by or contracted to Apple Computer, Applied Materials, Abbott Diagnostics, EPRI, Triad Systems (now http://www.epicor.com), and many others. Now working on actuarial forecasting, survival analysis, transient Markov, epidemiology, and their applications: epidemics, randomized clinical trials, availability, risk-based inspection, Statistical Reliability Control, and DoE for risk equity.

« Manufacturer Recommended Maintenance
How to Use Ben Franklin’s Distribution Curves for Performance Monitoring »

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Articles by Larry George
in the Progress in Field Reliability? article series

Join Accendo

Receive information and updates about articles and many other resources offered by Accendo Reliability by becoming a member.

It’s free and only takes a minute.

Join Today

Recent Articles

  • Leadership Values in Maintenance and Operations
  • Today’s Gremlin – It’ll never work here
  • How a Mission Statement Drives Behavioral Change in Organizations
  • Gremlins today
  • The Power of Vision in Leadership and Organizational Success

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy