Accendo Reliability

by Laxman Pangeni

Leveraging Survival Analysis for Multi-Modal Failure Analysis in R

In reliability engineering and data-driven maintenance strategies, understanding different failure modes is crucial for designing robust systems. Survival analysis is a powerful statistical tool that allows us to analyze time-to-event data and assess the reliability of components over time. In this article, we’ll explore how survival analysis can be applied to multi-modal failure scenarios using R.

What is Survival Analysis?

Survival analysis models the time until an event of interest occurs, such as an equipment failure, and it accommodates censored observations, where the event had not yet occurred when observation ended. It is widely used in fields like healthcare, manufacturing, and engineering. The Kaplan-Meier estimator is a popular non-parametric method for estimating the survival function from lifetime data.
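To make the estimator concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) calculation done by hand and compared against survfit. At each observed failure time, the estimate is multiplied by (1 - d/n), where d is the number of failures at that time and n is the number of units still at risk. The data values are hypothetical and chosen only for illustration.

```r
library(survival)

# Hypothetical data (illustration only): distance to failure or censoring
distance <- c(5000, 10000, 15000, 20000, 25000)
status   <- c(1, 0, 1, 1, 0)  # 1 = failure observed, 0 = censored

# Distinct distances at which a failure was observed
event_times <- sort(unique(distance[status == 1]))

# Units still at risk just before each failure, and failures at that distance
n_risk  <- sapply(event_times, function(t) sum(distance >= t))
n_event <- sapply(event_times, function(t) sum(distance == t & status == 1))

# Product-limit estimate: running product of (1 - d / n)
km_manual <- cumprod(1 - n_event / n_risk)

# Same estimate from the survival package
fit <- survfit(Surv(distance, status) ~ 1)
km_survfit <- summary(fit, times = event_times)$surv

print(data.frame(event_times, n_risk, n_event, km_manual, km_survfit))
```

The two columns should agree. Note how the censored units leave the risk set without pulling the curve down.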

Multi-Modal Failure Analysis

In real-world applications, systems often exhibit multiple failure modes due to varying stress factors and operational conditions. Analyzing these failure modes separately allows for better insights into system performance and potential improvements.

The figure below illustrates a survival analysis by failure modes performed in R:

Example: Key Insights from the Plot [Generated Using R]

[Figure: three panels showing Kaplan-Meier curves, censoring, and number at risk, with the p-value annotated]
  • Kaplan-Meier Curves:

The plot shows separate survival curves for two failure modes (Mode1 in green, Mode2 in blue), along with censored data in red.

Mode1 appears to have better survival performance compared to Mode2, as its survival probability remains higher over longer distances.

  • Censoring Information:

The red marks indicate censored observations, meaning that for those instances, the failure did not occur within the observed time.

  • Number at Risk:

The middle section of the plot displays the number of units still at risk at different distance intervals.

  • Statistical Significance:

The log-rank p-value (< 0.0001) indicates a statistically significant difference between the survival curves of the groups, which supports tailoring maintenance strategies to each failure type.
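The number-at-risk and survival values described above can also be read off numerically rather than from the plot. The following is a rough sketch using summary() on a survfit object. The data frame units and the chosen distances are hypothetical, and the column names simply mirror those used in this article.

```r
library(survival)

# Hypothetical data (illustration only)
units <- data.frame(
  distance = c(5000, 8000, 12000, 15000, 18000, 20000, 22000, 25000),
  status   = c(1, 1, 0, 1, 1, 0, 1, 0),  # 1 = failure, 0 = censored
  failure_mode = c("Mode1", "Mode2", "Mode1", "Mode2",
                   "Mode1", "Mode2", "Mode1", "Mode2")
)

# One Kaplan-Meier curve per group
fit <- survfit(Surv(distance, status) ~ failure_mode, data = units)

# Evaluate each curve at a few distances of interest
s <- summary(fit, times = c(0, 10000, 20000), extend = TRUE)

risk_table <- data.frame(
  group    = s$strata,
  distance = s$time,
  n_risk   = s$n.risk,
  survival = round(s$surv, 3)
)
print(risk_table)
```

Each row gives, for one group, how many units were still under observation at that distance and the estimated probability of surviving beyond it.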

Performing Survival Analysis in R

To replicate this analysis, the survival and survminer packages in R can be used. Below is a basic code snippet to analyze multi-modal failure data:

library(survival)
library(survminer)

# Sample data
data <- data.frame(
  distance = c(5000, 10000, 15000, 20000, 25000),
  status = c(1, 0, 1, 1, 0),  # 1 = event occurred, 0 = censored
  failure_mode = c('Mode1', 'Mode2', 'Mode1', 'Mode2', 'Mode1')
)

# Fit survival model
fit <- survfit(Surv(distance, status) ~ failure_mode, data = data)

# Plot survival curves
ggsurvplot(fit, data = data, pval = TRUE, risk.table = TRUE,
           conf.int = TRUE, legend.title = "Failure Modes",
           palette = c("green", "blue"))
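The p-value that ggsurvplot prints can be cross-checked with the survival package alone. Below is a minimal sketch that runs the log-rank test with survdiff and derives the p-value from the chi-square statistic. The data frame units is hypothetical and only mirrors the column names used above.

```r
library(survival)

# Hypothetical data (illustration only)
units <- data.frame(
  distance = c(5000, 8000, 12000, 15000, 18000, 20000, 22000, 25000),
  status   = c(1, 1, 0, 1, 1, 0, 1, 0),  # 1 = failure, 0 = censored
  failure_mode = c("Mode1", "Mode2", "Mode1", "Mode2",
                   "Mode1", "Mode2", "Mode1", "Mode2")
)

# Log-rank test comparing the groups
lr <- survdiff(Surv(distance, status) ~ failure_mode, data = units)

# The statistic is chi-square distributed with (number of groups - 1) df
deg_free <- length(lr$n) - 1
p_value  <- pchisq(lr$chisq, df = deg_free, lower.tail = FALSE)

print(lr)
cat("log-rank p-value:", signif(p_value, 3), "\n")
```

If the two sources disagree, the value computed here can be supplied to ggsurvplot through its pval argument, which accepts a number or a string in place of TRUE.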

Conclusion

Survival analysis provides valuable insights into system reliability and helps identify different failure behaviors. By leveraging such techniques, engineers can implement data-driven maintenance strategies, optimize component usage, and enhance overall operational efficiency.

If you’re looking to dive deeper into reliability modeling, R offers a comprehensive set of tools to perform survival analysis effectively.

Feel free to share your thoughts and experiences with survival analysis in the comments!

Filed Under: Articles, on Product Reliability, Reliability by Design

About Laxman Pangeni

Laxman Pangeni is a seasoned Design and Reliability Engineer specializing in predictive modeling, data science, and advanced statistical analysis to enhance product performance and reliability across complex systems.

Comments

  1. Shishir Rao says

    February 17, 2025 at 12:09 PM

    Hello Laxman, congratulations on posting your first article on the Accendo website! Looking forward to seeing more from you on the topic of Design for Reliability.

    Here are some comments on this article that you might find useful. I see that you have also used the “shock absorber data” to plot the figure in your article. If you have used the above code snippet (modifying it for the shock absorber data), I don’t think it is giving you what you think it is giving you. The code just treats the 3 different groups (Censored, Mode 1 and Mode 2) as 3 separate populations and plots 3 Kaplan-Meier curves on the same plot (the curve for the “Censored” group is the red horizontal line at survival probability 1). This is similar to plotting a Kaplan-Meier curve for the time to fatigue failure of an automotive component and time to voltage-surge failure of an electronic component in a washing machine on the same plot. The electronic component cannot experience a fatigue failure and the automotive part cannot experience a failure due to voltage surge. The “number at risk” is different for the two groups, because there is no connection between them. But this is not the case when we are talking about multiple failure modes of a component. The same component can fail due to mode 1, mode 2 or be censored. We do not know at time zero what will happen to the component in the future. Hence, the “number-at-risk” at time zero includes all the components and we cannot separate it by the failure mode. I suggest taking a look at the concept of cumulative incidence function (also called subdistribution function) when dealing with competing risks/multiple failure modes.

    Regards,
    Shishir.

    • Laxman Pangeni says

      February 17, 2025 at 8:46 PM

      Hi Shishir,

      Thank you for your thoughtful feedback! I really appreciate your insights, especially regarding the interpretation of multiple failure modes in the Kaplan-Meier analysis.

      You’re absolutely right that in a strict competing risks scenario, all components should be considered together at time zero, rather than treating different failure modes as separate populations. However, in this example, where there are Commercial Off-The-Shelf (COTS) components involved, predominant failure modes of sub-components are often analyzed separately, as different failure modes may be attributed to distinct operational conditions. In contrast, for actual manufacturers designing and testing their own components, multiple failure modes are typically analyzed within the same reliability framework.

      The graph and code snippet in the article aren’t meant to be an exact competing risks representation but rather a more generic approach to illustrate survival trends under different conditions. That said, I agree that using the cumulative incidence function would provide a more precise representation of competing failure modes. I’ll look into incorporating that perspective in future analyses.

      I really appreciate your detailed input—discussions like this help refine our approaches and improve how we communicate reliability concepts. Looking forward to more such exchanges!

      Regards,
      Laxman

      • Shishir Rao says

        February 17, 2025 at 10:36 PM

        I see. If I understood you correctly, mode 1 and mode 2 failure risks in this case are not acting on a component at the same time (mode 1 could be a brittle fracture failure due to extreme cold operating conditions and mode 2 could be a stress related failure due to extreme heat conditions). In this case, I agree this is not a competing risks framework and these are 2 different populations. Thank you for the clarification! Cumulative incidence function is useful when both risks are active at the same time. I noticed the words “multi modal failure” and the shock absorber data in the Kaplan-Meier curve and automatically assumed that this is a competing risks framework, which it is not.

        On a side note, I see that you have used the “survminer” package to get the p-value from a log rank test. The “survminer” package plots look better aesthetically as compared to the default ones plotted by the “survival” package, but can sometimes give the incorrect p-values (when the “weights” argument is used in “survfit”). I wrote a blog article on it last year, which you might find useful: https://rpubs.com/shishir909/1199328

        I am not sure if there is a different way to use the “ggsurvplot” function to incorporate case weights, or if they have fixed the bug. This is something that could be easily missed. You might find it informative.

        Regards,
        Shishir.

        • Laxman Pangeni says

          February 19, 2025 at 9:57 AM

          Hi Shishir,

          Thank you for the clarification! I completely agree— since failure modes 1 and 2 act on different populations rather than simultaneously on the same component, this isn’t a competing risks framework. Your point about the cumulative incidence function being useful when risks are active at the same time is well noted.

          Also, I appreciate you sharing your blog article on the survminer package and its potential issue with the weights argument in survfit. I wasn’t aware of this specific limitation, and it’s definitely something to keep in mind when using ggsurvplot. I’ll check out your post—always great to learn from real-world findings like this! Have you come across any alternative approaches or recent updates that address this issue?

          Regards,
          Laxman

          • Shishir Rao says

            February 19, 2025 at 6:14 PM

            The GitHub page for the “survminer” package shows that the latest version was released on Oct 30, 2024, and I had written the blog sometime in June, 2024. I don’t know if this bug was fixed or not, I haven’t checked. (I should do it someday and raise an issue on GitHub if it hasn’t been done already)

            I would rely on the “survival” package for the log-rank test p-values, since this is a very stable package with regular updates. I verified it by calculating the p-value manually (using Excel) for the example in the blog. You can conduct the log-rank test using the “survdiff” function, and the p-value that you get here can be hard coded on ggsurvplot by using the “pval” argument. The blog has details on this.
