Accendo Reliability

Your Reliability Engineering Professional Development Site


by James Kovacevic

Learning from A Failure

Why Failing Can Be Good and What You Can Take Away from It.

No matter how well a maintenance & reliability program is set up and managed, there will be failures. This is partly due to the maintenance program itself, where the focus is on the consequences of failures, not the failures themselves. This approach allows most organizations to manage large facilities with a minimum of staff and cost.

But what should happen when something does fail? Should we just carry on as usual since we avoided the consequences? Absolutely not. When a failure occurs, we need to learn from it and improve the maintenance & reliability program. Yet many organizations address failures by implementing a PM routine. This is not the right approach. Remember, only 11% of failures are age-related, so a time-based PM routine will not address most of them. Adding these PM routines to the program will eventually collapse it under too much work, not to mention the maintenance-induced failures that result.

So what should happen?  The failure should be analyzed and actions implemented to reduce the chance of the failure occurring again.

What to do When a Failure Occurs

When a failure does occur in your operation, it is vitally important not to rush in like a bunch of firefighters to put out the fire. There needs to be a calculated approach to dealing with the failure.

The first step in addressing and learning from a failure is to analyze the scene of the failure like a detective would a crime scene.  This is often a difficult activity, as we need to get production up and running as quickly as possible.  But if we don’t take the time to collect the information and data, chances are production will be down again with the same issue.

To collect the data and information from a failure, there needs to be a systematic approach. One of the best approaches I have seen is to keep a kit in the maintenance shop and bring it to every failure. This kit contains everything required to collect the data for a Root Cause Analysis (RCA). The kit includes:

  • A checklist outlining the approach to use
  • A digital camera
  • A flashlight
  • Zip Lock bags of various sizes (used to collect failed components and keep them as found until a failure analysis can be conducted)
  • Failure Data Collection form (used to ensure all the failure data is captured in a repeatable way)
  • Notepads w/ Pens (for recording observations)
  • Markers (for writing on the Zip Lock bags)
  • Measuring tapes of various sizes
  • Adhesive Measuring Tape (to be used when taking pictures of the failure)
  • Reference Scales (used when taking pictures of the failure scene)
  • Inspection mirror
  • Equipment tags (to tag the failed components that are too large for the Zip Lock bags)

It is important that this kit is maintained and replenished after each failure, so it is ready to go for the next one. Once the failure scene has been analyzed and the data collected, the repair can be made to the equipment and the work of learning from the failure can begin.
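The Failure Data Collection form from the kit exists so that every failure is captured the same way. As a minimal sketch of that idea, assuming the form is kept electronically, the record below flags anything left blank before the scene is released for repair. The field names and the example asset are hypothetical; the article does not specify what the form contains.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime


@dataclass
class FailureRecord:
    """One filled-in failure data collection form (field names are hypothetical)."""
    asset_id: str
    observed_at: datetime
    description: str            # what was seen at the scene, in plain words
    operating_conditions: str   # load, speed, temperature, etc. at the time
    photos: list = field(default_factory=list)             # file names of scene photos
    bagged_components: list = field(default_factory=list)  # labels written on bags or tags

    def missing_fields(self):
        """Return the names of fields that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative record only; the asset and observations are made up.
record = FailureRecord(
    asset_id="PUMP-101",
    observed_at=datetime(2024, 1, 15, 8, 30),
    description="Drive-end bearing seized",
    operating_conditions="",
)
print(record.missing_fields())
# -> ['operating_conditions', 'photos', 'bagged_components']
```

A non-empty result is a prompt to go back to the scene before the evidence is disturbed by the repair.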

How to Learn from Failure

With the failure data collected, a Root Cause Analysis can take place to learn from the failure. The approach to determining the root cause depends on the severity of the failure. It could be a simple 5-Why focusing on the 3 legs (direct root cause, detection root cause, and systemic root cause). Or the failure may warrant a Fault Tree Analysis that takes human factors into account.
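One way to keep a 3-leg 5-Why organized is to record a separate chain of "why?" answers for each leg. The sketch below is an illustration only, not a prescribed RCA tool: the class and method names are hypothetical, and it simply treats the last answer in each chain as that leg's root cause.

```python
from dataclasses import dataclass, field

# The three legs named in the text.
LEGS = ("direct", "detection", "systemic")


@dataclass
class FiveWhy:
    """A 3-leg 5-Why worksheet (hypothetical structure)."""
    problem: str
    # One list of successive "why?" answers per leg.
    chains: dict = field(default_factory=lambda: {leg: [] for leg in LEGS})

    def ask_why(self, leg, answer):
        """Append the next answer to the chain for the given leg."""
        if leg not in self.chains:
            raise ValueError(f"unknown leg: {leg!r}")
        self.chains[leg].append(answer)

    def root_causes(self):
        """Return the deepest answer recorded for each leg (None if the leg is empty)."""
        return {leg: (chain[-1] if chain else None)
                for leg, chain in self.chains.items()}
```

A leg that still returns `None` shows that part of the analysis has not been done yet, for example that nobody has asked why the failure was not detected earlier.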

Regardless, any failed components should be sent off for detailed analysis. Bearings, for example, should be sent out for analysis to determine the true cause of the failure. Was it a lubrication issue, wear, or something else? Using expertise is critical to identifying the cause of the failure.

Many suppliers and distributors will provide this engineering service either free of charge or at a deeply discounted rate. Be sure to ask your supplier and take advantage of the service.

At the end of the root cause analysis, you should have a detailed report outlining the cause of the failure and what needs to be done to eliminate it.

It is important to note that the RCA process is not about finding a responsible person (even if there was a responsible person).  It is about learning from the failure and making sustainable changes.

Applying the Learnings

This is where many organizations fail. They often collect the data and perform some form of Root Cause Analysis, but they fail to implement some or all of the recommendations coming from it.

To improve the reliability of the equipment, it is vitally important that all recommendations are reviewed and, where it makes sense, implemented. Each recommendation should be reviewed against criteria that assess the impact it will have on the failure and its ease of implementation.

When you have these criteria set up, they should eliminate many of the recommendations that call for “setting up a PM”. You will never improve reliability by adding a PM routine for every failure encountered.
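Reviewing recommendations against impact and ease of implementation can be as simple as a scored list. The sketch below is one possible scheme, not the author's: the 1 to 5 scales, the `min_impact` cut-off, the impact-times-ease ranking, and the example scores are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    description: str
    impact: int  # assumed scale: 1 (little effect on the failure) to 5 (eliminates the cause)
    ease: int    # assumed scale: 1 (hard to implement) to 5 (easy to implement)


def rank_recommendations(recs, min_impact=3):
    """Drop low-impact recommendations, then rank the rest by impact x ease."""
    kept = [r for r in recs if r.impact >= min_impact]
    return sorted(kept, key=lambda r: r.impact * r.ease, reverse=True)


# Illustrative scores only; a real review team would assign its own.
recs = [
    Recommendation("Set up a PM", impact=1, ease=5),
    Recommendation("Add a condition monitoring point", impact=4, ease=3),
    Recommendation("Train operators on start-up procedure", impact=3, ease=5),
]
for r in rank_recommendations(recs):
    print(r.impact * r.ease, r.description)
```

With these example scores the easy but ineffective "Set up a PM" is screened out before ranking, which is the effect the criteria are meant to have.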

The Root Cause Analysis may identify many other factors that need to be addressed. This could include an engineering change to add the ability to monitor the condition of the component. It may also include training recommendations, equipment redesign, material changes, etc.

Once the recommendations are implemented, you are on your way to improving the reliability of the plant. And if by chance this failure does occur again, be sure to review your previous Root Cause Analysis for any gaps. Use those gaps to improve your RCA process in the future.

Does your site have a failure analysis kit ready to go? Do you have defined criteria for evaluating RCA recommendations? By taking a few small steps to improve your RCA process and learn from your failures, you will improve plant performance.

Remember, to find success, you must first solve the problem, then achieve the implementation of the solution, and finally sustain winning results.

I’m James Kovacevic
Eruditio, LLC
Where Education Meets Application

About James Kovacevic

James is a trainer, speaker, and consultant who specializes in bringing profitability, productivity, availability, and sustainability to manufacturers around the globe.

Throughout his career, James has made it his personal mission to make industry a profitable place, where individuals and manufacturers possess the resources, knowledge, and courage to sustainably lower their operating costs.
