Accendo Reliability

Your Reliability Engineering Professional Development Site


by Kevin Stewart

RCA is the Bedrock of a Reliability Program

Basic Reliability Definition

Occasionally, I like to step back and reflect on reliability in basic terms.

In that spirit, the basic premise of reliability is usually stated as “The probability that an item will perform a required function, without failure, under stated conditions, for a stated period of time.”

To use the reliability equation, failure must first be defined, so you can tell whether your equipment has indeed failed. Only then can each failure be counted in the MTBF (Mean Time Between Failures) calculation.

After you have defined failure and recorded each occurrence appropriately, you can plug the numbers into the reliability equation, R = e^(-λt), where λ is the failure rate, defined as λ = 1/MTBF, and t is the mission time. The result is an objective value for the reliability.
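As a minimal sketch of that calculation, the Python below estimates MTBF as operating time divided by the number of recorded failures and then applies R = e^(-λt). The function names and the sample figures (4,000 operating hours, 5 failures, an 8-hour mission) are hypothetical and chosen only for illustration. The equation itself assumes a constant failure rate.

```python
import math


def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Estimate MTBF as operating time divided by recorded failures."""
    if failure_count <= 0:
        raise ValueError("need at least one recorded failure to estimate MTBF")
    return total_operating_hours / failure_count


def reliability(mtbf_hours: float, mission_hours: float) -> float:
    """R = e^(-lambda * t), with lambda = 1 / MTBF (constant failure rate assumed)."""
    failure_rate = 1.0 / mtbf_hours
    return math.exp(-failure_rate * mission_hours)


# Hypothetical failure log: 4,000 operating hours with 5 recorded failures.
example_mtbf = mtbf(4000.0, 5)
# Probability of completing a hypothetical 8-hour mission without failure.
example_r = reliability(example_mtbf, 8.0)

print(f"MTBF = {example_mtbf:.0f} h, R(8 h) = {example_r:.4f}")
```

The estimate is only as good as the failure definition behind the count: a failure that is never recorded inflates the MTBF and the reliability along with it.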

Based on this equation, as the number of failures goes down, the reliability increases, assuming all the other parameters stay unchanged.

This is important because I can also increase the reliability by reducing “t”, the mission time over which the system is expected to perform its function.

In other words, if I can’t make a system go 8 hours without a failure, I would have R = 0. However, if it could always be counted on to go 7 hours without a failure, then I could change the mission time to 7 hours and have R = 1 (100% probability of going 7 hours without a failure).
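The effect of mission time can be sketched with the same equation. The 100-hour MTBF and the mission times below are hypothetical. Note that under the constant-failure-rate equation, R only approaches 0 or 1 and never reaches either, so the R = 0 and R = 1 values above are best read as the limiting case of a system that always, or never, fails within the window.

```python
import math


def reliability(mtbf_hours: float, mission_hours: float) -> float:
    """R = e^(-t / MTBF), assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


MTBF_HOURS = 100.0  # hypothetical value, held fixed for the comparison

# Same equipment, same failure history; only the mission time changes.
r_by_mission = {t: reliability(MTBF_HOURS, t) for t in (8.0, 7.0, 4.0, 1.0)}

for mission_hours, r in r_by_mission.items():
    print(f"mission = {mission_hours:>4.1f} h -> R = {r:.4f}")
```

Shortening the mission raises the number without changing the equipment at all, which is why the time term deserves attention whenever two reliability figures are compared.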

In a manufacturing environment, we are always looking to extend the mission time, so we tend to ignore the time adjustment issue. As companies try to improve reliability, they implement systems, hire consultants, install software, establish preventive maintenance programs, implement RCM, and apply many other great tools, all of which are necessary to provide or support maintenance strategies that increase reliability.

That is just another way of saying they eliminate or reduce failures.

It is all about eliminating failures

So at a basic level, it is all about eliminating or reducing failures.

For example, one of my first forays into reliability was an issue with a bearing that was causing significant downtime. There were four identical systems, and the plant had floated a capital improvement authorization to add a 5th.

The reason for this was they couldn’t keep the system running due to bearing failures. I started investigating, and a tradesperson suggested that we should look at system one. “Why?”, I asked. “Because system one isn’t failing,” they replied.

I then wanted to know why system one was not failing. The tradesperson didn’t know, so we scheduled an inspection to see if we could find out.

As the mechanic started to raise the pillow block cover, I said: “Ok, you can put it back down.” The mechanic looked at me like I was crazy. I asked him to do it again, but this time I asked him to look at the bearing.

He did and realized the same thing: it was different from all the others.

So, we buttoned it up and investigated. Luckily, the system we had just inspected had not had a bearing changed in a while, and that drove us to uncover the fact that someone had replaced the spherical ball bearing with a spherical roller bearing.

Others had just continued to follow suit and the problem perpetuated itself.

Continuous improvement

So, we put the correct bearing in all four systems, canceled the capital improvement authorization, and I got an honorable mention from the plant manager.

We could have stopped there, but I kept asking “why?” when one of the bearings failed.  That led us to other issues.

We had to correct things such as improper installation, incorrect internal clearances, not correcting for soft foot, poor alignment practices (time card and eyeball), not correcting for hot alignment, incorrect use of lubricant, incorrect frequency of lubrication, and no vibration monitoring.

Most will recognize that getting all of these things right is what is now referred to as precision installation.

These corrections improved the reliability of the system by extending the MTBF from 1 month to 8 years, and we saved significant maintenance dollars and time.
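To put that change in terms of the reliability equation, here is a rough sketch using the two MTBF figures from the story. The one-month mission time is an assumption picked for illustration, as is the constant failure rate behind R = e^(-λt). The original article reports only the MTBF values, not these computed results.

```python
import math

MONTHS_PER_YEAR = 12

mtbf_before_months = 1.0                  # MTBF before the corrections
mtbf_after_months = 8 * MONTHS_PER_YEAR   # MTBF after: 8 years = 96 months
mission_months = 1.0                      # assumed mission time, for illustration


def reliability(mtbf: float, mission: float) -> float:
    """R = e^(-t / MTBF), assuming a constant failure rate."""
    return math.exp(-mission / mtbf)


r_before = reliability(mtbf_before_months, mission_months)
r_after = reliability(mtbf_after_months, mission_months)

# Expected failures per system per year is the failure rate times 12 months.
failures_per_year_before = MONTHS_PER_YEAR / mtbf_before_months
failures_per_year_after = MONTHS_PER_YEAR / mtbf_after_months

print(f"R(1 month): before = {r_before:.3f}, after = {r_after:.3f}")
print(f"failures/year: before = {failures_per_year_before}, after = {failures_per_year_after}")
```

Under these assumptions, the expected failure count per system drops from twelve a year to one every eight years, which is the change that made the fifth system unnecessary.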

We improved the reliability so much that shortly before I left the plant, I got a call from the maintenance crew supervisor saying that they were having problems with the same systems.

In doing the root cause, it turns out that they were so reliable that people had forgotten how to fix them.

At the time, we had no CMMS, no PM program, obvious training issues, and had never heard of RCM.

I was also using an oscilloscope-type analyzer to do vibration analysis, with an X/Y pen plotter to capture the signature. From this experience, I learned that improving MTBF through Root Cause Analysis or defect elimination first can be quick and cost-effective, and it also doesn’t hurt your career.

I learned that it provided the quick wins that management wanted and that allowed them to take the leap of faith (for them) to support reliability.  I also learned that after doing Root Cause Analysis, there was still a need to capture the lessons learned in the CMMS.

This became evident when, after eight years, the lessons had not been captured and so could not be passed on to others.

It showed that there is a proper time to implement each of the tools that are available.

Lessons learned

So what has changed since that time, which was back in the mid-1980s?

We have better vibration equipment and RCM. We have FMEAs and state-of-the-art CMMS, to name just a few. These are all valuable tools, and they integrate into an overall reliability program. Still, the simple bearing example in this article always makes me careful when discussing how best to attack an improvement project.

I can use all the tools available today and have no unscheduled downtime on equipment, yet still be doing more maintenance than necessary.

My lesson learned from this plant experience was that basic root cause analysis has a very prominent place in the total reduction of overall costs.

Many times it should be the first thing you consider implementing in your program because, as the title of this article suggests, it provides the bedrock on which to build your program.

Please let me know if you think RCA is the bedrock – if you don’t agree, what is?

About Kevin Stewart

I am an experienced educator and maintenance/reliability professional with 38 years of practical work experience in a variety of roles for ALCOA Primary Metals Group and ARMS Reliability.