Accendo Reliability

Your Reliability Engineering Professional Development Site

by Robert (Bob) J. Latino

Name That Failure Pattern (1)…

This is a failed shaft that came out of a pump in a paper mill. The pump was only in service for about a month before it failed unexpectedly.

From the top view above, identify the type of failure pattern that you see from the fractured surface(s). If you need more info to make your assessment, just ask.

Below is a side view of the same failed shaft.

I am seeking a discussion on the physics of the failure based on the fractured surface.

FOLLOW UP POST 5.18.17 (Provided by Ron Hughes, Sr. Investigator, RCI)

Some facts about this failure involving the shaft shown above. 

  1. The keyway was welded. In the past, the pump broke numerous keys, causing excessive downtime.
  2. Maintenance installed a new key made of a harder material than specified.
  3. Heat from the welding process changed the microstructure of the shaft, and the weld added weight, which caused an unbalanced condition.

Facts about the pictures below.

  1. The failure started at the lower corners of the keyway.
  2. There are 2 small fatigue planes at the initiation points.
  3. The initial cracks were caused by Stress Corrosion Cracking from stresses internally induced during the welding process.
  4. Not counting the case hardening of the shaft, there are 3 distinct grain structures in the shaft. These are again caused by the welding process, which took the shaft material through varying stages from austenite (face-centered cubic) to martensite (body-centered tetragonal); i.e., the steel becomes harder where martensite forms as the heated material cools rapidly.
  5. There are chevron marks within the case-hardened depth of the shaft. This is not unusual, as these marks are left when very hard material breaks instantaneously.
  6. The final fracture zone is very large, indicating that the shaft was heavily loaded at the time of failure.
  7. There is some torsion in the final fracture that is due to the unbalanced shaft. However, since the torsion is less than 45 degrees, this is not a pure torsional failure but rather rotating bending of the shaft.

So the physical cause of the failure is fatigue due to rotating bending and Stress Corrosion Cracking.

If we drilled deeper into the human and systemic issues by asking “Why would someone decide to make a weld repair to the key way?”, what potential answers can you think of to that question?

This has been a great exchange of experience by some very knowledgeable experts. Thank you.

If you’re interested, we have plenty of these types of fracture patterns from other cases to discuss. We apparently can do so on this type of forum, so just let us know if you would be interested in sharing your expertise. Take care, folks.

FOLLOW UP POST 5.22.17 – Understanding the Human Contribution to the Physics of Failure

We clearly have a great deal of technical talent that responded to this post regarding the physics of failure (the hard side of failure). But now I wanted to dive into the human contributions to the failure (the soft side of failure).

This is often the difference between what people call ‘Root Cause Failure Analysis’ (RCFA) and ‘Root Cause Analysis’ (RCA). The term RCFA tends to limit itself to the hard side of failure and RCA is a broader term intended to pull in the Human and Systemic sides of the failure mechanisms.

When dealing with the hard side of failure (RCFA) we hypothesize by continually asking ‘How Can’ the previous hypothesis have occurred. We let the evidence answer the questions for us, as it will tell us which hypotheses were true and which were not.

For example’s sake, if we hypothesize as to ‘How can a shaft fail?’, we may come up with the possibilities of Overload, Fatigue, Erosion and Corrosion. Moving from level to level represents a cause-and-effect relationship in time.

I know we can think of many reasons a shaft can fail, but if we can visualize being the shaft at the time of the failure, we have to ask ‘what just happened to me’? This is where the physics of the failure is so important. The fractured surfaces tell the real story and it takes an educated eye to understand what those fracture patterns are telling us. We are simply going backwards and doing a visual reconstruction of the sequence of events.

The above is for example’s sake only, but you get the message. From here we would ask ‘How could we have a fatigue failure of the shaft?’. The questioning goes on and on, deeper and deeper, as the evidence itself leads the way. Whatever is true, we follow. What is not, we cross off as NOT TRUE.
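The ‘How Can’ questioning described above can be sketched as a small tree structure. This is only a minimal illustration, not the author’s tool: the `Hypothesis` class, the `true_paths` function and the True/False markings on each branch are assumptions made for the example. Each node is a hypothesis, the evidence marks it true or NOT TRUE, and we only follow the branches the evidence supports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    """One node of a logic tree: a possible answer to 'How can ...?'."""
    description: str
    # None = not yet checked, True = supported by evidence, False = NOT TRUE
    verified: Optional[bool] = None
    children: List["Hypothesis"] = field(default_factory=list)

    def add(self, description: str, verified: Optional[bool] = None) -> "Hypothesis":
        """Attach a child hypothesis one level deeper and return it."""
        child = Hypothesis(description, verified)
        self.children.append(child)
        return child


def true_paths(node: Hypothesis, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Return every chain of hypotheses that the evidence supports.

    Branches marked False are crossed off along with everything below them.
    Unchecked (None) branches are not followed until evidence is available.
    """
    prefix = (prefix or []) + [node.description]
    supported = [c for c in node.children if c.verified is True]
    if not supported:
        return [prefix]
    paths: List[List[str]] = []
    for child in supported:
        paths.extend(true_paths(child, prefix))
    return paths


# Illustrative markings only (assumed for this sketch, not case findings).
event = Hypothesis("Shaft failed", verified=True)
event.add("Overload", verified=False)   # crossed off as NOT TRUE
fatigue = event.add("Fatigue", verified=True)
event.add("Erosion", verified=False)    # crossed off as NOT TRUE
event.add("Corrosion")                  # not yet checked
fatigue.add("Rotating bending", verified=True)

for path in true_paths(event):
    print(" -> ".join(path))
```

The point of the structure is that the analyst never picks a favorite cause; the evidence sets the flags and the surviving path falls out of the tree.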

As we drill down, eventually we will come across a human error (or several), which is simply a decision error. It will be either an error of omission or commission. We did something we shouldn’t have, or we should have done something and we didn’t.

In our hourglass slide above, we are discussing the human behavior related to our undesirable outcome. It is at this point we switch our deductive questioning (general to specific) to inductive questioning (specific to general).

We are now in the decision maker’s head and have to try to understand his/her reasoning at the time and location of the decision. It is not for us to make judgments; we just have to put the decision into the proper context of what was going on at the time. Most of the time, when we truly understand the conditions, the decision seems perfectly logical. After all, most people don’t wake up in the morning and think to themselves, ‘How can I screw up at the plant today?’ :-)

Getting back to our shaft failure and where I was guiding the discussion, what do you think was going through the mind of the maintenance personnel who welded the keyway?

In this case, let’s presume one of our Human Roots (HR) was the ‘Decision to Make Weld Repair on the Keyway of the Pump Shaft’. Our questioning reverts here from ‘How Could’ to ‘Why’. Why would the maintenance personnel have chosen to make such a weld repair on the keyway?

Some possibilities could be:

  1. There were no engineering Management of Change (MOC) requirements for the weld repair. In other words, there were not any guidelines for them to follow, so it was left up to their discretion. They did not violate any ‘rule’.
  2. There was a belief (paradigm) that a harder keyway will prevent the key from breaking. In their minds, this will also ensure increased uptime in the near term.

I am throwing this out for debate as there are additional human and systems considerations in these types of cases. Can you think of more?

Some food for thought: do you think this was the first time a keyway was welded to make such a repair? Could this have become a ‘practice’ that was acceptable only until there was a high-visibility failure? Could it be that, due to production pressures, such decisions are made hastily despite the known failure risks? Do you think management was aware of these practices in the past?

What is your experience? What do you think could be going through the minds of those that were well-intentioned in this case, but their decision didn’t pan out as intended?

FOLLOW UP POST 6.8.17 (Provided by Bob Latino, RCI)

Thanks to Tim Lim for his REPLY on 6.7.17. As a result of his suggestions about the potential human contributions to this failure, I updated my logic tree in this case.

A ‘Human Root (HR)’ in this case was the actual decision to make the weld repair in the manner they did. So at this point we have to put ourselves in the position of the person making that decision and think ‘What was going through their mind’? WHY did they feel the manner in which they made the modification was OK?

  1. No engineering Management of Change (MOC) was done for this specific weld repair
  2. A belief by the person making the modification that a harder keyway will prevent the key from breaking (and increase uptime)
  3. An adequate weld procedure did not exist.

As we drill down, we continue to ask ‘Why did the person making the modification believe that a harder keyway would work?’ Perhaps the person making this modification was not a qualified welder. As Tim Lim stated in his reply, “As a rule of thumb, any steel with a carbon content of more than 0.35% by heat analysis should not be welded. Key steel would have a typical value of 0.4C.” A qualified welder would have known this and if the conditions were not appropriate, they should have questioned ‘why not’ and ‘what is Plan B?’.

This leads us to understanding how we could have had a person who was not qualified, making such a modification. This is a managerial function. Supervisory personnel should be responsible and accountable for staffing all positions under their control with qualified personnel. Both knowledge and skill should have to be demonstrated in such positions prior to taking on responsibility for the position.

Moving on, ‘Why did a proper weld procedure not exist, since the failure had happened before?’ Previous RCAs were either inadequate or non-existent. Had a proper RCA been conducted after previous failures, it would have identified this deficiency and corrected it.

If previous RCAs were conducted, ‘Why were they not effective?’ Either the ones submitted were not scrutinized enough by management (properly validated) and/or the people conducting them were not qualified to be leading such analyses. Such system flaws are referred to as Latent Root Causes or LRs (as labeled on the logic tree).
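To make the distinction between root types concrete, here is a minimal sketch that tags the findings discussed in this post by level. The `RootType` and `Root` names and the ‘PR’ label for physical roots are assumptions for illustration (the post itself only uses HR and LR), and filing the MOC, weld procedure and staffing items under latent roots is my reading of the discussion rather than a copy of the actual logic tree.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class RootType(Enum):
    PHYSICAL = "PR"  # label assumed for this sketch
    HUMAN = "HR"     # label used in the post
    LATENT = "LR"    # label used in the post


@dataclass(frozen=True)
class Root:
    kind: RootType
    statement: str


# Findings paraphrased from the discussion above; the grouping is illustrative.
roots = [
    Root(RootType.PHYSICAL, "Fatigue due to rotating bending and stress corrosion cracking"),
    Root(RootType.HUMAN, "Decision to make weld repair on the keyway of the pump shaft"),
    Root(RootType.LATENT, "No engineering MOC was done for the weld repair"),
    Root(RootType.LATENT, "An adequate weld procedure did not exist"),
    Root(RootType.LATENT, "Person making the modification was not qualified"),
    Root(RootType.LATENT, "Previous RCAs were inadequate or non-existent"),
]


def group_by_kind(items: List[Root]) -> Dict[RootType, List[str]]:
    """Collect root statements under each root type, preserving input order."""
    grouped: Dict[RootType, List[str]] = {kind: [] for kind in RootType}
    for root in items:
        grouped[root.kind].append(root.statement)
    return grouped


grouped = group_by_kind(roots)
for kind in RootType:
    print(f"{kind.value} ({len(grouped[kind])}):")
    for statement in grouped[kind]:
        print(f"  - {statement}")
```

Listing the roots this way makes it easy to see whether an analysis stopped at the physics or actually reached the systems that allowed the decision to be made.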

One question that often goes unasked here is, ‘Do we think this is the first time the person making the modification did it this way?’ That is something we should always ask ourselves when looking at a person’s reasoning. This person likely had done this very same thing in the past, yielding to production pressure to get back online quickly. They likely got pats on the back and the proverbial ‘atta boy’ for getting production back up quickly. So naturally, if I get positive recognition for such behavior, I am likely to repeat it. Food for thought :-)

Thanks, Tim Lim, for helping me drive this point home about Human and Latent Root Causes in this case. The concept applies to all cases, so think beyond the physics of the failure and ask what role the human played in permitting the physical failures to occur.

Please click the hyperlink if you’re interested in more job aids like this and/or information on associated training and tools to help with understanding Why Parts Fail.

Filed Under: Articles, on Maintenance Reliability, The RCA

About Robert (Bob) J. Latino

Robert Latino is currently a Principal at Prelical Solutions, LLC, along with his brother Ken Latino. Bob was a Founder and CEO of Reliability Center, Inc. (RCI), until it was acquired in 2019. RCI is a 50-year-old Reliability Consulting firm specializing in improving Equipment, Process and Human Reliability. Mr. Latino received his Bachelor’s degree in Business Administration and Management from Virginia Commonwealth University. For any questions, please contact Bob at blatino@prelical.com


© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy