Accendo Reliability

Your Reliability Engineering Professional Development Site

by Robert (Bob) J. Latino

What Are Broken Parts Trying to Tell Us?

Author: Mark Latino

RCA and How to Understand the Basics of Component Failure 

When performing a PROACT® Root Cause Analysis (RCA) there is a data collection step called ‘Preserve’ (or the PR in the PROACT acronym) which requires the team to collect failed parts, conduct interviews, obtain paper data and positional information after an undesirable event occurs.

The method also has a step to construct a logic tree and hypothesize all of the possible ways an undesirable failure mode can occur. This paper explores what internal knowledge is helpful when examining failed parts and how that knowledge verifies the physical possibilities on the logic tree.

When leading an RCA investigation, the investigator will collect failed parts such as bearings, mechanical seals, and shafts. The broken parts are inspected to determine what forces the part experienced as the event unfolded. Knowing which forces were present allows the RCA team to verify whether a certain hypothesis did or did not occur.

There are two types of mechanisms that cause mechanical failures: either material is lost or the material is overpowered. Two mechanisms cause material loss: 1) corrosion and 2) erosion/wear.

Two mechanisms also overpower materials: 1) the material is overpowered by a single load application or 2) the material is overpowered over time by fatigue.

There are four all-inclusive buckets or hypotheses used in logic trees that cause material failure:

• Erosion
• Corrosion
• Fatigue
• Overload
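The grouping above can be written down as a small data structure. This is only a sketch: the names `Category`, `Mechanism`, `CATEGORY_OF` and `first_level_hypotheses` are hypothetical and are not part of the PROACT method; the grouping itself is taken from the text.

```python
from enum import Enum


class Category(Enum):
    MATERIAL_LOSS = "material loss"
    OVERPOWERED = "material overpowered"


class Mechanism(Enum):
    EROSION = "erosion"
    CORROSION = "corrosion"
    FATIGUE = "fatigue"
    OVERLOAD = "overload"


# Each of the four buckets belongs to one of the two failure categories.
CATEGORY_OF = {
    Mechanism.EROSION: Category.MATERIAL_LOSS,
    Mechanism.CORROSION: Category.MATERIAL_LOSS,
    Mechanism.FATIGUE: Category.OVERPOWERED,
    Mechanism.OVERLOAD: Category.OVERPOWERED,
}


def first_level_hypotheses(component: str) -> list[str]:
    """Return the four first-level logic-tree hypotheses for a failed component."""
    return [f"{component} failed by {m.value}" for m in Mechanism]
```

Calling `first_level_hypotheses("Shaft")` gives the four hypotheses an analyst would place under a ‘Shaft Failed’ failure mode.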

Failure Mechanisms for Overpowering 

For now, we will talk about the mechanisms that overpower materials. Fatigue is often cited as the cause of about 90% of all mechanical failures, so we will talk about fatigue first.

Fatigue occurs when a material is subjected to repeated loading and unloading. When the loads are above a certain threshold, microscopic cracks begin to form at the surface of stressed areas. Eventually a crack will reach a critical size, at which point the remaining material can no longer support the load and the material will suddenly fracture.

Overload failures occur in two forms, depending on whether the material is brittle or ductile. If the material is brittle, the failure is called a brittle overload fracture. Brittle overload occurs instantly, usually from a single load application. If the material is ductile, the material will deform plastically before it fails.

When analyzing the failure type, an easy way to identify a fatigue failure is that there has to be an origin (or origins) plus one other feature. If the failed part has an origin plus progression marks, it is fatigue. If the part has an origin plus a final fracture zone, then it is also fatigue. Sometimes the load variations are so minor that you can’t visually see the progression marks, but there is still a final fracture zone. These are the most frequent. There are other indicators that we won’t get into right now, as they add unnecessary complexity.

Brittle overload failures can be identified by a ‘salt and pepper’ look on the fracture surface. The salt and pepper appearance occurs because the fracture moves across the surface so fast. The origin can be found by following the chevron marks, which look like arrows and point back to the origin. Chevron marks will only be present in brittle overload failures. If the fracture was a tension failure, you will most likely have a hinged lip. We see this with fastener failures and sometimes alignment pin failures. The final visual clue is that the failed pieces look as if they could be put back together perfectly. Don’t worry, pictures are coming!

Ductile overload failures are visually identified by material deformation. Ductile failures happen in the plastic range of the stress-strain curve, so they will have a ‘cup and cone’ appearance when they fail in tension. There may also be a fibrous look to the surface. We often see this in wire rope failures because wire rope is a ductile material and the job it performs is lifting. Therefore, it tends to fail most often in tension.
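The visual rules described above can be sketched as a simple checklist. The class name, field names and the order in which the rules are applied are assumptions made for illustration; a trained eye weighs these features together, and this sketch is no substitute for a metallurgical examination.

```python
from dataclasses import dataclass


@dataclass
class FractureSurface:
    """Visual observations recorded from a fracture surface (hypothetical field names)."""
    has_origin: bool = False
    progression_marks: bool = False
    final_fracture_zone: bool = False
    chevron_marks: bool = False
    salt_and_pepper: bool = False
    visible_deformation: bool = False
    cup_and_cone: bool = False


def classify(surface: FractureSurface) -> str:
    """Apply the article's rules of thumb; checking fatigue first is an assumption."""
    # Fatigue: an origin plus progression marks, or an origin plus a final fracture zone.
    if surface.has_origin and (surface.progression_marks or surface.final_fracture_zone):
        return "fatigue"
    # Ductile overload: material deformation, cup-and-cone appearance in tension.
    if surface.visible_deformation or surface.cup_and_cone:
        return "ductile overload"
    # Brittle overload: salt-and-pepper surface, chevron marks pointing to the origin.
    if surface.chevron_marks or surface.salt_and_pepper:
        return "brittle overload"
    return "undetermined"
```

For example, `classify(FractureSurface(has_origin=True, progression_marks=True))` returns `"fatigue"`, while a surface with no recorded features returns `"undetermined"` and should go to an expert.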

Now let’s show how this would work when performing an RCA.

For example, let’s say you have experienced an unexpected pump shaft failure on PCH-112. You have collected the failed shaft, and it looks like the shaft below (Figure 1).

Figure 1: Unexpected Pump Shaft Failure

The logic tree being developed by the RCA team states that the Event was ‘PCH-112 Unexpectedly Lost Function’ and the only Failure Mode is ‘Shaft Failed’. The first level of hypotheses answers the question “How can a shaft fail?”

There are four all-inclusive buckets (as stated earlier) for how a shaft can fail; the shaft can erode, corrode, fatigue, and/or be overloaded.

The logic tree at this point would look similar to the one below (Figure 2). To determine which possibilities did and did not occur, we will use the failed shaft inspection results to help us.

Figure 2: Top Levels of Logic Tree

When interpreting a logic tree, just remember that the top box is the Event or Undesirable Outcome that forced us to take action (PCH-112 Unexpectedly Lost Function). This happened because the Pump Shaft Failed (Failure Mode). We know these to be true because we can see the evidence with our own eyes.

Level to level in a logic tree is essentially a cause-and-effect relationship. Underneath the Mode level, as we explore the physics of the failure, we simply keep asking ‘How Could?’ This will generate hypotheses which have to be proven or disproven with hard evidence (not hearsay). Figure 2 shows how we expressed our four hypotheses for how our shaft could have failed.

If an analyst doesn’t have any knowledge of fracture basics, they would most likely have to send the broken part out to an internal or external expert (metallurgist) to analyze. They would then have to wait for the report explaining the forces present at the time of failure.

If an analyst has basic metallurgical knowledge, they can identify those forces themselves (with their trained eye) and move the RCA forward. Let’s take a look at what this part is telling us. 

Figure 3: Progression Marks

The arrows in the photograph above point to progression marks (Figure 3). There are many progression marks across the surface of the fracture. Progression marks are only present in fatigue failures. They represent the propagation of a crack. Cracks need load fluctuations to propagate across the shaft surface. The more rapid the growth, the farther apart the progression marks will be. The information at this point has verified the failure as fatigue. What other information is present in the part?

In the photograph below (Figure 4) we can see the crack origin. When we follow the progression marks backwards, they lead to the crack origin, which is always the point of highest stress. This is usually a sharp corner; in this case it is the sharp corner of the keyway.

Figure 4: Crack Origin

When the progression marks are followed away from the origin, we will find the Final Fracture Zone (FFZ). The FFZ is the region where the remaining material could no longer support the load and broke. The FFZ also has information for the investigator. The larger the FFZ, the heavier the load was at the time of failure. This part’s FFZ was small. Therefore, the load was minimal. What we have here is a fatigue failure that started in a sharp corner of the keyway under minimal load (Figure 5).

Figure 5: Fatigue Which Started in Sharp Corner of Key Way
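As a rough illustration of the FFZ reasoning, the size of the FFZ can be expressed as a fraction of the whole fracture surface. The function names, the area inputs and the example numbers below are hypothetical. The article only states that a larger FFZ indicates a heavier load at the time of failure and gives no numeric thresholds, so the sketch compares two surfaces rather than grading one in isolation.

```python
def ffz_fraction(ffz_area: float, cross_section_area: float) -> float:
    """Return the final fracture zone as a fraction of the whole fracture surface."""
    if cross_section_area <= 0:
        raise ValueError("cross_section_area must be positive")
    if not 0 <= ffz_area <= cross_section_area:
        raise ValueError("ffz_area must lie between 0 and cross_section_area")
    return ffz_area / cross_section_area


def more_heavily_loaded(fraction_a: float, fraction_b: float) -> str:
    """Compare two fracture surfaces: the larger FFZ fraction suggests the heavier load."""
    if fraction_a == fraction_b:
        return "similar"
    return "a" if fraction_a > fraction_b else "b"


# Hypothetical measurements in mm^2, for illustration only.
small_ffz = ffz_fraction(50.0, 1000.0)
large_ffz = ffz_fraction(600.0, 1000.0)
```

Such a comparison is only meaningful between parts of the same material and geometry, which is an assumption of this sketch.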

The shaft’s side view below also has some information to contribute. Figure 6 shows the part turned up on its side; the break is at about a 45-degree angle. This indicates that some torsion and/or bending was also present at the time of failure.

Figure 6: Shaft Side View

The information from the failed part also helps the investigator with the next step of data collection. The question now is “How long was the shaft in service?” Why does this matter to us?

If the shaft had been in service for two or more years and the loads were minimal, what would need to change to make the failure happen? Most investigative teams would likely be interested in any operational changes, possibly an increase in throughput. Process data from before and during the event would verify whether or not the throughput was increased.

Another possibility (hypothesis) might be there was shaft corrosion present, severe enough to lower the material’s fatigue strength, which could cause the failure at normal operating loads.

Now let’s say the shaft was in service for only two days. What direction would be most logical to pursue now?

Usually, when the service time is short, investigative teams would first focus data collection on the shaft itself. Some concerns to investigate could include:

1.    Was the correct shaft installed?

2.    Was the shaft ordered from stores or stock?

3.    Was the shaft material correct for the service?

Obtaining the equipment specifications, maintenance manuals, drawings, etc. and comparing them to the actual shaft dimensions and chemistry would also help to validate these hypotheses.

The other direction would be to investigate everything about the installation. Some things of concern might be:

1.    Was the procedure followed?

2.    Was a procedure even used?

3.    Was it aligned properly?

4.    Was a baseline vibration signature performed after initial start-up?

These questions center on human errors made during installation.
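The way service time steers the next round of data collection can be sketched as follows. The function name and the 30-day cut-off between "short" and "long" service are assumptions for illustration only; the article contrasts two days with two or more years and does not define a boundary.

```python
def next_lines_of_inquiry(days_in_service: float,
                          short_service_days: float = 30.0) -> list[str]:
    """Suggest what to investigate next, based on how long the shaft was in service.

    short_service_days is a hypothetical cut-off, not a value from the article.
    """
    if days_in_service < 0:
        raise ValueError("days_in_service cannot be negative")

    if days_in_service <= short_service_days:
        # Short service life: look at the shaft itself and at the installation.
        return [
            "Was the correct shaft installed?",
            "Was the shaft ordered from stores or stock?",
            "Was the shaft material correct for the service?",
            "Was the procedure followed?",
            "Was a procedure even used?",
            "Was it aligned properly?",
            "Was a baseline vibration signature performed after initial start-up?",
        ]

    # Long service life: look for something that changed, or for degradation.
    return [
        "Were there operational changes, such as an increase in throughput?",
        "Was corrosion severe enough to lower the material's fatigue strength?",
    ]
```

With these assumptions, `next_lines_of_inquiry(2)` returns the shaft and installation questions, while `next_lines_of_inquiry(730)` returns the operational and corrosion questions.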

Let’s move back to the logic tree and see if any of the possibilities can be eliminated. The information read from the part, using the basics of material failure, allows the investigator to determine what did and did not happen.

Figure 7: Updated Logic Tree Based on Evidence Collected

The hypothesis blocks also have a number in the bottom left-hand corner, which is what we call the ‘Confidence Factor’ of the verification method used. The ‘0’ indicates with 100% confidence that erosion, corrosion, and overload did NOT occur. The ‘5’ indicates with 100% confidence that fatigue DID occur (see Figure 7).

Since progression marks can only occur in fatigue failures, the lead investigator is 100% confident in the fatigue conclusion. Since there were no visual signs of erosion, corrosion, or overload, the lead investigator is 100% confident they did not occur.  
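The confidence factors can be recorded alongside each hypothesis and used to sort the tree into what is verified and what is ruled out. This is a minimal sketch: the function name is hypothetical, and treating values between 0 and 5 as "still open" is an assumption, since the article only defines the two end points of the scale.

```python
DID_NOT_OCCUR = 0  # 100% confident the hypothesis did not occur
DID_OCCUR = 5      # 100% confident the hypothesis did occur


def split_hypotheses(confidence: dict[str, int]) -> dict[str, list[str]]:
    """Sort hypotheses into verified, eliminated and still-open groups."""
    result: dict[str, list[str]] = {"verified": [], "eliminated": [], "open": []}
    for hypothesis, factor in confidence.items():
        if not DID_NOT_OCCUR <= factor <= DID_OCCUR:
            raise ValueError(f"confidence factor for {hypothesis!r} must be 0 to 5")
        if factor == DID_OCCUR:
            result["verified"].append(hypothesis)
        elif factor == DID_NOT_OCCUR:
            result["eliminated"].append(hypothesis)
        else:
            # Assumption: intermediate values mean more evidence is needed.
            result["open"].append(hypothesis)
    return result


# Confidence factors as described for the failed pump shaft.
shaft_level = split_hypotheses(
    {"Erosion": 0, "Corrosion": 0, "Fatigue": 5, "Overload": 0}
)
```

Here `shaft_level["verified"]` contains only `"Fatigue"`, so the next round of ‘How Could?’ questioning continues under that block.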

Figure 8: Continued ‘How Could?’ Questioning in the Logic Tree

The question now is “How could the pump shaft be fatigued?” (Figure 8). There are only two kinds of fatigue: thermal and mechanical. To verify thermal or low-cycle fatigue, the analyst could view the part using a powerful microscope and visually see the effects of heat fluctuations. In this case, thermal fatigue signs were not present, and it was ruled out.

Figure 9: Updated Logic Tree Based on Evidence Collected

The “How could?” question is again applied, “How could the pump shaft have mechanically fatigued?” There are four possible all-inclusive hypotheses: Misalignment, Unbalance, Resonance, and Looseness. 

Figure 10: Drilling Down Through the Physics of Failure

This level can be verified using vibration data taken before the failure occurred. Vibration trend data is extremely valuable for verifying types of vibration. A vibration signature history can quickly verify or rule out each of the four hypotheses.

The results verify there was misalignment present before the failure. This becomes a physical root cause. If the misalignment was not present, then the shaft would not have failed. The physical root was determined using basic understanding of what the fracture surface was telling us. 
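The drill-down from the Event to the physical root can be sketched as a small tree in which each block carries its confidence factor. The `Node` class and `verified_path` function are hypothetical names. The sketch assumes one verified branch per level, and the ‘0’ values for unbalance, resonance and looseness are assumed for illustration, since the article only states that misalignment was verified.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One block in the logic tree, with its confidence factor (0 to 5)."""
    label: str
    confidence: int = 5
    children: list["Node"] = field(default_factory=list)


def verified_path(node: Node) -> list[str]:
    """Follow the first verified (confidence 5) child at each level, top down."""
    path = [node.label]
    for child in node.children:
        if child.confidence == 5:
            return path + verified_path(child)
    return path


tree = Node("PCH-112 Unexpectedly Lost Function", children=[
    Node("Pump Shaft Failed", children=[
        Node("Erosion", 0),
        Node("Corrosion", 0),
        Node("Fatigue", 5, [
            Node("Thermal", 0),
            Node("Mechanical", 5, [
                Node("Misalignment", 5),
                Node("Unbalance", 0),   # assumed ruled out, for illustration
                Node("Resonance", 0),   # assumed ruled out, for illustration
                Node("Looseness", 0),   # assumed ruled out, for illustration
            ]),
        ]),
        Node("Overload", 0),
    ]),
])

physical_root_path = verified_path(tree)
```

The last entry of `physical_root_path` is the physical root, and the human and latent roots would hang beneath it as further levels of the same structure.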

If we were to continue, the next levels down would get into the human and latent root causes (which is another article that explores human reasoning and decision-making). In the interim, if you’d like to download this free Personnel Error Diagnostic Chart, it would help you complete your RCA and navigate the human and latent roots.

Figure 11. Personnel Error Diagnostic Chart

For more information on our evidence-based training on our PROACT RCA Methodology, Why Parts Fail and Human Error Reduction Techniques, please click for detailed workshop data sheets. Online instructor-led and recorded workshops are available as well, just give us a call if travel restrictions are in place.

Author’s Note: I want to thank my mentors, Neville Sachs and Edward Sullivan, for teaching me how important it is to pay attention to the fracture surface and question everything about that surface. There are many things we have seen, such as hammer marks, vise marks, chisel marks, welded nuts used as an additional set screw, and the like. The markings can tell a story about how much of a problem the equipment has been for maintenance.

About the Author: Mark Latino is currently President of Reliability Center, Inc. (RCI). Mark came to RCI after 19 years in corporate America, during which he acquired a wealth of reliability, maintenance, and manufacturing experience. He worked for Weyerhaeuser Corporation in a production role during the early stages of his career. He was an active part of Allied Chemical Corporation’s (now Honeywell) Reliability Strive for Excellence initiative, which was started in the 1970s to define, understand, document, and live the reliability culture, until he left in 1986. Mark spent 10 years with Philip Morris, primarily in a production capacity and later in a reliability engineering role. Mark is a graduate of Old Dominion University and holds a BS degree in Business Management that focused on production and operations.

About Robert (Bob) J. Latino

Robert Latino is currently a Principal at Prelical Solutions, LLC, along with his brother Ken Latino. Bob was a Founder and CEO of Reliability Center, Inc. (RCI), until it was acquired in 2019. RCI is a 50-year-old Reliability Consulting firm specializing in improving Equipment, Process and Human Reliability. Mr. Latino received his Bachelor’s degree in Business Administration and Management from Virginia Commonwealth University. For any questions, please contact Bob at blatino@prelical.com

© 2025 FMS Reliability