Accendo Reliability

Your Reliability Engineering Professional Development Site

by Andrew Kelleher 5 Comments

Four Reasons to Rethink your Reliability Improvement Journey

The term “reliability improvement journey” is well-established in the chemical process industry. The decade-long, tortuous journey of one company is shown in Figure 1 in terms of operational availability (i.e., production) and relative maintenance cost.

[Chart: one organization's reliability improvement journey, plotting operational availability and relative maintenance costs over 16 years]
Figure 1 : The reliability improvement journey, adapted from [1].

The length of a company’s reliability journey reflects the maturity of the “reliability culture”. Here, the term “reliability culture” may be described technically as “the extent to which each decision vector aligns with the company’s target vector”.

It logically follows that the reliability journey may be significantly shortened simply by improving the quality of each decision made by the reliability organization. But how?

Four reasons for a long and arduous reliability journey are presented below. It is intended that these reasons prompt you to critically rethink how you approach the reliability engineering problem in your plant.

Reason 1 – An important role in your reliability organization is vacant.

The production system is, as the term implies, a “system”: it comprises assets, process units, operational logic, storage tanks, supply chains, failure mechanisms, maintenance processes, etc.

This calls for a “systems engineering approach” to the reliability problem, which in turn requires the appointment of a “Systems Reliability Engineer” (SRE). This is an engineering discipline of its own, which – to my knowledge – is not explicitly taught for application in the unique context of a chemical production plant.

As depicted schematically in Figure 2, the role of the SRE is to align and direct the reliability improvement efforts of the reliability organization. That is, to ensure that they are working on the right topics and to ensure that each decision vector aligns with the company’s target vector.

listing of members of the reliability organization including systems reliability engineer, plant manager, process engineer, production expert, logistics expert, reliability engineer, corrosion engineer, inspection engineer, and maintenance engineer.
Figure 2 : An (incomplete) example of a reliability organization.

Precisely how the SRE accomplishes the abovementioned tasks of direction and alignment is largely outlined in Reasons 2 to 4, described below.

Reason 2 – Your targets are poorly defined.

The performance of your production system is described in multiple dimensions (e.g., production volume and maintenance cost) and varies from year to year according to a probabilistic function that you have probably not characterized. Further, the achieved performance in a given year may be largely determined by events that are outside of your control. It is likely that the reality of this situation is not adequately accounted for in your target-setting process or in your reliability improvement plan.

The adoption of a systems reliability engineering approach requires that the current stochastic performance of the production system be estimated as a basis for target-setting; refer to Figure 3. The “target vector” is defined as the gap between the current performance and the target performance and is the basis for aligning the efforts of the reliability organization.

Plot of poor targets using PDFs of availability distributions.
Figure 3 : Visualization of the current and target system performance in terms of a Probability Density Function (PDF).

Figure 3 demonstrates that targets in stochastic systems are best specified in terms of two parameters, i.e.: FAIL and TARGET criteria. This practice enables the required performance improvement to be visualized and quantified.
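
The two-parameter target specification can be illustrated with a minimal Python sketch. It is not the author's method or tool: the function names, the clipped-normal placeholder distribution and all numbers (mean, spread, FAIL at 0.85, TARGET at 0.95) are assumptions chosen only for illustration. Given a set of simulated or historical annual availabilities, it estimates the probability of falling below the FAIL criterion and the probability of reaching the TARGET criterion.

```python
import random
import statistics


def simulate_annual_availability(n_years, mean, spread, seed=0):
    """Draw hypothetical annual availabilities as fractions in [0, 1].

    A clipped normal distribution is a placeholder only; in practice the
    samples would come from a plant simulation or from historical data.
    """
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean, spread))) for _ in range(n_years)]


def target_report(samples, fail, target):
    """Summarize stochastic performance against FAIL and TARGET criteria."""
    n = len(samples)
    return {
        "mean": statistics.fmean(samples),
        # Share of years in which availability falls below the FAIL criterion.
        "p_fail": sum(s < fail for s in samples) / n,
        # Share of years in which availability reaches the TARGET criterion.
        "p_target": sum(s >= target for s in samples) / n,
    }


# Hypothetical current performance: mean 90 % availability, 3 % spread.
current = simulate_annual_availability(10_000, mean=0.90, spread=0.03)
report = target_report(current, fail=0.85, target=0.95)
print(report)
```

Running the same report on samples that represent a proposed future state would show how far an improvement shifts both probabilities, which is one way of quantifying the required performance improvement.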

Reason 3 – Your strategy to reduce “waste” is incomplete.

A typical reliability improvement plan is comprised almost solely of methods that focus on reducing “waste”, that is, hazards that may lead to a production loss. These methods can be characterized as proactive or reactive in nature, as shown in Table 1.

Table 1 : Examples of proactive and reactive reliability improvement methods.

Proactive                                   | Reactive
Failure Modes and Effects Analysis (FMEA)   | Root Cause Analysis
Reliability-Centered Maintenance (RCM)      | Defect Elimination
Risk-Based Inspection                       | Bad Actor Program

In the absence of an overarching systems reliability approach, a reliability improvement plan that focuses solely on reducing “waste” is likely to result in a long, arduous reliability journey, for the following reasons:

  • The proactive methods tend to be largely theoretical exercises with no strong coupling to the system performance vector(s). It is therefore practically not possible to reach an “optimum” solution. That is, it is not possible to align the decision vector with the target vector. 
  • The reactive methods target a sub-set of the possible future system hazards which, once alleviated, will be quickly replaced by newly recognized hazards. This is a characteristic of the complex stochastic production system. Hence, the extent to which the anticipated gains will be achieved in practice may be highly uncertain. Further, experience has shown that significant knowledge and experience may be required to develop robust and economically viable solutions. The extent to which organizations have access to the required resources (technical, financial and time) is highly variable.

A systems reliability engineering approach will additionally consider the application of capacity “growth” strategies, such as debottlenecking and expansion projects. These types of improvement measures are usually able to be tightly coupled to system performance targets and are certainly able to be planned with a higher degree of confidence.

The task of the SRE is to ensure that company resources are wisely invested. This may be done by quantifying the impact of each improvement measure in terms of stochastic system performance. 
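
One possible way to express the alignment of a decision vector with the target vector is the cosine of the angle between them. The article does not prescribe this measure; it is an assumption made here for illustration, as are the function names, the three measures and all numbers. Each axis is scaled by the required gap so that availability and cost become comparable, and a positive value on either axis means "better".

```python
import math


def alignment(decision, target):
    """Cosine of the angle between two vectors.

    1 means the same direction, 0 orthogonal, -1 the opposite direction.
    """
    dot = sum(d * t for d, t in zip(decision, target))
    norm = math.hypot(*decision) * math.hypot(*target)
    if norm == 0.0:
        raise ValueError("both vectors must be non-zero")
    return dot / norm


def normalise(delta, gap):
    """Express a change on each axis as a fraction of the required gap."""
    return tuple(d / g for d, g in zip(delta, gap))


# Hypothetical target vector: +4 points of availability and a
# 10-point reduction of the maintenance cost index.
gap = (4.0, 10.0)

# Hypothetical expected effect of three candidate measures, e.g. the mean
# change estimated by simulation: (availability gain, cost reduction).
measures = {
    "Measure A": (2.0, 5.0),    # improves both axes in proportion
    "Measure B": (3.0, -4.0),   # gains availability but raises cost
    "Measure C": (0.0, 6.0),    # reduces cost only
}

target = normalise(gap, gap)  # (1.0, 1.0) after scaling
scores = {
    name: alignment(normalise(delta, gap), target)
    for name, delta in measures.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: alignment = {scores[name]:+.2f}")
```

The cosine only captures direction. A full comparison would also consider how much of the gap each measure closes and the uncertainty of the estimate.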

Reason 4 – You are using the wrong tool for the job.

Whilst most reliability literature is concerned with “product” reliability engineering, the described methods (e.g., Weibull analysis and FMEA) find relatively little application in a process plant environment. At first glance, the reason for this would seem to be the ratio of (many) Assets to (few) Engineers. However, the real reason is much more interesting. It is because the traditional methods were developed for application in “simple” and “complicated” systems, whereas a process plant is a “complex” system.

The response to this situation has been to trivialize the complex system behavior, for example in the form of a risk matrix. This approach, however, prohibits the realization of optimal outcomes. An alternative response would be to apply methods suited for application in complex systems. For example, simulation is absolutely necessary to make optimal decisions in complex systems.

The results of a high-level simulation of a process plant, representing the current system performance, are presented in Figure 4.

Figure 4 : Left: High-level Block Flow Diagram (BFD) of the production system; Right: Estimated stochastic performance of the production system in relation to the performance targets.
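
To give a feel for what such a simulation involves, the following is a deliberately simplified Monte Carlo sketch in Python. It is not the model behind Figure 4: the three-unit series layout, the exponential up and repair times, the independence of the units and every number are assumptions made for illustration only. The plant is counted as producing only while all units are up, and each simulated year yields one availability value.

```python
import random

HOURS_PER_YEAR = 8760.0


def downtime_intervals(rng, mean_uptime, mean_repair, horizon):
    """Alternate exponential up and repair times; return the down intervals."""
    intervals, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_uptime)   # time until next failure
        if t >= horizon:
            return intervals
        end = min(t + rng.expovariate(1.0 / mean_repair), horizon)
        intervals.append((t, end))
        t = end


def union_length(intervals):
    """Total time covered by possibly overlapping intervals."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def simulate_year(rng, units):
    """Availability of a series system for one year (units fail independently)."""
    down = []
    for mean_uptime, mean_repair in units:
        down.extend(downtime_intervals(rng, mean_uptime, mean_repair, HOURS_PER_YEAR))
    return 1.0 - union_length(down) / HOURS_PER_YEAR


def simulate(units, n_years=2000, seed=1):
    """Return sorted annual availabilities over many simulated years."""
    rng = random.Random(seed)
    return sorted(simulate_year(rng, units) for _ in range(n_years))


# Hypothetical units: (mean uptime, mean repair time) in hours.
units = [(2000.0, 48.0), (4000.0, 72.0), (1000.0, 12.0)]
samples = simulate(units)
p10, p50, p90 = (samples[int(q * len(samples))] for q in (0.10, 0.50, 0.90))
print(f"P10 = {p10:.3f}, P50 = {p50:.3f}, P90 = {p90:.3f}")
```

A real process plant model would add the features that make the system complex, such as intermediate storage, operational logic and dependencies between units, which this sketch omits.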

The developed model also provides a basis for evaluating the merits of proposed measures for improving production system performance. You decide where you are headed: promotion, demotion or mediocrity!

Summary

A technical, systems engineering approach to the process plant reliability engineering problem is neither well-described in the literature, nor well-supported by appropriate tools in practice.

RAMS Mentat GmbH has developed an innovative technical and systems engineering approach – and supporting tool – that enables the reliability and safety performance of an entire production system to be optimized with consideration of capital investment, operational and maintenance cost constraints.

One more good reason to rethink how you approach your reliability improvement journey!

References

[1] “Reliability – How Industry Leaders Take Advantage of this Often-Overlooked Improvement Opportunity,” Solomon Associates, 31 May 2021. [Online]. Available: https://www.solomoninsight.com/blog/reliability-how-industry-leaders-take-advantage-of-this-often-overlooked-improvement-opportunity.

About Andrew Kelleher

Andrew Kelleher is a Materials and Systems Reliability Engineer with many years of industrial experience (1999 to 2021) in diverse fields of safety and reliability engineering at renowned companies in Australia, England, and Germany, including: The Welding Institute, ExxonMobil, QinetiQ Aerostructures, Bayer and Covestro.

Comments

  1. James Reyes-Picknell says

    December 13, 2021 at 4:46 AM

    I think there’s more than just a technical alignment that’s needed. The article does a great job explaining the technical challenge, but isn’t it a bigger hurdle getting senior management aligned? Changes involved in solving this complex problem require multi-disciplinary approaches (as shown) and those need the support of the multi-disciplinary managers who typically have competing priorities. At the level of engineer and manager where we often work, we are dealing with those, including middle-level managers, who can say “no” to what we are doing or to their participation. They don’t have the authority to say “yes” and have little or no motivation to stick their necks out. We rarely seem to deal with the senior levels who can actually say, “yes”, and sponsor it to happen.

    • Andrew Kelleher says

      December 13, 2021 at 7:46 AM

      That is a very nice comment, which I agree reflects the current reality. In the absence of a tool capable of quantifying the complex system behaviour, decisions are often made using “intuition”. This almost inevitably leads to frustration since the decision basis is unclear. The right approach (and tool), however, can make the decision-making process much more transparent. Every “decision” can be formulated as a choice between two options, i.e.: Option A (Status Quo) or Option B (Alternative Future). It follows logically that a decision is always made. The expected outcome associated with each option can be quantified (via simulation, with agreed data from the cross-functional experts) in terms of multiple and/or competing performance criteria. In the case shown, the competing criteria are “Lost Production Cost” and “Maintenance Cost”. There is, however, no reason why other performance criteria cannot be simulated. Competing criteria, however, are no reason for not making a decision. The beauty is that the expected outcomes of Option A and Option B are quantified (based on data and without emotion) and documented; the best possible basis for a decision. Hence, the decision-maker does not need to rely so heavily on his “intuition” but rather on the combined knowledge of his cross-functional team, quantified in terms of the estimated impact on system performance. Don’t be afraid to challenge me with a concrete example!

  2. John Bessman says

    December 13, 2021 at 9:49 AM

    Hi Andrew,

    Terrific article and I found myself nodding in agreement many times. One question I had was how you describe the difference(s) between a “complicated” and a “complex” system. I think I intuitively understand it, but it’s a concept I’ve struggled to “explain upwards” to our decision makers.

    Thanks!

    • Andrew Kelleher says

      December 13, 2021 at 10:31 AM

      Hello John,

      It is a good question, which I have also researched. The behaviour of a simple system (e.g. a car key) is easily knowable and reproducible. The behaviour of a complicated system (e.g. a car) can be “known” with many structured expert steps, e.g. via defining and characterising the hierarchical structure of components. Solutions that work with complicated systems, however, do not work well with complex systems (e.g. car traffic), which involve too many unknowns and too many interrelation factors to be reduced to rules and processes.

      Complicated systems are, for example: Homogeneous, Linear, Deterministic, Static, Independent and Without Feedback. In contrast, complex systems are: Heterogeneous, Non-Linear, Stochastic, Dynamic, Interdependent and With Feedback.

      The “Stacey” matrix (https://drawingchange.com/project/simple-complicated-and-complex-decision-making-new-visual/) provides a nice visual depiction of types of decision-making strategies in different system types. Simulation is not listed, though it is suitable for complex systems. Thinking of the current supply chain problems, I think now is definitely the time to be “focusing on stability”; in my opinion, this is pretty difficult to do well without simulation.

      Best regards, Andrew Kelleher.

  3. Christos Christoglou says

    December 22, 2021 at 4:14 AM

    I don’t know what I liked more, the article itself or your answer to John Bessmann and explanation about complicated and complex systems.
    Thanks for both!

