Articles

Find all articles across all article series listed in reverse chronological order.

HALT shouldn’t be “H.A.L.T.”

The first part of this post you likely already know. It’s the second part that may be helpful.

I love HALT testing and almost always include it in a new program. With a team new to the concept, there is always the hurdle of getting them to understand its value. It’s not intuitive to see value in destroying a product with stepped stresses. Often these stresses aren’t even a part of the product’s use case. Why vibrate a lab electronic device that spends its entire life on a bench? Seeing that the failure mode is a capacitor flying off the PCB at 50 G’s doesn’t reinforce the value of the activity without some explanation.

[Read more…]

by Fred Schenkelberg Leave a Comment

Building and Using Pareto Charts

You may have heard of the 80/20 rule. The idea is that 80% of the wealth is held by 20% of the population. Vilfredo Pareto, an Italian economist, made this observation, which became generalized as the

Pareto Principle: 80% of outcomes are due to 20% of causes

For field returns, for example, we may surmise that 80% of the failures are due to 20% of the components. This principle helps us to focus our work to reduce field failures by addressing the vital few causes that lead to the most, or most expensive, failures. [Read more…]

by Nancy Regan Leave a Comment

Is Reliability Centered Maintenance (RCM) just for airplanes?

Have you ever heard that Reliability Centered Maintenance (RCM) is just for airplanes? Don’t believe it! Here’s how RCM came to be one of the most effective (and universal) Reliability improvement efforts an organization can implement. [Read more…]

by Christopher Jackson Leave a Comment

How Company Visions Make or Break Reliability

Every organization needs to be able to explain ‘why’ it is here. In fact, an organization’s ‘why’ is . Check out Simon Sinek. One of his favourite phrases is that ‘people don’t buy what you do – they buy why you do it.’

People buy Apple products because they enjoy how they interface with devices that are part of a bigger and seamless ecosystem. All their competitors try to emulate this. Amazon is all about customer experience. [Read more…]

by Robert (Bob) J. Latino Leave a Comment

Viewing a Hospital as a System: A Reliability Perspective

Veteran professionals in the Reliability field view every business as a system. All systems have 1) inputs, 2) a transformation of those inputs in some form or fashion and 3) outputs. Just think about that for a minute; think about your schools, banks, manufacturing plants, small businesses…they are all systems. [Read more…]

by Fred Schenkelberg Leave a Comment

A Two-Step Approach to Get Better at What You Do

How is it that some people continue to get better at managing meetings, designing complex test plans, making presentations, or solving problems? How in general do people improve their performance over time at something? [Read more…]

by James Kovacevic Leave a Comment

Establishing the Frequency of Failure Finding Maintenance Inspections

Preventing The Consequences Of A Hidden Failure From Devastating Your Organization.

Ever wonder how some of the worst industrial disasters occur? It is usually the result of multiple failures. Failure of the primary system and failure of the protective systems. Ensuring the protective system(s) are not in a failed state should be of utmost importance to any organization. But how often should we test the protective systems to ensure the required availability?

Establishing the correct frequencies of the inspection/testing activities of these protective system(s) is critical to not only the success but also the safety and reputation of any organization. Too infrequently, and the organization is at risk of a major incident. Too frequently, and the organization is subjected to excess planned downtime, an increased probability of maintenance-induced failures and increased maintenance cost.

This article will continue the discussion on establishing the correct inspection frequency in a maintenance program. There are three different approaches to use, based on the type of maintenance being performed;

This article will focus on Failure Finding Maintenance.

What Are Protective Systems, Hidden Failures and Failure Finding Maintenance

A protective system or device is a system or device which is designed to protect against, mitigate or reduce the consequences of failure. These consequences may be safety, environmental or operational in nature. These devices or systems are designed to:

Alert – to potential problem conditions (e.g. alarm)
Relieve – prevent failure conditions causing greater problems (e.g. pressure relief valve)
Shutdown – stop a process to prevent greater problems from occurring (e.g. motor overload)
Mitigate – alleviate the consequences of a failure (e.g. fire suppression equipment)
Replace – continue to provide a function by an alternative means (e.g. back up pump)
Guard – prevent an accident from occurring (e.g. E-Stop)

Knowing what a protective device or system is, you may see that if a pressure relief valve became corroded and seized in the closed position, it would not be evident to the operators. This is a hidden failure. A hidden failure can be defined as a failure which may occur and not be evident to the operating crew under normal circumstances if it occurs on its own. Obviously, this could lead to significant consequences if the tank that the pressure relief valve is protecting is overpressurized. This is where failure finding maintenance comes in.

Failure-finding maintenance is a set of tasks designed to detect or predict failures in the protective systems or devices, to reduce the likelihood that the protective system and the protected equipment are in a failed state at the same time. So how do you determine how often the protective systems should be checked for failure? Establish the frequency using a formula.

Establishing Failure Finding Maintenance Frequencies Using Formulas

There is a single formula that takes all of the variables into consideration to establish the failure finding interval (FFI):

FFI = (2 x M_TIVE x M_TED) / M_MF

Where:

M_TIVE = Mean Time Between Failures of the protective device or system
M_TED = Mean Time Between Failures of the protected function
M_MF = Mean Time Between Multiple Failures

So if we use an example from RCM2, we can see how this works. The users of a duty pump and a standby pump specify the following for the system:

The probability of a multiple failure is to be less than 1 in 1000 in any one year (M_MF = 1000 years)
The rate of unanticipated failures of the duty pump is 1 in 10 years (M_TED = 10 years)
The rate of unanticipated failures of the standby pump is 1 in 8 years (M_TIVE = 8 years)

Therefore the correct failure finding interval would be:

FFI = (2 x 8 x 10) / 1000
FFI = 160 / 1000
FFI = 0.16 years
0.16 years x 12 months per year = 1.92 months, or roughly 2 months

This indicates that the standby pump must be checked every two months to verify it is fully operational. If this check is not performed, the likelihood of a multiple failure increases.
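The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in this article; the function name failure_finding_interval and its parameter names are assumptions made for the example, and all three inputs are assumed to be expressed in years.

```python
def failure_finding_interval(m_tive, m_ted, m_mf):
    """Return FFI = (2 x M_TIVE x M_TED) / M_MF.

    m_tive: MTBF of the protective device or system
    m_ted:  MTBF of the protected function
    m_mf:   required mean time between multiple failures
    All three must use the same time unit (years in this example).
    """
    if min(m_tive, m_ted, m_mf) <= 0:
        raise ValueError("all mean times must be positive")
    return (2 * m_tive * m_ted) / m_mf


# Pump example from the article: standby pump 8 years, duty pump 10 years,
# multiple failure no more often than 1 in 1000 years.
ffi_years = failure_finding_interval(m_tive=8, m_ted=10, m_mf=1000)
ffi_months = ffi_years * 12  # convert years to months

print(f"FFI = {ffi_years:.2f} years, about {ffi_months:.1f} months")
```

Changing m_mf shows how sensitive the interval is to the risk target: a tenfold tighter multiple-failure requirement gives a tenfold shorter interval between checks.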

Lastly, if the failure of the protective device can be caused by the failure finding task itself, there is another approach to be used, which is beyond the scope of this article.

Do you have a program in place to check your protective systems? If not, are you aware of the risk that your organization is exposed to? Take the time to determine your protective systems and establish your failure finding tasks.

Remember, to find success; you must first solve the problem, then achieve the implementation of the solution, and finally sustain winning results.

I’m James Kovacevic
Eruditio, LLC
Where Education Meets Application

by Adam Bahret Leave a Comment

Using Statistical Confidence to Protect your Family

A helpful analogy in communicating the concept of statistical reliability confidence is the “new airplane” example. Let’s say I am developing an entirely new technology for airplanes. The airplane has an engine that has never been used before for air travel: a fusion engine. I tell the world that this new airplane with a fusion engine will have a reliability of 99.99999999%, the highest any airplane has ever had. It’s not possible to fully demonstrate this reliability until every single unit of this airplane has been produced, used to full life, and the full fleet is retired. As long as one is still flying, it can add to or subtract from the reported reliability number. So, how do we make decisions at product launch regarding the design’s reliability? No products have yet been produced or used by customers, so how can we trust the design?

[Read more…]

by Ray Harkins Leave a Comment

The Value of Transferrable Skills

Several times during my career, as I’ve listened across the interview table to an eager and aspiring job candidate, I’ve realized this person has very few skills that will readily transfer into the position I’m offering. They spent years working at their previous company. But how much of that work will immediately apply to our open position? And conversely, how much work will be required to get them up to speed? And in that moment, I mentally moved them to the bottom of my “viable candidates” list. Why? Because that candidate has too few transferrable skills. [Read more…]

by Greg Hutchins Leave a Comment

In Risk Management, It’s the Destination, Not the Journey

Guest Post by Andrew Sheves (first posted on CERM ® RISK INSIGHTS – reposted here with permission)

A while back, I felt that pretty much everything was out of sync and I was highly disorganized. There was a growing list of undone things whether that was around the house, at work, with my family, or at the places where I volunteer.

It was definitely time for a reorganization. [Read more…]

by James Reyes-Picknell Leave a Comment

Rapid PM Program Deployment

In the first installment of this series we described the basics behind proactive maintenance and some of the considerations users need to make.

The second installment describes RCM – the “gold standard” for reliability program development and physical asset related risk management. This article is for those who are in “panic” or “fire fighting” mode. If you don’t have a proactive program, equipment runs until it breaks and you can’t seem to get ahead of it, then this one is for you. In a few cases you may have a PM program but you’re not getting the results you want. You could be overdoing overhauls, not doing enough predictive work, not following up on what you find, or the maintenance actions are simply inappropriate for the failures that occur in your circumstances. [Read more…]

by Nancy Regan Leave a Comment

To Achieve Your Equipment Reliability Goals, Begin at the Beginning…

Unless you live in Fantasyland, there’s no silver bullet for achieving your equipment Reliability goals. Start at the beginning, with Reliability Centered Maintenance and watch your Reliability program come to life. [Read more…]

by Fred Schenkelberg 1 Comment

Beware of the Type III Error

One type of error in statistical testing is working very hard to correctly answer the wrong question. This error occurs during the formation of the experiment.

Despite creating a perfect null and alternative hypothesis, sometimes we are investigating the wrong question. [Read more…]

by Robert (Bob) J. Latino Leave a Comment

Don’t Healthcare Workers Fatigue Like Anyone Else?

Fatigue regulations and guidelines have long been established in the aviation, transportation and nuclear industries (just to name a few). The science supporting the correlation between human fatigue and poor decision-making/poor responsiveness is solid.

So why aren’t such fatigue regulations required in healthcare as a matter of standard like in other industries? Is there something different about the physiology and/or anatomy of a healthcare worker versus a pilot, truck/bus driver or nuclear operator? [Read more…]

by Fred Schenkelberg Leave a Comment

One Does Not Simply Do Reliability

Some time ago when talking with someone I just met, the conversation turned to what we did for a living. I mentioned being a reliability engineer, and his response: “Oh, yes, we do reliability”. Curious, as I’m not sure that I ‘do reliability’, we then talked about what he meant.

The conversation revealed that they had a list of tasks that they accomplished for each product under development. They did tests and reviews of the results. A lot of testing. They did FMEA and HALT. He believed the engineers did derating or stress/strength calculation. He didn’t know about process stability with vendors or internal manufacturing lines.

They did stuff, which meant they did reliability.

[Read more…]