Accendo Reliability


by Fred Schenkelberg

Calculating System Availability

How to Properly Calculate System Availability

I recently received a request for my opinion on calculating system availability using the classic formula

$$ \displaystyle \large A=\frac{MTBF}{MTBF+MTTR}$$

The work is to create a set of goals for various suppliers and contractors to achieve. The calculation inputs come from vendor data sheets and other available information on MTBF and MTTR. The project is in the design phase, so there are no working systems available to measure actual availability.
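To make the classic formula concrete, here is a minimal sketch that computes availability from MTBF and MTTR and rearranges it to find the longest MTTR that still meets a target. The 10,000-hour MTBF is a hypothetical placeholder, not vendor data.

```python
def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability: A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def max_mttr(mtbf_h: float, target: float) -> float:
    """Longest MTTR that still meets the target.

    Rearranging A = MTBF / (MTBF + MTTR) gives MTTR = MTBF * (1 - A) / A.
    """
    return mtbf_h * (1.0 - target) / target


if __name__ == "__main__":
    mtbf = 10_000.0  # hypothetical MTBF in hours
    limit = max_mttr(mtbf, 0.998)
    print(f"Max MTTR for 99.8%: {limit:.1f} h")        # 20.0 h
    print(f"Check: A = {availability(mtbf, limit):.4f}")  # 0.9980
```

With that assumed MTBF, every failure, from detection to restored service, would have to be cleared in about 20 hours on average.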

How would you go about improving on this approach?

Context

From what I understand, the system is actually a collection of systems supporting something like a bus station within a transit system. There are sound, surveillance, ticketing, passenger information, and similar systems that all connect to a fleet management system. The desire is to have all of these systems operate at a specific station with at least 99.8% availability.
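If the station only counts as available when every one of these systems is up, they form a series system, and (assuming independent failures and repairs) station availability is the product of the subsystem availabilities. A minimal sketch of an equal allocation across five subsystems shows how much tighter each individual goal must be than the station goal:

```python
import math


def equal_allocation(system_target: float, n_subsystems: int) -> float:
    """Availability each of n identical, independent series subsystems
    needs so the system meets its target: A_i = A_sys ** (1 / n)."""
    return system_target ** (1.0 / n_subsystems)


def series_availability(avails: list[float]) -> float:
    """Series system: every block must be up, so availabilities multiply."""
    return math.prod(avails)


subsystems = ["sound", "surveillance", "ticketing", "passenger info", "fleet management"]
a_i = equal_allocation(0.998, len(subsystems))
print(f"Each subsystem needs A >= {a_i:.5f}")  # 0.99960
print(f"Station: {series_availability([a_i] * len(subsystems)):.4f}")  # 0.9980
```

An equal split is only a starting point; in practice the allocation would reflect how hard each subsystem is to make reliable or to repair.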

As mentioned, this project is just setting specifications at this point and thus cannot measure actual performance.

To make the system availability problem a little more difficult, let’s say this is a new transit agency and this is its first installation of bus stations. An existing agency could use the actual performance of similar systems, even ones built on different technology, as a baseline set of measurements. Other useful data from existing systems may include customer or staff complaints, repair data, spares data, or work order tickets. Adding a new station to an existing system has the benefit of a rich set of data to draw from when creating specifications.

Proposed Set of Calculations

As outlined above, availability is simply the ratio of uptime to total time. What matters is what is included in each term. Component vendors rarely know the expected operating profile or conditions, and thus may report generic or compiled MTBF and MTTR values.

For a sound system, the amplifier vendor may report an MTBF value based on a MIL-HDBK-217 parts count prediction using all default settings. Or they may have used reported field failures plus a few assumptions about operating time for all shipped units. The data sheet rarely specifies the source of the reported data.
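To see why such a number says little about a particular installation, here is a minimal sketch of the arithmetic behind a parts count style prediction: sum a constant failure rate for every part, then invert. The part list, generic rates and quality factors are hypothetical placeholders, not values taken from MIL-HDBK-217.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    quantity: int
    generic_rate_fpmh: float  # hypothetical generic failure rate, failures per 10^6 h
    quality_factor: float     # hypothetical quality multiplier (pi_Q)


def parts_count_mtbf(parts: list[Part]) -> float:
    """Sum constant failure rates over all parts and return MTBF in hours.

    Every part is treated as having a constant failure rate, which is
    exactly the assumption that hides early-life and wear-out behavior."""
    total_fpmh = sum(p.quantity * p.generic_rate_fpmh * p.quality_factor for p in parts)
    return 1e6 / total_fpmh


amplifier = [
    Part("capacitor", 40, 0.02, 1.0),
    Part("resistor", 120, 0.005, 1.0),
    Part("power transistor", 4, 0.5, 2.0),
    Part("connector", 6, 0.1, 1.0),
]
print(f"Predicted MTBF: {parts_count_mtbf(amplifier):,.0f} h")
```

These made-up rates sum to 6 failures per million hours, an MTBF of roughly 167,000 hours, regardless of whether the amplifier sits in an air-conditioned room or a hot, dusty shelter.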

Repair time, the MTTR value, is difficult for a vendor to report accurately. When vendors do report repair time, it usually assumes perfect and immediate diagnostics and a technician already present with the tools and spare parts. That ideal-world MTTR just doesn’t happen, yet it is really the only part of repair time the vendor can control and report.

Using vendor-reported MTTR will inflate the calculated availability, because the MTTR value will be artificially low. Vendors do not know the operator’s maintenance and spare parts policies and are thus unable to include them in the reported MTTR value.
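The size of the effect is easy to check. A minimal sketch, assuming a hypothetical vendor MTTR of 0.5 hours and hypothetical delays for detection, travel, diagnosis and spares that the vendor cannot know about; the sum is often called mean downtime:

```python
def availability(mtbf_h: float, downtime_h: float) -> float:
    """A = MTBF / (MTBF + mean time the system is down per failure)."""
    return mtbf_h / (mtbf_h + downtime_h)


mtbf = 10_000.0     # hypothetical MTBF, hours
vendor_mttr = 0.5   # hypothetical hands-on repair time from a data sheet

# Hypothetical delays that the vendor's MTTR does not include (hours)
delays = {
    "detection and reporting": 2.0,
    "technician travel": 3.0,
    "diagnosis": 1.5,
    "waiting for spares": 24.0,
}
mean_downtime = vendor_mttr + sum(delays.values())

print(f"Vendor view:    A = {availability(mtbf, vendor_mttr):.5f}")
print(f"Realistic view: A = {availability(mtbf, mean_downtime):.5f}")
```

With these assumed numbers the vendor view easily clears 99.8%, while the 31-hour realistic downtime pulls the same equipment below it.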

Why Use MTBF and MTTR at all?

My first issue is the use of MTBF, of course. We’re not interested in the mean but rather in the onset of failures and how failure patterns may change over time. Should we plan on replacing all station sound system amplifiers every five years?
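Two amplifiers with the same MTBF can behave very differently over the first five years. A minimal sketch comparing a constant failure rate model (Weibull shape 1, the exponential) with a wear-out model (Weibull shape 3), both given the same hypothetical 10-year MTBF:

```python
import math


def weibull_scale_from_mean(mean: float, shape: float) -> float:
    """Weibull mean = eta * Gamma(1 + 1/beta), so eta = mean / Gamma(1 + 1/beta)."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving to time t: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / scale) ** shape))


mtbf_years = 10.0  # hypothetical, identical for both models
for label, beta in [("constant rate (beta = 1)", 1.0), ("wear-out (beta = 3)", 3.0)]:
    eta = weibull_scale_from_mean(mtbf_years, beta)
    print(f"{label}: R(5 years) = {weibull_reliability(5.0, beta, eta):.3f}")
```

Roughly 61% versus 91% survive five years, despite identical MTBF. The wear-out units fail later but bunch up, which is the pattern that would justify a planned replacement interval.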

The question was also missing the duration of interest. Is the availability over an hour, a day of use, a week, a year, or 20 years? That matters, as availability over each duration will likely differ and entail a different set of risks and expectations, along with different cost of ownership considerations.
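One way to see the duration effect is simulation. A minimal Monte Carlo sketch, assuming a single repairable unit that starts new, has hypothetical Weibull times to failure and lognormal repair times, and is restored to as-good-as-new by each repair, estimates the fraction of time up over windows of one day, one year and 20 years:

```python
import math
import random


def interval_availability(window_h: float, shape: float, scale_h: float,
                          repair_mu: float, repair_sigma: float,
                          rng: random.Random) -> float:
    """Fraction of [0, window_h] spent up for one simulated history.

    Assumes a new unit at t = 0, Weibull(scale_h, shape) times to failure,
    lognormal repair durations, and as-good-as-new repairs."""
    t, up_time = 0.0, 0.0
    while t < window_h:
        ttf = rng.weibullvariate(scale_h, shape)  # time to next failure
        up_time += min(ttf, window_h - t)
        t += ttf
        if t >= window_h:
            break
        t += rng.lognormvariate(repair_mu, repair_sigma)  # downtime for this repair
    return up_time / window_h


rng = random.Random(42)  # fixed seed so runs are repeatable
# Hypothetical parameters: characteristic life 20,000 h, median repair time 8 h
params = dict(shape=1.5, scale_h=20_000.0, repair_mu=math.log(8.0), repair_sigma=0.8)

for label, window in [("1 day", 24.0), ("1 year", 8_760.0), ("20 years", 175_200.0)]:
    runs = sorted(interval_availability(window, rng=rng, **params) for _ in range(2_000))
    mean = sum(runs) / len(runs)
    p01 = runs[len(runs) // 100]  # roughly the 1st percentile, a bad outcome
    print(f"{label:>8}: mean A = {mean:.5f}, 1st percentile A = {p01:.5f}")
```

Short windows starting from new equipment look almost perfect on average, while the spread of outcomes and the long-run value tell a different story; a real model would replace these assumed distributions with the best available data.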

If the period is one year, buy highly reliable components that have a very low chance of failure over the first year of operation. Maintenance time then becomes a minor concern, as repairs will rarely, if ever, occur. We can safely defer any maintenance action until after the period of interest.

If the period is 20 years, the uncertainty around any reliability or repair time prediction grows. A longer period also increases the likelihood that significant wear-out failure mechanisms will appear, even in electronic systems.

Using only MTBF to represent the reliability of a component or system smooths out and ignores the changing nature of failure rates over time. Some components will have early-life failures and thus a decreasing failure rate for some period of time, while others will wear out, sometimes relatively quickly in specific environments and use conditions. It is rare to have a stable situation where failure rates are well modeled using only MTBF across the board.
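The Weibull hazard function captures all three patterns: h(t) = (β/η)(t/η)^(β−1) decreases when β < 1 (early-life failures), is constant when β = 1 (the only case a single MTBF fully describes), and increases when β > 1 (wear-out). A short sketch with a hypothetical characteristic life:

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


scale_h = 50_000.0  # hypothetical characteristic life (eta), hours
for beta, pattern in [(0.5, "early life"), (1.0, "constant"), (3.0, "wear-out")]:
    rates = [weibull_hazard(t, beta, scale_h) for t in (1_000.0, 10_000.0, 40_000.0)]
    print(f"beta = {beta} ({pattern}): " + ", ".join(f"{r:.2e}" for r in rates))
```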

What to Do Instead

Here’s my approach to create a set of reliability and maintenance specifications that should provide meaningful guidelines for the vendors and contractors building the bus station.

Model the system using a reliability block diagram (RBD). Create enough detail to capture repairable and replaceable elements of each system that makes up a bus station.
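A minimal sketch of evaluating a simple RBD, assuming independent blocks described by steady-state availabilities. The station layout (two ticket machines in parallel, where one working machine is enough) and every availability value are hypothetical:

```python
import math


def series(*avails: float) -> float:
    """All blocks must be up."""
    return math.prod(avails)


def parallel(*avails: float) -> float:
    """At least one block must be up (active redundancy)."""
    return 1.0 - math.prod(1.0 - a for a in avails)


# Hypothetical block availabilities
sound = 0.9995
surveillance = 0.9996
ticket_machine = 0.995
passenger_info = 0.9994
fleet_link = 0.9997

station = series(sound, surveillance, parallel(ticket_machine, ticket_machine),
                 passenger_info, fleet_link)
print(f"Station availability: {station:.5f}  (target 0.998)")
```

With these assumed values the station meets 99.8% only because the ticket machines are redundant; with a single machine at 0.995 the same station falls well short.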

Perform an FMEA or some other form of risk assessment to determine the system’s most important elements. Identify the elements critical to operation, as they show where to focus the search for the best available reliability and maintenance options. Identify single-point failure elements that would shut down or seriously hamper operations. Identify the information you need to gather, or whose accuracy you need to improve, to populate the system RBD.
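One widely used (and often criticized) FMEA scoring scheme is the risk priority number, RPN = severity × occurrence × detection, each rated 1 to 10. A minimal sketch with hypothetical failure modes and ratings that ranks them and flags the single-point failures separately, since a high-severity single-point failure deserves attention regardless of its RPN:

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    element: str
    mode: str
    severity: int          # 1-10
    occurrence: int        # 1-10
    detection: int         # 1-10, 10 = hardest to detect
    single_point: bool     # no redundancy: failure stops or seriously hampers the station

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and ratings
modes = [
    FailureMode("fleet management link", "network switch failure", 9, 4, 3, True),
    FailureMode("ticketing", "card reader jam", 6, 7, 2, False),
    FailureMode("sound", "amplifier overheating", 5, 3, 6, False),
    FailureMode("surveillance", "recorder drive wear-out", 7, 5, 6, False),
]

for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    flag = "  <- single-point failure" if m.single_point else ""
    print(f"RPN {m.rpn:>3}  {m.element}: {m.mode}{flag}")
```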

If vendors are the only source of data, ask for better data than MTBF and MTTR. Ask for the supporting data and distributions. Ask for the effects of operating time and environmental conditions. Ask for the expected failure mechanisms and any models relating use and stress to life distribution parameters. Ask for the Weibull, lognormal, or other appropriate distribution that describes how reliability or repair times change over time.
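If a vendor or another agency can share individual times to failure instead of a single MTBF, a distribution can be fitted. A minimal sketch of median rank regression for the Weibull, assuming complete (uncensored) data and Bernard's approximation for the median ranks; the failure times are hypothetical, and censored field data would call for maximum likelihood estimation instead:

```python
import math


def weibull_mrr(times: list[float]) -> tuple[float, float]:
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Uses Bernard's approximation F_i = (i - 0.3) / (n + 0.4) and a least
    squares fit of ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)."""
    t_sorted = sorted(times)
    n = len(t_sorted)
    xs = [math.log(t) for t in t_sorted]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                   # slope of the Weibull plot
    intercept = y_bar - beta * x_bar   # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


hours = [4_100.0, 6_800.0, 8_900.0, 10_500.0, 12_300.0, 14_200.0, 17_600.0]  # hypothetical
beta, eta = weibull_mrr(hours)
print(f"shape beta = {beta:.2f}, scale eta = {eta:,.0f} h")
```

A shape estimate well above 1 would point to wear-out and a planned replacement question; one below 1 would point to early-life failures and a burn-in or supplier quality question.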

If vendors do not have sufficient data, talk to other transit agencies with similar systems. Ask for their data, or help them analyze it so both of you benefit from the results. Check with professional organizations and search the literature for information on system performance and expected failure mechanisms.

Populate the system RBD with the best available data and look for areas that still need data or improved data (if the uncertainty and impact on results are large).
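To judge where better data is worth the effort, one simple approach is a one-at-a-time swing analysis: move each block across its plausible range while holding the others at nominal and see how far the station result moves. A minimal sketch for a series RBD of independent blocks; the low, nominal and high estimates are hypothetical:

```python
import math

# Hypothetical (low, nominal, high) availability estimates per block
blocks = {
    "sound": (0.9985, 0.9995, 0.9998),
    "surveillance": (0.9990, 0.9996, 0.9998),
    "ticketing": (0.9950, 0.99997, 0.99999),
    "passenger info": (0.9992, 0.9994, 0.9996),
    "fleet link": (0.9980, 0.9997, 0.9999),
}


def station(avails: dict[str, float]) -> float:
    """Series system of independent blocks."""
    return math.prod(avails.values())


nominal = {name: nom for name, (_, nom, _) in blocks.items()}

swings = {}
for name, (low, _, high) in blocks.items():
    # Change one block at a time, keep the rest at nominal
    swings[name] = station({**nominal, name: high}) - station({**nominal, name: low})

for name, swing in sorted(swings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>15}: swing {swing:.5f}")
print(f"Nominal station availability: {station(nominal):.5f}")
```

The blocks at the top of the list are where improved data would most change the answer.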

Estimate the cost or impact of downtime. System availability goals are high for systems that must provide value regularly, often with little or no interruption. Knowing the value of an hour of operation helps you balance the costs of building and maintaining the system against the value of the system.
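A quick way to tie an availability target to money is to convert it into expected downtime per year and multiply by the cost of an hour of downtime. A minimal sketch, assuming round-the-clock operation (8,760 hours per year) and a hypothetical downtime cost of 1,500 per hour:

```python
HOURS_PER_YEAR = 8_760.0  # assumes 24/7 operation; adjust to the station's schedule


def annual_downtime_h(availability: float, hours: float = HOURS_PER_YEAR) -> float:
    """Expected hours down per year at a given availability."""
    return (1.0 - availability) * hours


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    return annual_downtime_h(availability) * cost_per_hour


for a in (0.995, 0.998, 0.999):
    print(f"A = {a:.3f}: {annual_downtime_h(a):5.1f} h/yr, "
          f"cost {annual_downtime_cost(a, 1_500.0):,.0f} per year")
```

At 99.8% the station may be down about 17.5 hours a year; whether that is acceptable depends on when those hours occur and what they cost.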

For absolutely essential data, conduct accelerated life tests to create time-to-failure distribution estimates. Run experiments and get the data you need to create a meaningful system availability estimate.
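As one example, if the dominant failure mechanism is thermally driven, the Arrhenius model relates time at an elevated test temperature to time at the use temperature. A minimal sketch; the activation energy and temperatures are hypothetical and would have to come from knowledge of the actual failure mechanism:

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """Acceleration factor AF = exp(Ea / k * (1 / T_use - 1 / T_test)), T in kelvin."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use_k - 1.0 / t_test_k))


af = arrhenius_af(ea_ev=0.7, t_use_c=40.0, t_test_c=85.0)  # hypothetical values
test_hours = 1_000.0
print(f"AF = {af:.1f}; {test_hours:,.0f} test hours ~ {af * test_hours:,.0f} use hours")
```

The acceleration factor is only meaningful if the test stress activates the same failure mechanism seen in use.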

Finally, estimate the cost of ownership for each subsystem: just because it is repairable doesn’t mean your organization will have the funds to repair it. It may be worthwhile to spend more up front to avoid major recurring expenses for years to come.
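A minimal sketch of that comparison: purchase price plus the present value of expected repair spending over a 20-year life, for two hypothetical amplifier options. The prices, failure frequencies, repair cost and 4% discount rate are assumptions, and a constant annual failure frequency is used only for simplicity (a wear-out model would push costs into later years).

```python
def lifecycle_cost(purchase: float, failures_per_year: float, cost_per_repair: float,
                   years: int, discount_rate: float) -> float:
    """Purchase price plus discounted expected repair spend over the service life."""
    annual_repairs = failures_per_year * cost_per_repair
    pv_repairs = sum(annual_repairs / (1.0 + discount_rate) ** y for y in range(1, years + 1))
    return purchase + pv_repairs


# Hypothetical options
budget = lifecycle_cost(purchase=2_000.0, failures_per_year=0.5, cost_per_repair=1_200.0,
                        years=20, discount_rate=0.04)
premium = lifecycle_cost(purchase=5_000.0, failures_per_year=0.1, cost_per_repair=1_200.0,
                         years=20, discount_rate=0.04)
print(f"Budget unit:  {budget:,.0f}")
print(f"Premium unit: {premium:,.0f}")
```

Under these assumptions the unit that costs more than twice as much to buy is the cheaper one to own.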

There are an infinite number of ways to achieve a given system availability target. Constraining the options with cost of ownership and ease of maintenance may help you find the right solution.
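To make that concrete, a minimal brute-force sketch: each subsystem has a few hypothetical configurations, each with an availability and a 20-year cost of ownership, and the search keeps the cheapest combination that meets the station target. It assumes independent subsystems in series; a real study would evaluate the full RBD.

```python
import itertools
import math

# Hypothetical options per subsystem: (label, availability, 20-year cost of ownership)
options = {
    "sound": [("single amp", 0.9990, 10_000), ("redundant amps", 0.999999, 18_000)],
    "ticketing": [("one machine", 0.9950, 30_000), ("two machines", 0.999975, 55_000)],
    "passenger info": [("standard", 0.9990, 12_000), ("hardened", 0.9997, 21_000)],
}


def cheapest_meeting(target: float):
    """Return (choices, cost, availability) for the cheapest combination
    meeting the target, or None if no combination does."""
    names = list(options)
    best = None
    for combo in itertools.product(*(options[n] for n in names)):
        avail = math.prod(choice[1] for choice in combo)
        cost = sum(choice[2] for choice in combo)
        if avail >= target and (best is None or cost < best[1]):
            best = ({n: c[0] for n, c in zip(names, combo)}, cost, avail)
    return best


result = cheapest_meeting(0.998)
if result:
    choices, cost, avail = result
    print(f"{choices}  cost {cost:,}  A = {avail:.5f}")
else:
    print("No combination meets the target")
```

Brute force is fine for a handful of options; larger problems would need a smarter search, but the point is the same: cost of ownership narrows the field.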

What would you do?

How would you approach this kind of problem, or how have you approached it? What would you recommend? Leave your suggestions and comments below.


About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.

© 2025 FMS Reliability