Accendo Reliability

Your Reliability Engineering Professional Development Site

by Carl S. Carlson

Understanding Software FMEA

“My software never has bugs. It just develops random features.” Anonymous

More and more mechanical and electrical systems include software integration. The FMEA methodology applies very well to software as well as hardware. It is possible to include software functionality in the System FMEA as part of the functional descriptions. However, for complex software functionality such as embedded control systems, it may be useful to perform a separate software FMEA.

What is a Software FMEA?

Software FMEA is a type of Design FMEA that analyzes the software elements, focusing on potential software-related deficiencies, with emphasis on improving the software design and ensuring product operation is safe and reliable during useful life.

Pete Goddard, in his paper “Software FMEA Techniques” (RAMS 2000) wrote, “Software FMEA assesses the ability of the system design, as expressed through its software design, to react in a predictable manner to ensure system safety.”

Software FMEA is similar to System or Design FMEA, with the exception that Software FMEA focuses primarily on software functions.

What are objectives for Software FMEA?

Objectives for software FMEA include:

• Identifying missing software requirements
• Analyzing a system’s behavior as it responds to a request that originates from outside of that system
• Identifying (and mitigating) single-point failures that can result in catastrophic failures
• Identifying features that need fault-handling strategies
• Identifying software response to hardware anomalies

What is the difference between Software FMEA and hardware or electrical FMEAs?

All types of FMEAs are grounded in similar principles and fundamentals. The primary difference with software FMEA is the focus of the analysis.

For example, in the case of a software module that warns of a low level of windshield washer fluid, the Item in the Software FMEA can be “Low washer fluid level warning software” and the Function might be “Communicate low fluid level to instrument panel.” A single-line excerpt from this software FMEA example is in the updated SAE J1739 FMEA Standard, due to be published in January 2021.
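As a rough illustration of how the Item and Function from this example might sit in a worksheet row, here is a minimal Python sketch. The class name `SoftwareFmeaRow`, its field names, and the `is_complete` helper are assumptions made for this illustration; they are not taken from SAE J1739 or from any FMEA tool. The analysis columns are deliberately left empty, because filling them in is the team's job.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareFmeaRow:
    """One line of a Software FMEA worksheet (illustrative fields only)."""
    item: str
    function: str
    failure_mode: str = ""   # left blank until the team fills it in
    effect: str = ""
    cause: str = ""
    recommended_actions: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True once failure mode, effect, and cause have all been entered."""
        return all([self.failure_mode, self.effect, self.cause])


row = SoftwareFmeaRow(
    item="Low washer fluid level warning software",
    function="Communicate low fluid level to instrument panel",
)
print(row.is_complete())  # False: the analysis columns are still empty
```

Keeping Item and Function as required fields reflects the point above: the focus of the analysis is set before any failure modes are discussed.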

Other differences between Hardware and Software FMEAs include:

1. Software failure modes are analyzed from modes of operation that are unique to software
2. Software has a unique set of failure mode and cause categories
3. Software FMEA analyzes how software reacts to hardware failures
4. There can be minor differences between the Design FMEA and Software FMEA templates

Software Modes of Operation

Understanding the unique modes of operation for software helps ensure nothing important is missed. [1]

1. Functional (new requirements or significant revision)
2. Interface (complex hardware or software interfaces)
3. Detailed (high-risk items)
4. Maintenance (older legacy system prone to errors)
5. Usability (when user misuse can impact system reliability)
6. Serviceability (mass distribution or difficult installation location)
7. Vulnerability (risk from hacking or abuse)
8. Production (system schedule is disrupted by software production process)

[1] Recognition is due to Ann Marie Neufelder, whose book “Effective Application of Software Failure Modes Effects Analysis – 2nd Edition” is an excellent resource for anyone performing Software FMEAs. (Copyright © 2017 by Softrel, LLC., published by Quanterion Solutions, Inc., Utica, New York)

Software Common Cause Categories

What are some of the unique software cause categories? [1]

1. Missing error detection (missing in specifications, design and code)
2. Specifications missing important details (review process can miss what is NOT in code, design or requirements)
3. Faulty state transitions (leads to dead states, inadvertent / prohibited state transitions, etc.)
4. Faulty logic (when software fails to consider all possibilities from a logic perspective)
5. One-size-fits-all error recovery (detecting and recovering from errors should be specific to circumstances)
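To make the "faulty state transitions" category more concrete, the sketch below shows one way a team might check a transition table for prohibited transitions and dead states. The state names and the `ALLOWED` table are hypothetical, invented for this illustration, and the checks are a simplified stand-in for a real design review.

```python
# Hypothetical states and allowed transitions for a low-fluid warning module.
ALLOWED = {
    "INIT": {"MONITORING"},
    "MONITORING": {"WARNING", "FAULT"},
    "WARNING": {"MONITORING", "FAULT"},
    "FAULT": {"INIT"},
}


def is_allowed(current, target, table=ALLOWED):
    """Return True only if the table explicitly permits current -> target."""
    return target in table.get(current, set())


def dead_states(table):
    """Return states with no outgoing transition (the machine would get stuck)."""
    all_states = set(table) | {t for targets in table.values() for t in targets}
    return {s for s in all_states if not table.get(s)}


print(is_allowed("INIT", "WARNING"))  # False: not listed, so it is prohibited
print(dead_states(ALLOWED))           # set(): every state has a way out

# A table with a state that can be entered but never left.
broken = {"INIT": {"MONITORING"}, "MONITORING": {"LATCHED"}}
print(dead_states(broken))            # {'LATCHED'}
```

Writing the allowed transitions down as data, rather than scattering them through conditional logic, makes both kinds of defect easier to spot during the FMEA.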

Level of detail

Software FMEA can be applied at the system functional level, the detailed design (logic level) or at the code level.

Similar to System FMEAs, software system FMEAs should be performed early in the design process, as soon as the software design team has determined the initial software architecture and transferred the functional requirements to the software design. Software FMEAs at the detail level are typically done later in the software design process, when a detailed design description and preliminary code exist.

What precedence-guidelines can be used to address software problems?

When identifying Recommended Actions, Software FMEA teams can use the following precedence suggestions in order to ensure the software is fail-safe and accomplishes its functions, with heightened focus on potential hazardous outcomes. Special attention should be paid to identify any need for new or modified software requirements. [2]

a. Design out the failure mode
b. Use redundancy to achieve fault tolerance
c. Go into fail-safe mode (for example, the ability to “limp home”)
d. Implement early prognostic warning
e. Implement training to reduce risk for human error

[2] Reference article titled “Software FMEA: A Missing Link in Design for Robustness,” by Dev Raheja, copyright 2003 by SAE International
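The precedence order (a) to (e) can be used to sort candidate Recommended Actions so that the preferred strategies are discussed first. The following is a minimal sketch under stated assumptions: the `Precedence` enum, the `rank_actions` helper, and the candidate actions are all hypothetical and do not come from the referenced article.

```python
from enum import IntEnum


class Precedence(IntEnum):
    """Lower value = preferred, following the order (a) to (e) in the list."""
    DESIGN_OUT = 1
    REDUNDANCY = 2
    FAIL_SAFE_MODE = 3
    EARLY_WARNING = 4
    TRAINING = 5


def rank_actions(candidates):
    """Sort (description, Precedence) pairs so preferred strategies come first."""
    return sorted(candidates, key=lambda pair: pair[1])


# Hypothetical candidate actions for one failure mode.
candidates = [
    ("Train operators on the reset procedure", Precedence.TRAINING),
    ("Enter limp-home mode on sensor fault", Precedence.FAIL_SAFE_MODE),
    ("Remove the unused state from the design", Precedence.DESIGN_OUT),
]
for description, precedence in rank_actions(candidates):
    print(precedence.name, "-", description)
```

The ranking only orders the discussion. Whether a higher-precedence option is feasible remains an engineering judgment for the team.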

What is “fail safe” and how is it used in software?

The software should always go to the desired state no matter what causes the software to malfunction. If a desired state is not identified in the specification, the software should always go into a fail-safe state. A fail-safe state is one in which, in the event of failure, the system responds in a way that causes minimal harm to other devices or danger to personnel.

Software programmers should consider fail-safe strategies to help ensure systems are safe and robust.
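One possible way to express this rule in code is sketched below: look up the desired state for a fault in the specification, and fall back to the fail-safe state when the specification is silent or when something unexpected goes wrong. The state names, fault names, and the `SPECIFIED_RESPONSES` table are assumptions made up for this illustration.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"     # for example, a "limp home" mode
    FAIL_SAFE = "fail_safe"


# Hypothetical excerpt of a specification: fault -> desired state.
SPECIFIED_RESPONSES = {
    "sensor_out_of_range": State.DEGRADED,
    "checksum_mismatch": State.FAIL_SAFE,
}


def next_state(fault):
    """Return the specified state for a fault, or FAIL_SAFE if none is specified."""
    return SPECIFIED_RESPONSES.get(fault, State.FAIL_SAFE)


def run_step(step, current):
    """Run one control step; any unhandled exception forces the fail-safe state."""
    try:
        step()
        return current
    except Exception:
        return State.FAIL_SAFE


print(next_state("sensor_out_of_range"))        # State.DEGRADED
print(next_state("fault_nobody_anticipated"))   # State.FAIL_SAFE
print(run_step(lambda: 1 / 0, State.NORMAL))    # State.FAIL_SAFE
```

The default argument to `get` is the important detail: a fault that nobody anticipated still leads to a defined, safe outcome.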

What standard is used?

There is no universally agreed-upon standard for performing software FMEAs, although it is the subject of various standards committees. Practitioners who will be performing software FMEA projects are encouraged to read the software standards and articles that apply to their projects. In addition, it is essential for any software FMEA team to include a subject matter expert in the specific software systems that are being analyzed.

As mentioned above, Ann Marie Neufelder’s book “Effective Application of Software Failure Modes Effects Analysis” is an excellent resource for anyone performing Software FMEAs.

Tips

Understanding potential software trigger events can improve the effectiveness of Software FMEA and ensure important issues are addressed.

1. Consider potentially critical hardware failures or user misuse when performing Software FMEA. Software must be robust to potential hardware failures or user misuse, and must default to a safe condition.
2. Potentially critical hardware failures or user misuse can be derived from System FMEA, lower-level FMEAs, Hazard Analysis or Fault Tree Analysis.
3. Lack of robustness or lack of a default to a safe condition can be input to software FMEA cause descriptions. Note that Software FMEA causes should be expressed as potential software deficiencies.
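As an illustration of software that is robust to a hardware failure and defaults to a safe condition, the sketch below validates a sensor reading before using it. The limits, the threshold, and the choice of "warning on" as the safe condition are assumptions for this example only; in a real project they would come from the requirements and the System FMEA or Hazard Analysis.

```python
import math

# Hypothetical plausibility limits for a fluid level sensor, in percent.
LEVEL_MIN, LEVEL_MAX = 0.0, 100.0
LOW_LEVEL_THRESHOLD = 15.0


def warning_required(raw_level):
    """Decide whether to show the low-fluid warning.

    Assumption: showing the warning is the safe condition, so any missing or
    implausible reading (a possible hardware failure) turns the warning on.
    """
    if raw_level is None or isinstance(raw_level, bool):
        return True
    if not isinstance(raw_level, (int, float)) or math.isnan(raw_level):
        return True
    if not LEVEL_MIN <= raw_level <= LEVEL_MAX:
        return True
    return raw_level < LOW_LEVEL_THRESHOLD


print(warning_required(50.0))   # False: plausible reading above the threshold
print(warning_required(10.0))   # True: plausible reading below the threshold
print(warning_required(None))   # True: no reading, default to the safe condition
print(warning_required(250.0))  # True: implausible reading, default to safe
```

A version of this function without the plausibility checks would be an example of the "missing error detection" cause category, which is the kind of deficiency a Software FMEA cause description should name.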

Next Article

Hazard analysis is the process of examining a system throughout its life cycle to identify inherent safety-related risks. The next article in the Inside FMEA series discusses the application of Hazard Analysis.

About Carl S. Carlson

Carl S. Carlson is a consultant and instructor in the areas of FMEA, reliability program planning and other reliability engineering disciplines, supporting over one hundred clients from a wide cross-section of industries. He has 35 years of experience in reliability testing, engineering, and management positions, including senior consultant with ReliaSoft Corporation, and senior manager for the Advanced Reliability Group at General Motors.

Comments

  1. Angel Montañez says

    April 29, 2022 at 8:45 PM

    Good evening
    Will you have information from an RFMEA or Reverse FMEA?
    Thank you
    Angel

    • Carl Carlson says

      April 30, 2022 at 8:29 AM

      Hello Angel,

      I am aware of “Reverse FMEA,” but from what I have read and understand, it is not a method that I advocate.

      One definition is “Reverse FMEA (R FMEA) is a structured process of continuous improvement that is aimed at ensuring the permanent updating and progress of an FMEA (Failure Mode and Effect Analysis) study. This risk assessment method is based on the actual situation and not predictive reliability.”

      My personal view is that if Design FMEA and Process FMEA are done well, there is no need to use this variant.

      Carl

  2. Paul Ortais says

    May 23, 2022 at 4:25 AM

    Hello Carl,
    As a real-time control system architect, most projects I contributed were badly impacted by software production methods, seemingly unavoidable, to treat every problem as an event-driven issue.
    I understand this is culturally linked to the existing consumer computers, but real-time is something quite orthogonal: time dependency first vs data dependency.
    Failures (“bugs”) occur when reality didn’t comply with the designer’s expectations = the specification. Everything was ok according to SW FMEA, so I examined how things are specified, coded and checked and, by the way, all is consistent with a list of asynchronous “services”, generally based on extremely thick, opaque OS and libraries. But Time is far from a priority, even when some events are timed.

    When a µP and FPGA are coupled for redundancy, and the FPGA programmed by a HW engineer, the result is a straight, clean and extremely robust, synchronous FSM implementation, where the actual STATE the system is best known, and entering “else” states immediately reported in the best designs.

    In SW failures, it can take months to discover what the program was doing “around” the moment of the failure; the notion of system state is squarely ignored, as well as what is done in the suspected libraries…

    I ended up specifying SW as sync FSMs, with all the observability I need from a HW design, and working so saved me unimaginable amounts of setbacks and frustration.

    I am interested in reading your opinion on this
    Thank you,
    Paul

    • Carl Carlson says

      May 25, 2022 at 6:04 PM

      Hello Paul,

      I appreciate your sharing your software experiences and insights. I do have a few comments, and they will be centered around SW FMEA.

      I like your statement, “Failures (“bugs”) occur when reality didn’t comply with the designer’s expectations = the specification.” However, you also say, “Everything was ok according to SW FMEA.” Based on what you write in the post, it seems the SW FMEA could have been more helpful. For example, did the SW FMEA identify missing or incorrect software requirements? This is one of its objectives. You ended up specifying SW as sync FSMs, and mention that saved a lot of time. I’m wondering why the SW FMEA did not help with that improved specification.

      Also, you say, “In SW failures, it can take months to discover what the program was doing “around” the moment of the failure.” I’m sure you are right. You go on to say, “the notion of system state is squarely ignored, as well as what is done in the suspected libraries…” SW FMEA should not ignore system state and should examine relationship with libraries.

      I would take a look at the quality of the SW FMEA. I have not examined the SW FMEA, so this is just a suggestion.

      Thanks again for your excellent post.

      Carl

  3. Kelly says

    June 10, 2024 at 10:48 AM

    In assessing cybersecurity risks, can Software FMEAs incorporate sufficiently concerns regarding vulnerabilities of the software? Thank you.

    • Carl S. Carlson says

      June 12, 2024 at 3:46 AM

      Hello Kelly.
      Software FMEAs can incorporate concerns about vulnerabilities of the software. The key is to ensure the software function description and requirements include adequate and proper requirements for robustness against vulnerabilities. The software FMEA analyzes each function (including requirements) and identifies potential failures, effects and causes, including risk prioritization and recommended actions. An additional reference to my article is the book “Effective Application of Software Failure Mode and Effects Analysis,” by Ann Marie Neufelder, which includes software vulnerabilities.
      Hope that is helpful.
      Carl
