Accendo Reliability

Your Reliability Engineering Professional Development Site


by Fred Schenkelberg

First Steps with Data


Once word got out that I was taking graduate-level courses in statistics, I dreaded the knock on the door. Colleagues, some of whom I knew and others from some far reach of the company, would ask if I could take a look at their data. My classes had not taught me the necessary first steps to take with a stack of data.

I’ve lost count of the number of data sets I’ve reviewed and analyzed. I know there are important considerations and questions before creating the first plot. Let’s review the essential first steps you should take when presented with data.

Is there a decision related to this data?

Why are you looking at this data? I find it difficult not to jump right in and start the analysis, yet which analysis are you attempting to accomplish? A great first question concerns the decision this dataset is to inform. Is this a comparison, an optimization, or an exploration?

If the question is “Will this design create a product that meets our reliability goal?” that helps to guide your next steps. If the decision is about which vendor better meets our requirements, that suggests a different range of analysis options.

The type, quality, and quantity of data needed depend on the decision the analysis is to inform. Thus, when first encountering a dataset, start with what information you will need from the data. Then assess whether the data is sufficient to provide that information.

Data Collection and Errors

Let’s say the dataset provided has 1,000 entries, and each is the time until one of 1,000 products failed. This would be complete if the organization shipped only 1,000 units. If they shipped 100,000 units, what happened to the other 99,000? While failure data alone is fine for some situations, it is not enough information to estimate the impact on future warranty claims.
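To see why the 99,000 unfailed units matter, here is a minimal sketch. It assumes, purely for illustration, a constant failure rate (exponential lifetimes), made-up failure times, and that every surviving unit has run for the same made-up number of hours. The function name `exponential_rate` and all numbers are hypothetical.

```python
import math

# Hypothetical illustration: every number below is made up, and the
# constant-failure-rate model is an assumption chosen only for simplicity.

def exponential_rate(failure_times, survivor_times=()):
    """Estimate a constant failure rate as failures / total time in service."""
    total_time = sum(failure_times) + sum(survivor_times)
    return len(failure_times) / total_time

def fraction_failing_by(rate, hours):
    """Expected fraction of units failing within `hours` under a constant rate."""
    return 1.0 - math.exp(-rate * hours)

# 1,000 failures, each assumed to occur at 500 hours
failure_times = [500.0] * 1_000
# 99,000 survivors, each assumed to have run 1,000 hours without failing
survivor_times = [1_000.0] * 99_000

rate_failures_only = exponential_rate(failure_times)
rate_with_survivors = exponential_rate(failure_times, survivor_times)

print(f"Failures only:       {fraction_failing_by(rate_failures_only, 1_000):.1%} fail by 1,000 hours")
print(f"Including survivors: {fraction_failing_by(rate_with_survivors, 1_000):.1%} fail by 1,000 hours")
```

Under these assumptions, the two estimates differ by nearly two orders of magnitude, which is why a warranty forecast built only on the failed units would be badly misleading.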

An often-forgotten aspect of data collection is measurement error. Every measurement system has some error included. None are perfect. Understanding the measurement system used to collect the data may prompt additional questions on the quality of the data and the magnitude of the measurement error.

Another detail to understand concerns the completeness of the data. Is the data a random or not-so-random sample? Or does the dataset include measurements from all items in the population? This affects the type of analysis and how to interpret the results.

Consider the measurement frequency. A measurement system that records events as they happen is different from a system that checks for events once a month. Interval data requires different handling and analysis.
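As a rough sketch of what interval data looks like, consider a unit that is inspected monthly. All we know is that it failed somewhere between the last inspection where it worked and the first inspection where it did not. The inspection log and the function name `failure_interval` below are hypothetical.

```python
from datetime import date

# Hypothetical monthly inspection log for one unit: (inspection date, failed?)
inspections = [
    (date(2023, 1, 1), False),
    (date(2023, 2, 1), False),
    (date(2023, 3, 1), True),
]

def failure_interval(records):
    """Return (last date seen working, first date seen failed).

    The second element is None when no failure was observed, meaning
    the unit is right censored at the last inspection.
    """
    last_ok = None
    for when, failed in sorted(records):
        if failed:
            return last_ok, when
        last_ok = when
    return last_ok, None

lower, upper = failure_interval(inspections)
print(f"Failure occurred between {lower} and {upper}")
```

Recording the failure as happening exactly on the inspection date would discard the uncertainty about when it actually occurred, so it is worth keeping both bounds for the analysis.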

Data format and organization

To this point, we haven’t looked at the data within the dataset. Take a look at the data now. This is the start of the data clean-up process. Things like missing data or variations in how dates are recorded affect the ability of various software packages to use the data. Is the missing data a clerical error or a deliberate omission?
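One way to get a quick count of missing entries and inconsistent date formats is sketched below. The raw entries and the list of expected formats are assumptions for illustration; a real dataset will need its own list.

```python
from datetime import datetime

# Hypothetical raw entries as they might arrive in a spreadsheet export
raw_dates = ["2023-02-22", "02/23/2023", "", "24.02.2023", "N/A"]

# Formats to try, in order; an assumption about what this dataset contains
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]

def parse_date(text):
    """Return a date for a recognized format, or None if missing or unrecognized."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

parsed = [parse_date(entry) for entry in raw_dates]
unparsed_rows = [i for i, value in enumerate(parsed) if value is None]
print(f"{len(unparsed_rows)} of {len(raw_dates)} entries need follow-up: rows {unparsed_rows}")
```

Code like this can only flag the problem rows. Whether a blank is a clerical error or deliberate, and whether 02/03/2023 means February 3 or March 2, are questions for whoever collected the data.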

To understand the dataset, the columns need informative labels. “Column 1,” “Column 2,” etc., do not provide the necessary information about what is within each column. Dozens of columns of 4-digit numbers without labels or a legend are not useful.

While some software packages can handle data presented using Nevada charts, not all can. This may require organizing the data for the intended analysis.
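As a sketch of what such reorganizing might involve, the example below flattens a small Nevada chart into age-based rows. It assumes the common layout of one row per shipment month and one column per return month, and all counts, month numbers, and the function name `nevada_to_rows` are hypothetical.

```python
# Hypothetical Nevada chart: units shipped per month and returns by calendar month.
# Months are numbered 1, 2, 3, ...; all counts are made up for illustration.
shipped = {1: 100, 2: 120, 3: 90}
returns = {
    1: {2: 1, 3: 2, 4: 1},  # cohort shipped in month 1
    2: {3: 0, 4: 3},        # cohort shipped in month 2
    3: {4: 1},              # cohort shipped in month 3
}
last_observed_month = 4

def nevada_to_rows(shipped, returns, last_month):
    """Flatten to (age_in_months, count, status) rows for each shipment cohort."""
    rows = []
    for ship_month, quantity in shipped.items():
        cohort_returns = returns.get(ship_month, {})
        for return_month, count in cohort_returns.items():
            if count > 0:
                rows.append((return_month - ship_month, count, "failed"))
        # Units not returned are still in service at the end of observation
        survivors = quantity - sum(cohort_returns.values())
        rows.append((last_month - ship_month, survivors, "censored"))
    return rows

for row in nevada_to_rows(shipped, returns, last_observed_month):
    print(row)
```

The output has one row per cohort and age, with the unreturned units carried along as censored observations, which is closer to the layout many life data analysis tools expect.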

One thing that has often caused me problems is a column of numbers with a few data entries stored as text. These are hard to spot, yet when expecting numbers, most software packages balk when confronted with a field with text.
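A small check like the following can surface those entries before the analysis software trips over them. The column contents and the function name `find_non_numeric` are hypothetical.

```python
# Hypothetical column that should be numeric but contains a few text entries
column = ["12.5", "13.1", "n/a", "14", "1O.2", " 15.0 ", ""]

def find_non_numeric(values):
    """Return (row_index, raw_value) for every entry float() cannot parse."""
    problems = []
    for index, raw in enumerate(values):
        try:
            float(raw)
        except ValueError:
            problems.append((index, raw))
    return problems

for index, raw in find_non_numeric(column):
    print(f"row {index}: {raw!r}")
```

Note the entry "1O.2", which uses the letter O in place of a zero and is easy to miss by eye. Also be aware that Python's `float()` accepts strings such as "nan" and "inf", so those would need a separate check if they can appear in the data.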

Exploring the data

OK, you understand the decision this data is to inform and understand the dataset, including how it was collected, its measurement error, and how it is organized. Great. Now it’s time to start the analysis. Or is it?

At this point, I recommend plotting the data in a few different ways. Visualize the data to identify basics like the shape or structure of the data. Time series plots, XY plots, and others provide basic information concerning the nature of the data.

For example, if you plot a column by date collected and there are long gaps between clusters of measurements, you may need to understand why that occurred. Another example is a sudden change in the magnitude of the data: the data starts with single-digit numbers and then jumps to 7-digit values. Was that a change in the measurement system scale being used, or does it accurately reflect what happened?
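Plots are the best way to spot these patterns, yet a simple scan can complement them on larger datasets. The sketch below flags long gaps between readings and large changes in scale. The readings, the thresholds, and the function name `flag_gaps_and_jumps` are all hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical (date, value) readings; the default thresholds are arbitrary
readings = [
    (date(2023, 1, 1), 4.0),
    (date(2023, 1, 2), 6.0),
    (date(2023, 1, 3), 5.0),
    (date(2023, 3, 15), 5_200_000.0),
    (date(2023, 3, 16), 4_900_000.0),
]

def flag_gaps_and_jumps(data, max_gap_days=7, max_ratio=100.0):
    """Flag consecutive readings separated by a long gap or a large change in scale."""
    flags = []
    ordered = sorted(data)
    for (d0, v0), (d1, v1) in zip(ordered, ordered[1:]):
        gap = (d1 - d0).days
        if gap > max_gap_days:
            flags.append((d1, f"gap of {gap} days"))
        if v0 != 0 and v1 != 0:
            # Compare magnitudes in both directions to catch jumps up or down
            ratio = max(abs(v1 / v0), abs(v0 / v1))
            if ratio > max_ratio:
                flags.append((d1, f"scale changed by a factor of {ratio:,.0f}"))
    return flags

for when, message in flag_gaps_and_jumps(readings):
    print(when, message)
```

A flag is only a prompt to ask questions. Whether the jump reflects a change of units or a real event is something the data alone cannot tell you.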

The first step is to know the data, its history, and its behavior. Then do the actual work to conduct the analysis.


About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.


Comments

  1. Larry George says

    February 22, 2023 at 4:33 PM

    Thanks for your observations on data. It’s good to ask Why are you doing what you do with the data? What would it be worth to get more data?
    You’re right, not all data come in the form of a relational db: records in rows, factors in columns. Inferring missing values scares me.
    Stay tuned for my next article. “While some software packages can handle data presented using Nevada charts, not all can.” Those software packages make Kaplan-Meier reliability estimates and they usually include the variances of the reliability estimates and maybe even confidence bands. Don’t believe them.

© 2025 FMS Reliability