
A Pivotal Moment

by Gabor Szabo

In this week’s edition, we dig into a scenario you’ve probably run across when working in Excel or other software such as Minitab. At least I have, many times.

Say you have a complete dataset. The data has been collected, and you’re now getting ready to plot it or run some sort of analysis on it. It should be plug and play, but it often isn’t: the data is not formatted the right way, so you can’t run your analysis (this happens pretty frequently, if you ask me).

The data “not being in the right format” can mean many, many different things, and we won’t go over all of them, at least not today. One example, though, is when you have been given a dataset where one of the variables you’re interested in was captured in groups across multiple columns. This may have made sense as the data was being captured, but for the purposes of analysis, it sometimes presents a challenge. Let’s take the example below, where data were collected during a manufacturing process characterization study. The table shows that data on an important product characteristic, the tensile strength of a bond, were captured from three consecutive machine cycles on each of five manufacturing lines, over three different time periods, with the results from each manufacturing line recorded in a separate column.

[Image: a data table of hourly readings taken from 2 PM through 9 PM on five different production lines]

This makes perfect sense from the perspective of the engineer jotting things down during the study, but for analysis purposes, you want all of the observations in one column and another column to say which line each observation belongs to (just like with the Time column).
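To make the contrast concrete, here is a rough sketch of the wide layout, reconstructed from the long-format output shown further below (the "..." entries stand in for values that output doesn’t reveal):

bond_strength_wide
#> # A tibble: 9 × 6
#>   Time  `Line 1` `Line 2` `Line 3` `Line 4` `Line 5`
#>   <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1 2PM       21.9     23.4     20.6     18.7      ...
#> 2 2PM       19.0     20.6     9.53      ...      ...
#> 3 2PM       19.0     17.7     18.2      ...      ...
#> # ℹ 6 more rows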

Now, you could fix this in Excel with some copy-and-paste jockeying or by trying to transpose the data. If the dataset is relatively small, this can be done pretty quickly, but for larger datasets you’re looking at a lot of wasted time.

In R, you can do this in a couple of relatively simple steps. In one simple step, actually, but we’ll do a bit more than the bare minimum.
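That single step is tidyr’s pivot_longer() on its own. Here is a minimal sketch, assuming the bond_strength_wide object we load in a moment:

# Bare-minimum version: stack the five "Line" columns into one
# value column plus a label column (no Cycle column, no arranging)
bond_strength_wide %>% 
    pivot_longer(cols = 2:6, names_to = "Line", values_to = "Bond_Strength")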

You can download the script here if you want to follow along.

Let’s start by loading the packages we’ll be using: the tidyverse and sherlock. The tidyverse is a collection of packages such as dplyr, readr, stringr, ggplot2 and tidyr, each of which has its own set of functionality. Sometimes you’ll want to load just one of them, say dplyr, but a lot of the time simply loading the tidyverse will do the trick.

We’ll then use sherlock’s load_file() function, as described last week, to read the dataset in from my GitHub repository. Make sure to set the filetype argument to “.csv”, as this is a .csv file.

We’ll save the dataset into memory as bond_strength_wide (referring to the current wide format of the data). We’ll transform it into the format we need in no time!

# WEEK 003: A PIVOTAL MOMENT

# 0. LOADING PACKAGES ----
library(tidyverse)
library(sherlock)


# 1. READ IN DATA ----

bond_strength_wide <- load_file("https://raw.githubusercontent.com/gaboraszabo/datasets-for-sherlock/main/bond_strength_wide.csv", 
                                filetype = ".csv")

Here are the steps:

  • Use dplyr’s mutate() function to create a column called Cycle, using base R’s rep() function to generate the repeating sequence 1, 2, 3, and then convert it into a factor variable (see the quick demo right after this list).
  • Use tidyr’s pivot_longer() function to convert the data frame into a long format.
  • Use dplyr’s mutate() function to update the Line column by removing the string “Line ” from each observation, so that only the line number remains.
  • Then use dplyr’s arrange() function to arrange by the columns Time and Line. This is not really necessary for plotting purposes but provides a way to verify that you did everything right.
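Here is the quick demo of what that rep() call produces on its own. The wide table has nine rows, one per Time/Cycle combination, so nine cycle labels are needed:

# rep() repeats the whole 1:3 sequence three times, one label per row
rep(1:3, times = 3)
#> [1] 1 2 3 1 2 3 1 2 3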
# 2. DATA TRANSFORMATION ----

bond_strength_long <- bond_strength_wide %>% 
    
    # 2.1 Create "Cycle" column ----
    mutate(Cycle = rep(1:3, times = 3) %>% as_factor()) %>% 
    
    # 2.2 Convert to long form ----
    pivot_longer(cols = 2:6, names_to = "Line", values_to = "Bond_Strength") %>% 
    
    # 2.3 Remove "Line" string ----
    mutate(Line = Line %>% str_remove("Line ")) %>% 
    
    # 2.4 Arrange by Time and Line variables (not absolutely necessary) ----
    arrange(Time, Line)


bond_strength_long

Let’s run bond_strength_long by moving the cursor over it and hitting Ctrl + Enter.

# A tibble: 45 × 4
   Time  Cycle Line  Bond_Strength
   <chr> <fct> <chr>         <dbl>
 1 2PM   1     1             21.9 
 2 2PM   2     1             19.0 
 3 2PM   3     1             19.0 
 4 2PM   1     2             23.4 
 5 2PM   2     2             20.6 
 6 2PM   3     2             17.7 
 7 2PM   1     3             20.6 
 8 2PM   2     3              9.53
 9 2PM   3     3             18.2 
10 2PM   1     4             18.7 
# ℹ 35 more rows
# ℹ Use `print(n = ...)` to see more rows

It looks like everything checks out, and we are ready to plot the data.

As a first step, we are going to plot the data using a technique called stratification, where the data are grouped and plotted by a specific variable. We do this to separate the data by that variable and, ultimately, to see what kind of differences exist between the groups.
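As a side note, you can also get a quick numerical view of the same stratification with a dplyr summary before plotting. A minimal sketch, not part of the original script:

# Group means and standard deviations by Line, a numerical
# companion to the stratified plot that follows
bond_strength_long %>% 
    group_by(Line) %>% 
    summarise(mean_strength = mean(Bond_Strength),
              sd_strength   = sd(Bond_Strength))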

We are going to unleash the power of ggplot2 to do this.

# 3. PLOT DATA ----

# 3.1 STRATIFICATION BY LINE ----
bond_strength_long %>% 
    # calling the ggplot() function and creating a blank "canvas" (coordinate system)
    ggplot(aes(x = Line, y = Bond_Strength)) + 
    # adding a geom (visual)
    geom_point(size = 3.5, color = "darkblue", alpha = 0.3) +
    # adding a custom theme
    theme_sherlock() +
    # customizing labels
    labs(title = "Bond Strength Characterization Study", 
         y     = "Bond Strength [lbf]")

Let me briefly explain the above code.

First, we take the bond_strength_long dataset and pipe it (using %>%) into the ggplot() function and specify what we want plotted on the x and y axes. This time we want to plot Bond Strength on the y axis and the Line variable on the x axis. This essentially creates a blank “canvas” for the plot — nothing has been plotted just yet.

After that, we use the + operator to add different layers to the base canvas. First, we add the function for the type of visual we want to create, which in this case is a scatterplot type of plot (geom_point() function).

We then add a custom theme, which dictates the appearance of the plot. There are many out-of-the-box themes one can use, for example theme_minimal(), theme_bw() etc.; I tend to use theme_sherlock() from the sherlock package for a minimalistic look.
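Swapping in one of the stock themes is a one-line change. For example, here is the same plot with theme_minimal() instead, as a quick sketch:

# Same plot as above, but with a stock ggplot2 theme
bond_strength_long %>% 
    ggplot(aes(x = Line, y = Bond_Strength)) + 
    geom_point(size = 3.5, color = "darkblue", alpha = 0.3) +
    theme_minimal() +
    labs(title = "Bond Strength Characterization Study", 
         y     = "Bond Strength [lbf]")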

And finally, to top things off, we add a title and a custom y-axis label using the labs() function.

This is what the plot looks like:

[Image: a plot comparing the bond strength data of five production lines]

Not too bad for a first try, right?

Now, we are going to further simplify what we just did by using a ready-made plotting function called draw_categorical_scatterplot(), which achieves the same thing while adding additional functionality.

# 3.2 STRATIFICATION BY LINE USING DRAW_CATEGORICAL_SCATTERPLOT() FUNCTION ----
bond_strength_long %>% 
    draw_categorical_scatterplot(y_var          = Bond_Strength, 
                                 grouping_var_1 = Line, 
                                 plot_means     = TRUE, 
                                 size           = 3.5)

[Image: a categorical scatterplot comparing the bond strength data from five production lines, with mean values indicated by a dash]

With this function you can also:

  • Group (stratify) by up to three variables in a nested fashion
  • Plot the means of each group or connect them with a line
  • Set whether each group is displayed in a separate color
  • Set the size and transparency of the data points
  • Add jitter (a little bit of noise along the x axis) to deal with overplotting
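For instance, nesting a second grouping variable might look like the sketch below. Note that grouping_var_2 is an assumption extrapolated from the grouping_var_1 naming above, not verified against the package documentation:

# Hypothetical sketch: stratify by Line, then by Cycle within Line,
# with group means plotted (grouping_var_2 is assumed, not verified)
bond_strength_long %>% 
    draw_categorical_scatterplot(y_var          = Bond_Strength, 
                                 grouping_var_1 = Line, 
                                 grouping_var_2 = Cycle, 
                                 plot_means     = TRUE, 
                                 size           = 3.5)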

To recap, in this week’s edition we went over how to do a basic data pivoting transformation and then plotted the data, both with basic ggplot2 functions and with sherlock’s ready-made draw_categorical_scatterplot() function.

That’s it for this week—we will continue exploring this dataset next week.

Thanks for reading this newsletter! Reach out with any questions you may have.

Download this week’s script here.

Filed Under: Articles, on Tools & Techniques, R for Engineering

About Gabor Szabo

Gabor is a quality engineering and data professional with over 15 years of experience in quality, having worked in the medical device, automotive and other manufacturing industries. He holds a BS in Engineering Management.

Gabor's specialties and interests include problem solving, statistical engineering and analysis through the use of data, and developing others.
