Accendo Reliability

Your Reliability Engineering Professional Development Site


by Dennis Craggs

A Primer on Probability Distributions

The most common types of engineering data are measurements. There can be a few, thousands, or millions of data points to analyze. Without analytic tools, one can get lost in the data.

Data is frequently clustered about a central value and displays variation. This article presents

  • Dotplots
  • Frequency histograms
  • Distribution characteristics
  • Normal Distributions

Dotplot

A common way to display data is to plot the counts in intervals. Here is a plot of 150 measurements where each measurement was rounded to the closest even integer value. Each interval is a bin centered at an even integer with a width of two units. Each dot represents a measurement, so the stack height represents the count. This produces a dotplot, figure 1.

Figure 1. Dotplot

The plot shows where the data is centered and how it is distributed. The center is about 100 and the data range from 92 to 108. The data has an approximate bell shape, typical of the bell curve described by the normal distribution.

Numerical indices calculated from the raw data give an average of 100.08 and a standard deviation of 2.75, which is consistent with the graphical results.
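The binning and the numerical indices described above can be sketched in a few lines of Python. The article's raw measurements are not published, so this sketch assumes a simulated stand-in: 150 normally distributed values with a hypothetical seed, average, and standard deviation. The counts and indices it prints will therefore differ somewhat from those quoted in the text.

```python
import random
import statistics
from collections import Counter

# Hypothetical stand-in for the 150 measurements (the real data is not available).
rng = random.Random(1)
measurements = [rng.gauss(100.0, 2.75) for _ in range(150)]

# Round each measurement to the closest even integer,
# i.e. bins of width 2 centered on the even integers.
binned = [2 * round(x / 2) for x in measurements]
counts = Counter(binned)

# Text dotplot: one dot per measurement, so the row length is the bin count.
for center in sorted(counts):
    print(f"{center:4d} | {'.' * counts[center]}")

# Numerical indices calculated from the raw (unrounded) data.
average = statistics.mean(measurements)
std_dev = statistics.stdev(measurements)  # sample standard deviation (n - 1 divisor)
print(f"average = {average:.2f}, standard deviation = {std_dev:.2f}")
```

The printed rows play the role of the dot stacks in figure 1, turned on their side.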

Histograms

A frequency histogram can be constructed to be similar to the dotplot. In this example, bars are used for the counts, figure 2.

Figure 2. Frequency Histogram

With more data, the height of the bars will trend higher. To compensate for this change, use a percent scale on the y-axis to create a percent histogram, figure 3.

Figure 3. Percent Histogram

The percent histogram is useful when comparing datasets of different sizes. From this histogram, one could project that 16% of the future data may fall in the 100±0.5 interval. Alternatively, 96% of the data is between 95 and 107.

Note that the histogram shape and the bar heights are affected by the bin size. For example, figure 3 used a bin width of 2 centered at each even integer. If the bin width is decreased, the counts in each bin generally decrease. The histogram in figure 4 uses the same data as the histogram in figure 3, but with a narrower bin width.

Figure 4. Percent Histogram 2

These percentages depend on the interval width. A wider interval will generally contain more of the data and therefore a larger percentage of it. To standardize for this effect, create a density plot by dividing the fraction of the data in each interval by the interval width. The y-axis then has density units of probability per unit width. The concept is similar to describing the loading on a beam in lb/ft, i.e., a density function. In this case, the histogram height scale changes, figure 5.

Figure 5. Density Histogram
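The progression from counts to percentages to densities can be illustrated with a short sketch. It again assumes simulated data in place of the article's measurements, and the helper name `histogram` is hypothetical. The point it shows is that percentages change with the bin width, while density times width, summed over all bins, always returns a total probability of 1.

```python
import random
from collections import Counter

# Hypothetical stand-in for the article's 150 measurements.
rng = random.Random(1)
data = [rng.gauss(100.0, 2.75) for _ in range(150)]

def histogram(values, width):
    """Return {bin_center: (count, percent, density)} for bins of the given width."""
    counts = Counter(width * round(x / width) for x in values)
    n = len(values)
    return {
        center: (count, 100.0 * count / n, count / (n * width))
        for center, count in sorted(counts.items())
    }

wide = histogram(data, 2.0)    # bins centered on even integers
narrow = histogram(data, 1.0)  # bins centered on every integer

for label, bins in (("width 2", wide), ("width 1", narrow)):
    print(label)
    for center, (count, percent, density) in bins.items():
        print(f"  {center:6.1f}  count={count:3d}  percent={percent:5.1f}  density={density:.4f}")

# Density is probability per unit width, so density * width sums to 1.
area_wide = sum(density * 2.0 for _, _, density in wide.values())
area_narrow = sum(density * 1.0 for _, _, density in narrow.values())
print(area_wide, area_narrow)
```

Halving the bin width roughly halves the percentage in each bin, but the density values stay on a comparable scale, which is what makes the density histogram the natural starting point for a probability distribution.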

If a lot more data is available, as in big data analytics, a finer resolution of the shape of the density curve is achieved by increasing the number of intervals while simultaneously decreasing their width. In the limit, the histogram approaches a smooth curve that can be described by a mathematical function, known as a probability distribution.

Probability Distributions

If a function f(x) can be found to describe the smoothed density plot, then f(x) is the probability density function, sometimes abbreviated pdf. The probability P that a value falls in an interval bounded by x1 and x2 is the area under f(x) over that interval. Equation 1 describes the relationship,

$$ P(x_1,x_2)=\int_{x_1 }^{x_2}f(\phi )d\phi $$

(1)

Distributions have some important characteristics:

  1. The probability density f(x) is non-negative for all values of x,

$$ f(x) \geq 0 $$

(2)

  2. The cumulative probability F(x), the area under f from -∞ up to x, never decreases as x increases, i.e., it is a monotonically non-decreasing function. So for x2>x1,

$$ F(x_2) \geq F(x_1) $$

(3)

  3. The sum of all probabilities equals 1, which is equivalent to stating that the area under the distribution curve equals 1,

$$ \int ^\infty_{-\infty} f(\phi)d\phi = 1 $$

(4)

From these relationships, limits that contain a fraction, P, of the population can be calculated using equation 1. If the data is normally distributed with population average μ and population standard deviation σ, the probability density function is given by equation 5,

$$ f(x)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/(2\sigma^2)} $$

(5)
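Equation 5 and the three distribution characteristics in equations 2 to 4 can be checked numerically. The sketch below is one way to do this, not a prescribed method. It assumes the example average and standard deviation quoted earlier, an integration grid spanning the average ± 8 standard deviations, and a simple trapezoidal rule. The function name `normal_pdf` is illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Equation 5: normal probability density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

mu, sigma = 100.08, 2.75  # average and standard deviation of the example data

# Grid wide enough that the probability outside it is negligible (assumed: 8 sigma).
n = 4000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
step = (hi - lo) / n
xs = [lo + i * step for i in range(n + 1)]
density = [normal_pdf(x, mu, sigma) for x in xs]

# Equation 2: the density is never negative.
nonnegative = all(f >= 0.0 for f in density)

# A running trapezoidal sum approximates the cumulative probability F(x).
cumulative = [0.0]
for left, right in zip(density, density[1:]):
    cumulative.append(cumulative[-1] + 0.5 * (left + right) * step)

# Equation 3: F never decreases. Equation 4: the total area is 1.
nondecreasing = all(b >= a for a, b in zip(cumulative, cumulative[1:]))
area = cumulative[-1]

# Cross-check the hand-written formula against the standard library.
reference = NormalDist(mu, sigma)
max_pdf_error = max(abs(f - reference.pdf(x)) for x, f in zip(xs, density))

print(nonnegative, nondecreasing, area, max_pdf_error)
```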

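The probabilities can be calculated as the area under this curve, but the integral has no closed-form solution and has to be evaluated numerically. Instead, the standard normal distribution, which has an average of 0 and a standard deviation of 1, is used, equation 6,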

$$ f(z)={\frac{1}{\sqrt{2\pi}}}e^{-z^2/2} $$

(6)

The integration limits in the original data are mapped to the standard normal integration limits, equations 7 and 8,

$$ z_1={\frac{(x_1-\mu)}{\sigma}} $$

(7)

and

$$ z_2={\frac{(x_2-\mu)}{\sigma}} $$

(8)

Here the z-value is the number of population standard deviations from the population average. Tables of the cumulative probability of the standard normal, equation 6, as a function of z are readily available in most statistical references. However, there are several common variants. One variant provides the lower tail, or cumulative, probability for different z-values. Another provides the upper tail probability. These can be seen graphically in figure 6.

Figure 6. Standard Normal

The lower tail probability is the area under the distribution curve highlighted in red. The upper tail probability is the remaining area in white. Before using any statistical table of normal probabilities, one has to check which of these areas it reports.

For the above plot, a z-value (x shown in the plot) of 0.385 corresponds to a lower tail probability of 0.65 or 65%. Similarly, the upper tail probability is 0.35 or 35%.
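In place of a printed table, the tail probabilities can be computed directly. The sketch below uses `NormalDist` from Python's standard `statistics` module. It first evaluates the two tails for z = 0.385, then applies the mapping of equations 7 and 8 to an interval in the original units. The interval of 95 to 107 and the average and standard deviation are taken from the example data discussed earlier. Treating the sample values as population values is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # average 0, standard deviation 1 (equation 6)

# Lower and upper tail probabilities for a single z-value.
z = 0.385
lower_tail = standard_normal.cdf(z)
upper_tail = 1.0 - lower_tail
print(f"z = {z}: lower tail = {lower_tail:.3f}, upper tail = {upper_tail:.3f}")

# Probability between two limits in the original units.
mu, sigma = 100.08, 2.75   # sample values used as population values (assumption)
x1, x2 = 95.0, 107.0
z1 = (x1 - mu) / sigma     # equation 7
z2 = (x2 - mu) / sigma     # equation 8
p_interval = standard_normal.cdf(z2) - standard_normal.cdf(z1)
print(f"P({x1} < x < {x2}) = {p_interval:.3f}")
```

The interval probability is the difference of two lower tail values, which is how a lower tail table would be used by hand. Under the normal assumption it comes out at roughly 96%, in line with the percentage read from the histogram.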

Usage

So how can these tables and plots be used?

A common problem is to define limits that contain some percentage, say 99%, of the population. First, one has to calculate the sample average and standard deviation. An approximation of these limits can be calculated by assuming the sample average equals the population average and the sample standard deviation equals the population standard deviation. Because the distribution is symmetrical about the average, the remaining 1% of the population is apportioned equally to the upper and lower tails of the distribution. Therefore, we want to calculate the 0.5% and 99.5% population limits. Equations 7 and 8 may be inverted for the calculation, yielding equations 9 and 10.

$$ x_1=\mu+z_1\sigma $$

(9)

and

$$ x_2=\mu+z_2\sigma $$

(10)

From a table of the lower tail standard normal, z1=-2.575 and z2=+2.575. With the average = 100.08 and the standard deviation = 2.75, the 99% population limits are calculated as 100.08 - 2.575(2.75) = 93.00 and 100.08 + 2.575(2.75) = 107.16.
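The same calculation can be scripted, with the inverse of the standard normal cumulative function standing in for the table lookup. This is a minimal sketch that assumes, as the text does, that the sample average and standard deviation can be used as the population values. The function name `population_limits` is hypothetical.

```python
from statistics import NormalDist

def population_limits(mu, sigma, coverage):
    """Symmetric limits containing the given fraction of a normal population."""
    tail = (1.0 - coverage) / 2.0            # probability left in each tail
    standard_normal = NormalDist()
    z1 = standard_normal.inv_cdf(tail)       # lower z-value, e.g. at 0.5%
    z2 = standard_normal.inv_cdf(1.0 - tail) # upper z-value, e.g. at 99.5%
    return mu + z1 * sigma, mu + z2 * sigma  # equations 9 and 10

lower, upper = population_limits(mu=100.08, sigma=2.75, coverage=0.99)
print(f"99% population limits: {lower:.2f} to {upper:.2f}")
```

Because `inv_cdf` returns the z-value to full precision rather than the rounded table value, the result can differ from a hand calculation in the last decimal place.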

In future articles, I will expand on this topic to introduce other probability distributions, probability plots, hypothesis tests, and other topics. These methods may be applied to both large and small datasets.

Conclusions

  • Dotplots are a simple and easy way to view data.
  • Graphics help to determine if the data is normally distributed.
  • Frequency, percentage, and density histograms lead to probability distributions.
  • The normal plot and probability tables may be used to determine limits that contain a desired percentage of the data.
  • Both numerical and graphical methods are used for analysis.

 

Dennis Craggs

Big Data, Quality, and Reliability Consultant

810-964-1529

About Dennis Craggs

I am a mechanical engineer, programmer, and statistician. My career spanned the aerospace, NASA and Teledyne CAE, and automotive, Ford and Chrysler, industries. After retirement, I started consulting as a reliability engineer and am writing articles on big data analytics. My primary goal is to assist young engineers and consult on product development issues.
