Accendo Reliability

Your Reliability Engineering Professional Development Site


by Fred Schenkelberg

The Sum of Squares Concept


The data analysis course professor tended to focus on the practical application of statistics.

Avoiding statistical theory was fine with me. For me, learning statistics was about how to solve problems, optimize designs, and understand data.

Then one lecture started with the question, “Why do we sum squares in regression analysis, ANOVA calculations, and other statistical methods?” He paused, waiting for one of us to answer.

We didn’t. I feared the upcoming lecture would include arcane derivations and burdensome theoretical annotations. It didn’t.

The Variance Formula

You may have noticed that the population parameter describing the spread of the data, the variance, is in squared units. It is the second central moment of the data, just as skewness is built on the third central moment. I digress.

The lecture on why we sum squares had to do with the numerator of the variance formula.

$$ \displaystyle\large {{\sigma }^{2}}=\frac{\sum\limits_{i=1}^{n}{{{\left( {{y}_{i}}-\bar{y} \right)}^{2}}}}{n}$$

After determining the center of mass of a data set, the mean, statisticians wanted to have a convenient way to describe the spread of the data.

The spread of the data could be described by the range, the maximum value minus the minimum value, yet that wasn’t very descriptive of the sparseness or denseness of the data set.

One idea was to calculate the distance from each data point to the next, working up from the minimum value to the next smallest value and so on. I do not recall why that didn’t catch on.

Another idea was to determine the distance of each data point to the origin, or zero. If you try this idea you quickly see that it has the mean value built into the result. Data sets centered at zero would give smaller calculated results than data sets centered far away from zero.

Ah ha! Let’s use the mean of the dataset instead of zero. That way the resulting set of distances is as if centered at zero.

Centering is what some call this process (others loosely call it normalizing).

So, give it a try:

  1. Find or create a small set of measurements
  2. Determine the data set’s mean
  3. Subtract the mean from every data point
  4. Sum the differences

Wait a minute, the result is zero, or as close to zero as rounding allows. Looking at your data, some differences are positive and some negative. The mean is the balance point of the data, so the positive and negative differences cancel. The sum of the differences from the mean will always be zero, whether or not the data set is symmetrical.
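The four steps above can be sketched in a few lines of Python. This is only an illustration: the measurements are made up, and the helper name `deviations_from_mean` is a hypothetical choice, not anything from the lecture.

```python
# Hypothetical measurements, invented purely for illustration.
measurements = [4.1, 5.3, 6.0, 7.2, 9.4]


def deviations_from_mean(data):
    """Return the mean and the list of differences (value - mean)."""
    mean = sum(data) / len(data)
    return mean, [y - mean for y in data]


mean, deviations = deviations_from_mean(measurements)
total = sum(deviations)

print(f"mean = {mean:.3f}")
print("differences:", [round(d, 3) for d in deviations])
# The sum is zero apart from floating-point rounding.
print(f"sum of differences = {total:.2e}")
```

The data set here is deliberately lopsided, yet the differences still cancel.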


One way to solve this is to use the absolute value of the differences. That might work, and I suspect explaining why it did not become the usual choice would have taken the lecture too deep into the theory end of statistics.

Another way to remove negative values is to square them. Squaring a negative number removes the negative sign every time. Cool. Yet now the squared values are no longer on the same scale as the actual differences.

The units are squared, too. No worries, we can take the positive square root of the variance to define the standard deviation.

Thus we square the differences between the individual values and the mean to avoid the sum being zero all the time.
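As a rough sketch of the calculation, the snippet below squares the differences, sums them, divides by n as in the variance formula above, and takes the square root. The data values and the helper name `sum_of_squares` are assumptions for illustration; the standard library `statistics` module is used only as a cross-check.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
measurements = [4.1, 5.3, 6.0, 7.2, 9.4]


def sum_of_squares(data):
    """Sum of squared differences between each value and the mean."""
    mean = sum(data) / len(data)
    return sum((y - mean) ** 2 for y in data)


ss = sum_of_squares(measurements)
population_variance = ss / len(measurements)    # divide by n
population_sd = math.sqrt(population_variance)  # back to the original units

print(f"sum of squares      = {ss:.4f}")
print(f"population variance = {population_variance:.4f}")
print(f"standard deviation  = {population_sd:.4f}")

# Cross-check against the standard library's population formulas.
print(math.isclose(population_variance, statistics.pvariance(measurements)))
print(math.isclose(population_sd, statistics.pstdev(measurements)))
```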

The story the professor told is not likely the way the formula for variance came about way back when. It was entertaining and 30+ years later I do fondly recall that lecture.

Regression Analysis and Errors

The need to navigate the open seas led to the development of the least squares method for regression analysis. Sailors needed a way to translate the observed positions of stars into their own position on the surface of the earth.

In 1805 Legendre published New Methods for the Determination of the Orbits of Comets. [Actually published in French, Legendre, Adrien-Marie (1805), Nouvelles méthodes pour la détermination des orbites des comètes]. Within the discussion, he described the least squares method for determining regression parameters.

This method significantly improved navigation, too.

The concept involved finding the line that minimized the sum of the squared distances between the line and each data point. Like the variance calculation, squaring the differences provided a meaningful result.
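To make the idea concrete, here is a minimal sketch for the simplest case: one predictor, a straight line, and vertical distances from each point to the line. That setup, the data values, and the function names `fit_line` and `squared_error` are all assumptions for illustration, not a reconstruction of Legendre's presentation.

```python
# Hypothetical (x, y) observations, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Closed-form least squares slope and intercept for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


slope, intercept = fit_line(x, y)
best = squared_error(x, y, slope, intercept)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, SSE = {best:.4f}")

# Any other line gives a larger sum of squared distances.
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    other = squared_error(x, y, slope + d_slope, intercept + d_intercept)
    print(f"shifted line SSE = {other:.4f} (larger than {best:.4f})")
```

Nudging the slope or the intercept in either direction increases the sum of squares, which is what "least squares" means for this line.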

Note: Carl Friedrich Gauss (yes, that Gauss) published work in 1809 describing the least squares method and expanding it to include elements of the normal distribution, naturally. Robert Adrain also published his independent work on the same method in 1808.

The time was right for the development of the least squares method. For more about the development of the least squares method for regression analysis see Aldrich, J. (1998). “Doing Least Squares: Perspectives from Gauss and Yule”. International Statistical Review. 66 (1): 61–81.

ANOVA and Sum of Squares

ANOVA uses the sum of squares concept as well. Let’s start by looking at the formula for sample variance,

$$ \displaystyle\large {{s}^{2}}=\frac{\sum\limits_{i=1}^{n}{{{\left( {{y}_{i}}-\bar{y} \right)}^{2}}}}{n-1}$$

The numerator is the sum of squares of deviations from the mean. It is also called the corrected sum of squares, abbreviated as TSS or SS(Total). Meanwhile, we call the denominator the degrees of freedom.

Now do a bit of algebra and write sample variance as

$$ \displaystyle\large {{s}^{2}}=\frac{\sum\limits_{i=1}^{n}{y_{i}^{2}}-\frac{1}{n}{{\left( \sum\limits_{i=1}^{n}{{{y}_{i}}} \right)}^{2}}}{n-1}$$

There are two terms in the numerator. The first is called the raw sum of squares. The second is called the correction term for the mean.
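A quick numerical check of that algebra, as a sketch with made-up data: the raw sum of squares minus the correction term for the mean should match the sum of squared deviations computed directly. The variable names are illustrative only.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
measurements = [4.1, 5.3, 6.0, 7.2, 9.4]
n = len(measurements)
mean = sum(measurements) / n

raw_ss = sum(y ** 2 for y in measurements)   # raw sum of squares
correction = sum(measurements) ** 2 / n      # correction term for the mean
corrected_ss = raw_ss - correction           # numerator, second form

direct_ss = sum((y - mean) ** 2 for y in measurements)  # numerator, first form

sample_variance = corrected_ss / (n - 1)     # divide by the degrees of freedom

print(f"raw sum of squares   = {raw_ss:.4f}")
print(f"correction term      = {correction:.4f}")
print(f"corrected sum of sq. = {corrected_ss:.4f}")
print(f"direct sum of sq.    = {direct_ss:.4f}")
print(f"sample variance      = {sample_variance:.4f}")
print(math.isclose(sample_variance, statistics.variance(measurements)))
```

The two forms agree to within floating-point rounding. With data that sit far from zero the raw and correction terms can both be very large, so the direct form is usually the safer one to compute.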

In a one-way ANOVA, we are interested in the effect of each treatment. We can separate the variation contributed by the treatments from the unaccounted-for deviations (a source of variation we often attribute to error).

Using only the numerator, the corrected sum of squares, we then split the total variation into two terms: the sum of squares for treatments, SST, and the sum of squares for error, SSE.

$$ \displaystyle\large \begin{array}{l}SS(Total)=SST+SSE\\\sum\limits_{i=1}^{k}{\sum\limits_{j=1}^{{{n}_{i}}}{{{\left( {{y}_{ij}}-{{{\bar{y}}}_{\bullet \bullet }} \right)}^{2}}=}}\sum\limits_{i=1}^{k}{{{n}_{i}}{{\left( {{{\bar{y}}}_{i\bullet }}-{{{\bar{y}}}_{\bullet \bullet }} \right)}^{2}}+}\sum\limits_{i=1}^{k}{\sum\limits_{j=1}^{{{n}_{i}}}{{{\left( {{y}_{ij}}-{{{\bar{y}}}_{i\bullet }} \right)}^{2}}}}\end{array}$$

Where

  • $- {{y}_{ij}}-$ is the jth sample observation from population i.
  • $- {{n}_{i}}-$ is the number of sample observations selected from population i.
  • N is the total number of sample observations across all populations
  • $- {{y}_{i\bullet }}-$ is the sum (total) of the sample measurements observed from population i.
  • $- {{y}_{\bullet \bullet }}-$ is the sum (grand total) of all the sample observations, that is, the sum of all N values
  • $- {{\bar{y}}_{i \bullet }}-$ is the average of the $- {{n}_{i}}-$ sample observations from population i
  • $- {{\bar{y}}_{\bullet \bullet }}-$ is the (grand) average of all sample observations
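The partition can be checked numerically. The sketch below uses three made-up treatment groups of unequal size; the group labels, the values, and the variable names are assumptions for illustration only.

```python
import math

# Hypothetical observations for three treatments, invented for illustration.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [9.0, 8.0, 10.0],
}

all_values = [y for values in groups.values() for y in values]
N = len(all_values)                    # total number of observations
grand_mean = sum(all_values) / N       # grand average of all observations
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# SS(Total): every observation against the grand average.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# SST: each group average against the grand average, weighted by group size.
ss_treatment = sum(
    len(v) * (group_means[name] - grand_mean) ** 2 for name, v in groups.items()
)

# SSE: every observation against its own group average.
ss_error = sum(
    (y - group_means[name]) ** 2 for name, v in groups.items() for y in v
)

print(f"SS(Total) = {ss_total:.4f}")
print(f"SST       = {ss_treatment:.4f}")
print(f"SSE       = {ss_error:.4f}")
print(math.isclose(ss_total, ss_treatment + ss_error))
```

The last line confirms that the treatment and error terms add up to the corrected total sum of squares for this data set.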

Summary

A memorable lecture, a bit of history, and plenty of squaring differences. Hopefully, this provides a little context and helps you understand the use of sum of squares in the various statistical processes you use on a regular basis.


