MSA
Comparison of Two Gages
This is the eighth in a series of articles about MSA. The focus of this article will be on assessing the
measurement reproducibility between two measurement systems that must measure the same
characteristic.
Does the following scenario seem familiar? An operator performs an online test on a product. The test
shows a failure to meet specifications. This is an expensive product, so the line supervisor takes the part
to an online tester located on an adjacent line and retests it. This time it passes. The supervisor instructs
the operator to go ahead and use the part. Both of them lose confidence in the first measurement device.
But which one provided the correct result?
There are several approaches that may be used to assess the reproducibility of two measurement
devices.
One approach, commonly used when the measurement devices are fully automated, is to perform a
standard R&R study with the Measurement Devices replacing the Operators. Reproducibility is now
tester-to-tester reproducibility, and the operator*part interaction becomes the tester*part interaction. While this
approach does quantify the reproducibility of the two devices, it has the drawback of providing little
additional information if a significant difference between the two is noted. It is also limited to fully
automated gages. You do have the option of adding gage as a third factor (Parts, Operators, Gages) and
analyzing the resulting designed experiment using Analysis of Variance (ANOVA).
A second approach is known as the Iso-plot. This is a very simple graphical approach. An x-y plot is
constructed with equal scales, and a 45º line is drawn through the origin. Parts are selected throughout
the expected measurement range. Each part is measured using both gages. Each part is then plotted on
the graph using the value measured by one gage on the x-axis and the value measured by the other gage
on the y-axis. Ideally, the coordinates of all parts will lie on the 45º line. If the points fall consistently above
or below the line, one gage is biased with respect to the other. The Iso-plot provides an excellent visual
assessment of the reproducibility of the two gages, but does not provide quantifiable results. A linear
regression analysis of the data used to create the Iso-plot will provide quantifiable results. Ideally, the
regression constant should equal zero and the slope should equal one. The regression output will provide
the relationship between the two gages as well as confidence and prediction limits.
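A minimal sketch of that regression check, using NumPy (the paired values below are hypothetical):

```python
import numpy as np

# Hypothetical paired measurements: the same parts measured on both gages.
gage1 = np.array([2.01, 4.98, 8.03, 10.97, 14.02, 17.05, 19.96])
gage2 = np.array([2.05, 5.01, 8.00, 11.02, 13.98, 17.10, 20.01])

# Fit gage2 = intercept + slope * gage1 by least squares.
slope, intercept = np.polyfit(gage1, gage2, 1)

# Ideally intercept ~ 0 (no relative bias between the gages) and
# slope ~ 1 (no linearity difference between the gages).
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A full statistical package will also report the confidence and prediction limits mentioned above; this sketch only checks the two ideal values.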
A third approach is the Bland Altman plot. Parts are selected throughout the expected measurement
range. Each part is measured using both gages. Calculate the differences between the gages (Gage1 –
Gage2), the mean of each pair of measurements and the mean of the differences. Plot the differences on
an x-y plot with the differences on the y-axis and the mean of the paired measurements on the x-axis.
Draw a horizontal line for the mean of the differences. Draw two more horizontal lines at + / - 1.96
standard deviations of the differences. Approximately 95% of the differences should fall within the limits.
The mean line indicates the bias between the two gages. A trend indicates that the bias changes with the
size (a linearity difference between the gages). An increase (or decrease) in the width of the pattern with
size indicates that the variation of at least one of the gages is dependent on size. If the variation within the
limits is of no practical importance, the two gages may be used interchangeably.
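The Bland-Altman calculations above can be sketched as follows (the data are hypothetical):

```python
import numpy as np

# Hypothetical measurements of the same parts on both gages.
gage1 = np.array([10.2, 12.5, 9.8, 11.1, 13.0, 10.7, 12.2, 11.6])
gage2 = np.array([10.0, 12.7, 9.9, 11.3, 12.8, 10.5, 12.4, 11.5])

diffs = gage1 - gage2                # paired differences (Gage1 - Gage2)
means = (gage1 + gage2) / 2.0        # x-axis values for the plot
bias = diffs.mean()                  # center line: mean of the differences
sd = diffs.std(ddof=1)               # std dev of the differences
upper, lower = bias + 1.96 * sd, bias - 1.96 * sd  # limits of agreement

# Roughly 95% of the differences should fall between lower and upper.
print(f"bias = {bias:.3f}, limits = ({lower:.3f}, {upper:.3f})")
```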
Do not overlook the differences between gages when evaluating your measurement systems. Your
operators will be quick to discover gages that inconsistently accept and reject product between the gages.
They will then lose confidence in them and the test itself.
Within product variation (WPV)
This is the eleventh in a series of articles on MSA, and the first in advanced MSA topics. The
focus of this article will be on how to handle within product variation (WPV).
WPV is significant variation in the form of a product. Significant means that it is detectable by
your measurement device. Examples of variation in form include shaft diameter variation
due to lobes, or a taper/barrel/hourglass shape; thickness variation due to parallelism and
more.
WPV can have a significant impact on the acceptability of a measurement system. While
technically part of the product not the measurement system, it definitely impacts the
measurement system. It cannot be isolated using the standard AIAG Gage R&R study, but
requires a special approach. The AIAG MSA manual touches on this and presents a Range
method for calculating the WPV effect. I show an ANOVA approach that allows an evaluation
of all of the interactions with WPV.
When I first decided to start this blog, I planned to demonstrate it using Minitab's General
Linear Model (GLM). Fortunately, the new Minitab 16 made this easier with the Expanded
Gage R&R Study feature. Note: Everything I show here can be duplicated with earlier
versions of Minitab using GLM. The graphs must be created using the Graph commands and
some creativity, and the metrics will have to be calculated by storing results and manually
performing the calculations.
You can follow along by opening the attached PDF file.
• First, determine whether WPV is a potential issue. You may know this already from
process knowledge, or suspect it from high Repeatability variation or an
Operator*Part interaction.
• Create a Gage Study worksheet. Either copy the format used in the attachment, or
use Minitab to create a full factorial design for 3 factors.
• Identify specific locations on the product to be measured by all operators. These
can be selected at random or by design. If selected by design, this must be entered
as a Fixed factor into Minitab.
• Perform the study.
• Analyze following the attached file. I recommend starting with all factors and 2-way
interactions in the model. Review the p-values and remove all 2-way interactions
with p-values greater than 0.05. If all interactions involving the Operator are
removed, then look at the p-value for the Operator. If the Operator p-value is
greater than 0.05, remove it also.
• Interpret the Session Window the same as a standard MSA for %SV, %Tol, etc.
• Interpret Graph1 the same as a standard MSA.
• Create Graph2 and interpret it similar to Graph1. You will see two new graphs, the
Measurement by WPV graph and the Part * WPV graph. The Measurement by WPV
graph shows differences in the location measured on the part, and the interaction
shows whether this location-to-location difference varies by part.
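The backward-elimination analysis in the steps above can be sketched outside Minitab with statsmodels. The data here are simulated and the factor names (Operator, Part, Location) are my assumptions; Minitab's Expanded Gage R&R automates all of this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated expanded gage study: 3 operators x 5 parts x 2 measurement
# locations (the WPV factor) x 2 trials, with a built-in location effect.
rng = np.random.default_rng(1)
part_means = rng.normal(10.0, 0.5, size=5)
loc_offset = {1: 0.0, 2: 0.15}
rows = [(op, part, loc,
         part_means[part - 1] + loc_offset[loc] + rng.normal(0.0, 0.05))
        for op in (1, 2, 3)
        for part in (1, 2, 3, 4, 5)
        for loc in (1, 2)
        for trial in (1, 2)]
df = pd.DataFrame(rows, columns=["Operator", "Part", "Location", "Y"])

# Full model: all main effects plus all 2-way interactions.
full = ("Y ~ C(Part) + C(Operator) + C(Location) + C(Part):C(Operator)"
        " + C(Part):C(Location) + C(Operator):C(Location)")
table = anova_lm(smf.ols(full, df).fit())

# Backward elimination: drop 2-way interactions with p > 0.05, then refit.
keep = [term for term in table.index if term != "Residual"
        and (":" not in term or table.loc[term, "PR(>F)"] <= 0.05)]
reduced = anova_lm(smf.ols("Y ~ " + " + ".join(keep), df).fit())
print(reduced[["df", "PR(>F)"]])
```

The significant C(Location) term in the reduced table is the WPV (location-to-location) effect described above.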
Now that you understand the impact of WPV on your measurement system, what do you do
with that knowledge? It depends.
If the WPV varies randomly on the product (i.e., it is unpredictable), you cannot prevent it
from affecting your measurement system. You can recognize it as part of your total variance
equation (acknowledgements to bobdoering), but it will always affect your measurements.
If the WPV is predictable by location, you can take this into account and improve your R&R
results by specifying measurement locations in your work instructions.
Last advice: Although it is possible to remove the effects of WPV from your R&R results, SPC
and capability results, it will still have an impact on function. For example, a shaft with lobes
may not fit a bearing correctly, so beware.
Families of Gauges
This is the tenth in a series of articles about MSA. The focus of this article will be on minimizing the
number of MSA studies by creating families of gauges.
A frequent question in the Cove is: “Must I perform an MSA on every gauge/part combination?” The
answer is: “No. You may create families of gauges, and perform MSA studies by family.”
Wait! Not so fast! What constitutes a family of gauges? Well, there are certain criteria that must be met:
• Same gauges
• Same product characteristics
• Same tolerances or process variation depending on use of gauge
Another, less restrictive approach that could be used:
• Similar gauges
• Similar product characteristics
• State the smallest tolerance or process variation (standard deviation) that the gage family may
be used to measure and still achieve an acceptable R&R result
Let’s cover each of these in more depth:
Same/similar gauges:
Conservative approach: Same brand and model of gauge
Alternate approach: Similar gauges (brand or model not relevant)
• Same type (e.g., calipers, micrometers, 3 – point bore gage, etc.);
• Same range (e.g., 0 – 6 inch, 0 – 12 inch, etc);
• Same display (e.g., digital, dial, vernier, etc.);
• Same resolution (e.g., 0.0001, 0.05, etc.)
• Same technology, if relevant (e.g., scaled, rack and pinion gear, electrostatic capacitance
technologies for calipers)
• Same anvils, if relevant (e.g., outside, inside, step anvils for calipers)
Same/similar product characteristics:
Conservative approach: Same features
Alternate approach: Similar features (e.g., diameter, width, step etc.) within a specified range. Be cautious
when establishing the range. You may need to perform a series of MSAs to establish how large this range
may be. The impact of size will usually manifest itself through changes in the within-part variation, or by
affecting the ease with which an operator can handle the part and gauge together.
Same tolerances or process variation:
Conservative approach:
• Same tolerance if gauges are used for inspection
• Same process variation if gauges are used for SPC
Alternate approach:
• If gauges are used for inspection, state the smallest tolerance at which an acceptable %
Tolerance may still be achieved
• If gauges are used for SPC, state the smallest process variation standard deviation at which an
acceptable % Study Variation may still be achieved
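The "smallest tolerance" criterion can be computed directly. This sketch assumes the common convention of a 6-standard-deviation R&R spread and a 30% acceptance limit; substitute your own multiplier and acceptance criterion:

```python
def smallest_tolerance(stddev_rr: float,
                       max_pct_tolerance: float = 30.0,
                       k: float = 6.0) -> float:
    """Smallest tolerance the gauge family can measure while keeping
    %Tolerance = 100 * k * stddev_rr / tolerance at or below the limit."""
    return 100.0 * k * stddev_rr / max_pct_tolerance

# Example: an R&R standard deviation of 0.002 for the gauge family.
t_min = smallest_tolerance(0.002)
print(t_min)
```

Any characteristic with a tolerance at or above `t_min` could then be covered by this gauge family's MSA record.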
Once you have created a gauge family, document it. Create a record for each gauge family that clearly
describes the gauges, characteristic and size range. The gauges could be identified by listing all serial
numbers, gauge calibration ID numbers, or by clearly specifying the type, range, display, resolution and
anvils.
Perform the MSA and associate it with the gauge family record.
%Tolerance for One-sided Specs
This is the ninth in a series of articles about MSA. The focus of this article will be on discussing the proper
approach to evaluating Repeatability and Reproducibility as a % of Tolerance when the tolerance is one-
sided.
The AIAG MSA manual is silent on this and no other "guru" that I am aware of has spoken on it. This is
probably because there is no one, right answer.
A recent post asked this question, and I recalled several prior posts that had all asked similar questions,
with mixed responses. It forced me to really think the matter through from the perspective of what is the
intent of the %Tolerance metric, not what is convenient or easy.
Minitab users have two options. If they enter a natural boundary as the second specification (or the range
between the boundary and the specification), Minitab will calculate %Tolerance the same way as if they
entered a two sided specification (think Cp). If they leave the other specification blank, Minitab will
calculate %Tolerance using 3 * the total R&R and the difference between the study mean and the
specification (think Cpk). See the full explanation here.
Let's look at a few scenarios, and I will give you my opinion. This is not based on any research or statistics,
but simply by thinking the problem through rationally.
• Scenario 1, Maximum Flatness: Zero is a natural boundary and there is a maximum allowable
flatness specification. Presumably, there is some incentive to drive for smaller and smaller levels
of flatness, thus moving farther and farther away from the USL. In this situation, you will
conceivably utilize the entire tolerance spectrum. This would tend to justify using the whole
range, or entering 0 as a LSL.
• Scenario 2, Minimum Hardness: I am using the example used by Minitab in the above link. In
this scenario, you are not going to strive for infinite hardness. There will be practical and
economical reasons for you to "hover" a safe distance above the LSL, and to not increase over
time. In this scenario, it would make more sense to use the Minitab approach. (I will casually
overlook the fact that you cannot enter infinity into the USL field).
• Scenario 3, Maximum activation force: The same logic could apply to a maximum activation
force for a push button. If you strove for zero activation force on a conventional spring-loaded
design, there would be no force left to reopen the switch. Therefore, you will also “hover” a safe
distance below the maximum force and not decrease over time. In this case, the Minitab
approach of leaving the LSL field blank would also make sense.
It may make more sense to forget about %Tolerance entirely and use the Gauge Performance Curve
approach (see my files MSA 3rd Ed.xls and MSA 3rd Ed ANOVA.xls in my earlier blogs). This approach
provides the probability of accepting a product for each specific measurement as you near the
specification.
Bottom line, I recommend that you think carefully about your specific situation, and choose the
appropriate approach based on the situation.
Technical Support Document: How Minitab calculates %Tolerance
when a One-Sided Tolerance is entered for Gage R&R
Many processes operate with only a single specification limit. For example, a lumber mill cuts
beams to be perfectly straight, but the beams often warp during manufacture. Measurements of this
warping have an upper specification limit to distinguish acceptable and unacceptable degrees of
warping. However, these measurements have no lower specification limit because they cannot take
values less than zero. In fact, zero represents a perfectly straight beam with no warping. Likewise,
some processes have a lower specification limit, but no upper specification limit. A cutlery
manufacturer must ensure the hardness of its knife blades exceeds 55 on the Rockwell C-scale. The
lower specification limit is 55, but no upper specification limit exists.
When evaluating Gage R&R for processes with only one specification limit, it is important for the
analysis to reflect this property. One statistic that is directly affected by specification limits is the
%Tolerance statistic, which compares the tolerance with the study variation. Ideally, the tolerance
should amply encompass the study variation, ensuring the variability due to Gage R&R and part-to-
part variation do not push the process output beyond the specification limits.
When a process has two specification limits, the tolerance equals the difference between them, and
%Tolerance equals the study variation of a given variation source divided by this tolerance.
However, this method is invalid when you provide a single specification limit. For these cases,
Minitab uses the following method:
1. Minitab calculates a one-sided process variation by dividing the Study Variation statistic by 2.
2. Minitab defines the one-sided tolerance as the absolute value of the difference between the single
specification limit and the mean value of all measurements.
3. Minitab calculates the %Tolerance statistic by dividing the one-sided process variation by the one-
sided tolerance.
%Tolerance = [ (Study Variation / 2) / | X̄ − L | ] × 100

where L is the single specification limit and X̄ is the mean of all observations.
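The three steps can be sketched directly (the function name and example values are mine, not Minitab's):

```python
def pct_tolerance_one_sided(study_variation: float,
                            spec_limit: float,
                            mean_obs: float) -> float:
    """%Tolerance for a one-sided specification, per the 3-step method."""
    one_sided_sv = study_variation / 2.0        # step 1: one-sided variation
    one_sided_tol = abs(mean_obs - spec_limit)  # step 2: one-sided tolerance
    return 100.0 * one_sided_sv / one_sided_tol # step 3: their ratio as a %

# Example: study variation 1.2, hardness LSL = 55, study mean = 58.
pt = pct_tolerance_one_sided(1.2, 55.0, 58.0)
print(pt)
```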
If the mean of all observations is less than the lower specification limit, or greater than the upper
specification limit, the measurements are deviating strongly from their acceptable range, and
%Tolerance is not calculated.
Linearity
This is the third in a series of articles about MSA. The focus of this article will be on measurement
linearity.
Linearity is simply measurement bias throughout the entire range of the measurement device. For this
reason bias and linearity are often combined into a single study.
A good calibration system will check the calibration of a measurement device at a minimum of three
locations (both extremes and the middle of the measurement range). A thorough linearity study will check
at least five locations (e.g., both extremes, and at 25%, 50% and 75% of the measurement range).
Reference standards such as those used in calibration should be used instead of actual parts, unless the parts
can be measured with less variation using a different, more precise measurement device. Each standard is
measured repeatedly at least ten times. All measurements must be made randomly to minimize the
appraiser recalling previous results. The results can be analyzed using statistical software such as
Minitab, or with the attached file as shown.
• Calculate the bias for each individual measurement and the average bias for each reference
standard (or part).
• Plot the individual and average biases on the y-axis of a scatter plot versus the values of the
standards on the x-axis.
• Perform a regression analysis using the individual biases as the response and the reference
values as the predictor variable.
• Plot the regression line and the 95% confidence limits for the regression line on the scatter plot.
• Plot the bias = 0 line for all reference values.
• Verify that the Bias = 0 line lies within the +/- 95% confidence limits of the regression line.
• The y-intercept and slope of the regression equation should each be approximately equal to
zero. These values may be statistically evaluated if desired per the formulae in the AIAG MSA
3rd edition manual. For practical purposes the graphical analysis is sufficient. The statistical
analysis is only necessary for borderline cases.
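The regression portion of the analysis above can be sketched with SciPy; the reference values and measurements below are simulated for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical linearity data: 5 reference standards, 10 repeats each.
refs = np.repeat([2.0, 4.0, 6.0, 8.0, 10.0], 10)
rng = np.random.default_rng(7)
measured = refs + rng.normal(0.0, 0.02, size=refs.size)  # negligible bias
bias = measured - refs

# Regress the individual biases on the reference values.
res = stats.linregress(refs, bias)

# Ideally both the slope and the intercept are ~0; a nonzero slope
# indicates a linearity problem, a nonzero intercept a constant bias.
print(f"intercept={res.intercept:.4f}, slope={res.slope:.4f}")
```

Plotting the fitted line with its 95% confidence band and checking whether the bias = 0 line stays inside the band completes the graphical analysis described above.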
Ideally, linearity will be statistically equal to zero. However, this will not always be the case. There are
several possible scenarios:
• Constant bias – All measurements are offset by the same amount regardless of size. This is
essentially a calibration issue. Calibrate the measurement device and repeat the linearity study.
• Non-constant, linearly increasing/decreasing bias –The bias either increases or decreases
as the location within the measuring range increases. Possible causes: measurement scale is
proportionally small/large (gage issue), thermal bias (thermal expansion/contraction is
proportional to the size of the dimension), pressure bias (deflection under pressure is
proportional to size), etc.
• Non-constant, nonlinear bias – The bias changes in a nonlinear fashion throughout the
measurement range. Possible cause: measurement scale is nonlinear (gage issue, particularly
with electronics), gage wear in one section of the measurement range, worn standards.
If the linearity is acceptable within the range actually used for measurement, the gage may be accepted
for a specified range of measurement. This must be clearly noted on the gage and the practice
documented in the appropriate quality procedures.
Bias
This is the second in a series of articles about MSA. The focus of this article will be on measurement bias,
sometimes referred to as accuracy.
Bias is the difference between the actual value of a part and the average measured value of that part. In
other words, a measurement device that has bias will consistently over or under state the true value of the
part.
In most cases, a separate study of measurement bias is not performed if the measurement device has
been calibrated. The reason for this is simple. Calibration is intended to detect and correct any
measurement bias found. As I stated in Part 1, calibration and measurement uncertainty are outside of
the scope of this series and is better left to experts in those fields. However, I will state that all calibration
programs are not created equal. Some less equal calibration programs may take a single measurement of
a standard and then make a determination on whether there is measureable bias in the gage. This
overlooks the fact that taking a second or third measurement could provide different results than found in
the first measurement. There are also other less obvious sources of bias from which a calibration system,
no matter how well designed and implemented, will not protect you.
I will go through a few examples of bias that you could encounter:
• Measurement device bias – As we discussed in Part 1, all measurements vary to some extent,
provided the device has sufficient resolution to see it. The failure mode of a weak calibration system is
to base the calibration on a single measurement. The solution is to take multiple measurements
of the standard and compare the average of these measurements to the standard before making
a determination of the magnitude of the bias and making an adjustment. Even better, a 1-
sample t-Test may be used to determine the statistical significance of the bias provided the
required sample size is established in advance using the maximum allowable bias and desired
alpha and beta risks.
• Temperature bias – Many products will change size with changes in temperature. The
magnitude of this change in size may or may not add significant bias depending on the materials
involved. What may be less commonly known is that the measurement device will also change
size with temperature. How often does the appraiser carry the measurement device in a pocket
or in their hand warming it up to body temperature? Temperature not only affects mechanical
dimensions, but also electrical. Resistance changes with temperature affecting many electrical
measurements. An extremely important aspect of calibration performed by an internal lab is
normalizing both the standard and the gage at standard temperatures before calibration.
• Humidity bias – Certain materials will swell or shrink with changes in moisture content. Critical
measurements should be made at standard humidity conditions after a lengthy normalization
time.
• Pressure bias – Materials that are compressible such as rubber or foam are notoriously difficult
to measure due to the deformation of the part under pressure. But did you realize that the steel
shaft diameter that you are measuring may also be understated depending on whether you used
the ratchet thimble on the micrometer or not?
• Cosine error bias – Not just for CMMs! Test indicators, less commonly used these days, are
also susceptible to cosine error. Did you realize that the ball on the tip of a CMM probe can
introduce potential bias? All touches made with the tip must be made perpendicular to the
surface of the part. When this is not done the diameter of the sphere will introduce what is called
cosine error. The larger the sphere used, the larger the cosine error.
• Measurement procedure bias – Your measurement procedure can also introduce bias. How
do you measure the diameter of a shaft (randomly, max, min, average of max-min, using CMM)?
What about the location on the shaft (middle, end, multiple locations)? The effect of this bias
depends on the application of the part. Does the shaft need to slip into a hole? The average
diameter reported by a CMM will understate the effective diameter of the shaft. It may measure
in specification and not fit into a ring gage.
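For the cosine-error item above, a commonly quoted approximation for the error introduced by a probe tip of radius r contacted at an angle θ off the surface normal is r·(1 − cos θ). This formula is my illustration, not from the original text, but it shows why a larger sphere produces a larger cosine error:

```python
import math

def cosine_error(tip_radius: float, angle_deg: float) -> float:
    """Approximate probe-tip cosine error: r * (1 - cos(theta))."""
    return tip_radius * (1.0 - math.cos(math.radians(angle_deg)))

# A 2 mm tip vs a 5 mm tip, both contacted 10 degrees off-normal.
for r in (2.0, 5.0):
    print(f"r={r} mm -> error={cosine_error(r, 10.0):.4f} mm")
```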
Stability
This is the fourth in a series of articles about MSA. The focus of this article will be on measurement
stability.
Stability is simply measurement bias throughout an extended period of time. This is where calibration falls
short from an MSA perspective. Calibration is a series of snapshots widely spaced in time taken under
controlled environmental conditions. A stability study is a series of repeated measurements taken under
actual usage conditions. The purpose is to verify that the bias of the gage does not change over time due
to environmental conditions or other causes.
A stability study is performed by selecting a measurement standard (ideal) or a master sample part that is
midrange of the expected measurement range. Note: This may be enhanced by adding standards/master
parts at the low and high ends of the expected measurement range. On a periodic basis, measure the
standard 3 – 5 times. The period should be based on knowledge of what may influence the measurement
system. For example, if ambient temperature variation is expected to be the major source of variation,
make hourly checks throughout the day. If the source of variation is expected to be long term drift, take
daily or weekly measurements.
Analyze the data using Xbar/R or Xbar/s control charts (use separate charts if you measured at the
low/middle/high ends of the expected measurement range). The subgroups are comprised of the 3 -5
measurements and measure short term repeatability of the measurement device. If the control chart is in
a state of statistical control throughout the study period, the gage stability is acceptable. There is no
numerical acceptance criterion. If the control chart is out of control, analyze the patterns.
• For example, the influence of temperature would be expected to appear as cyclical trends that
coincide with the ambient temperature.
• If the gage operates on plant utilities (e.g., air pressure) abrupt shifts could occur based on plant
demand on the utilities (e.g., air pressure).
• Single points out of control could be the result of a gage that is overly sensitive to operator
technique.
• Runs could be the result of different measurement methods.
A stability study will also provide an estimate of the within operator repeatability of the gage.
StdDev_repeatability = R̄ / d2* (d2 may be used if the number of subgroups is greater than 20).
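That estimate can be sketched as follows. The data are hypothetical, and the d2* value shown is illustrative; look up the d2* constant for your subgroup size and number of subgroups in an SPC table:

```python
import numpy as np

def repeatability_stddev(subgroups, d2_star: float) -> float:
    """Within-operator repeatability from a stability study: the average
    subgroup range divided by the d2* bias-correction constant."""
    rbar = np.mean([np.ptp(s) for s in subgroups])  # ptp = range (max - min)
    return rbar / d2_star

# Hypothetical: three periodic checks of a standard, 5 measurements each.
data = [[10.01, 10.03, 9.99, 10.02, 10.00],
        [10.00, 10.04, 10.01, 9.98, 10.02],
        [10.02, 10.00, 10.03, 9.99, 10.01]]
sigma = repeatability_stddev(data, d2_star=2.326)  # illustrative constant
print(sigma)
```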
R&R
This is the fifth in a series of articles about MSA. The focus of this article will be on measurement
repeatability and reproducibility commonly referred to as a gage R&R study.
This article will deal solely with the AIAG MSA methodology. The AIAG methodology is the
methodology required by many customers, particularly in the automotive industry. Whether you
agree with it or not, it is a standard approach and has widespread acceptance. Most suppliers
have no option other than to comply. I will deal with other approaches in a later article.
This is where calibration completely separates from MSA. There is no equivalent in calibration to R&R.
Calibration, bias studies, linearity studies and stability studies have all focused on measurement bias.
R&R studies focus on measurement variation. Let’s first start with definitions. What is Repeatability? What
is Reproducibility? What is an Operator by Part interaction?
Repeatability is the measurement variation observed when a single operator measures one part multiple
times.
Reproducibility is the measurement variation observed when multiple operators measure one part
multiple times. Depending on the measurement system, AND how the MSA study is designed,
Reproducibility may also be the measurement variation observed when multiple measurement stations or
devices measure one part multiple times. For example, a measurement device may consist of a fully
automated measurement device comprised of multiple stations. Each station is dimensionally unique and
the difference contributes to measurement variation. Another example is multiple, fully automated
measurement devices that measure the same characteristic. Each device has a slightly different
measurement bias and contributes to the measurement variation. Yet another example is a semi-
automated measurement device that is manually loaded. The resulting measurement is influenced by the
manner in which the product is loaded into the fixture. Each operator that loads the product has a slightly
different technique for loading that influences the measurement variation.
Operator by Part Interaction is a situation where the result of an operator’s measurement technique is
influenced by the part itself. For example, two operators measure a shaft diameter using techniques that
are identical in all respects except one. Operator A takes measurements at the midpoint of the shaft
length. Operator B measures at one end of the shaft. Two shafts out of ten have burrs on the ends.
Operator A’s measurements are not affected by the burr. Operator B’s measurements are affected by the
burr. This will result in an interaction between the operator and the part itself.
Part Selection
The first step in an effective R&R study is to determine the use of the gage itself. Will it be used for part
inspection to a tolerance, for process control, for statistical studies (e.g., a hypothesis test, capability
study, DOE, etc.), or for a combination of these? This is very important because it influences the selection
and quantity of parts needed for the R&R study.
If the gage is used solely for part inspection, the selection of parts is not critical because the part variation
is not included in the calculation of the R&R metric, %Tolerance (i.e., P/T Ratio). Some will recommend
that parts representing the full spread of the tolerance be used. While this does not hurt, it is not really
necessary. If a gage linearity study has been performed, the change in bias over the tolerance spread is
known. If a gage linearity study has not been performed and there is a linearity issue an R&R study will
not detect it.
If the gage is used for process control or for statistical studies, the selection of parts is critical because the
part variation is part of the calculation of the R&R metric, % Study Variation (i.e., %GRR). It is vital that
the parts selected for the study reflect the actual variation of the process. That is, the StdDev of the parts
equals the StdDev of the process. Some statistical packages, such as Minitab, allow the entry of the
historical StdDev of the process. If your software has this option, use it, entering the process StdDev from
a capability study or calculated from SPC charts. If the software does not have the feature, manual
calculations using the historical value are still possible as follows:
% Study Variation = 100 * [StdDevR&R / StdDevTotal Variation]
StdDevTotal Variation = SQRT[StdDevR&R^2 + StdDevPart Variation^2]
Manually substitute the StdDev from a capability study for StdDevPart Variation
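The manual substitution above can be sketched as a small helper (names and example values are mine):

```python
import math

def pct_study_variation(sd_rr: float, sd_part_historical: float) -> float:
    """% Study Variation using a historical part standard deviation in
    place of the part-to-part estimate from the study itself."""
    sd_total = math.sqrt(sd_rr**2 + sd_part_historical**2)
    return 100.0 * sd_rr / sd_total

# Example: R&R StdDev 0.003, process StdDev 0.004 from a capability study.
psv = pct_study_variation(0.003, 0.004)
print(psv)
```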
How many operators, trials and parts do I use?
The recommended standard is to have three operators measure ten parts three times each. Is this always
the best approach? What flexibility do we have in modifying this? To answer this question, we need to
look at how the data are used by the ANOVA calculations.
Source of Variation                   Degrees of Freedom (n − 1)
Reproducibility (3 operators)         2
Parts (10 parts)                      9
Pure Error (Repeatability)            78
Total Variation (90 measurements)     89
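The degrees of freedom in the table can be reproduced with a small helper. Note that, matching the table, the operator*part interaction is pooled into the repeatability (pure error) term here:

```python
def grr_degrees_of_freedom(operators: int, parts: int, trials: int) -> dict:
    """Degrees of freedom for a crossed gage R&R design, with the
    operator*part interaction pooled into repeatability (pure error)."""
    n = operators * parts * trials      # total number of measurements
    df_op = operators - 1
    df_part = parts - 1
    return {"reproducibility": df_op,
            "parts": df_part,
            "repeatability": (n - 1) - df_op - df_part,
            "total": n - 1}

# The recommended 3 operators x 10 parts x 3 trials design.
dofs = grr_degrees_of_freedom(3, 10, 3)
print(dofs)
```

Trying other designs with this helper makes the trade-off concrete: dropping to 2 operators leaves only 1 degree of freedom for Reproducibility, which is why fewer trials are preferred over fewer operators.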
The 10/3/3 approach provides very good estimates of the total variation and the repeatability. The least
reliable estimate of variation will be the Reproducibility because it has the smallest degrees of freedom. If
concessions must be made, it is better to run fewer trials in order to maintain or increase the number of
operators. The total number of measurements should be maintained near 90. The number of parts may
be reduced, if (and only if) an independent estimate of part variation (such as from a capability study) is
available and used as described in the previous section.
Selection of Operators
Always use the actual operators that will perform the measurement. Do not use personnel that will not
perform the measurement task. Select the operators randomly. Do not handpick the best operators. If
only one operator performs the measurement task (e.g., complex analytical equipment), perform the study
with that operator only. There is no Reproducibility component in that situation.
Measurement of Parts
Parts should be introduced randomly to each operator by an independent party that is not involved in the
actual measurements. This is to prevent potential measurement bias caused by an operator remembering
a previous measurement and consciously or unconsciously adjusting the next measurement to match.
Parts should be measured using the same method that will normally be used. If Reproducibility is
adversely affected by the use of different methods, you need to know that. If there is significant within-part
variation in form that adversely affects Repeatability, you need to know that also.
What method do I use?
In the MSA manual, there are two alternative methods: the Range method and the ANOVA method. Both
methods will provide very similar results. The Range method uses simpler math, but the ANOVA method
can detect a potential Operator x Part interaction. If you have software available, use the ANOVA method.
It provides additional information. The only compelling reason for using the Range method is if you must
perform manual calculations.
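The ANOVA method rests on a two-way crossed decomposition of the total variation into operator, part, operator*part interaction, and repeatability sums of squares. A minimal sketch of that decomposition, assuming a fully crossed, balanced study (a hypothetical helper, not the MSA manual's worked procedure; mean squares and variance components follow by dividing each sum of squares by its degrees of freedom):

```python
from itertools import product

def anova_gage_rr(data, r):
    """data[i][j] is a list of r repeat readings by operator i on part j.
    Returns the sums of squares for a crossed two-way ANOVA with interaction."""
    o = len(data)          # number of operators
    p = len(data[0])       # number of parts
    n = o * p * r
    grand = sum(sum(sum(cell) for cell in row) for row in data) / n
    op_mean = [sum(sum(cell) for cell in data[i]) / (p * r) for i in range(o)]
    part_mean = [sum(sum(data[i][j]) for i in range(o)) / (o * r) for j in range(p)]
    cell_mean = [[sum(data[i][j]) / r for j in range(p)] for i in range(o)]

    ss_op = p * r * sum((m - grand) ** 2 for m in op_mean)
    ss_part = o * r * sum((m - grand) ** 2 for m in part_mean)
    ss_int = r * sum((cell_mean[i][j] - op_mean[i] - part_mean[j] + grand) ** 2
                     for i, j in product(range(o), range(p)))
    ss_err = sum((y - cell_mean[i][j]) ** 2
                 for i, j in product(range(o), range(p)) for y in data[i][j])
    return {"operator": ss_op, "part": ss_part,
            "op_x_part": ss_int, "repeatability": ss_err}
```

The four components always sum to the total sum of squares, which is a useful sanity check on any hand or software calculation.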
continued in next blog entry
11. Comparison of Two Gages
This is the eighth in a series of articles about MSA. The focus of this article will be on assessing the
measurement reproducibility between two measurement systems that must measure the same
characteristic.
Does the following scenario seem familiar? An operator performs an online test on a product. The test
shows a failure to meet specifications. This is an expensive product, so the line supervisor takes the part
to an online tester located on an adjacent line and retests it. This time it passes. The supervisor instructs
the operator to go ahead and use the part. Both of them lose confidence in the first measurement device.
But which one provided the correct result?
There are several approaches that may be used to assess the reproducibility of two measurement
devices.
One approach that is commonly used when the measurement devices are fully automated is to perform a
standard R&R study, replacing the Operators with the Measurement Devices. Reproducibility is now
tester-to-tester reproducibility. The operator*part interaction becomes the tester*part interaction. While this
approach does quantify the reproducibility of the two devices, it has the drawback of providing little
additional information if a significant difference between the two is noted. It is also limited to fully
automated gages. You do have the option of adding gage as a third factor (Parts, Operators, Gages) and
analyzing the resulting designed experiment using Analysis of Variance (ANOVA).
A second approach is known as the Iso-plot. This is a very simple graphical approach. An x-y plot is
constructed with equal scales, and a 45° line is drawn through the origin. Parts are selected throughout
the expected measurement range. Each part is measured using both gages. Each part is then plotted on
the graph using the value measured by one gage on the x-axis and the value measured by the other gage
on the y-axis. Ideally, the coordinates of all parts will lie on the 45° line. If the points fall consistently above
or below the line, one gage is biased with respect to the other. The Iso-plot provides an excellent visual
assessment of the reproducibility of the two gages, but does not provide quantifiable results. A linear
regression analysis of the data used to create the Iso-plot will provide quantifiable results. Ideally, the
regression constant should equal zero and the slope should equal one. The regression output will provide
the relationship between the two gages as well as confidence and prediction limits.
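The regression step can be sketched in a few lines; this hypothetical helper fits the ordinary least-squares line through the paired readings (confidence and prediction limits, which any statistics package will also report, are omitted for brevity):

```python
def isoplot_regression(x, y):
    """Least-squares fit y = a + b*x for paired gage readings.
    Ideally a is approximately 0 and b is approximately 1 when the gages agree."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = ybar - b * xbar        # intercept
    return a, b
```

A slope well away from one, or an intercept well away from zero, quantifies the disagreement that the Iso-plot shows visually.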
A third approach is the Bland Altman plot. Parts are selected throughout the expected measurement
range. Each part is measured using both gages. Calculate the differences between the gages (Gage1 –
Gage2), the mean of each pair of measurements, and the mean of the differences. Plot each difference
against its pair mean, with the differences on the y-axis and the paired means on the x-axis.
Draw a horizontal line at the mean of the differences. Draw two more horizontal lines at ±1.96
standard deviations of the differences. Approximately 95% of the differences should fall within the limits.
The mean line indicates the bias between the two gages. A trend indicates that the bias changes with the
size (a linearity difference between the gages). An increase (or decrease) in the width of the pattern with
size indicates that the variation of at least one of the gages is dependent on size. If the variation within the
limits is of no practical importance, the two gages may be used interchangeably.
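The Bland-Altman quantities are simple to compute from the paired readings. A minimal sketch (the function name is illustrative; it returns the bias, the lower and upper limits of agreement, and the pair means used for the x-axis):

```python
from statistics import mean, stdev

def bland_altman(g1, g2):
    """Bias and 95% limits of agreement from paired readings (Gage1 - Gage2)."""
    diffs = [a - b for a, b in zip(g1, g2)]
    pair_means = [(a + b) / 2 for a, b in zip(g1, g2)]
    bias = mean(diffs)
    s = stdev(diffs)                       # sample std dev of the differences
    return bias, bias - 1.96 * s, bias + 1.96 * s, pair_means
```

Plotting the differences against the pair means with horizontal lines at the three returned values gives the chart described above.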
Do not overlook the differences between gages when evaluating your measurement systems. Your
operators will be quick to discover gages that accept and reject the same product inconsistently. They will
then lose confidence in the gages and in the test itself.