
Gauging Gage Part 1: Is 10 Parts Enough?
"You take 10 parts and have 3 operators measure each 2 times."
This standard approach to a Gage R&R experiment is so common, so accepted, so ubiquitous
that few people ever question whether it is effective. Obviously one could look at whether 3 is
an adequate number of operators or 2 an adequate number of replicates, but in this first of a
series of posts about "Gauging Gage," I want to look at 10. Just 10 parts. How accurately can
you assess your measurement system with 10 parts?
Assessing a Measurement System with 10 Parts
I'm going to use a simple scenario as an example. I'm going to simulate the results of 1,000
Gage R&R Studies with the following underlying characteristics:
1. There are no operator-to-operator differences, and no operator*part interaction.
2. The measurement system variance and part-to-part variance used would result in a
%Contribution of 5.88%, between the popular guidelines of <1% is excellent and
>9% is poor.
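
Here is a minimal Python sketch of the kind of simulation described above, for anyone who wants to experiment. The standard deviations (1 for part-to-part, 0.25 for the gage) are my own assumptions, chosen so the true %Contribution works out to roughly 5.88%, and the variance components come from the textbook balanced crossed-ANOVA formulas rather than Minitab's exact defaults:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_grr(n_parts=10, n_ops=3, n_reps=2, sd_part=1.0, sd_gage=0.25):
    """Simulate one crossed Gage R&R study (no operator or interaction
    effects) and return the estimated %Contribution of Total Gage R&R."""
    part_means = rng.normal(0.0, sd_part, n_parts)
    # y[i, j, k] = measurement of part i by operator j, replicate k
    y = part_means[:, None, None] + rng.normal(0.0, sd_gage, (n_parts, n_ops, n_reps))

    # Balanced two-way ANOVA mean squares
    grand = y.mean()
    pm, om = y.mean(axis=(1, 2)), y.mean(axis=(0, 2))
    cell = y.mean(axis=2)                      # part*operator cell means
    ms_p = n_ops * n_reps * ((pm - grand) ** 2).sum() / (n_parts - 1)
    ms_o = n_parts * n_reps * ((om - grand) ** 2).sum() / (n_ops - 1)
    ms_po = (n_reps * ((cell - pm[:, None] - om[None, :] + grand) ** 2).sum()
             / ((n_parts - 1) * (n_ops - 1)))
    ms_e = ((y - cell[:, :, None]) ** 2).sum() / (n_parts * n_ops * (n_reps - 1))

    # Variance components (negative estimates truncated at zero)
    v_rep = ms_e
    v_po = max((ms_po - ms_e) / n_reps, 0.0)
    v_op = max((ms_o - ms_po) / (n_parts * n_reps), 0.0)
    v_part = max((ms_p - ms_po) / (n_ops * n_reps), 0.0)
    v_gage = v_rep + v_po + v_op
    return 100.0 * v_gage / (v_gage + v_part)

# True %Contribution: 0.25**2 / (0.25**2 + 1.0**2) = ~5.88%
estimates = np.array([simulate_grr() for _ in range(1000)])
print(f"mean = {estimates.mean():.2f}%, "
      f"5th/95th percentiles = {np.percentile(estimates, [5, 95]).round(2)}")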
So, no looking ahead here: based on my 1,000 simulated Gage studies, what do you think the
distribution of %Contribution looks like across all studies? Specifically, do you think it is
centered near the true value (5.88%), or do you think the distribution is skewed, and if so, how
much do you think the estimates vary?
Go ahead and think about it...I'll just wait here for a minute.
Okay, ready?
Here is the distribution, with the guidelines and true value indicated:

The good news is that it is roughly averaging around the true value.
However, the distribution is highly skewed: a decent number of observations estimated
%Contribution to be at least double the true value, with one estimating it at about SIX times the
true value! And the variation is huge. In fact, about 1 in 4 gage studies would have resulted in
failing this gage.
Now a standard gage study is no small undertaking: a total of 60 data points must be collected,
and once randomization and "masking" of the parts is done it can be quite tedious (and possibly
annoying to the operators). So just how many parts would be needed for a more accurate
assessment of %Contribution?
Assessing a Measurement System with 30 Parts
I repeated 1,000 simulations, this time using 30 parts (if you're keeping score, that's 180 data
points). And then for kicks, I went ahead and did 100 parts (that's 600 data points). So now
consider the same questions from before for these counts: mean, skewness, and variation.
Mean is probably easy: if it was centered before, it's probably centered still.
So let's really look at skewness and how much we were able to reduce variation:

Skewness and variation have clearly decreased, but I suspect you thought variation would have
decreased more than it did. Keep in mind that %Contribution is affected by your estimates of
repeatability and reproducibility as well, so you can only tighten this distribution so much by
increasing the number of parts. But even using 30 parts (an enormous experiment to
undertake) still results in this gage failing 7% of the time!
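
If you are following along with the Python sketch from above, the failure-rate comparison is a short loop; the exact percentages will wander a bit from the article's numbers (different simulation engine, different random draws), but the pattern is the same:

```python
# Fraction of simulated studies that would fail the gage (%Contribution > 9%)
for n in (10, 30, 100):
    est = np.array([simulate_grr(n_parts=n) for _ in range(1000)])
    print(f"{n:>3} parts: fail rate = {(est > 9).mean():.0%}, "
          f"5th-95th pct spread = {np.percentile(est, 5):.1f}%-{np.percentile(est, 95):.1f}%")
```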
So what is a quality practitioner to do?
I have two recommendations for you. First, let's talk about %Process. Oftentimes the
measurement system we are evaluating has been in place for some time and we are simply
verifying its effectiveness. In this case, rather than relying on your small sampling of parts to
estimate the overall variation, you can use the historical standard deviation as your estimate
and eliminate much of the variation caused by the small sample size of parts. Just enter your
historical standard deviation in the Options subdialog in Minitab:

Then your output will include an additional column of information called %Process. This column
is the equivalent of the %StudyVar column, but using the historical standard deviation (which
comes from a much larger sample) instead of the overall standard deviation estimated from the
data collected in your experiment:
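
The idea behind that column is simple; here is a hedged sketch (Minitab's exact AIAG calculations carry a few more columns than this, and the numbers below are purely hypothetical):

```python
def pct_process(sd_source, sd_historical):
    """%Process: a variance source's standard deviation as a percentage of
    a historical process standard deviation. The study-based analogue is
    %StudyVar, which divides by the total SD estimated from the study."""
    return 100.0 * sd_source / sd_historical

# Hypothetical numbers: total gage SD from the study vs. long-run process SD
print(f"%Process for Total Gage R&R: {pct_process(0.26, 1.05):.1f}%")
```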

My second recommendation is to include confidence intervals in your output. This can be done
in the Conf Int subdialog:

Including confidence intervals in your output doesn't inherently reduce the wide variation of
estimates the standard gage study provides, but it does force you to recognize just how much
uncertainty there is in your estimate. For example, consider this output from the gageaiag.mtw
sample dataset in Minitab with confidence intervals turned on:

For some processes you might accept this gage based on the %Contribution being less than
9%. But for most processes you really need to trust your data, and the 95% CI of (2.14, 66.18)
is a red flag that you really shouldn't be very confident that you have an acceptable
measurement system.
So the next time you run a Gage R&R Study, put some thought into how many parts you use
and how confident you are in your results!


Gauging Gage Part 2: Are 3 Operators or 2 Replicates Enough?
In Part 1 of Gauging Gage, I looked at how adequate a sampling of 10 parts is for a Gage
R&R Study and provided some advice based on the results.
Now I want to turn my attention to the other two factors in the standard Gage experiment: 3
operators and 2 replicates. Specifically, what if instead of increasing the number of parts in the
experiment (my previous post demonstrated you would need an unfeasible increase in parts),
you increased the number of operators or number of replicates?
In this study, we are only interested in the effect on our estimate of overall Gage variation.
Obviously, increasing operators would give you a better estimate of the operator term and
reproducibility, and increasing replicates would get you a better estimate of repeatability. But I
want to look at the overall impact on your assessment of the measurement system.
Operators
First we will look at operators. Using the same simulation engine I described in Part 1, this time
I did two different simulations. In one, I increased the number of operators to 4 and continued
using 10 parts and 2 replicates (for a total of 80 runs); in the other, I increased to 4 operators
and still used 2 replicates, but decreased the number of parts to 8 to get back close to the
original experiment size (64 runs compared to the original 60).
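
Re-using the simulate_grr() function sketched in Part 1, the two operator scenarios could be compared like this (again, with my assumed variance values, not the article's exact engine):

```python
designs = {
    "standard (10 parts, 3 ops, 2 reps)": dict(n_parts=10, n_ops=3, n_reps=2),
    "4 operators, 10 parts (80 runs)":    dict(n_parts=10, n_ops=4, n_reps=2),
    "4 operators, 8 parts (64 runs)":     dict(n_parts=8,  n_ops=4, n_reps=2),
}
for name, kw in designs.items():
    est = np.array([simulate_grr(**kw) for _ in range(1000)])
    print(f"{name}: mean = {est.mean():.1f}%, sd = {est.std():.1f}%")
```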
Here is a comparison of the standard experiment and each scenario laid out here:


It may not be obvious in the graph, but increasing to 4 operators while decreasing to 8 parts
actually increased the variation in %Contribution seen...so despite requiring 4 more runs this is
the poorer choice. And the experiment that involved 4 operators but maintained 10 parts (a
total of 80 runs) showed no significant improvement over the standard study.
Replicates
Now let's look at replicates in the same manner we looked at parts. In one run of simulations
we will increase replicates to 3 while continuing to use 10 parts and 3 operators (90 runs), and
in another we will increase replicates to 3 and operators to 3, but reduce parts to 7 to
compensate (63 runs).
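
The same sketch covers the replicate scenarios by swapping in different values of n_reps:

```python
# Same comparison loop as before, with the replicate scenarios swapped in
designs = {
    "standard (10 parts, 3 ops, 2 reps)": dict(n_parts=10, n_ops=3, n_reps=2),
    "3 replicates, 10 parts (90 runs)":   dict(n_parts=10, n_ops=3, n_reps=3),
    "3 replicates, 7 parts (63 runs)":    dict(n_parts=7,  n_ops=3, n_reps=3),
}
for name, kw in designs.items():
    est = np.array([simulate_grr(**kw) for _ in range(1000)])
    print(f"{name}: mean = {est.mean():.1f}%, sd = {est.std():.1f}%")
```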
Again we compare the standard experiment to each of these scenarios:


Here we see the same pattern as with operators. Increasing to 3 replicates while compensating
by reducing to 7 parts (for a total of 63 runs) significantly increases the variation in
%Contribution seen. And increasing to 3 replicates while maintaining 10 parts shows no
improvement.
Conclusions about Operators and Replicates in Gage Studies
As stated above, we're only looking at the effect of these changes to the overall estimate of
measurement system error. So while increasing to 4 operators or 3 replicates either showed no
improvement in our ability to estimate %Contribution or actually made it worse, you may have a
situation where you are willing to sacrifice that in order to get more accurate estimates of the
individual components of measurement error. In that case, one of these designs might actually
be a better choice.
For most situations, however, if you're able to collect more data, then increasing the number of
parts used remains your best choice.

Gauging Gage Part 3: How to Sample Parts
In Parts 1 and 2 of Gauging Gage we looked at the numbers of parts, operators, and
replicates used in a Gage R&R Study and how accurately we could estimate %Contribution based
on the choice for each. In doing so, I hoped to provide you with valuable and interesting
information, but mostly I hoped to make you like me. I mean like me so much that if I told you
that you were doing something flat-out wrong and had been for years and probably screwed
something up, you would hear me out and hopefully just revert back to being indifferent
towards me.
For the third (and maybe final) installment, I want to talk about something that drives me
crazy. It really gets under my skin. I see it all of the time, maybe more often than not. You
might even do it. If you do, I'm going to try to convince you that you are very, very wrong. If
you're an instructor, you may even have to contact past students with groveling apologies and
admit you steered them wrong. And that's the best-case scenario. Maybe instead of admitting
error, you will post scathing comments on this post insisting I am wrong and maybe even
insulting me despite the evidence I provide here that I am, in fact, right.
Let me ask you a question:
When you choose parts to use in a Gage R&R Study, how do you
choose them?
If your answer to that question required any more than a few words (and it can be done in one
word), then I'm afraid you may have been making a very popular but very bad decision. If
you're in that group, I bet you're already reciting your rebuttal in your head now, without even
hearing what I have to say. You've had this argument before, haven't you? Consider whether
your response was some variation on the following popular schemes:
1. Sample parts at regular intervals across the range of measurements typically seen
2. Sample parts at regular intervals across the process tolerance (lower spec to upper
spec)
3. Sample randomly but pull a part from outside of either spec
#1 is wrong. #2 is wrong. #3 is wrong.
You see, the statistics you use to qualify your measurement system are all reported relative to
the part-to-part variation and all of the schemes I just listed do not accurately estimate your
true part-to-part variation. The answer to the question that would have provided the most
reasonable estimate?
"Randomly."
But enough with the small talk: this is a statistics blog, so let's see what the statistics say.
In Part 1 I described a simulated Gage R&R experiment, which I will repeat here using the
standard design of 10 parts, 3 operators, and 2 replicates. The difference is that in only one set
of 1,000 simulations will I randomly pull parts, and we'll consider that our baseline. The other
schemes I will simulate are as follows:
1. An "exact" sampling: while not practical in real life, this pulls parts corresponding
to the 5th, 15th, 25th, ..., and 95th percentiles of the underlying normal distribution
and forms a (nearly) "exact" normal distribution as a means of seeing how much
the randomness of sampling affects our estimates.
2. Parts are selected uniformly (at equal intervals) across a typical range of parts seen
in production (from the 5th to the 95th percentile).
3. Parts are selected uniformly (at equal intervals) across the range of the specs, in
this case assuming the process is centered with a Ppk of 1.
4. 8 of the 10 parts are selected randomly, and then one part each is used that lies
one-half of a standard deviation outside of the specs.
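
For concreteness, here is one way the five sampling schemes might be generated to feed the Part 1 sketch. I am assuming a part standard deviation of 1 and a centered process, so a Ppk of 1 puts the specs at ±3; these fixed part values would then replace the randomly drawn part_means inside simulate_grr():

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_parts, sd_part = 10, 1.0
lsl, usl = -3 * sd_part, 3 * sd_part          # centered process with Ppk = 1

schemes = {
    "random":                 rng.normal(0, sd_part, n_parts),
    "exact (5th-95th pct)":   norm.ppf(np.linspace(0.05, 0.95, n_parts), scale=sd_part),
    "uniform, typical range": np.linspace(norm.ppf(0.05, scale=sd_part),
                                          norm.ppf(0.95, scale=sd_part), n_parts),
    "uniform, spec range":    np.linspace(lsl, usl, n_parts),
    "random + outside specs": np.append(rng.normal(0, sd_part, n_parts - 2),
                                        [lsl - 0.5 * sd_part, usl + 0.5 * sd_part]),
}
# Each array of part values stands in for the part means of one sampling scheme.
```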
Keep in mind that we know with absolute certainty that the underlying %Contribution is
5.88325%.
Random Sampling for Gage
Let's use "random" as the default to compare to, which, as you recall from Parts 1 and 2,
already does not provide a particularly accurate estimate:

On several occasions I've had people tell me that you can't just sample randomly because you
might get parts that don't really match the underlying distribution.
Sample 10 Parts that Match the Distribution
So let's compare the results of random sampling from above with our results if we could
magically pull 10 parts that follow the underlying part distribution almost perfectly, thereby
eliminating the effect of randomness:

There's obviously something to the idea that the randomness that comes from random sampling
has a big impact on our estimate of %Contribution...the "exact" distribution of parts shows much
less skewness and variation and is considerably less likely to incorrectly reject the measurement
system. To be sure, implementing an "exact" sample scheme is impossible in most cases...since
you don't yet know how much measurement error you have, there's no way to know that you're
pulling an exact distribution. What we have here is a statistical version of the chicken-and-egg problem!
Sampling Uniformly across a Typical Range of Values
Let's move on...next up, we will compare the random scheme to scheme #2, sampling
uniformly across a typical range of values:

So here we have a different situation: there is a very clear reduction in variation, but also a very
clear bias. So while pulling parts uniformly across the typical part range gives much more
consistent estimates, those estimates are likely telling you that the measurement system is
much better than it really is.
Sampling Uniformly across the Spec Range
How about collecting uniformly across the range of the specs?

This scheme results in an even more extreme bias: qualifying this measurement system becomes a
certainty, and in some cases it would even be classified as excellent. Needless to say, it does not
result in an accurate assessment.
Selectively Sampling Outside the Spec Limits
Finally, how about that scheme where most of the points are taken randomly but just one part is
pulled from just outside of each spec limit? Surely just taking 2 of the 10 points from outside of
the spec limits wouldn't make a substantial difference, right?

Actually, those two points make a huge difference and render the study's results
meaningless! This process had a Ppk of 1; a higher-quality process would make this result even
more extreme. Clearly this is not a reasonable sampling scheme.
Why These Sampling Schemes?
If you were taught to sample randomly, you might be wondering why so many people would use
one of these other schemes (or similar ones). They actually all have something in common that
explains their use: all of them allow a practitioner to assess the measurement system across a
range of possible values. After all, if you almost always produce values between 8.2 and 8.3
and the process goes out of control, how do you know that you can adequately measure a part
at 8.4 if you never evaluated the measurement system at that point?
Those who choose these schemes for that reason are smart to think about that issue, but they just
aren't using the right tool for it. Gage R&R evaluates your measurement system's ability to
measure relative to the current process. To assess your measurement system across a range of
potential values, the correct tool to use is a "Bias and Linearity Study" which is found in the
Gage Study menu in Minitab. This tool establishes for you whether you have bias across the
entire range (consistently measuring high or low) or bias that depends on the value measured
(for example, measuring smaller parts larger than they are and larger parts smaller than they
are).
To really assess a measurement system, I advise performing both a Bias and Linearity Study as
well as a Gage R&R.
Which Sampling Scheme to Use?
In the beginning I suggested that a random scheme be used but then clearly illustrated that the
"exact" method provides even better results. Using an exact method requires you to know the
underlying distribution from having enough previous data (somewhat reasonable although
existing data include measurement error) as well as to be able to measure those parts
accurately enough to ensure you're pulling the right parts (not too feasible...if you know you can
measure accurately, why are you doing a Gage R&R?). In other words, it isn't very realistic.
So for the majority of cases, the best we can do is to sample randomly. But we can do a reality
check after the fact by looking at the average measurement for each of the parts chosen and
verifying that the distribution seems reasonable. If you have a process that typically shows
normality and your sample shows unusually high skewness, there's a chance you pulled an
unusual sample and may want to pull some additional parts and supplement the original
experiment.
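
As a quick way to do that reality check, you can compute the skewness of the per-part averages; the numbers below are purely hypothetical:

```python
import numpy as np
from scipy.stats import skew

# Average of all measurements taken on each of the 10 parts (hypothetical values)
part_avgs = np.array([9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.0, 12.9])

print(f"skewness of part averages: {skew(part_avgs):.2f}")
# A value far from 0 for a process that usually looks symmetric suggests the
# sample of parts may not represent typical part-to-part variation.
```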
