
Cpk or Ppk: Which should you use?

Your customer has asked you to report the Cpk of the product you are sending. You know that to compute Cpk you need the product specifications, the mean, and sigma. As you gather the information, someone asks, "Which sigma do they want?" You know that Cpk is calculated by dividing by 3 sigma. But which sigma should you use, estimated or calculated? Which is correct? Which would you report? Naturally, most of us would use the sigma that makes the Cpk look the best. But the sigma that makes the Cpk look best may not accurately reflect what you or your customer need to know about the process. Confusion over calculating Cpk by two different methods is one reason that a new index, Ppk, was developed. Ppk uses the calculated sigma from the individual data.

Sigma of the individuals (calculated sigma):

sigmai = sqrt( sum of (Xi - X-bar)^2 / (n - 1) )


Estimated sigma:

Estimated sigma = R-bar / d2

Given that Ppk uses the calculated sigma, it is no longer necessary to use the calculated sigma in Cpk. The only acceptable formula for Cpk uses the estimated sigma. In 1991, the ASQC/AIAG Task Force published the "Fundamental Statistical Process Control" reference manual, which shows the calculations for both Cpk and Ppk. These should be used to eliminate confusion about calculating Cpk. So which value is best to report, Cpk or Ppk? Although they convey similar information, they have slightly different uses. Estimated sigma and the related capability indices (Cp, Cpk, and Cr) measure the potential capability of a system to meet customer needs. Use them when you want to analyze a system's aptitude to perform. The actual or calculated sigma (sigma of the individuals) and the related indices (Pp, Ppk, and Pr) measure the actual performance of a system in meeting customer needs. Use them when you want to measure a system's actual process performance. Once you determine which capability index you will use, it can easily be calculated using software such as SQCpack or CHARTrunner.
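As a rough sketch (not PQ Systems' implementation; the function and variable names here are mine), the two calculations can be compared side by side in Python:

```python
import statistics

# d2 bias-correction constants from the standard SPC table (subgroup sizes 2-7).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534, 7: 2.704}

def capability_indices(subgroups, usl, lsl):
    """Return (Cpk, Ppk) for equal-size subgroups of measurements.

    Cpk uses the estimated sigma (R-bar / d2); Ppk uses the calculated
    sigma of the individual values.
    """
    data = [x for sg in subgroups for x in sg]
    mean = statistics.mean(data)
    r_bar = statistics.mean(max(sg) - min(sg) for sg in subgroups)
    sigma_est = r_bar / D2[len(subgroups[0])]   # estimated sigma
    sigma_ind = statistics.stdev(data)          # sigma of the individuals
    nearest = min(usl - mean, mean - lsl)       # distance to the closest spec
    return nearest / (3 * sigma_est), nearest / (3 * sigma_ind)
```

Whenever between-subgroup variation is present, the two sigmas differ, and the two indices tell different stories about the same data.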

Matt Savage

How can Cpk be good with data outside the specifications?


A customer who called our technical support line recently could not understand why his Cpk, calculated by SQCpack, was above 1.0 when his data was not centered between the specifications and some of the data was outside the specification. How can you have a good Cpk when you have data outside the specification and/or data that is not centered on the target/nominal value?

To calculate Cpk, you need to know only three pieces of information: the process average, the variation in the process, and the specification(s).

First, find out whether the mean (average) is closer to the upper or the lower specification. If the process is centered, then either Zupper or Zlower can be used, as you will see below. If you have only one specification, the mean will be closest to that specification, since the other one does not exist.

To measure the variation in the process, use the estimated sigma (standard deviation). If you decide to use the standard deviation of the individual data instead, you should use the Ppk calculation, since Ppk uses that sigma. To calculate the estimated sigma, divide the average range, R-bar, by d2. The d2 value to use depends on the subgroup size and comes from the table of constants shown below. If your subgroup size is one, use the average moving range, MR-bar.

d2 values:

Subgroup size   1       2       3       4       5       6       7
d2              1.128   1.128   1.693   2.059   2.326   2.534   2.704
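As a sketch (names are mine, assuming equal-size subgroups taken in time order), the estimated-sigma calculation, including the moving-range case for a subgroup size of one, looks like this:

```python
# d2 constants; for subgroup size 1 the average moving range is used,
# so the d2 for n = 2 (1.128) applies.
D2 = {1: 1.128, 2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534, 7: 2.704}

def estimated_sigma(data, subgroup_size):
    """Estimate sigma as R-bar / d2 (or MR-bar / d2 for individuals)."""
    if subgroup_size == 1:
        # Moving ranges between consecutive individual values.
        ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    else:
        groups = [data[i:i + subgroup_size]
                  for i in range(0, len(data), subgroup_size)]
        ranges = [max(g) - min(g) for g in groups if len(g) == subgroup_size]
    return (sum(ranges) / len(ranges)) / D2[subgroup_size]
```

Note that the estimate depends only on the ranges, not on the overall spread of the individual values, which is the crux of the Cpk/Ppk difference discussed here.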

You, of course, provide the specifications. Now that you have these three pieces of information, the Cpk can easily be calculated. For example, let's say your process average is closer to the upper specification. Then Cpk is calculated as follows: Cpk = (USL - Mean) / (3 x Est. sigma). As you can see, the raw data is used only indirectly: it determines the mean and the average range, but it does not appear in the Cpk calculation itself. Here is an example that might serve to clarify. Suppose you have the following 14 subgroups with a subgroup size of 2:

Sample No.   Obs 1   Obs 2   Average   Range
1            0.03    0.06    0.045     0.030
2            0.10    0.20    0.150     0.100
3            0.05    0.10    0.075     0.050
4            1.00    0.00    0.500     1.000
5            1.50    1.50    1.500     0.000
6            1.10    1.50    1.300     0.400
7            1.10    1.00    1.050     0.100
8            1.10    1.01    1.055     0.090
9            1.25    1.20    1.225     0.050
10           1.00    0.30    0.650     0.700
11           0.75    0.76    0.755     0.010
12           0.75    0.50    0.625     0.250
13           1.00    1.10    1.050     0.100
14           1.20    1.40    1.300     0.200
Average                      0.8057    0.2200

The mean, X-bar, is 0.8057 and the average range, R-bar, is 0.220. For this example, the upper specification is 2.12, the target value is 1.12, and the lower specification is 0.12. In the data shown above, more than 21% of the data is outside the specification, so you would expect Cpk to be low, right? As it turns out, Cpk is relatively healthy at 1.17. (Yes, for this example, we have ignored the first cardinal rule: before one looks at Cpk, the process must be in control.) Before we go on, let's check the math:

Mean = 0.8057
Average range, R-bar = 0.2200
Est. sigma = R-bar / d2 = 0.2200 / 1.128 = 0.1950
Zlower = (Mean - LSL) / Est. sigma = (0.8057 - 0.12) / 0.1950 = 3.516
Zupper = (USL - Mean) / Est. sigma = (2.12 - 0.8057) / 0.1950 = 6.740
Cpk = smaller of (Zupper, Zlower) / 3 = 3.516 / 3 = 1.172

So what gives? Here is an example where Cpk is good, yet the process is not centered and data falls outside at least one of the specifications. The reason Cpk is good is that the average range is understated; when you divide by the estimated sigma (which is based on the average range), the understated range inflates Cpk. Why the average range is understated will be discussed in a future article. One last note: if you look at this data on a control chart, you will quickly see that it is not in control. Therefore, the Cpk statistic should be ignored when the process is not in control.
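The arithmetic above can be reproduced in a few lines of Python (the variable names are mine; the numbers come straight from the example):

```python
# Subgroup averages and ranges from the article's 14 subgroups of size 2.
averages = [0.045, 0.150, 0.075, 0.500, 1.500, 1.300, 1.050,
            1.055, 1.225, 0.650, 0.755, 0.625, 1.050, 1.300]
ranges = [0.030, 0.100, 0.050, 1.000, 0.000, 0.400, 0.100,
          0.090, 0.050, 0.700, 0.010, 0.250, 0.100, 0.200]
usl, lsl = 2.12, 0.12

mean = sum(averages) / len(averages)   # X-bar = 0.8057
r_bar = sum(ranges) / len(ranges)      # R-bar = 0.2200
est_sigma = r_bar / 1.128              # d2 for subgroup size 2
z_upper = (usl - mean) / est_sigma     # 6.74
z_lower = (mean - lsl) / est_sigma     # 3.52
cpk = min(z_upper, z_lower) / 3        # 1.17
```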

Matt Savage

How do we determine process capability if the process isn't normal?


Cp, Cpk, Pp, and Ppk do not necessarily rely on the normal distribution. If any of these indices increases, you know that the process capability has improved. What you do not know is how that improvement translates into good product. That requires knowledge of the distribution of the individual units produced by the process. The Central Limit Theorem applies to averages, which is why it helps with the control chart, but it does not help with the histogram. Therefore, we generally assume a normal distribution in order to estimate the percent out of specification (above, below, and total). For non-normal distributions, we first estimate some parameters from the data. We then use these parameters and follow a Pearson curve-fitting procedure to select an appropriate distribution. Since for non-normal distributions the relationship between the standard deviation and the percent within specification can differ from the normal case (plus and minus one sigma may not contain 68.26% of the output, plus and minus two sigma may not contain 95.44%, and so on), we transform the capability indices into something comparable. With the fitted distribution's equation, we integrate in from the tails to the upper and lower specifications respectively. Once the percents out of spec above and below the respective spec limits are estimated, we determine the z values (for the normal distribution with the same mean and standard deviation) associated with those same percents. Then Cpk and Ppk are calculated using their respective estimates for the standard deviation. This makes the values more comparable to those that people are used to seeing. For more information, refer to Appendix C of the SQCpack User Guide.
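The final step described above, converting estimated tail areas back into normal-equivalent z values, can be sketched as follows (the function name is mine, and the tail fractions are assumed to have already been estimated from whatever distribution was fitted):

```python
from statistics import NormalDist

def equivalent_ppk(frac_below_lsl, frac_above_usl):
    """Given the estimated fractions out of spec below the LSL and above
    the USL (from any fitted distribution), find the z values a normal
    distribution would need to produce the same tail areas, and report
    the equivalent index: min(z) / 3."""
    z_lower = NormalDist().inv_cdf(1 - frac_below_lsl)
    z_upper = NormalDist().inv_cdf(1 - frac_above_usl)
    return min(z_lower, z_upper) / 3
```

As a sanity check, a process leaving 0.135% in each tail, exactly what a normal process 3 sigma inside each limit would leave, recovers an index of 1.0.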

Is Cpk the best capability index?


Cpk has been a popular capability index for many years, and perhaps because of its momentum it remains popular. But is it the best index to use? Answering this question assumes that there is one best index, which is a different discussion altogether. Let's agree that there are several other useful capability indices. Two others that can be beneficial are Ppk and Cpm. As mentioned in a previous article, "Cpk or Ppk: Which should you use?", Cpk uses only the estimated sigma to measure variation. While this is acceptable, the estimated sigma can be artificially low depending on the subgroup size, sample interval, or sampling plan. This in turn can lead to an over-inflated Cpk. For a process that drifts, such as the process shown in the chart below, the estimated sigma will usually be artificially low. This is because the estimated sigma looks only at variation within subgroups.

Ppk, on the other hand, uses the standard deviation of all of the data. We can call this the sigma of the individual values, or sigmai. Sigma of the individual values looks at variation both within and between subgroups. For a process that exhibits drifting, estimated sigma does not pick up the total variation in the process, and thus Cpk becomes a cloudy statistic: one cannot be sure it is valid. In contrast, Ppk, which uses the sigma of the individual values, picks up all the variation in the process. Again, sigmai captures both between- and within-subgroup variation. So if there is drifting in the process, sigmai will typically be larger than the estimated sigma, sigmae, and thus Ppk will, as it should, be lower than Cpk. Here is a quick review of the formulae for Cpk and Ppk:

Cpk = Zmin / 3, where Zmin = smaller of (USL - Mean) / est. sigma and (Mean - LSL) / est. sigma
Ppk = Zmin / 3, where Zmin = smaller of (USL - Mean) / sigmai and (Mean - LSL) / sigmai
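A simulated drifting process makes the effect concrete (all data here is generated for illustration; the specs and drift rate are invented):

```python
import random
import statistics

random.seed(1)
# Tight variation within each subgroup, but the process mean drifts upward
# over the 30 subgroups (hypothetical data).
subgroups = [[random.gauss(10 + 0.1 * t, 0.2) for _ in range(5)]
             for t in range(30)]
data = [x for sg in subgroups for x in sg]
mean = statistics.mean(data)

# Estimated sigma sees only within-subgroup variation (d2 = 2.326 for n = 5).
sigma_e = statistics.mean(max(sg) - min(sg) for sg in subgroups) / 2.326
# Sigma of the individuals sees within- plus between-subgroup variation.
sigma_i = statistics.stdev(data)

usl, lsl = 20, 6
cpk = min(usl - mean, mean - lsl) / (3 * sigma_e)
ppk = min(usl - mean, mean - lsl) / (3 * sigma_i)
# Because of the drift, sigma_i > sigma_e, and therefore Ppk < Cpk.
```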

We should be concerned with how well the process is actually behaving; therefore, Ppk might be preferred over Cpk. Ppk is the more conservative approach to answering the question, "How good is my process?" Watch for a future article discussing the relatively new capability index, Cpm, and how it stacks up against Cpk and Ppk.

Should you calculate Cpk when your process is not in control?


The AIAG Statistical Process Control reference manual (p. 13) states: "The process must first be brought into statistical control by detecting and acting upon special causes of variation. Then its performance is predictable, and its capability to meet customer expectations can be assessed. This is the basis for continual improvement." True, but to take it one step further: if the process is not in a state of statistical control, the validity of a Cpk value is questionable. Suppose your customer requires you to provide a Cpk value and does not require control charts. Or perhaps the customer is willing to accept lack of control as long as the Cpk is acceptable. You provide a "good" Cpk number and relax, knowing that your customer is satisfied. But have you really satisfied your customer's need, which is to ensure that your product or service is within an acceptable specification region and consistent over time? It is certainly possible to calculate Cpk even when a process is not in control, but one might ask what value the calculation provides. Rather than state, "You should never calculate Cpk when the process is out of control," I prefer to say that the less predictable your process is, the less meaningful Cpk is and the less value it carries. While it is easy to say that one should never calculate Cpk when the process is out of control, it is not always practical, since customers may dictate otherwise. One reason that minimal emphasis should be placed on Cpk when the process is not in control is predictability. Customers want good Cpk values as well as some confidence that, in future capability studies, Cpk will be consistent with or improved over previous ones. [This topic will be addressed in a future article.] Another reason not to put too much weight on Cpk when the process is not in control lies in the underlying statistics used to calculate it.
Since Cpk uses the range, a process can appear "better" simply because the range is not a fair representation of the process variability when the process is not in control or predictable. If the process is in control, one can conclude that the range is sufficient for calculating Cpk.

Chart produced using SQCpack

A hypothetical example might clarify the point. Suppose I have 100 pieces of data that are grouped into subgroups of 5 each. The chart above shows how the control chart and histogram of the data might look. In this example, the process is in control, and the Cpk = 1.252. Suppose I am evaluating another process whose mean and specifications are the same, but whose Cpk = 1.803. Most of us would want the second process, with the higher Cpk, but is that process necessarily better? Unless you determine whether the process is in statistical control, you cannot fairly answer this question.

As it turns out, the data is exactly the same; what has changed is the order in which the data was grouped into the samples. This caused the ranges of the subgroups, and therefore R-bar (the average range), to differ. In the second data set, the data was rearranged so that the data within each sample is similar. The sigma of the individuals does not change, but the estimated sigma, which is used in the control limits and the Cpk calculation, changes between the two groupings. This example shows why determining whether the process is in control before looking at Cpk pays off. Since the control chart in the second example, shown below, is not in control, you cannot be sure that its Cpk is a good representation of process capability. The first process, on the other hand, is in control, and thus its Cpk is a good predictor of process capability.
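This regrouping effect is easy to reproduce: the same 20 hypothetical values, grouped two different ways, give an identical sigma of the individuals but very different estimated sigmas and Cpk values (everything in this sketch, including the specs, is invented for illustration):

```python
import statistics

def cpk_from_subgroups(subgroups, usl, lsl, d2=2.326):
    """Cpk from R-bar / d2 (d2 = 2.326 for subgroups of size 5)."""
    data = [x for sg in subgroups for x in sg]
    mean = statistics.mean(data)
    sigma_e = statistics.mean(max(sg) - min(sg) for sg in subgroups) / d2
    return min(usl - mean, mean - lsl) / (3 * sigma_e)

data = list(range(1, 21))                           # the same 20 values each time
similar = [data[i:i + 5] for i in range(0, 20, 5)]  # similar values share a subgroup
mixed = [data[i::4] for i in range(4)]              # values spread across subgroups

cpk_similar = cpk_from_subgroups(similar, usl=25, lsl=0)
cpk_mixed = cpk_from_subgroups(mixed, usl=25, lsl=0)
# The sigma of the individuals is identical either way, but R-bar differs,
# so cpk_similar comes out much higher than cpk_mixed.
```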

Chart produced using SQCpack

If you do not have the control chart to evaluate for process control, you might be tempted to select the second process as being "better" on the basis of the higher Cpk. As this example illustrates, you cannot fairly evaluate Cpk without first establishing process control. You can use software such as SQCpack or CHARTrunner to create control charts and calculate Cpk.

The capability index dilemma: Cpk, Ppk, or Cpm


Lori, one of our customers, phoned to ask whether Cpk is the best statistic to use for a process that slits metal to exacting widths. As a technical support analyst, I too wondered what index would be best suited for her application. Perhaps Cpk, Ppk, Cpm, or some other index offers the best means of reporting the capability of her product or process. Each of these capability indices can be calculated using software such as SQCpack and CHARTrunner. Lori's process capability index, Cpk, has never dipped below 2 and typically averages above 3. Given this high degree of capability, she might consider reducing variation about the target. While Cpk and Ppk are well-accepted and commonly used indices, they may not provide as much information as Lori needs to continue to improve the process. This is especially true if the target is not the midpoint of the specifications. Cpm incorporates the target when calculating the standard deviation. Like the sigma of the individuals formula, the Cpm sigma compares each observation to a reference value. However, instead of comparing the data to the mean, it compares the data to the target, and these differences are squared. Thus any observation that differs from the target will increase this standard deviation.

As this difference increases, so does this sigma; and as the sigma becomes larger, the Cpm gets smaller. If the difference between the data and the target is small, so too is the sigma, and as the sigma gets smaller, the Cpm index becomes larger. The higher the Cpm index, the better the process, as shown in the diagrams below. In these 3 charts the variation stays the same, but as the process becomes more centered on the target, the Cpm gets better.

This Cpm is good.

This Cpm is better.

This Cpm is best.

In these 3 charts, the process stays centered about the target, but as the variation is reduced, the Cpm gets better.

This Cpm is good.

This Cpm is better.

This Cpm is best.

We can use Lori's raw data to provide an example of how Cpm is calculated:

            obs 1     obs 2     obs 3     obs 4     obs 5
Sample 1    90.741    102.300   98.642    106.069   97.635
Sample 2    102.711   100.882   103.314   98.569    96.639
Sample 3    104.066   105.620   96.165    100.412   96.316
Sample 4    106.602   95.978    96.265    95.869    84.872
Sample 5    100.904   108.558   94.882    98.573    108.588
Sample 6    104.922   100.243   97.053    111.042   99.068
Sample 7    112.738   108.145   98.679    103.788   105.664
Sample 8    102.388   104.159   100.204   99.328    94.157
Sample 9    97.825    95.209    91.273    93.430    98.263

And the specifications are: USL = 145, Target = 105, LSL = 60

Cpm = (USL - LSL) / (6 x sigmaCpm), where sigmaCpm = sqrt( sum of (Xi - Target)^2 / (n - 1) )

Cpm = (145 - 60) / (6 x 7.413) = 1.91 (compare Cpk = 2.51)
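As a check, the Cpm calculation can be reproduced from the data above (assuming the n - 1 form of the target-based sigma, which matches the 1.91 reported):

```python
import math

# Lori's 45 observations (9 samples of 5), from the table above.
data = [
    90.741, 102.300, 98.642, 106.069, 97.635,
    102.711, 100.882, 103.314, 98.569, 96.639,
    104.066, 105.620, 96.165, 100.412, 96.316,
    106.602, 95.978, 96.265, 95.869, 84.872,
    100.904, 108.558, 94.882, 98.573, 108.588,
    104.922, 100.243, 97.053, 111.042, 99.068,
    112.738, 108.145, 98.679, 103.788, 105.664,
    102.388, 104.159, 100.204, 99.328, 94.157,
    97.825, 95.209, 91.273, 93.430, 98.263,
]
usl, target, lsl = 145, 105, 60

# Cpm sigma: deviations are measured from the target, not from the mean.
sigma_cpm = math.sqrt(sum((x - target) ** 2 for x in data) / (len(data) - 1))
cpm = (usl - lsl) / (6 * sigma_cpm)   # 1.91
```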

In a process with both upper and lower specifications, the target is typically their midpoint. When such a high degree of capability exists, one may want to ask the customer whether the target value is ideal. Lori should check with her customer to determine whether he or she wants a small shift toward one of the specifications. Regardless of the target's relation to the specifications, the focus should always be on making the product to target with minimum variation. Cpm is the capability index that accurately depicts this. Reference: L.K. Chan, S.W. Cheng, and F.A. Spiring, "A New Measure of Process Capability: Cpm," Journal of Quality Technology, Vol. 20, No. 3, July 1988, pp. 162-175.

Calculating capability indices with one specification


The following formula for Cpk is easily found in most statistics books, as well as in software products such as SQCpack and CHARTrunner:

Cpk = Zmin / 3
Zmin = smaller of Zupper or Zlower
Zupper = (USL - Mean) / Estimated sigma
Zlower = (Mean - LSL) / Estimated sigma
Estimated sigma = average range / d2

And we've all learned that, generally speaking, the higher the Cpk, the better the product or process you are measuring. That is, as the process improves, Cpk climbs. What is not apparent, however, is how to calculate Cpk when you have only one specification or tolerance. For example, how do you calculate Cpk when you have an upper tolerance and no lower tolerance? When faced with a missing specification, you could consider:

A. Not calculating Cpk, since you don't have all of the variables.
B. Entering an arbitrary specification.
C. Ignoring the missing specification and calculating Cpk on the only Z value.

An example may help to illustrate the outcome of each option. Let's assume you are making plastic pellets and your customer has specified that the pellets should have a low moisture content: the lower the moisture content, the better, and no more than .5 is allowed. If the product has too much moisture, it will cause manufacturing problems. The process is in statistical control.

It is not likely your customer would be happy if you went with option A and decided not to calculate a Cpk. Going with option B, you might argue that the lower specification limit (LSL) is 0, since it is impossible to have a moisture level below 0. So with USL at .5 and LSL at 0, if X-bar = .0025 and estimated sigma = .15, Cpk is calculated as follows:

Zupper = (.5 - .0025) / .15 = 3.316
Zlower = (.0025 - 0) / .15 = .0167
Zmin = .0167
Cpk = .0167 / 3 = .0056

Your customer will probably not be happy with a Cpk of .0056, and this number is not representative of the process. Option C treats the lower specification as missing. Since you do not have an LSL, Zlower is missing or nonexistent. Zmin therefore becomes Zupper, and Cpk is Zupper / 3: Cpk = 3.316 / 3 = 1.10. A Cpk of 1.10 is more realistic than .0056 for the data given in this example and is representative of the process. As this example illustrates, setting the lower specification equal to 0 results in a lower Cpk. In fact, as the process improves (moisture content decreases), that Cpk will decrease, when an improving process should see Cpk increase. Therefore, when you have only one specification, you should enter only that specification and treat the other specification as missing.

An interesting debate (well, about as interesting as statistics gets) occurs over what to do with Cp (or Pp). Most textbooks show Cp as the difference between both specifications (USL - LSL) divided by 6 sigma. Because only one specification exists, some suggest that Cp cannot be calculated. Another suggestion is to look at one half of the Cp formula: instead of evaluating [(USL - Mean) + (Mean - LSL)] / (6 x sigma), think of Cp as (USL - Mean) / (3 x sigma) or (Mean - LSL) / (3 x sigma). You might note that with only one specification, this becomes the same formula as Cpk.

Example capability analysis from the free Cpk calculator: the following custom analysis is based on 24 data points.
The upper specification is 15. The lower specification is 5. Subgroup size is 1. The mean is 10.57. The minimum is 7.4. The maximum is 16.2. The estimated standard deviation is 1.16. Cpk is 1.27. This Cpk is considered fair based on the following scale:

0 to 1.0: unacceptable (sometimes called "not capable")
Greater than 1.0 to 1.33: fair
Greater than 1.33 to 1.66: acceptable
Greater than 1.66: exceptional

Ppk is 0.88, which is unacceptable based on the scale above. Ppk is a measure of capability similar to Cpk, except that the actual standard deviation is used in the calculation rather than an estimate of the standard deviation.

Cp is 1.44. When Cp is greater than 1.0, the (six sigma) spread of your data is smaller than the width of your specification limits. Cp by itself does not tell you where the process average is relative to the specification limits. Instead, it tells you the maximum your Cpk can become if the process average were centered between the upper and lower specification limits, assuming the same variation. When Cp is greater than Cpk, the process should be adjusted so that the process average is centered between the upper and lower specifications to achieve the maximum Cpk. If no changes are made to this process, it will produce approximately 0.007% outside of specifications, or about 67 defects per million. Capability analysis is based on these important assumptions:

1) Your data is normally distributed; that is, a histogram of the data shows a normal bell curve. If your data is not normally distributed, software such as CHARTrunner or SQCpack can help with your capability analysis.
2) A control chart of the data shows no out-of-control conditions. When out-of-control conditions exist, the capability information is not reliable because the process is not predictable. Software such as CHARTrunner or SQCpack can help with your control charts.
3) The measurement system can demonstrate a %R&R of less than 30%. Software such as GAGEpack can help you perform a measurement systems analysis.

Review other links on this page to learn more about capability analysis. Analysis provided by PQ Systems, Inc.
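To close, the option C logic from this article can be sketched in Python (the function name and signature are mine, not from any PQ Systems product):

```python
def one_sided_cpk(mean, est_sigma, usl=None, lsl=None):
    """Cpk with one or both specifications.

    A missing specification is simply ignored (option C above) rather
    than replaced with an arbitrary value such as zero.
    """
    z_values = []
    if usl is not None:
        z_values.append((usl - mean) / est_sigma)
    if lsl is not None:
        z_values.append((mean - lsl) / est_sigma)
    if not z_values:
        raise ValueError("at least one specification is required")
    return min(z_values) / 3
```

For the moisture example (USL = .5, mean = .0025, estimated sigma = .15, no LSL), this returns Zupper / 3, about 1.1, instead of the misleading .0056 produced by forcing LSL = 0.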