
Data Tables to Analyze Trade-Off between Performance Measures and Creating ROC Curves

In Excel, a data table lets you see how different input values affect the result of a formula
without having to re-type or copy the formula for each input value. We will use data tables
extensively to analyze the effects of the cut-off on different performance measures.

Setup
A one-variable data table can be set up in two different ways: row-oriented or column-oriented. We
take the column-oriented data table as an example. In the example, we will calculate the sensitivity,
1-specificity, overall error rate and accuracy for different cut-offs in a logistic regression example.

In your Excel file, after you have run your logistic regression, create a template for the
one-variable data table, as displayed in the picture below. Note the formulas for sensitivity,
1-specificity, overall error rate and accuracy.
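
For readers who want to cross-check the spreadsheet formulas outside Excel, the four measures can be
sketched in Python. This is only an illustration under stated assumptions: the function name
`performance_measures`, the sample data, and the rule "classify as class 1 when the predicted
probability is at least the cut-off" are not taken from the worksheet.

```python
def performance_measures(probs, actual, cutoff):
    """Return the four performance measures at a single cut-off."""
    # Assumption: a record is classified as class 1 when probability >= cutoff.
    predicted = [1 if p >= cutoff else 0 for p in probs]
    pairs = list(zip(actual, predicted))

    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)  # true positives
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)  # false negatives
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)  # false positives
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)  # true negatives
    n = len(pairs)

    return {
        "sensitivity": tp / (tp + fn),
        "one_minus_specificity": fp / (fp + tn),
        "error_rate": (fp + fn) / n,
        "accuracy": (tp + tn) / n,
    }


# Made-up predicted probabilities and actual classes, for illustration only.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actual = [1, 1, 0, 1, 0, 0]
result = performance_measures(probs, actual, cutoff=0.5)
print(result)
```

The error rate and the accuracy always sum to 1, which gives a quick consistency check on the
template formulas.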

The formulas
Select (highlight) the range of cells that contains the
formulas and the input values whose effects on the
formulas you want to analyze.

Running Data-Table

On the Data tab of the ribbon, click What-If Analysis and then select Data Table to open the
dialog box.

As the table is column-oriented, enter the cell reference in the Column input cell box. Note that
the referenced cell must be an input cell and must not contain a formula. In the example shown,
the input cell is G116. Press OK to calculate the table. Your table should then fill up with numbers!

A good sanity check is to verify that:

The sensitivity for a cut-off of zero is always 1, and for a cut-off of one it is always 0.
The 1-specificity for a cut-off of zero is always 1, and for a cut-off of one it is always 0.
Further, for a cut-off of zero, the accuracy should equal the proportion of class 1 in your data, and
for a cut-off of one, the accuracy should equal the proportion of class 0 in your data.
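
The data table and its sanity checks can be mirrored in a short Python sketch: recompute the measures
for a column of cut-offs, then assert the end-point values. The helper `measures`, the sample data, and
the rule "class 1 when the probability is at least the cut-off" are assumptions made for illustration.
The checks also assume that no predicted probability is exactly 1, which holds for logistic regression
output.

```python
def measures(probs, actual, cutoff):
    """Return (sensitivity, 1-specificity, accuracy) at one cut-off."""
    # Assumption: a record is classified as class 1 when probability >= cutoff.
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    tp = sum(1 for p, y in zip(probs, actual) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, actual) if p >= cutoff and y == 0)
    tn = n_neg - fp
    return tp / n_pos, fp / n_neg, (tp + tn) / len(actual)


# Made-up data: 3 records of class 1 and 5 records of class 0.
probs = [0.92, 0.81, 0.64, 0.45, 0.33, 0.27, 0.12, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0]

# The "column input" values: cut-offs 0.0, 0.1, ..., 1.0.
cutoffs = [i / 10 for i in range(11)]
table = {c: measures(probs, actual, c) for c in cutoffs}

share_class_1 = sum(actual) / len(actual)
sens_0, fpr_0, acc_0 = table[0.0]
sens_1, fpr_1, acc_1 = table[1.0]

# Sanity checks at the two extreme cut-offs.
assert sens_0 == 1 and fpr_0 == 1
assert sens_1 == 0 and fpr_1 == 0
assert abs(acc_0 - share_class_1) < 1e-9
assert abs(acc_1 - (1 - share_class_1)) < 1e-9
```

If any of these assertions fails on your own data, the usual suspects are a mislabeled class column or
a template formula that points at the wrong cell.
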
Plotting the trade-off

We can now plot any of the performance measures as a function of the cut-off. As an example, we will
plot an ROC curve. An ROC curve is a plot of the sensitivity versus 1-specificity. To plot an ROC curve,
you need the values of sensitivity and 1-specificity from the data table above.

Select Insert on the Excel ribbon, then select Scatter and choose Scatter with Straight Lines and
Markers, as in the picture below.

Excel will then create a blank figure. Right-click the blank figure, then click Select Data.
In the pop-up window, click Add. In the window that opens, you can select the X values and Y values
and type in the name of the series.

Select the 1-specificity values as Series X values, and the sensitivity values as series Y values.
Click OK.

You should now have an ROC curve for, in our case, the validation data.

Note: by default, Excel sets the axis maximum values to 1.2. There is no reason to extend ROC plots
beyond 1! Fix the axes to get a nicer-looking graph.

Often we want to compare the performance on the training and the validation data. To
compare the ROC curves of the training and validation data in the same plot, you need to create
the sensitivity and 1-specificity values for the training data. Then right-click the curve in your
plot and choose Select Data. In the dialog box, click Add and
repeat the procedure to create a training data ROC curve.

Your final graphs should look something like the figure below. Make sure that your graph is
readable in black and white, unless you are sure that your audience will be reading it off a
computer screen and/or printing it in color.
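
The same comparison plot can be sketched outside Excel. The Python below is one possible approach, not
the worksheet's method: `roc_points` and `plot_roc` are hypothetical helper names, the training and
validation data are made up, and plotting assumes the third-party matplotlib package is installed. The
axes are fixed to the range 0 to 1, and the curves differ by line style and marker rather than color so
the figure stays readable in black and white.

```python
def roc_points(probs, actual, cutoffs):
    """Return one (1-specificity, sensitivity) pair per cut-off."""
    # Assumption: a record is classified as class 1 when probability >= cutoff.
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    points = []
    for c in cutoffs:
        tp = sum(1 for p, y in zip(probs, actual) if p >= c and y == 1)
        fp = sum(1 for p, y in zip(probs, actual) if p >= c and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def plot_roc(curves, path="roc.png"):
    """Draw the named ROC curves in one figure (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    styles = ["-o", "--s", ":^"]  # solid, dashed, dotted: distinct without color
    fig, ax = plt.subplots()
    for (name, points), style in zip(curves.items(), styles):
        xs, ys = zip(*points)
        ax.plot(xs, ys, style, color="black", label=name)
    ax.set_xlim(0, 1)  # no reason to extend an ROC plot beyond 1
    ax.set_ylim(0, 1)
    ax.set_xlabel("1 - specificity")
    ax.set_ylabel("Sensitivity")
    ax.legend()
    fig.savefig(path)


# Made-up predicted probabilities and actual classes, for illustration only.
cutoffs = [i / 10 for i in range(11)]
train_curve = roc_points([0.95, 0.8, 0.7, 0.35, 0.2, 0.1], [1, 1, 1, 0, 0, 0], cutoffs)
valid_curve = roc_points([0.85, 0.7, 0.55, 0.4, 0.25, 0.1], [1, 0, 1, 1, 0, 0], cutoffs)
curves = {"Training": train_curve, "Validation": valid_curve}
```

Calling `plot_roc(curves)` would then write both curves to `roc.png`; the file name is just a default
chosen for this sketch.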
