Area Under ROC
- AUC
- the Area Under the Receiver Operating Characteristic Curve
- Introduction to ROC Curves
- Plotting and Interpreting an ROC Curve
- The Area Under an ROC Curve
The graph above shows three ROC curves representing excellent, good, and worthless tests plotted on the same axes. The accuracy of the test depends on how well the test separates the group being tested into those with and without the disease in question. Accuracy is measured by the area under the ROC curve: an area of 1 represents a perfect test, while an area of 0.5 represents a worthless test. A rough guide for classifying the accuracy of a diagnostic test is the traditional academic point system (a small code sketch follows the list):
- 0.90-1.00 = excellent (A)
- 0.80-0.90 = good (B)
- 0.70-0.80 = fair (C)
- 0.60-0.70 = poor (D)
- 0.50-0.60 = fail (F)
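As a quick illustration, here is a minimal Python sketch that maps an AUC value onto this grading scale (the helper name `auc_grade` is hypothetical, not from any cited source):

```python
def auc_grade(auc: float) -> str:
    """Map an AUC value onto the traditional academic grades above."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    if auc >= 0.90:
        return "excellent (A)"
    if auc >= 0.80:
        return "good (B)"
    if auc >= 0.70:
        return "fair (C)"
    if auc >= 0.60:
        return "poor (D)"
    return "fail (F)"

print(auc_grade(0.85))  # -> good (B)
```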
Explanations
- ROC curve, AUC (AUROC), sensitivity, specificity
Some quotes - 1
- J. Yi, Y. Chen, J. Li, S. Sett, and T. W. Yan. Predictive model performance: Offline and online evaluations. In KDD, pages 1294-1302, 2013.
4.1 AUC
Consider a binary classifier that produces the probability of an event, p. Then p and 1-p, the probability that the event does not occur, represent the degree to which each case is a member of one of the two classes. A threshold is necessary in order to predict class membership. AUC, the Area under the ROC (Receiver Operating Characteristic) Curve [12, 33], provides a discriminative measure across the full range of thresholds applied to the classifier.
Comparing the probabilities involves the computation of four fractions in a confusion matrix: the true positive rate (TPR) or sensitivity, the true negative rate (TNR) or specificity, the false positive rate (FPR) or commission error, and the false negative rate (FNR) or omission error. These four scores, and other measures of accuracy derived from the confusion matrix such as precision, recall, or accuracy, all depend on the threshold.
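A minimal Python sketch of these four fractions, assuming binary labels `y` in {0, 1} and predicted probabilities `p` (the function name `confusion_rates` is illustrative, not from the paper):

```python
import numpy as np

def confusion_rates(y, p, threshold):
    """Compute TPR, TNR, FPR, FNR at a given decision threshold."""
    y = np.asarray(y, dtype=bool)
    pred = np.asarray(p) >= threshold          # class membership via threshold

    tp = np.sum(pred & y)                      # true positives
    tn = np.sum(~pred & ~y)                    # true negatives
    fp = np.sum(pred & ~y)                     # false positives (commission)
    fn = np.sum(~pred & y)                     # false negatives (omission)

    tpr = tp / (tp + fn)                       # sensitivity
    tnr = tn / (tn + fp)                       # specificity
    fpr = fp / (fp + tn)                       # commission error rate
    fnr = fn / (fn + tp)                       # omission error rate
    return tpr, tnr, fpr, fnr

y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]
print(confusion_rates(y, p, threshold=0.5))    # all four rates shift with the threshold
```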
The ROC curve is a graphical depiction of sensitivity (TPR) as a function of commission error (FPR) for a binary classifier as its threshold varies. AUC is computed as follows (a runnable sketch follows these steps):
- Sort records in descending order of the model-predicted scores.
- Calculate TPR and FPR at each predicted score.
- Plot the ROC curve.
- Calculate the AUC using the trapezoid approximation.
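A minimal Python sketch of these four steps (the function name `roc_auc` is illustrative; ties in scores are handled only approximately):

```python
import numpy as np

def roc_auc(y, scores):
    """AUC via a descending-score sweep and the trapezoid rule."""
    y = np.asarray(y)
    order = np.argsort(scores)[::-1]           # 1. sort by descending score
    y = y[order]

    tps = np.cumsum(y)                         # 2. true-positive count at each cutoff
    fps = np.cumsum(1 - y)                     #    false-positive count at each cutoff
    tpr = tps / tps[-1]                        #    true positive rate
    fpr = fps / fps[-1]                        #    false positive rate

    tpr = np.concatenate(([0.0], tpr))         # 3. ROC curve starts at the origin
    fpr = np.concatenate(([0.0], fpr))

    return np.trapz(tpr, fpr)                  # 4. trapezoid approximation

y = np.array([1, 1, 0, 1, 0, 0])
scores = np.array([0.95, 0.8, 0.7, 0.6, 0.4, 0.2])
print(roc_auc(y, scores))                      # ~0.889 for this toy example
```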
Empirically, AUC is a good and reliable indicator of the predictive power of a scoring model. For sponsored search, AUC, especially AUC measured only on mainline ads, is one of the most reliable indicators of the predictive power of the models. For a good model (AUC > 0.8), an AUC improvement of 1 point (0.01) is usually statistically significant.
The benefits of using the AUC for predictive modeling include:
- AUC provides a single-number discrimination score summarizing overall model performance over the full range of thresholds, which avoids the subjectivity of selecting a single threshold.
- It is applicable to any predictive model with a scoring function.
- The AUC score is bounded in [0, 1], with a score of 0.5 for random predictions and 1 for perfect predictions (a quick sanity check follows this list).
- AUC can be used for both offline and online monitoring of predictive models.
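As a quick sanity check of these bounds, reusing the `roc_auc` sketch from above (the simulation itself is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100_000)           # random binary labels

random_scores = rng.random(y.size)             # uninformative scores
perfect_scores = y + rng.random(y.size) * 0.1  # every positive outranks every negative

print(roc_auc(y, random_scores))               # ~0.5: random prediction
print(roc_auc(y, perfect_scores))              # 1.0: perfect ranking
```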