ROC curve : Receiver Operating Characteristic curve
A graph showing the performance of a classification model at all classification thresholds.
[Figure: ROC curve showing TP rate vs. FP rate at different classification thresholds]
⇒ Instead of computing the points of an ROC curve directly (evaluating a logistic regression model many times with different classification thresholds), USE an efficient sorting-based algorithm to compute AUC.
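Below is a minimal sketch (in Python, with made-up labels and scores) of one such sorting-based computation, using the rank-sum (Mann-Whitney U) formulation; it assumes there are no tied scores.

```python
# Sorting-based AUC: sort all examples by score once, then use the rank sum
# of the positives instead of re-evaluating the model at many thresholds.
# Labels/scores below are illustrative made-up values; assumes no tied scores.
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55])

order = np.argsort(y_score)                      # ascending sort by score
ranks = np.empty_like(order, dtype=float)
ranks[order] = np.arange(1, len(y_score) + 1)    # rank 1 = lowest score

n_pos = int(np.sum(y_true == 1))
n_neg = int(np.sum(y_true == 0))

# Rank-sum formula: AUC = (sum of positive ranks - n_pos*(n_pos+1)/2) / (n_pos*n_neg)
auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(f"AUC = {auc:.3f}")
```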
- 2 parameters: True Positive Rate (TPR) and False Positive Rate (FPR)
- True Positive Rate (TPR) == RECALL :
TPR = TP/(TP+FN)
- False Positive Rate (FPR) :
FPR = FP/(FP+TN)
- Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives (see the sketch below).
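A minimal sketch of how the ROC points come about: sweep a few classification thresholds over made-up scores and compute TPR and FPR at each one (all values below are illustrative, not from the source).

```python
# Compute (FPR, TPR) points of an ROC curve by sweeping classification thresholds.
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                       # ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55])    # model scores

for threshold in [0.2, 0.4, 0.6, 0.8]:
    y_pred = (y_score >= threshold).astype(int)   # classify at this threshold
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn)                          # TPR = TP/(TP+FN)
    fpr = fp / (fp + tn)                          # FPR = FP/(FP+TN)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Lowering the threshold in this sweep moves both TPR and FPR upward, tracing the curve from (0,0) toward (1,1).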
AUC : Area under the ROC Curve
AUC measures the entire two-dimensional area underneath the entire ROC curve (integral calculus) from (0,0) to (1,1).
AUC provides an aggregate measure of performance across all possible classification thresholds.
- One way of interpreting AUC: the probability that the model ranks a random positive example more highly than a random negative example.
- e.g., with positive and negative examples ranked in ascending order of logistic regression score, AUC is the probability that a random positive example is positioned to the right of (i.e., scored higher than) a random negative example (see the sketch below).
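A minimal sketch of this ranking interpretation: estimate AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score (ties counted as 0.5). Labels and scores are the same made-up values used above; the result matches the rank-sum computation earlier.

```python
# Pairwise interpretation of AUC: probability a random positive outranks a random negative.
import itertools
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55])

pos_scores = y_score[y_true == 1]
neg_scores = y_score[y_true == 0]

# Count pairs where the positive gets the higher score (ties count as 0.5).
wins = 0.0
for p, n in itertools.product(pos_scores, neg_scores):
    wins += 1.0 if p > n else (0.5 if p == n else 0.0)

auc = wins / (len(pos_scores) * len(neg_scores))
print(f"AUC (pairwise estimate) = {auc:.3f}")
```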
AUC Ranges from 0 to 1
- A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
Pros
- Scale-invariant : It measures how well predictions are ranked, rather than their absolute values.
- Classification-threshold-invariant : It measures the quality of the model's predictions irrespective of what classification threshold is chosen.
Cons
- Scale invariance is not always desirable.
- e.g., sometimes we really do need well-calibrated probability outputs, and AUC won't tell us about that.
- Classification-threshold invariance is not always desirable.
- Wide disparities in the cost of FN vs. FP : it may be critical to minimize one type of classification error.
e.g., email spam detection : you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives) ⇒ AUC IS NOT a useful metric.
AUC PR (Area Under the Precision-Recall Curve)
- Focuses on the minority class; useful for imbalanced classification problems.
- Ranges from 0 to 1.
- The higher the AUC PR, the higher-quality the model.
AUC ROC (Area Under the Receiver Operating Characteristic Curve)
- Calculated from the FP (false positive) rate and TP (true positive) rate of the model's predictions at different thresholds.
- Ranges from 0 to 1.
- The higher the AUC ROC, the higher-quality the model (see the sketch below).
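A minimal sketch comparing the two summaries with scikit-learn (assuming it is installed); `average_precision_score` is used here as a common summary of the area under the precision-recall curve, and the toy data are made up to be imbalanced.

```python
# Compare AUC ROC and AUC PR on an imbalanced toy problem (made-up data).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])                     # few positives
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.65,
                    0.60, 0.90])                                        # model scores

print("AUC ROC:", roc_auc_score(y_true, y_score))
print("AUC PR :", average_precision_score(y_true, y_score))
```

On imbalanced data like this, AUC PR is typically lower than AUC ROC and more sensitive to how well the minority (positive) class is ranked.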
AUC & ROC Interpretation
| ROC curve | Interpretation |
|---|---|
| An ROC curve with one diagonal line running from (0,0) to (1,1). TP and FP rates increase linearly at the same rate. | This ROC curve has an AUC of 0.5, meaning it ranks a random positive example higher than a random negative example 50% of the time. As such, the corresponding classification model is basically worthless, as its predictive ability is no better than random guessing. |
| An ROC curve that arcs right and up from (0,0) to (1,1). FP rate increases at a faster rate than TP rate. | This ROC curve has an AUC between 0 and 0.5, meaning it ranks a random positive example higher than a random negative example less than 50% of the time. The corresponding model actually performs worse than random guessing! If you see an ROC curve like this, it likely indicates there's a bug in your data. |

ROC curves that produce AUC values greater than 0.5:

| ROC curve | Interpretation |
|---|---|
| The TP rate is 1.0 for all FP rates. | This is the best possible ROC curve, as it ranks all positives above all negatives. It has an AUC of 1.0. (In practice, if you have a "perfect" classifier with an AUC of 1.0, you should be suspicious, as it likely indicates a bug in your model. For example, you may have overfit to your training data, or the label data may be replicated in one of your features.) |
| TP rate increases at a faster rate than FP rate. | This ROC curve has an AUC between 0.5 and 1.0, meaning it ranks a random positive example higher than a random negative example more than 50% of the time. Real-world binary classification AUC values generally fall into this range. |
Understanding Q
[AUC and Scaling Predictions]
How would multiplying all of the predictions from a given model by 2.0 (for example, if the model predicts 0.4, we multiply by 2.0 to get a prediction of 0.8) change the model's performance as measured by AUC?
- ❌ It would make AUC terrible, since the prediction values are now way off.
- ❌ It would make AUC better, because the prediction values are all farther apart.
- ⭕ No change. AUC only cares about relative prediction scores.
→ AUC is based on relative predictions : it only cares about relative rankings.
- ANY TRANSFORMATION OF THE PREDICTION that PRESERVES THE RELATIVE RANKING has no effect on AUC.
- Not the case for other metrics such as squared error, log loss, or prediction bias.
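A minimal sketch of this point, assuming scikit-learn is available and using made-up predictions: multiplying every prediction by 2.0 preserves the ranking, so AUC is unchanged, while log loss (which depends on the actual values) changes.

```python
# Rank-preserving transformations leave AUC unchanged; value-sensitive metrics change.
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

y_true = np.array([0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([0.2, 0.4, 0.3, 0.45, 0.35, 0.25, 0.1, 0.15])

scaled = y_pred * 2.0   # still valid probabilities here, since every value is <= 0.5

print("AUC original:", roc_auc_score(y_true, y_pred))
print("AUC scaled  :", roc_auc_score(y_true, scaled))       # identical
print("Log loss original:", log_loss(y_true, y_pred))
print("Log loss scaled  :", log_loss(y_true, scaled))        # different
```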