[EXAMTOPIC] AUC & ROC curve

ROC curve : Receiver Operating Characteristic curve

A graph showing the performance of a classification model at all classification thresholds.

ROC Curve showing TP Rate vs. FP Rate at different classification thresholds
Instead of computing the points on an ROC curve directly (evaluating a logistic regression model many times with different classification thresholds), USE an efficient sorting-based algorithm to compute AUC.
  • 2 parameters : True Positive Rate, False Positive Rate
    • True Positive Rate (TPR) == Recall : TPR = TP / (TP + FN)
    • False Positive Rate (FPR) : FPR = FP / (FP + TN)
  • Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives.
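
As a rough sketch of how the curve points arise (the labels and scores below are made up, not from the source), each threshold yields one (FPR, TPR) point:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                      # hypothetical labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7])   # hypothetical model scores

for threshold in [0.2, 0.4, 0.6, 0.8]:
    y_pred = (y_score >= threshold).astype(int)   # classify as positive above the threshold
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn)   # True Positive Rate == Recall
    fpr = fp / (fp + tn)   # False Positive Rate
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Lowering the threshold classifies more items as positive, so both TPR and FPR rise; sweeping the threshold traces out the ROC curve.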

AUC : Area under the ROC Curve

AUC measures the entire two-dimensional area underneath the ROC curve (via integral calculus) from (0,0) to (1,1).
  • AUC provides an aggregate measure of performance across all possible classification thresholds.

    • Equivalently, AUC can be interpreted as the probability that the model ranks a random positive example more highly than a random negative example.
  • e.g., with positive and negative examples ranked in ascending order of logistic regression score:

    • AUC : the probability that a random positive example is positioned to the right of (i.e., scored higher than) a random negative example.
  • AUC Ranges from 0 to 1

    • A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
  • Pros

    1. Scale-invariant : It measures how well predictions are ranked, rather than their absolute values.
    2. Classification-threshold-invariant : It measures the quality of the model's predictions irrespective of what classification threshold is chosen.
  • Cons

    1. Scale invariance is not always desirable.
      • e.g., when we really do need well-calibrated probability outputs, AUC won't tell us about that.
    2. Classification-threshold invariance is not always desirable.
      • Wide disparities in the cost of FN vs. FP : it may be critical to minimize one type of classification error.
        e.g., email spam detection : you likely want to prioritize minimizing false positives (even if that results in a significant increase in false negatives) ⇒ AUC is NOT a useful metric here.
  • AUC PR (Area Under PR Curve - precision-recall)
    Focuses on the minority class; useful for imbalanced classification problems.

    • ranges from 0 to 1
    • The higher the AUC PR, the higher-quality the model.
  • AUC ROC (Area Under the ROC Curve - Receiver Operating Characteristic)
    Calculated from the FP (False Positive) rate and TP (True Positive) rate of the model's predictions at different thresholds.

    • ranges from 0 to 1
    • The higher the AUC ROC, the higher-quality the model.
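
A minimal sketch (assuming scikit-learn is available; the labels and scores below are made up) of computing both metrics, plus a brute-force check of the ranking interpretation of AUC ROC. Note that average_precision_score is scikit-learn's usual summary of the area under the PR curve:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                      # hypothetical labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7])   # hypothetical model scores

# AUC ROC: area under the TPR-vs-FPR curve over all thresholds.
print("AUC ROC:", roc_auc_score(y_true, y_score))

# AUC PR: summary of the precision-recall curve; more informative on imbalanced data.
print("AUC PR :", average_precision_score(y_true, y_score))

# Ranking interpretation of AUC ROC: probability that a random positive example
# is scored higher than a random negative example (ties count as 0.5).
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
print("pairwise estimate:", wins / (len(pos) * len(neg)))
```

The pairwise estimate matches roc_auc_score, which illustrates why a sorting-based algorithm can compute AUC without sweeping thresholds explicitly.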

AUC & ROC Interpretation

ROC
  • A single diagonal line running from (0,0) to (1,1) : TP and FP rates increase linearly at the same rate.
    • AUC of 0.5 : the model ranks a random positive example higher than a random negative example 50% of the time. The corresponding classification model is basically worthless; its predictive ability is no better than random guessing.
  • A curve that arcs right and up from (0,0) to (1,1), below the diagonal : FP rate increases at a faster rate than TP rate.
    • AUC between 0 and 0.5 : the model ranks a random positive example higher than a random negative example less than 50% of the time, i.e., it actually performs worse than random guessing. An ROC curve like this likely indicates a bug in your data.
  • TP rate is 1.0 for all FP rates.
    • The best possible ROC curve : it ranks all positives above all negatives and has an AUC of 1.0. In practice, a "perfect" classifier with an AUC of 1.0 should make you suspicious, as it likely indicates a bug in your model (e.g., overfitting to the training data, or label data replicated in one of your features).
  • TP rate increases at a faster rate than FP rate.
    • AUC between 0.5 and 1.0 : the model ranks a random positive example higher than a random negative example more than 50% of the time. Real-world binary classification models generally produce AUC values in this range (greater than 0.5).
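
A small sanity-check sketch of these regimes (made-up data, assuming scikit-learn): an uninformative model lands near 0.5, an informative one above 0.5, and a systematically inverted one below 0.5:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                 # hypothetical labels

random_scores = rng.random(1000)                        # scores unrelated to the labels
good_scores = 0.3 * y_true + 0.7 * rng.random(1000)    # scores correlated with the labels
inverted_scores = 1.0 - good_scores                     # systematically wrong ranking

print("random   (~0.5):", roc_auc_score(y_true, random_scores))
print("good     (>0.5):", roc_auc_score(y_true, good_scores))
print("inverted (<0.5):", roc_auc_score(y_true, inverted_scores))
```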

Understanding Q

[AUC and Scaling Predictions]

How would multiplying all of the predictions from a given model by 2.0 (for example, if the model predicts 0.4, we multiply by 2.0 to get a prediction of 0.8) change the model's performance as measured by AUC?

  • ❌ It would make AUC terrible, since the prediction values are now way off.
  • ❌ It would make AUC better, because the prediction values are all farther apart.
  • ⭕ No change. AUC only cares about relative prediction scores.
    → AUC is based on relative predictions : it only cares about relative rankings.
    • ANY TRANSFORMATION OF THE PREDICTION that PRESERVES THE RELATIVE RANKING has no effect on AUC.
    • Not the case for other metrics such as squared error, log loss, or prediction bias.
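
A quick sketch of that point (made-up data, assuming scikit-learn). Dividing the probabilities by 2 is used here instead of multiplying by 2 only so the values stay valid probabilities for log loss; any rank-preserving transformation behaves the same for AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                      # hypothetical labels
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7])    # hypothetical probabilities

scaled = y_prob / 2.0   # rank-preserving transformation (stays inside [0, 1])

# AUC only looks at the relative ordering of the scores, so it is unchanged.
print(roc_auc_score(y_true, y_prob), roc_auc_score(y_true, scaled))

# Log loss looks at the absolute probability values, so it degrades.
print(log_loss(y_true, y_prob), log_loss(y_true, scaled))
```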