The credit risk world likes to work with ‘odds’ and related quantities so these are covered today.
You could just do everything in terms of probability, i.e. PD, which is unambiguous. PD lies in [0,1], and a smaller number (like 0.002) indicates a better customer than a bigger number (like 0.013). In typical modelling situations (in Australia, in the good times), a lot of PDs would have one, two or even three leading zeroes after the decimal point, and such numbers are not handy for transcription, nor do they quickly convey which zone they lie in.
It goes without saying that it is often more palatable to format a PD as a percentage, e.g. PD = 0.013 as PD = 1.3%.
‘Odds’ have a special status because they are intimately linked with logistic regression, the main PD-modelling statistical tool. Odds can be worked out from the PD, and vice versa, as follows:
- odds = 1/PD - 1
- PD = 1/(1 + odds)
For example, odds = 8 means exactly the same thing as PD = 1/9 = 0.1111..
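The two conversions above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `pd_to_odds` and `odds_to_pd` are made up for this example, and the odds are assumed to be Good:Bad odds.

```python
def pd_to_odds(pd: float) -> float:
    """Good:Bad odds from a probability of default: odds = 1/PD - 1."""
    if not 0.0 < pd <= 1.0:
        # PD = 0 would mean infinite odds, so it is excluded here.
        raise ValueError("pd must lie in (0, 1]")
    return 1.0 / pd - 1.0


def odds_to_pd(odds: float) -> float:
    """Probability of default from Good:Bad odds: PD = 1/(1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return 1.0 / (1.0 + odds)


if __name__ == "__main__":
    print(odds_to_pd(8))      # the odds = 8 example from the text
    print(pd_to_odds(0.013))  # a PD of 1.3% expressed as odds
```

Round-tripping a value through both functions should return the starting number, up to floating-point error.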
Odds are generally taken to be the Good:Bad odds (as in the formulas above); thus a bigger number for odds is a better situation. I have seen analysts using odds the other way up, i.e. the Bad:Good odds. You can come out alive, but it will confuse your colleagues: changes of sign will cascade through and graphs will tilt the opposite way.
One step closer to the logistic zone is to transform to “log_odds”.
- log_odds = ln(odds)
- odds = exp(log_odds)
‘ln’ means natural logs, i.e. to the base ‘e’. Actually, mathematicians always mean natural logs when they say log and, as a matter of pride, would never mention the base, or contemplate a base other than ‘e’ unless it was a neat way to summarise a problem with structure particular to integral bases. Ambiguity can arise: computer systems that are tech-oriented, like SAS or MATLAB, take ‘log’ to mean ln, whereas those that are business-oriented, like MS/Excel, take ‘log’ to mean log to base 10. It also doesn’t help that ‘ln’ is not comfortable in speech.
By ‘log’ I always mean natural log, and I use log10 or log2 to mean logs to base 10 or 2. For the meantime, the terminology ‘log_odds’ will be used, which is easy in speech, but if anyone can suggest better nomenclature they are welcome to put it forward.
If we’ve made the right choices so far, a bigger number for log_odds is a better situation. Note that log_odds can be negative (when odds < 1, which is when PD > 0.5).
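As a rough sketch of the log_odds step, the snippet below goes from PD to log_odds and back. The function names are hypothetical and Good:Bad odds are assumed. Python's `math.log` is the natural log, which matches the convention used here.

```python
import math


def pd_to_log_odds(pd: float) -> float:
    """log_odds = ln(odds), with Good:Bad odds = (1 - PD) / PD."""
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must lie strictly between 0 and 1")
    return math.log((1.0 - pd) / pd)  # math.log is the natural log


def log_odds_to_pd(log_odds: float) -> float:
    """Invert the above: odds = exp(log_odds), PD = 1 / (1 + odds)."""
    return 1.0 / (1.0 + math.exp(log_odds))


if __name__ == "__main__":
    # PD = 0.5 sits at log_odds = 0; a PD above 0.5 gives a negative log_odds.
    for pd in (0.002, 0.013, 0.5, 0.8):
        print(pd, pd_to_log_odds(pd))
```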
To make the numbers more convenient to handle, it is common practice to convert the log_odds to a ‘score’ on a user-friendly scale that doesn’t involve negatives or decimal places. For the first time in this chain of transformations, arbitrary scaling constants are involved: one for location and one for scale (spread). A typical approach is illustrated below:
- for location: bang a stake in the ground at the point that will represent odds of 1 (== log_odds of zero == PD of 0.5): so, for example, choose a score of 500 to represent this point (which BTW would be a lousy customer)
- for scale: this is normally done by specifying how many points it takes to double the odds (PDO). A comfortable choice would be PDO=20, which says that a score of 520 <=> odds=2, 540 <=> odds=4, 560 <=> odds=8 etc.
Because log_odds is a logarithmic scale, the above choices amount to a linear transformation of log_odds to score. With the example values, score = 500 + 20 × log2(odds), which is the same as score = 500 + (20 / log(2)) × log_odds. The two scaling parameters, and hence the transformations from log_odds to score and back, will depend on these fairly arbitrary choices.
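The scaling can be sketched as follows, using the example choices from the text (a score of 500 at odds of 1, and PDO = 20). The constant and function names are illustrative assumptions, not a standard API, and other anchor points or PDO values would work equally well.

```python
import math

BASE_SCORE = 500.0  # score assigned to odds = 1 (log_odds = 0, PD = 0.5)
PDO = 20.0          # points to double the odds
FACTOR = PDO / math.log(2.0)  # score points per unit of (natural) log_odds


def log_odds_to_score(log_odds: float) -> float:
    """Linear map from log_odds to score."""
    return BASE_SCORE + FACTOR * log_odds


def score_to_log_odds(score: float) -> float:
    """Inverse of the linear map."""
    return (score - BASE_SCORE) / FACTOR


def pd_to_score(pd: float) -> int:
    """Whole-number score for a PD, assuming Good:Bad odds."""
    log_odds = math.log((1.0 - pd) / pd)
    return round(log_odds_to_score(log_odds))


if __name__ == "__main__":
    # Doubling the odds should add PDO points each time.
    for odds in (1, 2, 4, 8):
        print(odds, round(log_odds_to_score(math.log(odds))))
    print(pd_to_score(0.002), pd_to_score(0.013))
```

Keeping the two constants in one place means a change of PDO or anchor score flows through both directions of the transformation.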
PDO=20 gives a nice granularity to the scores, which will mostly land in the 500-800 zone and you won’t feel the need to use decimal points i.e. whole-number scores suffice. As long as PDO is chosen to be positive, it will still be the case that a bigger score is a better situation.
All the above transformations are absolute arithmetic ones that always apply, irrespective of context such as outcome window, default definition, calibration, closed goods in/out, etc. If you find you disagree with someone via these calcs, it means you started from different contexts and therein lies the entire explanation for your