Continuing from part 1, we should note that ‘hazard’ is used here in its very specific statistical sense. It is not the moral hazard issue which has been a serious subject of debate in the context of taxpayer rescues of financial institutions.
Mathematically, the hazard function is defined by h(x) = f(x) / (1 – F(x)), where f is the probability density of the time to default and F is its cumulative distribution function.
Although its definition has a continuous context, the time granularity of our data imposes a discreteness on the hazard: for example, if our data is monthly then the hazards we calculate will be “1-month” hazards.
The meaning of the formula is that the hazard is a conditional probability: the probability of default in the time grain immediately following time x, given that the account hasn’t defaulted yet (i.e. anywhere in the time interval from 0 up to x). Thus, h(12) would be the probability that an account that has been good for its first 12 MOB goes bad in MOB=13.
1 – F(x) is also called S(x), the survival function, i.e. the probability of not defaulting before time x.
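As a rough illustration of the discrete, monthly version of the formula, here is a minimal Python sketch. It assumes a single cohort of accounts with no censoring (every account is observed for the whole window); the function name discrete_hazard and the cohort figures are hypothetical and chosen only for the example. The indexing follows the convention above, so hazard[12] would correspond to h(12), the chance of going bad in MOB=13.

```python
def discrete_hazard(defaults_by_mob, n_accounts):
    """Return (hazard, survival) lists for one cohort, assuming no censoring.

    defaults_by_mob[k] is the number of accounts that default in MOB k+1.
    hazard[k]   = defaults in MOB k+1 / accounts still good after MOB k.
    survival[k] = fraction of the original cohort still good after MOB k+1.
    """
    hazard, survival = [], []
    at_risk = n_accounts  # accounts that have not defaulted yet
    for defaults in defaults_by_mob:
        # Conditional probability: only accounts still good can default now.
        hazard.append(defaults / at_risk if at_risk else float("nan"))
        at_risk -= defaults
        survival.append(at_risk / n_accounts)
    return hazard, survival


# Hypothetical cohort: 1000 accounts, defaults counted for MOB 1 to 4.
defaults = [0, 5, 10, 20]
hazard, survival = discrete_hazard(defaults, 1000)

for k, (h, s) in enumerate(zip(hazard, survival)):
    print(f"h({k}) = {h:.4f}   share still good after MOB {k + 1} = {s:.3f}")
```

Note that the denominator shrinks as accounts default: the 10 defaults in MOB 3 are divided by the 995 accounts still good, not by the original 1000. That is the discrete counterpart of dividing f(x) by 1 – F(x).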
Hazard is not the same thing as the probability distribution. I tend to illustrate this point with a familiar example from human mortality. What is the probability that a person would die in their 100th year? The likely interpretation of this is to visualise a distribution curve over all the ages 0-125, with a peak somewhere in grandparent zone and tailing off sharply, such that the chance a person dies during their 100th year would be very low – less than 1%. This is the chance that a newly born person might die in their 100th year. By contrast, the one-year hazard at age 99 is rather high – over 30%. This is the chance that someone who has survived to age 99 dies during the next year (their 100th year).
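The arithmetic behind this contrast can be sketched in a few lines of Python. The counts below are hypothetical round numbers invented for illustration, not figures from a real life table; they are chosen only so that the two probabilities land in the ranges mentioned above.

```python
# Hypothetical, illustrative counts only (not taken from a real life table).
newborns = 100_000            # size of the birth cohort
alive_at_99 = 1_500           # cohort members who survive to age 99
deaths_in_100th_year = 500    # of those, how many die during their 100th year

# Probability that a newborn dies in their 100th year: share of the whole cohort.
unconditional = deaths_in_100th_year / newborns

# Survival function at 99: S(99) = 1 - F(99).
survival = alive_at_99 / newborns

# One-year hazard at 99: h(99) = f(99) / S(99), i.e. share of the survivors.
hazard = unconditional / survival

print(f"probability at birth of dying in the 100th year: {unconditional:.2%}")
print(f"one-year hazard at age 99: {hazard:.2%}")
```

The same 500 deaths give a tiny probability when divided by everyone who was born, and a large one when divided by the few who are still alive at 99.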
Upcoming posts will discuss uses and interpretations of all these items in the context of default (and churn) analytics.