Continuing from part 1, note that ‘hazard’ is used here in its specific statistical sense; it is not the *moral hazard* issue which has been a serious subject of debate in the context of taxpayer rescues of financial institutions.

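Mathematically, the hazard function is defined by h(x) = f(x) / (1 – F(x)), where f is the probability density of the time to default and F is its cumulative distribution function.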

Although the definition is framed in continuous time, the time granularity
of our data imposes a discreteness on the hazard: for example, if our data
is monthly then the hazards we calculate will be “1-month” hazards.

The meaning of the formula is that the hazard is a *conditional probability*:
the probability of default in the time grain immediately following
time x, given that the account hasn’t defaulted yet (i.e.
anywhere in the time interval from 0 up to x). Thus, h(12) would be the
probability that an account that has been good for its first 12 months
on book (MOB) goes bad in MOB 13.
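
To make the conditional-probability reading concrete, here is a minimal sketch that computes 1-month hazards from default counts by MOB. The cohort size, the counts and the function name `monthly_hazards` are all made up for illustration, and the sketch assumes accounts leave the cohort only by defaulting (no closures or censoring), which real data will rarely satisfy.

```python
def monthly_hazards(cohort_size, defaults_by_mob):
    """Return {x: h(x)}, where h(x) is the probability of default in MOB x+1
    given that the account was still good after x MOB.

    defaults_by_mob[i] is the number of defaults observed in MOB i+1.
    Assumes no accounts exit for any reason other than default.
    """
    hazards = {}
    at_risk = cohort_size  # accounts still good at the start of the month
    for x, defaults in enumerate(defaults_by_mob):
        if at_risk == 0:
            break  # nobody left to default, hazard is undefined
        hazards[x] = defaults / at_risk
        at_risk -= defaults  # defaulted accounts leave the risk set
    return hazards


# Hypothetical cohort of 1000 accounts with defaults in MOB 1 to 4
h = monthly_hazards(1000, [0, 10, 20, 30])
for x, value in h.items():
    print(f"h({x}) = {value:.4f}")
```

Note that the denominator shrinks each month: the 30 defaults in MOB 4 are divided by the 970 accounts still good after 3 MOB, not by the original 1000.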

1 – F(x) is also called S(x) and given the name *survival function*, i.e. the probability of not defaulting before time x.

Hazard is not the same thing as the probability distribution. I tend
to illustrate this point with a familiar example from human mortality.
What is the probability that a person would die in their 100th year? The
likely interpretation of this is to visualise a distribution over all the
ages 0-125, with a curve that peaks somewhere in the grandparent zone
and tails off sharply, such that the chance a person
dies during their 100th year is very low – less than 1%. This is
the chance that a newly born person dies in their 100th year. By
contrast, the one-year hazard at age 99 is rather high – over 30%. This
is the chance that someone who has survived to age 99 dies during the next
year (their 100th year).
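
The arithmetic behind this contrast can be sketched in a few lines. The counts below are invented purely to match the rough magnitudes quoted above (they are not taken from any real life table), and the variable names are my own.

```python
births = 100_000            # hypothetical cohort of newborns
alive_at_99 = 900           # hypothetical: how many reach age 99
deaths_in_100th_year = 300  # hypothetical: how many of those die before 100

# Unconditional probability: chance a newborn dies in their 100th year
unconditional = deaths_in_100th_year / births

# Survival to age 99, i.e. S(99) = 1 - F(99)
survival_to_99 = alive_at_99 / births

# One-year hazard at 99: chance of dying in the next year given survival to 99
hazard_at_99 = deaths_in_100th_year / alive_at_99

print(f"unconditional probability: {unconditional:.2%}")
print(f"one-year hazard at age 99: {hazard_at_99:.2%}")
```

The same deaths appear in both numerators; only the denominator changes, from everyone born to those still alive at 99. Dividing the unconditional probability by the survival probability recovers the hazard, which is the discrete counterpart of f(x) / (1 – F(x)).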

Upcoming posts will discuss uses and interpretations of all these items in the context of default (and churn) analytics.