So far the event we’ve been considering as the subject of analysis has been default, with occasional mention of churn.
Default, being progressive, lends itself to analysis of its stages, such as the events of going 30DPD or 60DPD. In addition to default hazard, one can analyse 30DPD hazard and 60DPD hazard. One advantage, especially for monitoring, is that these events occur slightly sooner. A statistical advantage is that these events are more numerous than default events. Given an intuition, or perhaps a model, of how 30DPD and 60DPD profiles relate to default profiles, they could be a useful analytical tool.
That segues into the roll rates discussion AWML.
The relationship, however, need not be straightforward. For example, there may be a spike of 30DPD or 60DPD at MOB=2 or 3, due to bugs or carelessness with the administration of new repayment schedules. Most of those would not roll through to default.
One of the uses of a hazard curve is as a sanity check on your data and the technicalities of the default definition.
If you regularly find yourself analysing millions of records, you will know that every conceivable weird and wobbly data bug will happen, as well as a few that could never have been conceived of. Recalling a typical example from a loan portfolio: there were 10,000 accounts that opened and closed on the same day. This, not surprisingly, was some artefact of how the data systems coped with proposals or quotes (or something), but in reality these accounts were NTU and there was never any exposure to risk in their respect. But, in amongst a quarter of a million accounts, it would be possible to miss their presence and to do some default analytics – and even some model building – including these accounts as “closed good” accounts.
<digress for a war story> One of those accounts even managed to span two months! It appeared in two consecutive calendar month snapshot datasets – somehow allowed by time zone differences and the exact timing of month-end processing. A casual analysis might have assumed that this represented two months of exposure to risk – see also the comments about time grains <end digression>
But coming to the point of this post, I have found that estimating the default and churn hazard is an excellent “sanity check” on the data that will quickly show up most issues that you would want to know about. The issue mentioned above showed up as a massive spike in the churn hazard at MOB=1.
Other features that might be noticeable in churn hazard curves are peaks of churn around key account ages, such as at MOB=6 if the product has a teaser rate for the first 6 months. Multiples of 12 MOB may also occur in certain pay-annual-interest-in-advance type of products. These examples would be features that one might be on the lookout for, so finding them would be “reassuring” feedback rather than “alerting” feedback.
Sanity checking is not only noticing what you didn’t expect, but also confirming what you did expect.
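As a rough illustration of the kind of check described above, here is a minimal sketch that flags any MOB whose hazard stands far above the typical level of the curve. The function name flag_spikes, the "factor" threshold and the sample churn hazards are all made up for illustration. Simply plotting the curve and looking at it will usually do the same job.

```python
from statistics import median

def flag_spikes(hazard_by_mob, factor=5.0):
    """Return the MOBs whose hazard exceeds `factor` times the median
    of the non-zero hazards. `hazard_by_mob` maps MOB -> hazard.
    Both the rule and the default factor are illustrative assumptions."""
    positive = [h for h in hazard_by_mob.values() if h > 0]
    if not positive:
        return []
    typical = median(positive)
    return sorted(mob for mob, h in hazard_by_mob.items() if h > factor * typical)

# Hypothetical churn hazards: a data-bug spike at MOB=1 and a milder,
# expected bump at MOB=6 (end of a teaser rate).
churn_hazard = {1: 0.20, 2: 0.01, 3: 0.012, 4: 0.011, 5: 0.01, 6: 0.03}

alerts = flag_spikes(churn_hazard)                 # strict threshold
features = flag_spikes(churn_hazard, factor=2.0)   # looser threshold
```

With these made-up numbers the strict threshold picks out only the MOB=1 spike, while the looser one also picks up the MOB=6 bump, which would be the "reassuring" kind of finding.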
Features found in the default hazard curves may give important feedback about the way the default definition works. For example, with a 90DPD definition one may be expecting zero hazard for MOB=1, 2, 3 but there may in fact be genuine defaults in that zone triggered by supplementary business rules. However, what can happen is that the totality of rules in the default definition doesn’t quite produce the desired effect in practice. One example I recall caused the year-in-advance loans to reflect as default after only 30DPD. This showed up as a spike at 12, 24, 36 MOB and caused a review of the default definition as applied to this (relatively small) portion of the loan book.
The data cleaning and sanity checking stage is helped by having some experience in similar analyses on similar products. But even in a completely new context, some data wobblies will produce such an unnatural effect on the hazard curve that you will be immediately alerted to follow up.
Hazard curves, being longitudinal, only help you examine default tendencies that relate to MOB. Cross-sectional effects, such as a sudden worsening in credit conditions in the economy, would be monitored in other ways.
What shape does a typical default hazard curve have?
Note that this post is about default hazard – the churn hazard curve is a completely different matter.
Recall that the hazard at any particular MOB is indicating the instantaneous chance that a good account of that MOB age might go bad. So, where the curve is highest is showing the most dangerous age for accounts.
For most products, the hazard will be very close to zero for the first 3 or 4 months. This depends on the details of your default definition, but for example a simple 90DPD type of definition can’t produce a default in MOB 1, 2 or 3. Some default definitions can be triggered even in those first MOBs via business rules about bankruptcy etc.
For some situations – like a new product to market – there can be an issue of “application fraud” or “soft fraud” whereby new accounts come on book that perhaps never had an intention to make any repayments. Such a situation would show up as a spike in hazard around MOB 4-5.
Aside from application fraud, typical CC hazard curves tend to rise rapidly to a maximum by 9-12 MOB and then to decline slowly to a stable plateau at maybe half the peak hazard level. Hazard doesn’t decline to zero because no matter how old an account is, there remains a residual chance that it can go into default.
In practice, one gets relatively little chance to study the hazard behaviour at long MOB – say, over 36 months – because that calls for data going back more than 3 years – rather a long time in credit markets.
On a technical point, a constant hazard corresponds to an exponential distribution for the waiting time until first default.
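For readers who like to see where that comes from, here is a short derivation using the definition h(x) = f(x) / (1 – F(x)) given elsewhere in these posts. The symbol λ for the constant hazard level is my own notation, and S(0) = 1 is assumed (every account starts out good).

```latex
\begin{align*}
h(x) &= \frac{f(x)}{1 - F(x)} = -\frac{d}{dx}\ln S(x), \qquad S(x) = 1 - F(x) \\
S(x) &= \exp\!\left(-\int_0^x h(u)\,du\right) \\
h(u) \equiv \lambda \;&\Rightarrow\; S(x) = e^{-\lambda x}, \quad
F(x) = 1 - e^{-\lambda x}, \quad f(x) = \lambda e^{-\lambda x}
\end{align*}
```

With monthly data the discrete analogue is the geometric distribution: a constant 1-month hazard h gives a survival probability of (1 – h)^t after t months.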
It would be fairly easy to confuse the notions of hazard curve and probability density function, since (for default on a typical credit product) both start at zero and climb to a peak and then decline.
The more data used in the analysis, the smoother the curves will be, but whatever the case the cumulative distribution function (“emergence curve”) will always be much smoother than the hazard and pdf.
For the reasons in the above two paragraphs, I recommend presenting default analytical work via the cdf graph using a non-technical name like “emergence curve” or “default profile”. Please send in your preferred nomenclatures in case there is some consensus we could publicise. My slight preference is for “default profile” which is neutral and non-technical and easily accommodates “churn profile” or “cross-sell profile” when one analyses some other waiting time quantity such as these.
The above paragraph is about presenting and communicating the results; but for analytical insight, I recommend that the analyst should be looking at the hazard curves as well – for discussion next time.
Continuing part 1, perhaps we should note that the subject ‘hazard’ here is in its very specific statistical sense, and is not the moral hazard issue which has been a serious subject of debate in the context of taxpayer rescues of financial institutions.
Mathematically, the hazard function is defined by h(x) = f(x) / (1 – F(x))
Although its definition has a continuous context, the time granularity of our data imposes a discreteness on the hazard: for example, if our data is monthly then the hazards we calculate will be “1-month” hazards.
The meaning of the formula is that the hazard is a conditional probability: the probability of default in the time grain immediately following time=x, given that the account hasn’t defaulted yet (i.e. anywhere in the time interval from 0 up to x). Thus, h(12) would be the probability that an account that has been good for its first 12 MOB goes bad in MOB=13.
1 – F(x) is also called S(x) and given the name survival function, i.e. the probability of not defaulting before time x.
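To make the relationship between hazard, pdf, cdf and survival function concrete, here is a minimal sketch in discrete (monthly) time. The function name life_table, the indexing convention (hazards[i] is the chance of defaulting in the i-th time grain given survival to its start) and the sample numbers are my own assumptions for illustration.

```python
def life_table(hazards):
    """Turn a list of 1-month hazards into (pdf, cdf, survival) lists.

    hazards[i] is the conditional probability of defaulting in grain i,
    given no default before it. survival[i] and cdf[i] are measured at
    the END of grain i.
    """
    pdf, cdf, survival = [], [], []
    s = 1.0                      # everyone starts out good
    for h in hazards:
        f = s * h                # unconditional chance of defaulting in this grain
        s = s - f                # survivors carried into the next grain
        pdf.append(f)
        cdf.append(1.0 - s)
        survival.append(s)
    return pdf, cdf, survival

# Hypothetical hazards for three consecutive months.
pdf, cdf, survival = life_table([0.0, 0.1, 0.2])
```

Dividing each pdf value by the survival at the start of its grain recovers the hazard, which is just the defining formula read backwards.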
Hazard is not the same thing as the probability distribution. I tend to illustrate this point with a familiar example from human mortality. What is the probability that a person would die in their 100th year? The likely interpretation of this is to visualise a distribution curve over all the ages 0-125, with a peak somewhere in grandparent zone and tailing off sharply, such that the chance a person dies during their 100th year would be very low – less than 1%. This is the chance that a newly born person might die in their 100th year. By contrast, the one-year hazard at age 99 is rather high – over 30%. This is the chance that someone who has survived to age 99 dies during the next year (their 100th year).
Upcoming posts will discuss uses and interpretations of all these items in the context of default (and churn) analytics.
Introducing some technical terms that arise in default analytics.
This nomenclature comes from the field of statistics called survival analysis, which is well established and readily found in textbooks or wiki entries etc. If you don’t mind reading maths you will find better guidance there than in this post. The name ‘survival’ arose because the subject matter was/is mostly mortality or onset/re-occurrence of disease in populations or test cohorts. This is not too far from the study of the onset of default, so happily (if this is an appropriate word?) a great deal of well established statistical theory and practice is available for the study of default. This applies mainly to PD rather than LGD modelling.
Some of these terms and their equivalents in banking terminology are covered below.
Survival analysis is an essentially longitudinal activity, although the data it is based on will often be cross-sectional in structure.
The key variable x is the waiting time until default. This means the MOB of the first default. This variable x will have a distribution (probability density function) f(x), from which can be derived (by integration) the cumulative distribution function F(x). The pdf is not intuitive for non-technical audiences and I recommend only showing the cdf which is a monotonically rising curve that is easy to interpret: F(24), for example, would show the probability of going bad on or before MOB=24. This can also be interpreted as “what proportion of this population will have gone bad within 2 years”.
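As a small illustration of reading F(24) off the data, here is a sketch that builds the cdf for a single cohort from the MOB of each account's first default. It assumes, for simplicity, that every account in the cohort was observed for the full period, so censoring and closed-good accounts are ignored. The function name emergence_curve and the sample numbers are made up.

```python
def emergence_curve(default_mobs, cohort_size, max_mob):
    """Return [F(1), F(2), ..., F(max_mob)] for a fully observed cohort.

    default_mobs lists the MOB of first default for each account that
    went bad; accounts that never defaulted simply do not appear.
    """
    counts = [0] * (max_mob + 1)
    for mob in default_mobs:
        if 1 <= mob <= max_mob:      # defaults beyond the window are ignored
            counts[mob] += 1
    curve, cumulative = [], 0
    for mob in range(1, max_mob + 1):
        cumulative += counts[mob]
        curve.append(cumulative / cohort_size)
    return curve

# Hypothetical cohort of 100 accounts with five first defaults.
curve = emergence_curve([4, 5, 5, 12, 30], cohort_size=100, max_mob=24)
pd_12m = curve[12 - 1]   # F(12): proportion gone bad on or before MOB=12
pd_24m = curve[24 - 1]   # F(24): proportion gone bad on or before MOB=24
```

The curve can only stay flat or rise, which is part of what makes it easy to present.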
Note that PD always needs to be related to some time window, so “PD” alone is a vague concept and one needs to be specifying something like “probability of going bad within the first 24 MOB” (for a longitudinal PD) or “probability of going bad within the next 12 calendar months” (for a cross-sectional PD).
I avoid using the stats terminology of pdf or cdf because they don’t sound intuitive, and particularly the word “cumulative” can mean so many different things in various contexts. Some more business-intuitive term is preferable. Some colleagues have called the cdf the “emergence curve” which is quite descriptive as it makes one think of the bads “emerging” with the passing of time, as the curve climbs up. An emergence curve is visually comfortable to absorb (being an integral, it will be quite smooth) and shows at a glance the values of the 12-month PD or 24-month PD or any other x-month PD. Another business-friendly term is “default profile”, which sits comfortably with “churn profile” for the cdf of waiting time until closed-good.
But none of these is the hazard curve; continued next time…
An important basic concept in default analytics is “exposed to risk” by which we mean risk of going into default unless otherwise specified (one might otherwise be studying risk/propensity of churn, cross-sell etc.)
Abbreviated ETR in this note but AFAIK this isn’t common so won’t be added to the abbreviations list.
Often probabilities are estimated by dividing the number of events that did happen by the number of events that could have happened, and ETR is basically the latter, i.e. the denominator of the fraction. The ‘hazards’ and risk PDs of default analytics are just special cases of this situation.
A typical setting is when building an Application PD model: the modelling mart will have some number of accounts that started out at open date (MOB=0), and a certain target OW of (say) 24 months; at the simplest level all the accounts are ETR of going into default within the OW.
However, if account #1 opened only 18 months ago and is still not in default, then although it has been ETR for 18 months, it hasn’t been ETR for 24 months and is not quite the same unit of modelling information as an older account #2 that did survive 24 months. Account #1 has reached the horizon and is said to have been censored. Model builders wouldn’t normally be dealing with these out-of-time (OOT) cases because, knowing that 24 months OW was the target, they would have chosen a sample window (SW) that was at least 24 months before the horizon in its entirety.
But what about account #3 that opened 30 months ago but closed good, i.e. without ever going into default, at MOB=18? Account #3, like account #1, was only ETR for 18 months and is not quite like account #2. There was no way it could have contributed a default event for MOB=19-24 as it was not ETR for 19-24.
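The three accounts above can be put into a small sketch that counts ETR per MOB and divides defaults by ETR to get a hazard. This is only one possible convention, not a definitive method: each account is given as (last MOB in which it was at risk, whether it defaulted in that MOB), and last MOB is assumed to be at least 1. Account #4, which defaults at MOB=10, is hypothetical and added only so that there is an event to count. The function name etr_and_hazard is likewise made up.

```python
def etr_and_hazard(accounts, max_mob):
    """Count exposed-to-risk and defaults per MOB, and their ratio.

    accounts: list of (last_mob, defaulted) tuples, with last_mob >= 1.
    An account counts as ETR in every MOB from 1 up to last_mob
    (capped at max_mob); a default is booked in last_mob itself.
    """
    etr = {mob: 0 for mob in range(1, max_mob + 1)}
    defaults = {mob: 0 for mob in range(1, max_mob + 1)}
    for last_mob, defaulted in accounts:
        for mob in range(1, min(last_mob, max_mob) + 1):
            etr[mob] += 1
        if defaulted and last_mob <= max_mob:
            defaults[last_mob] += 1
    # Hazard is only defined where at least one account was at risk.
    hazard = {mob: defaults[mob] / etr[mob] for mob in etr if etr[mob] > 0}
    return etr, defaults, hazard

accounts = [
    (18, False),  # account #1: censored at the horizon after 18 months
    (24, False),  # account #2: survived the full 24-month OW
    (18, False),  # account #3: closed good at MOB=18
    (10, True),   # account #4 (hypothetical): defaulted at MOB=10
]
etr, defaults, hazard = etr_and_hazard(accounts, max_mob=24)
```

In this toy example four accounts are ETR at MOB=10 but only one is still ETR for MOB=19-24, so accounts #1 and #3 contribute information for their first 18 months without being assumed to survive (or default) afterwards.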
That segues into the closed good in vs closed good out discussion AWML but meanwhile opinions and contributions would be welcome from those who have views on the issues. People who study mortality risk have similar issues whereby, for example, they study all individuals for a certain time window. People may emigrate and so be ETR for only a portion of the TW, because one can’t reliably trace their subsequent mortality (survive or die?) in another country. But, you don’t assume they survive (or die); rather you use their information appropriately with respect to their lesser overall ETR.
Because application modelling is longitudinal, the focus is on the first default, so ETR is mostly a matter of the account still being open and not ever having previously been in default. For behavioural modelling which is essentially cross-sectional, there is the additional issue of whether an account is ETR of fresh default or whether it is still included in some previous default episode – link to the re-ageing issue AWML.
There may be subtleties in the ETR concept, such as deceased account holders, dormant accounts – are these ETR? Or in a product like a reverse mortgage, is there a default risk at all?