While on the subject of “validation” – it can have a range of meanings when applied to credit risk models.
At the most general level it means review by an external authority. This could cover a wider scope than merely reviewing the models themselves. All aspects of how the modelling methodology was chosen, executed, implemented, and integrated with the business might be considered. Naturally an external technical review of the models may be a valuable subtask.
Validation using data is a more concrete approach. Widest scope is achieved by having a sample of the bank’s exposures scored by a relevant external agency with similar models for comparison with the bank’s own results. Whilst this covers the most bases, it is hard to do it well in practice because of the difficulty of reproducing the same data environment – for example categorical predictors may need to be ‘mapped’.
Validation using the bank’s own data is the easiest and perhaps most familiar context. Various more specific technical terms apply. Some examples:
- during the model building phase it is good practice to hold out a ‘validation’ sample as a protection against over-fitting. This is often loosely called cross-validation, although strictly it is a single hold-out sample. The validation sample is randomly selected from the modelling mart to guarantee neutrality with respect to all data effects.
- a proposed new model can be run on ‘out of time’ data – cohorts that are before (‘backtesting’) or after the sample window represented in the modelling mart. This is likely to be instructive and reassuring but does not carry the guarantee that pure cross-validation does.
- the routine monitoring of the performance of models once they have been implemented may also be considered to be ongoing ‘validation’ and is the first line of defence.
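The first two kinds of sample above can be sketched in a few lines. This is only an illustration under assumed names: the record layout (a `cohort` label per account), the function `split_mart` and the 30% hold-out fraction are all hypothetical, not anything prescribed by Basel or used in a particular bank.

```python
import random


def split_mart(records, sample_window, holdout_fraction=0.3, seed=42):
    """Split records into development, random hold-out and out-of-time sets.

    records: list of dicts, each with a 'cohort' key (hypothetical layout).
    sample_window: set of cohort labels that make up the modelling mart.
    """
    # Cohorts outside the sample window (before or after it) are 'out of time'.
    in_window = [r for r in records if r["cohort"] in sample_window]
    out_of_time = [r for r in records if r["cohort"] not in sample_window]

    # The hold-out sample is chosen purely at random from within the mart.
    rng = random.Random(seed)
    shuffled = in_window[:]
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    holdout = shuffled[:n_holdout]
    development = shuffled[n_holdout:]
    return development, holdout, out_of_time
```

The random hold-out shares every data effect with the development sample, whereas the out-of-time set may differ systematically, which is why it reassures without carrying the same guarantee.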
The simplest setting is validation of an individual component, especially PD. Last week’s post touched on the more difficult context of validating that the chain of models PD-EAD-LGD work together correctly.
Aren’t there some aspects of Basel – like long term cycle issues – that defy validation? Or rather, rely on judgement rather than analysis?
Nothing to do with airlines, we speak here of validating expected loss against actual loss.
A point made by Bruce M in recent comments is that there needs to be consistency in the modelling methodology behind the suite of models for the risk components PD, EAD and LGD. One task that should bring this point to the fore is the validation of EL against AL.
The PD (and EAD) models can be easily validated because their predicted outcomes become certain after 12 months. LGD is harder because:
- the observation period starts later: if an account defaults in the 11th month of the 12-month outcome window, observation of the actual LGD outcome (i.e. actual loss) can only begin at that point, which is already 11 months later than the sample cohort.
- the observation period may be long
- ideally one needs to wait for the longest AL to resolve, but one can’t know in advance how long this will be
This means that ELs can only be reliably validated against ALs if the sample cohorts are quite far back in time – perhaps 2-3 years depending on product.
Nevertheless an adequate job can be done on more recent cohorts, considering that even on recent cohorts, at least some of the ALs will be known. I recommend a graphic approach showing EL vs AL for many quarterly cohorts simultaneously, with certain ALs in a bold colour, and as-yet-unresolved defaults shown on a possible – probable – worst case basis via suitable graphic cues (e.g. colours, hatching, error bars). Such a display will show a ‘fan’ effect, whereby older cohorts have a more certain EL-AL reconciliation, whereas for more recent cohorts the zone for AL fans out. (EL is a historic fact and is always known exactly.)
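The numbers behind such a fan display could be assembled roughly as follows. This is a sketch under stated assumptions: the account fields (`cohort`, `el`, `defaulted`, `resolved`, `actual_loss`, `ead`) are hypothetical, the low case assumes an unresolved default ends with no loss, the worst case assumes the whole EAD is lost, and `probable_lgd` is simply a placeholder for whatever judgement or model supplies the "probable" case.

```python
def el_al_by_cohort(accounts, probable_lgd=0.5):
    """Sum EL and low / probable / worst-case AL for each cohort.

    accounts: list of dicts with keys cohort, el, defaulted, resolved,
    actual_loss and ead (all hypothetical field names).
    """
    summary = {}
    for acc in accounts:
        row = summary.setdefault(
            acc["cohort"],
            {"el": 0.0, "al_low": 0.0, "al_probable": 0.0, "al_worst": 0.0},
        )
        row["el"] += acc["el"]  # EL was fixed at the cohort date
        if not acc["defaulted"]:
            continue
        if acc["resolved"]:
            # Resolved default: the actual loss is certain in all three cases.
            low = probable = worst = acc["actual_loss"]
        else:
            # Unresolved default: bracket the eventual loss.
            low = 0.0                            # assumed: no loss at all
            probable = probable_lgd * acc["ead"]  # assumed LGD
            worst = acc["ead"]                    # assumed: total loss
        row["al_low"] += low
        row["al_probable"] += probable
        row["al_worst"] += worst
    return summary
```

For an old cohort the three AL figures coincide; for a recent cohort the gap between `al_low` and `al_worst` is the width of the fan.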
Carrying out an EL-AL validation is a good way to review the consistency of model approaches and to detect those situations that fall between the cracks.
Drawing together several themes, today’s post recommends how to assemble modelling marts that will be representative for use in Basel context.
Basel context is a cross-sectional context: at some point in time, such as the most recent calendar month end, the bank must assess the risk components (PD, EAD, LGD and hence [or otherwise?] the expected loss EL) for the time exposure of the next 12 months. As the point in time is fixed and the coverage is all at-risk exposures, accounts will be encountered in all stages of credit status (and any MOB): G, I, point-in-time bad B, episodic bad E, plus whatever collections and recoveries statuses may obtain.
PD models for this context would primarily be behavioural models, built to predict a 12-month OW. (BTW earlier posts discuss the transitional use of application PDs for this purpose.) EAD and LGD models are needed. Several modelling marts are therefore needed. How many, and how assembled in order to be representative when put to work together in Basel duty?
My suggestions below are open to discussion & debate – tell us if you have alternative views or practices.
- The underlying sampling frame is to pick a point in time and observe all accounts at that point in time. Because of the need for 12 month OW, this point in time will be at least 12 months before the data horizon (current time)
- This sample frame can be overlaid to increase the modelling mart: e.g. take several points in time, a month or a quarter apart. Naturally, the additional information is correlated but that presents no great problem as long as one doesn’t treat it as independent. A limitation is that as one reaches further back into history, the models become less relevant to the future.
- Plan to segment fairly extensively; a cross-section will include many diverse animals better handled in their own (albeit small) cages than handled with one cover-all model. “Segmentation” is a popular word but you could also call this “decision-tree”, CART, etc.
- Each segment = separate mart = completely separate model
- Segment PD behavioural: at minimum need to segment E from G. Recall that E is an account that is not point-in-time bad but is episodic bad i.e. has not yet re-aged. Further subsegmentation is likely to be sensible, into say the various levels of I (Indeterminate). Naturally, no PD model is required for status B or C,R, etc.
- Target variable PD: whether the start of a new bad episode is encountered during the following 12 months. A definitional issue AWML arises as to how to handle segment E.
- Segmenting LGD: may leave this for another day …
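The sampling frame and the PD target in the list above can be made concrete with a small sketch. The helper names (`observation_points`, `pd_target`), the quarterly spacing and the use of (year, month) tuples are assumptions for illustration only; the handling of segment E is deliberately left out.

```python
def month_index(ym):
    """Convert a (year, month) tuple to a running month count."""
    return ym[0] * 12 + (ym[1] - 1)


def observation_points(horizon, n_points, step_months=3, outcome_window=12):
    """Observation dates, latest first, each leaving a full outcome window.

    horizon: (year, month) of the data horizon (current time).
    """
    points = []
    for i in range(n_points):
        idx = month_index(horizon) - outcome_window - i * step_months
        points.append((idx // 12, idx % 12 + 1))
    return points


def pd_target(episode_starts, obs_point, outcome_window=12):
    """True if a new bad episode starts in the outcome window after obs_point."""
    obs = month_index(obs_point)
    return any(
        obs < month_index(start) <= obs + outcome_window
        for start in episode_starts
    )
```

For example, with a data horizon of June 2009 and three quarterly overlays, the observation points are June 2008, March 2008 and December 2007; every account at risk on each of those dates contributes one row to the relevant segment's mart.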
Default episodes have varying lengths. This can lead to a bias related to the statistical issue known as “length-biased sampling”.
For building an LGD modelling mart, a typical approach would be to collect all the bad episodes that impinge on a certain time window. However this introduces a length-based bias, because the longer episodes have more chance to be represented. Longer episodes are, in turn, quite likely to be correlated with non-average losses.
To get unbiased sampling for building a behavioural mart, specify a sample window and only include bad episodes that started during that window. This will exclude accounts that are already in the middle of a bad episode at the start of the time window.
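The difference between the two selection rules is easy to see on a toy example. In this sketch the episodes are hypothetical (start, end) pairs measured in months, and the function names are invented for illustration.

```python
def impinging_on_window(episodes, window_start, window_end):
    """Biased rule: every episode that overlaps the window at all."""
    return [
        (start, end) for start, end in episodes
        if start <= window_end and end >= window_start
    ]


def started_in_window(episodes, window_start, window_end):
    """Unbiased rule: only episodes whose start falls inside the window."""
    return [
        (start, end) for start, end in episodes
        if window_start <= start <= window_end
    ]


def mean_length(episodes):
    """Average episode length in months (inclusive of both end months)."""
    return sum(end - start + 1 for start, end in episodes) / len(episodes)
```

With episodes `[(0, 30), (10, 12), (11, 13), (14, 15), (20, 22)]` and a window of months 10 to 15, the first rule picks up the long episode that began at month 0 and so reports a larger mean length than the second rule, which is the length bias in miniature.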
Continuing the re-aging thread, a note circulated by APRA had a clear grip of the issue, and proposed:
“APRA’s proposed solution is to only allow the recording of a second default event after the loan has been in the non-default status for a period of at least 12 months”
‘Fraid I can’t give a direct reference as I only have an undated photocopy to hand, entitled “Multiple defaults in the retail portfolio” – it would have been about 2004. Please post to the blog any update on these issues that you may know of.
APRA’s concern was to “require the number of observations in bank’s PD and LGD databases to be equal” because of the traps of otherwise having mis-matched bases for PD and LGD. My preferred way of describing this – via “bad episodes” – is semantically different but hopefully faithful to the essence of the problem; it also lends itself to other difficulties that will be met.
Re-capping points from the last couple of posts:
- recognise that default definition starts with a point-in-time definition but also has a derived episodic dimension: every transition from good to bad at a point in time begins a bad episode which is a relatively long interval of time.
- the rule which specifies when the bad episode can end is an integral part of the default definition and is called the re-aging rule.
- these bad episodes will then be relatively few in number and will be the basic units of modelling
‘Relatively long’ and ‘Relatively few’ represent implicit recommendations to choose a re-aging rule that produces few, long, congealed bad episodes rather than the opposite. Technically, you could get out alive with a rule that makes many sporadic episodes but you will get a lot of unnecessary headaches: multiple non-independent episodes, large numbers of zero-loss LGD points, multiplicities within a year, and in general a dilution of modelling power through not aligning model constructs with a sensible grip on reality.
With this understanding, the APRA proposal says that the re-aging rule should allow a bad episode to end after 12 continuous non-bad months have elapsed. This seems a good choice and will produce well-congealed bad episodes. A particular merit is that two bad episodes for any particular account can never start within the same 12-month period. This is helpful because a lot of modelling (e.g. behavioural) has a 12-month OW and the chance of any multiplicity would be a nuisance.
Thinking in database terms, one would have only one source of default information: a table of default episodes, keyed by account and start date. Of course, bad episodes are well behaved constructs being distinct for any account and not overlapping. Depending how one implements the rule there can be a slight wobbly about whether a new episode can begin immediately that the previous one ends – imagine an account with B then 12G then B again – you decide how you like to treat this case – it’s not a showstopper.
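As a rough sketch of such a table, suppose each account's monthly history has already been coded with G, B and E (E meaning "not bad but still in a bad episode"). The function name and the tuple layout (account, start month, end month) are assumptions for illustration; an episode still open at the data horizon is given an end of `None`.

```python
def episode_table(histories):
    """Build rows of (account, start_month, end_month) from coded histories.

    histories: dict mapping account id to a status string with one
    character per month, where B and E both count as 'in a bad episode'.
    """
    rows = []
    for account, statuses in sorted(histories.items()):
        start = None
        for month, status in enumerate(statuses):
            in_episode = status in ("B", "E")
            if in_episode and start is None:
                start = month                      # a new episode begins
            elif not in_episode and start is not None:
                rows.append((account, start, month - 1))
                start = None                       # the episode has ended
        if start is not None:
            rows.append((account, start, None))    # unresolved at horizon
    return rows
```

Because each row is built from one contiguous run of B/E months, episodes for an account are distinct and cannot overlap, and (account, start month) serves as a natural key.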
For any longitudinal modelling, looking for the first default is equivalent to looking for the first start of a bad episode.
APRA’s concern that number of observations should be equal is trivially met because the table of default episodes is the common data source for either the PD modelling or the LGD modelling.
So does that solve everything? Not quite, just clears some problems so that we can face the more subtle ones standing in the shadows behind, AWML.
PS any corrections or updates on APRA or other regulatory opinions would be most welcome.
Continuing the re-aging theme: a clear episodic definition of default is important as the basis for LGD modelling.
Whether one thinks of this issue in terms of the re-aging rule, or in terms of default episodes, is two sides of the same coin: re-aging is the rule that determines when the episode ends, and the default episode is the period of time from the initial triggering of the (point-in-time) default definition until that end point. I find it easier to talk in terms of the default episodes (a.k.a. “bad episodes”) because those are the indivisible modelling units.
One has to be able to clearly identify, enumerate and isolate the separate default episodes. If your default definition doesn’t produce this level of clarity, there will be some ugly problems in the LGD modelling phase.
The ideal is a fairly heavily “congealed” approach, that tends to produce few, long, well separated episodes rather than many, potentially short and frequent ones. The motivation is that each episode becomes a modelling unit for LGD. Common sense and business knowledge would suggest that the modelling of LGD issues would be more coherent with a more congealed approach – otherwise one might end up with a larger mart of bad episodes, many of them short and ending in no loss, and many of them correlated and to some extent duplicating each other.
Also the re-aging rule should be invariant to time granularity – it wouldn’t accord with intuition if a change from monthly to weekly data (for example) could substantially change the number and extent of the default episodes. Hence a rule referring to a re-aging period in absolute time units (e.g. X months) is sensible.
These issues were identified and addressed in an APRA note some years ago AWML.
Maybe time to bring up a subject that contains more difficulties than one would expect: re-aging. When an account has gone into default – at some point in time – how long must it be before the account can again be considered ‘good’, and under what circumstances?
Re-aging needs to be part and parcel of the default definition. The default definitions in typical context are really point-in-time default definitions, easy to relate to if one imagines an account running along longitudinally in a good status, and then at some first point in time triggering the default definition, whatever that is (something like 90DPD on an amount of at least $100).
But the difficulties are, what happens next? Suppose the customer makes some partial or full payment, such that in the next grain of time (e.g. the next month) their point-in-time status is not in default. Perhaps they are fully current (=zero DPD), or perhaps their partial payment has pulled them back to a 30DPD or 60DPD status. How does this affect modelling and other activities?
It does not affect application PD modelling, which is longitudinal from the start of the account (MOB=0), and the modelling target is “went bad ever within a certain OW”; as soon as any account first triggers default, it has established its target status as “went bad” and what happens beyond doesn’t matter for the PD model.
It’s a more complicated story for the LGD model AWML.
The first step is to recognise that besides the point-in-time aspect of default, there is also an episodic aspect, which is the interval of time until the account can be considered good again. Why is this episodic definition needed? Can’t we manage just with applying the point-in-time definition at each successive point in time? The problem is that, depending on the granularity of time (e.g. monthly), it would then be possible to have many separate bad episodes for an account within a fairly short time window such as a 12-month window. An account’s status might go something like GGBGGBBGGGB. This patchy pattern then causes headaches for any cross-sectional analyses, and particularly for the basis of the LGD modelling.
The common-sense feeling is that the above pattern represents one extended bad episode, not three separate bad points (months) separated by good points. In banking language there needs to be a re-aging rule that says the account can’t be considered G immediately that the point-in-time default conditions don’t hold. Instead, there is a new status which is “not in default but still in a re-aging period”.
My preferred terminology is to call this situation “not bad but still in a bad episode”: and to use “E” as the code for any such time grains. Thus the above pattern would be GGBEEBBEEEB (if there is a re-aging rule that says an account must be good for several successive months before it can be fully G again).
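The recoding from point-in-time statuses to episodic statuses is mechanical once the re-aging rule is fixed. The sketch below assumes one particular convention, which is my choice for illustration rather than a standard: after a B, the next `reaging_months` consecutive non-bad months are coded E, and only the month after that returns to its point-in-time status.

```python
def apply_reaging(statuses, reaging_months):
    """Recode a monthly status string so that re-aging months show as E.

    statuses: string with one character per month; 'B' is point-in-time
    bad and any other character is treated as non-bad.
    reaging_months: consecutive non-bad months needed to end an episode.
    """
    result = []
    in_episode = False
    clean_run = 0  # consecutive non-bad months since the last B
    for status in statuses:
        if status == "B":
            in_episode = True
            clean_run = 0
            result.append("B")
        elif in_episode:
            clean_run += 1
            if clean_run > reaging_months:
                in_episode = False      # re-aging period fully served
                result.append(status)
            else:
                result.append("E")      # non-bad, but episode continues
        else:
            result.append(status)
    return "".join(result)
```

With a re-aging period of four months or more, `apply_reaging("GGBGGBBGGGB", 4)` gives `GGBEEBBEEEB`, i.e. one extended bad episode rather than three separate ones; with a period of zero the string comes back unchanged.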
It would be convenient if one could assume independence of the two main agencies: default and churn.
Although this is likely to be assumed in the interests of keeping things simple, it is unfortunately a doubtful assumption. There may well be a correlation against the bank’s interests in the form of better credit risks finding it easier (than poor credit risks) to re-finance elsewhere on favourable terms. Then, higher churn (earlier closure) may be correlated with lower PD. Full modelling of such a situation would require the joint modelling of default and churn.
Churn is not a ‘risk’ in the Basel meaning(s) but is referred to as such in this post in the sense that it is an uncertain event with unfavourable financial consequence for the bank: opportunity loss of revenue.
So far the event we’ve been considering as the subject of analysis has been default, with occasional mention of churn.
Default, being progressive, lends itself to analysis of its stages, such as the events of going 30DPD or 60DPD. In addition to default hazard, one can analyse 30DPD hazard and 60DPD hazard. One advantage, especially for monitoring, is that these events occur slightly sooner. A statistical advantage is that these events are more numerous than default events. Given an intuition, or perhaps a model, of how 30DPD and 60DPD profiles relate to default profiles, they could be a useful analytical tool.
That segues into the roll rates discussion AWML.
The relationship however need not be straightforward. For example, there may be a spike of 30DPD or 60DPD at MOB=2 or 3, due to bugs or carelessness with the administration of new re-payment schedules. Most of those would not roll through to default.
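For the stage hazards discussed above, a simple empirical calculation might look like the following. It assumes the usual discrete-time definition of hazard (first events at a given MOB divided by accounts still at risk at that MOB); the input layout, a pair of (MOB of first event or `None`, last observed MOB) per account, is hypothetical.

```python
def discrete_hazard(accounts, max_mob):
    """Empirical hazard of a first event (e.g. first 30DPD) by MOB.

    accounts: list of (event_mob, last_observed_mob) pairs, where
    event_mob is None if the event has not been observed. It is assumed
    that event_mob never exceeds last_observed_mob.
    """
    hazards = {}
    for mob in range(1, max_mob + 1):
        at_risk = 0
        events = 0
        for event_mob, last_mob in accounts:
            # An account leaves the risk set at its event or when censored.
            exit_mob = event_mob if event_mob is not None else last_mob
            if exit_mob >= mob:
                at_risk += 1
                if event_mob == mob:
                    events += 1
        hazards[mob] = events / at_risk if at_risk else None
    return hazards
```

Running the same function with first-30DPD, first-60DPD and first-default MOBs gives three hazard profiles that can be compared side by side; an administrative spike at MOB 2 or 3 would show in the 30DPD profile without a matching bump in the default profile.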