You are currently browsing Clive’s articles.
... continuing a ramble prompted by the Normal / Levy thread:
This deceptively simple question is at the heart of many modelling issues.
Interpreted at a shallow level, this could be a question about whether the values of some variables are within the range they have moved in historically.
At a deeper level, the question is whether models built on history will continue to be fit for purpose in future application.
Models cover a wide range from the deterministic ones of physics, that embody precise understanding of the mechanisms, through to empirical models that merely claim to capture useful patterns. It could be an interesting thread to fill in this continuum with examples; financial models, and especially econometric models, would be up the “empirical” end. There may be various nomenclatures for this kind of discussion – noting for example the von Mises (/Austrian) link provided earlier where “time invariant” is used as a descriptor of certain models.
So, sure, there is wide confidence that the models of physics will still work the same way in the future, but only qualified confidence in (especially) those models that have anything to do with human behaviour, or other complex systems.
Empirical models tend to depend heavily on the choice of the data that “calibrates” them (enough for another thread on this topic). Also, to the extent that they rely on patterns without understanding the drivers of those patterns, there may come a time when they unexpectedly perform less well – perhaps even catastrophically so – than before.
Footnote: I like Andrew’s comment “models are good tools but bad masters” and another well known one “all models are wrong, but some are useful”. Perhaps this latter one should have an addendum “…some of the time”.
Some contributions on the Normal vs Levy thread suggest that wider musings on modelling issues may be fruitful.
By the standards set by one poster as the minimum “to understand financial modelling”, my score looks to be 1/5, so readers need not expect breakthroughs on particular questions in this field.
While not disputing that Levy is likely to be more correct than Normal (in the modelling contexts cited) I wonder how much of the problem with the model can be attributed to this choice.
Doesn’t this issue segue into deeper issues concerning the complex systems that drive the processes whose outputs are finally observed as univariate distributions?
I plan a couple of exploratory posts.
Paraphrasing an emailed question from Dominik (who IIUC is not from Australia): is there information out there about the credit risks associated with different categories of business? This is outside my zone (mostly retail, i.e. individuals).
Dominik asks: “I need to set up (for a loan granting purposes) a kind of a rating matrix for different unconnected types of business such as a poultry business or shipyard.”
IIRC in Australia there exists a well codified hierarchical classification of business types, starting at super categories (like agriculture, mining, ..) and moving through a couple of layers down to very specific categories (like “coffin maker”). Analysts concerned with non-retail credit risk would probably have some experience or information about the credit risk characteristics of these hierarchies, but, as Andrew has commented elsewhere, they would be reluctant to share this knowledge as it would be part of the bank’s competitive advantage. However, without sharing the content, perhaps some readers would share some analytical or modelling tips?
From very slight involvement I seem to recall that factors like size of the business, turnover, nature of assets, and (especially) recent financial performance could be more important than fine classifications of business type. Some of these in turn (like the assets) may be more relevant to LGD than to PD.
Dominik further: “I thought about comparing data from different stock exchanges considering some parameters like a market cycle etc”
This wouldn’t be an easy route, given that listed companies are a very select sample of all the medium to large businesses out there. However, there is plenty of received wisdom (and analysis) about cyclical versus non-cyclical sectors of any stock exchange and/or country. Poultry, and coffin makers: non-cyclical! But credit risk – as some recent ASX cases illustrate – will depend heavily on capital structure (gearing) and the management of that company.
Even with a poultry business, if the management borrows to the hilt and pursues an aggressive acquisition strategy, at the same time trying to challenge the purchasing power of the big retailers – they could easily end up with egg on their faces (sorry).
Any advice from those who work in the non-retail area would be a significant improvement on the above and would be appreciated.
While on the subject of “validation” – it can have a range of meanings when applied to credit risk models.
At the most general level it means review by an external authority. This could cover a wider scope than merely reviewing the models themselves. All aspects of how the modelling methodology was chosen, executed, implemented, and integrated with the business might be considered. Naturally an external technical review of the models may be a valuable subtask.
Validation using data is a more concrete approach. Widest scope is achieved by having a sample of the bank’s exposures scored by a relevant external agency with similar models for comparison with the bank’s own results. Whilst this covers the most bases, it is hard to do it well in practice because of the difficulty of reproducing the same data environment – for example categorical predictors may need to be ‘mapped’.
Validation using the bank’s own data is the easiest and perhaps most familiar context. Various more specific technical terms apply. Some examples:
- during the model building phase it is good practice to hold out a ‘validation’ sample as a protection against over-fitting. This is often loosely called cross-validation (strictly, it is a single hold-out split). The validation sample used is randomly selected from the modelling mart to guarantee neutrality with respect to all data effects.
- a proposed new model can be run on ‘out of time’ data – cohorts that are before (‘backtesting’) or after the sample window represented in the modelling mart. This is likely to be instructive and reassuring but does not carry the guarantee that pure cross-validation does.
- the routine monitoring of the performance of models once they have been implemented may also be considered to be ongoing ‘validation’ and is the first line of defence.
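
As a rough illustration of the first point, a random hold-out split might be sketched as below. The function name, the 30% hold-out fraction and the fixed seed are illustrative assumptions, not anything prescribed above.

```python
import random

def split_holdout(account_ids, holdout_fraction=0.3, seed=42):
    """Randomly split account ids into (build, holdout) lists.

    holdout_fraction and seed are illustrative choices only.
    """
    ids = list(account_ids)
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_holdout = int(round(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]

# Hypothetical mart of 1000 accounts, identified by integers.
build, holdout = split_holdout(range(1000))
print(len(build), len(holdout))
```

An 'out of time' sample, by contrast, would be chosen by cohort date rather than at random, which is why it does not carry the same neutrality.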
The simplest setting is validation of an individual component, especially PD. Last week’s post touched on the more difficult context of validating that the chain of models PD-EAD-LGD work together correctly.
Aren’t there some aspects of Basel – like long term cycle issues – that defy validation? Or rather, rely on judgement rather than analysis?
Nothing to do with airlines, we speak here of validating expected loss against actual loss.
A point made by Bruce M in recent comments is that there needs to be consistency in the modelling methodology behind the suite of models for the risk components PD, EAD and LGD. One task that should bring this point to the fore is the validation of EL against AL.
The PD (and EAD) models can be easily validated because their predicted outcomes become certain after 12 months. LGD is harder because:
- the observation period starts later: if an account defaults in the 11th month of the 12-month outcome window, observation of the actual LGD outcome (i.e. actual loss) can only begin at that point, which is already 11 months later than the sample cohort.
- the observation period may be long
- ideally one needs to wait for the longest AL to resolve, but one can’t know in advance how long this will be
This means that ELs can only be reliably validated against ALs if the sample cohorts are quite far back in time – perhaps 2-3 years depending on product.
Nevertheless an adequate job can be done on more recent cohorts, considering that even on recent cohorts, at least some of the ALs will be known. I recommend a graphic approach showing EL vs AL for many quarterly cohorts simultaneously, with certain ALs in a bold colour, and as-yet-unresolved defaults shown on a possible – probable – worst case basis via suitable graphic clues (e.g. colours, hatching, error bars). Such a display will show a ‘fan’ effect, whereby older cohorts have a more certain EL-AL reconciliation, whereas for more recent cohorts the zone for AL fans out. (EL is a historic fact and is always known exactly)
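
The numbers behind such a 'fan' display could be assembled along the following lines. This is only a sketch under stated assumptions: the field names are hypothetical, the best case assumes an unresolved default loses nothing beyond the loss booked to date, and the worst case assumes it loses its full EAD (in practice costs could push a loss above EAD).

```python
from collections import defaultdict

def al_bounds_by_cohort(defaults):
    """Summarise actual loss (AL) per cohort as certain / low / high totals.

    defaults: iterable of dicts with hypothetical keys
      cohort, ead, loss_to_date, resolved.
    """
    bounds = defaultdict(lambda: {"certain": 0.0, "low": 0.0, "high": 0.0})
    for d in defaults:
        b = bounds[d["cohort"]]
        if d["resolved"]:
            # Resolved defaults contribute a known loss to all three totals.
            b["certain"] += d["loss_to_date"]
            b["low"] += d["loss_to_date"]
            b["high"] += d["loss_to_date"]
        else:
            # Unresolved: best case = no further loss, worst case = full EAD.
            b["low"] += d["loss_to_date"]
            b["high"] += d["ead"]
    return dict(bounds)

sample = [
    {"cohort": "older", "ead": 500.0, "loss_to_date": 100.0, "resolved": True},
    {"cohort": "older", "ead": 300.0, "loss_to_date": 0.0, "resolved": True},
    {"cohort": "recent", "ead": 400.0, "loss_to_date": 50.0, "resolved": True},
    {"cohort": "recent", "ead": 1000.0, "loss_to_date": 200.0, "resolved": False},
]
for cohort, b in al_bounds_by_cohort(sample).items():
    print(cohort, b, "fan width:", b["high"] - b["low"])
```

The older cohort has zero fan width because everything has resolved; the recent cohort's AL zone is still wide. Plotting these totals against EL per quarterly cohort gives the display described above.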
Carrying out an EL-AL validation is a good way to review the consistency of model approaches and to detect those situations that fall between the cracks.
Drawing together several themes, today’s post recommends how to assemble modelling marts that will be representative for use in Basel context.
Basel context is a cross-sectional context: at some point in time, such as the most recent calendar month end, the bank must assess the risk components (PD, EAD, LGD and hence [or otherwise?] the expected loss EL) for the time exposure of the next 12 months. As the point in time is fixed and the coverage is all at-risk exposures, accounts will be encountered in all stages of credit status (and any MOB): G, I, point-in-time bad B, episodic bad E, plus whatever collections and recoveries statuses may obtain.
PD models for this context would primarily be behavioural models, built to predict a 12-month OW. (BTW earlier posts discuss the transitional use of application PDs for this purpose.) EAD and LGD models are needed. Several modelling marts are therefore needed. How many, and how assembled in order to be representative when put to work together in Basel duty?
My suggestions below are open to discussion & debate – tell us if you have alternative views or practices.
- The underlying sampling frame is to pick a point in time and observe all accounts at that point in time. Because of the need for a 12-month OW, this point in time will be at least 12 months before the data horizon (current time).
- This sample frame can be overlaid to increase the modelling mart: e.g. take several points in time, a month or a quarter apart. Naturally, the additional information is correlated but that presents no great problem as long as one doesn’t treat it as independent. A limitation is that as one reaches further back into history, the models become less relevant to the future.
- Plan to segment fairly extensively; a cross-section will include many diverse animals better handled in their own (albeit small) cages than handled with one cover-all model. “Segmentation” is a popular word but you could also call this “decision-tree”, CART, etc.
- Each segment = separate mart = completely separate model
- Segment PD behavioural: at minimum need to segment E from G. Recall that E is an account that is not point-in-time bad but is episodic bad i.e. has not yet re-aged. Further subsegmentation is likely to be sensible, into say the various levels of I (Indeterminate). Naturally, no PD model is required for status B or C,R, etc.
- Target variable PD: whether the start of a new bad episode is encountered during the following 12 months. A definitional issue AWML arises as to how to handle segment E.
- Segmenting LGD: may leave this for another day …
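
The sampling frame and the PD target in the list above can be sketched as follows. Months are represented as plain integer indices, and the function names and the quarterly step are assumptions made for illustration; the special handling of segment E is deliberately left out.

```python
OUTCOME_WINDOW = 12  # months, the 12-month OW

def observation_points(horizon, n_points, step=3):
    """Month indices for overlaid cross-sections, latest first.

    Each point leaves a full outcome window before the data horizon.
    step=3 (quarterly) is an illustrative choice.
    """
    latest = horizon - OUTCOME_WINDOW
    return [latest - k * step for k in range(n_points)]

def pd_target(episode_starts, obs_month):
    """1 if a new bad episode starts in the 12 months after obs_month, else 0."""
    return int(any(obs_month < s <= obs_month + OUTCOME_WINDOW
                   for s in episode_starts))

# Hypothetical data horizon at month 60: three quarterly cross-sections.
points = observation_points(60, 3)
print(points)

# An account whose only bad episode starts in month 50.
print([pd_target([50], t) for t in points])
```

Because the same account appears at several observation points, its rows are correlated, as noted in the list; the sketch makes no attempt to correct for that.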
Default episodes have varying lengths. This can lead to a bias related to the statistical issue called “length-biased sampling”.
For building an LGD modelling mart, a typical approach would be to collect all the bad episodes that impinge on a certain time window. However this introduces a length-related bias, because the longer episodes have more chance to be represented. Longer episodes are, in turn, quite likely to be correlated with non-average losses.
To get unbiased sampling for building a behavioural mart, specify a sample window and only include bad episodes that started during that window. This will exclude accounts that are already in the middle of a bad episode at the start of the time window.
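
A toy example may make the bias concrete. In the sketch below, episodes are hypothetical (start, end) pairs in integer months: one short (2-month) and one long (24-month) episode start every month, so the true mix is 50/50 with a mean length of 13 months.

```python
def impinging(episodes, w_start, w_end):
    """Episodes that overlap the window at all (the biased selection)."""
    return [(s, e) for s, e in episodes if s <= w_end and e >= w_start]

def started_in(episodes, w_start, w_end):
    """Episodes whose start falls inside the window (the unbiased selection)."""
    return [(s, e) for s, e in episodes if w_start <= s <= w_end]

def mean_length(episodes):
    return sum(e - s for s, e in episodes) / len(episodes)

# Synthetic population: each month one 2-month and one 24-month episode begin.
episodes = []
for s in range(48):
    episodes.append((s, s + 2))
    episodes.append((s, s + 24))

window = (24, 35)
print("impinging :", mean_length(impinging(episodes, *window)))
print("started in:", mean_length(started_in(episodes, *window)))
```

Selecting by start date recovers the true mean of 13 months, whereas selecting everything that impinges on the window over-represents the long episodes and gives a noticeably larger mean.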
Continuing the re-aging thread, a note circulated by APRA had a clear grip of the issue, and proposed:
“APRA’s proposed solution is to only allow the recording of a second default event after the loan has been in the non-default status for a period of at least 12 months”
‘Fraid I can’t give a direct reference as I only have an undated photocopy to hand, entitled “Multiple defaults in the retail portfolio” – it would have been about 2004. Please post to the blog any update on these issues that you may know of.
APRA’s concern was to “require the number of observations in bank’s PD and LGD databases to be equal” because of the traps of otherwise having mis-matched bases for PD and LGD. My preferred way of describing this – via “bad episodes” – is semantically different but hopefully faithful to the essence of the problem; it also lends itself to other difficulties that will be met.
Re-capping points from the last couple of posts:
- recognise that default definition starts with a point-in-time definition but also has a derived episodic dimension: every transition from good to bad at a point in time begins a bad episode which is a relatively long interval of time.
- the rule which specifies when the bad episode can end is an integral part of the default definition and is called the re-aging rule.
- these bad episodes will then be relatively few in number and will be the basic units of modelling
‘Relatively long’ and ‘Relatively few’ represent implicit recommendations to choose a re-aging rule that produces few, long, congealed bad episodes rather than the opposite. Technically, you could get out alive with a rule that makes many sporadic episodes but you will get a lot of unnecessary headaches: multiple non-independent episodes, large numbers of zero-loss LGD points, multiplicities within a year, and in general a dilution of modelling power through not aligning model constructs with a sensible grip on reality.
With this understanding, the APRA proposal says that the re-aging rule should allow a bad episode to end after 12 continuous non-bad months have elapsed. This seems a good choice and will produce well-congealed bad episodes. A particular merit is that two bad episodes can never start for any particular account within any 12-month period. This is helpful because a lot of modelling (e.g. behavioural) has a 12-month OW and the chance of any multiplicity would be a nuisance.
Thinking in database terms, one would have only one source of default information: a table of default episodes, keyed by account and start date. Of course, bad episodes are well behaved constructs being distinct for any account and not overlapping. Depending how one implements the rule there can be a slight wobbly about whether a new episode can begin immediately that the previous one ends – imagine an account with B then 12G then B again – you decide how you like to treat this case – it’s not a showstopper.
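
One possible way to derive such a table of episodes from a monthly point-in-time status history is sketched below. It is an illustration only: the input is assumed to be a list of booleans (True = point-in-time bad) per month for one account, the episode is taken to include the 12 qualifying non-bad months, and in the B, 12G, B case a new episode is allowed to begin in the month immediately after the previous one ends.

```python
def bad_episodes(is_bad, reage_months=12):
    """Return a list of (start_month, end_month) bad episodes for one account.

    is_bad: monthly point-in-time bad flags (assumed input format).
    An episode ends at the month completing `reage_months` consecutive
    non-bad months; end_month is None if the episode is still open.
    """
    episodes = []
    start = None      # start month of the currently open episode, if any
    good_run = 0      # consecutive non-bad months within the open episode
    for month, bad in enumerate(is_bad):
        if start is None:
            if bad:
                start, good_run = month, 0
        elif bad:
            good_run = 0              # relapse: the same episode continues
        else:
            good_run += 1
            if good_run >= reage_months:
                episodes.append((start, month))
                start = None
    if start is not None:
        episodes.append((start, None))
    return episodes

# B, 12G, B: two episodes, with starts 13 months apart.
print(bad_episodes([True] + [False] * 12 + [True]))
# B, 5G, B, 12G: the relapse is absorbed into one congealed episode.
print(bad_episodes([True] + [False] * 5 + [True] + [False] * 12))
```

Keyed by account and start month, the output rows would form the single table of default episodes described above.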
For any longitudinal modelling, looking for the first default is equivalent to looking for the first start of a bad episode.
APRA’s concern that number of observations should be equal is trivially met because the table of default episodes is the common data source for either the PD modelling or the LGD modelling.
So does that solve everything? Not quite, just clears some problems so that we can face the more subtle ones standing in the shadows behind, AWML.
PS any corrections or updates on APRA or other regulatory opinions would be most welcome.
Continuing the re-aging theme: a clear episodic definition of default is important as the basis for LGD modelling.
Whether one thinks of this issue in terms of the re-aging rule, or in terms of default episodes, is two sides of the same coin: re-aging is the rule that determines when the episode ends, and the default episode is the period of time from the initial triggering of the (point-in-time) default definition until that end point. I find it easier to talk in terms of the default episodes (a.k.a. “bad episodes”) because those are the indivisible modelling units.
One has to be able to clearly identify, enumerate and isolate the separate default episodes. If your default definition doesn’t produce this level of clarity, there will be some ugly problems in the LGD modelling phase.
The ideal is a fairly heavily “congealed” approach, that tends to produce few, long, well separated episodes rather than many, potentially short and frequent ones. The motivation is that each episode becomes a modelling unit for LGD. Common sense and business knowledge would suggest that the modelling of LGD issues would be more coherent with a more congealed approach – otherwise one might end up with a larger mart of bad episodes, many of them short and ending in no loss, and many of them correlated and to some extent duplicating each other.
Also the re-aging rule should be invariant to time granularity – it wouldn’t accord with intuition if a change from monthly to weekly data (for example) could substantially change the number and extent of the default episodes. Hence a rule referring to a re-aging period in absolute time units (e.g. X months) is sensible.
These issues were identified and addressed in an APRA note some years ago AWML.