Drawing together several themes, today’s post recommends how to assemble modelling marts that will be representative when used in the Basel context.
The Basel context is a cross-sectional one: at some point in time, such as the most recent calendar month end, the bank must assess the risk components (PD, EAD, LGD and hence [or otherwise?] the expected loss EL) for the exposure over the next 12 months. As the point in time is fixed and the coverage is all at-risk exposures, accounts will be encountered in all stages of credit status (and at any MOB): G, I, point-in-time bad B, episodic bad E, plus whatever collections and recoveries statuses may obtain.
PD models for this context would primarily be behavioural models, built to predict over a 12-month OW. (BTW earlier posts discuss the transitional use of application PDs for this purpose.) EAD and LGD models are also needed. Several modelling marts are therefore required. How many, and how should they be assembled so as to be representative when put to work together on Basel duty?
My suggestions below are open to discussion & debate – tell us if you have alternative views or practices.
- The underlying sampling frame is to pick a point in time and observe all accounts at that point in time. Because of the need for a 12-month OW, this point in time will be at least 12 months before the data horizon (current time).
- This sampling frame can be overlaid to enlarge the modelling mart: e.g. take several points in time, a month or a quarter apart. Naturally, the additional information is correlated, but that presents no great problem as long as one doesn’t treat it as independent. A limitation is that as one reaches further back into history, the models become less relevant to the future.
- Plan to segment the mart fairly extensively; a cross-section will include many diverse animals better handled in their own (albeit small) cages than covered with one catch-all model. “Segmentation” is a popular word but you could also call this “decision-tree”, CART, etc.
- Each segment = separate mart = completely separate model
- Segmenting the PD behavioural mart: at minimum, E needs to be segmented from G. Recall that E is an account that is not point-in-time bad but is episodic bad, i.e. has not yet re-aged. Further sub-segmentation is likely to be sensible, into say the various levels of I (Indeterminate). Naturally, no PD model is required for status B or C, R, etc.
- Target variable for PD: whether the start of a new bad episode is encountered during the following 12 months. A definitional issue AWML arises as to how to handle segment E. (A rough sketch of assembling such a mart, target included, appears after this list.)
- Segmenting LGD: may leave this for another day …
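To make the above concrete, here is a minimal sketch in Python/pandas of stacking a few snapshot dates, keeping the statuses that get a PD model, and attaching the 12-month target. Column names such as account_id, snapshot_date, status and episode_start_date are invented for illustration; this is a sketch of the idea, not a prescription.

```python
import pandas as pd

def build_pd_mart(snapshots: pd.DataFrame,
                  episode_starts: pd.DataFrame,
                  sample_dates: list,
                  outcome_months: int = 12) -> pd.DataFrame:
    """Stack cross-sectional snapshots and attach the 12-month target.

    snapshots      : one row per account per month end, with columns
                     ['account_id', 'snapshot_date', 'status', ...predictors]
    episode_starts : one row per bad-episode start, with columns
                     ['account_id', 'episode_start_date']
    sample_dates   : chosen points in time (each at least 12 months before now)
    """
    marts = []
    for point_in_time in sample_dates:
        # the cross-section at this sampling date
        cohort = snapshots[snapshots['snapshot_date'] == point_in_time].copy()

        # B, C, R etc. get no PD model; keep G, I and E for their own segments
        cohort = cohort[cohort['status'].isin(['G', 'I', 'E'])]

        # target: does a new bad episode start during the following 12 months?
        window_end = point_in_time + pd.DateOffset(months=outcome_months)
        bad_in_window = episode_starts[
            (episode_starts['episode_start_date'] > point_in_time) &
            (episode_starts['episode_start_date'] <= window_end)
        ]['account_id'].unique()
        cohort['target_bad_12m'] = cohort['account_id'].isin(bad_in_window).astype(int)

        cohort['sample_date'] = point_in_time
        marts.append(cohort)

    # rows from overlapping sample dates are correlated: stack them, but do
    # not treat them as independent observations
    return pd.concat(marts, ignore_index=True)
```

Each segment (G, I, E and any sub-segments) would then be filtered out of this stacked mart and modelled separately, per the segment = mart = model point above.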
6 comments
28 June, 2008 at 02:10
Bruce M
Clive, that sounded really complicated.
It isn’t really.
To assess the probability of a default within the next 12 months: sample a population using historic data and assess the default rate over the following 12 months, i.e. 10,000 clients as at 1/1/2006, 100 defaults prior to 1/1/2007, so the default rate is 1%. Obviously that is a “point-in-time” rating; work must be done to derive a “long run average” figure (needed to calculate Pillar 1 capital).
Split the exposures into segments that rank risk. A standard scorecard-type model does it. However, impairment experience (current or recent) will be very predictive and will skew the model. Ignore it for the main segmentation model and split the impaired exposures into separate buckets. Current defaults stand alone.
As the default outcome includes those cases that default and then “cure”, a set of cure rates is needed. I add these to the LGD model. Careful measurement of default cases and their eventual outcome (cure or loss event) gives a set of cure rates. However, these vary across exposures and time. A simple model can allow for this.
Add in a model to estimate the exposure change between now and default (the EAD model). Then knock up the LGD model, which would be a probability-versus-cash-flow model.
The really good trick is to plan how the stress testing / validation / Pillar 3 outputs fit in BEFORE you start building the models.
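A toy version of that calculation (the extra cohorts and their numbers are made up; only the 10,000 accounts / 100 defaults / 1% figure comes from the example above):

```python
import pandas as pd

# made-up cohorts; only the 1/1/2006 figures are taken from the example above
cohorts = pd.DataFrame({
    'sample_date': ['1/1/2004', '1/1/2005', '1/1/2006'],
    'n_accounts':  [9_500, 9_800, 10_000],
    'n_defaults':  [120, 95, 100],
})

cohorts['pit_default_rate'] = cohorts['n_defaults'] / cohorts['n_accounts']

# crude long-run average across cohorts; a real Pillar 1 calibration needs
# enough history to span a cycle, plus judgement and overlays
long_run_average = cohorts['pit_default_rate'].mean()
print(f"long-run average default rate: {long_run_average:.2%}")
```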
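One way to read how those pieces fit together, as a rough sketch only (the function and its numbers are illustrative; the cure rate sits inside LGD as above):

```python
def expected_loss(pd_12m: float,
                  ead: float,
                  cure_rate: float,
                  loss_given_non_cure: float) -> float:
    """Illustrative composition of the model suite: the cure rate is folded
    into LGD, and EL is the product of the three risk components."""
    lgd = (1.0 - cure_rate) * loss_given_non_cure
    return pd_12m * ead * lgd

# e.g. 1% PD, 100,000 exposure at default, 60% of defaults cure, 40% severity
print(expected_loss(0.01, 100_000, 0.60, 0.40))   # 160.0
```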
30 June, 2008 at 00:55
Clive
Bruce, delighted to get some input from someone who has worked through the full chain of models.
Isn’t the LGD model where many of the issues lie?
When building a mart for LGD modelling, which instances are included? Will there be one row for each client in default @ 1/1/2006, plus one row for each client that goes into default during the 12-month outcome window? If an account was good @ 1/1/2006, in default @ 1/4/2006, cured @ 1/8/2006 and into default anew @ 1/11/2006, will this be one row or two rows in the LGD mart?
The issues I (imagined I) saw were:
1. LGD modelling may have to be based on different (especially, earlier) data than PD modelling because of the length of time needed for the loss situation to crystallise and become measurable. Think of products (such as HL) where a default episode may last for a year or two before its outcome is known.
Your “cure rates” model, presumably estimated from earlier data, seems to cover this point. NNB colleagues do something similar, factoring the LGD model into the product of a PDR (probability of going from default to recoveries – i.e. non-cure) and an LGR (loss given recoveries); a one-line sketch of that factorisation appears after this list.
2. A possible “length-based sampling” bias that might arise from certain ways of sampling default episodes for inclusion in the LGD mart. On some sampling bases, longer episodes will be over-represented.
3. Definitional (and related statistical) issues caused by accounts that have multiple default + cure cycles. This point may be negligible for products like CC.
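The factorisation mentioned under point 1, as a one-liner (illustrative numbers and names only):

```python
def lgd_from_factors(pdr: float, lgr: float) -> float:
    """Illustrative factorisation: LGD = PDR x LGR, where PDR is the
    probability of going from default into recoveries (non-cure) and
    LGR is the loss given the account does reach recoveries."""
    return pdr * lgr

print(lgd_from_factors(0.40, 0.45))   # 0.18, i.e. an 18% blended LGD
```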
Thank you for taking the time to contribute.
1 July, 2008 at 18:51
Bruce M
There are a few issues around sampling length and measurement. The key thought process is to be consistent in your methodology from sampling historic data through to deploying the model.
So, a PD would be a default event in the next 12 months. I treat this as a trigger measure. Therefore if an exposure hits default and then recovers, or even progresses to loss and is written off, before the end of the 12-month window, it is still included. However, the PD from this measure overestimates the possible losses unless it is adjusted to allow for “cures”.
So, to be consistent with the PD approach, the LGD model must consider cases where the default ‘trigger’ is hit, but subject to limits: i.e. if an exposure rolls into and out of default a number of times over a twelve-month period, these are not always separate events in the LGD sample. A bit of thought is required to be consistent with the PD approach (i.e. this example would be one default event in the PD model).
Another issue, which you mentioned, is censoring. A default that has not yet progressed to loss is not a ‘cure’ unless there is a definite observed event. So if an exposure hits default and then is cleared by the client and the exposure closed, this is a cure. If a default recovers to a reasonable state (judgement needed here) for a long enough period, this could be a ‘cure’. Again, the historic data must allow enough time for the behaviour to be observed rather than assume an outcome.
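For example (a small sketch of that collapsing rule; the dates and the helper are purely illustrative):

```python
from datetime import date

def default_events_in_window(trigger_dates: list[date],
                             window_start: date,
                             window_end: date) -> int:
    """Count default events the way the PD model sees them: any number of
    roll-ins and roll-outs of default inside one outcome window is
    collapsed to a single event for that window."""
    hits = [d for d in trigger_dates if window_start < d <= window_end]
    return 1 if hits else 0

# an exposure that rolls into default in March 2006 and again in November 2006
triggers = [date(2006, 3, 15), date(2006, 11, 2)]
print(default_events_in_window(triggers, date(2006, 1, 1), date(2006, 12, 31)))  # 1
```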
So, a few issues to work through but not that difficult.
As usual with modelling it is very important to be consistent in your methodology.
1 July, 2008 at 22:59
Clive
Agreed that the most important message is consistency across models.
No doubt many valid approaches exist in the industry, internally consistent but different from one another.
At NNB (some years ago) the situation had developed where the LGD and EAD models were being built in silos, by different modelling teams from those building the PD models. This had the potential to lead to the kind of inconsistency that APRA warned about (an earlier post refers).
I aimed to bind the approaches into a consistent whole by unifying analysis around a tightly defined “bad episode” construct. Another way of saying this is a “strictly applied re-aging definition”. One manifestation of this approach would be a single dataset as the only source of default information for every team, whether modelling PD, EAD or LGD. Each record would be an episode (and vice versa).
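A toy sketch of that construct, assuming monthly default flags per account and a re-aging rule of six consecutive clean months (the rule, names and numbers are illustrative only, not the actual NNB definition):

```python
from dataclasses import dataclass

@dataclass
class BadEpisode:
    account_id: str
    start_month: int   # index of the first month in default
    end_month: int     # index of the last month in default before re-aging

def bad_episodes(account_id: str, in_default: list[bool],
                 reage_months: int = 6) -> list[BadEpisode]:
    """One record per bad episode: an episode opens when the default flag
    first appears and only closes once `reage_months` consecutive clean
    months have been observed (the strictly applied re-aging definition)."""
    episodes, start, clean_run = [], None, 0
    for month, flagged in enumerate(in_default):
        if flagged:
            if start is None:
                start = month
            clean_run = 0
        elif start is not None:
            clean_run += 1
            if clean_run >= reage_months:
                episodes.append(BadEpisode(account_id, start, month - reage_months))
                start, clean_run = None, 0
    if start is not None:   # still open (censored) at the data horizon
        episodes.append(BadEpisode(account_id, start, len(in_default) - 1))
    return episodes

# default in months 3-5, a 2-month 'cure' that fails the re-aging test, default again
flags = [False] * 3 + [True] * 3 + [False] * 2 + [True] * 2 + [False] * 8
print(bad_episodes('A1', flags))   # one episode (months 3 to 9), not two
```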
It didn’t happen exactly like that but the message about consistency got through.
Thanks again for your contributions.
2 July, 2008 at 19:07
Bruce M
I totally agree.
Although there are many different modelling methodologies, internal consistency is critical.
In the development stage, we used a single large SAS dataset to store the data in one place for all models. It contained hundreds of millions of data items. Each line represented one exposure as at a point in time (month end). The data was then “folded”: historical data was summarised and brought forward, and future outcome data was ‘folded’ back.
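The folding step, sketched in Python/pandas rather than SAS and with invented column names (months_in_arrears, in_default), would look roughly like this:

```python
import pandas as pd

def fold(panel: pd.DataFrame, history_months: int = 6,
         outcome_months: int = 12) -> pd.DataFrame:
    """panel: one row per exposure per month end, with columns
    ['exposure_id', 'month_end', 'months_in_arrears', 'in_default']."""
    panel = panel.sort_values(['exposure_id', 'month_end']).copy()
    by_exposure = panel.groupby('exposure_id')

    # history summarised and brought forward: worst arrears over the trailing window
    panel['worst_arrears_6m'] = by_exposure['months_in_arrears'].transform(
        lambda s: s.rolling(history_months, min_periods=1).max())

    # future outcome 'folded' back: any default flag in the next 12 month-ends
    # (rows near the data horizon have incomplete windows and would be dropped)
    panel['default_next_12m'] = by_exposure['in_default'].transform(
        lambda s: s[::-1].rolling(outcome_months, min_periods=1).max()[::-1].shift(-1))

    return panel
```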
Also, the whole process involved no more than 4 people in one team. I like to keep things simple.
2 July, 2008 at 23:23
EL-AL validation « ozrisk.net
[…] point made by Bruce M in recent comments is that there needs to be consistency in the modelling methodology behind the suite of models for […]