
Paraphrasing an emailed question from Dominik (who IIUC is not from Australia): is there information out there about the credit risks associated with different categories of business? This is outside my zone (mostly retail, i.e. individuals).

Dominik asks: “I need to set up (for a loan granting purposes) a kind of a rating matrix for different unconnected types of business such as a poultry business or shipyard.”

IIRC in Australia there exists a well codified hierarchical classification of business types, starting at super categories (like agriculture, mining, ..) and moving through a couple of layers down to very specific categories (like “coffin maker”). Analysts concerned with non-retail credit risk would probably have some experience or information about the credit risk characteristics of these hierarchies, but, as Andrew has commented elsewhere, they would be reluctant to share this knowledge as it would be part of the bank’s competitive advantage. However, without sharing the content, perhaps some readers would share some analytical or modelling tips?

From very slight involvement I seem to recall that factors like size of the business, turnover, nature of assets, and (especially) recent financial performance could be more important than fine classifications of business type. Some of these in turn (like the assets) may be more relevant to LGD than to PD. 

Dominik further: “I thought about comparing data from different stock exchanges considering some parameters like a market cycle etc”

This wouldn’t be an easy route, given that listed companies are a very select sample of all the medium to large businesses out there. However, there is plenty of received wisdom (and analysis) about cyclical versus non-cyclical sectors of any stock exchange and/or country. Poultry, and coffin makers: non-cyclical! But credit risk - as some recent ASX cases illustrate - will depend heavily on capital structure (gearing) and the management of that company.

Even with a poultry business, if the management borrows to the hilt and pursues an aggressive acquisition strategy, at the same time trying to challenge the purchasing power of the big retailers - they could easily end up with egg on their faces (sorry). 

Any advice from those who work in the non-retail area would be a significant improvement on the above and would be appreciated.

While on the subject of “validation” - it can have a range of meanings when applied to credit risk models.

At the most general level it means review by an external authority. This could cover a wider scope than merely reviewing the models themselves. All aspects of how the modelling methodology was chosen, executed, implemented, and integrated with the business might be considered. Naturally an external technical review of the models may be a valuable subtask.

Validation using data is a more concrete approach. Widest scope is achieved by having a sample of the bank’s exposures scored by a relevant external agency with similar models for comparison with the bank’s own results. Whilst this covers the most bases, it is hard to do it well in practice because of the difficulty of reproducing the same data environment - for example categorical predictors may need to be ’mapped’.

Validation using the bank’s own data is the easiest and perhaps most familiar context. Various more specific technical terms apply. Some examples:

  • during the model building phase it is good practice to hold out a ‘validation’ sample as a protection against over-fitting. This is also called cross-validation. The validation sample used is randomly selected from the modelling mart to guarantee neutrality with respect to all data effects.
  • a proposed new model can be run on ‘out of time’ data - cohorts that are before (’backtesting’) or after the sample window represented in the modelling mart. This is likely to be instructive and reassuring but does not carry the guarantee that pure cross-validation does.
  • the routine monitoring of the performance of models once they have been implemented may also be considered to be ongoing ‘validation’ and is the first line of defence. 

The simplest setting is validation of an individual component, especially PD. Last week’s post touched on the more difficult context of validating that the chain of models PD-EAD-LGD work together correctly.

Aren’t there some aspects of Basel - like long term cycle issues - that defy validation? Or rather, rely on judgement rather than analysis?

Nothing to do with airlines, we speak here of validating expected loss against actual loss.

A point made by Bruce M in recent comments is that there needs to be consistency in the modelling methodology behind the suite of models for the risk components PD, EAD and LGD. One task that should bring this point to the fore is the validation of EL against AL.

The PD (and EAD) models can be easily validated because their predicted outcomes become certain after 12 months. LGD is hard because:

  • the observation period starts later: if an account defaults in the 11th month of the 12-month outcome window, observation of the actual LGD outcome (i.e. actual loss) can only begin at that point, which is already 11 months later than the sample cohort.
  • the observation period may be long
  • ideally one needs to wait for the longest AL to resolve, but one can’t know in advance how long this will be

This means that ELs can only be reliably validated against ALs if the sample cohorts are quite far back in time - perhaps 2-3 years depending on product.

Nevertheless an adequate job can be done on more recent cohorts, considering that even on recent cohorts, at least some of the ALs will be known. I recommend a graphic approach showing EL vs AL for many quarterly cohorts simultaneously, with certain ALs in a bold colour, and as-yet-unresolved defaults shown on a possible - probable - worst case basis via suitable graphic clues (e.g. colours, hatching, error bars). Such a display will show a ‘fan’ effect, whereby older cohorts have a more certain EL-AL reconciliation, whereas for more recent cohorts the zone for AL fans out. (EL is a historic fact and is always known exactly)
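The numbers behind such a fan display can be sketched as below. This is only an illustration: the cohort figures and the three scenario loss rates (possible, probable, worst) are hypothetical assumptions, not real portfolio data.

```python
# Hypothetical illustration: range of actual loss (AL) per cohort.
# The scenario loss rates and all cohort figures are assumptions.
SCENARIO_LOSS_RATES = {"possible": 0.10, "probable": 0.30, "worst": 1.00}


def al_range(resolved_loss, unresolved_exposure, rates=SCENARIO_LOSS_RATES):
    """AL under each scenario: losses already certain, plus a scenario
    share of the exposure on defaults that have not yet resolved."""
    return {name: resolved_loss + rate * unresolved_exposure
            for name, rate in rates.items()}


cohorts = [
    # (cohort label, EL, resolved AL, exposure on unresolved defaults)
    ("2005Q1", 100.0, 95.0, 0.0),
    ("2006Q1", 110.0, 70.0, 60.0),
    ("2007Q1", 120.0, 20.0, 200.0),
]

for label, el, resolved, unresolved in cohorts:
    bounds = al_range(resolved, unresolved)
    fan_width = bounds["worst"] - bounds["possible"]
    print(label, "EL:", el, "AL range:", bounds, "fan width:", fan_width)
```

The oldest cohort has no unresolved defaults, so its three scenarios coincide; the fan widens as the cohorts become more recent.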

Carrying out an EL-AL validation is a good way to review the consistency of model approaches and to detect those situations that fall between the cracks.

 

A useful professional resource is provided by Ross Gayler in the form of the “Credit Risk Analytics Occasional Newsletter”

About this newsletter

    Intended Audience, Content & Frequency

The Credit Risk Analytics Occasional Newsletter is intended for predictive modelling analysts working in retail credit risk management, for example, credit scoring, Basel II modelling, or fraud modelling. It aims to distribute topical, publicly available information that might otherwise be difficult to find because of being dispersed over a wide range of sources. The contents will include information such as conference announcements and pointers to resources like research papers, software, and discussion groups. The frequency of newsletter issues is expected to be low, with time-critical announcements being made as needed and other issues appearing irregularly a few times per year.

Subscribing & Unsubscribing

This newsletter is distributed only by email. To subscribe or unsubscribe from the newsletter or update your mailing address, send an email to Ross Gayler, whose eml address is Ross.Gayler followed after the at sign by VedaAdvantage dot com.

Drawing together several themes, today’s post recommends how to assemble modelling marts that will be representative for use in Basel context.

Basel context is a cross-sectional context: at some point in time, such as the most recent calendar month end, the bank must assess the risk components (PD, EAD, LGD and hence [or otherwise?] the expected loss EL) for the time exposure of the next 12 months. As the point in time is fixed and the coverage is all at-risk exposures, accounts will be encountered in all stages of credit status (and any MOB): G, I, point-in-time bad B, episodic bad E, plus whatever collections and recoveries statuses may obtain.

PD models for this context would primarily be behavioural models, built to predict a 12-month OW. (BTW earlier posts discuss the transitional use of application PDs for this purpose.)  EAD and LGD models are needed. Several modelling marts are therefore needed. How many, and how assembled in order to be representative when put to work together in Basel duty?

My suggestions below are open to discussion & debate - tell us if you have alternative views or practices.

  1. The underlying sampling frame is to pick a point in time and observe all accounts at that point in time. Because of the need for 12 month OW, this point in time will be at least 12 months before the data horizon (current time)
  2. This sample frame can be overlaid to increase the modelling mart: e.g. take several points in time, a month or a quarter apart. Naturally, the additional information is correlated but that presents no great problem as long as one doesn’t treat it as independent. A limitation is that as one reaches further back into history, the models become less relevant to the future.
  3. Plan to segment fairly extensively; a cross-section will include many diverse animals better handled in their own (albeit small) cages than handled with one cover-all model. “Segmentation” is a popular word but you could also call this “decision-tree”, CART, etc.
  4. Each segment = separate mart = completely separate model 
  5. Segment PD behavioural: at minimum need to segment E from G. Recall that E is an account that is not point-in-time bad but is episodic bad i.e. has not yet re-aged. Further subsegmentation is likely to be sensible, into say the various levels of I (Indeterminate). Naturally, no PD model is required for status B or C,R, etc.
  6. Target variable PD: whether the start of a new bad episode is encountered during the following 12 months. A definitional issue AWML arises as to how to handle segment E.
  7. Segmenting LGD: may leave this for another day  …
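Points 1, 2 and 6 above can be sketched in a few lines. This is a minimal sketch under stated assumptions: months are plain integer indices, and the function names `eligible_snapshots` and `pd_target` are hypothetical. It does not settle the definitional question of how to treat segment E.

```python
def eligible_snapshots(horizon_month, n_snapshots, step=3, window=12):
    """Snapshot months spaced `step` months apart, the latest being exactly
    `window` months before the data horizon so that each snapshot has a
    complete outcome window (OW)."""
    latest = horizon_month - window
    return [latest - k * step for k in range(n_snapshots)]


def pd_target(episode_start_months, snapshot_month, window=12):
    """1 if a new bad episode starts in the `window` months following the
    snapshot month, else 0. Months are integer indices from a fixed origin."""
    return int(any(snapshot_month < start <= snapshot_month + window
                   for start in episode_start_months))


# Hypothetical example: data horizon at month 100, three quarterly snapshots.
snapshots = eligible_snapshots(100, 3)
print(snapshots)
print([pd_target([90], s) for s in snapshots])
```

Overlaid snapshots of the same account will often share the same outcome, which is the correlation mentioned in point 2.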

Default episodes have varying lengths. This can lead to a bias related to the statistical issue called “length-biased sampling”.

For building an LGD modelling mart, a typical approach would be to collect all the bad episodes that impinge on a certain time window. However this introduces a length-based bias, because the longer episodes have more chance to be represented. Longer episodes are, in turn, quite likely to be correlated with non-average losses.

To get unbiased sampling for building a behavioural mart, specify a sample window and only include bad episodes that started during that window. This will exclude accounts that are already in the middle of a bad episode at the start of the time window.
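The difference between the two selection rules can be sketched as follows. The episode data is invented purely for illustration, and the function names are hypothetical.

```python
# Each episode: (account, start_month, end_month), months as integer indices.
# Hypothetical data, for illustration only.
episodes = [
    ("A", 1, 30),   # long episode, already running when the window opens
    ("B", 10, 12),  # finished before the window opens
    ("C", 14, 15),
    ("D", 20, 40),
    ("E", 22, 23),
]


def impinging(episodes, window_start, window_end):
    """Episodes that overlap the window at all (the length-biased selection)."""
    return [e for e in episodes if e[1] <= window_end and e[2] >= window_start]


def started_in(episodes, window_start, window_end):
    """Episodes whose start falls inside the window (the unbiased selection)."""
    return [e for e in episodes if window_start <= e[1] <= window_end]


def mean_length(selected):
    """Average episode length in months, counting both end months."""
    return sum(end - start + 1 for _, start, end in selected) / len(selected)


biased = impinging(episodes, 13, 24)
unbiased = started_in(episodes, 13, 24)
print([e[0] for e in biased], mean_length(biased))
print([e[0] for e in unbiased], mean_length(unbiased))
```

In this toy data the impinging selection picks up the long-running episode A, which pulls the average length upwards.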

Continuing the re-aging thread, a note circulated by APRA had a clear grip of the issue, and proposed:

“APRA’s proposed solution is to only allow the recording of a second default event after the loan has been in the non-default status for a period of at least 12 months”

‘Fraid I can’t give a direct reference as I only have an undated photocopy to hand, entitled “Multiple defaults in the retail portfolio” - it would have been about 2004. Please post to the blog any update on these issues that you may know of.

APRA’s concern was to “require the number of observations in bank’s PD and LGD databases to be equal” because of the traps of otherwise having mis-matched bases for PD and LGD. My preferred way of describing this - via “bad episodes” - is semantically different but hopefully faithful to the essence of the problem; it also lends itself to other difficulties that will be met.

Re-capping points from the last couple of posts:

  • recognise that default definition starts with a point-in-time definition but also has a derived episodic dimension: every transition from good to bad at a point in time begins a bad episode which is a relatively long interval of time.
  • the rule which specifies when the bad episode can end is an integral part of the default definition and is called the re-aging rule.
  • these bad episodes will then be relatively few in number and will be the basic units of modelling

‘Relatively long’ and ‘Relatively few’ represent implicit recommendations to choose a re-aging rule that produces few, long, congealed bad episodes rather than the opposite. Technically, you could get out alive with a rule that makes many sporadic episodes but you will get a lot of unnecessary headaches: multiple non-independent episodes, large numbers of zero-loss LGD points, multiplicities within a year, and in general a dilution of modelling power through not aligning model constructs with a sensible grip on reality.

With this understanding, the APRA proposal says that the re-aging rule should allow a bad episode to end after 12 continuous non-bad months have elapsed. This seems a good choice and will produce well-congealed bad episodes. A particular merit is that having two bad episodes for any particular account within any 12-month period is never possible. This is helpful because a lot of modelling (e.g. behavioural) has a 12-month OW and the chance of any multiplicity would be a nuisance.

Thinking in database terms, one would have only one source of default information: a table of default episodes, keyed by account and start date. Of course, bad episodes are well behaved constructs being distinct for any account and not overlapping. Depending how one implements the rule there can be a slight wobbly about whether a new episode can begin immediately that the previous one ends - imagine an account with B then 12G then B again - you decide how you like to treat this case - it’s not a showstopper.
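A minimal sketch of how such a table might be derived from monthly point-in-time statuses is given below. The assumptions are mine: each account's history is a string with 'B' for a bad month and any other letter for a non-bad month, months are identified by their position in the string, and a new episode is allowed to begin immediately after the previous one ends (one possible treatment of the B, 12G, B case).

```python
def bad_episodes(statuses, reage_months=12):
    """Return a list of (start_index, end_index) bad episodes.

    An episode starts at a 'B' and ends on the month that completes
    `reage_months` consecutive non-bad months. end_index is None if the
    episode is still open when the data runs out."""
    episodes = []
    start = None
    clean_run = 0
    for i, status in enumerate(statuses):
        if status == "B":
            if start is None:
                start = i
            clean_run = 0
        elif start is not None:
            clean_run += 1
            if clean_run == reage_months:
                episodes.append((start, i))
                start = None
                clean_run = 0
    if start is not None:
        episodes.append((start, None))
    return episodes


def episode_table(history_by_account, reage_months=12):
    """Rows (account, start_index, end_index), keyed by account and start."""
    return [(account, start, end)
            for account, history in sorted(history_by_account.items())
            for start, end in bad_episodes(history, reage_months)]


print(bad_episodes("B" + "G" * 12 + "B"))
print(episode_table({"acct1": "GGBGGBBGGGB", "acct2": "GGGG"}))
```

By construction the episodes for any one account are distinct and do not overlap.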

For any longitudinal modelling, looking for the first default is equivalent to looking for the first start of a bad episode.

APRA’s concern that number of observations should be equal is trivially met because the table of default episodes is the common data source for either the PD modelling or the LGD modelling.

So does that solve everything? Not quite, just clears some problems so that we can face the more subtle ones standing in the shadows behind, AWML.

PS any corrections or updates on APRA or other regulatory opinions would be most welcome. 

Continuing the re-aging theme: a clear episodic definition of default is important as the basis for LGD modelling.

Whether one thinks of this issue in terms of the re-aging rule, or in terms of default episodes, is two sides of the same coin: re-aging is the rule that determines when the episode ends, and the default episode is the period of time from the initial triggering of the (point-in-time) default definition until that end point. I find it easier to talk in terms of the default episodes (a.k.a. “bad episodes”) because those are the indivisible modelling units.

One has to be able to clearly identify, enumerate and isolate the separate default episodes. If your default definition doesn’t produce this level of clarity, there will be some ugly problems in the LGD modelling phase.

The ideal is a fairly heavily “congealed” approach, that tends to produce few, long, well separated episodes rather than many, potentially short and frequent ones. The motivation is that each episode becomes a modelling unit for LGD. Common sense and business knowledge would suggest that the modelling of LGD issues would be more coherent with a more congealed approach - otherwise one might end up with a larger mart of bad episodes, many of them short and ending in no loss, and many of them correlated and to some extent duplicating each other. 

Also the re-aging rule should be invariant to time granularity - it wouldn’t accord with intuition if a change from monthly to weekly data (for example) could substantially change the number and extent of the default episodes. Hence a rule referring to a re-aging period in absolute time units (e.g. X months) is sensible.

These issues were identified and addressed in an APRA note some years ago AWML.  

Maybe time to bring up a subject that contains more difficulties than one would expect: re-aging. When an account has gone into default - at some point in time - how long can it be before the account can again be considered ‘good’, and under what circumstances.

Re-aging needs to be part and parcel of the default definition. The default definitions in typical context are really point-in-time default definitions, easy to relate to if one imagines an account running along longitudinally in a good status, and then at some first point in time triggering the default definition, whatever that is (something like 90DPD on an amount of at least $100).

But the difficulties are, what happens next? Suppose the customer makes some partial or full payment, such that in the next grain of time (e.g. the next month) their point-in-time status is not in default. Perhaps they are fully current (=zero DPD), or perhaps their partial payment has pulled them back to a 30DPD or 60DPD status. How does this affect modelling and other activities?

It does not affect application PD modelling, which is longitudinal from the start of the account (MOB=0), and the modelling target is “went bad ever within a certain OW”; as soon as any account first triggers default, it has established its target status as “went bad” and what happens beyond doesn’t matter for the PD model.

It’s a more complicated story for the LGD model AWML.

The first step is to recognise that besides the point-in-time aspect of default, there is also an episodic aspect, which is the interval of time until the account can be considered good again. Why is this episodic definition needed? Can’t we manage just with applying the point-in-time definition at each successive point in time? The problem is that, depending on the granularity of time (e.g. monthly), it would then be possible to have many separate bad episodes for an account within a fairly short time window such as a 12-month window. An account’s status might go something like GGBGGBBGGGB. This patchy pattern then causes headaches for any cross-sectional analyses, and particularly for the basis of the LGD modelling.

The common-sense feeling is that the above pattern represents one extended bad episode, not three separate bad points (months) separated by good points. In banking language there needs to be a re-aging rule that says the account can’t be considered G immediately that the point-in-time default conditions don’t hold. Instead, there is a new status which is “not in default but still in a re-aging period”.

My preferred terminology is to call this situation “not bad but still in a bad episode”, and to use “E” as the code for any such time grains. Thus the above pattern would be GGBEEBBEEEB (if there is a re-aging rule that says an account must be good for several successive months before it can be fully G again).
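The recoding can be sketched as a small function. This is an illustration under my own assumptions: statuses are one letter per month, and the re-aging period is set to 4 months, which is simply one value long enough to reproduce the pattern above.

```python
def recode_with_reaging(statuses, reage_months):
    """Replace non-bad months that fall inside a bad episode with 'E'.

    After a 'B', the account stays in the episode until `reage_months`
    consecutive non-bad months have elapsed; those months are coded 'E'
    and only the following month can show its own status again."""
    out = []
    in_episode = False
    clean_run = 0
    for status in statuses:
        if status == "B":
            in_episode = True
            clean_run = 0
            out.append("B")
        elif in_episode:
            clean_run += 1
            if clean_run > reage_months:
                in_episode = False
                out.append(status)
            else:
                out.append("E")
        else:
            out.append(status)
    return "".join(out)


print(recode_with_reaging("GGBGGBBGGGB", reage_months=4))
```

With a 4-month rule the three point-in-time bad patches merge into a single bad episode that is still open at the end of the data.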

 

 

 

Something that might be appreciated by the ozrisk community is a series of book reviews. Amazon reveals a couple of dozen books particularly relevant to credit risk analytics, and Basel. Would any of you readers out there like to offer a review or at least an opinion on books you have used?

I don’t currently have any books to hand, but recall favourable impressions of Lyn Thomas’s book and Naeem Siddiqi’s work in the shape of SAS training materials.

Credit Scoring and Its Applications 
by Lyn C. Thomas , David B. Edelman , Jonathan N. Crook

Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring 
by Naeem Siddiqi

 

It would be convenient if one could assume independence of the two main agencies: default and churn.

Although this is likely to be assumed in the interests of keeping things simple, it is unfortunately a doubtful assumption. There may well be a correlation against the bank’s interests in the form of better credit risks finding it easier (than poor credit risks) to re-finance elsewhere on favourable terms. Then, higher churn (earlier closure) may be correlated with lower PD. Full modelling of such a situation would require the joint modelling of default and churn.

Churn is not a ‘risk’ in the Basel meaning(s) but is referred to as such in this post in the sense that it is an uncertain event with unfavourable financial consequence for the bank: opportunity loss of revenue. 

So far the event we’ve been considering as the subject of analysis has been default, with occasional mention of churn.

Default, being progressive, lends itself to analysis of its stages, such as the events of going 30DPD or 60DPD. In addition to default hazard, one can analyse 30DPD hazard and 60DPD hazard. One advantage, especially for monitoring, is that these events occur slightly sooner. A statistical advantage is that these events are more numerous than default events. Given an intuition, or perhaps a model, of how 30DPD and 60DPD profiles relate to default profiles, they could be a useful analytical tool.

That segues into the roll rates discussion AWML.

The relationship however need not be straightforward. For example, there may be a spike of 30DPD or 60DPD at MOB=2 or 3, due to bugs or carelessness with the administration of new re-payment schedules. Most of those would not roll through to default.

One of the uses of a hazard curve is as a sanity check on your data and the technicalities of the default definition.

If you regularly find yourself analysing millions of records, you will know that every conceivable weird and wobbly data bug will happen, as well as a few that could never have been conceived of. Recalling a typical example from a loan portfolio: there were 10,000 accounts that opened and closed on the same day. This not surprisingly was some artefact of how the data systems coped with proposals or quotes (or something), but in reality these accounts were NTU and there was never any exposure to risk in their respect. But, in amongst a quarter of a million accounts, it would be possible to miss their presence and to do some default analytics - and even some model building - including these accounts as “closed good” accounts.

<digress for a war story> One of those accounts even managed to span two months! It appeared in two consecutive calendar month snapshot datasets - somehow allowed by time zone differences and the exact timing of month-end processing. A casual analysis might have assumed that this represented two months of exposure to risk - see also the comments about time grains <end digression>

But coming to the point of this post, I have found that estimating the default and churn hazard is an excellent “sanity check” on the data that will quickly show up most issues that you would want to know about. The issue mentioned above showed up as a massive spike in the churn hazard at MOB=1.

Other features that might be noticeable in churn hazard curves are peaks of churn around key account ages, such as at MOB=6 if the product has a teaser rate for the first 6 months. Multiples of 12 MOB may also occur in certain pay-annual-interest-in-advance type of products. These examples would be features that one might be on the lookout for, so finding them would be ”reassuring” feedback rather than “alerting” feedback.

Sanity checking is not only noticing what you didn’t expect, but also confirming what you did expect.

Features found in the default hazard curves may give important feedback about the way the default definition works. For example, with a 90DPD definition one may be expecting zero hazard for MOB=1,2,3 but there may in fact be genuine defaults in that zone triggered by supplementary business rules. However, what can happen is that the totality of rules in the default definition don’t quite produce the desired effect in practice. One example I recall caused the year-in-advance loans to reflect as default after only 30DPD. This showed up as a spike at 12,24,36 MOB and caused a review of the default definition as applied to this (relatively small) portion of the loan book. 

The data cleaning and sanity checking stage is helped by having some experience in similar analyses on similar products. But even in a completely new context, some data wobblies will produce such an unnatural effect on the hazard curve that you will be immediately alerted to follow up.

Hazard curves, being longitudinal, only help you examine default tendencies that relate to MOB. Cross-sectional effects, such as a sudden worsening in credit conditions in the economy, would be monitored in other ways.   

 

What shape does a typical default hazard curve have?

Note that this post is about default hazard - the churn hazard curve is a completely different matter.

Recall that the hazard at any particular MOB is indicating the instantaneous chance that a good account of that MOB age might go bad. So, where the curve is highest is showing the most dangerous age for accounts.

For most products, the hazard will be very close to zero for the first 3 or 4 months. This depends on the details of your default definition, but for example a simple 90DPD type of definition can’t produce a default in MOB 1,2 or 3. Some default definitions can be triggered even in those first MOBs via business rules about bankruptcy etc.

For some situations - like a new product to market - there can be an issue of ”application fraud” or “soft fraud” whereby new accounts come on book that perhaps never had an intention to make any repayments. Such a situation would show up as a spike in hazard around the 4-5 MOB.

Aside from application fraud, typical CC hazard curves tend to rise rapidly to a maximum by 9-12 MOB and then to decline slowly to a stable plateau at maybe half the peak hazard level. Hazard doesn’t decline to zero because no matter how old an account is, there remains a residual chance that it can go into default.

In practice, one gets relatively little chance to study the hazard behaviour at long MOB - say, over 36 months - because that calls for data going back more than 3 years - rather a long time in credit markets.

On a technical point, a constant hazard corresponds to an exponential distribution for the waiting time until first default.  
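As a quick numerical check of that point, the sketch below takes the exponential distribution with a hypothetical rate of 0.01 per month and confirms that density divided by survival gives back the same constant at any age.

```python
import math


def survival_exponential(rate, t):
    """S(t) = exp(-rate * t): probability of no default before time t."""
    return math.exp(-rate * t)


def density_exponential(rate, t):
    """f(t) = rate * exp(-rate * t): density of the waiting time."""
    return rate * math.exp(-rate * t)


rate = 0.01  # hypothetical constant monthly hazard
for t in (6, 12, 36):
    hazard = density_exponential(rate, t) / survival_exponential(rate, t)
    print(t, hazard)
```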

It would be fairly easy to confuse the notions of hazard curve and probability density function, since (for default on a typical credit product) both start at zero and climb to a peak and then decline.

The more data used in the analysis, the smoother the curves will be, but whatever the case the cumulative distribution function (“emergence curve”) will always be much smoother than the hazard and pdf.

For the reasons in the above two paragraphs, I recommend presenting default analytical work via the cdf graph using a non-technical name like “emergence curve” or “default profile”. Please send in your preferred nomenclatures in case there is some consensus we could publicise. My slight preference is for “default profile” which is neutral and non-technical and easily accommodates “churn profile” or “cross-sell profile” when one analyses some other waiting time quantity such as these.

The above paragraph is about presenting and communicating the results; but for analytical insight, I recommend that the analyst should be looking at the hazard curves as well - for discussion next time.

Continuing part 1, perhaps we should note that the subject ‘hazard’ here is in its very specific statistical sense, and is not the moral hazard issue which has been a serious subject of debate in the context of taxpayer rescues of financial institutions.

Mathematically, the hazard function is defined by h(x) = f(x) / (1 - F(x) )

Although its definition has a continuous context, the time granularity of our data imposes a discreteness on the hazard: for example, if our data is monthly then the hazards we calculate will be “1-month” hazards.

The meaning of the formula is that the hazard is a conditional probability : the probability of default in the time grain immediately following time=x , given that the account hasn’t defaulted yet (i.e. anywhere in the time interval from 0 up to x). Thus, h(12) would be the probability that an account that has been good for its first 12 MOB, might go bad in MOB=13.
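A minimal sketch of estimating 1-month hazards from account-level data follows. The assumptions are mine: each account is summarised as (last MOB observed, outcome), and accounts that close or are still open are treated as at risk for the whole of their last month, which is a simplification. The result is keyed by the MOB in which the default occurs, so h(12) in the notation above corresponds to key 13 here.

```python
from collections import Counter


def default_hazard_by_mob(accounts):
    """accounts: iterable of (last_mob, outcome), with outcome one of
    'default', 'closed' or 'open'.

    Returns {mob: defaults in that MOB / accounts still good on entering it}."""
    accounts = list(accounts)
    exits = Counter(mob for mob, _ in accounts)
    defaults = Counter(mob for mob, outcome in accounts if outcome == "default")
    at_risk = len(accounts)
    hazard = {}
    for mob in range(1, max(exits) + 1):
        if at_risk == 0:
            break
        hazard[mob] = defaults[mob] / at_risk
        at_risk -= exits[mob]  # everyone whose history ends here leaves the risk set
    return hazard


# Hypothetical toy data.
sample = [(1, "closed"), (2, "default"), (3, "open"), (3, "default"), (3, "open")]
print(default_hazard_by_mob(sample))
```

Replacing 'default' with 'closed' in the numerator gives the corresponding churn hazard.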

1 - F(x) is also called S(x) and given the name survival function, i.e. the probability of not defaulting before time x.

Hazard is not the same thing as the probability distribution. I tend to illustrate this point with a familiar example from human mortality. What is the probability that a person would die in their 100th year? The likely interpretation of this is to visualise a distribution of all the ages 0-125 and a distribution curve with a peak somewhere in grandparent zone and tailing off sharply such that the chance a person dies during their 100th year would be very low - less than 1%. This is the chance that a newly born person might die in their 100th year. By contrast, the one-year hazard at age 99 is rather high - over 30%. This is a chance that someone who has survived to age 99 dies during the next year (their 100th year).
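The two quantities are linked through the survival function: the chance that a newborn dies in their 100th year is the chance of surviving to 99 multiplied by the hazard at 99. The sketch below uses round, hypothetical figures (not taken from any life table) just to show the arithmetic.

```python
def unconditional_probability(survival_to_x, hazard_at_x):
    """P(event in the grain after x) = S(x) * h(x), rearranged from h = f / S."""
    return survival_to_x * hazard_at_x


survival_to_99 = 0.015  # hypothetical share of newborns reaching age 99
hazard_at_99 = 0.30     # hypothetical one-year hazard at age 99

p = unconditional_probability(survival_to_99, hazard_at_99)
print(f"{p:.4%}")
```

A high hazard and a low unconditional probability sit together comfortably because so few survive long enough to face that hazard.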

Upcoming posts will discuss uses and interpretations of all these items in the context of default (and churn) analytics.

Introducing some technical terms that arise in default analytics.

This nomenclature comes from the field of statistics called survival analysis, which is well established and readily found in text books or wiki entries etc. If you don’t mind reading maths you will find better guidance there than in this post. The name ’survival’ arose because the subject matter was/is mostly mortality or onset/re-occurrence of disease in populations or test cohorts. This is not too far from the study of the onset of default, so happily (?if this is an appropriate word) a great deal of well established statistical theory and practice is available for the study of default. This applies mainly to PD rather than LGD modelling.

Some of these terms and their equivalents in banking terminology are covered below.

Survival analysis is an essentially longitudinal activity, although the data it is based on will often be cross-sectional in structure.

The key variable x is the waiting time until default. This means the MOB of the first default. This variable x will have a distribution (probability density function) f(x), from which can be derived (by integration) the cumulative distribution function F(x). The pdf is not intuitive for non-technical audiences and I recommend only showing the cdf which is a monotonic rising curve that is easy to interpret: F(24), for example, would show the probability of going bad on or before MOB=24. This can also be interpreted as “what proportion of this population will have gone bad within 2 years”.
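In discrete monthly terms the cdf can be built up from 1-month default probabilities, since surviving to a given MOB means surviving every month before it. The sketch below assumes a list of conditional monthly default probabilities (hypothetical values) and ignores churn for simplicity.

```python
def emergence_curve(monthly_hazards):
    """Cumulative probability of having gone bad by the end of each MOB,
    F(t) = 1 - product over k <= t of (1 - h_k)."""
    survival = 1.0
    curve = []
    for h in monthly_hazards:
        survival *= 1.0 - h
        curve.append(1.0 - survival)
    return curve


# Hypothetical hazards: zero for the first 3 MOB, then a rise and a decline.
hazards = [0.0, 0.0, 0.0, 0.002, 0.004, 0.006, 0.006, 0.005, 0.004, 0.003, 0.003, 0.003]
curve = emergence_curve(hazards)
print(round(curve[-1], 4))  # the 12-month PD implied by these hazards
```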

Note that PD always needs to be related to some time window, so “PD” alone is a vague concept and one needs to be specifying something like “probability of going bad within the first 24 MOB” (for a longitudinal PD) or “probability of going bad within the next 12 calendar months” (for a cross-sectional PD).

I avoid using the stats terminology of pdf or cdf because they don’t sound intuitive, and particularly the word “cumulative” can mean so many different things in various contexts. Some more business-intuitive term is preferable. Some colleagues have called the cdf the “emergence curve” which is quite descriptive as it makes one think of the bads “emerging” with the passing of time, as the curve climbs up. An emergence curve is visually comfortable to absorb (being an integral, it will be quite smooth) and shows at a glance the values of the 12-month PD or 24-month PD or any other x-month PD. Another business-friendly term is “default profile”, which sits comfortably with “churn profile” for the cdf of waiting time until closed-good.

But none of these is the hazard curve > continued next time…

Harking back to the issue of time granularity, and anticipating some default analytic calculations yet to come, let’s get back to small details and look inside the smallest data unit i.e. the time grain.

For typical retail products this granularity would be a month, which is the example carried forward in this post.

Monthly data warehoused for analysis purposes would typically be on a calendar month basis. An alternative (for CC?) might be data on a monthly payment cycle basis.

Even though a grain is ’small’ there is still latitude for vagueness because data recorded against a month may relate in several ways to the time axis within that month:

  • point in time at the beginning of the month
  • point in time in the middle of the month
  • point in time at the end of the month
  • the whole time window comprising that month

For most cross-sectional studies, the time axis is calendar date and the ’status’ variables like account balance would usually relate to the end of the month, as that would be their most up-to-date value. Other variables that summarise or count transactions (for example) would relate to the whole time window. Certain calculated values (like hazards AWML) may relate to the mid-point of the month.

In cross-sectional studies there is no difficulty in finding the point-in-time variables as at the beginning of a month, because these will be the (end-of-month) values from the previous month’s record - i.e. closing balance for Feb = opening balance for March etc.

If numeric date values are used as the key on a data table, they would most logically perhaps be set equal to the last day of each month, which is unfortunately a bit messy and harder (for a human) to remember than the obvious choice of the 1st of each month.

A non-numeric-date month key like “200805” avoids specifying any particular part of the month, and leaves it up to the user to figure the time relationships from the metadata. A slight disadvantage of such a key is that date arithmetic (figuring out the difference between two dates) becomes non-trivial.
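To show what “non-trivial” means here, a small Python sketch of month arithmetic on YYYYMM keys follows. The helper name is hypothetical; the point is only that the keys can't simply be subtracted.

```python
def months_between(key_from, key_to):
    """Number of whole months from one YYYYMM key to another."""
    year_from, month_from = divmod(int(key_from), 100)
    year_to, month_to = divmod(int(key_to), 100)
    return (year_to - year_from) * 12 + (month_to - month_from)

# Naive subtraction of the keys gives a meaningless answer across a year end
naive = int("200805") - int("200711")        # 94
correct = months_between("200711", "200805")  # 6
print(naive, correct)
```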

Longitudinal studies would typically rely on performance data for each individual account that is stored cross-sectionally i.e. by calendar month. This introduces a slight wrinkle because the account opening date can be anywhere within a month, whereas the performance data is only available at month ends. So the first performance measurement point an account reaches may come up in only 1-2 days (if the account opened on the 29-30th of a month) or alternatively may represent up to 30 days of exposure-to-risk. Longitudinal studies have MOB rather than calendar date as their time axis, and this means that the MOB=1 analysis really represents on average about 0.5 months of exposure, and likewise all subsequent MOB points really represent on average half a month less. (This example assumes your MOB counting convention starts at 1 rather than from 0.) But in any case, it would be most representative to start at 0.5 and count upwards as 1.5, 2.5, etc.

The above may sound picky, but it can quite easily come about that one analyst’s 12-month OW is another analyst’s 13-month OW due to choices at this level, and this could make a significant change to risk measures.

Further intra-grain issues will be met when calculating hazards. This basically means dividing the number of defaults (at a certain MOB) by the number of accounts that were exposed-to-risk of default. In a longitudinal study the number of accounts exposed-to-risk will always be declining, as accounts close good or go into default. Good practice would therefore be to find the average number (month_start + month_end)/2 of exposed-to-risk accounts during that month for use in the denominator of the hazard.
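A minimal sketch of that denominator choice, in Python, with made-up numbers for a single MOB. The function name is an assumption for illustration only.

```python
def hazard(defaults, etr_month_start, etr_month_end):
    """Hazard for one MOB: defaults divided by the average number exposed-to-risk."""
    average_etr = (etr_month_start + etr_month_end) / 2
    return defaults / average_etr

# Hypothetical month: 1000 accounts ETR at the start; during the month
# 5 go into default and 15 close good, leaving 980 ETR at the end.
h_average = hazard(defaults=5, etr_month_start=1000, etr_month_end=980)
h_start_only = 5 / 1000  # what you get if you ignore the decline during the month

print(f"hazard with average ETR: {h_average:.5f}")
print(f"hazard with start ETR:   {h_start_only:.5f}")
```

The difference is small for any one month, but it is systematic and so carries through to anything accumulated from the hazards.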

Actuaries are good at these deliberations because of the care and expertise put into estimation of mortality statistics. If you can’t find a tame actuary, the recommended approach is large diagrams on whiteboards and a bottle of headache tablets.  

 

 

In the context of retail applicants, application scorecards etc., is there a well defined meaning for “risk appetite”?

My feeling is that it could be referring to either the marginal or the average situation, depending which hat is worn.

The risk management function acts as the gatekeeper, drawing the line on acceptable levels of risk - risk appetite - by setting cut-offs. This is wearing a marginal hat: the cutoff is the margin. For standard retail products it may conveniently be set in terms of PD, although more completely it would be an expected loss calculation. To caricature this risk manager, he doesn’t mind how profitable the applicants are, as long as they are just above the cut-off.

OTOH the business will be looking at the overall profitability of the product/campaign/portfolio whatever. The business manager is more likely to phrase his risk appetite targets in terms of average PD. To caricature the business manager, he doesn’t mind if a number of poor decisions are made around the margins as long as the venture as a whole makes a good return.

So is the setting of risk appetite about trying to decline applicants below a certain marginal PD, or is it about trying to achieve a certain average PD for the accepts? 

Complicating the discussion is the role played by volume. Higher cut-offs naturally mean lower volumes of successful applicants and this frustrates the assumptions on the business case.

  

The previous post discussed setting of scorecard cut-off by the criterion of marginal profit, which translates to marginal PD. But what about average profitability (or average PD) as a measure for a portfolio or as a criterion for arriving at a cut-off?

The general issue of marginal vs average cost, revenue & profitability is a familiar one in business economics and won’t be revisited here.

However, the particular feature of the debate that is relevant to setting a scorecard cut-off is the set of assumptions made about the TTD (through-the-door) population.

A cut-off is by its nature a marginal issue. If the cut-off is at (say) odds of 15:1 then we know that the only accepts will be those with PD of 1/16 or better. But what will the average PD of all the accepts be? That will depend on the distribution of the TTD - proportionately how many applicants there are in each risk band. Obviously the average PD of the accepts will be better than 1/16, but how much better will depend on a calculation based on the shape of the TTD distribution. A typical assumption is that future TTD will be like past TTD for similar products, but this assumption can sometimes turn out to be quite wrong. The drivers of TTD are a complex mix of marketing, the competitiveness of the product, actions taken by competitors, and the economic climate. In plain English, you might have an excellent scorecard, but if lousy applicants walk through the door, it will be hard to do good business.
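To make the dependence on the TTD shape concrete, here is a small Python sketch. The risk bands and applicant counts are invented; both populations face the same 15:1 cut-off but end up with quite different average PDs among the accepts.

```python
def average_pd_of_accepts(bands, cutoff_odds):
    """bands: list of (good_bad_odds, n_applicants) pairs describing the TTD.

    Applicants in bands at or above the cut-off odds are accepted.
    """
    accepted = [(odds, n) for odds, n in bands if odds >= cutoff_odds]
    n_accepted = sum(n for _, n in accepted)
    expected_bads = sum(n / (1 + odds) for odds, n in accepted)
    return expected_bads / n_accepted

# Two hypothetical TTD populations over the same bands (odds 8, 15, 30, 60, 120)
ttd_mostly_strong = [(8, 100), (15, 100), (30, 200), (60, 300), (120, 400)]
ttd_mostly_marginal = [(8, 400), (15, 400), (30, 300), (60, 200), (120, 100)]

avg_strong = average_pd_of_accepts(ttd_mostly_strong, cutoff_odds=15)
avg_marginal = average_pd_of_accepts(ttd_mostly_marginal, cutoff_odds=15)

print(f"marginal PD at cut-off:         {1 / 16:.2%}")
print(f"average PD, mostly strong TTD:  {avg_strong:.2%}")
print(f"average PD, mostly marginal TTD: {avg_marginal:.2%}")
```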

Nevertheless, it is natural for the business to ask questions about average profitability (equivalently, average PD) because that characterises the overall returns on the portfolio. Also note that besides the shape of the TTD distribution, a big parameter assumption is the volume of TTD. Volumes are important for diluting the ‘fixed cost’ aspects of the business costings.

So the business will probably want to target a certain average PD. The cut-off decision, though, is strictly a marginal one, which has only an indirect effect on the average, mediated by the TTD assumptions. Modellers should communicate these levels of uncertainty to the business as it is all too easy for a computer printout to look infallible.

Odds provide a useful frame for considering that important business question: where to set the score cut-off.

The basic business logic is to set the cut-off at the score where marginal profitability equals zero - i.e. if you moved the cut-off any lower you would be losing money on each additional applicant so approved, whereas if you set the cut-off higher you would be leaving money on the table. Easy to say, but not so easy to do, because the concept of marginal costs is a movable feast depending on accounting treatments and assumptions about fixed and variable costs, as well as the context within the current business strategy. 

But anyway, the odds allow one to frame the question in an easy-to-grasp way: how many goods does it take to offset one bad? If the answer is 15, it means that your tipping point is at 15:1 odds, which can be converted to the score as per previous post. This would then be the cut-off. This post assumes a simple automatic accept/decline score, ignoring ‘refer’ bands and contested decisions and overrides etc. 

To arrive at “15” would involve a full revenue/cost modelling through the product cycle (lifetime customer value?), for 15 goods versus 1 bad. Naturally the “cost” that dominates here is the credit loss of principal (LGD) for the default.
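Stripped of all the accounting detail, the tipping-point arithmetic is just a ratio. The Python sketch below uses invented per-account figures and a hypothetical function name; real numbers would come out of the revenue/cost modelling.

```python
def break_even_odds(profit_per_good, loss_per_bad):
    """How many goods it takes to offset one bad."""
    return loss_per_bad / profit_per_good

# Hypothetical figures: each good earns 200 over the product cycle,
# each bad loses 3000 (dominated by the credit loss of principal).
odds_at_cutoff = break_even_odds(profit_per_good=200, loss_per_bad=3000)
marginal_pd = 1 / (1 + odds_at_cutoff)

print(f"break-even odds: {odds_at_cutoff:.0f}:1")
print(f"marginal PD at cut-off: {marginal_pd:.4f}")
```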

Don’t pay any attention to the example value “15” used above - it’s going to make a lot of difference what product is involved, secured vs unsecured, limits, etc.

The credit risk world likes to work with ‘odds’ and related quantities so these are covered today.

You could just do everything in terms of probability, i.e. PD, which is unambiguous. PD lies in [0,1] and a small number (like 0.002) is a better customer than a bigger number (like 0.013). In typical modelling situations (in Australia, in the good times..), a lot of PDs would have one or two or even three leading zeroes and these numbers are not handy for transcription or to quickly convey which zones they lie in.

It goes without saying that it is often more palatable to format a PD as a percentage, e.g. PD = 0.013 as PD = 1.3%.

‘Odds’ have a special status because they are intimately linked with logistic regression, the main PD-modelling statistical tool. Odds can be worked out from the PD, and vice versa, as follows:

  • odds = 1/PD - 1
  • PD    = 1/(1 + odds)

For example, odds = 8 means exactly the same thing as PD = 1/9 = 0.1111.. 
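The two conversions are simple enough to write as a pair of Python functions. This sketch assumes Good:Bad odds throughout; the function names are just for illustration.

```python
def pd_to_odds(pd):
    """Good:Bad odds from a probability of default."""
    return 1 / pd - 1

def odds_to_pd(odds):
    """Probability of default from Good:Bad odds."""
    return 1 / (1 + odds)

print(odds_to_pd(8))       # PD for odds of 8:1, i.e. 1/9
print(pd_to_odds(1 / 9))   # and back again to (approximately) 8
```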

Odds are generally taken to be the Good:Bad odds; thus a bigger number for odds is a better situation. I have seen analysts using Odds the other way up i.e. the Bad:Good odds. You can come out alive but it will confuse your colleagues; +/- changes of sign will cascade through and graphs will tilt the opposite way.

One step closer to the logistic zone is to transform to “log_odds”.

  • log_odds = ln(odds)
  • odds        = exp(log_odds)

‘ln’ means natural logs, i.e. to the base ‘e’. Actually, mathematicians always mean natural logs when they say log and as a matter of pride would never mention the base, or contemplate a base other than ‘e’ unless it was a neat way to summarise a problem that had structure particular to integral bases. Ambiguity can arise: computer systems that are tech-oriented, like SAS or MATLAB, assume ‘log’ means ln, whereas those that are business-oriented, like MS/Excel, assume that ‘log’ means log_to_base_10. It also doesn’t help that ‘ln’ is not comfortable in speech.

By ‘log’ I always mean natural log, and I use log10 or log2 to mean logs to base 10 or 2. For the meantime, the terminology ‘log_odds’ will be used, which is easy in speech, but if anyone can suggest better nomenclature they are welcome to put it forward.

If we’ve taken the right choices so far, a bigger number for log_odds is a better situation. Note that log_odds can be negative (when odds < 1 which is when PD > 0.5).

To make the numbers more convenient to handle, it is common practice to convert the log_odds to a ’score’ on a user-friendly scale that wouldn’t involve negatives or decimal places. For the first time in this chain of transformation, arbitrary scaling constants are involved in this choice: one for location and one for scale (spread). A typical approach is illustrated below:

  • for location: bang a stake in the ground at the point that will represent odds of 1 (== log_odds of zero == PD of 0.5): so, for example, choose a score of 500 to represent this point (which BTW would be a lousy customer)
  • for scale: this is normally done by specifying how many points it takes to double the odds (PDO). A comfortable choice would be PDO=20, which says that a score of 520 <=> odds=2, 540 <=> odds=4, 560 <=> odds=8 etc.

Because log_odds is a logarithmic scale, the above choices work out and amount to a linear transformation of log_odds to score. The two scaling parameters, and hence the transformations from log_odds to score and back, will depend on these fairly arbitrary choices.
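With the example choices above (score 500 at odds of 1, PDO=20), the linear transformation can be sketched in Python as follows. The constant and function names are assumptions for the example, and other institutions will have made different scaling choices.

```python
import math

BASE_SCORE = 500   # score representing odds of 1 (log_odds of zero)
PDO = 20           # points to double the odds

def log_odds_to_score(log_odds):
    # Doubling the odds adds ln(2) to log_odds, which must add PDO points
    return BASE_SCORE + PDO * log_odds / math.log(2)

def score_to_log_odds(score):
    return (score - BASE_SCORE) * math.log(2) / PDO

def score_to_pd(score):
    odds = math.exp(score_to_log_odds(score))
    return 1 / (1 + odds)

for odds in (1, 2, 4, 8, 15):
    score = log_odds_to_score(math.log(odds))
    print(f"odds {odds:>2}:1 -> score {score:.1f} -> PD {score_to_pd(score):.4f}")
```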

PDO=20 gives a nice granularity to the scores, which will mostly land in the 500-800 zone and you won’t feel the need to use decimal points i.e. whole-number scores suffice. As long as PDO is chosen to be positive, it will still be the case that a bigger score is a better situation.   

All the above transformations are absolute arithmetic ones that always apply, irrespective of context such as outcome window, default definition, calibration, closed goods in/out, etc. If you find you disagree with someone via these calcs, it means you started from different contexts and therein lies the entire explanation for your disagreement.

Following the comments on indeterminate, it is timely to introduce some shorthand notation that will help in discussions that follow.

The issue is the point-in-time default definition. Your default definition should in the first instance produce a decision at every point in time as to whether the account is in default or not. The set of possible points in time is determined by the time granularity of your data systems. A typical situation would be monthly data for CC with default flagged for >=90DPD assuming the outstanding balance exceeds some materiality parameter(s).

But why point-in-time default definition? Because this is not the final default story; the re-ageing logic still needs to be superimposed. Re-ageing involves an extension of the point-in-time default definition to the concept of a default episode, which has temporal extent i.e. it is a time window having a start date and an end date. Today’s post, however, covers only the point-in-time default issue, and the qualifier “point-in-time” will be left out to avoid clutter.

Default history for any particular account can be summarised by the string of consecutive default statuses: for example GGGGGGIIIBBG shows the account was ’good’ for the first 6 months, ‘indeterminate’ for the following three months and then ’bad’ for two months but then ’good’ again in the 12th month. These 12 months could be the first 12 months since the account opened, if you are doing longitudinal analysis, or it could be the 12 months of a cross sectional analysis, in which case it might represent something like MOB 33-44.

The definition details of ‘bad’ and ‘good’ will be particular to each institution and product, but status codes that I have found useful include:

  • B = Bad, i.e. point-in-time in default
  • I = Indeterminate. An optional status: not every situation requires distinguishing these from G and B, so ask the question: how does ‘I’ differ from ‘G’?
  • R = in recoveries
  • C = in collections
  • W = has been written off
  • G = Good, i.e. not bad nor any other status with a higher precedence
  • U = Undrawn. This can apply to loan accounts that have been set-up and are open on the books, but where the capital has not been drawn down yet. For HLs there can be a few months delay if there are hold-ups in transfer. In the meantime, they appear as accounts with zero balance outstanding. This only applies when this situation happens at the beginning of an account’s history, i.e. not for zero balance accounts that can occur later. Undrawn does not apply to some products such as CC because they are only activated when the first transaction is made.
  • D = Dormant. It may be useful to identify accounts that appear to be dormant, i.e. have returned to a zero balance and there is no customer initiated activity for a long time. Because of Basel treatment, but also for commercial reasons, the bank may want to identify these and do something about them. 

In the PD model building world, “indeterminate” seems to have more than one meaning. If any readers feel they could give a balanced view of common usage in Australia (or elsewhere), please do so and this blog will record it and adopt it.

Meaning #1: a status of an account at a point in time which is not “in default” but is some way down the track to being considered “in default”. For example, if the CC default definition requires 90DPD, accounts might be called “indeterminate” if they are 60DPD - or whatever other “not completely good” your default definition might permit.

Meaning #1.1: a meaning derived from #1 can then be evolved for the status of an account across a time window such as an OW for modelling purposes. “Indeterminate” might now mean “ever went indeterminate during the OW without ever going into default during the OW”. However, one also sees composite definitions of Indeterminate across an OW such as “ever went 60DPD or went 30DPD on two occasions”.

The idea behind the above definitions of Indeterminate is that the account, whilst known not to be “Bad”, is also known not to be completely “Good”. IIUC these above meanings are the most common in the banking industry but your corrections will be tallied and recorded here.
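For what it's worth, the simple form of meaning #1.1 is easy to express against a string of monthly statuses. The Python sketch below assumes only the statuses G, I and B appear in the string, with one character per month of the OW; the function name and sample strings are illustrative, and composite definitions would need more logic.

```python
def ow_outcome(status_string):
    """Outcome over an OW from a string of monthly point-in-time statuses.

    "B" if the account was ever bad in the window,
    "I" if it was ever indeterminate but never bad,
    "G" otherwise.
    """
    if "B" in status_string:
        return "B"
    if "I" in status_string:
        return "I"
    return "G"

print(ow_outcome("GGGGGGIIIBBG"))  # B: went bad even though it later cured
print(ow_outcome("GGGIIGGGGGGG"))  # I: went indeterminate, never bad
print(ow_outcome("GGGGGGGGGGGG"))  # G
```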

It will also be handy to adopt the likewise common terminology “Bad” for “in default”, along with “Good” and “Indeterminate” and their abbreviations “B,G,I”. These are used in textbooks for technical formulas like odds ratios and information values AWML. In today’s post this usage remains casual and by “G” might be meant “not B” or perhaps in another context “not I nor B”.

Meaning #1.1.1: A special situation related to #1.1 deserves noting. As it stands, #1.1 means that the account was known not to have gone bad during the OW. It isn’t a situation of doubt as to the outcome (contrary to what the English word “indeterminate” connotes). However, one particular case does involve reasonable doubt: when an account has reached the penultimate stage at the end of the OW - for example, a CC has gone 60DPD in the last month of the OW where the default definition is 90DPD. Unlike other Indeterminates that may have gone 60DPD and then rehabilitated, with this “horizoned” account one doesn’t know which of the categories G, I or B it should really belong to. OK, it belongs to “I”, but not in the same sense as an account that reached 60DPD and then rehabilitated during the OW.

Meaning #2: More like the natural English usage, this meaning covers situations where one isn’t sure about assigning “G” or “B”. For example, consider application modelling with an OW of 24 months:

  • An account closes good after only 2 MOB. Is this a “Good” account? Not in the same sense as one that was exposed to risk for the full 24 months. One might call it indeterminate in the sense that one doesn’t know whether it would have been G or B if it had hung around for 24 months. I prefer the more specific term “closed good” for this situation. 
  • (Similar to above) Only the first few MOB are known because the account opened recently. I prefer the more specific terms “out of sample” or “out of time” for these situations.
  • For whatever other reasons, such as incomplete data, one doesn’t know the exact outcome of some account at some point in time or across some time window.

This post is only about the nomenclature, and is not even definitive on that point! As to what you do or don’t use “indeterminate” for in the modelling world, that subject is too long for this week.

“Churn” is used here to refer to accounts closing ahead of schedule for reasons not related to default. Perhaps this is not ideal terminology - I tend to use it because it is short and specific - but other suggestions for common usage would be welcome.

One variation encountered is “closed good”, which will be used later in discussions of “closed goods in” versus “closed goods out” as bases of analysis. This nomenclature is more comfortable than “churneds in/out” would be.

Meaning varies amongst products. For CC, there is no fixed product schedule and churn would normally have the marketing meaning of customers taking their business elsewhere - e.g. “balance transfer” to another CC issuer. This has been a particular concern with aggressive marketing by competitors offering low or zero interest for an introductory period.

For term loans with a fixed principal & interest amortisation schedule, churn could come about from re-financing of a HL or PL with another lender. A similar issue is the early paying down of the loan balance on products that allow this. “Churn” is not a descriptive word for this behaviour - the account may remain open and active but have a much lower loan balance than the bank was expecting. Lower funds at risk means lower earnings for the bank, affecting the profitability model for the product cycle. What would be of particular concern, and likely in practice, would be the correlation between early payment and low PD, i.e. the lowest risk customers reducing in proportion of funds at risk.

As regards default analytics and PD models, churn is a countervailing force to default. If a portfolio has high churn, it will make the default experience look better (if analysed on a “closed goods in” basis AWML). To make a clear analysis of a portfolio it is better to analyse the effects of churn and default separately from each other. For profitability studies, each plays a role.

This post is as close as I get to a “rant”.

Some parts of Basel formulas have unnecessary complexity, which involves not just inefficiency but also potential pitfalls.

The specific example is the formula for asset correlation which appears in Basel paragraph [283] and which includes a term 0.12 x (1-EXP(-50 x PD)) / (1-EXP(-50)). There are similar terms elsewhere in Basel formulas, but for focus let’s look at just this case. Surely, this term should be given as simply 0.12 x (1-EXP(-50 x PD)).

Presumably the casters of the formula felt a need to normalise the term to handle PD over its full range of [0,1]. This may satisfy academic neatness but, I maintain below, at significant risk of causing error or wasted resource.

The materiality of the normalising denominator is as close to nil as any banker could imagine. The term EXP(-50) evaluates as 2 x 10**-22 which means 0.0000000000000000000002. When one subtracts this from 1 it makes no difference and you still end up with 1. In the old days, this used to cause a computer error known as underflow, whereby the floating point arithmetic processors rearranging numbers for calculation would discover that during this process one of the quantities had disappeared, which although not automatically a fatal error, would probably be something you wanted to know about. In the case of the above Basel term it’s not an error but it is frivolous formulaic complexity and in practical terms the denominator equals 1 and the term should be simplified to 0.12 x (1-EXP(-50 x PD)).
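Anyone who wants to check this on their own machine can do so in a few lines of Python. The sketch covers only the single term quoted above, not the full correlation formula, and the function names are made up for the purpose.

```python
import math

def term_with_denominator(pd):
    return 0.12 * (1 - math.exp(-50 * pd)) / (1 - math.exp(-50))

def term_simplified(pd):
    return 0.12 * (1 - math.exp(-50 * pd))

print(math.exp(-50))             # roughly 1.9e-22
print(1 - math.exp(-50) == 1.0)  # the denominator is exactly 1 in double precision

# In standard double precision the two versions agree exactly at every PD tried
test_pds = [0.0003, 0.002, 0.013, 0.1, 0.5, 1.0]
print(all(term_with_denominator(p) == term_simplified(p) for p in test_pds))
```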

OTOH to make a pedantic point if Basel wants the answer to come out exactly the same, they could change the multiplier from 0.12 to 0.1200000000000000000000024  . Or, add a sentence in the doc saying that, whilst a normalising denominator was academically desirable, it was omitted on materiality grounds.

Whatever, the materiality of the denominator term is less than a thousandth of a cent even when multiplied against a capital figure of $100billion.

What makes this a non-trivial rant is that there is significant cost to extra complexity. In my experience, only the most adept of the technical team would be able to transcribe such a formula without error. Others not directly familiar with the context, such as managers or computer programmers, are prone to transcription errors. But, most perversely, if I were to see such a formula presented by an intermediary - say, for example, as part of a computer program - I would assume strongly that a transcription error had been made because the logic of the formula fails the “sanity test”.

Much as I admire maths, a Basel implementation is fraught with thousands of small hurdles (OK and big ones), and we owe it to the business community to adopt pragmatic standards.    

An important basic concept in default analytics is “exposed to risk” by which we mean risk of going into default unless otherwise specified (one might otherwise be studying risk/propensity of churn, cross-sell etc.)

Abbreviated ETR in this note but AFAIK this isn’t common so won’t be added to the abbreviations list.

Often probabilities are estimated by dividing the number of events that did happen by the number of events that could have happened, and ETR is basically the “could have happened” part, i.e. the denominator of the fraction. The ‘hazards’ and risk PDs of default analytics are just special cases of this situation.

A typical setting is when building an Application PD model: the modelling mart will have some number of accounts that started out at open date (MOB=0), and a certain target OW of (say) 24 months; at the simplest level all the accounts are ETR of going into default within the OW.

However, if account #1 opened only 18 months ago and is still not in default, then although it has been ETR for 18 months, it hasn’t been ETR for 24 months and is not quite the same unit of modelling information as an older account #2 that did survive 24 months. Account #1 has reached the horizon and is said to have been censored. Model builders wouldn’t normally be dealing with these out-of-time (OOT) cases because, knowing that 24 months OW was the target, they would have chosen a sample window (SW) that was at least 24 months before the horizon in its entirety.

But what about account #3 that opened 30 months ago but closed good, i.e. without ever going into default, at MOB=18? Account #3, like account #1, was only ETR for 18 months and is not quite like account #2. There was no way it could have contributed a default event for MOB=19-24 as it was not ETR for 19-24.
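The three example accounts can be put side by side with a small Python sketch that counts how many months of the target OW each was actually ETR. The function name and its arguments are assumptions for illustration, and accounts that default are left out to keep it short.

```python
def months_etr(mob_at_horizon, closed_good_mob=None, target_ow=24):
    """Months exposed-to-risk within the target OW for a non-defaulted account.

    mob_at_horizon: MOB the account has reached (or would have reached) at the data horizon.
    closed_good_mob: MOB at which the account closed good, if it did.
    """
    exposure = min(target_ow, mob_at_horizon)
    if closed_good_mob is not None:
        exposure = min(exposure, closed_good_mob)
    return exposure

account_1 = months_etr(mob_at_horizon=18)                      # censored at the horizon
account_2 = months_etr(mob_at_horizon=30)                      # survived the full OW
account_3 = months_etr(mob_at_horizon=30, closed_good_mob=18)  # closed good early

print(account_1, account_2, account_3)
```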

That segues into the closed good in vs closed good out discussion AWML but meanwhile opinions and contributions would be welcome from those who have views on the issues. People who study mortality risk have similar issues whereby, for example, they study all individuals for a certain time window. People may emigrate and so be ETR for only a portion of the TW, because one can’t reliably trace their subsequent mortality (survive or die?) in another country. But, you don’t assume they survive (or die); rather you use their information appropriately with respect to their lesser overall ETR.

Because application modelling is longitudinal, the focus is on the first default, so ETR is mostly a matter of the account still being open and not ever having previously been in default. For behavioural modelling which is essentially cross-sectional, there is the additional issue of whether an account is ETR of fresh default or whether it is still included in some previous default episode - link to the re-ageing issue AWML.

There may be subtleties in the ETR concept, such as deceased account holders, dormant accounts - are these ETR? Or in a product like reverse mortgage, is there a default risk at all?

As mentioned in this thread, the predictive power of Application PDs decays with time. Thinking in a cross-sectional Basel mode, we look at all the accounts in a portfolio as of this month. If an App PD is available for an account, it will have more predictive power if it is a recent one (i.e. account has low MOB) than if it is an old one (i.e. account has high MOB).

The connection with MOB is not absolute as, for certain products, there can be a re-assessment of application information at some later time in the account history such as an application for a limit increase. i.e. the real point is “how old is the application information and the assessment of the PD”.

The reasons for decay of predictive power merely reflect the fact that older information is often less relevant than recent information.

Some years ago for NNB I studied the decay of the predictive power (measured by Gini) by backtesting on many years of data for various portfolios. The essential output was a graph showing the profile of Gini plotted against MOB. IIRC this showed Gini decaying fairly gently from its maximum in the early months towards lower levels, but still retaining some predictive use even after 3 years. Exact patterns varied between products.

NNB had also developed behavioural prediction models, so I did the same exercise for those Beh PDs. These models concentrated on pure behavioural predictors (dynamic information about recent account performance) rather than the static “application” type predictors. Naturally, with Beh PDs the trend is opposite, in that they start with low predictive power at MOB=1 and ramp up as the behavioural information accumulates. IIRC the ramp up was fast, with the models reaching close to full power within 6-9 MOB. Also, this full power was substantially higher than the full power of the App PDs.

Hence the natural “transition” idea to make the best use of all the information for Basel purposes was implemented as follows:

  • calibrate the App PD for a 12-month OW - because this is the Basel context
  • Beh PDs are built with 12-month OW and so need no calibration
  • form the Basel transition PD as a weighted average of the App PD and the Beh PD
  • i.e. Basel PD = w * App PD + ( 1 - w ) * Beh PD
  • figure the weight w by consulting the previously determined “App decay” and “Beh ramp up” profiles
  • The details of figuring w are not important here, but naturally w starts out close to 1 for accounts that have just opened (MOB=1) and drops fairly rapidly, with equal weight (w=0.5) reached after only a few MOB and most of the weight passing to the Beh PD (w=0.2) by 9-12 MOB.
  • One wouldn’t need to be so scientific, and a simple straight-line schedule transitioning from App to Beh over a fixed number of months would be good enough for most purposes.
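The simple straight-line version of the schedule can be sketched in Python as below. The 12-month transition length and the function names are assumptions for the example, not the schedule that was actually implemented.

```python
def straight_line_weight(mob, transition_months=12):
    """Weight on the App PD: 1 at MOB=0, falling linearly to 0 at transition_months."""
    return max(0.0, 1.0 - mob / transition_months)

def transition_pd(app_pd, beh_pd, mob, transition_months=12):
    """Basel PD = w * App PD + (1 - w) * Beh PD."""
    w = straight_line_weight(mob, transition_months)
    return w * app_pd + (1 - w) * beh_pd

# Hypothetical account with a calibrated App PD of 2% and a Beh PD of 4%
for mob in (0, 3, 6, 12, 30):
    print(mob, round(transition_pd(app_pd=0.02, beh_pd=0.04, mob=mob), 4))
```

A profile-driven schedule would only change the weight function; the weighted average itself stays the same.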

Technical readers will note that a problem mentioned previously remains: that the 12-month OW for the App PD is longitudinal rather than cross-sectional. Thus it models default for OW=[1,12]MOB rather than, say, OW=[9,20]MOB. However, this is of diminishing importance because by the time it would make a major difference, say OW=[25,36]MOB, the weight will have mostly transferred away from the App PD and onto the Beh PD.

One clean aspect of this weighted approach to transitioning is that validation of the final PD is a consequence of validating its components App PD and Beh PD. As long as those two are accurate (unbiased), the weighted PD is mathematically sure to also be accurate. In practice, this theoretical nicety may not pan out so easily because the App PDs of all vintages would need to all be accurate.

Alternative approaches to the transition issue are discussed below.

One suggestion I have heard but don’t like is to include the App PD as a predictor into the build of the Beh model. This doesn’t have the desired effect because merely including App PD as a main effect doesn’t allow the mechanics of regression to downweight the App PD if it is an old one and vice versa if it is a young one. The regression doesn’t know about MOB and one can’t fix this by including MOB as another main effect. (If you like getting technical, you might get close by designing appropriate interaction effects).

Rather, a simple and effective approach in this direction (which can be found in an early post by coldies) would be to segment the Beh PD model build: have one model for accounts with MOB<6 (say), which would include the App PD as a main effect, and another model for older accounts that ignored the App PD.

Also note that some application predictors don’t decay with time e.g. Gender; Secured vs Unsecured flag. Any such predictors could be used as main effects in the Beh model without problem.

The App PD thread noted that App models need not have been built on the 12-month OW which is the Basel platform.  

Picking any sample of accounts and following them longitudinally from their open date, the number of defaults naturally builds up cumulatively as one progresses along the MOB axis. Thus default rate @24MOB will be a bigger number than default rate @12MOB. The graph of the cumulative emergence of defaults against MOB is a particularly useful analytical tool that visually characterises the default profile of this sample (which may be a portfolio, cohort, segment or whatever). There are subtleties AWML to do with treatment of accounts that churn.

One use of this ‘emergence’ graph is to form a rough idea of the relativities between default rates at different MOB, for example cumulative defaults @24MOB would not typically be double the figure @12MOB - could be more, or less, depending on the product.
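As a rough sketch of how the emergence curve could be tabulated (before plotting), assume each account is summarised by the MOB at which it first defaulted, or None if it never did. The function name and data layout are hypothetical, and the sketch ignores the churn subtleties by assuming every account is observed for the full period.

```python
def emergence_curve(default_mobs, max_mob):
    """Cumulative default rate at each MOB from 1 to max_mob.

    default_mobs: one entry per account, holding the MOB of first default,
    or None if the account never defaulted in the observed period.
    Assumes all accounts are observed for at least max_mob months
    (no adjustment for churn).
    """
    n = len(default_mobs)
    curve = []
    for mob in range(1, max_mob + 1):
        defaults_so_far = sum(
            1 for d in default_mobs if d is not None and d <= mob
        )
        curve.append(defaults_so_far / n)
    return curve


# Ten made-up accounts: five never default, five default at various MOB
sample = [None] * 5 + [3, 10, 11, 14, 20]
curve = emergence_curve(sample, max_mob=24)
rate_12, rate_24 = curve[11], curve[23]  # list index 0 corresponds to MOB 1
print(rate_12, rate_24)
```

In this made-up sample the cumulative rate @24MOB is less than double the rate @12MOB, which is the kind of relativity the graph is used to read off.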

Illustrating a slightly more scientific approach: modellers may have already built a model predicting a target of “bad @24MOB” and may wish to calibrate this same model to alternatively predict “bad @12MOB”. As long as the original modelling mart is still available, it should not be too difficult to build an additional column (field) for the “bad @12MOB” flag, which can then be used as the dependent variable in a regression against the original model’s score. This would provide a calibration of the model to a 12MOB basis without going to the trouble of building a whole new model for this different default target. Implicitly the hope is that the drivers (predictors) of default by 12MOB are the same as those for default by 24MOB. One can imagine objections to this assumption: it might be that certain variables are better at predicting early defaults.
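The recalibration step might be sketched as below: a one-predictor logistic regression of the 12MOB flag on the existing score, fitted by Newton-Raphson in plain Python so that no libraries are assumed. The function names and the toy data are hypothetical; in practice one would use whatever regression tool built the original model.

```python
import math


def fit_logistic(scores, flags, iterations=25):
    """Fit logit(P(bad @12MOB)) = a + b * score by Newton-Raphson."""
    mean = sum(scores) / len(scores)
    x = [s - mean for s in scores]  # centre the score for numerical stability
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, flags):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # gradient w.r.t. intercept
            g1 += (yi - p) * xi    # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a - b * mean, b  # intercept re-expressed on the uncentred score


def calibrated_pd(score, a, b):
    """PD on the 12MOB basis implied by the fitted calibration."""
    return 1.0 / (1.0 + math.exp(-(a + b * score)))


# Toy data: original model scores and the newly built "bad @12MOB" flag
scores = [500, 500, 550, 550, 600, 600, 650, 650]
bad_12 = [1, 1, 1, 0, 1, 0, 0, 0]
a, b = fit_logistic(scores, bad_12)
print(calibrated_pd(575, a, b))
```

The sketch assumes the data are not perfectly separated by score (otherwise the maximum likelihood fit does not exist), and it inherits the assumption noted above that the same score ranks early and late defaults equally well.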

But in any case, as mentioned in the earlier post, calibrating to 12MOB is still a longitudinal concept which does not closely match the Basel need to predict default in the next 12 calendar months. Hence the incorporation of Application PDs for Basel purposes needs to be more subtle AWML.

A related issue is that the predictive power of Application PDs decays AWML.

Basel systems are likely to make use of the application PD of each account, but this is not a comfortable fit because the Basel requirements are cross-sectional whereas the App PD is longitudinal and not generally related to the same outcome window (OW).

The App PD is primarily for the purpose of decisioning: does the bank want to accept the application (made by some individual for some retail credit product).

At the front end the PD is usually presented as a score - merely a mathematical transformation of PD that is easier for general staff to handle - PDs can be a bit painful to look at because of decimal points, counting the leading zeroes, their inherent skewness, and potential confusion between decimal and percentage formats. Scores, by contrast, are chosen to span comfortable three-digit ranges, and are arranged such that high score = good applicant ( = low PD). This is done by linear transformation of the log(odds) which, if you are interested in it, you probably know all about.
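For readers who have not met the transformation, here is a sketch of one common scaling convention. The anchor values (a score of 600 at good:bad odds of 30:1, and 20 points to double the odds) are hypothetical choices, not those of any particular bank.

```python
import math

# Hypothetical scaling parameters
BASE_SCORE = 600.0   # score assigned at the base odds
BASE_ODDS = 30.0     # good:bad odds at the base score
PDO = 20.0           # points needed to double the odds

FACTOR = PDO / math.log(2.0)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)


def pd_to_score(pd):
    """Linear transformation of log(good:bad odds); high score = low PD."""
    odds = (1.0 - pd) / pd
    return OFFSET + FACTOR * math.log(odds)


def score_to_pd(score):
    """Inverse transformation, recovering the PD from a score."""
    odds = math.exp((score - OFFSET) / FACTOR)
    return 1.0 / (1.0 + odds)


print(round(pd_to_score(1.0 / 31.0)))  # odds of 30:1
print(round(pd_to_score(1.0 / 61.0)))  # odds of 60:1, i.e. doubled
```

Under these assumed parameters, halving the odds of being bad always adds the same 20 points, wherever on the scale the applicant sits.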

So, in its simplest form, the computer knows that the cut-off score is (say) 567 and the applicant is declined if their score works out to be below this. Otherwise, referral, accept, etc.

This decisioning purpose is different from the Basel purpose of PDs. Originally, there would have been a business case for this product, modelling profitability of this line of business based on revenues, costs, and credit losses. This profitability model would ideally analyse a full product life cycle, but depending on the product, life cycles can be variable due to early closure, early repayment, refinancing and the like, which I will call “churn” below (although suggestions for a better term are welcome). A key input would be the default profile to be expected - how many defaults and at what stage (longitudinal MOB) in the account’s life cycle. The estimation of default profiles and churn profiles is not difficult given sufficient amounts of relevant data and the assumption that the future will be like the past (!?).

A common simplistic approach for building an application model is to settle on some fixed OW - such as 24 months - and do the modelling on the basis of predicting this “bad rate @24 months”.

There is no reason for such an OW to equal the Basel OW of 12 months. Its purpose is to help the business make the best decisions on new applications. Presumably, for some products such as HLs, this would require a profitability model that looked well beyond the first 12 MOB of the account. In my experience, defaults on HLs arise more in later years. If this were a technical discussion, we would now pause to sketch hazard graphs AWML.

So, quite likely, a bank’s App PDs are built on a different OW than 12 months, and are therefore not immediately commensurate with Basel needs. Furthermore, App PDs are longitudinal not cross-sectional, so even if it were a 12-month OW, it would be referring to the first 12 MOB for that account, which wouldn’t be the coming 12 months unless that account opened this month. A typical account this month may be 29 MOB, so for Basel purposes one would want to know the conditional probability, given that the account is not in default at MOB=29, that it would go into default during MOB=30 through 41 inclusive. Whilst this calculation could be done using the hazard curve, I don’t think many analysts go to this level of detail.
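The hazard-curve calculation can be sketched as follows, assuming a monthly hazard is available for each MOB (the probability of defaulting in that month, given no default before it). The function name, the flat hazard in the example, and the neglect of churn as a competing exit are all simplifying assumptions.

```python
def conditional_pd(hazards, current_mob, window=12):
    """P(default during MOB current_mob+1 .. current_mob+window,
    given the account is not in default at current_mob).

    hazards[m] is the monthly default hazard at MOB m; hazards[0] is unused.
    Churn (closure, refinancing) is ignored in this sketch.
    """
    survival = 1.0
    for m in range(current_mob + 1, current_mob + window + 1):
        survival *= 1.0 - hazards[m]
    return 1.0 - survival


# Made-up flat hazard of 0.1% per month over 60 months
hazards = [0.0] + [0.001] * 60
# Account at MOB=29: default during MOB=30 through 41 inclusive
print(conditional_pd(hazards, current_mob=29))
```

With a hazard that varies by MOB (as it would for HLs, say), the same function gives different 12-month PDs for accounts of different ages, which is exactly the information a single longitudinal App PD cannot supply.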

Rather, there are several simpler potential ways that the App PD can be incorporated for Basel purposes. Follow-ups to come but see also an earlier ozrisk discussion.  

Extending the previous post, the interesting problems occur when there is interplay between the two dimensions of the default definition, namely time (DPD) and amount (material credit obligation in dollars). This discussion segues into the “re-aging” issue which is complex AWML.

Many default cases will thankfully (?) be simple: as long as it is one-way traffic, the analytical aspects are not controversial. The simplest case is:

  • an account goes past due on some material credit obligation at a certain date, and the DPD counter starts ticking.
  • the account holder never reduces the original credit obligation below the materiality threshold
  • DPD counts steadily upwards until it triggers the default definition at 90 or 120 (or whatever) DPD

Typical cases are that an account holder stops making any kind of payment on some standard retail product (CC, PL) that requires a payment each month. The account then rolls through the successive default categories 30,60,90,..DPD and into collections, recoveries, write-off.

The potentially difficult cases are those involving interplay between time and amount, i.e. the amount of the credit obligation varies across time, including the possibility that it dips in and out of materiality thresholds. These would be cases where the account holder makes some partial repayments. Accounting issues then arise, with partial payments being applied firstly towards reducing the oldest outstanding amounts. For example, at 75DPD there might be a payment that was sufficient to settle the oldest amount outstanding, but not quite sufficient to settle the 45-day old amount, although enough to reduce that 45-day amount to below the materiality threshold. Where does that leave the DPD counter and the default definition? 

Can readers with accounting knowledge confirm that it is not difficult in principle to track the outstandings in 30-day buckets (i.e. how much is due, 30DPD, 60DPD, 90DPD, .. etc.) and to adjust these as partial payments come in? The problem I experienced as a modeller, though, was that in the historic summary data these bucketings of the outstanding obligation are not always separately available - only the total amount outstanding - and so it was not possible to re-create the intended default definition.
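To make the 75DPD example concrete, here is a sketch that applies a partial payment to the oldest amounts first and then reads off the DPD counter under two alternative conventions. Both conventions, the dollar amounts, and the $100 materiality are hypothetical; the sketch only illustrates that the answer depends on which convention the bank adopts.

```python
def apply_payment(arrears, payment):
    """Apply a payment to the oldest outstanding amounts first.

    arrears: list of (days_past_due, amount) tuples.
    Returns the remaining arrears, oldest first.
    """
    remaining = payment
    updated = []
    for days, amount in sorted(arrears, reverse=True):  # oldest first
        paid = min(amount, remaining)
        remaining -= paid
        if amount - paid > 0:
            updated.append((days, amount - paid))
    return updated


def dpd_oldest_material_item(arrears, materiality):
    """Convention A (assumed): age of the oldest single item that is
    itself at or above the materiality threshold."""
    return max((d for d, amt in arrears if amt >= materiality), default=0)


def dpd_total_arrears(arrears, materiality):
    """Convention B (assumed): if total arrears are material, the age of
    the oldest unpaid item, however small that item is."""
    if sum(amt for _, amt in arrears) < materiality:
        return 0
    return max((d for d, _ in arrears), default=0)


# Three missed payments of $300, now 75, 45 and 15 days old; $550 is paid
after = apply_payment([(75, 300), (45, 300), (15, 300)], payment=550)
print(after)
print(dpd_oldest_material_item(after, materiality=100))
print(dpd_total_arrears(after, materiality=100))
```

The two conventions give different DPD counters for the same account after the same payment, and neither can be reconstructed from historic data that holds only the total amount outstanding.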

For CC, due to the varying monthly payment requirements, these buckets are usually available, but one has to accept them as given and it would not be possible (for example) to re-work them with a different materiality.

For products with a fixed monthly instalment and a standard amortisation schedule, one approach taken was to convert the amount outstanding - i.e. the amount by which the balance exceeds its amortisation schedule - into an equivalent number of months (or days) by dividing by the monthly instalment. For example, an account that is $2500 over, with a monthly instalment of $1000, is 2.5NMA ~=75DPD. Note that under this definition it is possible for this situation to come about in less than 75 days, i.e. an account might go from 0DPD to 75DPD in one day. By contrast, an account might pay nothing at all for a year without going past due, if it had previously paid down the balance to be well below the schedule. These comments naturally depend on the product rules and practices in place.
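The conversion is simple enough to sketch directly. The function names and the 30-days-per-month convention are assumptions; the worked figures are those of the example above.

```python
def months_over_schedule(balance, scheduled_balance, instalment):
    """Amount by which the balance exceeds its amortisation schedule,
    expressed as a number of monthly instalments. An account at or
    below schedule counts as zero (it is not past due)."""
    excess = max(balance - scheduled_balance, 0.0)
    return excess / instalment


def equivalent_dpd(months_over, days_per_month=30):
    """Approximate DPD equivalent, assuming 30 days per month."""
    return months_over * days_per_month


# $2500 over schedule with a $1000 monthly instalment
nma = months_over_schedule(102500.0, 100000.0, 1000.0)
print(nma, equivalent_dpd(nma))

# An account well ahead of schedule is not past due, whatever it pays now
print(months_over_schedule(90000.0, 100000.0, 1000.0))
```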

A general issue in all this is the difference between a definition of “default at a particular point in time”, and the longitudinal consideration of “default episodes” that have a time dimension, i.e. a certain account went into default at time x and stayed in default until time y. This does not have to be the same as saying that the account was in default at every point in time between x and y. This introduces the re-aging issue AWML.

Beneath all this, the time granularity of the data is playing a big part, and the above issues are addressed mainly on the assumption of monthly granularity. Daily granularity would accentuate the issues and probably produce some new tricky issues. With quarterly or longer granularity, many of the issues would disappear or be trite.  

The definition of default is one of those things that sounds easy at a high level, but can get fuzzy when you get down to the details - like writing a computer program to build the default information.

Representing the high level we have Basel[452] quoted in part: “… The obligor is past due more than 90 days on any material credit obligation … Overdrafts will be considered as being past due once the customer has breached an advised limit…”

So default has a time dimension as well as an amount dimension. The “amount” is basically dollars but may perhaps be expressed as a percentage (of a dollar limit). Materiality considerations apply to the amount - credit obligations below some threshold are not material and would not trigger Basel-default no matter how long past due.

How should this materiality threshold be chosen? Need it have any relation to the level at which the bank would write off an account as uneconomic to pursue collection activities on? Presumably, no; one would choose a fairly low and stable default materiality for Basel-default purposes of, say, $100, to be as inclusive as possible and avert any argument that true defaults were being buried.

There should be no risk capital cost of having an inclusive default definition because, although it would lead to higher PDs than a definition with a $250 threshold, it should lead to correspondingly lower EAD and LGD. (Although this point sounds right in principle, there might be technical objections to it depending on the maths of the formulas and how they work together.)

OTOH, the write-off level for collections is a movable feast depending on many factors - such as product type, collections technology and resources, stages of the collection/recovery process - that could not provide a stable baseline for a definition. Clearly also, the bank would want to recognise that category of defaults that was material enough to be considered a default but not material enough to go through the full collection/recovery processes.

The downside to making a default definition too inclusive - such that it flagged cases that were not “..on any material credit obligation ..” - is the dilution of the estimating and predicting power of the models for the risk components.

Basel[452] does not mention a materiality for the over-limit situation, although common sense would suggest that a materiality might apply. For example, if a customer with a $10000 overdraft reaches a balance of -$10050, must this start the DPD counter ticking? At NNB I found that modest changes in materiality in these cases made an enormous difference to the default analytics - changing the bad rates by significant factors. The real cause was that the bank didn’t credit-manage this particular product in a way concordant with Basel principles: the issue was resolved by changes to the product, which were in any case appropriate to align business practices with risk issues.

Using a percentage (of limit) as a materiality is a sensible idea on standard products, but can produce unexpected results on large datasets where there will often be some data oddities like limits of $1. Percentage supplemented by an absolute dollar minimum and maximum should provide belt & braces for these situations.
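A belt & braces threshold of this kind might look like the sketch below. The 1% rate, the $100 minimum and the $1000 maximum are made-up parameters for illustration only.

```python
def materiality_threshold(limit, pct=0.01, floor=100.0, cap=1000.0):
    """Percentage of limit, bounded below and above by absolute dollar
    amounts. All three parameters are hypothetical."""
    return min(max(pct * limit, floor), cap)


def is_material_excess(amount_owed, limit, **params):
    """True if the over-limit amount reaches the materiality threshold."""
    excess = amount_owed - limit
    return excess >= materiality_threshold(limit, **params)


# A data oddity: a limit of $1 would otherwise give a 1-cent threshold
print(materiality_threshold(1.0))

# The overdraft example: $10000 limit, balance of -$10050
print(is_material_excess(10050.0, 10000.0))
```

With these assumed parameters the $50 excess on the $10000 overdraft would not start the DPD counter, and the $1 limit oddity falls back to the absolute minimum instead of producing a threshold of one cent.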

The posts in this thread have retail banking in mind, which leads to an account-oriented approach for “exposures”. This approach is discussed below, and any reader contributions as to different approaches or nomenclatures will be welcome and helpful.

The unit of analysis is the “account” - a retail bank account (HL,PL,CC,..) keyed by its account_number. For credit risk purposes the only accounts of interest are those that are or can become an exposure, so although “account” would also normally apply to term deposits, we won’t be thinking of them here. The bank’s retail exposure comprises a large number of accounts, grouped up in sub-classes and pools. “Account” in this generic context therefore is understood to include credit cards, although when speaking specifically about a credit card portfolio it will be natural to think “card_number” rather than “account_number”.

The units of analysis are not the individual customers. If a customer happens to have a HL as well as a CC, that will be two separate accounts, that would fall into different sub-classes: respectively, exposures secured by residential property, and qualifying revolving retail exposures.

Why not use the terminology “exposures”? It seems to be not specific enough, as it could be interpreted to refer either to accounts or to customers, and also to groupings