One issue that can make a big difference to discussions about default analytics is the granularity of the time-based data that analysts are working from. Modellers often work with monthly data, but depending on context other credit risk analysts might be working with daily, or perhaps annual, data. Given this range, some issues that are difficult for one analyst may be trivial or non-existent for another.
One extreme of the time granularity spectrum is probably annual data. This might apply to non-retail exposures, where meaningful updates to the risk information (for building, say, behavioural models) might arise only from annual financial statements. This would be a data-poor environment, placing more weight on banking expertise and less on credit risk analytics.
For retail banking, monthly granularity is the common warehousing level for the data that will be the prime source for credit risk analytics. For HLs & PLs, this might take the form of monthly snapshots: summaries of the relevant data fields compiled at month-end. CCs, however, have their own statement and repayment cycles that are not based on calendar month-ends. So, although CCs basically have monthly granularity, they might not fit comfortably in a month-end snapshot warehousing approach, but one way or another they will end up with some monthly data summarisation and warehousing.
For the AIRB purposes of using a few years of history to build (and use) retail banking models, monthly granularity is the typical basis and will be the assumption unless stated otherwise. Readers are asked to provide examples from their own environments, as these details can often make a big difference to the discussions.
At the shortest extreme, much credit risk monitoring naturally happens at daily granularity, but IIUC not many modellers would be analysing substantial extracts from raw data sources at that level.
Intra-grain: what happens within the month? The warehouse would no doubt record risk variables as at the end of the month, but perhaps also the worst level reached during the month. If not, there might be an account recorded as 111 DPD at 31 March and as 0 DPD at 30 April, with no way to tell whether or not the account reached 120 DPD during April. A default flag built on such data is then a series of calendar-month-end default tests, rather than a “bad-ever” flag. This is not so much a problem as a difference, and it is fine as long as you know what you are dealing with. IIUC in the old days, when computers were only mainframes, this kind of intra-grain issue could be substantial and could even apply to an entire outcome window, such that an account was only determinable as good or bad at window end. These examples show how time granularity shapes what can be measured; a sketch of the distinction follows.
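Here is a minimal sketch, not from the post itself: the field names and the 120 DPD threshold are assumptions, chosen to mirror the 111 DPD / 120 DPD example above, contrasting a month-end default test with a bad-ever flag.

```python
# A minimal, illustrative sketch: field names and the 120 DPD threshold are
# assumptions, chosen to match the 111 DPD / 120 DPD example in the text.

DPD_THRESHOLD = 120

def month_end_bad(snapshots):
    """One month-end default test per snapshot (all that month-end-only data supports)."""
    return [s["dpd_month_end"] >= DPD_THRESHOLD for s in snapshots]

def bad_ever(snapshots):
    """True if the threshold was breached at any point; needs the worst DPD
    reached during each month to have been warehoused."""
    return any(s["dpd_worst_in_month"] >= DPD_THRESHOLD for s in snapshots)

# The account from the text: 111 DPD at 31 March, 0 DPD at 30 April.
# Only a worst-in-month field can reveal whether 120 DPD was reached in April.
history = [
    {"month_end": "2008-03-31", "dpd_month_end": 111, "dpd_worst_in_month": 111},
    {"month_end": "2008-04-30", "dpd_month_end": 0,   "dpd_worst_in_month": 123},
]
print(month_end_bad(history))  # [False, False] -- every month-end test misses the breach
print(bad_ever(history))       # True -- the breach is visible only via worst-in-month
```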
Intra-grain: exposure for less than a month? Accounts open randomly throughout a month, so if credit risk data is summarised and warehoused on any regular schedule (like month-end), accounts will be exposed to risk for only half a month, on average, during their first “month on books”. Does anyone out there worry about this kind of issue? It may not sound important, but it might mean that your 12-month outcome window is really an 11.5-month outcome window. Perhaps data based on payment cycles, rather than regular snapshots, can avoid this issue, although IIUC it will pop up in other ways. BTW the issue in this paragraph applies to application modelling, rather than behavioural modelling, because of the intrusion of the account open date.
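As a back-of-envelope illustration (my own assumptions, not from the post: open dates uniform over a 30-day month, snapshots taken at month-end), the arithmetic behind the 11.5-month claim looks like this:

```python
# Back-of-envelope sketch under assumed conditions: open dates uniform over a
# 30-day month, with exposure in the first "month on books" counted as the days
# remaining until that first month-end snapshot.

days_in_month = 30

# An account opened on day d is exposed for (days_in_month - d) days in its first month.
first_month_exposures = [days_in_month - d for d in range(1, days_in_month + 1)]
avg_first_month_days = sum(first_month_exposures) / len(first_month_exposures)

nominal_window_months = 12
effective_window_months = (nominal_window_months - 1) + avg_first_month_days / days_in_month

print(round(avg_first_month_days, 1))      # 14.5 -- about half a month of exposure
print(round(effective_window_months, 2))   # 11.48 -- the "12-month" window in effect
```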
3 comments
26 March, 2008 at 20:15
Andrew
Clive,
Would one possible (longer-term) solution be to keep really pared-down daily snapshots containing the bare minimum of information needed for this analysis, such as account number, DPD, balance and limit? OTOH, disk space is really cheap so perhaps it is a good time to look at altering archiving policies.
27 March, 2008 at 12:14
Clive
Daily snapshots as you describe would be easily possible by today’s standards. Many banking systems would already have something like this for the current month and one or two months prior, but not keep older archives because the monthly summaries and/or transaction histories suffice for all important questions.
The straightforward design for a daily snapshot data table would have one record (row) for each day for each account: thus one record of the table might look like (26/3/2008, 0123-456-7, 0, -$8972.15, $10000). This design has a two-part key, namely date*account_number.
A more space-efficient plan is to have the date portion of the key working as a date-valid-from field, such that a new record is only created if the content of any non-date field has changed. For vanilla products like HL or PL, and assuming that the account doesn’t have the DPD counter ticking upwards, this should lead to only a couple of records per month, to capture the monthly instalment movements. For reasons of efficient query logic, an implementation of this design should additionally have a date-valid-to field plus a simple Y/N indicator flag for the “current” (i.e. latest) record for each account.
It is not difficult for programmers to convert between the above two designs on the fly when required by the task at hand.
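It might look something like the following minimal Python sketch (my own field names and assumptions, not any particular bank's implementation): compressing daily snapshot rows into date-valid-from / date-valid-to records, and re-inflating them on demand.

```python
# Minimal sketch of converting between the two designs described above.
# Field names (date, account, dpd, balance, limit) are assumptions.

from itertools import groupby
from operator import itemgetter

def compress(daily_rows):
    """daily_rows: dicts with keys date, account, dpd, balance, limit,
    sorted by (account, date). Returns valid-from/valid-to records with a
    Y/N is_current flag on the latest record for each account."""
    compressed = []
    for account, rows in groupby(daily_rows, key=itemgetter("account")):
        current = None
        for row in rows:
            content = (row["dpd"], row["balance"], row["limit"])
            if current is None or content != current["content"]:
                # Non-date content changed: start a new validity period.
                current = {"account": account, "valid_from": row["date"],
                           "valid_to": row["date"], "content": content,
                           "is_current": "N"}
                compressed.append(current)
            else:
                # Nothing changed: just extend the existing period.
                current["valid_to"] = row["date"]
        if current is not None:
            current["is_current"] = "Y"  # flag the latest record per account
    return compressed

def expand(compressed_rows, calendar):
    """Re-inflate to one row per account per day, for the dates in `calendar`."""
    daily = []
    for rec in compressed_rows:
        for date in calendar:
            if rec["valid_from"] <= date <= rec["valid_to"]:
                dpd, balance, limit = rec["content"]
                daily.append({"date": date, "account": rec["account"],
                              "dpd": dpd, "balance": balance, "limit": limit})
    return daily
```

The date-valid-to field and the Y/N current flag are strictly redundant given date-valid-from, but they make the most common query, fetching the latest record for each account, a simple filter rather than a self-join.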
What advantages might be derived by having daily, rather than monthly, snapshots?
The original post hinted at some, amounting to a better grip on time (and data) granularity at levels below a month. This would be welcomed by modellers, but the cost/benefit would need to be argued. OTOH I think a greater benefit would come in the behavioural information that might be unlocked from analysis of intra-month spending patterns on CCs and transaction accounts. This could have value beyond credit risk, e.g. retention and cross-sell.
10 May, 2008 at 01:06
Inside a time grain « ozrisk.net
[…] Harking back to the issue of time granularity, and anticipating some default analytic calculations yet to come, let’s get back to small […]