Harking back to the issue of time granularity, and anticipating some default analytic calculations yet to come, let's return to the small details and look inside the smallest data unit, i.e. the time grain.
For typical retail products this granularity would be a month, which is the example carried forward in this post.
Monthly data warehoused for analysis purposes would typically be on a calendar month basis. An alternative (for CC?) might be data on a monthly payment cycle basis.
Even though a grain is ‘small’ there is still latitude for vagueness because data recorded against a month may relate in several ways to the time axis within that month:
- point in time at the beginning of the month
- point in time in the middle of the month
- point in time at the end of the month
- the whole time window comprising that month
For most cross-sectional studies, the time axis is calendar date and the ‘status’ variables like account balance would usually relate to the end of the month, as that would be their most up-to-date value. Other variables that summarise or count transactions (for example) would relate to the whole time window. Certain calculated values (like hazards AWML) may relate to the mid-point of the month.
In cross-sectional studies there is no difficulty in finding the point-in-time variables as at the beginning of a month, because these will be the (end-of-month) values from the previous month’s record – i.e. closing balance for Feb = opening balance for March etc.
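That closing-to-opening carry-forward can be sketched in a few lines; this is a minimal illustration with made-up field names and balances, not a prescribed mart layout:

```python
# Monthly snapshot records, one per calendar month (hypothetical data).
snapshots = [
    {"month": "2008-02", "closing_balance": 1500.0},
    {"month": "2008-03", "closing_balance": 1420.0},
]

# Derive each month's opening balance by shifting the previous month's
# closing balance forward one record.
for prev, curr in zip(snapshots, snapshots[1:]):
    curr["opening_balance"] = prev["closing_balance"]

print(snapshots[1]["opening_balance"])  # Feb's closing = March's opening: 1500.0
```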
If numeric date values are used as the key on a data table, they would perhaps most logically be set to the last day of each month, which is unfortunately a bit messy and harder (for a human) to remember than the obvious choice of the 1st of each month.
A non-numeric-date month key like “200805” avoids specifying any particular part of the month, and leaves it up to the user to figure the time relationships from the metadata. A slight disadvantage of such a key is that date arithmetic (figuring out the difference between two dates) becomes non-trivial.
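"Non-trivial" here is relative: the month difference can still be computed with a small helper, sketched below for keys in the "200805" style (the function name is just for illustration):

```python
def month_diff(key_a: str, key_b: str) -> int:
    """Difference in whole months between two YYYYMM keys (key_b minus key_a)."""
    year_a, month_a = int(key_a[:4]), int(key_a[4:])
    year_b, month_b = int(key_b[:4]), int(key_b[4:])
    # Convert each key to a count of months and subtract.
    return (year_b - year_a) * 12 + (month_b - month_a)

print(month_diff("200811", "200903"))  # -> 4
```

The point stands, though: with a true date key the database's own date arithmetic would do this for free.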
Longitudinal studies would typically rely on performance data for each individual account that is stored cross-sectionally, i.e. by calendar month. This introduces a slight wrinkle: the account opening date can fall anywhere within a month, whereas the performance data is only available at month ends. So the first performance measurement point an account reaches may come up in only 1-2 days (if the account opened on the 29th or 30th of a month), or may represent up to 30 days of exposure-to-risk. Longitudinal studies have MOB rather than calendar date as their time axis, which means the MOB=1 analysis really represents on average about 0.5 months of exposure, and likewise every subsequent MOB point really represents on average half a month less than its nominal value. (This assumes your MOB counting convention starts at 1 rather than 0.) In any case, it would be most representative to start at 0.5 and count upwards as 1.5, 2.5, etc.
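The wrinkle can be made concrete with a short standard-library sketch that counts the days from an account's opening date to its first month-end measurement point (the function name and example dates are just for illustration):

```python
from datetime import date, timedelta

def first_month_exposure_days(open_date: date) -> int:
    """Days from account opening to the first month-end measurement point."""
    # The last day of the opening month is the day before the 1st of the next month.
    if open_date.month == 12:
        next_first = date(open_date.year + 1, 1, 1)
    else:
        next_first = date(open_date.year, open_date.month + 1, 1)
    month_end = next_first - timedelta(days=1)
    return (month_end - open_date).days

print(first_month_exposure_days(date(2008, 5, 30)))  # opened near month end -> 1
print(first_month_exposure_days(date(2008, 5, 1)))   # opened on the 1st -> 30
```

Averaged over opening dates spread uniformly through the month, this first exposure works out to roughly half a month, which is where the 0.5 offset comes from.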
The above may sound picky, but it can quite easily come about that one analyst’s 12-month OW is another analyst’s 13-month OW due to choices at this level, and this could make a significant change to risk measures.
Further intra-grain issues arise when calculating hazards. This basically means dividing the number of defaults (at a certain MOB) by the number of accounts that were exposed-to-risk of default. In a longitudinal study the number of accounts exposed-to-risk is always declining, as accounts close good or go into default. Good practice is therefore to use the average number of exposed-to-risk accounts during the month, (month_start + month_end)/2, as the denominator of the hazard.
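As a minimal sketch of that denominator choice (illustrative counts, hypothetical function name):

```python
def monthly_hazard(exposed_start: int, exposed_end: int, defaults: int) -> float:
    """Hazard for one MOB: defaults divided by the average number exposed to risk."""
    average_exposed = (exposed_start + exposed_end) / 2.0
    return defaults / average_exposed

# 10,000 accounts exposed at month start, 9,900 at month end, 40 defaults:
print(monthly_hazard(10_000, 9_900, 40))  # 40 / 9950 ~ 0.00402
```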
Actuaries are good at these deliberations because of the care and expertise put into estimation of mortality statistics. If you can’t find a tame actuary, the recommended approach is large diagrams on whiteboards and a bottle of headache tablets.
An important basic concept in default analytics is "exposed to risk", by which we mean risk of going into default unless otherwise specified (one might otherwise be studying the risk/propensity of churn, cross-sell, etc.).
Abbreviated to ETR in this note, but AFAIK this isn't common, so it won't be added to the abbreviations list.
Often probabilities are estimated by dividing the number of events that did happen by the number of events that could have happened, and ETR is basically that second quantity, i.e. the denominator of the fraction. The 'hazards' and PDs of default analytics are just special cases of this situation.
A typical setting is when building an Application PD model: the modelling mart will have some number of accounts that started out at open date (MOB=0), and a certain target OW of (say) 24 months; at the simplest level all the accounts are ETR of going into default within the OW.
However, if account #1 opened only 18 months ago and is still not in default, then although it has been ETR for 18 months, it hasn’t been ETR for 24 months and is not quite the same unit of modelling information as an older account #2 that did survive 24 months. Account #1 has reached the horizon and is said to have been censored. Model builders wouldn’t normally be dealing with these out-of-time (OOT) cases because, knowing that 24 months OW was the target, they would have chosen a sample window (SW) that was at least 24 months before the horizon in its entirety.
But what about account #3 that opened 30 months ago but closed good, i.e. without ever going into default, at MOB=18? Account #3, like account #1, was only ETR for 18 months and is not quite like account #2. There was no way it could have contributed a default event for MOB=19-24 as it was not ETR for 19-24.
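The three accounts' exposure within the outcome window can be sketched as follows; this is one plausible convention (illustrative function and parameter names), not a definitive treatment of censoring:

```python
from typing import Optional

def etr_months(outcome_window: int, months_on_book: int,
               closed_good_at: Optional[int] = None) -> int:
    """Months an account was exposed to default risk within the outcome window.

    months_on_book: age (in months) the account reached by the horizon.
    closed_good_at: MOB at which the account closed good, or None if still open.
    """
    exposure = months_on_book if closed_good_at is None \
        else min(months_on_book, closed_good_at)
    return min(exposure, outcome_window)

print(etr_months(24, 18, None))  # account #1: censored at the horizon -> 18
print(etr_months(24, 30, None))  # account #2: survived the full window -> 24
print(etr_months(24, 30, 18))    # account #3: closed good at MOB=18 -> 18
```

Accounts #1 and #3 end up with the same 18 months of exposure by different routes, which is exactly why neither is quite the same unit of information as account #2.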
That segues into the closed-good-in vs closed-good-out discussion AWML, but meanwhile opinions and contributions would be welcome from those who have views on these issues. People who study mortality risk face similar problems: for example, they may study all individuals over a certain time window, but people can emigrate and so be ETR for only a portion of the TW, because one can't reliably trace their subsequent mortality (survive or die?) in another country. But you don't assume they survive (or die); rather, you use their information appropriately with respect to their lesser overall ETR.
Because application modelling is longitudinal, the focus is on the first default, so ETR is mostly a matter of the account still being open and not ever having previously been in default. For behavioural modelling which is essentially cross-sectional, there is the additional issue of whether an account is ETR of fresh default or whether it is still included in some previous default episode – link to the re-ageing issue AWML.
There may be subtleties in the ETR concept, such as deceased account holders or dormant accounts – are these ETR? Or in a product like a reverse mortgage, is there a default risk at all?