Default episodes have varying lengths. This can lead to a bias related to the statistical issue known as “length-biased sampling”.
When building an LGD modelling mart, a typical approach is to collect all the bad episodes that overlap a certain time window. However, this introduces a length bias, because longer episodes span more calendar time and are therefore more likely to overlap the window and be included. Longer episodes are, in turn, quite likely to be correlated with atypical losses, so the sampled loss distribution is skewed relative to the population of all bad episodes.
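To make the bias concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the `Episode` class, the helper names, the monthly time index, the two episode lengths (3 and 24 months) and the 12-month window are all hypothetical. It compares selecting episodes that overlap the window with selecting only episodes that start inside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    start: int   # month index in which the bad episode begins (hypothetical unit)
    length: int  # duration of the episode in months

    @property
    def end(self) -> int:
        # Exclusive end month of the episode.
        return self.start + self.length


def overlaps_window(ep: Episode, w_start: int, w_end: int) -> bool:
    # Episode [start, end) shares at least one month with window [w_start, w_end).
    return ep.start < w_end and ep.end > w_start


def starts_in_window(ep: Episode, w_start: int, w_end: int) -> bool:
    # Episode begins inside the window, whatever its eventual length.
    return w_start <= ep.start < w_end


def long_share(episodes, long_threshold: int = 12) -> float:
    # Fraction of episodes lasting at least `long_threshold` months.
    return sum(ep.length >= long_threshold for ep in episodes) / len(episodes)


# Synthetic population: every month, one short (3-month) and one long
# (24-month) episode begins, so exactly half of all episodes are long.
population = [Episode(start=m, length=n) for m in range(200) for n in (3, 24)]

W_START, W_END = 100, 112  # a 12-month sample window

overlap_sample = [ep for ep in population if overlaps_window(ep, W_START, W_END)]
start_sample = [ep for ep in population if starts_in_window(ep, W_START, W_END)]

print(f"population share of long episodes:  {long_share(population):.3f}")
print(f"overlap-rule share of long episodes: {long_share(overlap_sample):.3f}")
print(f"start-rule share of long episodes:   {long_share(start_sample):.3f}")
```

In this toy setup the overlap rule picks up 35 long episodes against 14 short ones, so long episodes make up roughly 71% of the sample even though they are 50% of the population. The start-in-window rule selects 12 of each and reproduces the population share.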
To get an unbiased sample when building a behavioural mart, specify a sample window and include only bad episodes that started during that window. This will exclude accounts that are already in the middle of a bad episode at the start of th