In a previous post, we discussed the binary choice regression model, which is often employed in evaluating consistency in underwriting with respect to fair lending laws and regulations. Let’s continue the discussion.
In such models, the dependent variable y takes only two possible values, generally 0 or 1. As such, we typically regress denial (y = 1 if denied, 0 if approved) on a target group indicator variable and other explanatory factors to measure conditional denial incidences.
When approaching these analyses, the question becomes which functional form the regression model should take. We noted there are a number of ways to proceed, including ignoring that y is binary and estimating the regression by Ordinary Least Squares (OLS). With a dichotomous outcome variable, the OLS model is referred to as the linear probability model (LPM), since the predicted value of y is the estimated probability that y = 1 given x.
The same principle applies to other types of analysis where y is dichotomous. Estimating such a model by OLS, however, creates a few problems: the disturbance term is heteroskedastic, and the predicted probabilities may fall outside the [0, 1] interval. Therefore, Logit and Probit estimation have become standard in the economic literature, primarily because they constrain the predicted probabilities to the [0, 1] interval. These are typically the models of choice for fair lending analysis of underwriting, the Logit model in particular.
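To make the out-of-range problem concrete, here is a minimal sketch of a linear probability model fit by OLS with a single regressor. The six observations are hypothetical toy values chosen for illustration, not data from our analysis, and `fit_simple_ols` is a helper name we made up for this example.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Closed-form OLS for one regressor: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data: low credit scores denied (1), high scores approved (0).
credit_score = [500, 550, 600, 650, 700, 750]
denied = [1, 1, 1, 0, 0, 0]

intercept, slope = fit_simple_ols(credit_score, denied)

# In the LPM, the fitted value is read as the estimated probability of denial.
fitted = [intercept + slope * cs for cs in credit_score]
out_of_range = [p for p in fitted if p < 0 or p > 1]

print("fitted probabilities:", [round(p, 3) for p in fitted])
print("number outside [0, 1]:", len(out_of_range))
```

In this toy sample the fitted "probability" exceeds 1 at the lowest credit score and falls below 0 at the highest, which is exactly the behavior that Logit and Probit rule out by construction.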
While OLS is considered the incorrect functional form for binary choice models, Logit and Probit have problems as well. Their severity depends heavily on the data being analyzed, including its distribution and composition, the sample size, and the degree of variation in the variables. These issues arise frequently in fair lending data in particular and may not always be readily apparent without close examination. The result can be biased parameter estimates, which are particularly problematic in fair lending analysis.
To provide a simple illustration, we generate data as follows:
where denial* is known as a latent variable; this is an unobserved variable that determines whether the loan is denied or not. We generate data from the above model using 3,000 observations, of which 356 (11.9%) are target group (minority) applications. We randomly generate data for credit score (CS) and loan-to-value ratio (LTV) such that the average values are about 600 (CS) and 85 (LTV) for both target and control group applications. The model generates a 45.51% denial rate for target group applicants compared to 34.30% for control group applicants, a disparity of 11.21 percentage points. Table 1 shows estimation results for OLS, Probit, and Logit.
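For readers who want to experiment, a latent-variable data generating process of this general kind can be sketched as follows. This is not the model we used: every coefficient, the standard deviations of CS and LTV, the standard normal error, and the function names are assumptions made up for illustration. Only the means of about 600 and 85 and the 11.9% target share come from the description above.

```python
import random

def simulate_applications(n, target_share, seed=0):
    """Simulate applications from a HYPOTHETICAL latent-variable model."""
    rng = random.Random(seed)
    # Hypothetical coefficients (assumptions, not the values behind Table 1).
    b0, b_target, b_cs, b_ltv = -0.4, 0.3, -0.01, 0.03
    rows = []
    for _ in range(n):
        target = 1 if rng.random() < target_share else 0
        cs = rng.gauss(600, 50)    # credit score, assumed spread
        ltv = rng.gauss(85, 10)    # loan-to-value ratio, assumed spread
        # Latent index: unobserved propensity to be denied.
        latent = (b0 + b_target * target + b_cs * (cs - 600)
                  + b_ltv * (ltv - 85) + rng.gauss(0, 1))
        # Only the sign of the latent variable is observed.
        denied = 1 if latent > 0 else 0
        rows.append({"target": target, "cs": cs, "ltv": ltv, "denied": denied})
    return rows

def group_denial_rate(rows, target):
    """Share of denied applications within one group (target = 1 or 0)."""
    group = [r["denied"] for r in rows if r["target"] == target]
    return sum(group) / len(group)

rows = simulate_applications(3000, 0.119, seed=0)
print("target denial rate: ", round(group_denial_rate(rows, 1), 4))
print("control denial rate:", round(group_denial_rate(rows, 0), 4))
```

Because the target group indicator enters the latent index with a positive coefficient, a disparity in denial rates is built into the simulated data, and a well-behaved estimator should recover it.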
We note a few things when comparing the estimates in Table 1. First, the Logit estimates are approximately 1.8 times larger than the Probit estimates. This is a well-known approximate relationship between the two models, which scale the latent variable differently. However, the t-statistics are very similar. All estimates take the same signs and significance regardless of model.
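The scale difference between Logit and Probit coefficients follows from the error distributions the two models assume: the standard logistic distribution has a larger standard deviation than the standard normal. The short sketch below computes two commonly quoted conversion factors. It is an illustration of the rule of thumb, not a calculation taken from Table 1.

```python
import math

# Standard deviation of the standard logistic distribution (the normal has sd 1).
logistic_sd = math.pi / math.sqrt(3)

# Ratio of the density heights at zero: normal pdf(0) / logistic pdf(0).
normal_pdf_at_zero = 1 / math.sqrt(2 * math.pi)
logistic_pdf_at_zero = 0.25
density_ratio = normal_pdf_at_zero / logistic_pdf_at_zero

print("variance-matching factor:", round(logistic_sd, 4))
print("density-matching factor: ", round(density_ratio, 4))
```

The two factors come out at roughly 1.8 and 1.6, which is why the ratio of Logit to Probit coefficients in applied work usually lands somewhere in that range rather than at one exact value.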
The OLS coefficients are different, but we note that in OLS the coefficients are the derivatives (dY/dX), or marginal effects. That is, the OLS target group coefficient indicates that these applicants are 12.55 percentage points more likely to be denied (close to the actual disparity of 11.21 percentage points). The coefficients for the Probit and Logit models have no such direct interpretation. Thus, to compare across models we calculate the marginal effects for each model, reported in Table 2.
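As a rough sketch of how a Logit or Probit coefficient is converted to a marginal effect, the derivative of the predicted probability with respect to a regressor is the coefficient times the density of the assumed distribution evaluated at the index x'b. The coefficient values and the index below are hypothetical and are not taken from Table 1 or Table 2. For a dummy variable such as the target group indicator, software often reports the discrete change in predicted probability instead of this derivative.

```python
import math

def logistic_cdf(z):
    return 1 / (1 + math.exp(-z))

def normal_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def logit_marginal_effect(beta, index):
    """dP/dx for Logit: beta * L(index) * (1 - L(index))."""
    p = logistic_cdf(index)
    return beta * p * (1 - p)

def probit_marginal_effect(beta, index):
    """dP/dx for Probit: beta * standard normal density at the index."""
    return beta * normal_pdf(index)

# Hypothetical coefficients whose ratio is 1.6, evaluated at an index of 0
# (a predicted probability of 0.5 in both models).
beta_logit, beta_probit = 0.48, 0.30
me_logit = logit_marginal_effect(beta_logit, 0.0)
me_probit = probit_marginal_effect(beta_probit, 0.0)

print("Logit marginal effect: ", round(me_logit, 4))
print("Probit marginal effect:", round(me_probit, 4))
```

Although the raw coefficients differ by a factor of 1.6 in this example, the implied marginal effects are both close to 0.12, which is why marginal effects are the natural scale for comparing the models with OLS.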
We can see in Table 2 that once we convert to marginal effects, the various models are nearly identical. Thus, despite the problems with OLS (heteroskedasticity and predicted probabilities outside the allowable range), the models agree closely in terms of marginal effects and parameter significance. All three estimation methods appear to work well with these data.
However, Logit and Probit results may be highly sensitive to rare occurrences and to unbalanced or small samples, a problem that can be exacerbated when the model includes dummy variables. To illustrate this, we examine the effect of sample size by reducing the sample to 100 observations, of which 12 are target group applications. In this example, 6 of 12 target group applicants (50%) are denied compared to 36 of 88 control group applicants (40.9%), a disparity of 9.1 percentage points. Table 3 shows the parameter estimates for this sample size.
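The raw disparity quoted for the small sample can be checked directly from the counts. The helper name `denial_rate` is ours and is used only for this illustration.

```python
def denial_rate(denied, total):
    """Share of applications denied."""
    return denied / total

target_rate = denial_rate(6, 12)     # target group: 6 of 12 denied
control_rate = denial_rate(36, 88)   # control group: 36 of 88 denied

# Disparity expressed in percentage points.
disparity_pp = (target_rate - control_rate) * 100

print("target rate: ", round(target_rate * 100, 1))
print("control rate:", round(control_rate * 100, 1))
print("disparity (percentage points):", round(disparity_pp, 1))
```

With only 12 target group applications, a single additional denial or approval moves the target group rate by more than 8 percentage points, which gives a sense of how little information the small sample carries.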
We note here that the results for the Probit and Logit models are unreliable in this case. While the target group coefficients are very large, the standard errors and z-statistics are not reported (NR) in the output. In addition, the CS coefficients, despite being very large, are highly insignificant. However, the OLS model still generates reasonable results. Table 4 gives estimated marginal effects.
In examining the marginal effects, we again note that Probit and Logit perform poorly. The estimated marginal effects are 0 for all parameters. On the other hand, the OLS marginal effects are very similar to those when N = 3,000. The OLS model indicates that discrimination exists, which it does by construction in these data, although it overestimates the effect. Logit and Probit, however, indicate no effect. Small sample sizes lead to less precise estimation in OLS due to large standard errors, but the Probit and Logit estimators are completely unreliable in this case.
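We have not diagnosed here what drives the failure in this particular sample. One well-known way a Logit fit can break down in small samples, offered only as an assumption about the kind of mechanism involved, is separation: when a regressor splits the outcomes perfectly, the likelihood keeps improving as the coefficient grows, so no finite estimate exists. The sketch below uses hypothetical toy data and a bare-bones gradient ascent of our own (not the routine of any statistical package) to show the difference between separated and overlapping data.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)

def fit_logit_slope(x, y, steps, learning_rate=0.5):
    """Gradient ascent on the Logit log-likelihood, one slope, no intercept."""
    beta = 0.0
    for _ in range(steps):
        gradient = sum((yi - sigmoid(beta * xi)) * xi for xi, yi in zip(x, y))
        beta += learning_rate * gradient
    return beta

x = [-2, -1, 1, 2]
y_separated = [0, 0, 1, 1]     # x perfectly predicts y
y_overlapping = [0, 1, 0, 1]   # outcomes overlap across x

# With separation the estimate keeps drifting upward as iterations increase.
sep_short = fit_logit_slope(x, y_separated, steps=100)
sep_long = fit_logit_slope(x, y_separated, steps=10000)

# With overlap the estimate settles at a finite value.
ovl_short = fit_logit_slope(x, y_overlapping, steps=100)
ovl_long = fit_logit_slope(x, y_overlapping, steps=10000)

print("separated:  ", round(sep_short, 3), "->", round(sep_long, 3))
print("overlapping:", round(ovl_short, 3), "->", round(ovl_long, 3))
```

In the separated case the coefficient never stops growing, which in practice shows up as very large estimates with missing or enormous standard errors. OLS has a closed-form solution and does not suffer from this particular problem.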
The Logit and Probit models have become the standards in fair lending analysis of underwriting. However, serious estimation problems may arise in both, particularly with fair lending data. In practice these problems may not be as obvious as in our example above and, therefore, may not always be readily detectable.
Be watching for our upcoming comprehensive white paper on underwriting analysis for fair lending where we address these and other issues in more detail.