When conducting regression analysis for fair lending, it is important to bear in mind that models must be fit to the data.
This means the data itself often imposes limitations, which in turn affect how the regression model can and should be specified. The analysis must consider the distribution of the data, including its degree of variation, as well as sample size and composition.
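One practical way to act on this before specifying a model is to profile the data by group. The sketch below is a minimal, hypothetical illustration using only the Python standard library. The function name `profile_by_group`, the field names, and the 30-observation threshold are assumptions for illustration, not a regulatory standard. It reports sample size, missing values, and the spread of each candidate variable within each group, and flags variables that do not vary within a group.

```python
from statistics import pstdev
from typing import Dict, List, Tuple


def profile_by_group(
    records: List[Dict[str, object]],
    group_key: str,
    variables: List[str],
    min_n: int = 30,       # assumed rule of thumb, not a regulatory threshold
    min_sd: float = 1e-9,  # spread at or below this is treated as "no variation"
) -> Tuple[Dict[object, Dict[str, object]], List[str]]:
    """Summarize sample size, missingness, and spread of each candidate
    variable within each group, and return human-readable warnings."""
    groups: Dict[object, List[Dict[str, object]]] = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec)

    summary: Dict[object, Dict[str, object]] = {}
    warnings: List[str] = []
    for g, rows in sorted(groups.items(), key=lambda kv: str(kv[0])):
        summary[g] = {"n": len(rows)}
        if len(rows) < min_n:
            warnings.append(f"group {g!r}: only {len(rows)} observations")
        for var in variables:
            # Absent keys and None are both counted as missing values.
            values = [r[var] for r in rows if r.get(var) is not None]
            missing = len(rows) - len(values)
            sd = pstdev(values) if values else 0.0
            summary[g][var] = {"missing": missing, "sd": sd}
            if missing:
                warnings.append(f"group {g!r}, {var}: {missing} missing value(s)")
            if sd <= min_sd:
                warnings.append(f"group {g!r}, {var}: no variation")
    return summary, warnings


if __name__ == "__main__":
    # Synthetic records: every protected-class applicant has a 360-month term.
    data = (
        [{"protected": 1, "loan_term": 360, "credit_score": 700 + i} for i in range(12)]
        + [{"protected": 0, "loan_term": 180 if i % 3 else 360, "credit_score": 690 + i}
           for i in range(40)]
    )
    _, notes = profile_by_group(data, "protected", ["loan_term", "credit_score"])
    for note in notes:
        print(note)
```

A variable that is constant within one group can only be informed by the other groups' data, which is the kind of compositional issue that should shape the specification rather than be discovered after the fact.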
The financial industry has something of an infatuation with the terms “model” and “modeling,” which cut across many facets of the business: not only regulatory compliance but also asset quality, loan pricing, and marketing, among others. “Model” has become a catch-all, but quantitative and statistical methods are not universal; they are specific to each application. How a model is constructed may also differ from one data set to the next in order to provide an accurate and valid analysis.
Diving In
As is the case with every discipline, econometrics has become highly specialized. Over the last several decades, an array of techniques has been developed that can be applied to a host of analytical questions or problems. Many of these bring their own complexities and assumptions, and what is appropriate can vary widely with the objective of the analysis.
As an example, one conception of a model is a set of estimated parameters that are then applied to predict future events or behavior. In such cases the model is static, and new data are “run through” it to forecast or otherwise quantify future events. Forecasting is a specialization with its own set of techniques and assumptions, most of which do not apply to fair lending.
The objective here should be to answer a simple question: Are there differences in the treatment of applicants based on protected versus non-protected class status? This question most likely cannot be answered properly with a static “model,” because the composition of the data varies from one data set to the next.
That is why it is important to understand that any model must be fit to the data. Rigidly forcing data through a fixed statistical model for fair lending analysis, without considering the distribution and composition of the data, can be inappropriate.
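As a minimal sketch of how that question is often framed, an outcome such as a note rate can be regressed on a protected-class indicator plus a control, and the indicator's coefficient read as the estimated difference. The `ols` helper below is hypothetical and uses only the standard library; the credit-score control and the data are synthetic illustrations, not a recommended specification. The error it raises when a column has no variation or is collinear with others is one concrete case of a model that cannot be fit to the data it is given.

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares by solving the normal equations (X'X) b = X'y.

    X must already contain a column of ones for the intercept. The system is
    solved with Gauss-Jordan elimination and partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            # X'X is singular: some column has no variation or duplicates
            # a combination of others, so the data cannot support it.
            raise ValueError("X'X is singular: a column has no variation or is collinear")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


if __name__ == "__main__":
    # Synthetic illustration only: rates are generated from a known rule
    # (a 0.15-point gap for the flagged group), not from real loan data.
    applicants = [(640, 1), (660, 0), (700, 1), (710, 0),
                  (745, 1), (760, 0), (790, 1), (800, 0)]
    X = [[1.0, (score - 700) / 100, flag] for score, flag in applicants]
    y = [4.6 - 0.8 * x[1] + 0.15 * x[2] for x in X]
    intercept, score_effect, protected_effect = ols(X, y)
    print(f"estimated protected-class gap: {protected_effect:.2f} points")
```

In practice a statistics package would be used instead, along with the standard errors and diagnostics this sketch omits.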
While it is always preferable to have complete data and include all relevant factors in a regression model, this is often not feasible or practical, particularly in fair lending analyses. Moreover, a model is merely a simplified representation of reality designed to provide information about an unobservable population. These limitations must be understood when interpreting regression results.
The current regulatory emphasis with regard to regression for fair lending is generally threefold:
- Only specific policy-related factors should be included in a regression.
- The model should contain all of these factors.
- These factors should be as precise and granular as possible.
Issues With the Approach
While this is a reasonable approach, problems arise when both the sample distribution and the availability of data impose limitations. Loan origination systems were not designed with regression analysis in mind, so data availability and accuracy are always potential issues.
When data is unavailable or of questionable accuracy, this must be accounted for in the model, taken into account when interpreting results, or both. These issues are common in fair lending work and must be understood and addressed accordingly.
The second issue, which arises even when complete data is available, is the composition of the sample itself. For example, policy factors may specify precise quantitative benchmarks such as credit score, loan amount, or loan term.
It may be easy to define model variables that mirror these benchmarks exactly, but doing so may not be appropriate for the data set. A policy with many credit score tiers, for instance, can leave some tiers with few or no applicants from one group, so the corresponding coefficients are poorly estimated or cannot be estimated at all. We have covered this topic in previous posts (see Further Reading below), but it makes little sense to include a large number of variables simply because they are based on policy, rather than specifying a model that best measures the true relationships in the data.
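The tier-sparsity problem can be checked directly. The following is a minimal sketch under stated assumptions: the helper names `tier_counts` and `collapse_sparse_tiers`, the minimum of 5 observations per tier and group, and the cutoffs are all hypothetical choices for illustration. It assigns credit scores to policy tiers and repeatedly merges the lowest tier that is too thin for any group into an adjacent tier.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def tier_counts(scores: Sequence[float], groups: Sequence[object],
                cutoffs: Sequence[float]) -> Counter:
    """Count observations per (tier, group). Tier t covers scores in
    [cutoffs[t-1], cutoffs[t]); tier 0 is everything below the first cutoff."""
    counts: Counter = Counter()
    for s, g in zip(scores, groups):
        counts[(bisect_right(cutoffs, s), g)] += 1
    return counts


def collapse_sparse_tiers(scores: Sequence[float], groups: Sequence[object],
                          cutoffs: Sequence[float], min_cell: int = 5) -> List[float]:
    """Drop cutoffs until every tier has at least `min_cell` observations in
    every group; returns the surviving cutoffs (possibly an empty list)."""
    cuts = sorted(cutoffs)  # work on a copy; the caller's list is untouched
    labels = set(groups)
    while cuts:
        counts = tier_counts(scores, groups, cuts)
        sparse = [t for t in range(len(cuts) + 1)
                  if any(counts[(t, g)] < min_cell for g in labels)]
        if not sparse:
            break
        t = sparse[0]
        # Merge the lowest sparse tier into the tier above it by dropping its
        # upper cutoff; the top tier has no upper cutoff, so merge it downward.
        del cuts[t if t < len(cuts) else t - 1]
    return cuts


if __name__ == "__main__":
    # Synthetic applicants: the protected group is thin below 620.
    scores = [600] * 6 + [650] * 6 + [700] * 6 + [750] * 6 \
        + [600] * 2 + [650] * 6 + [700] * 6 + [750] * 6
    groups = ["non-protected"] * 24 + ["protected"] * 20
    print(collapse_sparse_tiers(scores, groups, [620, 680, 720]))  # → [680, 720]
```

Merging toward the adjacent tier keeps the remaining variables aligned with policy boundaries while trading some granularity for coefficients the data can actually support.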
Ultimately, when it comes to fair lending, it is in everyone’s best interest to arrive at the correct answer. That should always be the goal, and sometimes less is more in meeting it.
Further Reading:
- Modeling Rate Sheet Variables for Fair Lending Regression Analysis
- Model Specification Issues in Fair Lending Regression Analysis
- Issues in Regressions Modeling for Fair Lending Underwriting Analysis (Part 2)
How to cite this blog post (APA Style):
Premier Insights. (2018, March 22). Model Specification for Fair Lending: When Less is More [Blog post]. Retrieved from https://www.premierinsights.com/model-specification-for-fair-lending-when-less-is-more