
Data Analysis Assignment I February 16, 2013

Title
Interest Rate charged for a loan is strongly dependent on the borrower's FICO Rating and the Amount Requested

Introduction
Debt is a form of credit that is essential not only for industry and business but also for individuals who need short-term loans, say for paying credit card bills, or long-term loans, say for paying college tuition. Like all forms of credit, debt has a cost. This cost is the interest paid on the sum borrowed, expressed as the interest rate. Lending Club targets the individual borrower who needs short-term credit for exigencies such as debt consolidation or payment of credit card bills. Based on a peer-to-peer model of credit, Lending Club lists borrowers' loan requests (ranging from $1,000 to $35,000), which other members can fund in part or in full, but only at a predetermined interest rate. Lending Club arrives at the interest rate using the borrower's credit rating, credit history, desired loan amount and debt-to-income ratio. [1]

From a borrower's point of view, understanding how the interest rate depends on the FICO rating, the amount requested, the credit history, etc. allows an informed judgement of the likely cost of funding. This could help a borrower budget resources and choose between alternative sources of funding. Our study is an exploratory analysis of such relationships, and we conclude that, given the amount of funds requested and the FICO rating, the likely interest rate can be predicted with reasonable accuracy.

Methods
Data Collection
The loan data were collected from the Lending Club website, where they are shared under the company's transparency norms so that members may analyse the data on their own. [3] There are 2500 rows of observations on 14 variables.

Exploratory Analysis
As part of the exploratory analysis, we performed the following steps:
1. identify missing values
2. produce exploratory graphs and tables of the variables to understand their distributions
3. get a better idea of the ranges of the variables and the presence of unusual values
4. determine the transformations needed to convert the raw data into a form that is easy to use
5. create simple plots between variables to understand the trends in their relationships
6. create exploratory models by adding terms one at a time to the basic model relating interest rates to amount requested and FICO scores
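
Two of the steps above, identifying missing values and transforming raw fields, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each raw record is a dictionary in which the interest rate is stored as text such as "8.90%" and the FICO range as text such as "735-739". The field names, the sample values and the helper names `is_complete`, `parse_rate` and `fico_lower_bound` are illustrative assumptions, not taken from the Lending Club file.

```python
import re

# Hypothetical sample rows; field names and values are illustrative only.
rows = [
    {"Interest.Rate": "8.90%", "FICO.Range": "735-739", "Amount.Requested": "20000"},
    {"Interest.Rate": "12.12%", "FICO.Range": "715-719", "Amount.Requested": "19200"},
    {"Interest.Rate": "21.98%", "FICO.Range": "690-694", "Amount.Requested": None},
]


def is_complete(row):
    """Return True when no field in the row is missing (None or empty)."""
    return all(value not in (None, "") for value in row.values())


def parse_rate(text):
    """Convert a percentage string such as '8.90%' to the float 8.9."""
    return float(text.strip().rstrip("%"))


def fico_lower_bound(text):
    """Extract the lower bound from a range string such as '735-739'."""
    match = re.match(r"\s*(\d+)\s*-\s*(\d+)\s*$", text)
    if match is None:
        raise ValueError(f"unrecognised FICO range: {text!r}")
    return int(match.group(1))


# Drop incomplete rows, then add cleaned numeric columns to the rest.
clean = []
for row in rows:
    if not is_complete(row):
        continue
    cleaned = dict(row)
    cleaned["rate"] = parse_rate(row["Interest.Rate"])
    cleaned["fico"] = fico_lower_bound(row["FICO.Range"])
    clean.append(cleaned)

print(len(rows) - len(clean), "incomplete row(s) dropped")
for row in clean:
    print(row["rate"], row["fico"])
```

Keeping the original text fields alongside the new numeric ones makes it easy to spot-check the transformation against the raw values.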

Statistical Modelling
The model is built using basic multivariate linear regression. Intuitively, the interest rate should increase as the amount requested increases (the risk of losing a larger sum is greater) and as the credit rating worsens (the likelihood of the entire sum being repaid falls). Both the exploratory analysis and this intuition about the nature of the variables guide the model. The coefficients were estimated by Ordinary Least Squares (OLS).
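
As a rough illustration of OLS estimation, the sketch below fits a linear model with two predictors by solving the normal equations (X'X)b = X'y in plain Python. The data are a small made-up, noise-free example (not the loan data), and the function names `solve` and `ols` are hypothetical. A real analysis would normally rely on a statistics package rather than hand-written linear algebra.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations apply to both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictors, response):
    """Return OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    return solve(xtx, xty)


# Made-up toy data generated from y = 2 + 3*x1 - 1*x2 with no noise.
predictors = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
response = [2 + 3 * x1 - 1 * x2 for x1, x2 in predictors]

coefficients = ols(predictors, response)
print([round(c, 6) for c in coefficients])
```

Because the toy data contain no noise, the fitted coefficients should recover the generating values up to floating-point error; with real data the residuals would be non-zero and worth plotting.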

Results
The downloaded loan data provide information on the following variables:
a. Interest rate asked on the loan as a percentage (5.42% to 24.89%).
b. Amount of loan requested in dollars ($1,000 to $35,000).
c. FICO Credit Rating, given as ranges of values between 640 and 830. The best-known and most widely used credit score model in the United States, the FICO score is calculated statistically, with information from a consumer's credit files. The score is sold by the FICO Company. Higher FICO scores mean a better credit rating and hence a lower interest rate. [2]
d. Loan Length, the length of time for which the loan is sought, either 36 months or 60 months.
e. Revolving Credit Balance, the total amount outstanding on a borrower's credit cards. It is a measure of the borrower's ability to pay back revolving credit obligations. A larger outstanding amount of revolving credit adversely affects one's credit rating. The values are in dollars and range from 5586 to 270800. [4]
f. Number of inquiries in the last six months, ranging from 0 to 9. Each inquiry made when a borrower applies for new credit counts towards the number of credit inquiries. A high number of credit inquiries often leads to a poorer credit score. [5]

The other variables do not add significant discrimination to our final model to merit separate explanation.

Initially, I checked for rows with missing values. There were 2 such rows. Since the complete cases overwhelmingly outnumber the incomplete cases, I dropped these 2 rows and proceeded with 2498 rows of observations. By tabulating the values, I checked that all values are within range and that none are unusual. The histograms of interest rates and amounts requested showed no specific pattern.

Preliminary data cleaning involved the following steps:
a. Interest Rate was converted into a numeric field by stripping the % sign from the value.
b. The FICO Range was given as a range, with a FICO Range Lower Bound and a FICO Range Upper Bound.
Using regular expressions, I extracted the FICO Range Lower Bound and included it as the 15th variable in my dataset. I assume that, since the range is very small, the FICO Range Lower Bound is a good approximation of the borrower's FICO Rating.

I fitted a multivariate linear regression model to approximate the relationship between the interest rate charged and the amount requested as well as the borrower's FICO rating. The residuals plotted against interest rate showed regular patterns, which I gradually removed by adding more confounders to my model. The final model was of the form:

Interest Rate = d0 + d1 * (Amount Requested) + d2 * (FICO Rating) + f(Revolving Credit Balance) + g(Loan Length) + h(Number of Inquiries in the last 6 months) + e
In the above model, d0 is an intercept term; d1 is the change in interest rate when the amount requested goes up by $1, every other variable remaining the same; d2 is the change in interest rate when the FICO Rating goes up by one unit, every other variable remaining the same; f, g and h are functions of Revolving Credit Balance, Loan Length and Number of Credit Inquiries respectively. The functions f, g and h essentially divide their respective variables into 5 levels. The error term e captures all unmeasured and un-modelled sources of random variation in the value of the Interest Rate.

The relationship is highly statistically significant. The p-value is less than 2.2e-16 and the adjusted R-squared is 0.761, so the model explains about 76% of the variation in Interest Rate in the given data. In my model, d0 = 74.22 (95% confidence interval 72.45, 75.98), d1 = 1.582e-04 (95% confidence interval 1.456e-04, 1.707e-04) and d2 = -8.955e-02 (95% confidence interval -9.19e-02, -8.71e-02). For an increase in the amount requested of $1,000 together with an increase in FICO rating of 100, the interest rate would decrease by 8.79 percentage points (95% confidence interval: a decrease of between 8.542 and 9.049 percentage points), everything else staying the same. At the end of the model building, I could discern no further regular pattern in the residuals that could be significantly explained by the other variables.
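
The combined effect quoted above can be checked with a few lines of arithmetic. The sketch below uses only the rounded point estimates reported in this section; the variable names are illustrative. Because the inputs are rounded, the result may differ from the reported 8.79 in the last digit.

```python
# Point estimates as reported (rounded) in the text above.
d1 = 1.582e-04   # change in interest rate per $1 of amount requested
d2 = -8.955e-02  # change in interest rate per 1 point of FICO rating

amount_increase = 1000  # dollars
fico_increase = 100     # FICO points

# With every other variable held fixed, the two linear effects simply add.
amount_effect = d1 * amount_increase
fico_effect = d2 * fico_increase
combined_effect = amount_effect + fico_effect

print(f"amount effect:   {amount_effect:+.4f}")
print(f"FICO effect:     {fico_effect:+.4f}")
print(f"combined effect: {combined_effect:+.4f}")
```

The FICO term dominates: on these estimates, a 100-point improvement in FICO rating outweighs the effect of an extra $1,000 requested by a wide margin.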

Conclusion
My analysis shows a strong, statistically significant relationship between Interest Rates and both the Amount Requested and the FICO Credit Rating of the borrower. This relationship is linear, and its coefficients have been estimated using OLS. The confounders have strong relationships with both Interest Rates and FICO Ratings, and they have been included in the model. Including the confounders leads to a much better model fit. My analysis is merely an exploratory analysis of the relationship. It may help individual borrowers who are seeking loans to anticipate the interest rates that they would have to pay. However, my model is not a predictive model. It has not been tested on an independent test sample to estimate how well it predicts the interest rates charged. Hence, financial experts who decide the interest rates to be charged on a loan need to be wary of using this model. A more thoroughly tested predictive model would better serve their purpose.

References
1. http://en.wikipedia.org/wiki/Lending_Club
2. http://en.wikipedia.org/wiki/Credit_score_in_the_United_States
3. http://www.lendingclub.com/public/transparency.action
4. http://www.ehow.com/about_7550001_revolving-credit-balance.html
5. http://www.myfico.com/crediteducation/creditinquiries.aspx
