
Corporate and government bonds are two popular credit market investments. A benefit of investing in bonds over P2P-loans is the established body of theory explaining bond behavior and structure. The bond structure closest to a P2P-loan is the fixed-coupon bond.

A fixed-coupon bond pays a fixed coupon at regular intervals until the bond reaches maturity. At maturity, debtors pay investors the bond's principal (Cochrane, 2005). A bond's theoretical price can be calculated as the present value of all cash flows received by the owner of the bond (Hull, 2012). The bond's price is written as:

$$P_i = \sum_{t=1}^{T} \frac{\text{Coupon}}{(1+r)^t} + \frac{\text{Principal}}{(1+r)^T} \qquad (15)$$

T represents the bond's maturity date, r is the interest rate, and t is the time period.

Most bonds do not sell at par value but, conditional on no default, will mature at par value (Kane et al., 2014). The return of the bond is determined by several characteristics: the price of the bond, the time to maturity, and the coupon rate. The bond's yield is often used by investors to calculate the bond's return. The yield for a given bond price can be found by taking Equation 15 and solving for the interest rate. Thus, there is a negative relationship between the bond yield and the bond price. Further, the yield of a bond can be viewed as the interest rate that explains the market price of the bond (Cochrane, 2005).
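To illustrate, a minimal R sketch of Equation 15 prices a hypothetical bond and backs out its yield numerically; the figures and the helper name are illustrative assumptions, not from the thesis:

```r
# Price a hypothetical 10-year bond paying a 50 coupon on a 1000 principal
# (Equation 15), then recover the yield as the rate that reproduces the price.
bond_price <- function(coupon, principal, r, maturity) {
  sum(coupon / (1 + r)^(1:maturity)) + principal / (1 + r)^maturity
}

price <- bond_price(coupon = 50, principal = 1000, r = 0.06, maturity = 10)  # ~926.40

# Solve Equation 15 for r given the market price; uniroot() finds the
# rate at which the theoretical and observed prices coincide.
yield <- uniroot(function(r) bond_price(50, 1000, r, 10) - price,
                 interval = c(1e-6, 1))$root  # recovers 0.06
```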

Additionally, the bond market has a negative relationship with the interest rate. When interest rates rise, the opportunity cost of holding a bond increases, causing the bond's price to fall, and the bond sells at a discount. The opposite follows for decreases in interest rates: bonds then sell at a premium (Kane et al., 2014). The decrease in price occurs because the fixed coupons cannot compensate for the higher interest rates (Jordan & Sundaresan, 2009). As a result, the bond is worth less than the price paid for it, and the holder's realized return falls. Thus, one can see how bond performance is directly related to macroeconomic factors and monetary policy.

These changes in the interest rate have a more significant impact on bonds with longer times to maturity, because such bonds are more sensitive to price fluctuations arising from interest rate changes (Kane et al., 2014). By the same rationale, bonds with the shortest maturities are the least risky. Again, this follows from the relationship between the variables in Equation 15.
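A small numerical check, again with assumed figures, illustrates the maturity effect: two bonds with a 5% coupon both trade at par when rates are 5%, but after a one-percentage-point rate increase the long bond loses far more value.

```r
# Reprice a 2-year and a 20-year par bond after rates rise from 5% to 6%.
bond_price <- function(coupon, principal, r, maturity) {
  sum(coupon / (1 + r)^(1:maturity)) + principal / (1 + r)^maturity
}

(bond_price(50, 1000, 0.06, maturity = 2)  - 1000) / 1000   # ~ -1.8%
(bond_price(50, 1000, 0.06, maturity = 20) - 1000) / 1000   # ~ -11.5%
```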

7 Methodology

The methodology is designed to address our research question of whether P2P-lending is a relevant asset class for investors. A natural way to approach this question is to compare the expected returns of P2P-lending to those of other investment opportunities. To answer the research question feasibly, the methodology is divided into four parts. First, we need to identify the risks associated with P2P-lending. Since loan default is considered the biggest risk of P2P-loans, Part I of our methodology aims to find the determinants of loan default. Considering that macroeconomic conditions play a significant role in the performance of traditional investments, we want to rule out that the state of the economy is an omitted variable in the performance of P2P-loans. Hence, Part II adds macroeconomic variables to the model. Part III calculates the expected return that a lender should foresee from a P2P-loan investment, and takes the Sharpe ratio as the basis for comparison between the different investment alternatives. Lastly, Part IV compares the risk classification of LendingClub's credit grades to those provided by Moody's.

7.1 Part I - Determinants of Default

The determinants of loan default are analyzed using logistic regression methods. The first step of building a logistic regression model is to split the data into a training set and a testing set. The model is first optimized on the training set; afterward, the fitted model is evaluated on the testing set to ensure it generalizes. This last step validates the model and detects over-fitting. 70% of the data is used to train the model, and 30% is used to test it. The two datasets are drawn randomly to guard the sample against bias (Panzeri, Magri, & Carraro, 2010).
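A minimal sketch of this split in R, assuming the cleaned LendingClub data is held in a data frame called loans (the object names are ours):

```r
set.seed(42)  # reproducible random draw

n         <- nrow(loans)                       # 'loans': full dataset
train_idx <- sample(n, size = round(0.7 * n))  # random 70% of row indices
train     <- loans[train_idx, ]                # training set (70%)
test      <- loans[-train_idx, ]               # testing set (30%)
```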

To ensure our results are not impaired by errors arising from an imbalanced dataset, we re-balance the data. Four re-balancing methods are carried out: oversampling, undersampling, a combination of both, and the R-function ROSE. The Random Over-Sampling Examples (ROSE) function produces a synthetic, balanced dataset using a smoothed bootstrap approach (Tantithamthavorn, Hassan, & Matsumoto, 2018). Undersampling and oversampling are two opposing approaches to balancing data. When undersampling, one deletes random observations from the majority class to match the number of observations in the minority class. Oversampling, on the other hand, generates random, artificial observations that replicate the characteristics of the minority class (Badr, 2019). The last approach balances the data by combining undersampling and oversampling (Tantithamthavorn et al., 2018).
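The four schemes can be sketched with the ROSE package; the train data frame and the binary LoanStatus column are assumptions carried over from the split above, and the default options target a roughly 50/50 class balance:

```r
library(ROSE)

# Oversampling, undersampling, and their combination via ovun.sample();
# ROSE() draws a synthetic balanced sample via a smoothed bootstrap.
over  <- ovun.sample(LoanStatus ~ ., data = train, method = "over")$data
under <- ovun.sample(LoanStatus ~ ., data = train, method = "under")$data
both  <- ovun.sample(LoanStatus ~ ., data = train, method = "both")$data
rose  <- ROSE(LoanStatus ~ ., data = train)$data
```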

In order to find the best-fitted model, we use several methods for model specification. Namely, Lasso regression, stepwise selection using AIC, and F-tests together with standard t-tests are compared. These approaches are performed independently. The Lasso (Least Absolute Shrinkage and Selection Operator) is a powerful feature selection technique: a regularization method that reduces overfitting by shrinking coefficients and removing variables that contribute little to the model. The other automated selection method used is stepwise selection. Like the Lasso, it attempts to remove insignificant variables from the model. To ensure that we have the best-fitted model, we run stepwise AIC selection in both directions. Running both directions means the procedure starts from a null or full model and tests the addition or deletion of each variable against the chosen criterion; the variable whose inclusion (exclusion) gives the largest improvement of the model is then added (deleted). The last approach checks the significance of the independent variables using F- and t-test statistics, and is a standard approach to finding a correctly specified model.
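A sketch of the two automated selection approaches in R, assuming the glmnet and MASS packages and the train data frame from above:

```r
library(glmnet)  # Lasso
library(MASS)    # stepAIC

# Lasso: cross-validated L1 penalty; variables whose coefficients are
# shrunk exactly to zero drop out of the model.
x <- model.matrix(LoanStatus ~ ., data = train)[, -1]  # dummy-encoded predictors
y <- train$LoanStatus
lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1)
coef(lasso, s = "lambda.min")  # non-zero rows = selected variables

# Stepwise selection by AIC, run "both ways": terms are added or dropped
# one at a time, whichever move improves (lowers) the AIC the most.
full_model <- glm(LoanStatus ~ ., data = train, family = binomial)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
```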

As our base model and starting point, we use the following logistic model:

$$\begin{aligned}
LoanStatus = \alpha &+ \beta_1\,loan.amnt + \beta_2\,grade + \beta_3\,home.ownership + \beta_4\,annual.inc \\
&+ \beta_5\,verification.status + \beta_6\,purpose + \beta_7\,dti + \beta_8\,delinq.2yrs \\
&+ \beta_9\,years.of.credit.history + \beta_{10}\,inq.last.6mths + \beta_{11}\,open.acc \\
&+ \beta_{12}\,pub.rec + \beta_{13}\,revol.bal + \beta_{14}\,total.acc + \beta_{15}\,initial.list.status \\
&+ \beta_{16}\,acc.now.delinq + \beta_{17}\,chargeoff.within.12.mths + \beta_{18}\,delinq.amnt \\
&+ \beta_{19}\,tax.liens + \beta_{20}\,debt.settlement.flag + \beta_{21}\,emp.length + \varepsilon \qquad (16)
\end{aligned}$$

Six of the variables in the above equation are categorical. Since logistic regression models require numerical inputs, the model converts each categorical variable into numeric variables by employing dummy variables. After creating dummy variables for each category, we are left with 42 variables in our model.
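In R, glm() performs this dummy expansion internally via model.matrix(); a quick way to inspect the resulting design (object names assumed from above) is:

```r
# Each factor expands into one dummy column per level beyond the reference.
X <- model.matrix(LoanStatus ~ ., data = train)
ncol(X) - 1  # predictor count after dummy expansion (42 in our data)
```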

The base regression is first run on the training set of each of the four balanced datasets and on the imbalanced dataset. For the standard test-statistics approach, variables insignificant at the 5% level are removed. The models are then re-run without the insignificant variables to see whether their exclusion improves the model's fit.
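As a minimal sketch of this pruning step (the fitted object base_model is assumed, and the dropped variables are purely hypothetical examples):

```r
# Per-coefficient p-values of the fitted base model (Equation 16)
coef(summary(base_model))[, "Pr(>|z|)"]

# Refit without variables found insignificant at the 5% level
# (pub.rec and revol.bal are hypothetical examples here)
reduced <- update(base_model, . ~ . - pub.rec - revol.bal)
AIC(base_model, reduced)  # compare fit before and after the exclusion
```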

The next step after fitting the models is to see how they perform when predicting Loan Status on a new dataset. Validating the models on our test data checks their robustness. Across data samples and specification methods, we have a total of 15 models to compare.

There are several criteria to consider when determining which model best fits the data. The performance of each model is compared using the performance measures outlined in our Theoretical Framework. Because we perform the analysis from the perspective of an investor who is mainly concerned about the credit risk of their investments, we argue that it is more important to find a model that is strong at correctly predicting defaulted loans, and less critical that the model falsely predicts non-defaulted loans as defaulted. Therefore, when analyzing the different models, we place greater emphasis on the ROC curve and the sensitivity of our models than on the accuracy.
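A sketch of this out-of-sample evaluation using the pROC and caret packages; the fitted model, the test set, and the "Default"/"NonDefault" factor levels of LoanStatus are our assumptions:

```r
library(pROC)   # ROC curve and AUC
library(caret)  # confusion matrix, sensitivity

# Predicted default probabilities for the held-out test set
prob <- predict(model, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "Default", "NonDefault"),
               levels = c("Default", "NonDefault"))

auc(roc(test$LoanStatus, prob))  # area under the ROC curve

# Sensitivity: the share of defaulted loans the model catches
confusionMatrix(pred, test$LoanStatus,
                positive = "Default")$byClass["Sensitivity"]
```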

The comparative results and the motivation for our model choice are provided as part of our findings. The final regression model of Part I, Model 1, is defined as follows:

$$\begin{aligned}
LoanStatus = \alpha &+ \beta_1\,loan.amnt + \beta_2\,grade + \beta_3\,home.ownership + \beta_4\,annual.inc \\
&+ \beta_5\,verification.status + \beta_6\,purpose + \beta_7\,dti + \beta_8\,delinq.2yrs \\
&+ \beta_9\,years.of.credit.history + \beta_{10}\,inq.last.6mths + \beta_{11}\,open.acc \\
&+ \beta_{12}\,pub.rec + \beta_{13}\,revol.bal + \beta_{14}\,total.acc + \beta_{15}\,initial.list.status \\
&+ \beta_{16}\,acc.now.delinq + \beta_{17}\,delinq.amnt + \beta_{18}\,tax.liens \\
&+ \beta_{19}\,emp.length + \varepsilon \qquad (17)
\end{aligned}$$

To gain further understanding of which variables determine default, we analyze the variable importance of each independent variable in Model 1. This is done by evaluating the absolute value of the t-statistic for each coefficient.

$$t_{\hat{\beta}} = \frac{\hat{\beta}_1 - \beta_0}{se(\hat{\beta}_1)} \qquad (18)$$

These measures do not quantify the effect of a coefficient on the dependent variable, but they give an aggregate measure of each independent variable's relative importance for the dependent variable.
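For a fitted glm, caret's varImp() reports exactly this absolute t- (here z-) statistic per coefficient; a minimal sketch, with model_1 as an assumed name for the fitted Model 1:

```r
library(caret)

# varImp() on a glm ranks predictors by |t|, i.e. the measure in Equation 18.
imp <- varImp(model_1)                    # 'model_1': fitted Model 1 (assumed name)
imp[order(-imp$Overall), , drop = FALSE]  # most important variables first
```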