
Data analysis is implemented by following the steps shown in figure 7.1.

Figure 7.1 – Steps for implementing data analysis

6.2.1 Identification

This first step involves the identification of the objective, the goals, the required data and the data sources.

The objective of a P2P platform is to perform the best credit assessment possible, predicting the risk of default. In order to do that, it is important to carry out the following steps:

1. Identify the goal

2. Identify the required data to achieve this goal

3. Identify the source of this data

The goal is to determine the elements needed in assessing the credit of a borrower, namely the borrower’s ability and willingness to repay a loan’s principal plus interest. Once the goal is identified, the platform needs to decide which kind of data will be used in making its credit assessment: financial data and/or non-financial data. Note here that, as already mentioned in the previous chapter, credit assessments using both financial and non-financial data perform better. If the platform chooses to use both financial and non-financial data, it would then need to define which non-financial data will be used: non-financial data regarding past loans, or alternative data?

The next step is identifying the sources of this data.

• Historical financial data – borrower-specific data and quantitative data regarding previous loans

• Alternative data – online data collected by the platform (clickstream), data regarding the social presence and evaluation of the borrower (social media data), data regarding the borrower’s use of phones/mobile phones (mobile data) and so on.

Determining the kind and source of data is only the beginning. The platform also has to assess the availability of this data, present expected outcomes, and delimit the inputs (variables) that are needed to achieve the goals. An example of variables useful for achieving the goal described above, using a combined method, can be seen below:

6.2.1.1. Determining ability to repay

In order to determine ability to repay, the following information should be collected:

• Presence of collateral and proportion of collateral value in relation to loan value

• Presence of cash in the form of equity or the borrower’s own investment in the project, and its proportion in relation to loan value

• Evaluation of the borrower’s company profitability, liquidity, leverage and efficiency

• Evaluation of the owner and top management’s debt factor
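The ability-to-repay inputs above can be sketched as a small set of ratio computations. The function and all figures below are hypothetical illustrations, not platform data or an actual scoring formula:

```python
# Illustrative sketch: computing the ability-to-repay inputs listed above.
# All figures are made-up example values.

def ability_to_repay_inputs(loan_value, collateral_value, own_investment,
                            net_income, revenue, current_assets,
                            current_liabilities, total_debt, total_assets):
    """Return the ratio inputs used to judge ability to repay."""
    return {
        # proportion of collateral value in relation to the loan value
        "collateral_to_loan": collateral_value / loan_value,
        # borrower's own cash/equity stake in relation to the loan value
        "equity_to_loan": own_investment / loan_value,
        # company profitability (net profit margin)
        "profit_margin": net_income / revenue,
        # liquidity (current ratio)
        "current_ratio": current_assets / current_liabilities,
        # leverage (debt-to-assets)
        "debt_to_assets": total_debt / total_assets,
    }

inputs = ability_to_repay_inputs(
    loan_value=500_000, collateral_value=350_000, own_investment=100_000,
    net_income=120_000, revenue=1_500_000, current_assets=400_000,
    current_liabilities=250_000, total_debt=600_000, total_assets=1_200_000,
)
print(inputs["collateral_to_loan"])  # 0.7
```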

6.2.1.2. Determining willingness to repay

• Past behaviour – determining whether the owner, the members of the top management and the company are registered as bad payers in the RKI and the Debtor register. Also, determining whether the company is in bankruptcy and whether the owner and top management have had prior bankruptcies

• Time stamp of the loan request.

• The borrower’s behaviour on the website, with emphasis on whether the borrower has checked and read the terms and regulations of the loan contract, and whether the borrower has shown interest/curiosity in knowing, before accepting the terms, what the RRR for the loan would be.

• Does the company have a fixed phone line, and do the company’s calls get answered and/or returned?

• Determining the company’s online social presence – is it registered on Facebook, Twitter, LinkedIn, Instagram, YouTube, etc.?

o Analyse the number of posts, comments, shares, likes and dislikes, as well as the time it takes for a company to answer comments/questions.

o Analyse the star ratings of the company on those platforms as well as on other review platforms such as Trustpilot, Yelp, etc.
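The willingness-to-repay signals above could be recorded per applicant as a simple record. The field names below are illustrative assumptions, not a platform schema, and the red-flag count is only a sketch of how the binary signals might be aggregated:

```python
# Hypothetical per-applicant record of the willingness-to-repay signals above.
willingness_signals = {
    "rki_registered": False,          # owner/management listed in RKI
    "debtor_register_hit": False,     # listed in the Debtor register
    "prior_bankruptcies": 0,          # prior bankruptcies of owner/management
    "request_hour": 14,               # time stamp (hour) of the loan request
    "read_loan_terms": True,          # checked and read terms and regulations
    "checked_rrr": True,              # looked up the RRR before accepting
    "fixed_phone_line": True,
    "social_profiles": ["Facebook", "LinkedIn", "Trustpilot"],
    "avg_reply_hours": 6.5,           # time to answer comments/questions
    "avg_star_rating": 4.2,
}

# A simple count of red flags over the binary signals:
red_flags = sum([
    willingness_signals["rki_registered"],
    willingness_signals["debtor_register_hit"],
    willingness_signals["prior_bankruptcies"] > 0,
    not willingness_signals["read_loan_terms"],
    not willingness_signals["fixed_phone_line"],
])
print(red_flags)  # 0
```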

6.2.2 Collection

As already discussed, predictive analysis requires large volumes of data, with a certain number of defaults, to reveal the trends and insights the data contain. Collecting different kinds of data from multiple sources requires a unified approach to data. This is an issue because data islands are not all designed around one specific key shared across systems, which means issues regarding different formats or categorisations of data might arise. Additionally, it is not only the way the data is stored that should be taken into consideration, but also how the data is structured: some data will be presented in the form of tables (datasets from previous loans), while other data will be presented in a much less structured way (Facebook comments).
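The key-mismatch problem can be illustrated with a minimal sketch: two hypothetical data islands identify the same company with differently formatted identifiers, so the records must be normalised to one key before they can be joined. The identifier formats and field names are assumptions for illustration:

```python
# Sketch of joining two "data islands" that use different key formats.

def normalise_key(raw_id: str) -> str:
    """Strip formatting so 'CVR 1234 5678' and '12345678' match."""
    return "".join(ch for ch in raw_id if ch.isdigit())

loan_records = {"CVR 1234 5678": {"loan_value": 500_000, "defaulted": False}}
social_records = {"12345678": {"avg_star_rating": 4.2}}

merged = {}
for raw, rec in loan_records.items():
    merged.setdefault(normalise_key(raw), {}).update(rec)
for raw, rec in social_records.items():
    merged.setdefault(normalise_key(raw), {}).update(rec)

print(merged["12345678"])
# {'loan_value': 500000, 'defaulted': False, 'avg_star_rating': 4.2}
```

The same idea extends to format differences (dates, currencies, category labels), where each source needs its own normalisation step before merging.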

6.2.3 Analysis

Once all the data is acquired, the next step is to clean the data of all variables that would not be known at the moment of underwriting a loan. After the data is cleaned of all leakage, the analytical process starts, trying to find patterns that indicate risk of default.
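Leakage cleaning can be sketched as dropping every field that only becomes known after the loan has run. The field names below are hypothetical examples:

```python
# Illustrative leakage cleaning: drop every variable that would not have been
# known at underwriting time. Field names are hypothetical.

LEAKY_FIELDS = {"repayment_status", "months_in_arrears", "recovery_amount"}

def remove_leakage(record: dict) -> dict:
    """Keep only variables available when the loan was underwritten."""
    return {k: v for k, v in record.items() if k not in LEAKY_FIELDS}

raw = {
    "company_age": 7,
    "collateral_to_loan": 0.7,
    "repayment_status": "defaulted",   # only known after the loan ran
    "months_in_arrears": 4,            # likewise
}
clean = remove_leakage(raw)
print(sorted(clean))  # ['collateral_to_loan', 'company_age']
```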

Based on quantitative data from past loans (historical financial data) and data regarding online behaviour, as well as data on defaulted companies, the following patterns should be sought:

• Find patterns from past loans that are indicative of default across the whole dataset, regarding variables such as age of company, age of borrower, gender of borrower, zip code of company, purpose of the loan, etc.

• Find patterns from past loans indicating the best range of financial ratios, collateral and cash invested in proportion to the loaned value

• Find patterns from defaulted companies and their social media data regarding posts, comments, shares, likes and dislikes, as well as the time it takes for a company to answer comments/questions, star ratings and reviews

• Find patterns regarding the time of the loan request, and eventually also patterns referring to the company’s choice of phone (fixed line or not), the borrower’s choice of phone (iOS vs. Android), and subscription

6.2.4 Evaluation

When the patterns are found, predictive models can be created, and the borrower-specific data must be evaluated against those predictive models to produce a credit score for the borrower.
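Scoring a borrower against a predictive model can be sketched with a logistic model. The coefficients and feature names below are made up for illustration; a real platform would fit them to its historical loan data:

```python
import math

# Minimal illustration of the evaluation step: a logistic model (with
# hypothetical, hand-set coefficients) maps borrower-specific data to a
# probability of default, which can then be turned into a credit score.

WEIGHTS = {"collateral_to_loan": -2.0, "debt_to_assets": 1.5, "red_flags": 0.8}
INTERCEPT = -1.0  # illustrative, not a fitted value

def default_probability(borrower: dict) -> float:
    z = INTERCEPT + sum(WEIGHTS[k] * borrower[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

applicant = {"collateral_to_loan": 0.7, "debt_to_assets": 0.5, "red_flags": 0}
p = default_probability(applicant)
print(round(p, 3))
```

With these illustrative weights, more collateral lowers the predicted probability of default, while higher leverage and more red flags raise it.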

6.2.5 Implementation

After the previous steps are done and the best predictive model is chosen, it is permanently incorporated by the P2P platform. It would, however, be beneficial to keep searching for better models as more data is added.
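The idea of continuing to search for better models can be sketched as a champion/challenger comparison: the deployed model stays in place until a newly fitted candidate outperforms it on held-out data. The scores below are hypothetical:

```python
# Hypothetical champion/challenger comparison: re-fit candidate models as new
# loans accrue and keep whichever scores best on a held-out evaluation set.

scores = {"champion": 0.74, "challenger": 0.78}  # e.g. holdout AUC (made up)
deployed = max(scores, key=scores.get)
print(deployed)  # challenger
```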