Optimizing a Loan Portfolio Using a Data-Driven Strategy (2024)

Can financial analysts optimize a loan portfolio based on data-centric strategies & quantitative algorithms?

Published in

Towards Data Science

13 min read

Jul 19, 2020

Authors: Noah Mukhtar, Shaan Kohli, Shaher yar Jahangir and Ramy Hammam

While banks have traditionally acted as the main provider of loans, the needs of smaller businesses are evolving and now come with more concerns.

2020 has brought along spikes in operating costs and increasing fluctuations in demand.

These are some of the extreme challenges SMEs face when it comes to raising capital, a crucial element of their growth. Banks often overlook these needs, which has opened the door for the P2P lending market to grow.

This is why we are going to create a data-driven investment strategy for Prosper Marketplace, a P2P lending firm, that covers the whole pipeline, from processing the raw data to inferring business outcomes.

Overall, our predictive model will act as a prescriptive tool that directly portrays the impact on the investor & business.

Two fundamental steps are required for an investor to gain a better understanding of which loan to invest in:

Prosper’s investors have to decide how much money to invest in loans (considering the potential return) and how much to allocate to other investment options.
Prosper’s investors need to pick “good” loans (i.e., loans that will end up being fully paid off) in which to invest their money, and this is what our loan classification model will focus on.

However, it is important to note that there is no one ideal investment, as it is highly contingent on the risk appetite of the investor.

The datasets to be utilized in this analysis are publicly available online for download using a registered Prosper account.

Predictors

The dataset contains comprehensive information on all loans posted annually and a set of 22 variables that cover the different attributes of an individual loan.

These include but are not limited to the loan amount, the amount of interest & fees paid, loan duration, a Prosper credit rating score, and loan status with reason if completed or defaulted.

Completed vs. Defaulted Loan: 0s vs 1s

There are 4 outcomes for a loan in Prosper’s data:

Completed (A)
Charged Off (B)
Current (C)
Defaulted (D)

To simplify our Prosper data analysis, we decided to assign a value of 0 to completed loans (A) and 1 to the remaining 3 types of loan statuses (B, C, & D) that refer to loans that have not been completed.
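The encoding described above can be sketched in a few lines. This is a minimal illustration, assuming the status values are spelled exactly as in the list of four outcomes; the function name `encode_loan_status` is hypothetical and not taken from the original code.

```python
# Assumed spelling of the "completed" status, taken from the list of outcomes above.
COMPLETED = "Completed"

def encode_loan_status(status: str) -> int:
    """Return 0 for a completed loan and 1 for every other status."""
    return 0 if status.strip().lower() == COMPLETED.lower() else 1

statuses = ["Completed", "Charged Off", "Current", "Defaulted"]
labels = [encode_loan_status(s) for s in statuses]
print(labels)  # [0, 1, 1, 1]
```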

How Can We Identify a “Good” Loan?

To build an intuition for identifying good loans, it is important to understand the correlations and interactions among all the variables.

However, the definition of a good loan is still quite ambiguous, so the prediction should target one of these outcomes: whether a loan will default, whether it will be paid back early, the time of default, or the time taken to pay back if paid back early.

Analyzing Loan Terms

[Figure: violin plot of loan amount by loan term (36 vs. 60 months) and loan status]

The plot reveals some interesting insights for the 60-month term (the longer term): the highest loan amount is approximately $15,000 for both completed and uncompleted loans. Unsurprisingly, borrowers tend to take out larger loans when they repay over the longer term.

By contrast, loans with the 36-month term (the shorter term) tend to be much smaller and are more consistently distributed for both completed and uncompleted loans.

Analyzing Interest Rate Distribution Per Loan Status

[Figure: violin plot of interest rate by Prosper grade and loan status]

Another violin plot shows the interest rate levels for each loan status per grade (ranging from AA, the highest grade, to HR, the lowest).

Higher-grade loans have lower interest rates, and this trend holds for both completed (0) and uncompleted (1) loans.

Data Leakage

This phenomenon occurs when a model is built using predictors that will not be available at the time of future predictions. Therefore, we need to disregard these predictors in order to prevent a bias in our model.

There are two main ways to identify leakage:

1. A predictor is highly correlated with the target variable. We will check this using a heatmap or correlation matrix of all 22 predictors.

2. The predictor information is not available at the time the total return is predicted, for example, the late_fees_paid variable.

[Figure: correlation heatmap of the predictors and loan status]

As suspected, variables associated with data leakage, such as the interest paid and principal paid, stood out as highly correlated with loan status.
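A correlation screen of this kind could be sketched as follows. This is only an illustration on made-up toy data: the 0.8 threshold, the helper names, and the sample values are assumptions, and a high correlation is a hint to investigate rather than proof of leakage.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def flag_possible_leakage(columns, target, threshold=0.8):
    """Return the names of columns whose |correlation| with the target meets the threshold."""
    return sorted(
        name for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    )

# Toy data (invented): 0 = completed, 1 = not completed.
loan_status = [0, 0, 1, 1, 0, 1]
columns = {
    "principal_paid": [1.0, 1.0, 0.2, 0.1, 1.0, 0.3],        # fraction repaid so far
    "amount_borrowed": [5000, 12000, 7000, 9000, 15000, 4000],
}
suspects = flag_possible_leakage(columns, loan_status)
print(suspects)
```

On this toy sample only the repayment column is flagged, which mirrors the pattern described above.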

Based on this definition of leakage and the fact that most of these won’t be available when predicting whether a loan will default or not, we disregard the following predictors:

loan_number, late_fees, age_in_months, days_past_due, origination_date, principal_balance, principal_paid, interest_paid, late_fees_paid, debt_sales_proceeds_received, loan_default_reason, loan_default_reason_description, next_payment_due_date, next_payment_due_amount, co_borrower_application
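Dropping these predictors is mechanical once the list is fixed. The sketch below filters a single record represented as a dictionary; the column names in the example row that are not in the list above (`amount_borrowed`, `term`) are illustrative assumptions.

```python
# Columns the article lists for exclusion.
LEAKY_PREDICTORS = {
    "loan_number", "late_fees", "age_in_months", "days_past_due",
    "origination_date", "principal_balance", "principal_paid",
    "interest_paid", "late_fees_paid", "debt_sales_proceeds_received",
    "loan_default_reason", "loan_default_reason_description",
    "next_payment_due_date", "next_payment_due_amount",
    "co_borrower_application",
}

def drop_leaky_predictors(record):
    """Return a copy of the record without the excluded columns."""
    return {name: value for name, value in record.items()
            if name not in LEAKY_PREDICTORS}

# Hypothetical row: "amount_borrowed" and "term" are illustrative column names.
row = {"loan_number": 101, "amount_borrowed": 8000, "term": 36, "interest_paid": 412.5}
clean_row = drop_leaky_predictors(row)
print(clean_row)  # {'amount_borrowed': 8000, 'term': 36}
```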

Calculation of Investor’s Returns

The most significant data required in determining the potential return from a given loan is the calculation of the total payments that were received on each of the loans.

To build an effective investment strategy, we need to build a strong indicator variable on the return amount of each loan. It is vital that the return should consider both partially paid off defaulted loans and loans that have been paid off earlier than the due date.

In general, there are three effective return measures that can be created using the following variables:

1. Total payment amount denoted by the variable p

2. Total amount invested in the loan (amount borrowed by borrower) denoted by variable f

3. Nominal length of the loan in months (term of loan) denoted by the variable t

4. Actual length of the loan in months (loan length) denoted by the variable m

We are going to calculate the expected returns by scenario testing under 3 different strategies:

1) Pessimistic (M1)

The pessimistic approach states that when the loan is paid back, the investor still cannot re-invest it until the term of the loan expires.

[Equation 1: pessimistic return (M1)]

Using equation 1, our team was able to create a new variable called: “ret_PESS”.

It is worth noting that, for loans that default early, this approach favors long-term loans because the loss is spread over a longer span.
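The original equation image is not reproduced here, so the following is a reconstruction under assumptions rather than the authors' exact formula. One form consistent with the description, and with the variables p, f and t defined above, is the total gain (p - f) / f annualized over the nominal term t; the function name `ret_pess` is borrowed from the variable name mentioned above.

```python
def ret_pess(p, f, t):
    """Pessimistic annualized return (assumed form): gain spread over the full term.

    p: total payments received, f: amount invested, t: nominal term in months.
    """
    return (p - f) / f * (12 / t)

# A loan of 10,000 that pays back 11,500 in total over a 36-month term.
paid_off = ret_pess(p=11500, f=10000, t=36)

# The same early default (4,000 recovered) on a 36-month vs. a 60-month loan:
# the loss is spread over more months for the longer term.
default_36 = ret_pess(p=4000, f=10000, t=36)
default_60 = ret_pess(p=4000, f=10000, t=60)
print(paid_off, default_36, default_60)
```

The last two values illustrate the remark above: the identical loss looks milder on the 60-month loan simply because it is divided over a longer term.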

2) Optimistic (M2)

The optimistic approach entails that once the loan is paid back, the investor’s money is returned, and the investor can invest immediately in another loan with the same return.

[Equation 2: optimistic return (M2)]

Using equation 2, our team was able to create a new variable called: “ret_OPT”.

This method can be rephrased as the annualized monthly return of the loan over the time it was active. However, it relies on the assumption that money can be reinvested at the same rate, and if the loan defaults early, the annualized loss can drastically overstate the negative return. On the plus side, short and long loans are treated equally using this approach.
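As with the pessimistic measure, the equation image is missing, so this sketch assumes a form that matches the description: the total gain (p - f) / f annualized over the actual loan length m instead of the nominal term. The function name `ret_opt` mirrors the variable name mentioned above.

```python
def ret_opt(p, f, m):
    """Optimistic annualized return (assumed form): gain annualized over actual length.

    p: total payments received, f: amount invested, m: actual loan length in months.
    """
    return (p - f) / f * (12 / m)

# A loan of 10,000 that pays back 11,500 and is fully repaid after 18 months.
early_payoff = ret_opt(p=11500, f=10000, m=18)

# A loan that defaults after 6 months with only 4,000 recovered: annualizing
# a 60% loss over half a year produces a return below -100%.
early_default = ret_opt(p=4000, f=10000, m=6)
print(early_payoff, early_default)
```

The second value shows the weakness noted above: an early default is extrapolated as if the same loss kept repeating for a full year.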

3) Fixed-horizon (M3)

The fixed-horizon approach involves calculating fixed-time returns for 3 different interest rates: 1%, 3%, and 6%.

[Equation 3: fixed-horizon return (M3)]

A function was built, called ret_method_3(), to replicate equation 3 and calculate the 3 new variables for each loan in the dataset.

This approach can be the most accurate because it equalizes the differences between loan lengths and defaulted loans. However, it tends to disregard the depreciation in the value of money over time.
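Since equation 3 is not reproduced, the sketch below is one possible fixed-horizon calculation and not necessarily what ret_method_3() does. It assumes that the total payment p arrives in equal monthly installments over the m months the loan was active, that every installment is reinvested at a monthly rate until a horizon of T months, and that the quoted 1%, 3% and 6% are annual rates converted to monthly ones. All function and parameter names here are hypothetical.

```python
def annual_to_monthly(annual_rate):
    """Convert an annual compound rate to the equivalent monthly rate."""
    return (1 + annual_rate) ** (1 / 12) - 1

def ret_fixed_horizon(p, f, m, horizon, monthly_rate):
    """Annualized return over a fixed horizon with reinvestment (assumed form).

    p: total payments, f: amount invested, m: actual loan length in months,
    horizon: evaluation horizon in months (must be >= m).
    """
    installment = p / m
    if monthly_rate == 0:
        cash_at_loan_end = installment * m
    else:
        # Future value of m equal installments, each reinvested until month m.
        cash_at_loan_end = installment * ((1 + monthly_rate) ** m - 1) / monthly_rate
    # Keep the accumulated cash invested until the horizon is reached.
    cash_at_horizon = cash_at_loan_end * (1 + monthly_rate) ** (horizon - m)
    return (12 / horizon) * (cash_at_horizon - f) / f

# One loan evaluated over a 60-month horizon at the three reinvestment rates.
returns = {
    rate: ret_fixed_horizon(11500, 10000, 36, 60, annual_to_monthly(rate))
    for rate in (0.01, 0.03, 0.06)
}
print(returns)
```

With a reinvestment rate of zero the function reduces to the total gain annualized over the horizon, which is a quick sanity check on the assumed form.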

Prosper Rating (Grade) Breakdown

A deeper dive into the Prosper credit rating (“grade”) variable was performed to explore any significant trends with each type of return (M1-M3).

The higher the grade, the lower the percentage of defaults and the lower the interest rate.

The mean return % is nearly identical across the different grades, with minor increases. In addition, there are no negative mean return values across any of the grades.

A quick count of loans by status shows a significantly higher number of completed loans than uncompleted loans, indicating a class imbalance in the dataset.

Therefore, the most obvious option is to rebalance the dataset to consist of 50% completed loans and 50% uncompleted loans. However, doing this could prove detrimental and cause the average return to be negative, going against the point of enticing potential investors to sign up to Prosper.

Thus, because we want to capture real-life trends and true proportions, artificially balancing the dataset would hinder a realistic understanding of the analysis.

Instead, a calibration curve measure will be used in the model-building phase to check if a constructed model is biased by the data imbalance or not.

The predictive modeling will cover a two-stage approach:

First, we will construct a binary classification model to predict the loan default probability using a variety of industry-standard algorithms and evaluate their performance to select the most accurate model.

The second stage involves constructing a regression model using a variety of industry-standard regressors to predict the amount of return a loan may generate to an investor for each of the return approaches (M1-M3).

The models in both phases will be optimized for higher accuracy by hyperparameter tuning through cross-validation.

Stage 1: Classifying Loan Outcomes

The first model was built to determine the predictive power of the grade and interest rate of a loan, since relying on these two is usually the industry’s standard practice.

Predictors: Grade & Interest Rate
Model #1 — Accuracy Score: 85%

This implies that these two features on their own contribute as much to determining the loan status as all the predictors do when used together.

However, to build a more robust model and avoid underfitting, we have to drop these two features entirely.

Predictors: All Excluding Grade & Interest Rate
Model #2 — Accuracy Score: ~85%

Using all the predictors apart from grade and interest rate, model accuracies peaked at roughly 85%. Although Naïve Bayes provided a very high accuracy, it was unable to account for the biases introduced by the data imbalance.

The calibration curves and AUC scores for all the above models are strong, indicating good performance. The calibration curve is a way to see whether the model is affected by the imbalance. If the calibration is perfect, the model is not biased by the imbalanced dataset. In other words, it will not always predict the majority class, which is the completed loans (0) case.
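To make the idea concrete, here is a small pure-Python sketch of how a calibration curve can be computed: predictions are grouped into equal-width probability bins, and the mean predicted probability in each bin is compared with the observed share of positives. The function name, the bin count and the toy data are assumptions, not the code used in the analysis.

```python
def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        bins[index].append((prob, label))
    curve = []
    for members in bins:
        if members:
            mean_prob = sum(prob for prob, _ in members) / len(members)
            observed = sum(label for _, label in members) / len(members)
            curve.append((mean_prob, observed, len(members)))
    return curve

# Toy example (invented): ten loans scored 0.1 with one positive among them,
# ten loans scored 0.9 with nine positives among them.
y_prob = [0.1] * 10 + [0.9] * 10
y_true = [1] + [0] * 9 + [1] * 9 + [0]
curve = calibration_curve(y_true, y_prob)
for mean_prob, observed, count in curve:
    print(f"predicted {mean_prob:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model yields points where the predicted and observed values nearly coincide, as in this toy case. A model that simply predicts the majority class would not.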

Since random forest was the best-performing model, we decided to explore a best-in-class tree algorithm, the LightGBM classifier developed by Microsoft. It is considered the latest industry trend and has won multiple Kaggle competitions, so we were curious to see how well it would perform in the Prosper case. As presumed, all the model metrics were very high for the LightGBM classifier.

Feature Importance

[Figure: feature importance plot]

The feature importance plots for most of the models were also generated in the classification stage.

Our model determined “Service fees paid”, “Prosper fees paid” and “amount borrowed” to be the most useful for predicting our target variable, “loan_status”, which matches well with investing intuition.

Learning Curve

[Figure: learning curves (AUC vs. training size) for random forest and L2 logistic regression]

A learning curve was also plotted to assess how the optimal random forest model learns compared to a sample L2 logistic regression model.

As the training size is increased in steps of 25 data points up to a very large sample, the random forest model’s AUC increases, meaning that the model continues to learn and gives a much higher percentage of accurate predictions relative to the logistic model.

Stage 2: Predicting Expected Return

A variety of models were tested using different sets of features. The R² values of the various regression models were calculated for the different return methods.
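For reference, R² compares a model's squared errors with those of a baseline that always predicts the mean. A minimal sketch, with a hypothetical function name and invented return values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Invented realized vs. predicted returns for four loans.
actual = [0.05, -0.20, 0.08, 0.03]
predicted = [0.04, -0.15, 0.07, 0.02]
score = r_squared(actual, predicted)
print(round(score, 3))
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than predicting the average return for every loan.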

The Random forest regressor gave the highest score across all the different types of returns when compared to the rest of the models built.

Investment Strategies

By leveraging both the classification and regression models to predict whether a loan will default and the estimated return, an investment strategy can be formulated to maximize an investor’s average return based on the different methodologies used to calculate the return.

Four different investment strategies were tested:

Random strategy
Default-based strategy
Simple return-based strategy
Default- and return-based strategy
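The four strategies can be sketched as four ways of ranking the same pool of loans. Everything below is illustrative: the field names (`p_default`, `pred_return`, `pred_return_if_paid`, `pred_return_if_default`, `actual_return`), the way the combined strategy weights the two conditional return predictions by the default probability, and the toy numbers are assumptions rather than the authors' implementation.

```python
import random

def pick_random(loans, n, seed=0):
    """Random strategy: pick n loans uniformly at random."""
    return random.Random(seed).sample(loans, n)

def pick_default_based(loans, n):
    """Default-based: pick the n loans with the lowest predicted default probability."""
    return sorted(loans, key=lambda loan: loan["p_default"])[:n]

def pick_return_based(loans, n):
    """Simple return-based: pick the n loans with the highest predicted return."""
    return sorted(loans, key=lambda loan: loan["pred_return"], reverse=True)[:n]

def pick_default_and_return_based(loans, n):
    """Combined: rank by the probability-weighted expected return."""
    def expected(loan):
        p = loan["p_default"]
        return p * loan["pred_return_if_default"] + (1 - p) * loan["pred_return_if_paid"]
    return sorted(loans, key=expected, reverse=True)[:n]

def average_realized_return(portfolio):
    return sum(loan["actual_return"] for loan in portfolio) / len(portfolio)

# Invented toy pool of four loans.
loans = [
    {"id": "A", "p_default": 0.05, "pred_return": 0.04, "pred_return_if_paid": 0.05,
     "pred_return_if_default": -0.30, "actual_return": 0.05},
    {"id": "B", "p_default": 0.40, "pred_return": 0.09, "pred_return_if_paid": 0.15,
     "pred_return_if_default": -0.50, "actual_return": -0.40},
    {"id": "C", "p_default": 0.10, "pred_return": 0.06, "pred_return_if_paid": 0.08,
     "pred_return_if_default": -0.30, "actual_return": 0.07},
    {"id": "D", "p_default": 0.20, "pred_return": 0.07, "pred_return_if_paid": 0.12,
     "pred_return_if_default": -0.40, "actual_return": 0.11},
]

for strategy in (pick_random, pick_default_based, pick_return_based,
                 pick_default_and_return_based):
    chosen = strategy(loans, 2)
    print(strategy.__name__, [loan["id"] for loan in chosen],
          round(average_realized_return(chosen), 3))
```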

To depict real-life scenarios, the M3 return approach seems to be the more accurate way to measure the % return. Thus, we compare the different investment strategies under the M3 method. Nonetheless, under M3 the returns are very close to one another regardless of which investment strategy is used.

The magnitude of the % return is also highly dependent on the reinvestment interest rate (M3).

Sensitivity Test of Portfolio Size

Next, a sensitivity test was run on portfolio size versus investment return percentage.

[Figure: investment return percentage versus portfolio size]

As the number of loans a potential investor invests in increases, the percentage of investment return decreases. This is intuitive because the risk of holding a loan that defaults increases as the number of loans invested in increases.

In other words, an investor may earn a high return and then lose most of it to a defaulted loan, and this cycle can keep repeating, causing the overall percentage of return to decrease as the number of loans invested in increases.
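One simple way to run such a sensitivity test is to rank the loans by predicted return and track the average realized return as the portfolio grows one loan at a time. The sketch below does this on invented numbers; the function name and the (predicted, realized) tuple layout are assumptions.

```python
def return_by_portfolio_size(loans):
    """Average realized return of the top-n loans by predicted return, for n = 1..len(loans).

    Each loan is a (predicted_return, realized_return) pair.
    """
    ranked = sorted(loans, key=lambda loan: loan[0], reverse=True)
    results, running_total = [], 0.0
    for size, (_, realized) in enumerate(ranked, start=1):
        running_total += realized
        results.append((size, running_total / size))
    return results

# Invented toy pool: (predicted return, realized return).
pool = [(0.06, 0.02), (0.10, 0.12), (0.04, -0.05), (0.08, 0.09)]
sensitivity = return_by_portfolio_size(pool)
for size, avg in sensitivity:
    print(size, round(avg, 4))
```

On this toy pool the average return falls as each additional, lower-ranked loan is added, which is the same direction as the trend reported above.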

In this section, we implement three different optimization models to improve an investment strategy using Prosper.

The three different optimization methods are:

1) Directly maximize total profit

A binary variable is set up for every loan in our dataset. A constraint on the number of loans is added (capped at the maximum number of loans in our dataset) and the objective function is defined (maximize total profit).
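With only a cap on the number of loans, this binary program has a simple exact solution: select the most profitable loans up to the cap and skip any with non-positive predicted profit. The sketch below returns the 0/1 decision vector; the function name and the profit figures are hypothetical, and a general-purpose solver could be used instead.

```python
def maximize_profit(profits, max_loans):
    """Return one binary decision per loan that maximizes total profit under a count cap."""
    ranked = sorted(range(len(profits)), key=lambda i: profits[i], reverse=True)
    chosen = {i for i in ranked[:max_loans] if profits[i] > 0}
    return [1 if i in chosen else 0 for i in range(len(profits))]

# Invented predicted profits per loan.
predicted_profit = [120.0, -30.0, 80.0, 45.0]
decision = maximize_profit(predicted_profit, max_loans=2)
total = sum(x * profit for x, profit in zip(decision, predicted_profit))
print(decision, total)  # [1, 0, 1, 0] 200.0
```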

2) Maximize profit with budget constraint

The second optimization model considers the budget constraint of a potential investor. Building on the first model, a new budget-limiting constraint is added (testing different budgets), and the optimization problem is re-solved.
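Adding a budget turns the selection into a 0/1 knapsack problem. The sketch below solves it by dynamic programming, assuming each loan must be funded in full and that amounts are expressed in whole units (here, hundreds of dollars). The function name and all figures are invented for illustration.

```python
def maximize_profit_with_budget(amounts, profits, budget):
    """0/1 knapsack by dynamic programming; amounts and budget are whole units."""
    n = len(amounts)
    # best[i][b] = best total profit using the first i loans with budget b.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, gain = amounts[i - 1], profits[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]
            if cost <= b and best[i - 1][b - cost] + gain > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + gain
    # Walk back through the table to recover which loans were selected.
    chosen, b = [0] * n, budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen[i - 1] = 1
            b -= amounts[i - 1]
    return chosen, best[n][budget]

# Invented loan amounts and predicted profits, both in hundreds of dollars.
loan_amounts = [50, 30, 20, 40]
loan_profits = [6.0, 4.5, 2.0, 5.0]
selection, profit = maximize_profit_with_budget(loan_amounts, loan_profits, budget=70)
print(selection, profit)  # [0, 1, 0, 1] 9.5
```

Testing different budgets, as described above, amounts to calling the function again with a different `budget` value.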

3) Maximize profit with risk-return tradeoff

The third optimization strategy involves incorporating the portfolio risk factor by considering the variance of the returns.

To achieve this, first, a clustering model is trained with a k parameter that the investor can adjust. Second, the standard deviation of returns is computed for each of the clusters. Third, each loan is assigned to a specific cluster based on its Euclidean distance from each cluster. Fourth, the standard deviation of the return for each loan is estimated using the standard deviation of its cluster.

The model is then built to maximize profit with the inclusion of the risk-return tradeoff constraint. A sensitivity/penalty factor can be set by the investor to reflect their risk tolerance.
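The four steps can be sketched as follows. To keep the example short, the cluster centers are taken as given (as if a k-means model with k = 2 had already been trained), and the risk-return tradeoff is expressed as a penalty on the estimated standard deviation in the ranking score rather than as an explicit constraint. These choices, along with every name and number below, are assumptions made for illustration.

```python
from math import dist
from statistics import pstdev

def nearest_cluster(features, centroids):
    """Index of the centroid closest to the loan in Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: dist(features, centroids[c]))

def cluster_return_std(assignments, returns, k):
    """Standard deviation of historical returns within each cluster."""
    stds = []
    for c in range(k):
        members = [r for a, r in zip(assignments, returns) if a == c]
        stds.append(pstdev(members) if len(members) > 1 else 0.0)
    return stds

def risk_adjusted_selection(expected_returns, loan_stds, max_loans, penalty):
    """Pick up to max_loans loans with the best positive score: return - penalty * std."""
    scores = [r - penalty * s for r, s in zip(expected_returns, loan_stds)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(i for i in ranked[:max_loans] if scores[i] > 0)

# Assumed centroids from an already trained clustering model (k = 2).
centroids = [(0.0, 0.0), (10.0, 10.0)]

# Invented historical loans: two features and a realized return each.
history_features = [(0.5, 0.2), (1.0, 0.0), (0.1, 0.9), (9.0, 11.0), (10.5, 9.5), (11.0, 10.0)]
history_returns = [0.05, 0.06, 0.04, 0.20, -0.30, 0.10]
assignments = [nearest_cluster(f, centroids) for f in history_features]
stds = cluster_return_std(assignments, history_returns, k=2)

# Two new candidate loans with predicted returns.
new_features = [(1.0, 1.0), (9.0, 9.0)]
new_expected = [0.05, 0.09]
new_stds = [stds[nearest_cluster(f, centroids)] for f in new_features]

risk_neutral = risk_adjusted_selection(new_expected, new_stds, max_loans=1, penalty=0.0)
risk_averse = risk_adjusted_selection(new_expected, new_stds, max_loans=1, penalty=0.5)
print(risk_neutral, risk_averse)  # [1] [0]
```

With no penalty the higher-return loan from the volatile cluster wins. With a penalty of 0.5 the steadier loan is preferred, which is the tradeoff the investor controls through the penalty factor.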

Results show that the optimal investment strategy for Prosper is maximizing profit under budget constraints, with an expected return of 10.5%.

In conclusion, we saw how information from different attributes can be used to create new predictors that may boost the statistical significance of our model compared with their original counterparts. It is also important to note that a model’s performance over a single partition of the data into cross-validation folds can be misleading, so running our model several times over different train/test splits was essential. Another important part was calibration, which measures whether the probabilities produced by the model are correct.

It would be interesting to see how our model would perform if tied to macroeconomic external data that would represent the underlying economy’s performance at the time such as oil prices or The World Bank’s interest rates in order to further extend the pessimistic measure’s performance.

Additionally, it would be interesting to conduct sentiment analysis on investors to understand what their risk appetite actually is, as opposed to asking for a numerical input, since they may lack an understanding of how the P2P lending market operates.

NoahMMA/loan_portfolio_optimization

How to Optimize a Loan Portfolio Using a Data-Driven Strategy & Quantitative Algorithms

github.com

Our Inspiration — Original Study

https://www.liebertpub.com/doi/pdf/10.1089/big.2018.0092
