ANALYSIS OF THE FACTORS USED BY VALUATION MODELS FOR
MORTGAGE-BACKED SECURITIES
By
Bhawani Singh
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
in Management and Systems Stern School of Business
New York University
2015
Table of Contents
Table of Tables.................................................................................................................................v
Table of Figures...............................................................................................................................vi
Acknowledgements ........................................................................................................................vii
A Declaration................................................................................................................................ viii
Abstract............................................................................................................................................ix
Chapter 1 - Introduction .................................................................................................................1
1.0 Introduction..............................................................................................................................1
1.1 Purpose of Research.................................................................................................................3
1.2 Problem Definition ..................................................................................................................6
1.3 Research Question ...................................................................................................................7
1.4 Conclusion ...............................................................................................................................7
Chapter 2 - Literature Review........................................................................................................9
2.1. Introduction.............................................................................................................................9
2.2 Types Of Risks That Affect the Valuation Of a Mortgage Backed Security...........................9
2.3 Prepayment and Default Risk Models ...................................................................................10
2.3.1 Perfect Payer (Refinance Activity). ............................................................................. 13
2.3.2 Perfect Payer (Age of the Mortgage Assets)................................................................ 13
2.3.3 Loan Balance. .............................................................................................................. 14
2.3.4 FICO score................................................................................................................... 14
2.3.5 Geographic................................................................................................................... 15
2.4 Interest Rate Risk Models......................................................................................................16
2.4.1 Vasicek model.............................................................................................................. 16
2.4.2 Cox, Ingersoll and Ross (CIR) Model. ........................................................................ 17
2.4.3 Black–Derman–Toy Model.......................................................................................... 17
2.4.4 Ho–Lee Model. ............................................................................................................ 18
2.4.5 Hull–White Model. ...................................................................................................... 19
2.4.6 The Black–Karasinski Model. ..................................................................................... 19
2.5 Other Models .........................................................................................................................20
2.5.1 Heath–Jarrow–Morton (HJM) Model.......................................................................... 20
2.5.2 LIBOR Market Model.................................................................................................. 20
2.6 Subprime and Prime Mortgages ............................................................................................21
2.7 Foreclosure Process ...............................................................................................................22
2.7.1 Judicial Foreclosure. .................................................................................................... 23
2.7.2 Power of Sale. .............................................................................................................. 23
2.7.3 Strict Foreclosure......................................................................................................... 23
2.8 Conclusion .............................................................................................................................23
Chapter 3 - Research Methodology And Design.........................................................................25
3.1 Introduction............................................................................................................................25
3.2 Research Question and Hypothesis........................................................................................25
3.3 Relevance of Topic ................................................................................................................26
3.4 Research Methodology ..........................................................................................................27
3.5 Research Design: Variables Identified ..................................................................................28
3.5.1 FICO Score. ................................................................................................................. 29
3.5.2 Geography of Loan. ..................................................................................................... 29
3.5.3 Loan Balances.............................................................................................................. 30
3.5.4 Perfect payers............................................................................................................... 30
3.6 Conclusion .............................................................................................................................30
Chapter 4: Data Collection ...........................................................................................................31
4.1 Introduction............................................................................................................................31
4.2 Database Description: Population and Sample......................................................................31
4.3 Database Description: Reliability and Validity .....................................................................34
4.4 Conclusion .............................................................................................................................38
Chapter 5 - Results and Analysis..................................................................................................40
5.1 Introduction............................................................................................................................40
5.2 Data Analysis and Interpretation ...........................................................................................40
5.3 Conclusion .............................................................................................................................67
Chapter 6 – Conclusions and Recommendations........................................................................68
6.1 Introduction............................................................................................................................68
6.2 Conclusion: Hypothesis Holds True......................................................................................68
6.3 Recommendations..................................................................................................................68
6.5 Contribution of This Study ....................................................................................................71
6.6 Limitations of the Study ........................................................................................................72
6.8 Conclusion .............................................................................................................................72
References.......................................................................................................................................74
Appendix A – FEFUO Letter........................................................................................................82
Appendix B – Glossary..................................................................................................................83
Table of Tables
Table 3-1. Models Used in the Industry and Their Component Variables ................................... 23
Table 5-1. R² Explanatory Power ................................................................................................. 37
Table 5-2. Statistical Analysis for pool BCAP 2007-AA2 22A1................................................. 38
Table 5-3. Statistical Analysis for pool BOAA 2005-1 2A1........................................................ 41
Table 5-4. Statistical Analysis for Pool B0AA 2005 10 5A1....................................................... 43
Table 5-5. Statistical Analysis for pool BCAP 2007-AA2 33A1........................................... 43
Table 5-6. Statistical Analysis for pool BOAA 2005-6 7A1........................................................ 46
Table 5-7. Statistical Analysis for pool BOAA 2005-1 2A1........................................................ 47
Table 5-8. Statistical Analysis for pool AMAC 2003-12 2A ....................................................... 49
Table 5-9. Statistical Analysis for pool AHM 2005-2 3A............................................................ 50
Table 5-10. Statistical Analysis for pool AMAC 2003-12 2A ..................................................... 60
Table 5-11. Statistical Analysis for pool AHM 2005-1 8A1........................................................ 60
Table 5-12. Statistical Analysis for pool AHM 2005-1 6A.......................................................... 61
Table 5-13. Statistical Analysis for pool AHM 2004-1 1A.......................................................... 61
Table of Figures
Figure 5-1. Change in Median and Mean Incomes 2001-2010 ................................................... 52
Figure 5-2. Change in Median and Mean Net worth 2001-2010................................................. 52
Figure 5-3. Change in real GDP .................................................................................................. 53
Figure 5-4. Monthly Change in Nonfarm Employment............................................................... 53
Figure 5-5. Unemployment Rate ........................................................................................... 54
Figure 5-6. Long Term Unemployment....................................................................................... 54
Figure 5-7. Before Tax Family Income 2001-2004...................................................................... 55
Figure 5-8. Before Tax Family Income 2007 - 2010................. 56
Figure 5-9. Before Tax Family Income 2007 – 2010 Continued.................................................. 57
Figure 5-10. Amount Before Tax Family Income ........................................................................ 58
Figure 5-11. U.S. Wages............................................................................................................... 58
Acknowledgements
I sincerely thank Nouriel Roubini for his service as my Thesis Supervisor. I also thank Dr.
Sandra Marshall and Dr. Jeffery Keefer for the Research Project and Research Process and
Methodology (RPM), which prepared me for my thesis research. Additionally, I would like to
thank Dr. Nitya Singh for the guidance to pursue my studies at NYU. My thanks also go to all
the instructors at Stern, from whom I learned a great deal.
A Declaration
I grant powers of discretion to the Department, SPS, and NYU to allow this thesis to be
copied in part or in whole without further reference to me. This permission covers only copies
made for study purposes or for inclusion in Department, SPS, and NYU research publications,
subject to normal conditions of acknowledgement.
Abstract
Since 2007, a greater emphasis has been placed on the valuation of mortgage-backed
securities (MBS), especially because of the systemic risk that they pose to financial
institutions in particular and the whole economy in general. This thesis, therefore, evaluated the
various methodologies presently used by rating agencies such as S&P and banks such as JP
Morgan for in-house valuation to calculate the value of a mortgage-backed portfolio. Given that
there are multiple models to value prepayment, default, and interest rate risk for
mortgage-backed securities, the thesis examined the level of correlation between the main input
factors used in the various models and the foreclosure rates. Hypothesis testing suggests that
there is a relationship between the dependent variable (foreclosure rates) and the independent
variables (credit score, perfect payer, balance, and geography), and that this relationship is not
static but dynamic: the correlation between the dependent variable and the independent
variables changes over time. Another major finding of this thesis is that the input variables
that characterize a mortgage-backed security have explanatory power over the foreclosure
rates. The study was a seven-year longitudinal study covering the years 2008 through 2014.
Keywords: Mortgage-backed securities, valuation, prepayment model, credit risk model,
interest risk model.
Chapter 1 - Introduction
1.0 Introduction
Securitization in the United States began during the 1970s with US government-sponsored
National Mortgage Association funding programs for residential mortgages, followed by private
financings. Gaining popularity at the beginning of the 1980s, securitization has become a common
financing tool at the global level (Bakri, 2014). “Mortgage-backed securities
(MBS) are debt obligations that represent claims to the cash flows from pools of mortgage loans,
most commonly on residential property. Mortgage loans are purchased from banks, mortgage
companies, and other originators and then assembled into pools by a governmental, quasi-
governmental, or private entity. The entity then issues securities that represent claims on the
principal and interest payments made by borrowers on the loans in the pool, a process known as
securitization” (Fast Answers, 2015).
By 2006, the securitization market had grown to $1.480 trillion of issuance (Ashcraft &
Schuermann, 2014). Since the Great Recession began in 2007, one of the main culprits widely
blamed for the downturn has been the inaccurate valuation of distressed mortgage-backed
securities (MBS) held by major banks and financial institutions. These instruments were also
widely held outside of the United States, which lent even greater impetus to the contagion
process. This was reflected in the extensive write-offs by financial institutions, both at their
own capital level and within the investment funds that they managed. These events have been a
key contributing factor in calls for greater supervision of financial institutions and the setting
of risk-appropriate capital requirements (Reilly, 2009).
Such complex structures were difficult to price. They are structured in various tranches (a
tranche is a piece, portion, or slice of a deal or structured financing; it is one of several
related securities that are offered at the same time but have different risks, rewards, and/or
maturities; "tranche" is the French word for "slice") and require complex calculations to
estimate cash flows under various assumed default rates. This made it difficult to value them
accurately on their own merits after the collapse of Lehman Brothers. With no active secondary
market in which to trade, the banks were unable to value the MBS accurately, which led to
issues with valuation, balance sheet liabilities, and ultimately the dreaded margin call from
counterparties. The losses booked by the banks forced them to write down capital, while margin
calls drained liquidity from the financial markets. These losses reduced the capacity of the
banks to act as purveyors of credit in an economy already shaken by the collapse of a large
swathe of the housing market. The resulting contraction in credit ultimately caused the US
economy to implode in 2007.
DiMartino and Duca (2007) suggest that, “in the early and mid-2000s, high-risk
mortgages became available from lenders who funded mortgages by repackaging them into pools
that were sold to investors. New financial products were used to apportion these risks, with
private-label mortgage-backed securities (PMBS) providing most of the funding of subprime
mortgages. The less vulnerable of these securities were viewed as having low risk either because
they were insured with new financial instruments or because other securities would first absorb
any losses on the underlying mortgages” (DiMartino & Duca, 2007, p. 47). This enabled more
first-time homebuyers to obtain mortgages, and homeownership rose (Duca, Muellbauer, &
Murphy, 2011).
The resulting demand bid up house prices, more so in areas where housing was in
tight supply. This induced expectations of still more house price gains, further increasing
housing demand and prices (Case, Shiller, & Thompson, 2012). Investors purchasing PMBS
profited at first because rising house prices protected them from losses. When high-risk mortgage
borrowers could not make loan payments, they either sold their homes at a gain and paid off their
mortgages, or borrowed more against higher market prices. Because such periods of rising home
prices and expanded mortgage availability were relatively unprecedented, and new mortgage
products’ longer-run sustainability was untested, the riskiness of PMBS was not well
understood. On a practical level, risk was “off the radar screen” because many gauges of
mortgage loan quality available at the time were based on prime, rather than new, mortgage
products.
While subprime lending was not new, two key factors helped precipitate the disaster:
the lack of sufficient estimates on which to base default probabilities, and the assumption that
there could not be a nationwide housing collapse. When house prices peaked, mortgage
refinancing and selling homes became less viable means of settling mortgage debt, and mortgage
loss rates began to rise for lenders and investors. In April 2007, New Century Financial Corp., a
leading subprime mortgage lender, filed for bankruptcy. Shortly thereafter, large numbers of
PMBS and PMBS-backed securities were downgraded to high risk, and several subprime lenders
closed. As the bond funding of subprime mortgages collapsed, lenders stopped making subprime
and other nonprime risky mortgages. This lowered the demand for housing, leading to sliding
house prices that fueled expectations of still more declines, further reducing the demand for
homes. Prices fell to such low levels that it became difficult for troubled borrowers to sell their
homes to fully pay off their mortgages, even if they had provided a sizable down payment (Duca,
2010).
1.1 Purpose of Research
Much research has focused on the valuation of MBS, and these valuation methods have
been reinvented and developed since 2007. The main impetus for this change came from the
Federal Reserve, which pushed the banks and major financial institutions to accurately value
their exposure to the MBS portfolios they hold and to meet the capital requirements. To
accurately measure foreclosure risk, prepayment risk, and Value at Risk (VaR), various models
have been developed that value performing and non-performing MBS pools. A performing pool
is one in which borrowers are paying on time; a non-performing pool is one in which borrowers
have missed several payments. Foreclosure risk is the risk in an MBS pool that the borrower
will not pay and will default on the loan, leading to a bank foreclosure. Prepayment risk is the
risk in an MBS pool that a borrower will pay ahead of time, causing the pool to have lower
interest payments and thus a lower return. VaR is the portion of an MBS portfolio's value that
may be at risk given a macroeconomic event, such as a rise in interest rates. Some institutions
have taken the lead on this and are the industry leaders. One such market leader in this
segment is BlackRock. The company was invited by the Federal Reserve to independently value
the banks’ MBS holdings, and also by the Greek government to advise on its exposure.
BlackRock's proprietary valuation system is called “Aladdin” and is considered the industry
standard (Goliath, 2011).
Presently the MBS market is $8.7 trillion, while the total outstanding public and private
bond market in the USA, which includes treasuries, MBS, auto loans, credit cards, etc., is
$39.9 trillion (Campbell, 2014). The mortgage-backed security market is crucial to the economy
not only because large sums of money are involved, but also because it is a direct link to
consumer spending. Consumers account for two-thirds of spending in the United States
economy, and taking out a mortgage is the single biggest investment an average person makes
in a lifetime. The market is also important because many homeowners borrow against the
unrealized capital gain in their home to finance consumption. The collapse in home prices
therefore had a negative “multiplier” effect on consumption.
This trend drives not only the mortgage industry but also various other industries that are
dependent on the housing market, such as construction, heating, appliances, and lumber. The
housing market not only creates jobs and consumes resources during construction, but also
plays an important role in stimulating the economy through numerous associated activities such
as ongoing home maintenance, gardening, repairs, and home improvements. Businesses such as
Home Depot depend on these types of activities to survive. Thus, given the far-reaching effect of
the housing sector on the economy, the mortgage-backed security industry is critical to the
economy. However, since the bubble burst in 2008, housing has been seen as playing a lesser
role in economic growth. There has been a relatively moderate recovery which, despite record low
interest rates, continues to be hampered by difficult access to credit for non-prime borrowers.
The core logic of the models used for MBS is based on the cash flow of the
mortgages. The system works as follows: once a mortgage is issued by a bank, it is collected by
the bank and combined in a pool. The pool may range from as few as 25 loans to a few thousand
individual loans. In previous asset-backed securitizations there was an assumption of “safety in
numbers,” with overcollateralization seen as a means of ensuring sufficient cash flows for debt
repayment. Investopedia (2003) defines overcollateralization as follows: “The process of posting
more collateral than is needed to obtain or secure financing. Overcollateralization is often used
as a method of credit enhancement by lowering the creditor's exposure to default risk.” The
assumption may have held for MBS backed by strong credits. However, as investors discovered,
overcollateralization cannot compensate for risks that were not viable at inception. The criteria
for forming the pool are based on various circumstances or investor needs. One criterion is the
maturity of the pool, which might be a 15-year or a 30-year period. Another criterion is the
required return. A 15% required return implies a pool of riskier mortgages, whereas a 6%
required return implies less risky loans. Once a pool is created, it is given to a rating firm
such as Moody’s or S&P. The rating firm does its due diligence and assigns an investment grade
to the pool based on the risk metrics it has identified. Some risk metrics used by rating
companies are loan to value (LTV), i.e., how much is being borrowed versus the value of the
property, and credit scores. The bank either holds the pool on its own balance sheet or sells it
to investors (Tatom, 2009).
1.2 Problem Definition
The mechanics of mortgage-backed securities are based on payments by the individual
borrowers. The borrower makes monthly payments, and the servicing firm, i.e., the bank, collects
the payments and amortizes the loan, applying part of each payment to interest and the
remainder to principal until the balance becomes zero. The most common types of mortgages are
30-year and 15-year fixed-rate mortgages. The problem with such complex instruments is that
they are made of several moving pieces, such as tranches, which react differently to
macroeconomic events. They are also very susceptible to changes in interest rates, which may
cause an extension of the original maturity in the event that rates rise, or prepayments in the
event that rates fall. In the latter case, the best borrowers, who can access refinancing at
lower rates, will prepay. This is tantamount to “adverse selection”: the pool tends to be
composed increasingly of lower-rated credits. In the event of a recession and a fall in borrower
income, this may lead to an increased foreclosure rate. Thus the way to predict the effect is to
use proxies. In the case of mortgage-backed securities, the proxies are FICO score, geography
of the loan, and payment history. The decision to use a proxy rests with each individual
company’s fund manager and depends on the model they plan to use, as well as the variables
that they see fit to include in or drop from the model. There is no industry standard
established for this. It is essential, therefore, that such proxies are relevant measures.
However, there is a paucity of research that aims to establish whether such proxies are relevant
indicators (Fabozzi, 1998). Since there is no industry or government standard, each firm is left
to design its own model based on the proxies its research team decides to use.
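As a rough illustration of the amortization mechanics described at the start of this section, the following Python sketch builds a payment schedule for a single fixed-rate loan. It is not taken from the thesis: the function name, the loan amount, and the rate are hypothetical, and the level-payment (annuity) formula is the standard textbook one, assumed here to be what the servicer applies.

```python
def amortization_schedule(principal, annual_rate, years):
    """Return (monthly_payment, rows) for a level-payment fixed-rate loan.

    Each row is (month, interest_paid, principal_paid, remaining_balance).
    Illustrative helper only; not a function from the thesis.
    """
    n_months = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate > 0:
        # Standard annuity formula for a level monthly payment.
        payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** -n_months)
    else:
        payment = principal / n_months

    balance = principal
    rows = []
    for month in range(1, n_months + 1):
        interest_paid = balance * monthly_rate
        principal_paid = payment - interest_paid
        if month == n_months:
            # Clear any floating-point residue in the final month.
            principal_paid = balance
        balance -= principal_paid
        rows.append((month, interest_paid, principal_paid, balance))
    return payment, rows


# Hypothetical loan: $200,000 at 6% for 30 years.
payment, schedule = amortization_schedule(200_000, 0.06, 30)
print(f"Monthly payment: {payment:.2f}")
for month, interest_paid, principal_paid, balance in (schedule[0], schedule[-1]):
    print(f"Month {month}: interest {interest_paid:.2f}, "
          f"principal {principal_paid:.2f}, balance {balance:.2f}")
```

Early payments go mostly to interest and late payments mostly to principal, which is why a prepayment early in the life of the loan removes a large share of the interest the pool expected to receive.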
1.3 Research Question
This thesis, therefore, investigated the various methodologies and models that are used to
calculate the value or price of a distressed MBS. A distressed MBS is a pool
that has a high rate of foreclosures. A correlation analysis was conducted between the main
inputs into the models (credit score, geography, loan balances, and perfect payer percentage) and
the foreclosure rates. Such an analysis enabled me to develop a comprehensive understanding
of whether the input variables into the models had high explanatory potential, and
whether the models were using the correct factors. The thesis therefore answered the
research question: What is the nature of the correlation between the factors used by the most
common risk models (e.g., FICO score, geography, loan balance, and perfect payer) and the
foreclosure rate of the pool?
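To make the idea of a correlation analysis concrete, the sketch below computes a Pearson correlation coefficient between one input factor and a foreclosure rate, and squares it to obtain the explanatory power (R²) of a one-variable linear fit. This is a minimal sketch and not the procedure used in the thesis: the function name is an assumption, and the numbers are invented for illustration and are not data from any pool analyzed here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical yearly observations for one pool (NOT thesis data).
avg_fico = [720, 715, 705, 698, 690, 684, 680]
foreclosure_rate_pct = [1.2, 1.5, 2.4, 3.1, 3.9, 4.2, 4.8]

r = pearson_r(avg_fico, foreclosure_rate_pct)
# For a single explanatory variable, R^2 of the linear fit equals r squared.
r_squared = r ** 2
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

Repeating the same calculation over successive sub-periods would be one simple way to check whether the correlation is stable or changes over time.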
1.4 Conclusion
The primary goal of this thesis was to develop an understanding of the various models
that are used to analyze mortgage-backed securities, and to identify which model is the optimum
model for financial professionals to use. In order to answer the research question, I conducted an
extensive literature review in Chapter 2. This set the stage for understanding the existing work
in the field and enabled me to identify the numerous research gaps. In Chapter 3, I identified
the research methodology to be followed and developed the research design that I used to
answer the research question. With the research methodology established, in Chapter 4 I
described how the data would be collected and processed. The data was tested in Chapter 5,
where, as identified in the research methodology, I used numerous techniques to answer the
research question. Finally, in Chapter 6, I summarized the research and put forward my
recommendations and conclusion.
Chapter 2 - Literature Review
2.1. Introduction
In order to evaluate the relevance of this research topic, I conducted a comprehensive
literature review and approached the ideas from different viewpoints. The objective of this
review was to establish the theoretical background on which I based my thesis. I used existing
scholarly works first to identify the various types of risks that affect the valuation of a
mortgage-backed security. Once the various types of risk were identified,
I reviewed the various prepayment and default risk models. I also evaluated the literature on
various interest rate models and risk models. An analysis of the existing work was necessary to
establish the argument that the variables I identified are the primary basis
of calculation in all of the models.
2.2 Types Of Risks That Affect the Valuation Of a Mortgage Backed Security
A study by Dunn and McConnell (1981) of the various methods used by banks to
value the portfolio of mortgage-backed securities on their balance sheets concluded
that no model can be considered superior to the others or regarded as the
benchmark model that the rest of the industry should follow. However, it is
significant to note that in spite of the use of more than six different types of models in the
industry, none of the models was able to predict the problem in the industry that triggered the
financial debacle. This fact highlights the limits of the credit enhancement capacity of
structuring in situations where, at inception, debt repayments were contingent on asset sales
by the borrowers (Hung & Lin, 2007).
During the 1980s, MBS were simple in their structure, unlike today, when both
computational power and the complexity of MBS structures have greatly increased. Currently,
with many counterparties involved, tracing a loan’s deal and its exposures to off-balance-sheet
entities is almost impossible (Dunn & McConnell, 1981). Mortgage-backed securities are
financial instruments backed by the house as collateral; the premise is that the
holder of the note is assured that, on default, payment of the remaining balance will
come through. In reality, however, the house value turns out in many cases to be lower than the
residual debt, leading to losses for both the lenders and the bondholders of these
fixed income instruments. The key support is that the cash flows from the borrowers are
secured for payment of capital and interest; this is of little avail when the foreclosure rate
explodes. A further risk is that the borrower will pay faster than scheduled. The investor then
has to find another security in which to invest the money to meet a long-term investment
target. In such a scenario, the investor faces the risk of not finding similar investment
opportunities, or has to take additional risks. Another risk that the investor faces is that the
interest rate will change, causing reinvestment risk. Based on this, it can be argued that the
main risks that need to be modeled are the prepayment risk, the interest rate risk, and the
default risk (Becketti, 1989).
2.3 Prepayment and Default Risk Models
There are four major factors influencing the prepayment models (Stanton, 1995). The
first is the refinancing incentive, which is the incentive a borrower has to refinance when market
rates fall below the current mortgage rate. The second factor is the age of the mortgage, technically called
seasoning. The term seasoning refers to a phenomenon in which a new pool of mortgages pays
down its balance faster, through both partial prepayments and full payoffs. This can be attributed to the
fact that, in a new pool, borrowers find better rates and refinance, or move and sell the property.
It can also leave investors contending simultaneously with lower than expected cash
flows and ‘adverse selection’, as the remaining cash flows are contingent on payments by the higher
credit risks. These are the borrowers who were not deemed sufficiently creditworthy to access
refinancing and who remain in the pool by default (Stanton, 1995).
However, as time goes by, this activity declines and settles at a low, fairly stable
percentage of the total mortgages. The third factor is the month of the year, also known as
seasonality. On average, mortgages are paid off more often in the summer months than in the
winter months. The fourth factor is known as premium burnout. As different households face
different costs on the mortgages they have taken out, interest rates may need to fall further for
some households than for others before refinancing or prepaying becomes financially
worthwhile for them (Stanton, 1995).
An analysis of the literature suggested that there are three main categories of prepayment
models presently used in the industry:
1. Econometric Approaches: These models project cash flows based on
prepayment models that are fine-tuned to historical data (Schwartz & Torous,
1989).
2. Option-based Approaches: These models project cash flows based on
option theory and the value of the underlying call options of the
MBS (Stanton, 1995).
3. Reduced-form Approaches: These models focus on intensity models as used in
credit risk modelling (Kau, Keenan & Smurov, 2004).
Prepayment risk is the key to determining the MBS value. This was the traditional
assumption, premised on an acceptable credit standing of the underlying collateral. With the market
focused on the stronger credits, credit issues were deemed manageable via over-collateralization.
Prepayment risk models can be broken down into two approaches. The first approach is the
statistical approach. In this approach, statistical tools are used to predict the probability of
prepayment in terms of the Conditional Prepayment Rate (CPR). Reduced-form models are
widely used, as they are highly customizable and depend on parameters defined by the user.
This flexibility makes them easy for regular users to develop and use, and they do not
require programming or mathematical skills to fit the historical data. However,
given recent experience, historical data are not always a good predictor of the future. Such models
work well when fitted to historical data, but their forecasting validity is
suspect, as was evident during the 2007 crisis (Dowing, Stanton, & Wallace, 2003).
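Because the statistical approach expresses prepayment as an annual CPR while mortgage cash flows are monthly, a conversion step is needed in practice. The sketch below uses the standard compounding relation SMM = 1 − (1 − CPR)^(1/12), where SMM is the Single Monthly Mortality rate. The function names and the 6% example rate are illustrative assumptions, not taken from any of the cited models.

```python
def cpr_to_smm(cpr: float) -> float:
    """Convert an annual Conditional Prepayment Rate into a monthly rate (SMM)."""
    return 1.0 - (1.0 - cpr) ** (1.0 / 12.0)


def smm_to_cpr(smm: float) -> float:
    """Convert a Single Monthly Mortality rate back into an annual CPR."""
    return 1.0 - (1.0 - smm) ** 12


def prepaid_principal(balance_after_scheduled: float, cpr: float) -> float:
    """Unscheduled principal for one month, given the balance left after the
    scheduled payment and an annual CPR assumption."""
    return balance_after_scheduled * cpr_to_smm(cpr)


# Hypothetical pool: $1,000,000 outstanding after the scheduled payment, 6% CPR.
print(f"SMM at 6% CPR: {cpr_to_smm(0.06):.4%}")
print(f"Prepaid this month: {prepaid_principal(1_000_000, 0.06):,.2f}")
```

The round trip `smm_to_cpr(cpr_to_smm(x))` returns `x` up to floating-point error, which is a convenient sanity check when back-testing projected against actual prepayments.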
In the case of mortgage cash flows, unscheduled cash flows result predominantly from
prepayments, not from scheduled amortization. Therefore, the choice of an accurate prepayment
factor is the main driver in calculating the liquidity and valuation metrics. There are numerous
sources of commercial or third-party prepayment models. One of the most popular is the
Bloomberg median estimate. This model is an average of the mortgage rate obtained via a survey of
the research departments of several Wall Street broker/dealers. BondEdge, a tool also on
Bloomberg, is widely used as a fixed income portfolio analytics system by many banks and
financial institutions (Bloomberg, 2009). Another popular model is the Andrew Davidson &
Co. (ADCO) model (Bloomberg, 2009). This proprietary model differs from the above two
in that it provides loan-level detail; it is also available via Bloomberg. The third model is the
Applied Financial Technologies (AFT) model. This proprietary model is also available via
Bloomberg and can be used at the loan or MBS level inside several advanced Asset Liability
Management (ALM) models (Fan, Sing, & Ong, 2012).
Aside from the statistical approaches mentioned above, another approach used in the
industry is the mathematical approach. Here the model is based on mathematical
finance and is sub-classed into the option-based approach and the structural approach, which predicts
prepayment via credit risk modeling. The majority of prepayment models are based on
multi-factor regression and/or optimization models using the factors described below (Nakamura,
2011).
2.3.1 Perfect Payer (Refinance Activity). Refinance activity has a direct effect on the perfect
payer factor: if a loan is refinanced, the old loan is paid in full and thus the perfect-pay percentage
in the pool increases. Under this factor, the market loan interest rate is lower than the original
rate, so the borrower refinances into a new lower-rate loan. According to Guttentag (2004), “to
repay a loan by taking out another loan, refinancing can allow one to secure a lower interest rate;
for example, one can replace a loan at an 8.5% rate with one at 5.5%. In the case of a balloon
loan, refinancing can repay the principal if one does not have sufficient funds to do it. This
implies that if one has made only interest payments over the life of the loan and has not reduced
the principal amount when the loan comes due, refinancing can prevent bankruptcy. There are
two main drawbacks to refinancing. First, there is no certainty that one will be approved for it.
One thus takes a risk every time one decides to make only interest payments on a loan or
mortgage. Secondly, refinancing generally resets the repayment period; that is, if one refinances
six years into a 10 year loan, then one generally repays the new loan over 10 years instead of the
remaining four” (Guttentag, 2004, p. 18).
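The size of the refinancing incentive in the quoted 8.5% versus 5.5% example can be made concrete with the standard level-payment (annuity) formula for a fixed-rate loan. The sketch below is only an illustration: the $200,000 principal and 30-year term are hypothetical values that do not come from the source.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Level monthly payment on a fully amortizing fixed-rate loan."""
    n = years * 12                 # number of monthly payments
    i = annual_rate / 12.0         # periodic (monthly) interest rate
    if i == 0:
        return principal / n
    return principal * i / (1.0 - (1.0 + i) ** -n)


# Hypothetical loan: $200,000 over 30 years, refinanced from 8.5% to 5.5%.
old_payment = monthly_payment(200_000, 0.085, 30)
new_payment = monthly_payment(200_000, 0.055, 30)
print(f"payment at 8.5%: {old_payment:,.2f}")
print(f"payment at 5.5%: {new_payment:,.2f}")
print(f"monthly saving:  {old_payment - new_payment:,.2f}")
```

Note that the comparison keeps the term at 30 years in both cases, which mirrors the drawback mentioned in the quotation: refinancing generally resets the repayment period.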
2.3.2 Perfect Payer (Age of the Mortgage Assets). The industry standard is to use the
Public Securities Association (PSA) approach to ramp up prepayments over the first 30 months
of a mortgage, and then the prepayments are assumed to be stable.
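The PSA ramp can be written down in a few lines. Under the standard 100% PSA benchmark, the annual CPR starts at 0.2% in month 1, rises by 0.2 percentage points per month to 6% in month 30, and stays at 6% afterwards; other speeds scale this curve proportionally. The function name below is an illustrative choice.

```python
def psa_cpr(month: int, psa_speed: float = 100.0) -> float:
    """Annual CPR implied by the PSA benchmark for a given loan age in months.

    psa_speed is quoted in percent of the benchmark (100 = 100% PSA).
    """
    if month < 1:
        raise ValueError("month must be 1 or greater")
    base_cpr = 0.06 * min(month, 30) / 30.0   # linear ramp, flat after month 30
    return base_cpr * psa_speed / 100.0


for age in (1, 15, 30, 60):
    print(f"month {age:>2}: 100% PSA -> {psa_cpr(age):.2%}, 200% PSA -> {psa_cpr(age, 200):.2%}")
```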
The main rationale is that when a pool is new, borrowers who have good credit will move
out sooner when they get good offers; similarly, borrowers who are not creditworthy but
somehow got the loan will default. Thus, during the first 30 months the outliers, both good and
bad borrowers, exit the pool early. Newer models, however, also account for other factors, such as
borrower laziness or a lack of opportunistic behavior even when acting would be economically
advantageous, once the initial year or so has passed.
2.3.3 Loan Balance. It has been observed using historical data that loans with
lower balances prepay more slowly; it is assumed that the borrower has less incentive because the dollar
advantage of a refinance is minimal. “The general loan limits for 2015 are unchanged from 2014
(e.g., $417,000 for a 1-unit property in the continental U.S.) and apply to loans delivered to
Fannie Mae in 2015 (even if originated prior to 1/1/2015).” (Mortgage Refinance Financial
Glossary, 2011).
2.3.4 FICO score. It has been observed that loans with FICO scores below the
national average tend to prepay more slowly, perhaps because these borrowers cannot obtain favorable
loan terms, so the incentive to refinance is absent. The lower prepayment risk was, however, not
sufficient compensation for the higher repayment risk. Bhardwaj & Sengupta (2011) suggested in their
paper that “FICO score is a simple yet effective measure for evaluating the
performance of credit scoring. As mentioned earlier, the advantage of using such a measure is
twofold. First, it lends itself to both non-parametric and parametric estimation. Second, it
minimizes the impact of situational factors on this measure of credit score performance. Using
this measure, we find that credit score performance is robust to both high and low default
environments. However, evidence suggests that some of the increase in credit scores over the
cohorts can be explained as adjustment for the increased riskiness in other attributes on the
originations. This was particularly true for low levels of credit scores resulting in a sharp
deterioration of credit score performance in terms of our nonparametric measure. Significantly,
once we control for other (riskier) attributes in the origination, our parametric credit score
performance shows improvement over the cohorts. This would suggest an over-reliance on credit
scoring not only as a measure of credit risk but also as a means to set risk on other origination
attributes. In part, this reliance led to deterioration in loan performance even though average
credit quality as measured in terms of credit scores actually improved over the year” (Bhardwaj
& Sengupta, 2011).
2.3.5 Geographic. Longstaff (2005) conducted an empirical analysis and observed that
certain parts of the country prepay faster than others. This is a function of job mobility,
younger demographics, etc. Regardless of the method or model used for prepayment estimates, it is
advisable to back-test projected prepayments against actual prepayments. As for seasonality, historical
data have shown that mortgages prepay faster during the summer months than during the winter months
in most parts of the country (Longstaff, 2005).
Valuing MBS requires that a model take into consideration both the behavior and the
prepayment of the mortgages in the pool. After the economic crisis of 2008, the focus
on this sector increased significantly. This has resulted in a better understanding
of MBS; however, several challenges remain. “These challenges include the persistence of
model-based MBS pricing errors (option adjusted spread, or OAS), the observed variance in bids
for MBS derivative auctions” (Bernardo & Cornell, 1997).
Another prepayment model approach is the option-based model. Here, no-arbitrage
pricing theory is used, but in a discrete-time setting. Kariya and Kobayashi (2000) formulated a
framework for pricing a mortgage-backed security (MBS) that predicted the burnout effect based
on a one-factor valuation model. However, this option-based approach usually assumes, implicitly,
homogeneous mortgagors. This is a serious shortcoming, since it is very rare to have a
pool of mortgagors that are homogeneous. The mortgagors in an MBS pool are typically
heterogeneous, with different incomes, FICO scores, and geographic locations (Ushiyama & Pliska,
2011). This was, however, not the case with the sub-prime mortgage ABS, which were composed
largely of credit-homogeneous mortgages. Geographical diversification, if any, brought little
solace to the investors.
2.4 Interest Rate Risk Models
The other major source of uncertainty in MBS valuation is interest rates.
Different models are used to value that component, which makes a one-factor valuation less
accurate. A large decrease in the mortgage rate that follows a decrease in the short-term rate
tends to lower the value of an MBS because of refinancing activity. On the other hand, a decrease
in the short-term rate also has the opposite effect, increasing the value of an MBS by
increasing the discount factors. Therefore, it is important to balance the two and incorporate their
separate roles (Ushiyama & Pliska, 2011). In a recent study, Tahani and Li (2011) concluded
that interest rate behavior is not Gaussian but Brownian in nature, as evidenced by
the changing volatility of interest rates. Brownian motion refers to the
motion of gas particles as they move about randomly. Using this concept, Vervaat (1979)
showed that interest rates mimic the random behavior of gas particles. Thus, financial models
that incorporate the random walk are more accurate. The literature discussed above
shows that various approaches can be adopted to develop and calibrate interest rate
models. Some of the major models following these approaches are:
2.4.1 Vasicek model. The Vasicek model is a mathematical model used in finance
to predict how interest rates affect fixed-income valuation, such as that of a mortgage-backed
security. It is a one-factor model: it describes interest rate movements as driven by only one
source of market risk, which in this
model is the short-term interest rate (Vasiçek, 1977). The significance of this model is that it was
the first of its kind, and subsequent models are based on it.
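To illustrate the one-factor idea, the sketch below simulates a single short-rate path under the Vasicek dynamics dr = a(b − r)dt + σ dW, where a is the speed of mean reversion, b the long-run level, and σ the volatility. The Euler discretisation, the function names, and all parameter values are assumptions chosen for illustration; they are not calibrated to market data.

```python
import math
import random


def simulate_vasicek(r0, a, b, sigma, years, steps_per_year=12, seed=None):
    """Euler-discretised path of dr = a * (b - r) dt + sigma dW."""
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    path = [r0]
    for _ in range(int(years * steps_per_year)):
        r = path[-1]
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)   # Brownian increment
        path.append(r + a * (b - r) * dt + sigma * shock)
    return path


def vasicek_expected_rate(r0, a, b, t):
    """Closed-form mean of the short rate at time t under the same dynamics."""
    return b + (r0 - b) * math.exp(-a * t)


# Hypothetical parameters: start at 2%, revert towards 5%.
path = simulate_vasicek(r0=0.02, a=0.5, b=0.05, sigma=0.01, years=10, seed=42)
print(f"simulated rate after 10 years: {path[-1]:.4f}")
print(f"expected rate after 10 years:  {vasicek_expected_rate(0.02, 0.5, 0.05, 10):.4f}")
```

Because the shocks are normally distributed and unscaled, simulated rates can become negative, which is a known limitation of this model.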
2.4.2 Cox, Ingersoll and Ross (CIR) Model. The Cox–Ingersoll–Ross model (or CIR
model) is used to model interest rates in the valuation of MBS. The CIR model is a one-factor
model of the short-term interest rate, in which interest rate fluctuations are driven by
only one source of market risk. The CIR model was introduced in 1985 by John C. Cox, Jonathan E.
Ingersoll and Stephen A. Ross as an extension of the Vasicek model. The model can in turn be
extended by replacing its coefficients with time-varying functions, which can be introduced to
make it consistent with a predetermined term structure and volatility of interest
rates (Cox, Ingersoll, & Ross, 1985).
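The CIR dynamics are usually written as dr = a(b − r)dt + σ√r dW: the drift is the same mean-reverting term as in the Vasicek model, but the shock is scaled by the square root of the current rate. The sketch below is a minimal simulation under assumed parameters; the Euler scheme and the floor at zero are discretisation choices made here for illustration and are not part of the model itself.

```python
import math
import random


def simulate_cir(r0, a, b, sigma, years, steps_per_year=12, seed=None):
    """Euler-discretised path of dr = a * (b - r) dt + sigma * sqrt(r) dW.

    The discretised rate is floored at zero so that sqrt(r) stays defined.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    path = [r0]
    for _ in range(int(years * steps_per_year)):
        r = path[-1]
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        r_next = r + a * (b - r) * dt + sigma * math.sqrt(r) * shock
        path.append(max(r_next, 0.0))
    return path


# Hypothetical parameters, chosen only to illustrate the behaviour.
path = simulate_cir(r0=0.02, a=0.5, b=0.05, sigma=0.10, years=10, seed=7)
print(f"lowest simulated rate:  {min(path):.4f}")
print(f"highest simulated rate: {max(path):.4f}")
```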
2.4.3 Black–Derman–Toy Model. The Black–Derman–Toy model (BDT) is a popular
one-factor short-rate model used in pricing mortgage-backed securities. The short-term rate is
the single stochastic factor that determines the predictions of the model. This
model is extremely popular and widely used within the industry, as it was the first model to
combine the mean-reverting behavior of short-term interest rates with a lognormal distribution.
It was developed in-house at Goldman Sachs in the 1980s by Fischer Black, Emanuel
Derman, and Bill Toy (Black, Derman, & Toy, 1990).
The popularity of this model stems from the fact that it is used by one of the most
influential players in the MBS market. Another salient feature of the BDT model is that it uses a
binomial lattice. The model is calibrated to fit the volatilities of interest rate
caps and the current yield curve, i.e., the term structure of interest rates. Thus, once the
calibrated lattice is available, it is easier to value complex interest-rate-sensitive MBS.
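The valuation step on a lattice can be sketched with backward induction. The example below builds a recombining binomial lattice of lognormally spaced short rates and prices a zero-coupon bond on it, assuming a risk-neutral probability of one half for each branch. This is a simplified illustration: the lattice here is not calibrated to a yield curve or to cap volatilities as a real BDT lattice would be, and the function names and parameter values are hypothetical.

```python
import math


def build_lognormal_lattice(r0, sigma, dt, steps):
    """Uncalibrated recombining lattice: node (i, j) holds the one-period rate
    r0 * exp(sigma * sqrt(dt) * (2j - i)), for j up-moves out of i steps."""
    return [
        [r0 * math.exp(sigma * math.sqrt(dt) * (2 * j - i)) for j in range(i + 1)]
        for i in range(steps)
    ]


def price_zero_coupon(lattice, dt, face=1.0):
    """Backward induction with probability 0.5 on each branch."""
    steps = len(lattice)
    values = [face] * (steps + 1)            # payoff at maturity in every state
    for i in range(steps - 1, -1, -1):
        values = [
            0.5 * (values[j] + values[j + 1]) / (1.0 + lattice[i][j] * dt)
            for j in range(i + 1)
        ]
    return values[0]


# Hypothetical inputs: 5% starting rate, 20% rate volatility, annual steps.
lattice = build_lognormal_lattice(r0=0.05, sigma=0.20, dt=1.0, steps=5)
print(f"5-year zero-coupon price: {price_zero_coupon(lattice, dt=1.0):.4f}")
```

The same backward pass extends to an MBS by replacing the single payoff at maturity with state-dependent cash flows at each node, which is where a prepayment model would enter.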
The model was developed by its originators for a lattice-based environment; however, it
has been shown to imply the following continuous stochastic differential
equation:

$$d\ln(r) = \left[\theta_t + \frac{\sigma'_t}{\sigma_t}\ln(r)\right]dt + \sigma_t\,dW_t$$

where,
$r$ = short-term rate at a given point t
$\theta_t$ = value of the asset
$\sigma_t$ = short-term rate volatility at a given time t
$W_t$ = Brownian motion under a risk-neutral probability measure
(Black, Derman, & Toy, 1990).
2.4.4 Ho–Lee Model. The Ho–Lee model was developed in 1986 by Thomas Ho and
Sang Bin Lee (1986). It was the first arbitrage-free model of interest rates. An arbitrage-free
model is a financial engineering model that calculates prices or valuations in such a way that it is
impossible to construct arbitrages between two or more of those prices. Thus, there is no
opportunity to profit by buying from one seller and simultaneously selling to another
buyer.
Under this model, the short rate follows a normal process:

$$dr_t = \theta_t\,dt + \sigma\,dW_t$$
The Ho–Lee model adds value because it is fine-tuned to market data, so the valuation
is essentially the fair market price. The Ho–Lee model can therefore accurately calculate the
price of bonds consistent with the market yield curve. The model calculates the yields using a
binomial lattice method (Ho & Lee, 1986). However, one of the weaknesses of the model is
that it does not incorporate mean reversion. Additionally, it generates a bell-shaped (normal)
distribution of future rates, under which negative rates are possible.
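The possibility of negative rates can be quantified. If the drift θ is held constant (a simplifying assumption made here; in the actual Ho–Lee model θ is a function of time fitted to the yield curve), a normal short-rate process dr = θ dt + σ dW gives a rate at time t that is normally distributed with mean r0 + θt and standard deviation σ√t. The function names and parameter values below are illustrative.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def negative_rate_probability(r0, theta, sigma, t):
    """P(r_t < 0) when r_t ~ Normal(r0 + theta * t, sigma^2 * t)."""
    if t <= 0 or sigma <= 0:
        raise ValueError("t and sigma must be positive")
    mean = r0 + theta * t
    std = sigma * math.sqrt(t)
    return normal_cdf(-mean / std)


# Hypothetical inputs: 2% starting rate, no drift, 1% annual volatility.
for horizon in (1, 5, 10):
    p = negative_rate_probability(r0=0.02, theta=0.0, sigma=0.01, t=horizon)
    print(f"{horizon:>2} years: P(rate < 0) = {p:.2%}")
```

Without mean reversion the standard deviation grows with the square root of time, so the probability of negative rates increases with the horizon.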
2.4.5 Hull–White Model. The Hull–White model is used to calculate future
interest rates. It is based on the principles of no-arbitrage models, which are
more practical given the present-day interest-rate term structure. The mathematical
description of future interest rates translates easily into a binomial tree; hence derivatives such
as Bermudan swaptions can be valued in the model.
The first Hull–White model, introduced in 1990 by John C.
Hull and Alan White, is still popular today (Hull & White, 2001).
The model is a short-rate model. In general, it has the following dynamics:

$$dr(t) = \left[\theta(t) - \alpha(t)\,r(t)\right]dt + \sigma(t)\,dW(t)$$

There are disagreements among users about exactly which parameters are time-dependent,
but the most commonly accepted hierarchy has
θ and α constant – the Vasicek model
θ has t dependence – the Hull–White model (Hull & White, 2001).
2.4.6 The Black–Karasinski Model. The Black–Karasinski model is used for the
calculation of the term structure of interest rates. This model is also from the family of
no-arbitrage models and is a one-factor model, predicting interest rate movements influenced
by a single source of randomness. In its most generic form, the model is a good fit for today’s
market for valuing the call options on the underlying loans of the MBS. The
main driving factor of the model is the short-term rate. The short-term rate is assumed to follow
the following stochastic differential equation (under the risk-neutral measure):

$$d\ln(r) = \left[\theta_t - \phi_t\ln(r)\right]dt + \sigma_t\,dW_t$$

In the above equation, $dW_t$ is a standard Brownian motion. The short-term interest rates
are assumed to follow a log-normal distribution (Black & Karasinski, 1991).
2.5 Other Models
In addition to the earlier mentioned models, other models used in the industry are as
follows:
2.5.1 Heath–Jarrow–Morton (HJM) Model. The Heath–Jarrow–Morton (HJM) model
removes a requirement that is at the core of the models above: no drift estimation is needed.
The HJM model also differs from the other models in that it captures the full dynamics of the
entire forward rate curve, whereas the other models, which incorporate a drift, capture only the
dynamics of one point on the curve, the short rate. HJM frameworks are usually non-Markovian with
infinite dimensions. However, recent research has shown that they can be computed in a finite manner,
making them computationally feasible (Heath, Jarrow, & Morton, 1990).
2.5.2 LIBOR Market Model. The LIBOR market model is used for predicting the future
curve of interest rates. In the LIBOR model, the quantities that are modeled are a set of
individual LIBOR forward rates, rather than the short rate. This method offers a better understanding
of the volatilities, which are directly linked to the underlying contracts and can be observed easily in
the market. In the LIBOR model, a lognormal process is used to model each individual forward
rate. This assumption leads to the Black formula for interest rate caps, which is the market
standard for quoting cap prices in
terms of implied volatilities, hence the term "market model".
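Put simply, the LIBOR market model is a collection of forward LIBOR dynamics, one for
each forward rate. The LIBOR Market Model (LMM) differs from short-rate models in that it
evolves a set of discrete forward rates, using a lognormal process for each forward rate.
Specifically,

$$dF_i(t) = \mu_i(t)\,F_i(t)\,dt + \sigma_i(t)\,F_i(t)\,dW_i(t), \qquad i = 1, \dots, N$$

where
dW is an N-dimensional Brownian motion with

$$dW_i(t)\,dW_j(t) = \rho_{ij}\,dt$$

The LMM relates the drifts of the forward rates based on no-arbitrage arguments.
Specifically, under the Spot LIBOR measure, the drifts are expressed as the following:

$$\mu_i(t) = \sigma_i(t)\sum_{j=q(t)}^{i}\frac{\tau_j\,\rho_{ij}\,\sigma_j(t)\,F_j(t)}{1 + \tau_j F_j(t)}$$

where $\tau_j$ is the accrual period of the j-th forward rate and $q(t)$ is the index of the first
forward rate that has not yet reset at time t (Nekrasov, n.d.).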
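As a worked illustration of the drift relation, the sketch below evaluates the commonly stated Spot LIBOR measure drift, μ_i = σ_i Σ_{j=q..i} τ_j ρ_ij σ_j F_j / (1 + τ_j F_j), for a small set of forward rates. Here τ_j is the accrual period and q is the index of the first forward rate that has not yet reset. The function name, the zero-based indexing, and every input value are assumptions made for this example.

```python
def spot_measure_drifts(forwards, vols, accruals, corr, first_alive=0):
    """Drift of each forward rate under the spot LIBOR measure.

    forwards, vols, accruals: lists of length N (F_i, sigma_i, tau_i).
    corr: N x N correlation matrix as nested lists (rho_ij).
    first_alive: zero-based index q of the first rate that has not yet reset.
    Rates that have already reset are given a drift of zero.
    """
    drifts = []
    for i in range(len(forwards)):
        if i < first_alive:
            drifts.append(0.0)
            continue
        total = 0.0
        for j in range(first_alive, i + 1):
            total += (accruals[j] * corr[i][j] * vols[j] * forwards[j]
                      / (1.0 + accruals[j] * forwards[j]))
        drifts.append(vols[i] * total)
    return drifts


# Hypothetical inputs: three semi-annual forward rates.
forwards = [0.030, 0.035, 0.040]
vols = [0.20, 0.18, 0.16]
accruals = [0.5, 0.5, 0.5]
corr = [[1.0, 0.9, 0.8],
        [0.9, 1.0, 0.9],
        [0.8, 0.9, 1.0]]
print(spot_measure_drifts(forwards, vols, accruals, corr))
```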
Definitions
2.6 Subprime and Prime Mortgages
“The main difference between prime and subprime mortgages lies in the risk profile of
the borrower; subprime mortgages are offered to higher-risk borrowers. Specifically, lenders
differentiate among mortgage applicants by using loan risk grades based on their past mortgage
or rent payment behaviors, previous bankruptcy filings, debt-to income (DTI) ratios, and the
level of documentation provided by the applicants to verify income. Next, lenders determine the
price of a mortgage in a given risk grade based on the borrower’s credit risk score, e.g., the Fair,
Isaac, and Company (FICO) score, and the size of the down payment.” (Agarwal & Ho, 2007).
“Subprime loans, which are loans to borrowers with relatively low credit scores and
records of poor credit performance or little credit experience, have become an increasing share of
all mortgages in this decade and currently make up about 13 percent of such loans. In 2000 and
earlier, subprime loans were negligible. Other higher risk mortgages today include credit
extended by the Federal Housing Administration (FHA) and so-called "alt-A" loans, which are
loans to borrowers usually with prime credit scores, but who do not provide any documentation
("no-doc") of income or wealth or ability to service pay the loan, or very little documentation
("low-doc"). They have been reported to constitute over 10 percent of all mortgages. When all
three categories are added together, nearly 30 percent of loans outstanding are estimated to be in
the high-risk category. Subprime loans have foreclosure rates that are much higher than that for
prime loans” (Tatom, 2009).
2.7 Foreclosure Process
“Foreclosure processes are different in every state. Differences among states range from
the notices that must be posted or mailed, redemption periods, and the scheduling and notices
issued regarding the auctioning of the property. In general, mortgage companies start foreclosure
processes about 3-6 months after the first missed mortgage payment. Late fees are charged after
10-15 days; however, most mortgage companies recognize that homeowners may be facing
short-term financial hardships. It is extremely important that you stay in contact with your lender
within the first month after missing a payment. After 30 days, the borrower is in default, and the
foreclosure processes begin to accelerate. If you do not call the bank and ignore the calls of your
lender, then the foreclosure process will begin much earlier.
Three types of foreclosures may be initiated at this time: judicial, power of sale and strict
foreclosure. All types of foreclosure require public notices to be issued and all parties to be
notified regarding the proceedings. Once properties are sold through an auction, families have a
small amount of time to find a new place to live and move out before the sheriff issues an
eviction notice.
2.7.1 Judicial Foreclosure. All states allow this type of foreclosure, and some require it.
The lender files suit with the judicial system, and the borrower will receive a note in the mail
demanding payment. The borrower then has only 30 days to respond with a payment in order to
avoid foreclosure. If a payment is not made after a certain time period, the mortgage property is
then sold through an auction to the highest bidder, carried out by a local court or sheriff's office
(Foreclosure Process/U.S. Department of Housing and Urban Development, 2015).
2.7.2 Power of Sale. This type of foreclosure, also known as statutory foreclosure, is
allowed by many states if the mortgage includes a power of sale clause. After a homeowner has
defaulted on mortgage payments, the lender sends out notices demanding payments. Once an
established waiting period has passed, the mortgage company, rather than local courts or sheriff's
office, carries out a public auction. Non-judicial foreclosure auctions are often more expedient,
though they may be subject to judicial review to ensure the legality of the proceedings
(Foreclosure Process/U.S. Department of Housing and Urban Development, 2015).
2.7.3 Strict Foreclosure. A small number of states allow this type of foreclosure. In strict
foreclosure proceedings, the lender files a lawsuit on the homeowner that has defaulted. If the
borrower cannot pay the mortgage within a specific timeline ordered by the court, the property
goes directly back to the mortgage holder. Generally, strict foreclosures take place only when the
debt amount is greater than the value of the property” (Foreclosure Process/U.S. Department of
Housing and Urban Development, 2015).
2.8 Conclusion
In conclusion, the literature indicated that several models have been
developed over the years for the valuation of mortgage-backed securities. Each model
is valuable and correct in its methodology, as shown by its authors. However, different
circumstances and priorities make one model better than another. No single model is
the industry standard or superior to all the others. However, each of the models relies on input
factors that are similar. The major factors are FICO score, loan balances, geography of the loans,
and perfect payer. In the following sections, I used the variables identified by the literature as a part
of my model, and developed an understanding of the level of efficiency of each of the models
identified above in understanding mortgage-backed securities.
Chapter 3 - Research Methodology And Design
3.1 Introduction
In this chapter on research methodology and design, I identified the
research question and elaborated on the research design that I adopted to test my research
question. I also identified the research methodology as well as the variables that I used to
test the research question.
3.2 Research Question and Hypothesis
The primary goal of this thesis was to investigate the various methodologies and models
that are used to calculate the value or price of a distressed MBS, and to conduct a
correlation analysis between the main inputs into the models (FICO score, geography, loan
balances) and the foreclosure rates. Such an analysis enabled us to understand whether the input
variables into the models have high explanatory potential, and whether the models are
using the correct factors. The thesis therefore answered the following research question:
R.Q.: Does a correlation exist between the foreclosure rate of the pool and the factors
used by the most common risk models used to predict foreclosure rates?
In order to answer this research question, I used existing literature to develop the
following hypotheses:
H0: There exists no correlation between the foreclosure rate of the pool, and variables such as
Credit Score, Perfect Payer, Balance and Geography; which are used to predict foreclosure rates.
H1: There exists a correlation between the foreclosure rate of the pool, and variables such as
Credit Score, Perfect Payer, Balance and Geography; which are used to predict foreclosure rates.
3.3 Relevance of Topic
This thesis is intended to benefit mortgage-backed security professionals, bankers, and
valuation experts, who utilize numerous methods to value their risk or portfolio on a daily basis
without knowing whether one methodology is superior to another. As there are numerous
models, all of which have a sound logical and mathematical basis, no one model
may be considered superior to the others in all instances. The thesis, therefore, provided a
brief introduction to the various models used to value MBS, and established a correlation between
the main inputs that drive the models and the foreclosure rate. Additionally, this thesis provided
recent graduates entering the MBS structured finance field with a summary of the valuation
methods, and a reference for how valuation is done for such products.
Table 3-1. Models Used in the Industry and Their Component Variables
Factor used FICO Geography Loan balance Perfect payer
Vasicek model ✓ ✓ ✓
Cox, Ingersoll and Ross (CIR) model ✓ ✓ ✓
Black–Derman–Toy model ✓ ✓ ✓ ✓
Ho–Lee model ✓ ✓ ✓ ✓
Black–Karasinski (B-K) model ✓ ✓ ✓ ✓
Heath–Jarrow–Morton (HJM) model ✓ ✓ ✓ ✓
LIBOR market model ✓ ✓ ✓ ✓
The thesis did not try to analyze the internal logic of all the models mentioned (Table 3-1),
as this stream of research has been extensively studied by numerous scholars (Vasiçek, 1977;
Black, Derman, & Toy, 1990).
3.4 Research Methodology
To answer the research question stated above, a quantitative approach was
adopted, using a longitudinal study over seven years from 2008 to 2014 that focused on the factors
used in the models. A longitudinal study was chosen as the methodology, instead of surveys
and interviews, because this approach is more robust and prevents individual biases from
affecting the final result. For example, if a survey of finance professionals were conducted on the
correlation between FICO score and foreclosure rate, the results would carry a component of
personal bias and experience. This would result in the data not being homogeneous, and would call into
question the validity of the data. Similarly, if a regression analysis were based on a survey, the
results would be skewed. In the case of interviews, the same issues would persist, making the
analysis unreliable.
The longitudinal study used here evaluated four input variables: FICO
score, loan balance, the geographic location of the houses that make up the pool of loans in the
mortgage-backed security, and the payment history of the borrowers. These variables were
evaluated over a 7-year period from 2008 to 2014 to identify whether there is a significant
correlation between the factors and the foreclosure rate. The major emphasis of the thesis was to
identify the correlation between the dependent variable and the independent variables. The
nature of the correlation, i.e., whether it is positive or negative, is outside the scope of the study.
The period between 2008 and 2014 was chosen because it was during
this time frame that the largest collapse of the housing market in the history of modern
economics took place. The epicenter of this housing crisis was the mortgage-backed
securities market, making it the ideal time frame to study (Goliath, 2011). The rationale
for picking this period was that, to observe the nature of the relationship between the dependent
variable and the four independent variables chosen for the study, a period of
extreme change, foreclosures, and recovery is more informative than a period with
little change in the macroeconomic situation of the country in general and of the borrowers in
particular.
The foreclosure rate, rather than price, was the metric chosen as a benchmark, as default is
a major risk event in a mortgage-backed security. Price becomes meaningless if there is
going to be no future cash flow. With the longitudinal study, the thesis investigated whether there is a
correlation between the main factors the models use and the foreclosure rate. Correlation
between the foreclosure rate and the input factors was chosen as the method because the
greater the correlation, the higher the reliability of the models.
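A correlation of this kind can be computed as a Pearson coefficient between a yearly series of an input factor and the yearly foreclosure rate. The sketch below shows one way to do this; it is not the thesis's own calculation, and the seven yearly values are hypothetical placeholders rather than data from the study.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly observations for 2008-2014 (placeholders, not study data).
average_fico = [690, 695, 702, 708, 715, 720, 724]
foreclosure_rate = [0.12, 0.14, 0.13, 0.11, 0.09, 0.07, 0.06]
print(f"correlation: {pearson_correlation(average_fico, foreclosure_rate):.3f}")
```

With only seven yearly observations per series, a coefficient of this kind is sensitive to individual years, so it is best read alongside a significance test.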
3.5 Research Design: Variables Identified
The method used in the thesis was a longitudinal study using a 7-year time frame. The
main independent variables are FICO score, geography of the loans in the pool, loan balances,
and perfect payers (see Table 3-1). The dependent variable is the foreclosure rate. It is defined as
the mortgage foreclosure rate: the dollar value of 1–4 family mortgages that are delinquent by 30
days or more or are in foreclosure, divided by the dollar value of all 1–4 family mortgages. For
example, assume there are 100 loans of equal balance in the pool of a mortgage-backed security;
of these, 10 have not made payments for over 30 days and five are in foreclosure. The
foreclosure rate for this pool is then 15%. In the research design, it is important not only to identify
the constructs, but also to explain them clearly so that when I test my research question, I am
able to ensure robustness. The independent variables or constructs are as follows:
3.5.1 FICO Score. FICO score is a credit score based on a mathematical formula using
payment history, debt balance, length of credit history, types of credit used, and recent
inquiries. The score ranges from 300 to 850. The score is used by mortgage lenders to assess the
borrower’s creditworthiness and risk. ‘A FICO Score is a three-digit number calculated from the
credit information in a credit report. Lenders use these scores to estimate their credit risk, that is,
how likely the borrower is to pay his credit obligations as agreed. A FICO Score assesses the
information in a borrower’s credit report at a particular point in time. It helps lenders evaluate
credit risk reliably, objectively, and quickly. And it helps the borrower obtain credit based on his
actual borrowing and repayment history, filtering out extraneous details such as race or religion’
(What’s in My FICO Score, 2014).
3.5.2 Geography of Loan. This metric classifies where the collateral/home is located.
Since a mortgage-backed security has a large number of loans, they tend to be a mixture from all
over the country. For example, some pools are drawn from only one state, such as Florida, while
others are a mixture of various states such as California, New York, and Michigan. “Geographic
diversification does not guarantee diversification in housing market returns. To the extent that
the housing market is associated with the probability of loan default, a relevant measure of loan
diversification is the correlation between the returns in housing markets of the loan collateral. As
an illustration of this point consider that despite the geographic distance, returns on a California
house price index have a correlation coefficient of 0.87 with returns on an index measuring
house price returns in Washington DC. We construct a Herfindahl index of the geographic
concentration in each deal as follows. For each deal, we calculate the percentage of the deal
principal that is concentrated in each of the 50 states, plus Washington, DC. The deal-level
Herfindahl index is then calculated as the sum of the squared weights” (Nadauld & Sherlund,
2009).
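The Herfindahl calculation described in the quotation can be sketched as follows. This is a minimal illustration, not the code used by Nadauld and Sherlund or by this thesis; the function name, the (state, principal) input layout, and the sample deals are hypothetical.

```python
from collections import defaultdict


def herfindahl_index(loans):
    """Sum of squared state shares of deal principal.

    `loans` is an iterable of (state, principal) pairs; this layout is
    an assumption made for the illustration.
    """
    principal_by_state = defaultdict(float)
    for state, principal in loans:
        principal_by_state[state] += principal
    total = sum(principal_by_state.values())
    if total <= 0:
        raise ValueError("deal principal must be positive")
    # Weight of each state = its share of total principal; square and sum.
    return sum((p / total) ** 2 for p in principal_by_state.values())


# Hypothetical deals (not thesis data).
single_state = [("FL", 300_000.0), ("FL", 200_000.0)]
three_states = [("CA", 250_000.0), ("NY", 250_000.0), ("MI", 500_000.0)]

print(herfindahl_index(single_state))  # 1.0
print(herfindahl_index(three_states))  # 0.375
```

A value of 1 means the whole deal sits in one state; the value falls toward zero as principal is spread more evenly across states.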
3.5.3 Loan Balances. This metric provides the unpaid balance (UPB) information. The
UPB is defined as the amount owed by the borrower on the loan. The loan balance is not a fixed
amount. The original balance is reduced as payments are made based on an amortization
schedule. The payments are applied to both interest and principal, and over time the balance
becomes zero. The higher the balance, the greater the effect the loan has on the pool, as the
default of a high-balance loan has more impact on the pool than that of a low-balance loan. This
behavior makes the variable an ideal candidate for inclusion in the research as one of the
factors that help identify the level of correlation with the foreclosure rate.
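How the unpaid balance declines under an amortization schedule can be sketched as below. The sketch assumes a fixed-rate, level-payment loan with monthly payments, which the thesis does not specify; the function name and the loan terms in the example are hypothetical.

```python
def amortization_schedule(principal, annual_rate, n_months):
    """Return (month, interest, principal_paid, unpaid_balance) rows for a
    fixed-rate, level-payment loan (an assumed loan type)."""
    monthly_rate = annual_rate / 12.0
    if monthly_rate == 0:
        payment = principal / n_months
    else:
        # Standard level-payment (annuity) formula.
        payment = principal * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -n_months)
    balance = principal
    rows = []
    for month in range(1, n_months + 1):
        interest = balance * monthly_rate      # interest on the current UPB
        principal_paid = payment - interest    # the rest reduces the UPB
        balance = max(balance - principal_paid, 0.0)
        rows.append((month, interest, principal_paid, balance))
    return rows


# Hypothetical loan: $200,000 at 6% over 30 years (not thesis data).
schedule = amortization_schedule(200_000.0, 0.06, 360)
print(f"UPB after 1 year:   {schedule[11][3]:,.2f}")
print(f"UPB after 30 years: {schedule[-1][3]:,.2f}")
```

Early payments go mostly to interest, so the UPB falls slowly at first and faster toward maturity.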
3.5.4 Perfect payers. This metric reflects the percentage of borrowers in the pool who have
been paying on time over a period of time. There are several sub-categories within this. The first
sub-category is the 24-month perfect payer, which comprises borrowers who have not missed a
payment in the past 24 months. The second sub-category is the 60-month perfect payer, which
comprises borrowers who have paid on time for the past 60 months. The category
benchmarked in this study is the perfect payer, that is, borrowers who have not missed a
single payment at all.
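The dependent variable defined in Section 3.5 and the perfect payer metric can be illustrated with a small sketch. The `Loan` record and its field names are hypothetical and do not reflect the Bloomberg data layout; the pool reproduces the 100-loan example from Section 3.5 under the assumption of equal balances.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    unpaid_balance: float   # dollars still owed (UPB)
    days_delinquent: int    # 0 when the loan is current
    in_foreclosure: bool
    missed_payments: int    # payments missed since origination


def foreclosure_rate(pool):
    """Dollar value delinquent 30+ days or in foreclosure, divided by the
    dollar value of all loans in the pool."""
    total = sum(loan.unpaid_balance for loan in pool)
    if total <= 0:
        raise ValueError("pool balance must be positive")
    distressed = sum(
        loan.unpaid_balance
        for loan in pool
        if loan.days_delinquent >= 30 or loan.in_foreclosure
    )
    return distressed / total


def perfect_payer_share(pool):
    """Share of loans whose borrowers have never missed a payment."""
    return sum(1 for loan in pool if loan.missed_payments == 0) / len(pool)


# 100 equal-balance loans: 10 delinquent, 5 in foreclosure, 85 current.
pool = (
    [Loan(200_000.0, 45, False, 2) for _ in range(10)]
    + [Loan(200_000.0, 120, True, 5) for _ in range(5)]
    + [Loan(200_000.0, 0, False, 0) for _ in range(85)]
)
print(f"{foreclosure_rate(pool):.0%}")     # 15%
print(f"{perfect_payer_share(pool):.0%}")  # 85%
```

Because the rate is dollar-weighted, a pool with unequal balances can show a different rate than a simple count of loans would suggest.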
3.6 Conclusion
Based on the literature review in Chapter 2, we were able to identify the research
question, and how it fits within the overall literature on the topic. This chapter took the study
further and clearly identified the research question, the research design, as well as the dependent
and independent variables. The next chapter describes how the data was collected and analyzed.
Chapter 4: Data Collection
4.1 Introduction
The data collected in this study was time series data. Most of the data covered a 10-year
period of mortgage-backed securities that were issued before 2004 and were active, with payments
being made from the underlying cash flow. To add diversity to the data, the study also included some
mortgage-backed securities that either collapsed or were paid out. The main data source was the
Bloomberg terminal, accessed through the NYU library. Specifically, using the
Bloomberg fixed-income section and sorting for active pools of mortgage-backed securities
issued before 2004, the study assembled a database that includes MBS with active
payments as well as mortgage-backed securities that either collapsed or were paid out. The
database consisted of a total of 1,000 such unique securities. For the study we chose 10 pools
that each had more than 1,000 individual mortgages inside them, so that the N in the regression
analysis would be large and any outlier would have minimal effect on the results. The pools have
not been modified, and the regression analysis was performed on the original data. These pools are
existing pools and have been chosen to give a diverse representation. The pools were chosen
from different banks and not from one single bank. Additionally, care was taken that the pools
represented different geographies and were not concentrated in one area, such as New York or
Texas.
4.2 Database Description: Population and Sample
The population of the data set comprised all mortgage-backed securities issued in the
United States of America prior to 2004. The mortgage-backed securities range in FICO score
from 400 to 800, geographically represent all 50 states, and have loan balances ranging from
$10 million to $2 billion. The total population of such securities is over 1,000. The sample that I
chose from this population is around 300. These randomly chosen MBS have diverse loan
balances, FICO scores, geographies, and payment histories. The choice of a sample size of 300 is
apt, as it is sufficient to mitigate issues arising from missing data or the presence of outliers. The
sample is large enough to mitigate self-selection biases and other statistical errors that are
commonly observed in datasets with small sample sizes. The method of identifying samples is
based on the criteria defined earlier in the study. Once I was able to identify the population set, I
randomly picked 300 pools that have all four independent variables. This pool of 300
securities, of different vintages and characteristics, was subjected to statistical tests to answer the
research question. Results of 10 pools from the 300 tested are discussed in detail. We discuss
these 10 pools in detail to elaborate on how the dependent variable is affected by the independent
variables, and we discuss the macroeconomic, political, and business environment affecting the
variables so that the reader can understand the relationship between the independent variables
and the dependent variable.
At this stage it is important to point out that the loans chosen consist of prime
loans, not sub-prime loans, but the prime loans are distressed given the recession in the
chosen period of 2008 to 2014. The aim of the thesis was to test the relationship of the input
factors to the foreclosure rate, and including sub-prime mortgages might have given us
unreliable data. The study of MBS backed by sub-prime mortgages is a subject for another study and
beyond the scope of this thesis. The primary reason sub-prime mortgages were excluded is that
they are biased towards default, especially given the credit and income profile of the borrower,
and are also not geographically dispersed. “Subprime originations appear to be heavily concentrated
in fast-growing parts of the country with considerable new construction, such as Florida,
California, Nevada, and the Washington DC area. Subprime loans were also heavily
concentrated in zip codes with more residents in the moderate credit score category and more
black and Hispanic residents. Areas with lower income and higher unemployment had more
subprime lending” (Mayer & Pence, 2008). The economic recovery since 2009 has been unlike
any other, and this slow recovery has caused wages to be depressed. Additionally, there is the
concept of shadow unemployment, where people are working part-time or working jobs for which
they are overqualified. This may be causing substantial drift in the credit standing of loans
previously deemed non-sub-prime. This trend may be further accentuated by continuing
restrictions on credit to borrowers with non-perfect credit scores.
Thus, including sub-prime pools might have resulted in the identification of a spurious
relationship between the foreclosure rate and the input variables. The major drawback would be a
self-selection bias, i.e., bad pools being tested for default. My objective is to keep the data as close to
the source as possible without having to amend it. Including sub-prime pools would require us to
adjust for location, or for greater default, to normalize them with other pools. In short, the pool of data
would not be homogeneous, and therefore not comparable.
The data collection design mostly leveraged the Bloomberg fixed-income section available at
NYU libraries to gain access to the data. Sample securities were
identified using the criteria enumerated above and then saved in Excel format. The data
analysis plan adopted a two-step methodology. The first step was sorting the securities in Excel
based on the year of maturity, average FICO score, average loan balance, and average payment
history. Since these are our independent variables, such a process helped weed out securities that
were too similar. For example, if two pools were made by a bank using similar mortgages from
a common larger pool, then the pools are too similar and we would take only one of them
for our study. This helps minimize any biases. The second step of the
analysis involved conducting a regression analysis of the foreclosure rate, the dependent
variable, on each independent variable.
The focus of the study was to find the level of dependence between the independent
variables and the dependent variable. For example, does a low FICO score pool result in a high
default rate among borrowers, or does a high FICO score pool have a high foreclosure rate too? Thus,
correlation and regression analysis in SPSS version 23 was a suitable
tool to identify such a relationship.
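The second step, regressing the foreclosure rate on one independent variable at a time, can be sketched in a few lines. This is an illustration only: the thesis used SPSS, and the function name and the monthly observations below are hypothetical. For a regression with a single independent variable, R² equals the squared correlation coefficient, which is what the function computes.

```python
def r_squared(x, y):
    """R² of a one-variable least-squares regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # With one regressor, R² is the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical monthly observations for one pool in one year (not thesis data).
avg_fico = [712, 710, 709, 705, 703, 700]
foreclosure_pct = [0.40, 0.45, 0.43, 0.60, 0.58, 0.70]
print(f"R² = {r_squared(avg_fico, foreclosure_pct):.2%}")
```

Running the function once per independent variable and per year would produce a grid of R² values of the kind reported in Chapter 5.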
4.3 Database Description: Reliability and Validity
According to Morse and Davidshofer (2005), “Joppe (2000) defines reliability as: The
extent to which results are consistent over time and an accurate representation of the total
population under study is referred to as reliability and if the results of a study can be reproduced
under a similar methodology, then the research instrument is considered to be reliable.” Validity
is defined as the statistical measure that a researcher employs to show that the test, in this case the
regression analysis, is measuring what it intends to measure (Murphy & Davidshofer,
2005). There are numerous methods to measure validity. For this study, we adopted the concept
of construct validity, which is a measure of how well observed relationships between test
constructs match those predicted by some theory (Cronbach & Meehl, 1955). According to
Morse and Davidshofer (2005), “Kirk and Miller (1986) identify three types of reliability
referred to in quantitative research, which relate to: (1) the degree to which a measurement,
given repeatedly, remains the same (2) the stability of a measurement over time; and (3) the
similarity of measurements within a given time period. Charles (1995) adheres to the notions
that consistency with which questionnaire [test] items are answered or individual’s scores remain
relatively the same can be determined through the test-retest method at two different times. This
attribute of the instrument is actually referred to as stability. If we are dealing with a stable
measure, then the results should be similar. A high degree of stability indicates a high degree of
reliability, which means the results are repeatable” (Morse & Davidshofer, 2005).
To establish a relationship between the input factors and the foreclosure rate, it is important
to use correct data from a non-biased source. The study, therefore, used data available in the
public domain. By public domain, we mean data that is available academically, that is not
proprietary, and that has not been cleaned to remove any markers or identifiers. The rationale for
using such data is to keep the study transparent and to remove any data biases that may be
inherent in proprietary data. Such a data source allows for easy replicability of my conclusions,
adding to the robustness of the research method. If we were to use only proprietary
data, such as data from Goldman Sachs, there is a possibility that the data may be biased.
This is true for the data made available by Goldman Sachs, as it tends to be biased towards high
FICO scores, primarily because they refuse to deal in loans that have a high foreclosure rate.
Using proprietary data would also defeat the purpose of the study, which was to establish a
relationship between both high and low FICO scores and the foreclosure rate. The study therefore
incorporated pools that were diverse and structurally different from each other. The criteria for
selecting different sources were based on the independent variables identified for the
study, i.e., the FICO score, the location of the mortgages, loan balances, and payment history.
According to Morse et al. (2004), “Joppe (2000) detects a problem with the test-retest
method which can make the instrument, to a certain degree, unreliable. She explains that the
test-retest method may sensitize the respondent to the subject matter, and hence influence the
responses given. We cannot be sure that there was no change in extraneous influences such as an
attitude change that has occurred. This could lead to a difference in the responses provided.”
Similarly, Crocker and Algina (1986) noted that when a respondent answers a set of test items,
the score obtained represents only a limited sample of behavior. As a result, the scores may
change due to some characteristic of the respondent, which may lead to errors of measurement.
These kinds of errors will reduce the accuracy and consistency of the instrument and the test
scores. Hence, it is the researcher’s responsibility to assure high consistency and accuracy of the
tests and scores. Crocker and Algina (1986) suggested that “test developers have a responsibility
of demonstrating the reliability of scores from their tests” (Morse et al., 2004). Thus, after
evaluating several sources of data, the dataset from Bloomberg was identified as the most viable
source.
The decision to use the Bloomberg data source over other available sources was guided by
numerous factors. The first reason to use the Bloomberg data is its ease of availability. It is
available to NYU students and faculty via the Bloomberg terminal. Secondly, Bloomberg has data
on almost all mortgage-backed securities issued by almost all parties
dating back to the 1970s. This is important for the study because we conducted a longitudinal
study over a 10-year period and used pools issued by several banks. The rationale for evaluating
data spread over a seven-year period was to lower the possibility of a bad year or cyclical
macroeconomic events having a biasing effect on the data. For example, interest rates, employment
levels, etc. may have an impact on the pool in a particular year, but by using a 10-year
period, we minimized the effect of the business cycle on the data. We need, however, to bear in mind
that the last 10 years have not been representative of the business cycles since 1945.
The third reason for using Bloomberg data was that it is used by a majority of
finance firms and is considered the industry standard. Therefore, the data made available
through the Bloomberg terminal on mortgage-backed pools is current. Bloomberg has access to
almost all the major issuers of mortgage-backed securities. This is evidenced by the fact that my
initial search resulted in over 1,000 such pools with different FICO scores, loan balances, and
payment histories. The breadth of data available helps us address and test the input factor
that is hardest to decipher, i.e., geography. Usually a mortgage-backed issuance is heavily loaded
with mortgages from one region. Using different pools of mortgage-backed securities
enabled us to ensure that any relationship found between the geography of the loans in
the pool and the foreclosure rate is valid. For example, if we were to use a pool that has mortgages
originating mostly from Florida, then our regression analysis would show a spurious relationship
between geography and foreclosure rates. Therefore, it was necessary to have a dataset that is
geographically dispersed to ensure that the correlations identified are meaningful and valid.
According to Morse (2004), “the traditional criteria for validity find their roots in a
positivist tradition, and to an extent, positivism has been defined by a systematic theory of
validity. Within the positivist terminology, validity resided amongst, and was the result and
culmination of other empirical conceptions: universal laws, evidence, objectivity, truth, actuality,
deduction, reason, fact and mathematical data to name just a few.”
Based on the above developed arguments, the study adopted regression analysis
techniques, specifically the R² functionality in SPSS, to develop a reliable parameter and show
that the independent variables have an effect on the dependent variable, and that it is not a random
correlation or relation.
A common tendency for quantitative researchers is to focus on the tangible outcomes of
the research, a single figure or number that answers the research question, rather than
demonstrating what verification strategies were used in the research. According to Morse et al.
(2004), “While strategies of trustworthiness may be useful in attempting to evaluate rigor, they
do not in themselves ensure rigor. While standards are useful for evaluating relevance and utility,
they do not in themselves ensure that the research will be relevant and useful.” Therefore, it is
time to reconsider the importance of verification strategies used by the researcher in the process
of inquiry so that reliability and validity are actively attained, rather than proclaimed by external
reviewers on the completion of the project.
“These strategies to include rigor include investigator responsiveness, methodological
coherence, theoretical sampling and sampling adequacy, an active analytic stance, and saturation.
These strategies, when used appropriately, force the researcher to correct both the direction of
the analysis and the development of the study as necessary, thus ensuring reliability and validity
of the completed project” (Morse et al., 2004).
Based on the above guidelines, it can be argued that the research methodology of
adopting a longitudinal study over a seven-year period, with data available from a third party
(Bloomberg), is the correct manner in which the research question can be empirically tested and
verified. Therefore, the method of testing adopted in the thesis was a combination of regression
analysis and interpretation of result tables. This was accomplished using SPSS and Excel.
The research design enumerated here, and the research methodology
identified, helped make the study parsimonious, verifiable, and reliable, as they meet all the
criteria discussed by Morse as necessary in any research.
4.4 Conclusion
The main objective of this thesis was to identify the relationship between the four input
factors and the foreclosure rate, quantifying it through regression analysis. Additionally, the
thesis attempted to provide an economic/business analysis of the main inputs used in
various models (FICO score, geography, loan balances) and their relationship with foreclosure
rates. Such an analysis enabled us to understand whether the input variables of the models have
a high explanatory potential or not, and whether the models are using the correct factors or not.
The thesis therefore answered the question: what is the nature of the correlation between the factors
used by the most common risk models to predict foreclosure rates (e.g., FICO score, geography,
loan balance) and the foreclosure rate of the pool? The research design and
methodology enumerated earlier in the chapter played an important role in helping us identify the
road map to be followed in order to answer the research question.
Chapter 5 - Results and Analysis
5.1 Introduction
In the earlier sections, the research question was clearly identified, and it was
shown how the question fits within the overall literature on the topic of MBS.
Subsequently, the research methodology was identified, and the research design was enumerated.
Following the research methodology identified in Chapter 4, the data was collected, and a time
series database was developed. Both regression analysis and ANOVA tools were used to
deconstruct the data and develop a better understanding of how the data answers the research
question. This chapter deals with the analysis of the data, the subsequent interpretation of the
data, and how it helped answer the research question.
5.2 Data Analysis and Interpretation
The regression analysis presented in the tables below summarizes the findings. In the
tables, the first item to note is the pool number in the top left-hand
corner. This pool number identifies a specific mortgage pool and can
also be used as a reference on Bloomberg to locate the source data. The foreclosure percent,
the dependent variable, is shown for reference only. It gives the average foreclosure rate for
the pool for the year. For example, if there were 10,000 mortgages in the pool and 10 of them
went into foreclosure, then the foreclosure rate for the year would be 0.1%. The foreclosure rate
is shown because it acts as a reference, so that we can observe
the degree to which the rates have changed over the seven-year study period. Such an analysis
helps to ground the R² results and gives us a reference point. The literature below in this section
suggested that any value less than 25% for R² would not be significant, and that is the parameter
used in the entire body of research. That being said, Table 5-1 gives us a good reference with
respect to R² and its explanatory power. Table 5-1 shows what a given percentage of R²
represents in terms of explanatory power in predicting or explaining one standard deviation. Based on the
data in Table 5-1, in this thesis we benchmarked R² values above 25% as being statistically
significant.
Table 5-1. R² Explanatory Power
Based on the results of the regression analysis, presented in the discussion below, it was
observed that there is a relationship between the foreclosure rate, the dependent variable, and the four
independent variables. As R² is the main measure, it has been used to
determine the model fit in percentage terms, as well as whether the independent variables have
an influence on the dependent variable or not. The results showed that there was no constant
relationship between the dependent variable and the independent variables, but rather a dynamic
relationship between the foreclosure rate and the four independent variables. Additionally, the
input variables had significant explanatory power over the foreclosure rate.
If we evaluate the data in Table 5-2, we look at mortgage-backed security pool BCAP
2007-AA2 22A1. The results show that in 2008, credit score could
explain only 2.60% of the total variance in the dependent variable. However, credit
score accounted for 53.96% of the variance in 2012 and 59.70% in 2014. This is significant and
above our threshold.
Table 5-2. Statistical Analysis for pool BCAP 2007-AA2 22A1
Pool: BCAP 2007-AA2 22A1
                                                      2008    2009    2010    2011    2012    2013    2014
Dependent variable: Foreclosure %
Foreclosure % average for the year for reference     0.52%   4.29%   4.30%  13.36%   9.41%   6.29%   6.52%
R² values
Independent variable 1: Credit score                 2.60%   3.52%  20.73%  28.00%  53.96%   0.60%  59.70%
Independent variable 2: Perfect Payer %             32.10%   3.87%   6.90%  23.85%   9.20%   0.97%  59.09%
Independent variable 3: Balance < 417k               6.54%   3.58%   6.90%  23.85%   9.20%   0.97%  59.09%
Independent variable 4: Geo < 50% of the pool       30.94%   1.05%  30.37%  35.38%  54.42%  11.61%  50.99%
Regression analysis using 95% confidence interval
BCAP here is the code of the bank that produced the pool: BCAP is the code for
Barclays Capital, and BOAA is the code for Bank of America. The next row, foreclosure %,
tells us how many of the total mortgages in the pool are in foreclosure; this is not an R², but a
simple percentage. For example, if there are 10,000 loans and 1,000 are in foreclosure, then the
foreclosure % is 10%. This is included to give the reader a sense of how the pool is behaving:
a higher foreclosure rate means the pool is not performing and mortgages are failing. Since this is a
master's thesis and not a PhD study, we are limited in the scope and breadth of the research and
presentation we can perform and display. Thus displaying all the variables is not feasible or
prudent. The next rows give the R² by year between each of the four independent variables and the
dependent variable.
What a 53.96% R² implies is that credit score accounted for at least 29% of the standard
deviation, and hence credit score can account for 29% of the 9.41% foreclosure rate.
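The step from an R² to a share of the standard deviation can be made concrete with a short sketch. Table 5-1 is not reproduced in this text, so the conversion below is an assumption: it uses the common convention that the percentage of standard deviation explained is 1 - sqrt(1 - R²), under which an R² of 50% corresponds to about 29%. The function name is hypothetical.

```python
import math


def sd_explained(r2):
    """Fraction by which the residual standard deviation falls below the
    standard deviation of the dependent variable, given R² as a fraction.

    Assumed conversion (not confirmed by the thesis): 1 - sqrt(1 - R²).
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R² must lie between 0 and 1")
    return 1.0 - math.sqrt(1.0 - r2)


for r2 in (0.25, 0.50, 0.5396):
    print(f"R² = {r2:.2%} -> {sd_explained(r2):.1%} of the standard deviation")
```

Under this assumption the 25% threshold corresponds to roughly 13% of the standard deviation, and the 53.96% figure for credit score in 2012 to roughly 32%, which is consistent with the statement that it accounts for at least 29%.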
If we look at the average foreclosure rate, in 2008 it was 0.52% of the pool, and by 2014
it had increased to 6.52%. There is an argument that over time, the good mortgages
leave the pool and only bad mortgages are left in the pool, so that the foreclosure rate
automatically increases. We agree with the statement in theory, but this cannot explain the
sudden spikes in the foreclosure rate. An analysis of 2008 and 2009 shows that the
foreclosure rate jumped to 4.29% in 2009 from 0.52% in 2008. Later, in 2012, the
foreclosure rate dropped to 9.41% from 13.36% in 2011. As R² is a measure of the explained
variance, it can be argued that good borrowers exited the pool as they found better rates, or
moved and sold their houses, and hence the resulting changes in the foreclosure rate.
Pool BCAP 2007-AA2 22A1 gives us a good example of how there is a linear
relationship between an increase in foreclosure rates and credit scores. The pool highlights how,
as the foreclosure rate increases, the explanatory power of credit score also increases. In the year
2013, all four independent variables fail to have any explanatory power, which tells us that
some factor outside the four we tested had an impact on the foreclosure rate. The
foreclosure rate declined dramatically, from 9.41% to 6.29%, and all four independent variables
were below the 25% threshold. Based on the analysis of pool BCAP 2007-AA2 22A1, it can be
concluded that the relationship between the independent variables and the dependent variable is
dynamic, and that the input variables have explanatory power over the dependent variable
the majority of the time.
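The year-by-year reading of Table 5-2 against the 25% threshold can be reproduced mechanically. The sketch below copies the R² values from Table 5-2; the dictionary layout and the function name are hypothetical and are not part of the thesis's SPSS workflow.

```python
YEARS = [2008, 2009, 2010, 2011, 2012, 2013, 2014]

# R² values (in percent) for pool BCAP 2007-AA2 22A1, copied from Table 5-2.
R2_TABLE = {
    "Credit score": [2.60, 3.52, 20.73, 28.00, 53.96, 0.60, 59.70],
    "Perfect Payer %": [32.10, 3.87, 6.90, 23.85, 9.20, 0.97, 59.09],
    "Balance < 417k": [6.54, 3.58, 6.90, 23.85, 9.20, 0.97, 59.09],
    "Geo < 50% of the pool": [30.94, 1.05, 30.37, 35.38, 54.42, 11.61, 50.99],
}
THRESHOLD = 25.0  # the significance benchmark adopted in this chapter


def significant_by_year(r2_table, years, threshold):
    """Map each year to the variables whose R² exceeds the threshold."""
    return {
        year: [name for name, values in r2_table.items() if values[i] > threshold]
        for i, year in enumerate(years)
    }


flags = significant_by_year(R2_TABLE, YEARS, THRESHOLD)
for year in YEARS:
    print(year, ", ".join(flags[year]) or "none above threshold")
```

With the Table 5-2 values, perfect payer and geography are flagged in 2008, no variable is flagged in 2009 or 2013, and all four are flagged in 2014.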
If we were to interpret the results for their significance in a business decision environment,
based on the regression analysis, we can argue that in the case of pool BCAP 2007-AA2 22A1,
for the year 2008, perfect payer and geography of the loans were the main factors in the
foreclosure rate. During this time period the average foreclosure rate was 0.52% of the total pool.
The results for perfect payer and geography of the loans, taken together, suggest
that as borrowers’ current income situation in a particular region took a negative turn, there
was a negative impact on the cash flow of the borrowers, giving rise to increased foreclosure
rates. However, in 2013, geography was the most relevant but not the main factor, suggesting
that one particular state was having a macroeconomic issue that was causing the foreclosure rate
to spike. Alternatively, in 2014, all four factors were equally at play. The models
discussed above are used to come up with a probability of default and not for pinpointing the
exact macro environment behind the cause. However, it is important to see whether the input variables’
R² results are telling a story that has a logical backing and are not just mathematically related.
An evaluation of Table 5-3 (pool BOAA 2005-1 2A1) showed that perfect
payer accounted for 49.94% of the variance in the dependent variable in 2008, and 84.05% in 2010. This
influence on the variance in the dependent variable decreased to 30.24% in 2011 and stayed
low thereafter, at 7.77% in 2013 and 6.56% in 2014. In this pool,
perfect payer has significant explanatory power in every year from 2008 to 2011. However, from
2012 onward, even though the foreclosure rate was high, the perfect payer variable’s ability to
explain the variation diminished significantly. The perfect payer is a variable that measures the
borrower’s current cash flow situation. Thus, if the borrower has sufficient income or savings, he
or she can meet the
monthly mortgage payment. However, recent research has shown that paying mortgages first is
no longer a priority. “If we've learned one thing from the housing downturn, it's that making the
monthly mortgage payment is no longer a sacred concept in many American households. In
recent years, when facing financial pressure, homeowners have been more likely to let the
mortgage slide before they would fall behind on their credit card bills, researchers have found.
But it turns out that the mortgage is even less sacred than we thought: When times are tight,
consumers put paying for their cars first. Then the credit cards will be paid. The once-mighty
mortgage has slipped to No. 3” (Umberger, 2012).
Table 5-3. Statistical Analysis for pool BOAA 2005-1 2A1
Pool: BOAA 2005-1 2A1
                                                      2008    2009    2010    2011    2012    2013    2014
Dependent variable: Foreclosure %
Foreclosure % average for the year for reference     0.07%   0.69%   3.87%   4.14%   4.95%   2.54%   5.49%
R² values
Independent variable 1: Credit score                22.96%  63.12%   0.03%   1.13%   1.77%   0.95%  10.38%
Independent variable 2: Perfect Payer %             49.94%  39.88%  84.05%  30.24%   1.70%   7.77%   6.56%
Independent variable 3: Balance < 417k              16.71%  16.05%   7.31%   5.54%   5.71%  13.12%   9.90%
Independent variable 4: Geo < 50% of the pool        0.53%  61.01%  17.86%  17.37%  10.45%   0.11%  10.05%
Regression analysis using 95% confidence interval
Looking at the pool's data on a monthly basis, we observed that in 2008 the perfect payer
percentage was 98.65%. This means that 98.65% of the loans in the pool were being paid on
time and their borrowers had not missed a single payment. The pool held 29,654 loans in
February 2008. By February 2012, the year in which perfect payer began to lose its
significance in explaining the foreclosure rate, the perfect payer percentage had dropped to
69.46% and the number of loans in the pool had fallen to 15,940. By February 2014, the
perfect payer percentage had declined further to 59.34% and the number of loans to 7,834.
Given that the pool shrank from 29,654 to 7,834 loans between 2008 and 2014 while the
average annual foreclosure rate stayed below about 5.5%, it can be argued that foreclosure
was not the main reason for the decrease in the pool. The large runoff in the number of
loans could instead be due to one of the other factors, with loans leaving the pool by a
route other than foreclosure.
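The runoff argument above rests on simple arithmetic, which can be sketched as follows. This is an illustrative calculation only: the loan counts are the February figures quoted in the text, while the function and variable names are hypothetical and not part of the thesis.

```python
# Illustrative sketch: share of loans that left the pool between two dates.
# Loan counts are the February figures quoted in the text for pool
# BOAA 2005-1 2A1; the names below are hypothetical.

def runoff_share(loans_start, loans_end):
    """Fraction of the starting loans no longer in the pool at the end date."""
    return (loans_start - loans_end) / loans_start

loans_feb_2008 = 29_654
loans_feb_2012 = 15_940
loans_feb_2014 = 7_834

print(f"Left the pool by Feb 2012: {runoff_share(loans_feb_2008, loans_feb_2012):.2%}")
print(f"Left the pool by Feb 2014: {runoff_share(loans_feb_2008, loans_feb_2014):.2%}")
```

Under these figures, roughly three quarters of the original loans had left the pool by February 2014. That is far more than yearly foreclosure averages in the low single digits would suggest, which is the basis for attributing most of the shrinkage to another exit route.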
A mortgage-backed security is a closed-ended security, which means that once a pool is
formed, mortgages that retire from the pool are not replaced with new loans. There are only
two ways for a loan to exit the pool. The first is voluntary runoff, or full payment by the
borrower, either with cash from savings or other sources or by selling the house. The second
is foreclosure, where the borrower defaults on the loan by failing to make the monthly
payments.
An analysis of the pool BOAA 2005-1 2A1 showed that the relationship between the
independent variables and the dependent variable is dynamic, changing constantly with the
macroeconomic picture and with the unpredictable behavior of consumers, which depends on
their income and the overall economy. Each of the four independent variables captured these
variances in its own way. In the case of pool BOAA 2005-1 2A1, we observed that in 2008 the
foreclosure rate was only 0.07% of the total pool, but by 2012 it had reached 4.95%. In 2012,
geography had the greatest explanatory power at 10.45%, whereas perfect payer had fallen from
49.94% in 2008 to 1.70%. The data in Table 5-3 show that from 2008 to 2011 the four factors
had, to some degree, a significant relationship with the foreclosure rate, and that after
2011 other factors came into play that affected the foreclosure rate.
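The per-year, per-factor R² values discussed above can be illustrated with a minimal sketch. It assumes each table entry comes from a one-variable least-squares regression of the monthly foreclosure percentage on a single factor, which the tables suggest but the text does not spell out. The monthly observations below are invented for illustration and are not taken from the thesis data, and the function name is hypothetical.

```python
# Minimal sketch, assuming a one-variable least-squares regression per factor
# and per year. The monthly data are hypothetical, NOT thesis data.

def r_squared(x, y):
    """R² of the simple linear regression of y on x.

    For a one-variable fit, R² equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant series has no variance to explain.
        raise ValueError("R² is undefined when either series is constant")
    return (sxy ** 2) / (sxx * syy)

# Hypothetical monthly observations for one pool in one year.
perfect_payer_pct = [98.6, 98.1, 97.9, 97.2, 96.8, 96.5,
                     95.9, 95.1, 94.8, 94.0, 93.5, 93.1]
foreclosure_pct = [0.02, 0.03, 0.03, 0.05, 0.06, 0.06,
                   0.08, 0.09, 0.09, 0.11, 0.12, 0.12]

print(f"R² = {r_squared(perfect_payer_pct, foreclosure_pct):.2%}")
```

Repeating such a fit for each factor and each year would give a grid of R² values shaped like Table 5-3. The 95% confidence level mentioned under the tables relates to significance testing, which this sketch does not attempt.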
The data in Table 5-4 (pool BOAA 2005 10 5A1) showed that the average foreclosure rate in
the pool between 2008 and 2013 was less than 1%, and only 3.96% in 2014. With such a low
foreclosure rate, all four independent variables had significant explanatory power at
different times. For example, credit score had a significant R² of 47.62% in 2008, 46.43% in
2011 and 65.26% in 2012. Perfect payer had a significant R² of 40.07% in 2008, 46.09% in
2010, 52.35% in 2011 and 57.41% in 2014. Balance had a significant R² of 67.94% in 2009 and
44.26% in 2011. Geography had an R² of 65.68% in 2010 and 66.91% in 2012. The pool had a zero
foreclosure rate in 2013. This could be because some pools were subject to robo-signing
scrutiny, in which it was alleged that Bank of America did not follow proper foreclosure
procedure; Bank of America therefore stopped all foreclosure activity, which led to a
different result.
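One way to read the 2013 column of Table 5-4 is through the usual definition of R². This is an interpretive note rather than something the thesis states. The symbols are introduced here for illustration: y_i is the observed monthly foreclosure rate, ŷ_i the fitted value, and ȳ the yearly mean.

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

If every monthly foreclosure rate in a year is exactly zero, the denominator is zero and R² is undefined. The 100.00% entries reported for 2013 may therefore reflect how the regression tool handled a dependent variable with no variation, rather than genuine explanatory power. The same caution may apply to any year whose reference foreclosure average is 0.00%.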
Table 5-4. Statistical Analysis for pool BOAA 2005 10 5A1
Pool: BOAA 2005 10 5A1
Dependent variable: Foreclosure %

                                                     2008     2009     2010     2011     2012     2013     2014
Foreclosure % (average for the year, reference)     0.43%    0.32%    0.50%    0.99%    0.89%    0.00%    3.96%

R² values
Independent variable 1: Credit score               47.62%    3.69%   21.28%   46.43%   65.26%    9.33%    3.94%
Independent variable 2: Perfect payer %            40.07%   15.82%   46.09%   52.35%    0.08%  100.00%   57.41%
Independent variable 3: Balance < 417k              0.03%   67.94%   23.96%   44.26%    0.11%  100.00%    3.94%
Independent variable 4: Geo < 50% of the pool       1.02%   15.74%   65.68%   13.11%   66.91%  100.00%   22.39%

Regression analysis using 95% confidence interval
Table 5-5. Statistical Analysis for pool BCAP 2007-AA2 33A1
Pool: BCAP 2007-AA2 33A1
Dependent variable: Foreclosure %

                                                     2008     2009     2010     2011     2012     2013     2014
Foreclosure % (average for the year, reference)     0.00%    2.14%    3.03%    5.58%    7.55%   11.76%   12.25%

R² values
Independent variable 1: Credit score               100.0%    64.3%     0.2%     0.0%     9.5%    77.1%     1.0%
Independent variable 2: Perfect payer %            100.0%    78.8%    24.1%    14.1%     6.8%    62.8%     6.6%
Independent variable 3: Balance < 417k             100.0%    66.7%     0.6%    14.5%    57.7%    99.0%    19.7%
Independent variable 4: Geo < 50% of the pool     100.00%   70.19%   30.78%   25.28%    0.82%   93.26%   10.52%

Regression analysis using 95% confidence interval
  • 1. ANALYSIS OF THE FACTORS USED BY VALUATION MODELS FOR MORTGAGE-BACKED SECURITIES By Bhawani Singh A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Management and Systems Stern School of Business New York University 2015
  • 2. Table of Contents Table of Tables.................................................................................................................................v Table of Figures...............................................................................................................................vi Acknowledgements ........................................................................................................................vii A Declaration................................................................................................................................ viii Abstract............................................................................................................................................ix Chapter 1 - Introduction .................................................................................................................1 1.0 Introduction..............................................................................................................................1 1.1 Purpose of Research.................................................................................................................3 1.2 Problem Definition ..................................................................................................................6 1.3 Research Question ...................................................................................................................7 1.4 Conclusion ...............................................................................................................................7 Chapter 2 - Literature Review........................................................................................................9 2.1. 
Introduction.............................................................................................................................9 2.2 Types Of Risks That Effect the Valuation Of a Mortgage Backed Security...........................9 2.3 Prepayment and Default Risk Models ...................................................................................10 2.3.1 Perfect Payer (Refinance Activity). ............................................................................. 13 2.3.2 Perfect Payer (Age of the Mortgage Assets)................................................................ 13 2.3.3 Loan Balance. .............................................................................................................. 14 2.3.4 FICO score................................................................................................................... 14 2.3.5 Geographic................................................................................................................... 15 2.4 Interest Rate Risk Models......................................................................................................16 2.4.1 Vasicek model.............................................................................................................. 16 2.4.2 Cox, Ingersoll and Ross (CIR) Model. ........................................................................ 17 2.4.3 Black–Derman–Toy Model.......................................................................................... 17 2.4.4 Ho–Lee Model. ............................................................................................................ 18 2.4.5 Hull–White Model. ...................................................................................................... 19 2.4.6 The Black–Karasinski Model. ..................................................................................... 
19 2.5 Other Models .........................................................................................................................20 2.5.1 Heath–Jarrow–Morton (HJM) Model.......................................................................... 20 2.5.2 LIBOR Market Model.................................................................................................. 20 2.6 Subprime and Prime Mortgages ............................................................................................21
  • 3. 2.7 Foreclosure Process ...............................................................................................................22 2.7.1 Judicial Foreclosure. .................................................................................................... 23 2.7.2 Power of Sale. .............................................................................................................. 23 2.7.3 Strict Foreclosure......................................................................................................... 23 2.8 Conclusion .............................................................................................................................23 Chapter 3 - Research Methodology And Design.........................................................................25 3.1 Introduction............................................................................................................................25 3.2 Research Question and Hypothesis........................................................................................25 3.3 Relevance of Topic ................................................................................................................26 3.4 Research Methodology ..........................................................................................................27 3.5 Research Design: Variables Identified ..................................................................................28 3.5.1 FICO Score. ................................................................................................................. 29 3.5.2 Geography of Loan. ..................................................................................................... 29 3.5.3 Loan Balances.............................................................................................................. 
30 3.5.4 Perfect payers............................................................................................................... 30 3.6 Conclusion .............................................................................................................................30 Chapter 4: Data Collection ...........................................................................................................31 4.1 Introduction............................................................................................................................31 4.2 Database Description: Population and Sample......................................................................31 4.3 Database Description: Reliability and Validity .....................................................................34 4.4 Conclusion .............................................................................................................................38 Chapter 5 - Results and Analysis..................................................................................................40 5.1 Introduction............................................................................................................................40 5.2 Data Analysis and Interpretation ...........................................................................................40 5.3 Conclusion .............................................................................................................................67 Chapter 6 – Conclusions and Recommendations........................................................................68 6.1 Introduction............................................................................................................................68 6.2 Conclusion: Hypothesis Holds True......................................................................................68 6.3 
Recommendations..................................................................................................................68 6.5 Contribution of This Study ....................................................................................................71 6.6 Limitations of the Study ........................................................................................................72 6.8 Conclusion .............................................................................................................................72 iii
  • 4. References.......................................................................................................................................74 Appendix A – FEFUO Letter........................................................................................................82 Appendix B – Glossary..................................................................................................................83 iv
  • 5. Table of Tables Table 3-1. Models Used in the Industry and Their Component Variables ................................... 23 Table 5-1. R2 Explanatory Power ................................................................................................. 37 Table 5-2. Statistical Analysis for pool BCAP 2007-AA2 22A1................................................. 38 Table 5-3. Statistical Analysis for pool BOAA 2005-1 2A1........................................................ 41 Table 5-4. Statistical Analysis for Pool B0AA 2005 10 5A1....................................................... 43 Table 5-5. Statistical Analysis for pool BCAP 2007-AA2 33A1………………......................... 43 Table 5-6. Statistical Analysis for pool BOAA 2005-6 7A1........................................................ 46 Table 5-7. Statistical Analysis for pool BOAA 2005-1 2A1........................................................ 47 Table 5-8. Statistical Analysis for pool AMAC 2003-12 2A ....................................................... 49 Table 5-9. Statistical Analysis for pool AHM 2005-2 3A............................................................ 50 Table 5-10. Statistical Analysis for pool AMAC 2003-12 2A ..................................................... 60 Table 5-11. Statistical Analysis for pool AHM 2005-1 8A1........................................................ 60 Table 5-12. Statistical Analysis for pool AHM 2005-1 6A.......................................................... 61 Table 5-13. Statistical Analysis for pool AHM 2004-1 1A.......................................................... 61 v
  • 6. Table of Figures Figure 5-1. Change in Median and Mean Incomes 2001-2010 ................................................... 52 Figure 5-2. Change in Median and Mean Net worth 2001-2010................................................. 52 Figure 5-3. Change in real GDP .................................................................................................. 53 Figure 5-4. Monthly Change in Nonfarm Employment............................................................... 53 Figure 5-5. Unemployment Rate ………………......................................................................... 54 Figure 5-6. Long Term Unemployment....................................................................................... 54 Figure 5-7. Before Tax Family Income 2001-2004...................................................................... 55 Figure 5-8. Before Tax Family Income 2007 - 2010.................Error! Bookmark not defined.56 Figure 5-9. Before Tax Family Income 2007 – 2010 Continued.................................................. 57 Figure 5-10. Amount Before Tax Family Income ........................................................................ 58 Figure 5-11. U.S. Wages............................................................................................................... 58 vi
  • 7. Acknowledgements I sincerely thank Nouriel Roubini for his service as my Thesis Supervisor. I also thank Dr. Sandra Marshall and Dr. Jeffery Keefer for the Research Project and Research Process and Methodology (RPM), which prepared me for my thesis research. Additionally, I would like to thank Dr. Nitya Singh for the guidance to pursue my studies at NYU. My thanks also go to all the instructors at Stern , from whom I learned a great deal. vii
  • 8. A Declaration I grant powers of discretion to the Department, SPS, and NYU to allow this thesis to be copied in part or in whole without further reference to me. This permission covers only copies made for study purposes or for inclusion in Department, SPS, and NYU research publications, subject to normal conditions of acknowledgement. viii
  • 9. Abstract Since 2007, a greater emphasis has been placed on the valuation of mortgage-backed securities (MBS), especially because of the systematic risk that they pose to financial institutions in particular and the whole economy in general. The thesis, therefore, evaluated the various methodologies presently used by rating agencies such as S&P and banks such as JP Morgan for inhouse valuation to calculate the value of a mortgage-backed portfolio. Given that there are multiple models to value prepayment, default, and interest rate risk for mortgage- backed securities, the thesis examined the level of correlation between the main input factors used in the various models and the foreclosure rates. Using hypothesis testing the result findings go on to suggest that there is a relationship between the dependent variable (foreclosure rates) and independent variables (credit score, perfect payer, balance and geography), the relationship is not a static one, but a dynamic once. This relationship suggests that the correlation between the dependent variable and independent variable changes over time. Another major finding of this thesis was to suggest that the input variables which compose a Mortgage Backed Security (MBS) have an explanatory power over the foreclosure rates. The period of study in this thesis was a seven year longitudinal study between the years 2008 and 2014. Keywords: Mortgage-backed securities, valuation, prepayment model, credit risk model, interest risk model. ix
  • 10. Chapter 1 - Introduction 1.0 Introduction Securitization in the United States began during the 1970s with US government-sponsored National Mortgage Association funding programs for residential mortgages, followed by private financings. Gaining popularity from the beginning of the 1980s, securitization has become a common financing tool at the global level (Bakri, 2014). “Securitization of Mortgage-backed securities (MBS) are debt obligations that represent claims to the cash flows from pools of mortgage loans, most commonly on residential property. Mortgage loans are purchased from banks, mortgage companies, and other originators and then assembled into pools by a governmental, quasi-governmental, or private entity. The entity then issues securities that represent claims on the principal and interest payments made by borrowers on the loans in the pool, a process known as securitization.” (Fast Answers, 2015). By 2006, the securitization market had grown to $1.480 trillion of issuance (Ashcraft & Schuermann, 2014). Since the Great Recession began in 2007, one of the main culprits widely blamed for the downturn has been the inaccurate valuation of distressed mortgage-backed securities (MBS) held by major banks and financial institutions. These instruments were also widely held outside of the United States, which lent even greater impetus to the contagion process. This was reflected in the extensive write-offs by financial institutions, both at their own capital level and within the investment funds which they managed. These events have been a key contributing factor in calls for greater supervision of financial institutions and the setting of risk-appropriate capital requirements (Reilly, 2009). The pricing of such complex structures, structured in various tranches (a tranche is a piece, portion or slice of a deal or structured financing; it is one of several related securities that are
  • 11. offered at the same time but have different risks, rewards and/or maturities; "tranche" is the French word for "slice"), and requiring complex calculations to estimate cash flows under various assumed default rates, made it difficult to value them accurately on their own merits after the collapse of Lehman Brothers. With no active secondary market in which to trade, the banks were unable to value the MBS accurately, leading to issues with valuation, balance sheet liabilities, and ultimately the dreaded margin call from counterparties. The losses booked by the banks forced them to write down capital, while margin calls drained liquidity from the financial markets. These losses reduced the capacity of the banks to act as purveyors of credit in an economy already shaken by the collapse of a large swathe of the housing market. The resulting contraction in credit ultimately caused the US economy to implode in 2007. DiMartino and Duca (2007) suggest that, “in the early and mid-2000s, high-risk mortgages became available from lenders who funded mortgages by repackaging them into pools that were sold to investors. New financial products were used to apportion these risks, with private-label mortgage-backed securities (PMBS) providing most of the funding of subprime mortgages. The less vulnerable of these securities were viewed as having low risk either because they were insured with new financial instruments or because other securities would first absorb any losses on the underlying mortgages” (DiMartino & Duca, 2007, p. 47). This enabled more first-time homebuyers to obtain mortgages, and homeownership rose (Duca, Muellbauer, & Murphy, 2011). The resulting demand bid up house prices, more so in areas where housing was in tight supply. This induced expectations of still more house price gains, further increasing housing demand and prices (Case, Shiller, & Thompson, 2012).
Investors purchasing PMBS profited at first because rising house prices protected them from losses. When high-risk mortgage
  • 12. borrowers could not make loan payments, they either sold their homes at a gain and paid off their mortgages, or borrowed more against higher market prices. Because such periods of rising home prices and expanded mortgage availability were largely unprecedented, and the longer-run sustainability of the new mortgage products was untested, the riskiness of PMBS was not well understood. On a practical level, risk was “off the radar screen” because many gauges of mortgage loan quality available at the time were based on prime, rather than new, mortgage products. While subprime lending was not new, two key factors helped precipitate the disaster: the lack of sufficient estimates on which to base default probabilities, and the assumption that there could not be a nationwide housing collapse. When house prices peaked, refinancing mortgages and selling homes became less viable means of settling mortgage debt, and mortgage loss rates began to rise for lenders and investors. In April 2007, New Century Financial Corp., a leading subprime mortgage lender, filed for bankruptcy. Shortly thereafter, large numbers of PMBS and PMBS-backed securities were downgraded to high risk, and several subprime lenders closed. As the bond funding of subprime mortgages collapsed, lenders stopped making subprime and other risky nonprime mortgages. This lowered the demand for housing, leading to sliding house prices that fueled expectations of still more declines, further reducing the demand for homes. Prices fell to such low levels that it became difficult for troubled borrowers to sell their homes and fully pay off their mortgages, even if they had provided a sizable down payment (Duca, 2010). 1.1 Purpose of Research Much research has focused on the valuation of MBS, and these valuation methods have been reinvented and developed since 2007. The main impetus for this change came from the Federal Reserve, which pushed the banks and major financial institutions to accurately value their
  • 13. exposure to the MBS portfolio they are holding and meet the capital requirements. To accurately measure foreclosure risk, prepayment risk, and value at risk (VaR), various models have been developed that value performing and non-performing MBS pools. A performing pool is one in which the borrowers are paying on time; a non-performing pool is one in which borrowers have missed several payments. Foreclosure risk is the risk in an MBS pool that a borrower will stop paying and default on the loan, leading to a bank foreclosure. Prepayment risk is the risk in an MBS pool that a borrower will pay ahead of time, causing the pool to receive lower interest payments and thus a lower return. VaR is the total value of an MBS portfolio that may be at risk given a macroeconomic event, such as a rise in interest rates. Some institutions have taken a lead on this and are the industry leaders. One such market leader in this segment is BlackRock. The company was invited by the Federal Reserve to independently value the banks’ MBS holdings, and also by the Greek government to advise it on its exposure. Its proprietary valuation system is called “Aladdin” and is considered the industry standard (Goliath, 2011). Presently the MBS market is $8.7 trillion, while the total outstanding public and private bond market in the USA, which includes treasuries, MBS, auto loans, credit cards, etc., is $39.9 trillion (Campbell, 2014). The mortgage-backed security market is crucial to the economy not only because large sums of money are involved, but also because it is a direct link to the consumer. The consumer accounts for two-thirds of the spending in the United States economy, and taking out a mortgage is the single biggest investment an average person makes in a lifetime. The market is also important because many homeowners borrow against the unrealized capital gain in their home to finance consumption.
The collapse in home prices therefore had a negative “multiplier” effect on consumption. 4
  • 14. This trend drives not only the mortgage industry but also various other industries that are dependent on the housing market, such as construction, heating, appliances, lumber, etc. The housing market not only creates jobs and consumes resources during construction, but also plays an important role in stimulating the economy through numerous associated activities such as ongoing home maintenance, gardening, repairs and home improvements. Businesses such as Home Depot depend on these types of activities to survive. Thus, given the far-reaching effect of the housing sector on the economy, the mortgage-backed security industry is critical to the economy. However, since the bubble burst in 2008, housing is now seen as playing a lesser role in economic growth. There has been a relatively moderate recovery which, despite record low interest rates, continues to be hampered by difficult access to credit for non-prime borrowers. The core logic of the models used for MBS is based on the cash flow of the mortgages. The system works as follows: once a mortgage is issued by a bank, it is collected by the bank and combined with other mortgages in a pool. The pool may range from as few as 25 loans to a few thousand individual loans. In previous asset-backed securitizations there was an assumption of “safety in numbers”, with over-collateralization seen as a means of ensuring sufficient cash flows for debt repayment. This may have been true of MBS backed by strong credits. However, as investors discovered, over-collateralization cannot compensate for risks that were not viable at inception. Overcollateralization is defined as “The process of posting more collateral than is needed to obtain or secure financing.
Overcollateralization is often used as a method of credit enhancement by lowering the creditor's exposure to default risk.” (Investopedia, 2003) The criteria for forming the pool are based on various circumstances or investor needs. One criterion is the maturity of the pool, which might be a 15-year period or a 30-year period. Another criterion is the required return: a pool with a 15% required return will contain riskier mortgages, whereas a pool with a 6% required return will contain less risky
  • 15. loans. Once a pool is created, it is given to a rating firm such as Moody’s or S&P. The rating firm does its due diligence and assigns an investment grade to the pool based on the risk metrics it has identified. Some risk metrics used by rating companies are loan to value (LTV), i.e. how much is being borrowed versus the value of the property, credit scores, etc. The bank either holds the pool on its own balance sheet or sells it to investors (Tatom, 2009). 1.2 Problem Definition The mechanics of mortgage-backed securities are based on payments by the individual borrowers. The borrower makes monthly payments, and the servicing firm, i.e. the bank, collects the payments and amortizes the loan, applying part of each payment to interest and the remainder to principal until the balance becomes zero. The most common types of mortgages are 30-year fixed-rate and 15-year fixed-rate mortgages. The problem with such complex instruments is that they are made of several moving pieces, such as tranches, which react differently to macroeconomic events. They are also very susceptible to changes in interest rates, which may cause an extension of the original maturity if rates rise, or prepayments if rates fall. In this last case, the best borrowers, who can access refinancing at lower rates, will prepay. This is tantamount to “adverse selection”: the pool tends to be composed increasingly of lower-rated credits. In the event of a recession and a fall in borrower income, this may lead to an increased foreclosure rate. Thus the way to predict the effect is to use proxies. In the case of mortgage-backed securities, the proxies are FICO score, geography of the loan and payment history. The decision to use a proxy rests with each individual company’s fund manager and depends on the model they plan to use, as well as the variables that they see fit to include in or drop from the model. There is no industry standard established for this.
It is essential therefore, that such proxies are relevant measures. However, there is a paucity of research which aims to establish whether such proxies are relevant indicators or not (Fabozzi, 1998). Since there is no industry or 6
  • 16. government standard each firm is left to design a model that it likes based on the proxies the research team decides to use. 1.3 Research Question This thesis, therefore, investigated the various methodologies and models that are used to calculate the value or price of a distressed MBS. A distressed MBS is a pool that has a high rate of foreclosures. A correlation analysis was conducted between the main inputs into the models (credit score, geography, loan balances and perfect payer percentage) and the foreclosure rates. Such an analysis enabled us to develop a comprehensive understanding of whether the input variables into the models had high explanatory potential, and whether the models were using the correct factors. The thesis therefore answered the research question: What is the nature of the correlation between the factors used by the most common risk models (e.g. FICO score, geography, loan balance and perfect payer) and the foreclosure rate of the pool? 1.4 Conclusion The primary goal of this thesis was to develop an understanding of the various models that are used to analyze mortgage-backed securities, and to identify which model is the optimum model for financial professionals to use. In order to answer the research question, I conducted an extensive literature review in Chapter 2. This set the stage by developing an understanding of the existing work in the field, and enabled me to identify the numerous research gaps. In Chapter 3, I identified the research methodology to be followed and developed the research design that I used to answer the research question. With the research methodology established, in Chapter 4 I enumerated how the data would be collected and processed. The data was tested in Chapter 5, and as identified in the research
  • 17. methodology, I used numerous techniques to answer the research question. Finally, in Chapter 6, I summarized the research and put forward my recommendations and conclusion. 8
  • 18. Chapter 2 - Literature Review 2.1. Introduction In order to evaluate the relevance of this research topic, I conducted a comprehensive literature review and approached the ideas from different viewpoints. The objective of this review was to establish the theoretical background on which I based my thesis. I used existing scholarly works first to identify the various types of risks that affect the valuation of a mortgage-backed security. Once the various types of risks were identified, I reviewed the various prepayment and default risk models. I also evaluated the literature on various interest rate models and risk models. An analysis of the existing work was necessary to establish the argument that the variables I identified are the primary basis of calculation in all of the models. 2.2 Types Of Risks That Affect the Valuation Of a Mortgage Backed Security A study done by Dunn and McConnell (1981) on the various methods used by banks to value the portfolio of mortgage-backed securities on their balance sheet came to the conclusion that no model can be considered superior to the others, or can be considered the benchmark model that the rest of the industry should follow. However, it is significant to note that in spite of the usage of more than six different types of models in the industry, none of the models was able to predict the problem in the industry that triggered the financial debacle. This fact highlights the limits of the credit enhancement capacity of structuring when dealing with situations where, at inception, debt repayments were contingent on asset sales by the borrowers (Hung & Lin, 2007). During the 1980s, MBS were simple in their structure, unlike today, where computational power and the complexity of the MBS structure have greatly increased. Currently
  • 19. with many counterparties involved, tracing the loan’s deal and exposures to off-balance-sheet entities is almost impossible (Dunn & McConnell, 1981). Mortgage-backed securities are financial instruments that are backed by the house as collateral; the premise is that the holder of the note is assured that, on default, the payment of the remaining balance will definitely come through. In reality, however, the house value turns out in many cases to be lower than the residual debt, leading to losses both for the lenders and for the bond holders of the fixed income instruments. The key support is that the cash flows from the borrowers are secured for payment of capital and interest; this is of little avail when the foreclosure rate explodes. A further risk is that the borrower will pay faster than scheduled. In that case, the investor has to find another security in which to invest the money to meet his long-term investment target, and faces the risk of not finding similar investment opportunities or of having to take additional risks. The second risk that the investor faces is that the interest rate will change, causing reinvestment risk. Based on this, it can be argued that the main risks that need to be modeled are the prepayment risk, the interest rate risk, and the default risk (Becketti, 1989). 2.3 Prepayment and Default Risk Models There are four major factors influencing the prepayment models (Stanton, 1995). The first is the refinancing incentive, which is the incentive a borrower has when market rates fall below the current mortgage rate. The second factor is the age of the mortgage, technically called seasoning. The term seasoning refers to a phenomenon in which a new pool of mortgages pays down its balance faster, through both partial prepayments and full payoffs. This can be attributed to the fact that, in a new pool, the borrowers find better rates and refinance, or move and sell the property.
This can also lead to investors needing to contend simultaneously with lower than expected cash flows and ‘adverse selection’, as the remaining cash flows are contingent on payments by the higher
  • 20. credit risks. These are the borrowers who were not deemed sufficiently creditworthy to access refinancing and remain in the pool by default (Stanton, 1995). However, as time goes by this activity declines and settles at a low, fairly stable percentage of the total mortgages. The third factor is the month of the year, also known as seasonality. On average, mortgages are paid off more often in the summer months than in the winter months. The fourth factor is known as premium burnout. As different households have different cost bases for the mortgages they have taken out, interest rates may need to fall further for some households than for others before it becomes financially worthwhile for them to refinance or prepay (Stanton, 1995). An analysis of the literature suggested that there are three main categories of prepayment models presently used in the industry: 1. Econometric approach: This model projects cash flows based on prepayment models that are fine-tuned to historical data (Schwartz & Torous, 1989). 2. Option-based approaches: These models project cash flows based on option theory and the value of the underlying call options of the MBS (Stanton, 1995). 3. Reduced-form approaches: These models focus on intensity models as used in credit risk modelling (Kau, Keenan & Smurov, 2004). Prepayment risk is the key to determining the MBS value. This was the traditional assumption, posited on an acceptable credit standing of the underlying collateral. With the market focused on the stronger credits, credit issues were deemed manageable via over-collateralization. Prepayment risk models can be broken down into two approaches. The first approach is the
  • 21. statistical approach. In this approach statistical tools are used to predict the probability of prepayment in terms of the Conditional Prepayment Rate (CPR). The reduced-form models are widely used, as they are highly customizable and depend on the parameters defined by the user. This flexibility makes reduced-form models easy for regular users to develop and use, since they do not require programming or mathematical skills to model the historical data accurately. However, given recent experience, historical data are not always a good predictor of the future. The model works well when dealing with historical data, but the forecasting validity of such models is suspect, as was evident during the 2007 crisis (Dowing, Stanton, & Wallace, 2003). In the case of mortgage cash flows, the unscheduled cash flows result predominantly from prepayments, not from scheduled amortization. Therefore, the choice of an accurate prepayment factor is the main driver of the liquidity and valuation metrics. There are numerous sources of commercial or third-party prepayment models. One of the most popular is the Bloomberg median estimate, an average of prepayment projections obtained via a survey of the research departments of several Wall Street broker/dealers. BondEdge, a tool also available on Bloomberg, is widely used as a fixed income portfolio analytics system by many banks and financial institutions (Bloomberg, 2009). Another popular model is the Andrew Davidson Co. (ADCO) model (Bloomberg, 2009). This proprietary model differs from the above two in that it provides loan-level detail; it is also available via Bloomberg. The third model is the Applied Financial Technologies (AFT) model. This proprietary model is also available via Bloomberg and can be used at the loan or MBS level inside several advanced Asset Liability Management (ALM) models (Fan, Sing, & Ong, 2012).
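To illustrate how a prepayment assumption expressed as a CPR feeds into a cash-flow projection, the following minimal Python sketch converts an annual CPR into the equivalent monthly rate (commonly called the single monthly mortality, SMM) and applies it to a level-payment fixed-rate pool. The function names, the 6% CPR and the pool terms are hypothetical values chosen only for illustration; the sketch is not taken from, and does not reproduce, any of the commercial models named above.

```python
def cpr_to_smm(cpr):
    """Convert an annual conditional prepayment rate (CPR) to a monthly rate (SMM)."""
    return 1.0 - (1.0 - cpr) ** (1.0 / 12.0)


def project_pool(balance, annual_rate, term_months, cpr, months):
    """Project monthly cash flows of a fixed-rate pool under a constant CPR.

    Assumes a non-zero coupon. Returns a list of tuples:
    (interest, scheduled_principal, prepayment, end_balance).
    """
    r = annual_rate / 12.0
    smm = cpr_to_smm(cpr)
    flows = []
    for m in range(months):
        remaining = term_months - m
        # Level payment that amortizes the current balance over the remaining term
        payment = balance * r / (1.0 - (1.0 + r) ** -remaining)
        interest = balance * r
        scheduled = payment - interest
        # Prepayment is applied to the balance left after scheduled principal
        prepaid = (balance - scheduled) * smm
        balance = balance - scheduled - prepaid
        flows.append((interest, scheduled, prepaid, balance))
    return flows


# Hypothetical pool: $1,000,000 balance, 6% coupon, 30-year term, 6% CPR
flows = project_pool(1_000_000.0, 0.06, 360, 0.06, 12)
for month, (i, s, p, b) in enumerate(flows, start=1):
    print(f"{month:2d}  interest={i:9.2f}  sched={s:8.2f}  prepaid={p:8.2f}  balance={b:12.2f}")
```

Because prepayments shrink the balance on which future interest is earned, raising the assumed CPR in this sketch shortens the pool's life and lowers total interest received, which is the effect the prepayment models above are trying to forecast.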
Aside from the above mentioned statistical approaches, another approach used in the industry is the mathematical approach. In this process the model is based on mathematical 12
  • 22. finance and is subdivided into the option-based approach and the structural approach, which predicts prepayment via credit risk modeling. The majority of the prepayment models are based on multi-factor regression and/or optimization models using the factors below (Nakamura, 2011). 2.3.1 Perfect Payer (Refinance Activity). Refinance activity has a direct effect on the perfect payer factor: if a loan is refinanced, the old loan is paid in full and thus the perfect payer percentage of the pool increases. Under this factor, the market interest rate is lower than the original loan rate, so the borrower refinances into a new lower-rate loan. According to Guttentag (2004), “to repay a loan by taking out another loan, refinancing can allow one to secure a lower interest rate; for example, one can replace a loan at an 8.5% rate with one at 5.5%. In the case of a balloon loan, refinancing can repay the principal if one does not have sufficient funds to do it. This implies that if one has made only interest payments over the life of the loan and has not reduced the principal amount when the loan comes due, refinancing can prevent bankruptcy. There are two main drawbacks to refinancing. First, there is no certainty that one will be approved for it. One thus takes a risk every time one decides to make only interest payments on a loan or mortgage. Secondly, refinancing generally resets the repayment period; that is, if one refinances six years into a 10 year loan, the one generally repays the new loan over 10 years instead of the remaining four” (Guttentag, 2004, p. 18). 2.3.2 Perfect Payer (Age of the Mortgage Assets). The industry standard is to use the Public Securities Association (PSA) approach, which ramps up prepayments over the first 30 months of a mortgage and assumes that prepayments are stable thereafter. The main rationale is that when a pool is new, borrowers who have good credit will move out sooner when they get good offers; similarly, borrowers who are not creditworthy but
  • 23. somehow got the loan will default. Thus during the first 30 months the outliers, both good and bad borrowers, will exit the pool early. Newer models, however, are incorporating other factors, such as borrower inertia or a lack of opportunistic behavior even when refinancing would be economically advantageous, once the initial year or so has passed. 2.3.3 Loan Balance. It has been observed using historical data that loans with lower balances prepay more slowly; the assumed reason is that the borrower has less incentive, as the dollar advantage of a refinance is minimal. “The general loan limits for 2015 are unchanged from 2014 (e.g., $417,000 for a 1-unit property in the continental U.S.) and apply to loans delivered to Fannie Mae in 2015 (even if originated prior to 1/1/2015).” (Mortgage Refinance Financial Glossary, 2011). 2.3.4 FICO score. It has been observed that loans with FICO scores lower than the national average tend to prepay more slowly, perhaps because the borrowers cannot get favorable loan terms and so the incentive to refinance is absent. The lower prepayment risk was, however, not sufficient compensation for the higher repayment risk. Bhardwaj & Sengupta (2011) in their paper suggested that “FICO score is a simple yet effective measure for evaluating the performance of credit scoring. As mentioned earlier, the advantage of using such a measure is twofold. First, it lends itself to both non-parametric and parametric estimation. Second, it minimizes the impact of situational factors on this measure of credit score performance. Using this measure, we find that credit score performance is robust to both high and low default environments. However, evidence suggests that some of the increase in credit scores over the cohorts can be explained as adjustment for the increased riskiness in other attributes on the originations. This was particularly true for low levels of credit scores resulting in a sharp deterioration of credit score performance in terms of our nonparametric measure.
Significantly, 14
  • 24. once we control for other (riskier) attributes in the origination, our parametric credit score performance shows improvement over the cohorts. This would suggest an over-reliance on credit scoring not only as a measure of credit risk but also as a means to set risk on other origination attributes. In part, this reliance led to deterioration in loan performance even though average credit quality as measured in terms of credit scores actually improved over the year” (Bhardwaj & Sengupta, 2011). 2.3.5 Geography. Longstaff (2005) conducted an empirical analysis and observed that certain parts of the country prepay faster than others. This is a function of job mobility, younger demographics, etc. Regardless of the method or model used for prepayment estimates, it is advisable to back-test projected prepayments against actual prepayments. With regard to seasonality, historical data have shown that mortgages prepay faster during the summer months than during the winter months in most parts of the country (Longstaff, 2005). Valuing MBS requires a model that takes into consideration both the behavior and the prepayment of the mortgages in the pool. After the economic crisis of 2008, the focus on this sector has increased significantly. This has resulted in a better understanding of MBS; however, several challenges remain. “These challenges include the persistence of model-based MBS pricing errors (option adjusted spread, or OAS), the observed variance in bids for MBS derivative auctions” (Bernardo & Cornell, 1997). Another prepayment modeling approach is the option-based model. Here no-arbitrage pricing theory is used, but in a discrete-time setting. Kariya and Kobayashi (2000) formulated a framework for pricing a mortgage-backed security (MBS) that predicted the burnout effect based on a one-factor valuation model. However, this option-based approach usually assumes, implicitly, that mortgagors are homogeneous. This is a serious shortcoming since it is very rare to have a
  • 25. pool of mortgagors that are homogeneous. The mortgagors in an MBS pool are typically heterogeneous, with different incomes, FICO scores, and geographic locations (Ushiyama & Pliska, 2011). This was, however, not the case with the subprime mortgage ABS, which were composed largely of credit-homogeneous mortgages. Geographical diversification, if any, brought little solace to the investors. 2.4 Interest Rate Risk Models The other major source of uncertainty in MBS valuation is the behavior of interest rates. Different models are used to value that segment, thus making the one-factor-model valuation less accurate. A large decrease in the mortgage rate that follows a decrease in the short-term rate tends to lower the value of an MBS because of the refinancing activity it triggers. On the other hand, a decrease in the short-term rate also has the opposite effect, increasing the value of an MBS by increasing the discount factors. Therefore, it is important to balance the two and incorporate their separate roles (Ushiyama & Pliska, 2011). In a recent study, Tahani and Li (2011) came to the conclusion that interest rate behavior is not Gaussian but Brownian in nature, as evidenced by the changing volatility of interest rates (Tahani & Li, 2011). Brownian motion refers to the random motion of particles suspended in a fluid, such as a gas. Using this concept, Vervaat (1979) has shown that interest rates mimic the random behavior of such particles. Thus, financial models that incorporate the random walk are more accurate. The literature discussed above shows that there are various approaches that can be adopted to develop and calibrate interest rate models. Some of the major models following the approaches mentioned earlier are: 2.4.1 Vasicek model. The Vasicek model is a mathematical model used in finance to predict how interest rates affect fixed-income valuation, such as that of a mortgage-backed security.
The Vasicek model is a one-factor model where short-term rates are the main driver, as it attributes interest rate movements to only one source of market risk, which in this
  • 26. model is the short-term interest rate (Vasicek, 1977). The significance of this model is that it was the first of its kind and subsequent models are based on it. 2.4.2 Cox, Ingersoll and Ross (CIR) Model. The Cox–Ingersoll–Ross model (or CIR model) is used to model interest rates in the valuation of MBS. The CIR model is a one-factor model based on the short-term interest rate, in which interest rate fluctuations are driven by only one source of market risk. The CIR model was introduced in 1985 by John C. Cox, Jonathan E. Ingersoll and Stephen A. Ross as an extension of the Vasicek model. The model can in turn be extended by replacing its coefficients with time-varying functions, which makes it consistent with a predetermined term structure and volatility of interest rates (Cox, Ingersoll, & Ross, 1985). 2.4.3 Black–Derman–Toy Model. The Black–Derman–Toy model (BDT) is a popular one-factor short-rate model used in the pricing of mortgage-backed securities. The short-term rate is the single stochastic factor that determines the predictions of the model. This model is extremely popular within the industry, and widely used, as it was the first model to combine the mean-reverting behavior of short-term interest rates with a lognormal distribution. The model was developed in-house at Goldman Sachs in the 1980s by Fischer Black, Emanuel Derman, and Bill Toy (Black, Derman, & Toy, 1990). The popularity of this model stems from the fact that it is used by one of the most influential players in the MBS market. Another salient feature of the BDT model is that it uses a binomial lattice. The model is calibrated to fit both the volatility of interest rate caps and the current yield curve, or term structure of interest rates. Once the lattice has been calibrated, it becomes easier to value complex interest-rate-sensitive MBS.
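As a rough illustration of how a calibrated lattice is used for valuation, the Python sketch below prices a zero-coupon bond by backward induction over a recombining binomial short-rate lattice. It is a minimal sketch under stated assumptions: the lattice rates are hypothetical numbers rather than the output of a BDT calibration, the risk-neutral probability of an up-move is assumed to be 0.5, discounting uses simple compounding per step, and the function name `price_zero_coupon` is introduced here for illustration only.

```python
def price_zero_coupon(rate_lattice, face=100.0, dt=1.0, p_up=0.5):
    """Price a zero-coupon bond by backward induction on a binomial lattice.

    rate_lattice[i][j] is the short rate at time step i after j up-moves,
    so row i must contain i + 1 rates. The bond matures one step after
    the last row of the lattice.
    """
    n_steps = len(rate_lattice)
    # Payoff at maturity is the face value in every terminal node
    values = [face] * (n_steps + 1)
    for step in range(n_steps - 1, -1, -1):
        rates = rate_lattice[step]
        # Discount the expected next-step value at each node's own short rate
        values = [
            (p_up * values[j + 1] + (1.0 - p_up) * values[j]) / (1.0 + rates[j] * dt)
            for j in range(step + 1)
        ]
    return values[0]


# Hypothetical three-step lattice of annual short rates (not calibrated)
hypothetical_lattice = [
    [0.050],
    [0.045, 0.055],
    [0.040, 0.050, 0.060],
]
print(round(price_zero_coupon(hypothetical_lattice), 4))
```

The same backward pass extends to an MBS by replacing the single terminal payoff with the cash flows received at each node, which is where the prepayment assumptions of Section 2.3 would enter.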
  • 27. The model was developed by its originators for a lattice-based environment; however, it has been shown that the model implies the following continuous stochastic differential equation:
$$ d\ln(r) = \left[\theta_t + \frac{\sigma'_t}{\sigma_t}\ln(r)\right]dt + \sigma_t\,dW_t $$
where $r$ = short-term rate at a given point $t$, $\theta_t$ = value of the asset, $\sigma_t$ = short-term rate volatility at a given time $t$, and $W_t$ = Brownian motion under a risk-neutral probability measure (Black, Derman, & Toy, 1990). 2.4.4 Ho–Lee Model. The Ho–Lee model was developed in 1986 by Thomas Ho and Sang Bin Lee (Ho & Lee, 1986). It was the first arbitrage-free model of interest rates. An arbitrage-free model is a financial engineering model that calculates prices or valuations in such a way that it is impossible to construct arbitrages between two or more of those prices. Thus, there is no riskless profit to be made by buying from one seller and simultaneously selling to another buyer. Under this model, the short rate follows a normal process:
$$ dr_t = \theta_t\,dt + \sigma\,dW_t $$
The Ho–Lee model adds value since it is fine-tuned to market data; thus the valuation is essentially the fair market price. The Ho–Lee model can therefore accurately calculate the price of bonds from the market yield curve. The model calculates the yields using a binomial lattice-based method (Ho & Lee, 1986). However, one of the weaknesses of the model is
  • 28. that it does not incorporate mean reversion. Additionally it generates bell-shaped distribution of rates in the future that makes it unpredictable as with this distribution negative rates are possible. 2.4.5 Hull–White Model. The Hull–White model is a model used to calculate future interest rates. The Hull–White model is based on the principles of no-arbitrage models, which are more practical given the present-day interest-rate term structure. The model easily translates the mathematical description of the future interest rates for a binominal tree; hence derivatives such as Bermudan swaptions can be valued in the model. The first Hull–White model is still popular today and was introduced in 1990 by John C. Hull and Alan White (Hull & White, 2001). The model is a short-rate model. There are disagreements among the users about to the exact time-dependent parameters, but the most commonly accepted hierarchy has θ and α constant – the Vasicek model θ has t dependence – the Hull–White model (Hull & White, 2001). 2.4.6 The Black–Karasinski Model. The Black–Karasinski model is used for the calculation of the term structure of interest rates. This model is also from the family of no- arbitrage models and uses a one-factor model for predicting interest rate movements influenced by a single source of randomness. The model is a good fit for today’s market, as in its most generic form, for the calculation of the call options on the underlying loans of the MBS. The main driving factor of the model is the short-term rate. The short-term rate is assumed to follow the following stochastic differential equation (under the risk-neutral measure): 19
  • 29. In the above equation, $dW_t$ is a standard Brownian motion. The short-term interest rates are assumed to follow a log-normal distribution (Black & Karasinski, 1991). 2.5 Other Models In addition to the earlier mentioned models, other models used in the industry are as follows: 2.5.1 Heath–Jarrow–Morton (HJM) Model. The Heath–Jarrow–Morton (HJM) model dispenses with a requirement that is at the core of the models above: no drift estimation is needed. The HJM model differs from the other models in that it captures the full dynamics of the entire forward rate curve, whereas the other models, which incorporate drift, capture only the dynamics of one point of the curve, the short rate. HJM frameworks are usually non-Markovian with infinite dimensions, but recent research has shown that they can be expressed in a finite-dimensional form, making them computationally feasible (Heath, Jarrow, & Morton, 1990). 2.5.2 LIBOR Market Model. The LIBOR market model is used for predicting the future curve of interest rates. In the LIBOR model, the quantities modeled are a set of forward LIBOR rates, rather than the short rate. These forward rates can be observed easily in the market, and their volatilities are directly linked to the underlying traded contracts. Each individual forward rate is modeled by a lognormal process, that is, a Black model, which leads to the Black formula for interest rate caps. This formula is the market standard for quoting cap prices in terms of implied volatilities, hence the term "market model". Put simply, the LIBOR market model is a collection of forward LIBOR dynamics for different forward rates. The LIBOR Market Model (LMM) differs from short-rate models as it
  • 30. evolves a set of discrete forward rates, with the lognormal LMM specifying a diffusion equation for each forward rate. Specifically, in the standard formulation, $$\frac{dF_i(t)}{F_i(t)} = \mu_i(t)\,dt + \sigma_i(t)\,dW_i(t)$$ where $F_i$ is the $i$-th forward rate, $\sigma_i$ is its volatility, and $dW$ is an N-dimensional Brownian motion with $$dW_i(t)\,dW_j(t) = \rho_{ij}\,dt$$ The LMM relates the drifts of the forward rates based on no-arbitrage arguments. Specifically, under the Spot LIBOR measure, the drifts are expressed as the following: $$\mu_i(t) = \sigma_i(t)\sum_{j=q(t)}^{i}\frac{\tau_j\,\rho_{ij}\,\sigma_j(t)\,F_j(t)}{1+\tau_j F_j(t)}$$ where $\tau_j$ is the accrual period of the $j$-th forward rate and $q(t)$ is the index of the first forward rate that has not yet reset at time $t$ (Nekrasov, n.d.). Definitions 2.6 Subprime and Prime Mortgages “The main difference between prime and subprime mortgages lies in the risk profile of the borrower; subprime mortgages are offered to higher-risk borrowers. Specifically, lenders differentiate among mortgage applicants by using loan risk grades based on their past mortgage or rent payment behaviors, previous bankruptcy filings, debt-to income (DTI) ratios, and the level of documentation provided by the applicants to verify income. Next, lenders determine the price of a mortgage in a given risk grade based on the borrower’s credit risk score, e.g., the Fair, Isaac, and Company (FICO) score, and the size of the down payment.” (Agarwal & Ho, 2007). “Subprime loans, which are loans to borrowers with relatively low credit scores and records of poor credit performance or little credit experience, have become an increasing share of all mortgages in this decade and currently make up about 13 percent of such loans. In 2000 and earlier, subprime loans were negligible. Other higher risk mortgages today include credit
  • 31. extended by the Federal Housing Administration (FHA) and so-called "alt-?" loans, which are loans to borrowers usually with prime credit scores, but who do not provide any documentation ("no-doc") of income or wealth or ability to service pay the loan, or very little documentation ("low-doc"). They have been reported to constitute over 10 percent of all mortgages. When all three categories are added together, nearly 30 percent of loans outstanding are estimated to be in the high-risk category. Subprime loans have foreclosure rates that are much higher than that for prime loans” (Tatom, 2009). 2.7 Foreclosure Process “Foreclosure processes are different in every state. Differences among states range from the notices that must be posted or mailed, redemption periods, and the scheduling and notices issued regarding the auctioning of the property. In general, mortgage companies start foreclosure processes about 3-6 months after the first missed mortgage payment. Late fees are charged after 10-15 days; however, most mortgage companies recognize that homeowners may be facing short-term financial hardships. It is extremely important that you stay in contact with your lender within the first month after missing a payment. After 30 days, the borrower is in default, and the foreclosure processes begin to accelerate. If you do not call the bank and ignore the calls of your lender, then the foreclosure process will begin much earlier. Three types of foreclosures may be initiated at this time: judicial, power of sale and strict foreclosure. All types of foreclosure require public notices to be issued and all parties to be notified regarding the proceedings. Once properties are sold through an auction, families have a small amount of time to find a new place to live and move out before the sheriff issues an eviction notice. 22
  • 32. 2.7.1 Judicial Foreclosure. All states allow this type of foreclosure, and some require it. The lender files suit with the judicial system, and the borrower will receive a note in the mail demanding payment. The borrower then has only 30 days to respond with a payment in order to avoid foreclosure. If a payment is not made after a certain time period, the mortgage property is then sold through an auction to the highest bidder, carried out by a local court or sheriff's office (Foreclosure Process/U.S. Department of Housing and Urban Development, 2015). 2.7.2 Power of Sale. This type of foreclosure, also known as statutory foreclosure, is allowed by many states if the mortgage includes a power of sale clause. After a homeowner has defaulted on mortgage payments, the lender sends out notices demanding payments. Once an established waiting period has passed, the mortgage company, rather than local courts or sheriff's office, carries out a public auction. Non-judicial foreclosure auctions are often more expedient, though they may be subject to judicial review to ensure the legality of the proceedings (Foreclosure Process/U.S. Department of Housing and Urban Development, 2015). 2.7.3 Strict Foreclosure. A small number of states allow this type of foreclosure. In strict foreclosure proceedings, the lender files a lawsuit on the homeowner that has defaulted. If the borrower cannot pay the mortgage within a specific timeline ordered by the court, the property goes directly back to the mortgage holder. Generally, strict foreclosures take place only when the debt amount is greater than the value of the property” (Foreclosure Process/U.S. Department of Housing and Urban Development, 2015). 2.8 Conclusion In conclusion, the literature indicated that there are several models that have been developed over the years for the purpose of valuation of mortgage-backed securities. Each model is valuable and correct in its methodology as shown by its authors. However, different 23
  • 33. circumstances and priorities make one model better suited than another. No single model is the industry standard or superior to all the others. However, the models rely on similar input factors. The major factors are FICO score, loan balances, geography of the loans, and perfect payers. In the following sections, I used the variables identified in the literature as part of my model, and developed an understanding of how effective each of the models identified above is in explaining mortgage-backed securities.
  • 34. Chapter 3 - Research Methodology And Design 3.1 Introduction In this chapter on research methodology and design, I state the research question and describe the research design adopted to test it. I also identify the research methodology and the variables used to test the research question. 3.2 Research Question and Hypothesis The primary goal of this thesis was to investigate the various methodologies and models that are used to calculate the value or price of a distressed MBS, and to conduct a correlation analysis between the main inputs into the models (FICO score, geography, loan balances) and the foreclosure rates. Such an analysis enabled us to understand whether the input variables of the models have high explanatory potential, and whether the models are using the correct factors. The thesis therefore answered the following research question, R.Q.: Does a correlation exist between the foreclosure rate of the pool and the factors used by the most common risk models to predict foreclosure rates? In order to answer this research question, I used existing literature to develop the following hypotheses: H0: There exists no correlation between the foreclosure rate of the pool and variables such as Credit Score, Perfect Payer, Balance and Geography, which are used to predict foreclosure rates. H1: There exists a correlation between the foreclosure rate of the pool and variables such as Credit Score, Perfect Payer, Balance and Geography, which are used to predict foreclosure rates.
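The two hypotheses in Section 3.2 can be written more formally. The following is one conventional formalization, offered as a sketch only: it assumes the Pearson correlation coefficient is the statistic of interest, which the thesis does not state explicitly. The symbols $\rho$ (population correlation between one input factor and the foreclosure rate), $r$ (sample correlation), and $n$ (number of observations) are introduced here for illustration.

```latex
% Assumed formalization: two-sided test on the Pearson correlation
% between the foreclosure rate and a single input factor.
H_0:\ \rho = 0
\qquad \text{versus} \qquad
H_1:\ \rho \neq 0,
\qquad
t = r\,\sqrt{\frac{n-2}{1-r^{2}}} \;\sim\; t_{n-2}
\quad \text{under } H_0 .
```

Under this reading, H0 is rejected for a given factor when $|t|$ exceeds the critical value of the $t$ distribution with $n-2$ degrees of freedom at the chosen significance level.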
  • 35. 3.3 Relevance of Topic This thesis is intended to benefit mortgage-backed security professionals, bankers, and valuation experts, who use numerous methods to value their risk or portfolio on a daily basis without knowing whether one methodology is superior to another. As there are numerous models, and all of them have a sound logical and mathematical basis, no one model may be considered superior to the others in all instances. The thesis, therefore, provided a brief introduction to the various models used to value MBS, and established a correlation between the main inputs that drive the models and the foreclosure rate. Additionally, this thesis provided recent graduates entering the MBS structured finance field with a summary of the valuation methods, and a reference for how valuation is done for such products. Table 3-1. Models Used in the Industry and Their Component Variables. Factors used (columns): FICO; Geography; Loan balance; Perfect payer. Models (rows): Vasicek model; Cox, Ingersoll and Ross (CIR) model; Black–Derman–Toy model; Ho–Lee model; Black–Karasinski (B-K) model; Heath–Jarrow–Morton (HJM) model; LIBOR market model.
  • 36. The thesis did not try to analyze the internal logic of all the models mentioned (Table 3-1), as this stream of research has been extensively studied by numerous scholars (Vasiçek, 1977; Black, Derman, & Toy, 1990). 3.4 Research Methodology To answer the research question stated above, a quantitative approach was adopted using a longitudinal study over seven years, from 2008 to 2014, focusing on the factors that are used in the models. A longitudinal study was chosen over surveys and interviews because this approach is more robust and prevents individual biases from affecting the final result. For example, if a survey of finance professionals were conducted on the correlation between FICO score and foreclosure rate, the results would reflect the respondents' personal experience and bias. The data would then not be homogeneous, which would call its validity into question. Similarly, if a regression analysis were based on a survey, the results would be skewed. In the case of interviews, the same issues would persist, making the analysis unreliable. The longitudinal study used here evaluated four input variables: FICO score, loan balance, the geography of the houses underlying the pool of loans in the mortgage-backed security, and the payment history of the borrowers. These variables were evaluated over a 7-year period from 2008 to 2014 to identify whether there is a significant correlation between the factors and the foreclosure rate. The major emphasis of the thesis was to identify the correlation between the dependent variable and the independent variables. The nature of the correlation, i.e., whether it is positive or negative, is outside the scope of the study. The main reason the period between 2008 and 2014 was chosen is that the largest collapse of the housing market in the history of modern economics took place during this time frame. 
The epicenter of this housing crisis was also in the mortgage
  • 37. backed securities market, making it the ideal time frame to study (Goliath, 2011). The rationale for picking this period was that, to observe the nature of the relationship between the dependent variable and the four independent variables chosen for the study, a period of extreme change, foreclosures, and recovery would be better than a period with little change in the macroeconomic situation of the country in general and of the borrowers in particular. The foreclosure rate, rather than price, was chosen as the benchmark metric because default is a major risk event in a mortgage-backed security. Price would be meaningless if there were going to be no future cash flow. With the longitudinal study, the thesis investigated whether there is a correlation between the main factors the models use and the foreclosure rate. Correlation between the foreclosure rate and the input factors was chosen as the method because the greater the correlation, the higher the reliability of the models. 3.5 Research Design: Variables Identified The method used in the thesis was a longitudinal study using a 7-year time frame. The main independent variables are FICO score, geography of the loans in the pool, loan balances, and perfect payers (see Table 3-1). The dependent variable is the foreclosure rate. It is defined as the mortgage foreclosure rate: the dollar value of 1–4 family mortgages that are delinquent by 30 days or more or are in foreclosure, divided by the dollar value of all 1–4 family mortgages. For example, assume there are 100 loans of equal balance in the pool of a mortgage-backed security, of which 10 have not made payments for over 30 days and five are in foreclosure. The foreclosure rate for this pool is then 15%. In the research design, it is important not only to identify the constructs, but also to explain them clearly so that, when I test my research question, I am able to ensure robustness. 
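The foreclosure-rate definition in Section 3.5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the thesis: the function name `foreclosure_rate`, the field names `balance`, `days_delinquent`, and `in_foreclosure`, and the pool itself are hypothetical. The pool mirrors the 100-loan example from the text and assumes every loan has the same balance, which is what makes the dollar-weighted rate equal the loan-count rate.

```python
def foreclosure_rate(loans):
    """Dollar-weighted share of the pool that is 30+ days delinquent or in foreclosure."""
    total_balance = sum(loan["balance"] for loan in loans)
    if total_balance == 0:
        raise ValueError("pool has no outstanding balance")
    distressed_balance = sum(
        loan["balance"]
        for loan in loans
        if loan["days_delinquent"] >= 30 or loan["in_foreclosure"]
    )
    return distressed_balance / total_balance


# Hypothetical pool: 100 equal-balance loans, 10 delinquent, 5 in foreclosure.
current = {"balance": 200_000, "days_delinquent": 0, "in_foreclosure": False}
delinquent = {"balance": 200_000, "days_delinquent": 45, "in_foreclosure": False}
foreclosed = {"balance": 200_000, "days_delinquent": 120, "in_foreclosure": True}
pool = [current] * 85 + [delinquent] * 10 + [foreclosed] * 5

rate = foreclosure_rate(pool)
print(f"Foreclosure rate: {rate:.0%}")
```

If balances differ across loans, the dollar-weighted rate and the simple loan count diverge, which is why the definition is stated in dollar terms.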
The independent variables or constructs are as follows:
  • 38. 3.5.1 FICO Score. The FICO score is a credit score based on a mathematical formula using payment history, debt balance, length of credit history, types of credit used, and recent inquiries. The score ranges from 300 to 850. The score is used by mortgage lenders to assess borrowers’ creditworthiness and risk. ‘A FICO Score is a three-digit number calculated from the credit information in a credit report. Lenders use these scores to estimate their credit risk, that is, how likely the borrower is to pay his credit obligations as agreed. A FICO Score assesses the information in a borrower’s credit report at a particular point in time. It helps lenders evaluate credit risk reliably, objectively, and quickly. And it helps the borrower obtain credit based on his actual borrowing and repayment history, filtering out extraneous details such as race or religion’ (What’s in My FICO Score, 2014). 3.5.2 Geography of Loan. This metric classifies where the collateral/home is located. Since a mortgage-backed security has a large number of loans, they tend to be a mixture from all over the country. For example, some pools are from only one state, such as Florida, while others are a mixture of various states, such as California, New York and Michigan. “Geographic diversification does not guarantee diversification in housing market returns. To the extent that the housing market is associated with the probability of loan default, a relevant measure of loan diversification is the correlation between the returns in housing markets of the loan collateral. As an illustration of this point consider that despite the geographic distance, returns on a California house price index have a correlation coefficient of 0.87 with returns on an index measuring house price returns in Washington DC. We construct a Herfindahl index of the geographic concentration in each deal as follows. 
For each deal, we calculate the percentage of the deal principal that is concentrated in each of the 50 states, plus Washington, DC. The deal-level
  • 39. Herfindahl index is then calculated as the sum of the squared weights” (Nadauld & Sherlund, 2009). 3.5.3 Loan Balances. This metric provides the unpaid balance (UPB) information. The UPB is defined as the amount owed by the borrower on the loan. The loan balance is not a fixed amount. The original balance is reduced as payments are made according to an amortization schedule. The payments are applied to both interest and principal, and over time the balance falls to zero. The higher the balance, the greater the effect the loan has on the pool, as the default of a high-balance loan has more impact on the pool than the default of a low-balance loan. This behavior makes the variable an ideal candidate for inclusion in the research as one of the factors that helps us identify the level of correlation with the foreclosure rate. 3.5.4 Perfect payers. This metric reflects the percentage of borrowers in the pool who have been paying on time over a period of time. There are several sub-categories within it. The first sub-category is the 24-month perfect payer, which comprises borrowers who have not missed a payment in the past 24 months. The second sub-category is the 60-month perfect payer, which comprises borrowers who have paid on time for the past 60 months. The category benchmarked in this study is the perfect payer, that is, borrowers who have never missed a single payment. 3.6 Conclusion Based on the literature review in Chapter 2, we were able to identify the research question and how it fits within the overall literature on the topic. This chapter took the study further and clearly identified the research question, the research design, and the dependent and independent variables. The next chapter describes how the data was collected and analyzed.
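The Herfindahl index of geographic concentration quoted in Section 3.5.2 (the sum of squared state-level shares of deal principal) can be sketched as follows. This is a minimal illustration, not code from Nadauld and Sherlund or from the thesis: the function name `herfindahl_index` and the principal amounts are hypothetical, chosen to contrast a single-state pool with a three-state pool like the ones mentioned in the text.

```python
def herfindahl_index(principal_by_state):
    """Sum of squared shares of deal principal held in each state (1.0 = fully concentrated)."""
    total = sum(principal_by_state.values())
    if total <= 0:
        raise ValueError("deal principal must be positive")
    return sum((amount / total) ** 2 for amount in principal_by_state.values())


# Hypothetical principal amounts (in $ millions) for two illustrative pools.
florida_only = {"FL": 500.0}
three_states = {"CA": 250.0, "NY": 150.0, "MI": 100.0}

hhi_single = herfindahl_index(florida_only)
hhi_mixed = herfindahl_index(three_states)
print(f"Single-state pool: {hhi_single:.2f}")
print(f"Three-state pool:  {hhi_mixed:.2f}")
```

A value near 1 indicates a pool concentrated in one state, while lower values indicate principal spread across more states. As the quoted passage notes, a low index does not by itself guarantee diversification in housing market returns.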
  • 40. Chapter 4: Data Collection 4.1 Introduction The data collected in this study was time-series data. Most of the data covered a 10-year period for mortgage-backed securities that were issued before 2004 and were still active, with payments being made from the underlying cash flows. To add diversity to the data, the study also included some mortgage-backed securities that either collapsed or were paid out. The main data source was the Bloomberg terminal, accessed through the NYU library. Specifically, using the Bloomberg fixed-income section and sorting for active pools of mortgage-backed securities issued before 2004, I collected pools of data that include MBS with active payments as well as mortgage-backed securities that either collapsed or were paid out. The database consisted of a total of 1,000 such unique securities. For the study, we chose 10 pools that each contained more than 1,000 individual mortgages, so that the N in the regression analysis would be large and any outlier would have minimal effect on the results. The pools have not been modified, and the regression analysis was performed on the original data. These are existing pools, chosen to give a diverse representation. The pools were chosen from different banks rather than from a single bank. Additionally, care was taken that the pools represented different geographies and were not concentrated in one area, such as New York or Texas. 4.2 Database Description: Population and Sample The population of the data set comprised all mortgage-backed securities issued in the United States of America prior to 2004. The mortgage-backed securities have FICO scores ranging from 400 to 800, geographically represent all 50 states, and have loan balances ranging from $10 million to $2 billion. The total population of such securities is over 1,000. The sample that I
  • 41. chose from this population is around 300. These randomly chosen MBS have diverse loan balances, FICO scores, geographies, and payment histories. A sample size of 300 is apt, as it is sufficient to absorb issues arising from missing data or the presence of outliers. The sample is large enough to mitigate self-selection biases and other statistical errors that are commonly observed in datasets with small sample sizes. The method of identifying samples is based on the criteria defined earlier in the study. Once I had identified the population set, I randomly picked 300 pools that have all four independent variables. This set of 300 securities, of different vintages and characteristics, was subjected to statistical tests to answer the research question. Results for 10 of the 300 pools tested are discussed in detail. We discussed 10 pools in detail in order to elaborate on how the dependent variable is affected by the independent variables. We discussed the macroeconomic, political, and business environment affecting the variables so that the reader can understand in detail the relationship between the independent variables and the dependent variable. At this stage it is important to point out that the loans chosen are prime loans, not sub-prime loans, but the prime loans are distressed given the recession in the chosen period of 2008 to 2014. The aim of the thesis was to test the relationship of the input factors to the foreclosure rate; including sub-prime mortgages might therefore have given us unreliable data. MBS backed by sub-prime mortgages are a subject for another study and beyond the scope of this thesis. The primary reason sub-prime mortgages were excluded is that they are biased towards default, especially given the credit and income profile of the borrowers, and are not geographically dispersed. 
“Subprime originations appear to be heavily concentrated in fast-growing parts of the country with considerable new construction, such as Florida, California, Nevada, and the Washington DC area. Subprime loans were also heavily
  • 42. concentrated in zip codes with more residents in the moderate credit score category and more black and Hispanic residents. Areas with lower income and higher unemployment had more subprime lending.” (Mayer & Pence, 2008). The economic recovery since 2009 has been unlike any other, and this slow recovery has depressed wages. Additionally, there is the concept of shadow unemployment, where people are working part-time or working jobs for which they are overqualified. This may be causing substantial drift in the credit standing of loans previously deemed non-sub-prime. This trend may be further accentuated by continuing restrictions on credit to borrowers with imperfect credit scores. Thus, including sub-prime pools might have resulted in the identification of a spurious relationship between the foreclosure rate and the input variables. The major drawback would be a self-selection bias, i.e., bad pools being tested for default. My objective is to keep the data as close to the source as possible without having to amend it. Including sub-prime pools would require us to adjust for location, or for greater default, to normalize them against the other pools. In short, the pool of data would not be homogeneous, and therefore not comparable. The data collection design mostly leveraged the Bloomberg fixed-income section available at NYU libraries to gain access to the data. Sample securities were identified using the criteria listed above, and then saved in Excel format. The data analysis plan adopted a two-step methodology. The first step was sorting the securities in Excel based on the year of maturity, average FICO score, average loan balance, and average payment history. Since these are our independent variables, this process helped weed out securities that were too similar. 
For example, if two pools were created by a bank using similar mortgages from a common larger pool, then the pools are too similar and we would take only one of them for our study. This helps us minimize any biases. The second step of the
  • 43. analysis involved conducting a regression analysis of each independent variable against the foreclosure rate as the dependent variable. The focus of the study was to find the level of dependence between the independent variables and the dependent variable. For example, does a pool with a low FICO score result in a high default rate among borrowers, or does a pool with a high FICO score have a high foreclosure rate too? Correlation and regression analysis, performed in SPSS version 23, was therefore a suitable tool for identifying such a relationship. 4.3 Database Description: Reliability and Validity According to Morse and Davidshofer (2005), “Joppe (2000) defines reliability as: The extent to which results are consistent over time and an accurate representation of the total population under study is referred to as reliability and if the results of a study can be reproduced under a similar methodology, then the research instrument is considered to be reliable.” Validity is defined as the statistical measure that a writer employs to show that the test, in this case the regression analysis, is measuring what it is intended to measure (Murphy & Davidshofer, 2005). There are numerous methods to measure validity. For this study, we adopted the concept of construct validity, which is a measure of how well observed relationships between test constructs match those predicted by some theory (Cronbach & Meehl, 1955). According to Morse and Davidshofer (2005), “Kirk and Miller (1986) identify three types of reliability referred to in quantitative research, which relate to: (1) the degree to which a measurement, given repeatedly, remains the same (2) the stability of a measurement over time; and (3) the similarity of measurements within a given time period.” Charles (1995) adheres to the notions that consistency with which questionnaire [test] item are answered or individual’s scores remain relatively the same can be determined through the test-retest method at two different times. 
This
  • 44. attribute of the instrument is actually referred to as stability. If we are dealing with a stable measure, then the results should be similar. A high degree of stability indicates a high degree of reliability, which means the results are repeatable” (Morse & Davidshofer, 2005). To establish a relationship between the input factors and the foreclosure rate, it is important to use correct data from a non-biased source. The study, therefore, used data available in the public domain. By public domain, we mean data that is available academically, that is not proprietary, and that has not been cleaned to remove any markers or identifiers. The rationale for using such data is to keep the study transparent and to remove any biases that may be inherent in proprietary data. Such a data source allows my conclusions to be replicated easily, adding to the robustness of the research method. If we were to rely primarily on proprietary data, such as data from Goldman Sachs, there is a possibility that the data would be biased. This is true of the data made available by Goldman Sachs, as it tends to be biased towards high FICO scores, primarily because the firm refuses to deal in loans that have a high foreclosure rate. Using proprietary data would also defeat the purpose of the study, which was to establish a relationship between FICO scores, both high and low, and the foreclosure rate. The study therefore incorporated pools that were diverse and structurally different from each other. The criteria for selecting different sources were based on the independent variables identified for the study, i.e., the FICO score, the location of the mortgages, loan balances and payment history. According to Morse et al. (2004), “Joppe, (2000) detects a problem with the test-retest method which can make the instrument, to a certain degree, unreliable. She explains that test-retest method may sensitize the respondent to the subject matter, and hence influence the responses given. 
We cannot be sure that there was no change in extraneous influences such as an attitude change that has occurred. This could lead to a difference in the responses provided”.
  • 45. Similarly, Crocker and Algina (1986) noted that when a respondent answers a set of test items, the score obtained represents only a limited sample of behavior. As a result, the scores may change due to some characteristic of the respondent, which may lead to errors of measurement. These kinds of errors reduce the accuracy and consistency of the instrument and the test scores. Hence, it is the researcher’s responsibility to assure high consistency and accuracy of the tests and scores. Crocker and Algina (1986) suggested that "test developers have a responsibility of demonstrating the reliability of scores from their tests" (Morse et al., 2004). Thus, after evaluating several sources of data, the dataset from Bloomberg was identified as the most viable source. The decision to use the Bloomberg data source over other available sources was guided by several factors. The first reason is ease of availability: the data is available to NYU students and faculty via the Bloomberg terminal. Secondly, Bloomberg has data on almost all mortgage-backed securities issued by almost all parties, dating back to the 1970s. This is important for the study because we conducted a longitudinal study over a 10-year period and used pools issued by several banks. The rationale for evaluating data spread over a seven-year period was to lower the possibility of a bad year or cyclical macroeconomic events biasing the data. For example, interest rates, employment levels, etc. may have an impact on the pool in any particular year, but by using a 10-year period we minimized the effect of the business cycle on the data. We need to bear in mind, however, that the last 10 years have not been representative of the business cycles since 1945. The third reason for using Bloomberg data was that it is used by a majority of finance firms and is considered the industry standard. 
Therefore, the data made available through the Bloomberg terminal on mortgage-backed pools is current. Bloomberg has access to
  • 46. almost all the major issuers of mortgage-backed securities. This is evidenced by the fact that my initial search returned over 1,000 such pools with different FICO scores, loan balances, and payment histories. The breadth of data available therefore helps us address and test the input factor that is hardest to decipher, i.e., geography. Usually a mortgage-backed issuance is heavily loaded with mortgages from one region. Using different pools of mortgage-backed securities thus enabled us to check that any relationship found between the geography of the loans in the pool and the foreclosure rate was valid. For example, if we were to use a pool whose mortgages originate mostly from Florida, our regression analysis would show a spurious relationship between geography and foreclosure rates. Therefore, it was necessary to have a dataset that is geographically dispersed to ensure that the correlations identified are meaningful and valid. According to Morse (2004), “the traditional criteria for validity, finds their roots in a positivist tradition, and to an extent, positivism has been defined by a systematic theory of validity. Within the positivist terminology, validity resided amongst, and was the result and culmination of other empirical conceptions: universal laws, evidence, objectivity, truth, actuality, deduction, reason, fact and mathematical data to name just a few”. Based on the arguments developed above, the study adopted regression analysis techniques, specifically the R² statistic reported by SPSS, to obtain a reliable parameter and to show that the independent variables have an effect on the dependent variable, rather than a random correlation. A common tendency among quantitative researchers is to focus on the tangible outcome of the research, a single figure or number that explains the research question, rather than demonstrating what verification strategies were used in the research. According to Morse et al. 
(2004), “While strategies of trustworthiness may be useful in attempting to evaluate rigor, they do not in themselves ensure rigor. While standards are useful for evaluating relevance and utility, they do not in themselves ensure that the research will be relevant and useful.” Therefore, it is time to reconsider the importance of the verification strategies used by the researcher in the process of inquiry, so that reliability and validity are actively attained rather than proclaimed by external reviewers on completion of the project. “These strategies to include rigor include investigator responsiveness, methodological coherence, theoretical sampling and sampling adequacy, an active analytic stance, and saturation. These strategies, when used appropriately, force the researcher to correct both the direction of the analysis and the development of the study as necessary, thus ensuring reliability and validity of the completed project” (Morse et al., 2004).

Based on the above guidelines, it can be argued that a longitudinal study over a seven-year period, with data available from a third party (Bloomberg), is an appropriate way to test and verify the research question empirically. The method of testing adopted in the thesis was therefore a combination of regression analysis and interpretation of result tables, carried out using SPSS and Excel. The research design I enumerated and the research methodology identified helped make the study parsimonious, verifiable and reliable, as they meet the criteria that Morse discusses as necessary in any research.

4.4 Conclusion

The main objective of this thesis was to identify the relationship between the four input factors and the foreclosure rate, and to quantify it using regression analysis. Additionally, the thesis attempted to provide an economic and business analysis of the main inputs used in various models (FICO score, geography, loan balances) and their relationship with foreclosure rates.
Such an analysis enabled us to understand whether the input variables of the models have high explanatory potential, and whether the models are using the correct factors. The thesis therefore answered the question: what is the nature of the correlation between the factors used by the most common risk models to predict foreclosure rates (e.g. FICO score, geography, loan balance) and the foreclosure rate of the pool? The research design and methodology enumerated earlier in the chapter played an important role in identifying the road map to be followed in order to answer the research question.
Chapter 5 - Results and Analysis

5.1 Introduction

In the earlier chapters, the research question was clearly identified, and it was shown how the question fits within the overall literature on MBS. Subsequently, the research methodology was identified and the research design was enumerated. Following the research methodology identified in chapter 4, the data was collected and a time series database was developed. Both regression analysis and ANOVA tools were used to deconstruct the data and develop a better understanding of how the data answers the research question. This chapter deals with the analysis of the data, its interpretation, and how it helped answer the research question.

5.2 Data Analysis and Interpretation

The regression analysis presented in the tables below summarizes the findings. In each table, the pool number in the top left-hand corner deserves particular attention: it identifies a specific mortgage pool and can be used as a reference on Bloomberg to locate the source data. The dependent variable, foreclosure percent, is shown for reference only. It gives the average foreclosure rate for the pool for the year. For example, if there were 10,000 mortgages in the pool and 10 of them went into foreclosure, the foreclosure rate for the year would be 0.1%. The primary reason for reporting the foreclosure rate is that it acts as a reference, so that we can observe the degree to which the rate changed over the seven-year study period. Such an analysis helps to ground the R² results and gives us a reference point. The literature discussed in this section suggested that any R² value below 25% would not be significant, and that is the benchmark used throughout this body of research. That being said, Table 5-1 gives us a good reference with respect to R² and its explanatory power.
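The two quantities described above, the foreclosure percentage and the R² of a one-variable regression, can be sketched in a few lines of Python. This is only an illustration: the thesis itself used SPSS and Excel, the monthly observations below are hypothetical values invented for the example, and treating each pool-year as a set of monthly observations is an assumption rather than something the text spells out.

```python
def foreclosure_rate(loans_in_foreclosure, loans_in_pool):
    """Share of the pool that is in foreclosure, as a percentage."""
    return 100.0 * loans_in_foreclosure / loans_in_pool


def r_squared(x, y):
    """R² of a one-variable least-squares fit of y on x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when either series is constant")
    return sxy ** 2 / (sxx * syy)


THRESHOLD = 0.25  # the 25% benchmark used in the text

# Hypothetical monthly observations for one pool-year (NOT thesis data).
avg_credit_score = [712, 710, 709, 707, 706, 704, 701, 700, 698, 697, 695, 694]
foreclosure_pct = [0.40, 0.42, 0.47, 0.45, 0.52, 0.55,
                   0.61, 0.58, 0.66, 0.70, 0.69, 0.75]

r2 = r_squared(avg_credit_score, foreclosure_pct)
print(f"R² = {r2:.2%}, above the 25% benchmark: {r2 > THRESHOLD}")

# 10 foreclosures in a pool of 10,000 loans
print(f"Foreclosure rate: {foreclosure_rate(10, 10_000):.1f}%")
```

Running the same calculation once per year and per independent variable would give a grid of R² values with the same shape as the result tables in this chapter.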
Table 5-1 shows how a given R² value translates into explanatory power, expressed as the share of one standard deviation that it predicts or explains. Based on the data in Table 5-1, this thesis benchmarked R² values above 25% as being significant.

Table 5-1. R² Explanatory Power
Based on the results of the regression analysis, presented in the discussion below, it was observed that there is a relationship between the foreclosure rate, the dependent variable, and the four independent variables. As R² is the main measure, it has been used to determine the model fit in percentage terms, as well as whether the independent variables have an influence on the dependent variable. The results showed that the relationship between the foreclosure rate and the four independent variables is not constant but dynamic. Additionally, the input variables had significant explanatory power over the foreclosure rate.

Consider the data in Table 5-2 for mortgage-backed security pool BCAP 2007-AA2 22A1. In 2008, credit score could explain only 2.60% of the total variance in the dependent variable for this pool. However, credit score accounted for 53.96% of the variance in 2012 and 59.70% in 2014. This is significant and above our threshold.

Table 5-2. Statistical Analysis for pool BCAP 2007-AA2 22A1

Pool: BCAP 2007-AA2 22A1                               2008     2009     2010     2011     2012     2013     2014
Dependent variable: Foreclosure %
  Foreclosure % average for the year (for reference)   0.52%    4.29%    4.30%   13.36%    9.41%    6.29%    6.52%
R² values
  Independent variable 1: Credit score                 2.60%    3.52%   20.73%   28.00%   53.96%    0.60%   59.70%
  Independent variable 2: Perfect Payer %             32.10%    3.87%    6.90%   23.85%    9.20%    0.97%   59.09%
  Independent variable 3: Balance < 417k               6.54%    3.58%    6.90%   23.85%    9.20%    0.97%   59.09%
  Independent variable 4: Geo < 50% of the pool       30.94%    1.05%   30.37%   35.38%   54.42%   11.61%   50.99%
Regression analysis using 95% confidence interval
In the pool name, BCAP identifies the bank that produced the pool: BCAP is the code for Barclays Capital, and BOAA is the code for Bank of America. The foreclosure % row shows what share of the mortgages in the pool are in foreclosure; it is not an R² value but a simple percentage. For example, if there are 10,000 loans and 1,000 are in foreclosure, then the foreclosure % is 10%. It is included to give the reader a sense of how the pool is behaving: a higher foreclosure rate means the pool is not performing and mortgages are failing. Since this is a master's thesis and not a PhD study, we are limited in the scope and breadth of the research we can perform and present, so displaying all the variables is neither feasible nor prudent.

The remaining rows give the R² by year between each of the four independent variables and the dependent variable. A 53.96% R² implies that credit score accounted for at least 29% of the standard deviation, and hence that credit score can account for 29% of the 9.41% foreclosure rate. Looking at the average foreclosure rate, it was 0.52% of the pool in 2008 and had increased to 6.52% by 2014. There is an argument that, over time, the good mortgages leave the pool and only bad mortgages are left, so that the foreclosure rate automatically increases. We agree with this in theory, but it cannot explain the sudden spikes in the foreclosure rate. The foreclosure rate jumped from 0.52% in 2008 to 4.29% in 2009. Later, in 2012, it dropped to 9.41% from 13.36% in 2011. As R² is a measure of the explained variance, it can be argued that good borrowers exited the pool as they found better rates, or moved and sold their houses, and that this produced the resulting changes in the foreclosure rate.
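The step from an R² of 53.96% to "at least 29% of the standard deviation" is consistent with a commonly used conversion, in which the share of one standard deviation explained is 1 - sqrt(1 - R²). Whether Table 5-1 was built with exactly this formula is an assumption on our part; the sketch below only shows that the formula gives roughly 29% at an R² of 50%, the nearest round value below 53.96%. The function name is hypothetical.

```python
import math


def share_of_std_explained(r2):
    """Relative reduction in the standard deviation of the unexplained part.

    Uses 1 - sqrt(1 - R²); assumed here to be the conversion behind Table 5-1.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R² must lie between 0 and 1")
    return 1.0 - math.sqrt(1.0 - r2)


# 25% is the benchmark in the text, 53.96% is the 2012 credit score R²
# for pool BCAP 2007-AA2 22A1, and 50% is included for comparison.
for r2 in (0.25, 0.50, 0.5396):
    share = share_of_std_explained(r2)
    print(f"R² = {r2:.2%} -> {share:.1%} of one standard deviation")
```

Under this conversion, the 25% benchmark corresponds to a fairly modest share of one standard deviation, which is one way to read the choice of that value as a lower bound.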
Pool BCAP 2007-AA2 22A1 gives us a good example of a linear relationship between an increase in foreclosure rates and credit scores. The pool highlights how, as the foreclosure rate increases, the explanatory power of credit score also increases. In 2013, all four independent variables failed to show any explanatory power, which tells us that some factor outside the four we tested had an impact on the foreclosure rate. The foreclosure rate declined sharply, from 9.41% to 6.29%, and all four independent variables were below the 25% threshold. Based on the analysis of pool BCAP 2007-AA2 22A1, it can be concluded that the relationship between the independent variables and the dependent variable is dynamic, and that the input variables have explanatory power over the dependent variable the majority of the time.

If we interpret the results for their significance in a business decision environment, we can argue on the basis of the regression analysis that, for pool BCAP 2007-AA2 22A1 in 2008, perfect payer and geography of the loans were the main factors in the foreclosure rate. During this period the average foreclosure rate was 0.52% of the total pool. Taken together, perfect payer and geography suggest that as borrowers' income situation in a particular region took a negative turn, their cash flow suffered, giving rise to increased foreclosure rates. In 2013, by contrast, geography was the most relevant variable but not a main factor, suggesting that one particular state was having a macroeconomic issue that was affecting the foreclosure rate. In 2014, all four factors were almost equally at play. The models discussed above are used to estimate a probability of default, not to pinpoint the exact macroeconomic cause behind it. Nevertheless, it is important to see whether the R² results for the input variables tell a story that has a logical backing and are not just mathematically related.
An evaluation of Table 5-3 (pool BOAA 2005-1 2A1) showed that perfect payer accounted for 49.94% of the variance in the dependent variable in 2008, and 84.05% in 2010. This influence on the variance in the dependent variable decreased to 30.24% in 2011 and remained low thereafter: 1.70% in 2012, 7.77% in 2013 and 6.56% in 2014. In this pool, perfect payer is above the 25% benchmark in every year from 2008 to 2011. From 2012 onward, however, even though the foreclosure rate was high, the ability of the perfect payer variable to explain the variation diminished significantly. Perfect payer is a variable that measures the borrower's current cash flow situation: if the borrower has sufficient income or savings, he or she can meet the monthly mortgage payment. However, recent research has shown that paying the mortgage first is no longer a priority. “If we've learned one thing from the housing downturn, it's that making the monthly mortgage payment is no longer a sacred concept in many American households. In recent years, when facing financial pressure, homeowners have been more likely to let the mortgage slide before they would fall behind on their credit card bills, researchers have found. But it turns out that the mortgage is even less sacred than we thought: When times are tight, consumers put paying for their cars first. Then the credit cards will be paid. The once-mighty mortgage has slipped to No. 3” (Umberger, 2012, page?).

Table 5-3. Statistical Analysis for pool BOAA 2005-1 2A1
Pool: BOAA 2005-1 2A1                                  2008     2009     2010     2011     2012     2013     2014
Dependent variable: Foreclosure %
  Foreclosure % average for the year (for reference)   0.07%    0.69%    3.87%    4.14%    4.95%    2.54%    5.49%
R² values
  Independent variable 1: Credit score                22.96%   63.12%    0.03%    1.13%    1.77%    0.95%   10.38%
  Independent variable 2: Perfect Payer %             49.94%   39.88%   84.05%   30.24%    1.70%    7.77%    6.56%
  Independent variable 3: Balance < 417k              16.71%   16.05%    7.31%    5.54%    5.71%   13.12%    9.90%
  Independent variable 4: Geo < 50% of the pool        0.53%   61.01%   17.86%   17.37%   10.45%    0.11%   10.05%
Regression analysis using 95% confidence interval

Looking at the data for the pool on a monthly basis, we observed that in 2008 the perfect payer percentage was 98.65%. This implies that 98.65% of the loans in the pool were being paid on time, and that those borrowers had not missed a single payment. The total number of loans in the pool in February 2008 was 29,654. By February 2012, the year in which perfect payer started to lose its significance in explaining the foreclosure rate, the perfect payer percentage had dropped to 69.46%, and the number of loans in the pool had decreased to 15,940. By February 2014, the perfect payer percentage had gradually decreased to 59.34%, and the number of loans in the pool had decreased to 7,834. Given that the pool shrank from 29,654 to 7,834 loans between 2008 and 2014, while the average foreclosure rate stayed below 6% throughout, it can be argued that foreclosure was not the main reason for the decrease in the pool. Some other factor must have caused the large runoff of loans that left the pool by a route other than foreclosure. A mortgage-backed security is a closed-ended security, which means that once a pool is formed, mortgages that retire from the pool are not replaced with new loans.
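The scale of the runoff is easier to see when the loan counts quoted above are expressed as a percentage decline from the February 2008 level. The short sketch below does only that arithmetic, using the figures stated in the text; the function name is hypothetical, and the code makes no attempt to split the decline between prepayment and foreclosure, which the quoted figures alone do not allow.

```python
# Loan counts (February of each year) and perfect payer percentages
# quoted in the text for pool BOAA 2005-1 2A1.
snapshots = {
    2008: (29_654, 98.65),
    2012: (15_940, 69.46),
    2014: (7_834, 59.34),
}


def pct_decline(start, end):
    """Percentage by which `end` is below `start`."""
    return 100.0 * (start - end) / start


start_loans = snapshots[2008][0]
for year, (loans, perfect_payer_pct) in snapshots.items():
    decline = pct_decline(start_loans, loans)
    print(f"{year}: {loans:>6,} loans, {decline:5.1f}% below 2008, "
          f"perfect payers {perfect_payer_pct:.2f}%")
```

On these figures the pool lost close to three quarters of its loans over the period, which is far larger than any of the annual average foreclosure percentages reported for the pool.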
There are only two ways for a loan to exit the pool. The first is voluntary runoff, or full payment by the borrower, either with cash from savings or other sources or by selling the house. The second is foreclosure, where the borrower defaults on the loan by failing to make the monthly payments.

An analysis of pool BOAA 2005-1 2A1 showed that the relationship between the independent variables and the dependent variable is dynamic and constantly changing, owing to the changing macroeconomic picture and the unpredictable behavior of consumers in response to their income and the overall economy. These variances were captured by each of the four independent variables in its own way. For pool BOAA 2005-1 2A1, we observed that in 2008 the foreclosure rate was only 0.07% of the total pool, but by 2012 it had reached 4.95%. In 2012, geography had the maximum explanatory power at 10.45%, while perfect payer had fallen from 49.94% in 2008 to 1.70%. An analysis of the data in Table 5-3 shows that from 2008 to 2011 the four factors had, to some degree, a significant relationship with the foreclosure rate, and that after 2011 other factors came into play that had an impact on the foreclosure rate.

The data in Table 5-4 (pool BOAA 2005 10 5A1) showed that the average foreclosure rate in the pool was less than 1% in every year between 2008 and 2013, and only 3.96% in 2014. Even with such a low foreclosure rate, all four independent variables have significant explanatory power at different times. For example, credit score had a significant R² of 47.62% in 2008, 46.43% in 2011 and 65.26% in 2012. Perfect payer had a significant R² of 40.07% in 2008, 46.09% in 2010, 52.35% in 2011 and 57.41% in 2014. Balance had a significant R² of 67.94% in 2009 and 44.26% in 2011. Geography had an R² of 65.68% in 2010 and 66.91% in 2012.
The year 2013 shows a zero foreclosure rate. This could be because some pools were subject to robo-signing scrutiny, in which it was alleged that Bank of America did not follow proper foreclosure procedure; Bank of America therefore stopped all foreclosure activity, leading to a different result.

Table 5-4. Statistical Analysis for pool BOAA 2005 10 5A1

Pool: BOAA 2005 10 5A1                                 2008     2009     2010     2011     2012     2013     2014
Dependent variable
  Foreclosure % average for the year (for reference)   0.43%    0.32%    0.50%    0.99%    0.89%    0.00%    3.96%
R² values
  Independent variable 1: Credit score                47.62%    3.69%   21.28%   46.43%   65.26%    9.33%    3.94%
  Independent variable 2: Perfect Payer %             40.07%   15.82%   46.09%   52.35%    0.08%  100.00%   57.41%
  Independent variable 3: Balance < 417                0.03%   67.94%   23.96%   44.26%    0.11%  100.00%    3.94%
  Independent variable 4: Geo < 50%                    1.02%   15.74%   65.68%   13.11%   66.91%  100.00%   22.39%
Regression analysis using 95% confidence interval

Table 5-5. Statistical Analysis for pool BCAP 2007-AA2 33A1

Pool: BCAP 2007-AA2 33A1                               2008     2009     2010     2011     2012     2013     2014
Dependent variable: Foreclosure %
  Foreclosure % average for the year (for reference)   0.00%    2.14%    3.03%    5.58%    7.55%   11.76%   12.25%
R² values
  Independent variable 1: Credit score                100.0%    64.3%     0.2%     0.0%     9.5%    77.1%     1.0%
  Independent variable 2: Perfect Payer %             100.0%    78.8%    24.1%    14.1%     6.8%    62.8%     6.6%
  Independent variable 3: Balance < 417k              100.0%    66.7%     0.6%    14.5%    57.7%    99.0%    19.7%
  Independent variable 4: Geo < 50% of the pool      100.00%   70.19%   30.78%   25.28%    0.82%   93.26%   10.52%
Regression analysis using 95% confidence interval
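Reading the result tables against the 25% benchmark can be done mechanically. The sketch below transcribes the values of Table 5-4 and lists, for each year, the variables whose R² exceeds the benchmark. Setting aside any year whose average foreclosure rate is 0.00% is our own assumption for the illustration: an R² computed against a dependent variable that does not vary is not informative, which may be why such years show entries of 100%. The function and variable names are hypothetical.

```python
THRESHOLD = 25.0  # R² benchmark used in the text, in percent

years = [2008, 2009, 2010, 2011, 2012, 2013, 2014]

# Values transcribed from Table 5-4 (pool BOAA 2005 10 5A1), in percent.
foreclosure_pct = [0.43, 0.32, 0.50, 0.99, 0.89, 0.00, 3.96]
r2_pct = {
    "Credit score":    [47.62, 3.69, 21.28, 46.43, 65.26, 9.33, 3.94],
    "Perfect Payer %": [40.07, 15.82, 46.09, 52.35, 0.08, 100.00, 57.41],
    "Balance < 417":   [0.03, 67.94, 23.96, 44.26, 0.11, 100.00, 3.94],
    "Geo < 50%":       [1.02, 15.74, 65.68, 13.11, 66.91, 100.00, 22.39],
}


def variables_above_threshold(year_index):
    """Variables whose R² exceeds the benchmark in the given year.

    Returns None when the year's average foreclosure rate is zero, since
    the year is set aside rather than interpreted (an assumption made here).
    """
    if foreclosure_pct[year_index] == 0.0:
        return None
    return [name for name, values in r2_pct.items()
            if values[year_index] > THRESHOLD]


flags = {year: variables_above_threshold(i) for i, year in enumerate(years)}

for year, names in flags.items():
    if names is None:
        print(f"{year}: set aside (average foreclosure rate is 0.00%)")
    else:
        print(f"{year}: {', '.join(names) if names else 'none above 25%'}")
```

The same dictionary layout could be filled with the values of any of the other pool tables to compare which variables clear the benchmark in which years.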