Under the Basel II Accord, financial institutions are required for the first time to determine capital requirements for a new class of risk – operational risk. Large, internationally active banks are required to estimate operational risk exposure using the Advanced Measurement Approach (AMA), which relies on advanced empirical models. As banks continue to develop and enhance their own AMA models for operational risk measurement, they are increasingly utilizing R to perform various modeling tasks.
In this presentation, Northern Trust will discuss the use of R in the loss distribution approach (LDA), the most widely used empirical approach for the measurement of operational risk. In Northern Trust’s experience, R offers unparalleled access to the distributions most relevant for modeling the frequency and severity of operational loss events. Additionally, Northern Trust utilizes R to perform large-scale Monte Carlo simulations within the context of the LDA. These simulations are computationally intensive, taking many hours and sometimes days to complete with the open-source distribution of R but much less time with Revolution R Enterprise.
2. Agenda
Basel II Overview
Operational Risk – Definition
Requirements of an Operational Risk Exposure Estimate
Loss Distribution Approach (LDA)
Segmenting Loss Data into Units of Measure
Literature on Frequency and Severity Modeling
Monte Carlo Simulation within an LDA based Operational Risk Exposure Model
Potential solutions for faster Monte Carlo Simulation
Description of the test environments utilized
Results from various methods of enhancement
3. Operational Risk Loss Events in the News
Barings Bank (1995) – $1.3 billion loss due to speculative trading performed by currency trader Nick Leeson. This loss ultimately led to the collapse of the bank.
Societe Generale (2008) – $7 billion loss based on the fraudulent activities of rogue futures trader Jerome Kerviel.
DBS Bank, Ltd. (2010) – $310 million penalty imposed by the Monetary Authority of Singapore due to a seven-hour system-wide outage that left customers unable to use mobile, internet, and ATM services. Additionally, customers were not able to make any debit or credit card transactions during the outage.
Citibank (2011) – $285 million settlement related to a failure to disclose to investors its role in the asset-selection process for a hybrid Collateralized Debt Obligation the bank offered.
Multiple Banks (2012) – $25 billion in settlements and penalties regarding five large lenders’ improper foreclosure practices between January 2008 and December 2011.
Note: Details for each of the events above were obtained from the Algorithmics FIRST database.
4. Basel II and Operational Risk
In December of 2007, the US Federal Reserve System finalized a document commonly referred to as the “Final Rules,” which set forth general requirements for the measurement of operational risk by large US financial institutions.1
These rules defined operational risk as the risk of loss resulting from inadequate or failed internal processes, people, and systems or from external events (including legal risk but excluding strategic and reputational risk).
Seven Distinct Basel Loss Event Types2:
1. Internal Fraud
2. External Fraud
3. Business Disruptions/System Failure
4. Execution, Delivery and Process Management
5. Damage to Physical Assets
6. Clients, Products, and Business Practice Matters
7. Employee Practices and Workplace Safety Issues
Other classifications are available to describe losses as defined by regulators and banks
E.g. – Business Lines, Regions of Operation, Cause of Loss Category, etc.
The Final Rules require banks to estimate an operational risk exposure amount that corresponds to the 99.9th percentile of the distribution of potential aggregate operational losses, as generated by the bank’s operational risk quantification system over a one-year horizon.
Exposure estimates must:
a) Incorporate four data elements: Internal Loss Data, External Loss Data, Scenario Analysis Data, and Business Environment/Internal Control Factor data
b) Be calculated using systematic, transparent, verifiable, and credible methodologies
c) Be calculated separately for each set of operational loss data demonstrating a statistically distinct loss profile
The banking industry has focused on the use of the Loss Distribution Approach (LDA) to calculate operational risk exposure estimates.
1. Risk-Based Capital Standards: Advanced Capital Adequacy Framework – Basel II; Final Rule (2007), Federal Register 72(235), 69407–69408. Also see Operational Risk – Supervisory Guidelines for the Advanced Measurement Approaches; BIS, June 2011.
2. See the Appendix for examples of each loss event type.
5. Overview of the Loss Distribution Approach (LDA)
Under the LDA, banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.
These datasets are referred to as units of measure, or UOMs
These datasets are used for subsequent modeling within the LDA
The LDA models two primary components of operational loss data:
Loss Frequency – the # of loss events per year. The banking industry has widely accepted a Poisson distribution as an appropriate distribution.
Loss Severity – the $ value of a loss event. Fitting a parametric distribution to operational loss data is one of the biggest challenges in measuring operational risk exposure.
Monte Carlo Simulation is then utilized to compound the two distributions.
A large number of simulations must be run to observe a sufficient number of losses to reasonably assess what a 1 in 1,000 year event might look like… More on this shortly
[Diagram: Segment loss data into homogeneous loss datasets → fit a Frequency Distribution (λ) and a Severity Distribution to internal and/or external loss data → Monte Carlo Simulation → Aggregate Loss Distribution. Operational Risk Exposure is estimated as the 99.9th percentile of the aggregate loss distribution; a 1/1,000 year event.*]
* Banks typically sum the VaR estimates for their UOMs and perform diversification modeling to move away from the assumption of positive correlation.
6. Segmenting Loss Data into Units of Measure
Banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.
Which data classifications captured in the bank’s operational loss database best characterize the bank’s operational risk exposure?
Banks often capture a variety of details about individual loss events – E.g. Region of occurrence, Business Line, Basel Event Type
How granular should classification be?
Once an appropriate set of classifying variables has been identified, a natural starting point to narrow in on homogeneous datasets is to look at loss frequency and loss severity within the identified variables
Example Data:

Loss Counts by Business Line and Event Type

                          CPBP    BDSF     IF    EPWS      EF     DPA    EDPM    Total
Commercial Banking          20      50      -      20     550      50       -      690
  in % of Total          2.90%   7.25%  0.00%   2.90%  79.71%   7.25%   0.00%  100.00%
Payment and Settlement      10      30     10      15     440     260      25      790
  in % of Total          1.27%   3.80%  1.27%   1.90%  55.70%  32.91%   3.16%  100.00%
Agency Services             80     130     10      30     850      10       5    1,115
  in % of Total          7.17%  11.66%  0.90%   2.69%  76.23%   0.90%   0.45%  100.00%
Other                        -       5      5      30      10       5       5       60
  in % of Total          0.00%   8.33%  8.33%  50.00%  16.67%   8.33%   8.33%  100.00%
Total                      110     215     25      95   1,850     325      35    2,655
  in % of Total          4.14%   8.10%  0.94%   3.58%  69.68%  12.24%   1.32%  100.00%

In this example, the bank has determined that 4 Business Lines and the 7 Basel Loss Event Types are a reasonable representation of operational risk exposure. Not every business line has a large number of loss events, and within a business line, not every Basel Event Type classification level has a large number of data points.

Loss Amounts by Business Line and Event Type (in $ MM)

                          CPBP     BDSF     IF    EPWS       EF     DPA     EDPM     Total
Commercial Banking       $3.00   $18.00  $1.00  $10.00   $90.00   $1.00      $ -   $123.00
  in % of Total          2.44%   14.63%  0.81%   8.13%   73.17%   0.81%    0.00%   100.00%
Payment and Settlement   $1.00    $7.00  $1.00   $4.00   $70.00  $25.00   $25.00   $133.00
  in % of Total          0.75%    5.26%  0.75%   3.01%   52.63%  18.80%   18.80%   100.00%
Agency Services          $6.00  $240.00  $3.00   $2.00  $225.00   $5.00  $150.00   $631.00
  in % of Total          0.95%   38.03%  0.48%   0.32%   35.66%   0.79%   23.77%   100.00%
Other                      $ -    $3.00  $1.00  $15.00    $1.00   $3.00   $10.00    $33.00
  in % of Total          0.00%    9.09%  3.03%  45.45%    3.03%   9.09%   30.30%   100.00%
Total                   $10.00  $268.00  $6.00  $31.00  $386.00  $34.00  $185.00   $920.00
  in % of Total          1.09%   29.13%  0.65%   3.37%   41.96%   3.70%   20.11%   100.00%

By Basel Loss Event Type: BDSF and EF look distinct. By Business Line: all 4 Business Lines might be distinct. Additional testing is required to identify homogeneous datasets.
Note: The data above are fictional, created for this example.
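Cross-tabs like the ones above can be produced directly in R from an event-level loss database. A minimal sketch, assuming a data frame lossData with columns BusinessLine and EventType (illustrative names; the records below are simulated):

# Build the counts cross-tab from event-level loss data
set.seed(1)
lossData <- data.frame(
  BusinessLine = sample(c("Commercial Banking", "Payment and Settlement",
                          "Agency Services", "Other"), 2655, replace = TRUE),
  EventType    = sample(c("CPBP", "BDSF", "IF", "EPWS", "EF", "DPA", "EDPM"),
                        2655, replace = TRUE)
)

counts <- xtabs(~ BusinessLine + EventType, data = lossData)
addmargins(counts)                               # row and column totals
round(100 * prop.table(counts, margin = 1), 2)   # % of total within each line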
7. Using R to Identify a Homogenous Loss Dataset
R can produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether loss data should be merged (homogeneous) or separated (heterogeneous).
Example: Is the Business Disruptions & Systems Failure Loss Event Type statistically distinct from the Commercial Banking Business Line?
Quantiles
Datasets Count Mean(Log) SD(Log) 50.0% 75.0% 90.0% 95.0% 98.0% 99.0% 99.5% Max
Commercial Banking 640 9.40 1.08 $ 7.89 $ 16.96 $ 65.39 $ 120.79 $ 281.52 $ 423.09 $ 777.20 $ 3,376.89
Business Disruptions & Systems Failure 215 11.55 2.08 $ 74.94 $ 424.44 $ 1,520.39 $ 5,904.35 $ 12,743.90 $ 17,255.26 $ 19,750.42 $ 19,954.35
- Quantiles in $ Thousands
Test                 Statistic   p-Value
Kolmogorov-Smirnov   0.53        0
Chi-Square           270.57      3.36674E-50
Anderson-Darling     151.16      0
Conclusion:
The preponderance of evidence suggests that Commercial Banking and BDSF are statistically distinct.
A separate risk exposure estimate should be calculated for each of these datasets.
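A minimal sketch of how these tests can be run in R, assuming cb and bdsf hold the loss amounts for the two datasets (the vectors below are simulated stand-ins; the k-sample Anderson-Darling test shown comes from the kSamples package, one of several packages that provide it):

# Simulated stand-ins for the two loss datasets
set.seed(1)
cb   <- rlnorm(640, meanlog = 9.40, sdlog = 1.08)   # Commercial Banking
bdsf <- rlnorm(215, meanlog = 11.55, sdlog = 2.08)  # BDSF

ks.test(cb, bdsf)   # two-sample Kolmogorov-Smirnov (base stats)

# Chi-square test on counts binned at the pooled deciles
breaks <- quantile(c(cb, bdsf), probs = seq(0, 1, 0.1))
tab <- rbind(table(cut(cb,   breaks, include.lowest = TRUE)),
             table(cut(bdsf, breaks, include.lowest = TRUE)))
chisq.test(tab)

library(kSamples)   # assumed package for the k-sample Anderson-Darling test
ad.test(cb, bdsf)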
8. Using R to Identify Homogenous Loss Datasets
R offers the capability to produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether loss data should be merged (homogeneous) or separated (heterogeneous).
Example: Is Other Business Line statistically distinct from the Commercial Banking Business Line?
Quantiles
Datasets Count Mean(Log) SD(Log) 50.0% 75.0% 90.0% 95.0% 98.0% 99.0% 99.5% Max
Commercial Banking 640 9.40 1.08 $ 7.81 $ 17.61 $ 51.60 $ 106.22 $ 390.62 $ 617.59 $ 1,162.25 $ 3,142.99
Other 55 9.32 1.02 $ 7.70 $ 14.58 $ 40.38 $ 91.85 $ 138.79 $ 434.62 $ 607.40 $ 780.18
- Quantiles in $ Thousands
Test                 Statistic   p-Value
Kolmogorov-Smirnov   0.061       0.992
Chi-Square           6.580       0.884
Anderson-Darling     -0.998      0.627
Conclusion:
The preponderance of evidence suggests that we cannot conclude the Commercial Banking and ‘Other’ Business Lines are statistically distinct.
These data can be aggregated into a single dataset for frequency and severity modeling.
If a business rationale exists to keep these datasets separate, banks may do so.
9. Frequency Distribution Fitting in R
Fitting a Frequency Distribution:
The banking industry has focused on the use of a Poisson distribution to model the frequency of operational loss events.
The Poisson is parameterized by one parameter, λ, which is equivalent to the average frequency over the time horizon being estimated (1 year).
Various methods are used to parameterize the Poisson distribution:
Simple Annual Average
Regression Analysis based on internal/external variables – see the function lm() in R
Poisson Regression based on internal/external variables – see the function glm() in R
Bank-identified internal and external data characteristics might help explain operational loss frequency.
Once a parameter estimate, λ, has been identified, obtaining the density, distribution function, quantile function, and random generation for the Poisson distribution is quite easy: see dpois, ppois, qpois, and rpois in R for more details.

Commercial Banking Loss Counts
Year      Loss Counts
2005      76
2006      82
2007      94
2008      64
2009      90
2010      103
2011      96
2012      85
Total     690
Average   86.25
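A minimal sketch using the loss counts above; the Poisson regression call is illustrative only:

# Parameterize the Poisson with the simple annual average
counts <- c(76, 82, 94, 64, 90, 103, 96, 85)   # 2005-2012 loss counts
lambda <- mean(counts)                         # 86.25

dpois(90, lambda)      # density: probability of exactly 90 losses in a year
ppois(100, lambda)     # distribution function: probability of <= 100 losses
qpois(0.999, lambda)   # quantile function: 99.9th percentile annual count
rpois(5, lambda)       # random generation: 5 simulated annual loss counts

# Poisson regression alternative, e.g. on a simple time trend
year <- 2005:2012
glm(counts ~ year, family = poisson)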
10. Fitting a Severity Distribution
Fitting a Severity Distribution:
Many great authors have published overviews on the process for severity distribution fitting within the context of an LDA model.*
The industry currently practices a variety of loss severity modeling techniques:
Fitting a single parametric distribution to the entire dataset (e.g., lognormal, Pareto, loggamma, Weibull, etc.)
Fitting a mixture of parametric distributions to the loss severity data
Fitting multiple parametric distributions that have non-overlapping ranges (“Splicing”)
Extreme Value Theory (EVT) and the Peaks Over Thresholds method
Challenges associated with fitting a severity distribution include:
1. The Final Rule asks banks to estimate a 1 in 1,000 year event based on fewer than 15 years of operational loss data
2. Data collection thresholds – use of shifted distributions or truncated distributions?
3. Operational loss databases are often “living” – loss severities, loss data classifications, and risk types can be modified
4. Data Paucity – in many cases banks have units of measure that have a small number of observations (< 1,000)
5. Undetected Heterogeneity of Datasets – tests performed to identify heterogeneous datasets are not perfect at doing so; small datasets can impede this effort
6. Fat-Tailed Data – banks are faced with UOMs that have a small number of observations which are often best described by a heavily skewed distribution
Limited data in the tail can result in volatile capital estimates (e.g., capital can swing upwards or downwards by hundreds of millions of dollars) based on the inclusion of a few events.
Volatile results can present subsequent challenges for obtaining senior management buy-in on risk exposure estimates.
* Please see the references slide at the end of this presentation for a short list of books and papers that provide additional detail on operational risk modeling.
11. Fitting a Severity Distribution in R
Fitting a Severity Distribution:
A variety of optimization routines exist in R that are capable of fitting a severity distribution to loss data.
Using the optim() function in R, one needs to specify:
1. An objective function built from the density function (the negative log-likelihood): -sum(densityFunction(x = data, log = TRUE))
2. Starting Parameters: contingent upon the distribution being fit
3. An Optimization Routine: Nelder-Mead, BFGS, SANN, etc.
See B. Bolker for more on optimization routines in R beyond the optim() function.
Fitting truncated severity distributions:
The actuar package provides density, distribution, and quantile functions as well as random number generators for fat-tailed distributions.
See Nadarajah and Kotz for code that will facilitate the fitting of a truncated density, distribution, quantile function, and random number generator.
Identifying a “best-fit” severity distribution to the loss data:
QQ-plot of the empirical data against the fitted distributions – plot(), qqplot()
Plot the empirical CDF against the fitted distribution – ecdf()
See the truncgof R package and A. Chernobai, S. T. Rachev, F. J. Fabozzi for goodness-of-fit tests and some adjusted exploratory tools that work with left-truncated data.
Many packages exist that perform EVT severity distribution fitting:
See A. J. McNeil, R. Frey, P. Embrechts and the evir package in R.
Fitting and evaluating mixture distributions are more complex endeavors…
See the GAMLSS package in R and http://www.gamlss.org/
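As a minimal sketch of the optim() recipe above, the following fits a lognormal severity distribution by maximum likelihood; the loss amounts are simulated stand-ins:

# Objective: negative log-likelihood built from the density function
negLogLik <- function(par, x) {
  -sum(dlnorm(x, meanlog = par[1], sdlog = par[2], log = TRUE))
}

set.seed(1)
losses <- rlnorm(500, meanlog = 9.4, sdlog = 1.1)   # simulated loss amounts

fit <- optim(par = c(mean(log(losses)), sd(log(losses))),   # starting values
             fn = negLogLik, x = losses,
             method = "Nelder-Mead")                        # routine choice
fit$par           # fitted meanlog and sdlog
fit$convergence   # 0 indicates successful convergence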
12. Overview of the Loss Distribution Approach (LDA)
Thus far we have discussed:
Segmentation of loss data to obtain datasets that are not demonstrably heterogeneous
Loss Frequency Modeling
Loss Severity Modeling
We have not yet discussed Monte Carlo Simulation…
Many simulations containing millions of iterations must be run to observe a sufficient number of losses to reasonably assess what a 1 in 1,000 year event might look like.
This results in multiple days being lost waiting on code to complete.
Northern Trust explored opportunities to parallelize Monte Carlo simulation with Revolution Analytics.
[Diagram: the same LDA flow as before – segment loss data into homogeneous loss datasets, fit a Frequency Distribution (λ, # of loss events per year) and a Severity Distribution ($ value of loss event) to internal and/or external loss data, then compound them via Monte Carlo Simulation into the Aggregate Loss Distribution; Operational Risk Exposure is estimated as the 99.9th percentile, a 1/1,000 year event.]
Example Code:
# Randomly draw n frequency observations from a Poisson distribution,
then draw random severities from the specified truncated severity Severity
distribution, truncated at point a. Sum up each of the individual loss Distribution Operational Risk
amounts. Exposure is estimated as
f_tr <- function() { the 99.9th percentile of
sum(do.call("rtrunc", c(n=rpois(1, lambda), the aggregate loss
$ value of loss event distribution; a 1/1,000
spec=distName, a=a, parList)))
} year event
# Simulate a large number of iterations and replicate the simulation a
number of times to reduce sample noise
simuMatrix <- replicate(30, replicate(1e+6, f_tr()))
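A hedged sketch of how the simulation above might be parallelized with the open-source doSNOW backend (one of the packages benchmarked on the next slide); it assumes f_tr(), lambda, distName, a, and parList are defined as above, with rtrunc() supplied by the truncdist package:

library(foreach)
library(doSNOW)

cl <- makeCluster(4)   # e.g., one worker per core on a 4-core laptop
registerDoSNOW(cl)

# Split the 30 replications across the workers; cbind the resulting columns
simuMatrix <- foreach(i = 1:30, .combine = cbind,
                      .packages = "truncdist",
                      .export = c("f_tr", "lambda", "distName",
                                  "a", "parList")) %dopar% {
  replicate(1e+6, f_tr())
}

stopCluster(cl)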
13. Monte Carlo Simulation Benchmarking Analysis
Northern Trust and Revolution Analytics Evaluate Various Methods to Enhance Monte Carlo Simulation
Use a different version of R: 32-bit, 64-bit (e.g., update your operating system)
Use various parallelization packages: doSNOW, doRSR, & doSMP (doRSR & doSMP are Revolution Analytics product offerings)
Use multiple processors and/or machines:
Single node with multiple cores
Cluster of CPUs with multiple cores
Hardware Environments:
4-core laptop
3-node High Performance Cluster (HPC) on the Amazon Cloud
Configured and run with 8 cores on each node
Each node was restricted from 16 to 8 cores
Metrics used to evaluate each method:
Elapsed Time by Step
Memory usage
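For the elapsed-time metric, a simple sketch of the measurement approach: wrap each step in system.time(). Here f_demo is an illustrative stand-in for the simulation function, with a reduced iteration count so it runs quickly:

# Time a reduced-size run of the simulation step
lambda <- 86.25
f_demo <- function() sum(rlnorm(rpois(1, lambda), meanlog = 9.4, sdlog = 1.1))
system.time(replicate(1e+4, f_demo()))["elapsed"]   # wall-clock seconds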
14. Monte Carlo Benchmarking Highlights
Revolution Analytics’ parallelization can be easily scaled up from the laptop/server to the cluster using Revolution Analytics’ distributed computing capabilities.
Parallelization greatly improves simulation performance:
Elapsed time is linear in the # of iterations
Performance improves with the # of cores
64-bit is marginally better than 32-bit
Revo ~ CRAN within a node (no MKL impact in this study)
doRSR slightly better than doSMP on a single server
Performance scales with cluster resources
Memory use is driven primarily by the # of iterations:
doRSR ~ doSMP within a node
Memory trends scale with the # of cores
15. Take-Aways, Next Steps, and Contacts
Parallelization Offers Business Enhancements:
Less time spent waiting on programs to complete
Means more time to analyze drivers of change (e.g. – underlying data changes)
More efficient management of computing resources
No need to manually manage/schedule programs
Scalability of the solution to available resources
Revolution Analytics’ parallelization routines are scalable to the resources available
Contact Information:
Dave Humke, Northern Trust, Vice President, (dh98@ntrs.com)
Derek Norton, Revolution Analytics, (derek.norton@revolutionanalytics.com)
16. Appendix – Basel Loss Event Type Definitions
For each Level 1 event type category, the definition is given, followed by its Level 2 categories and their Level 3 activity examples.

Internal Fraud
Definition: Losses due to acts of a type intended to defraud, misappropriate property or circumvent regulations, the law or company policy, excluding diversity/discrimination events, which involves at least one internal party.
Unauthorized Activity: Transactions not reported (intentional); Transaction type unauthorized (with monetary loss); Mismarking of position (intentional)
Theft and Fraud: Fraud / credit fraud / worthless deposits; Theft / extortion / embezzlement / robbery; Misappropriation of assets; Forgery; Check kiting; Smuggling; Account take-over / impersonation, etc.; Tax non-compliance / evasion (willful); Bribes / kickbacks; Insider trading (not on firm's account)

External Fraud
Definition: Losses due to acts of a type intended to defraud, misappropriate property or circumvent the law, by a third party.
Theft and Fraud: Theft / robbery; Forgery; Check kiting
Systems Security: Hacking damage; Theft of information (with monetary loss)

Employment Practices and Workplace Safety
Definition: Losses arising from acts inconsistent with employment, health or safety laws or agreements, from payment of personal injury claims, or from diversity / discrimination events.
Employee Relations: Compensation, benefit, termination issues; Organized labor activities
Safe Environment: General liability (slips and falls, etc.); Employee health & safety rules and events; Workers compensation
Diversity & Discrimination: All discrimination types
17. Appendix – Basel Loss Event Type Definitions (Continued)
Clients, Products & Business Practices
Definition: Losses arising from an unintentional or negligent failure to meet a professional obligation to specific clients (including fiduciary and suitability requirements), or from the nature or design of a product.
Suitability, Disclosure & Fiduciary: Fiduciary breaches / guideline violations; Suitability / disclosure issues (KYC, etc.); Retail consumer disclosure violations; Breach of privacy; Aggressive sales; Account churning; Misuse of confidential information; Lender liability
Improper Business or Market Practices: Antitrust; Improper trade / market practice; Market manipulation; Insider trading (on firm's account); Unlicensed activity; Money laundering
Product Flaws: Product defects (unauthorized, etc.); Model errors
Selection, Sponsorship & Exposure: Failure to investigate client per guidelines; Exceeding client exposure limits
Advisory Activities: Disputes over performance of advisory activities

Damage to Physical Assets
Definition: Losses arising from loss or damage to physical assets from natural disaster or other events.
Disasters and Other Events: Natural disaster losses; Human losses from external sources (terrorism, vandalism)

Business Disruption & Systems Failures
Definition: Losses arising from disruption of business or system failures.
Systems: Hardware; Software; Telecommunications; Utility outage / disruptions
18. Appendix – Basel Loss Event Type Definitions (Continued)
Execution, Delivery & Process Management
Definition: Losses from failed transaction processing or process management, from relations with trade counterparties and vendors.
Transaction Capture, Execution & Maintenance: Miscommunication; Data entry, maintenance or loading error; Missed deadline or responsibility; Model / system misoperation; Accounting error / entity attribution error; Other task misperformance; Delivery failure; Collateral management failure; Reference data maintenance
Monitoring & Reporting: Failed mandatory reporting obligation; Inaccurate external report (loss incurred)
Customer Intake & Documentation: Client permissions / disclaimers missed; Legal documents missing / incomplete
Customer / Client Account Management: Unapproved access given to accounts; Incorrect client records (loss incurred); Negligent loss or damage of client assets
Trade Counterparties: Non-client counterparty misperformance; Misc. non-client counterparty disputes
Vendors & Suppliers: Outsourcing; Vendor disputes
19. Appendix - References
References on Loss Distribution Approach Modeling, Frequency and Severity Fitting, and Monte Carlo Simulation:
1. A. Chernobai, S. T. Rachev, F. J. Fabozzi (2005), Composite Goodness-of-Fit Tests for Left-Truncated Samples, Technical report,
University of California Santa Barbara
2. A. J. McNeil, R. Frey, P. Embrechts (2005), Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University
Press, Princeton
3. B. Bolker (2007), Optimization and All That, Draft of Chapter 7 of B. Bolker (2008), Ecological Models and Data in R, Princeton
University Press, Princeton
4. G.J. McLachlan, D. Peel (2000), Finite Mixture Models, Wiley & Sons, New York
5. H. Panjer (2006), Operational Risk: Modeling Analytics, Wiley & Sons, New York, p. 293.
6. K. Dutta, J. Perry (2006), A Tale of Tails: An Empirical Analysis of Loss Distribution Models for Estimating Operational Risk Capital,
Working Paper No. 06-13, Federal Reserve Bank of Boston.
7. M. Moscadelli (2004), The Modelling of Operational Risk: Experience with the Analysis of the Data Collected by the Basel Committee,
Temi di Discussione No. 517, Banca d’Italia.
8. P. de Fontnouvelle, E. Rosengren, J. Jordan (2007), Implications of Alternative Operational Risk Modeling Techniques, In: M. Carey
and R.M. Stulz (eds), The Risks of Financial Institutions, University of Chicago Press, pp. 475-512.
9. S.A. Klugman, H.H. Panjer, G.E. Willmot (2008), Loss Models: From Data to Decisions, 3rd ed., Wiley & Sons, Hoboken, NJ
10. S. Nadarajah, S. Kotz (2006), R Programs for Computing Truncated Distributions, Journal of Statistical Software 16(2)