Under the Basel II Accord, financial institutions are required for the first time to determine capital requirements for a new class of risk – operational risk. Large, internationally active banks are required to estimate operational risk exposure using the Advanced Measurement Approach (AMA), which relies on advanced empirical models. As banks continue to develop and enhance their own AMA models for operational risk measurement, they are increasingly utilizing R to perform various modeling tasks.
In this presentation, Northern Trust will discuss the use of R in the loss distribution approach (LDA), the most widely used empirical approach for the measurement of operational risk. In Northern Trust’s experience, R offers unparalleled access to the distributions most relevant for modeling the frequency and severity of operational loss events. Additionally, Northern Trust utilizes R to perform large-scale Monte Carlo simulations within the context of the LDA. These simulations are computationally intensive, taking many hours and sometimes days to complete with the open-source distribution of R but much less time with Revolution R Enterprise.
2. Agenda
Basel II Overview
Operational Risk – Definition
Requirements of an Operational Risk Exposure Estimate
Loss Distribution Approach (LDA)
Segmenting Loss Data into Units of Measure
Literature on Frequency and Severity Modeling
Monte Carlo Simulation within an LDA based Operational Risk Exposure Model
Potential solutions for faster Monte Carlo Simulation
Description of the test environments utilized
Results from various methods of enhancement
3. Operational Risk Loss Events in the News
Barings Bank (1995) – $1.3 billion loss due to speculative trading performed by currency trader Nick Leeson. This loss ultimately led to the collapse of the bank.
Societe Generale (2008) – $7 billion loss based on the fraudulent activities of rogue futures trader Jerome Kerviel.
DBS Bank, Ltd. (2010) – $310 million penalty imposed by the Monetary Authority of Singapore due to a seven-hour system-wide outage that left customers unable to use mobile, internet, and ATM services. Additionally, customers were not able to make any debit or credit card transactions during the outage.
Citibank (2011) – $285 million settlement related to a failure to disclose to investors its role in the asset-selection process for a hybrid Collateralized Debt Obligation the bank offered.
Multiple Banks (2012) – $25 billion in settlements and penalties regarding five large lenders’ improper foreclosure practices between January 2008 and December 2011.
Note: Details for each of the events above were obtained from the Algorithmics FIRST database.
4. Basel II and Operational Risk
In December of 2007, the US Federal Reserve System finalized a document commonly referred to as the “Final Rules,” which set forth general requirements for the measurement of operational risk by large US financial institutions.1
These rules defined operational risk as the risk of loss resulting from inadequate or failed internal processes, people, and systems or from external events (including legal risk but excluding strategic and reputational risk).
Seven Distinct Basel Loss Event Types2:
1. Internal Fraud
2. External Fraud
3. Business Disruptions/System Failure
4. Execution, Delivery and Process Management
5. Damage to Physical Assets
6. Clients, Products, and Business Practice Matters
7. Employee Practices and Workplace Safety Issues
Other classifications are available to describe losses as defined by regulators and banks
E.g. – Business Lines, Regions of Operation, Cause of Loss Category, etc.
The Final Rules require banks to estimate an operational risk exposure amount that corresponds to the 99.9th percentile of the distribution of potential aggregate operational losses, as generated by the bank’s operational risk quantification system over a one-year horizon.
Exposure estimates must:
a) Incorporate four data elements: Internal Loss Data, External Loss Data, Scenario Analysis Data, and Business Environment/Internal Control Factor data
b) Be calculated using systematic, transparent, verifiable, and credible methodologies
c) Be calculated separately for each set of operational loss data demonstrating a statistically distinct loss profile
The banking industry has focused on the use of the Loss Distribution Approach (LDA) to calculate operational risk exposure estimates.
1. Risk-Based Capital Standards: Advanced Capital Adequacy Framework – Basel II; Final Rule (2007), Federal Register 72(235), 69407–69408. Also see Operational Risk – Supervisory Guidelines for the Advanced Measurement Approaches; BIS, June 2011.
2. See the Appendix for examples of each loss event type.
5. Overview of the Loss Distribution Approach (LDA)
Under the LDA, banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.
These datasets are referred to as units of measure, or UOMs
These datasets are used for subsequent modeling within the LDA
The LDA models two primary components of operational loss data:
Loss Frequency – the # of loss events per year. The banking industry has widely accepted a Poisson distribution as an appropriate distribution.
Loss Severity – the $ value of a loss event. Fitting a parametric distribution to operational loss data is one of the biggest challenges in measuring operational risk exposure.
Monte Carlo Simulation is then utilized to compound the two distributions.
A large number of simulations must be run to observe a sufficient number of losses to reasonably assess what a 1 in 1,000 year event might look like… More on this shortly
[Diagram: Segment loss data into homogeneous loss datasets → fit a Frequency Distribution (λ) and a Severity Distribution to internal and/or external loss data → Monte Carlo Simulation → Aggregate Loss Distribution. Operational Risk Exposure is estimated as the 99.9th percentile of the aggregate loss distribution; a 1/1,000 year event.*]
* Banks typically sum the VaR estimates for their UOMs and perform diversification modeling to move away from the assumption of positive correlation.
6. Segmenting Loss Data into Units of Measure
Banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.
Which data classifications captured in the bank’s operational loss database best characterize the bank’s operational risk exposure?
Banks often capture a variety of details about individual loss events – E.g. Region of occurrence, Business Line, Basel Event Type
How granular should classification be?
Once an appropriate set of classifying variables has been identified, a natural starting point to narrow in on homogeneous datasets is to look at loss frequency and loss severity within the identified variables
Example Data:

Loss Counts by Business Line and Event Type

                          CPBP    BDSF     IF    EPWS      EF     DPA    EDPM    Total
Commercial Banking          20      50      -      20     550      50       -      690
  in % of Total          2.90%   7.25%  0.00%   2.90%  79.71%   7.25%   0.00%  100.00%
Payment and Settlement      10      30     10      15     440     260      25      790
  in % of Total          1.27%   3.80%  1.27%   1.90%  55.70%  32.91%   3.16%  100.00%
Agency Services             80     130     10      30     850      10       5    1,115
  in % of Total          7.17%  11.66%  0.90%   2.69%  76.23%   0.90%   0.45%  100.00%
Other                        -       5      5      30      10       5       5       60
  in % of Total          0.00%   8.33%  8.33%  50.00%  16.67%   8.33%   8.33%  100.00%
Total                      110     215     25      95   1,850     325      35    2,655
  in % of Total          4.14%   8.10%  0.94%   3.58%  69.68%  12.24%   1.32%  100.00%

In this example, the bank has determined that 4 Business Lines and the 7 Basel Loss Event Types are a reasonable representation of operational risk exposure. Not every business line has a large number of loss events, and within a business line, not every Basel Event Type classification level has a large number of data points.

Loss Amounts by Business Line and Event Type (in $ MM)

                          CPBP     BDSF     IF    EPWS       EF     DPA     EDPM     Total
Commercial Banking       $3.00   $18.00  $1.00  $10.00   $90.00   $1.00      $ -   $123.00
  in % of Total          2.44%   14.63%  0.81%   8.13%   73.17%   0.81%    0.00%   100.00%
Payment and Settlement   $1.00    $7.00  $1.00   $4.00   $70.00  $25.00   $25.00   $133.00
  in % of Total          0.75%    5.26%  0.75%   3.01%   52.63%  18.80%   18.80%   100.00%
Agency Services          $6.00  $240.00  $3.00   $2.00  $225.00   $5.00  $150.00   $631.00
  in % of Total          0.95%   38.03%  0.48%   0.32%   35.66%   0.79%   23.77%   100.00%
Other                      $ -    $3.00  $1.00  $15.00    $1.00   $3.00   $10.00    $33.00
  in % of Total          0.00%    9.09%  3.03%  45.45%    3.03%   9.09%   30.30%   100.00%
Total                   $10.00  $268.00  $6.00  $31.00  $386.00  $34.00  $185.00   $920.00
  in % of Total          1.09%   29.13%  0.65%   3.37%   41.96%   3.70%   20.11%   100.00%

By Basel Loss Event Type: BDSF and EF look distinct. By Business Line: all 4 Business Lines might be distinct. Additional testing is required to identify homogeneous datasets.
Note: The data above are fictional, created for this example.
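Cross-tabs like the ones above can be produced directly in R from an event-level loss database. A minimal sketch, assuming a data frame lossData with columns BusinessLine and EventType (illustrative names; the records below are simulated):

# Build the counts cross-tab from event-level loss data
set.seed(1)
lossData <- data.frame(
  BusinessLine = sample(c("Commercial Banking", "Payment and Settlement",
                          "Agency Services", "Other"), 2655, replace = TRUE),
  EventType    = sample(c("CPBP", "BDSF", "IF", "EPWS", "EF", "DPA", "EDPM"),
                        2655, replace = TRUE)
)

counts <- xtabs(~ BusinessLine + EventType, data = lossData)
addmargins(counts)                               # row and column totals
round(100 * prop.table(counts, margin = 1), 2)   # % of total within each line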
7. Using R to Identify a Homogenous Loss Dataset
R can produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether loss data should be merged (homogeneous) or separated (heterogeneous).
Example: Is the Business Disruptions & Systems Failure Loss Event Type statistically distinct from the Commercial Banking Business Line?
Quantiles
Datasets Count Mean(Log) SD(Log) 50.0% 75.0% 90.0% 95.0% 98.0% 99.0% 99.5% Max
Commercial Banking 640 9.40 1.08 $ 7.89 $ 16.96 $ 65.39 $ 120.79 $ 281.52 $ 423.09 $ 777.20 $ 3,376.89
Business Disruptions & Systems Failure 215 11.55 2.08 $ 74.94 $ 424.44 $ 1,520.39 $ 5,904.35 $ 12,743.90 $ 17,255.26 $ 19,750.42 $ 19,954.35
- Quantiles in $ Thousands
Test                 Statistic   p-Value
Kolmogorov-Smirnov   0.53        0
Chi-Square           270.57      3.36674E-50
Anderson-Darling     151.16      0
Conclusion:
The preponderance of evidence suggests that Commercial Banking and BDSF are statistically distinct.
A separate risk exposure estimate should be calculated for each of these datasets.
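A minimal sketch of how these tests can be run in R, assuming cb and bdsf hold the loss amounts for the two datasets (the vectors below are simulated stand-ins; the k-sample Anderson-Darling test shown comes from the kSamples package, one of several packages that provide it):

# Simulated stand-ins for the two loss datasets
set.seed(1)
cb   <- rlnorm(640, meanlog = 9.40, sdlog = 1.08)   # Commercial Banking
bdsf <- rlnorm(215, meanlog = 11.55, sdlog = 2.08)  # BDSF

ks.test(cb, bdsf)   # two-sample Kolmogorov-Smirnov (base stats)

# Chi-square test on counts binned at the pooled deciles
breaks <- quantile(c(cb, bdsf), probs = seq(0, 1, 0.1))
tab <- rbind(table(cut(cb,   breaks, include.lowest = TRUE)),
             table(cut(bdsf, breaks, include.lowest = TRUE)))
chisq.test(tab)

library(kSamples)   # assumed package for the k-sample Anderson-Darling test
ad.test(cb, bdsf)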
8. Using R to Identify Homogenous Loss Datasets
R offers the capability to produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether loss data should be merged (homogeneous) or separated (heterogeneous).
Example: Is Other Business Line statistically distinct from the Commercial Banking Business Line?
Quantiles
Datasets Count Mean(Log) SD(Log) 50.0% 75.0% 90.0% 95.0% 98.0% 99.0% 99.5% Max
Commercial Banking 640 9.40 1.08 $ 7.81 $ 17.61 $ 51.60 $ 106.22 $ 390.62 $ 617.59 $ 1,162.25 $ 3,142.99
Other 55 9.32 1.02 $ 7.70 $ 14.58 $ 40.38 $ 91.85 $ 138.79 $ 434.62 $ 607.40 $ 780.18
- Quantiles in $ Thousands
Test                 Statistic   p-Value
Kolmogorov-Smirnov   0.061       0.992
Chi-Square           6.580       0.884
Anderson-Darling     -0.998      0.627
Conclusion:
The preponderance of evidence suggests that we cannot conclude the Commercial Banking and ‘Other’ Business Lines are statistically distinct.
These data can be aggregated into a single dataset for frequency and severity modeling.
If a business rationale exists to keep these datasets separate, banks may do so.
9. Frequency Distribution Fitting in R
Fitting a Frequency Distribution:
The banking industry has focused on the use of a Poisson distribution to model the frequency of operational loss events.
The Poisson is parameterized by one parameter, λ, which is equivalent to the average frequency over the time horizon being estimated (1 year).
Various methods are used to parameterize the Poisson distribution:
Simple Annual Average
Regression Analysis based on internal/external variables – see the function lm() in R
Poisson Regression based on internal/external variables – see the function glm() in R
Bank-identified internal and external data characteristics might help explain operational loss frequency.
Once a parameter estimate, λ, has been identified, obtaining the density, distribution function, quantile function, and random generation for the Poisson distribution is quite easy: see dpois, ppois, qpois, and rpois in R for more details.

Commercial Banking Loss Counts
Year      Loss Counts
2005      76
2006      82
2007      94
2008      64
2009      90
2010      103
2011      96
2012      85
Total     690
Average   86.25
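A minimal sketch using the loss counts above; the Poisson regression call is illustrative only:

# Parameterize the Poisson with the simple annual average
counts <- c(76, 82, 94, 64, 90, 103, 96, 85)   # 2005-2012 loss counts
lambda <- mean(counts)                         # 86.25

dpois(90, lambda)      # density: probability of exactly 90 losses in a year
ppois(100, lambda)     # distribution function: probability of <= 100 losses
qpois(0.999, lambda)   # quantile function: 99.9th percentile annual count
rpois(5, lambda)       # random generation: 5 simulated annual loss counts

# Poisson regression alternative, e.g. on a simple time trend
year <- 2005:2012
glm(counts ~ year, family = poisson)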
10. Fitting a Severity Distribution
Fitting a Severity Distribution:
Many great authors have published overviews on the process for severity distribution fitting within the context of an LDA model.*
The industry currently practices a variety of loss severity modeling techniques:
Fitting a single parametric distribution to the entire dataset (e.g., lognormal, Pareto, loggamma, Weibull, etc.)
Fitting a mixture of parametric distributions to the loss severity data
Fitting multiple parametric distributions that have non-overlapping ranges (“Splicing”)
Extreme Value Theory (EVT) and the Peaks Over Thresholds method
Challenges associated with fitting a severity distribution include:
1. The Final Rule asks banks to estimate a 1 in 1,000 year event based on fewer than 15 years of operational loss data
2. Data collection thresholds – use of shifted distributions or truncated distributions?
3. Operational loss databases are often “living” – loss severities, loss data classifications, and risk types can be modified
4. Data Paucity – in many cases banks have units of measure that have a small number of observations (< 1,000)
5. Undetected Heterogeneity of Datasets – tests performed to identify heterogeneous datasets are not perfect at doing so; small datasets can impede this effort
6. Fat-Tailed Data – banks are faced with UOMs that have a small number of observations which are often best described by a heavily skewed distribution
Limited data in the tail can result in volatile capital estimates (e.g., capital can swing upwards or downwards by hundreds of millions of dollars) based on the inclusion of a few events.
Volatile results can present subsequent challenges for obtaining senior management buy-in on risk exposure estimates.
* Please see the references slide at the end of this presentation for a short list of books and papers that provide additional detail on operational risk modeling.
11. Fitting a Severity Distribution in R
Fitting a Severity Distribution:
A variety of optimization routines exist in R that are capable of fitting a severity distribution to loss data.
Using the optim() function in R, one needs to specify:
1. An objective function built from the density function (the negative log-likelihood): -sum(densityFunction(x = data, log = TRUE))
2. Starting Parameters: contingent upon the distribution being fit
3. An Optimization Routine: Nelder-Mead, BFGS, SANN, etc.
See B. Bolker for more on optimization routines in R beyond the optim() function.
Fitting truncated severity distributions:
The actuar package provides density, distribution, and quantile functions as well as random number generators for fat-tailed distributions.
See Nadarajah and Kotz for code that will facilitate the fitting of a truncated density, distribution, quantile function, and random number generator.
Identifying a “best-fit” severity distribution to the loss data:
QQ-plot of the empirical data against the fitted distributions – plot(), qqplot()
Plot the empirical CDF against the fitted distribution – ecdf()
See the truncgof R package and A. Chernobai, S. T. Rachev, F. J. Fabozzi for goodness-of-fit tests and some adjusted exploratory tools that work with left-truncated data.
Many packages exist that perform EVT severity distribution fitting:
See A. J. McNeil, R. Frey, P. Embrechts and the evir package in R.
Fitting and evaluating mixture distributions are more complex endeavors…
See the GAMLSS package in R and http://www.gamlss.org/
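As a minimal sketch of the optim() recipe above, the following fits a lognormal severity distribution by maximum likelihood; the loss amounts are simulated stand-ins:

# Objective: negative log-likelihood built from the density function
negLogLik <- function(par, x) {
  -sum(dlnorm(x, meanlog = par[1], sdlog = par[2], log = TRUE))
}

set.seed(1)
losses <- rlnorm(500, meanlog = 9.4, sdlog = 1.1)   # simulated loss amounts

fit <- optim(par = c(mean(log(losses)), sd(log(losses))),   # starting values
             fn = negLogLik, x = losses,
             method = "Nelder-Mead")                        # routine choice
fit$par           # fitted meanlog and sdlog
fit$convergence   # 0 indicates successful convergence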
12. Overview of the Loss Distribution Approach (LDA)
Thus far we have discussed:
Segmentation of loss data to obtain datasets that are not demonstrably heterogeneous
Loss Frequency Modeling
Loss Severity Modeling
We have not yet discussed Monte Carlo Simulation…
Many simulations containing millions of iterations must be run to observe a sufficient number of losses to reasonably assess what a 1 in 1,000 year event might look like.
This results in multiple days being lost waiting on code to complete.
Northern Trust explored opportunities to parallelize Monte Carlo simulation with Revolution Analytics.
[Diagram: the same LDA flow as before – segment loss data into homogeneous loss datasets, fit a Frequency Distribution (λ, # of loss events per year) and a Severity Distribution ($ value of loss event) to internal and/or external loss data, then compound them via Monte Carlo Simulation into the Aggregate Loss Distribution; Operational Risk Exposure is estimated as the 99.9th percentile, a 1/1,000 year event.]
Example Code:
# Randomly draw n frequency observations from a Poisson distribution,
then draw random severities from the specified truncated severity Severity
distribution, truncated at point a. Sum up each of the individual loss Distribution Operational Risk
amounts. Exposure is estimated as
f_tr <- function() { the 99.9th percentile of
sum(do.call("rtrunc", c(n=rpois(1, lambda), the aggregate loss
$ value of loss event distribution; a 1/1,000
spec=distName, a=a, parList)))
} year event
# Simulate a large number of iterations and replicate the simulation a
number of times to reduce sample noise
simuMatrix <- replicate(30, replicate(1e+6, f_tr()))
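A hedged sketch of how the simulation above might be parallelized with the open-source doSNOW backend (one of the packages benchmarked on the next slide); it assumes f_tr(), lambda, distName, a, and parList are defined as above, with rtrunc() supplied by the truncdist package:

library(foreach)
library(doSNOW)

cl <- makeCluster(4)   # e.g., one worker per core on a 4-core laptop
registerDoSNOW(cl)

# Split the 30 replications across the workers; cbind the resulting columns
simuMatrix <- foreach(i = 1:30, .combine = cbind,
                      .packages = "truncdist",
                      .export = c("f_tr", "lambda", "distName",
                                  "a", "parList")) %dopar% {
  replicate(1e+6, f_tr())
}

stopCluster(cl)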
13. Monte Carlo Simulation Benchmarking Analysis
Northern Trust and Revolution Analytics Evaluate Various Methods to Enhance Monte Carlo Simulation
Use a different version of R: 32-bit, 64-bit (e.g., update your operating system)
Use various parallelization packages: doSNOW, doRSR, & doSMP (doRSR & doSMP are Revolution Analytics product offerings)
Use multiple processors and/or machines:
Single node with multiple cores
Cluster of CPUs with multiple cores
Hardware Environments:
4-core laptop
3-node High Performance Cluster (HPC) on the Amazon Cloud
Configured and run with 8 cores on each node
Each node was restricted from 16 to 8 cores
Metrics used to evaluate each method:
Elapsed Time by Step
Memory usage
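For the elapsed-time metric, a simple sketch of the measurement approach: wrap each step in system.time(). Here f_demo is an illustrative stand-in for the simulation function, with a reduced iteration count so it runs quickly:

# Time a reduced-size run of the simulation step
lambda <- 86.25
f_demo <- function() sum(rlnorm(rpois(1, lambda), meanlog = 9.4, sdlog = 1.1))
system.time(replicate(1e+4, f_demo()))["elapsed"]   # wall-clock seconds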
14. Monte Carlo Benchmarking Highlights
Revolution Analytics’ parallelization can be easily scaled up from the laptop/server to the cluster using Revolution Analytics’ distributed computing capabilities.
Parallelization greatly improves simulation performance:
Elapsed time is linear in the # of iterations
Performance improves with the # of cores
64-bit is marginally better than 32-bit
Revo ~ CRAN within a node (no MKL impact in this study)
doRSR slightly better than doSMP on a single server
Performance scales with cluster resources
Memory use is driven primarily by the # of iterations:
doRSR ~ doSMP within a node
Memory trends scale with the # of cores
15. Take-Aways, Next Steps, and Contacts
Parallelization Offers Business Enhancements:
Less time spent waiting on programs to complete
Means more time to analyze drivers of change (e.g. – underlying data changes)
More efficient management of computing resources
No need to manually manage/schedule programs
Scalability of the solution to available resources
Revolution Analytics’ parallelization routines are scalable to the resources available
Contact Information:
Dave Humke, Northern Trust, Vice President, (dh98@ntrs.com)
Derek Norton, Revolution Analytics, (derek.norton@revolutionanalytics.com)
16. Appendix – Basel Loss Event Type Definitions
For each Level 1 event type category, the definition is given, followed by its Level 2 categories and their Level 3 activity examples.

Internal Fraud
Definition: Losses due to acts of a type intended to defraud, misappropriate property or circumvent regulations, the law or company policy, excluding diversity/discrimination events, which involves at least one internal party.
Unauthorized Activity: Transactions not reported (intentional); Transaction type unauthorized (with monetary loss); Mismarking of position (intentional)
Theft and Fraud: Fraud / credit fraud / worthless deposits; Theft / extortion / embezzlement / robbery; Misappropriation of assets; Forgery; Check kiting; Smuggling; Account take-over / impersonation, etc.; Tax non-compliance / evasion (willful); Bribes / kickbacks; Insider trading (not on firm's account)

External Fraud
Definition: Losses due to acts of a type intended to defraud, misappropriate property or circumvent the law, by a third party.
Theft and Fraud: Theft / robbery; Forgery; Check kiting
Systems Security: Hacking damage; Theft of information (with monetary loss)

Employment Practices and Workplace Safety
Definition: Losses arising from acts inconsistent with employment, health or safety laws or agreements, from payment of personal injury claims, or from diversity / discrimination events.
Employee Relations: Compensation, benefit, termination issues; Organized labor activities
Safe Environment: General liability (slips and falls, etc.); Employee health & safety rules and events; Workers compensation
Diversity & Discrimination: All discrimination types
17. Appendix – Basel Loss Event Type Definitions (Continued)
Clients, Products & Business Practices
Definition: Losses arising from an unintentional or negligent failure to meet a professional obligation to specific clients (including fiduciary and suitability requirements), or from the nature or design of a product.
Suitability, Disclosure & Fiduciary: Fiduciary breaches / guideline violations; Suitability / disclosure issues (KYC, etc.); Retail consumer disclosure violations; Breach of privacy; Aggressive sales; Account churning; Misuse of confidential information; Lender liability
Improper Business or Market Practices: Antitrust; Improper trade / market practice; Market manipulation; Insider trading (on firm's account); Unlicensed activity; Money laundering
Product Flaws: Product defects (unauthorized, etc.); Model errors
Selection, Sponsorship & Exposure: Failure to investigate client per guidelines; Exceeding client exposure limits
Advisory Activities: Disputes over performance of advisory activities

Damage to Physical Assets
Definition: Losses arising from loss or damage to physical assets from natural disaster or other events.
Disasters and Other Events: Natural disaster losses; Human losses from external sources (terrorism, vandalism)

Business Disruption & Systems Failures
Definition: Losses arising from disruption of business or system failures.
Systems: Hardware; Software; Telecommunications; Utility outage / disruptions
18. Appendix – Basel Loss Event Type Definitions (Continued)
Execution, Delivery & Process Management
Definition: Losses from failed transaction processing or process management, from relations with trade counterparties and vendors.
Transaction Capture, Execution & Maintenance: Miscommunication; Data entry, maintenance or loading error; Missed deadline or responsibility; Model / system misoperation; Accounting error / entity attribution error; Other task misperformance; Delivery failure; Collateral management failure; Reference data maintenance
Monitoring & Reporting: Failed mandatory reporting obligation; Inaccurate external report (loss incurred)
Customer Intake & Documentation: Client permissions / disclaimers missed; Legal documents missing / incomplete
Customer / Client Account Management: Unapproved access given to accounts; Incorrect client records (loss incurred); Negligent loss or damage of client assets
Trade Counterparties: Non-client counterparty misperformance; Misc. non-client counterparty disputes
Vendors & Suppliers: Outsourcing; Vendor disputes
19. Appendix - References
References on Loss Distribution Approach Modeling, Frequency and Severity Fitting, and Monte Carlo Simulation:
1. A. Chernobai, S. T. Rachev, F. J. Fabozzi (2005), Composite Goodness-of-Fit Tests for Left-Truncated Samples, Technical report,
University of California Santa Barbara
2. A. J. McNeil, R. Frey, P. Embrechts (2005), Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University
Press, Princeton
3. B. Bolker (2007), Optimization and All That, Draft of Chapter 7 of B. Bolker (2008), Ecological Models and Data in R, Princeton
University Press, Princeton
4. G.J. McLachlan, D. Peel (2000), Finite Mixture Models, Wiley & Sons, New York
5. H. Panjer (2006), Operational Risk: Modeling Analytics, Wiley & Sons, New York, p. 293.
6. K. Dutta, J. Perry (2006), A Tale of Tails: An Empirical Analysis of Loss Distribution Models for Estimating Operational Risk Capital,
Working Paper No. 06-13, Federal Reserve Bank of Boston.
7. M. Moscadelli (2004), The Modelling of Operational Risk: Experience with the Analysis of the Data Collected by the Basel Committee,
Temi di Discussione No. 517, Banca d’Italia.
8. P. de Fontnouvelle, E. Rosengren, J. Jordan (2007), Implications of Alternative Operational Risk Modeling Techniques, In: M. Carey
and R.M. Stulz (eds), The Risks of Financial Institutions, University of Chicago Press, pp. 475-512.
9. S.A. Klugman, H.H. Panjer, G.E. Willmot (2008), Loss Models: From Data to Decisions, 3rd ed., Wiley & Sons, Hoboken, NJ
10. S. Nadarajah, S. Kotz (2006), R Programs for Computing Truncated Distributions, Journal of Statistical Software 16(2)