NORTHERN TRUST




                                        Operational Risk Quantification System
                                        Northern Trust Corporation




June 28th 2012
Achieving High-Performing, Simulation-Based Operational Risk
Measurement with R and RevoScaleR
Presented by David Humke, Vice President, Corporate Risk Analytics and Insurance, Northern Trust

  Disclaimer: The views expressed in this presentation are the views of the author and do not
  necessarily reflect the opinions of Northern Trust Corporation or Revolution Analytics ©2012.
Agenda

   Basel II Overview
        Operational Risk – Definition
        Requirements of an Operational Risk Exposure Estimate
   Loss Distribution Approach (LDA)
        Segmenting Loss Data into Units of Measure
        Literature on Frequency and Severity Modeling
        Monte Carlo Simulation within an LDA based Operational Risk Exposure Model
   Potential solutions for faster Monte Carlo Simulation
        Description of the test environments utilized
        Results from various methods of enhancement




Operational Risk Loss Events in the News


   Barings Bank (1995) – $1.3 billion loss due to speculative trading performed by currency trader Nick Leeson. This
    loss ultimately led to the collapse of the bank.


   Societe Generale (2008) - $7 billion loss based on the fraudulent activities of rogue futures trader Jerome Kerviel


   DBS Bank, Ltd. (2010) - $310 million penalty imposed by the Monetary Authority of Singapore due to a seven hour
    system-wide outage that left customers unable to use mobile, internet, and ATM services. Additionally, customers
    were not able to make any debit or credit card transactions during the outage.


   Citibank (2011) - $285 million settlement related to a failure to disclose to investors its role in the asset-selection
    process for a hybrid Collateralized Debt Obligation the bank offered.


   Multiple Banks (2012) - $25 billion in settlements and penalties regarding five large lenders’ improper foreclosure
    practices between January 2008 and December 2011.




        Note: Details for each of the events above were obtained from the Algorithmics FIRST database.


Basel II and Operational Risk
   In December of 2007, the US Federal Reserve System finalized a document commonly referred to as the “Final Rules”
    which set forth general requirements for the measurement of operational risk by large US financial institutions. [1]
        These rules defined operational risk as the risk of loss resulting from inadequate or failed internal processes, people, and systems or
         from external events (including legal risk but excluding strategic and reputational risk)
            Seven Distinct Basel Loss Event Types [2]:
              1.   Internal Fraud
              2.   External Fraud
              3.   Business Disruptions/System Failure
              4.   Execution, Delivery and Process Management
              5.   Damage to Physical Assets
              6.   Clients, Products, and Business Practice Matters
              7.   Employment Practices and Workplace Safety Issues
        Other classifications are available to describe losses as defined by regulators and banks
            E.g. - Business Lines, Regions of Operations, Cause of Loss Category, etc.
        The Final Rules require banks to estimate an operational risk exposure amount that corresponds to the 99.9th percentile of the
         distribution of potential aggregate operational losses, as generated by the bank’s operational risk quantification system over a one-
         year horizon.
              Exposure estimates must:
                   a) Incorporate four data elements: Internal Loss Data, External Loss Data, Scenario Analysis Data, and Business
                      Environment/Internal Control Factor data.
                   b) Be calculated using systematic, transparent, verifiable, and credible methodologies.
                   c) Be calculated separately for each set of operational loss data demonstrating a statistically
                      distinct loss profile.
   The banking industry has focused on the use of the Loss Distribution Approach (LDA) to calculate operational risk
    exposure estimates.
      1. Risk-Based Capital Standards: Advanced Capital Adequacy Framework – Basel II; Final Rule (2007), Federal Register 72(235), 69407 – 408. Also, see
      Operational Risk – Supervisory Guidelines for the Advanced Measurement Approaches; BIS June 2011.
      2. See the Appendix for examples of each loss event type

Overview of the Loss Distribution Approach (LDA)
   Under the LDA, banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.
         These datasets are referred to as units of measure or UOMs
         These datasets are used for subsequent modeling within the LDA
   The LDA models two primary components of operational loss data:
         Loss Frequency
          The banking industry has widely accepted a Poisson distribution as an appropriate distribution.
         Loss Severity
          Fitting a parametric distribution to operational loss data is one of the biggest challenges in measuring
            operational risk exposure.
   Monte Carlo Simulation is then utilized to compound the two distributions.
         A large number of simulations must be run to observe a sufficient number of losses to reasonably assess what a
          1 in 1,000 year event might look like… More on this shortly
   [Figure: Segment Loss Data into Homogenous Loss Datasets, then the Loss Distribution Approach: Internal and/or External
    Loss Data feed a Frequency Distribution (λ; # of loss events per year) and a Severity Distribution ($ value of loss
    event), which Monte Carlo Simulation combines into the Aggregate Loss Distribution.]
   Operational Risk Exposure is estimated as the 99.9th percentile of the aggregate loss distribution; a 1/1,000 year event *
         * Banks typically sum the VaR estimates for their UOMs and perform diversification modeling to move away from the
           assumption of positive correlation.
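   The quantity being estimated can also be written compactly. The notation below is a sketch introduced here and does not
   appear on the slide: N is the annual loss count, X_i are the individual loss severities with distribution F_X, S is the
   aggregate annual loss with distribution F_S, and the independence of N and the X_i is an assumed (though common) modeling
   choice.

```latex
S = \sum_{i=1}^{N} X_i, \qquad
N \sim \mathrm{Poisson}(\lambda), \qquad
X_1, X_2, \ldots \overset{\text{iid}}{\sim} F_X \ \text{independent of } N,
\qquad
\text{Exposure} = F_S^{-1}(0.999) = \inf\{\, s : \Pr(S \le s) \ge 0.999 \,\}
```

   Because F_S rarely has a closed form, the 99.9th percentile is typically approximated by the empirical quantile of
   simulated values of S.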
Segmenting Loss Data into Units of Measure
   Banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.
           Which data classifications captured in the bank’s operational loss database best characterize the bank’s operational risk exposure?
                 Banks often capture a variety of details about individual loss events – E.g. Region of occurrence, Business Line, Basel Event Type
           How granular should classification be?
   Once an appropriate set of classifying variables has been identified, a natural starting point to narrow in on
    homogenous datasets is to look at loss frequency and loss severity within the identified variables
    Example Data:
     Loss Counts by Business Line and Event Type (columns are Basel Loss Event Types)
                                 CPBP     BDSF       IF     EPWS       EF      DPA     EDPM     Total
     Commercial Banking            20       50        -       20      550       50        -       690
       in % of Total            2.90%    7.25%    0.00%    2.90%   79.71%    7.25%    0.00%   100.00%
     Payment and Settlement        10       30       10       15      440      260       25       790
       in % of Total            1.27%    3.80%    1.27%    1.90%   55.70%   32.91%    3.16%   100.00%
     Agency Services               80      130       10       30      850       10        5     1,115
       in % of Total            7.17%   11.66%    0.90%    2.69%   76.23%    0.90%    0.45%   100.00%
     Other                          -        5        5       30       10        5        5        60
       in % of Total            0.00%    8.33%    8.33%   50.00%   16.67%    8.33%    8.33%   100.00%
     Total                        110      215       25       95    1,850      325       35     2,655
       in % of Total            4.14%    8.10%    0.94%    3.58%   69.68%   12.24%    1.32%   100.00%

     Loss Amounts by Business Line and Event Type (in $ MM; columns are Basel Loss Event Types)
                                 CPBP     BDSF       IF     EPWS       EF      DPA     EDPM     Total
     Commercial Banking         $3.00   $18.00    $1.00   $10.00   $90.00    $1.00        -   $123.00
       in % of Total            2.44%   14.63%    0.81%    8.13%   73.17%    0.81%    0.00%   100.00%
     Payment and Settlement     $1.00    $7.00    $1.00    $4.00   $70.00   $25.00   $25.00   $133.00
       in % of Total            0.75%    5.26%    0.75%    3.01%   52.63%   18.80%   18.80%   100.00%
     Agency Services            $6.00  $240.00    $3.00    $2.00  $225.00    $5.00  $150.00   $631.00
       in % of Total            0.95%   38.03%    0.48%    0.32%   35.66%    0.79%   23.77%   100.00%
     Other                          -    $3.00    $1.00   $15.00    $1.00    $3.00   $10.00    $33.00
       in % of Total            0.00%    9.09%    3.03%   45.45%    3.03%    9.09%   30.30%   100.00%
     Total                     $10.00  $268.00    $6.00   $31.00  $386.00   $34.00  $185.00   $920.00
       in % of Total            1.09%   29.13%    0.65%    3.37%   41.96%    3.70%   20.11%   100.00%

        In this example, the bank has determined that 4 Business Lines and the 7 Basel Loss Event Types are a reasonable
         representation of operational risk exposure.
        Not every business line has a large number of loss events.
        Within a business line, not every Basel Event Type classification level has a large number of data
        Basel Loss Event Type: BDSF and EF look distinct
        Business Line: All 4 Business Lines might be distinct
        Additional testing is required to identify homogenous datasets
    Note: The data above are fiction, created for this example
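    The "in % of Total" rows of the loss count table can be reproduced in a few lines of R. This is a minimal sketch that
    re-enters the fictional counts by hand; the object names (counts, row_pct, totals_by_line, totals_by_type) are
    illustrative and not part of the original presentation.

```r
# Loss counts from the fictional example table (rows: business lines,
# columns: Basel loss event types). A "-" in the table is entered as 0.
counts <- matrix(
  c(20,  50,  0, 20, 550,  50,  0,
    10,  30, 10, 15, 440, 260, 25,
    80, 130, 10, 30, 850,  10,  5,
     0,   5,  5, 30,  10,   5,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Commercial Banking", "Payment and Settlement", "Agency Services", "Other"),
    c("CPBP", "BDSF", "IF", "EPWS", "EF", "DPA", "EDPM")
  )
)

# Share of each event type within a business line ("in % of Total" rows).
row_pct <- round(100 * prop.table(counts, margin = 1), 2)

# Totals by business line and by event type.
totals_by_line <- rowSums(counts)
totals_by_type <- colSums(counts)

print(row_pct)
print(totals_by_line)
print(totals_by_type)
```

    Row shares make it easier to compare the loss profile of business lines that have very different numbers of events.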
Using R to Identify a Homogenous Loss Dataset
   R can produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether
    loss data should be merged (homogenous) or separated (heterogeneous).
    Example: Is Business Disruptions & Systems Failure Loss Event Type statistically distinct from the
             Commercial Banking Business Line?
                                                                                 Quantiles (in $ Thousands)
     Datasets                                  Count  Mean(Log)  SD(Log)       50.0%       75.0%       90.0%       95.0%       98.0%       99.0%       99.5%         Max
     Commercial Banking                          640       9.40     1.08       $7.89      $16.96      $65.39     $120.79     $281.52     $423.09     $777.20   $3,376.89
     Business Disruptions & Systems Failure      215      11.55     2.08      $74.94     $424.44   $1,520.39   $5,904.35  $12,743.90  $17,255.26  $19,750.42  $19,954.35




     Test                  Statistic       pValue
     Kolmogorov-Smirnov         0.53            0
     Chi-Square               270.57  3.36674E-50
     Anderson-Darling         151.16            0




                                                                                  Conclusion:
                                                                                  The preponderance of evidence suggests that
                                                                                  Commercial Banking and BDSF are statistically distinct.
                                                                                  A separate risk exposure estimate should be calculated
                                                                                  for each of these datasets.
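     A comparison of this kind can be sketched in base R. The loss data behind the slide are not available, so the example
     below uses hypothetical lognormal samples whose counts and log-scale parameters mirror the table; the variable names and
     the describe() helper are illustrative. Only the Kolmogorov-Smirnov test is shown, since the Anderson-Darling test is
     not part of base R.

```r
set.seed(2012)

# Hypothetical stand-ins for the two loss datasets: lognormal draws using the
# Count, Mean(Log), and SD(Log) values reported in the table.
commercial_banking <- rlnorm(640, meanlog = 9.40, sdlog = 1.08)
bdsf               <- rlnorm(215, meanlog = 11.55, sdlog = 2.08)

# Descriptive statistics on the log scale plus selected quantiles.
describe <- function(x) {
  c(count = length(x), mean_log = mean(log(x)), sd_log = sd(log(x)),
    quantile(x, probs = c(0.50, 0.75, 0.90, 0.95, 0.99)))
}
print(rbind(commercial_banking = describe(commercial_banking),
            bdsf = describe(bdsf)))

# Two-sample Kolmogorov-Smirnov test: the null hypothesis is that both
# samples come from the same continuous distribution.
ks_result <- ks.test(commercial_banking, bdsf)
print(ks_result)
```

     A small p-value is evidence against merging the two datasets; a large one only means that a difference could not be
     demonstrated.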




Using R to Identify Homogenous Loss Datasets
   R offers the capability to produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to
    evaluate whether loss data should be merged (homogenous) or separated (heterogeneous).
    Example: Is Other Business Line statistically distinct from the Commercial Banking Business Line?

                                                                 Quantiles (in $ Thousands)
     Datasets                  Count  Mean(Log)  SD(Log)       50.0%       75.0%       90.0%       95.0%       98.0%       99.0%       99.5%         Max
     Commercial Banking          640       9.40     1.08       $7.81      $17.61      $51.60     $106.22     $390.62     $617.59   $1,162.25   $3,142.99
     Other                        55       9.32     1.02       $7.70      $14.58      $40.38      $91.85     $138.79     $434.62     $607.40     $780.18


     Test                  Statistic       pValue
     Kolmogorov-Smirnov        0.061        0.992
     Chi-Square                6.580        0.884
     Anderson-Darling         -0.998        0.627



                                                                Conclusion:
                                                                The preponderance of evidence suggests that we cannot
                                                                conclude the Commercial Banking and ‘Other’ Business
                                                                Lines are statistically distinct.
                                                                These data can be aggregated into a single data set for
                                                                frequency and severity modeling.


                                                                If a business rationale exists to keep these data sets
                                                                separate, banks may do so.
Frequency Distribution Fitting in R
Fitting a Frequency Distribution:
   The banking industry has focused on the use of a Poisson distribution to model the frequency of operational loss
    events.
        The Poisson is parameterized by one parameter, λ, which is equivalent to the average frequency over the time horizon being
         estimated (1 year).
        Various methods are used to parameterize the Poisson distribution
            Simple Annual Average
            Regression Analysis based on internal/external variables – See the function lm() in R
            Poisson Regression based on internal/external variables – See the function glm() in R
        Bank-identified internal and external data characteristics might help explain operational loss frequency



     Commercial Banking
       Year       Loss Counts
       2005                76
       2006                82
       2007                94
       2008                64
       2009                90
       2010               103
       2011                96
       2012                85
       Total              690
       Average          86.25

   Once a parameter estimate, λ, has been identified, obtaining the density, distribution function, quantile
    function and random generation for the Poisson distribution is quite easy:
        See dpois, ppois, qpois, and rpois in R for more details.
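   The frequency step can be sketched with the Commercial Banking counts from this slide. The example below is illustrative
   only: the variable names are assumptions, and the intercept-only glm() call simply shows where bank-specific explanatory
   variables (none are given in the presentation) would enter a Poisson regression.

```r
# Annual loss counts for Commercial Banking, taken from the slide's table.
loss_counts <- data.frame(
  year  = 2005:2012,
  count = c(76, 82, 94, 64, 90, 103, 96, 85)
)

# Simple annual average: the maximum likelihood estimate of lambda.
lambda_hat <- mean(loss_counts$count)

# An intercept-only Poisson regression recovers the same estimate;
# explanatory variables would be added to the right-hand side of the formula.
fit <- glm(count ~ 1, family = poisson(link = "log"), data = loss_counts)
lambda_glm <- unname(exp(coef(fit)))

# Density, distribution function, quantile function, and random generation.
p_exactly_90  <- dpois(90, lambda_hat)     # P(N = 90)
p_at_most_100 <- ppois(100, lambda_hat)    # P(N <= 100)
q_999         <- qpois(0.999, lambda_hat)  # 99.9th percentile of N
set.seed(1)
simulated_years <- rpois(5, lambda_hat)    # five simulated annual counts

print(c(lambda_hat = lambda_hat, lambda_glm = lambda_glm, q_999 = q_999))
```

   With a log link, the fitted intercept is log(λ), which is why the estimate is exponentiated before use.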




Fitting a Severity Distribution
Fitting a Severity Distribution:
   Many great authors have published overviews on the process for severity distribution fitting within the context of an
    LDA model*.
         The industry currently practices a variety of loss severity modeling techniques
             Fitting a single parametric distribution to the entire dataset (e.g. – lognormal, Pareto, log-gamma, Weibull, etc.)
             Fitting a mixture of parametric distributions to the loss severity data
             Fitting multiple parametric distributions that have non-overlapping ranges (“Splicing”)
             Extreme Value Theory (EVT) and the Peaks Over Thresholds Method
   Challenges associated with fitting a severity distribution include:
     1.   The Final Rule asks banks to estimate a 1 in 1,000 year event based on fewer than 15 years of operational loss data
     2.   Data collection thresholds – Use of shifted distributions or truncated distributions?
     3.   Operational Loss Databases are often “living” – Loss severities, loss data classifications, and risk types can be modified
     4.   Data Paucity – In many cases banks have units of measure that have a small number of observations (< 1,000).
     5.   Undetected Heterogeneity of Datasets – Tests performed to identify heterogeneous datasets are not perfect at doing so
             Small data sets can impede this effort.
     6.   Fat-Tailed Data – Banks are faced with UOMs that have a small number of observations which are often best described by a
          heavily skewed distribution.
             Limited data in the tail can result in volatile capital estimates (e.g. – capital can swing upwards or downwards by hundreds of
                millions of $) based on the inclusion of a few events.
             Volatile results can present subsequent challenges for obtaining senior management buy-in on risk exposure estimates.
     * Please see the references slide at the end of this presentation for a short list of books and papers that provide additional detail on
       operational risk modeling.
Fitting a Severity Distribution in R
Fitting a Severity Distribution:
   A variety of optimization routines exist in R that are capable of fitting a severity distribution to loss data.
        Using optim() in R, one needs to specify:
             1. Objective Function (negative log-likelihood): -sum(densityFunction(x=data, log=TRUE))
             2. Starting Parameters: Contingent upon the distribution being fit
             3. Optimization Routine: Nelder-Mead, BFGS, SANN, etc.
        See B. Bolker for more on optimization routines in R beyond the optim() function.
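   The three optim() ingredients listed above can be put together as follows. This is a minimal sketch under stated
   assumptions: the losses are simulated, a lognormal severity is chosen purely for illustration, and the names
   neg_log_lik, start, and estimates are not from the presentation.

```r
set.seed(42)

# Hypothetical loss severities; real loss data would replace this vector.
losses <- rlnorm(2000, meanlog = 9.4, sdlog = 1.1)

# 1. Objective: negative log-likelihood of a lognormal severity distribution.
#    sdlog is optimized on the log scale so that it stays positive.
neg_log_lik <- function(par, x) {
  -sum(dlnorm(x, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

# 2. Starting parameters: moments of the log losses.
start <- c(mean(log(losses)), log(sd(log(losses))))

# 3. Optimization routine: Nelder-Mead (the optim() default).
fit <- optim(par = start, fn = neg_log_lik, x = losses, method = "Nelder-Mead")

estimates <- c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))
print(estimates)
print(fit$convergence)  # 0 means optim() reports successful convergence
```

   Swapping dlnorm() for another density (and adjusting the starting values) is all that is needed to fit a different
   candidate distribution with the same pattern.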

   Fitting truncated severity distributions
        The actuar package provides density, distribution, and quantile functions as well as random number generators for fat-tailed
         distributions
        See Nadarajah and Kotz for code that constructs the truncated density, distribution, and quantile functions and the
         truncated random number generator needed to fit truncated distributions.
   Identifying a “best-fit” severity distribution to the loss data
        QQ-Plot of the empirical data against the fitted distributions – plot(), qqplot()
        Plot the empirical cdf against the fitted distribution – ecdf()
        See truncgof R package and A. Chernobai, S. T. Rachev, F. J. Fabozzi for goodness-of-fit tests and some adjusted exploratory tools
         that work with left truncated data.

   Many packages exist that perform EVT severity distribution fitting:
        See A. J. McNeil, R. Frey, P. Embrechts and the evir package in R.

   Fitting and evaluating mixture distributions are more complex endeavors…
        See the GAMLSS package in R and http://www.gamlss.org/
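   Returning to the truncated severity fit mentioned earlier on this slide: when losses are only recorded above a data
   collection threshold, the likelihood has to be conditioned on exceeding that threshold. The sketch below uses only base
   R rather than the packages cited above; the threshold a, the simulated data, and all object names are assumptions made
   for illustration.

```r
set.seed(7)

# Hypothetical data: lognormal losses of which only those above a
# collection threshold a are recorded.
a <- 10000
all_losses <- rlnorm(20000, meanlog = 9.4, sdlog = 1.1)
recorded   <- all_losses[all_losses > a]

# Negative log-likelihood of a lognormal left-truncated at a:
# f(x | X > a) = f(x) / (1 - F(a)) for x > a.
neg_log_lik_trunc <- function(par, x, a) {
  meanlog <- par[1]
  sdlog   <- exp(par[2])  # log scale keeps sdlog positive
  log_dens <- dlnorm(x, meanlog, sdlog, log = TRUE)
  log_tail <- plnorm(a, meanlog, sdlog, lower.tail = FALSE, log.p = TRUE)
  -sum(log_dens - log_tail)
}

# Naive starting values from the recorded (truncated) sample.
start <- c(mean(log(recorded)), log(sd(log(recorded))))
fit <- optim(start, neg_log_lik_trunc, x = recorded, a = a, method = "BFGS")

estimates <- c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))
print(rbind(naive = c(start[1], exp(start[2])), truncated_fit = estimates))
```

   The naive moments of the recorded losses overstate meanlog and understate sdlog; conditioning on the threshold moves the
   estimates back toward the parameters of the full loss population.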

Overview of the Loss Distribution Approach (LDA)
   Thus far we have discussed:
          Segmentation of loss data to obtain datasets that are not demonstrably heterogeneous.
          Loss Frequency Modeling
          Loss Severity Modeling
   We have not yet discussed Monte Carlo Simulation…
          Many simulations containing millions of iterations must be run to observe a sufficient number of losses to
           reasonably assess what a 1 in 1,000 year event might look like
          Waiting for this code to complete can cost multiple days.
              Northern explored opportunities to parallelize Monte Carlo simulation with Revolution Analytics
   [Figure: the Loss Distribution Approach diagram. Segmented Internal and/or External Loss Data feed a Frequency
    Distribution (λ; # of loss events per year) and a Severity Distribution ($ value of loss event), which Monte Carlo
    Simulation combines into the Aggregate Loss Distribution. Operational Risk Exposure is estimated as the 99.9th
    percentile of the aggregate loss distribution; a 1/1,000 year event.]
    Example Code:
         # Randomly draw a frequency observation n from a Poisson distribution, then draw n random severities
         # from the specified truncated severity distribution, truncated at point a. Sum up each of the
         # individual loss amounts.
         # lambda, distName, a, and parList must be defined beforehand; rtrunc() is the truncated random
         # number generator (see Nadarajah and Kotz).
         f_tr <- function() {
           sum(do.call("rtrunc", c(n=rpois(1, lambda), spec=distName, a=a, parList)))
         }
         # Simulate a large number of iterations and replicate the simulation a number of times to reduce
         # sample noise
         simuMatrix <- replicate(30, replicate(1e+6, f_tr()))
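    For readers without the truncated-distribution helpers at hand, the same simulation pattern can be tried in base R. The
    sketch below deliberately swaps in an untruncated lognormal severity and a much smaller iteration count than the slide
    uses; all parameter values and names (lambda, meanlog, sdlog, n_years, simulate_year) are assumptions chosen for
    illustration.

```r
set.seed(123)

# Assumed inputs for illustration only.
lambda  <- 86.25  # annual loss frequency
meanlog <- 9.4    # lognormal severity parameters
sdlog   <- 1.1
n_years <- 1e5    # far fewer iterations than a production run

# One simulated year: draw a loss count, then sum that many severities.
simulate_year <- function() {
  n_losses <- rpois(1, lambda)
  sum(rlnorm(n_losses, meanlog, sdlog))
}

aggregate_losses <- replicate(n_years, simulate_year())

# Exposure estimate: 99.9th percentile of the simulated aggregate losses.
exposure <- quantile(aggregate_losses, probs = 0.999, names = FALSE)

# Sanity check: the simulated mean should be close to lambda * E[X].
expected_mean <- lambda * exp(meanlog + sdlog^2 / 2)
print(c(simulated_mean = mean(aggregate_losses),
        expected_mean = expected_mean,
        exposure = exposure))
```

    With 1e5 simulated years only about 100 observations lie beyond the 99.9th percentile, which is why the quantile
    estimate is noisy and why far larger, replicated runs are used in practice.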

Monte Carlo Simulation Benchmarking Analysis

   Northern Trust and Revolution Analytics evaluated various methods to enhance Monte Carlo simulation
        Use a different version of R: 32-bit or 64-bit (e.g. – update your operating system)
        Use various parallelization packages: doSNOW, doRSR, and doSMP (doRSR and doSMP are Revolution Analytics product offerings)
        Use multiple processors and/or machines:
             Single node with multiple cores
             Cluster of CPUs with multiple cores
   Hardware Environments:
        4-core laptop
        3-node High Performing Cluster (HPC) on Amazon Cloud
             Configured and run with 8 cores on each node
             Each node was restricted from 16 to 8 cores
   Metrics used to evaluate each method:
        Elapsed Time by Step
        Memory usage
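   A benchmark of this kind can be imitated on a single machine. The sketch below does not use doSNOW, doRSR, or doSMP;
   it substitutes the parallel package that ships with R, and the severity model, parameter values, worker count, and
   function names are assumptions chosen so the example runs quickly.

```r
library(parallel)

# Assumed inputs for illustration only.
lambda <- 86.25; meanlog <- 9.4; sdlog <- 1.1
n_years   <- 20000
n_workers <- 2

# One simulated year; the index i is unused but required by the apply functions.
simulate_year <- function(i, lambda, meanlog, sdlog) {
  sum(stats::rlnorm(stats::rpois(1, lambda), meanlog, sdlog))
}

# Serial baseline, timed with system.time().
set.seed(1)
serial_time <- system.time(
  serial_result <- vapply(seq_len(n_years), simulate_year, numeric(1),
                          lambda = lambda, meanlog = meanlog, sdlog = sdlog)
)

# Parallel run on a local cluster of worker processes.
cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, iseed = 1)  # reproducible, independent random streams
parallel_time <- system.time(
  parallel_result <- parSapply(cl, seq_len(n_years), simulate_year,
                               lambda = lambda, meanlog = meanlog, sdlog = sdlog)
)
stopCluster(cl)

# Elapsed time by step, the first metric listed above.
print(c(serial = serial_time[["elapsed"]], parallel = parallel_time[["elapsed"]]))
print(quantile(parallel_result, 0.999, names = FALSE))
```

   For a workload this small the cost of starting workers and shipping results can outweigh the gain, so any speed-up only
   becomes visible as the number of iterations grows.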




Monte Carlo Benchmarking Highlights
   Revolution Analytics’ parallelization can be easily scaled up from laptop/server to the cluster using Revolution
    Analytics’ distributed computing capabilities
   Parallelization greatly improves simulation performance
   Elapsed time is linear in # of iterations
   Performance improves with # of cores
   Revo ~ CRAN within a node (no MKL impact in this study)
   doRSR slightly better than doSMP on a single server
   64-bit marginally better than 32-bit
   Performance scales with cluster resources
   Memory use just driven by # of iterations
   [Figure: benchmark charts; panel titles: "64bit is better", "doRSR ~ doSMP within a node", "Scales with # Cores",
    "Memory Trends"]



Take-Aways, Next Steps, and Contacts

Parallelization Offers Business Enhancements:
   Less time spent waiting on programs to complete
        Means more time to analyze drivers of change (e.g. – underlying data changes)
   More efficient management of computing resources
        No need to manually manage/schedule programs
   Scalability of the solution to available resources
        Revolution Analytics’ parallelization routines are scalable to the resources available



Contact Information:
   Dave Humke, Northern Trust, Vice President, (dh98@ntrs.com)
   Derek Norton, Revolution Analytics, (derek.norton@revolutionanalytics.com)




Appendix – Basel Loss Event Type Definition

     Event Type Category Definition                   Categories (Level 2)   Activity Examples (Level 3)
     (Level 1)
     Internal Fraud      Loss due to acts of a type     Unauthorized          Transactions not reported (intentional)
                         intended to defraud,           Activity              Transaction type unauthorized (with monetary loss)
                         misappropriate property or                           Mismarking of position (intentional)
                         circumvent regulations, the    Theft and Fraud       Fraud / credit fraud / worthless deposits
                         law or company policy,                               Theft / extortion / embezzlement / robbery
                         excluding diversity /                                Misappropriation of assets
                         discrimination events, which                         Forgery
                         involves at least one internal                       Check kiting
                         party.                                               Smuggling
                                                                              Account take-over / impersonation, etc.
                                                                              Tax non-compliance / evasion (willful)
                                                                              Bribes / kickbacks
                                                                              Insider trading (not on firm's account)
     External Fraud      Losses due to acts of a type   Theft and Fraud       Theft / robbery
                         intended to defraud,                                 Forgery
                         misappropriate property or                           Check kiting
                         circumvent the law, by a       Systems Security      Hacking damage
                         third party                                          Theft of information (with monetary loss)
     Employment          Losses arising from acts     Employee Relations      Compensation, benefit, termination issues
     Practices and       inconsistent with                                    Organized labor activities
     Workplace Safety    employment, health or safety Safe Environment        General liability (slips and falls, etc.)
                         laws or agreements, from
                                                                              Employee health & safety rules and events
                         payment of personal injury
                         claims, or from diversity /                          Workers compensation
                         discrimination events.       Diversity &            All discrimination types
                                                      Discrimination




16
Appendix – Basel Loss Event Type Definitions (Continued)

     Event Type Category  Definition                      Categories (Level 2)  Activity Examples (Level 3)
     (Level 1)
     Clients, Products &  Losses arising from an          Suitability,          Fiduciary breaches / guideline violations
     Business Practice    unintentional or negligent      Disclosure &          Suitability / disclosure issues (KYC, etc.)
                          failure to meet a professional  Fiduciary             Retail consumer disclosure violations
                          obligation to specific clients                        Breach of privacy
                          (including fiduciary and                              Aggressive sales
                          suitability requirements), or                         Account churning
                          from the nature or design of                          Misuse of confidential information
                          a product.                                            Lender liability
                                                          Improper Business or  Antitrust
                                                          Market Practices      Improper trade / market practice
                                                                                Market manipulation
                                                                                Insider trading (on firm's account)
                                                                                Unlicensed activity
                                                                                Money laundering
                                                          Product Flaws         Product defects (unauthorized, etc.)
                                                                                Model errors
                                                          Selection,            Failure to investigate client per guidelines
                                                          Sponsorship &         Exceeding client exposure limits
                                                          Exposure
                                                          Advisory Activities   Disputes over performance or advisory activities

     Damage to Physical   Losses arising from loss or     Disasters and Other   Natural disaster losses
     Assets               damage to physical assets       Events                Human losses from external sources (terrorism,
                          from natural disaster or other                        vandalism)
                          events

     Business Disruption  Losses arising from disruption  Systems               Hardware
     & Systems Failures   of business or system                                 Software
                          failures                                              Telecommunications
                                                                                Utility outage / disruptions


17
Appendix – Basel Loss Event Type Definitions (Continued)


     Event Type Category  Definition                      Categories (Level 2)  Activity Examples (Level 3)
     (Level 1)
     Execution, Delivery  Losses from failed              Transaction           Miscommunication
     & Process            transaction processing or       Capture, Execution    Data entry, maintenance or loading error
     Management           process management, from        & Maintenance         Missed deadline or responsibility
                          relations with trade                                  Model / system misoperation
                          counterparties and vendors                            Accounting error / entity attribution error
                                                                                Other task misperformance
                                                                                Delivery failure
                                                                                Collateral management failure
                                                                                Reference data maintenance
                                                          Monitoring &          Failed mandatory reporting obligation
                                                          Reporting             Inaccurate external report (loss incurred)
                                                          Customer Intake &     Client permissions / disclaimers missed
                                                          Documentation         Legal documents missing / incomplete
                                                          Customer / Client     Unapproved access given to accounts
                                                          Account               Incorrect client records (loss incurred)
                                                          Management            Negligent loss or damage of client assets
                                                          Trade                 Non-client counterparty misperformance
                                                          Counterparties        Misc. non-client counterparty disputes
                                                          Vendors & Suppliers   Outsourcing
                                                                                Vendor disputes




18
Appendix - References

References on Loss Distribution Approach Modeling, Frequency and Severity Fitting, and Monte Carlo Simulation:
  1.    A. Chernobai, S. T. Rachev, F. J. Fabozzi (2005), Composite Goodness-of-Fit Tests for Left-Truncated Samples, Technical report,
        University of California Santa Barbara
  2.    A. J. McNeil, R. Frey, P. Embrechts (2005), Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University
        Press, Princeton
  3.    B. Bolker (2007), Optimization and All That, Draft of Chapter 7 of B. Bolker (2008), Ecological Models and Data in R, Princeton
        University Press, Princeton
  4.    G.J. McLachlan, D. Peel (2000), Finite Mixture Models, Wiley & Sons, New York
  5.    H. Panjer (2006), Operational Risk: Modeling Analytics, Wiley & Sons, New York, p. 293.
  6.    K. Dutta, J. Perry (2006), A Tale of Tails: An Empirical Analysis of Loss Distribution Models for Estimating Operational Risk Capital,
        Working Paper No. 06-13, Federal Reserve Bank of Boston.
  7.    M. Moscadelli (2004), The Modelling of Operational Risk: Experience with the Analysis of the Data Collected by the Basel Committee,
        Temi di Discussione No. 517, Banca d’Italia.
  8.    P. de Fontnouvelle, E. Rosengren, J. Jordan (2007), Implications of Alternative Operational Risk Modeling Techniques, In: M. Carey
        and R.M. Stulz (eds), The Risks of Financial Institutions, University of Chicago Press, pp. 475-512.
  9.    S.A. Klugman, H.H. Panjer, G.E. Willmot (2008), Loss Models: From Data to Decisions, 3rd ed., Wiley & Sons, Hoboken, NJ
  10.   S. Nadarajah, S. Kotz (2006), R Programs for Computing Truncated Distributions, Journal of Statistical Software 16(2)




  19

Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR

  • 1. NORTHERN TRUST Operational Risk Quantification System Northern Trust Corporation June 28th 2012 Achieving High-Performing, Simulation-Based Operational Risk Measurement with R and RevoScaleR Presented by David Humke, Vice President, Corporate Risk Analytics and Insurance, Northern Trust Disclaimer: The views expressed in this presentation are the views of the author and do not necessarily reflect the opinions of Northern Trust Corporation or Revolution Analytics ©2012.
  • 2. Agenda  Basel II Overview  Operational Risk – Definition  Requirements of an Operational Risk Exposure Estimate  Loss Distribution Approach (LDA)  Segmenting Loss Data into Units of Measure  Literature on Frequency and Severity Modeling  Monte Carlo Simulation within an LDA based Operational Risk Exposure Model  Potential solutions for faster Monte Carlo Simulation  Description of the test environments utilized  Results from various methods of enhancement 2
  • 3. Operational Risk Loss Events in the News  Barings Bank (1995) –$1.3 billion loss due to speculative trading performed by currency trader Nick Leeson. This loss ultimately lead to the collapse of the bank.  Societe Generale (2008) - $7 billion loss based on the fraudulent activities of rogue futures trader Jerome Kerviel  DBS Bank, Ltd. (2010) - $310 million penalty imposed by the Monetary Authority of Singapore due to a seven hour system-wide outage that left customers unable to use mobile, internet, and ATM services. Additionally, customers were not able to make any debit or credit card transactions during the outage.  Citibank (2011) - $285 million settlement related to a failure to disclose to investors its role in the asset-selection process for a hybrid Collateralized Debt Obligation the bank offered.  Multiple Banks (2012) - $25 billion in settlements and penalties regarding five large lenders’ improper foreclosure practices between January 2008 and December 2011. Note: Details for each of the events above were obtained from the Algorithmic’s FIRST database. 3
  • 4. Basel II and Operational Risk  In December of 2007, the US Federal Reserve System finalized a document commonly referred to as the “Final Rules” which set forth general requirements for the measurement of operational risk by large US financial institutions.1  These rules defined operational risk as the risk of loss resulting from inadequate or failed internal processes, people, and systems or from external events (including legal risk but excluding strategic and reputational risk)  Seven Distinct Basel Loss Event Types2: 1. Internal Fraud 2. External Fraud 5. Damage to Physical Assets 3. Business Disruptions/System Failure 6. Clients, Products, and Business Practice Matters 4. Execution, Delivery and Process Management 7. Employee Practices and Workplace Safety Issues.  Other classifications are available to describe losses as defined by regulators and banks  E.g. - Business Lines, Regions of Operations, Causal of Loss Category, etc  The Final Rules require banks to estimate an operational risk exposure amount that corresponds to the 99.9th percentile of the distribution of potential aggregate operational losses, as generated by the bank’s operational risk quantification system over a one- year horizon. Exposure estimates must: a) Incorporate four data elements: Internal Loss Data, External Loss Data, Scenario Analysis Data, and Business Environment/Internal Control Factor data. b) Be calculated using systematic, transparent, verifiable, and credible methodologies c) A separate exposure estimate must be calculated for each set of operational loss data demonstrating a statistically distinct loss profile.  The banking industry has focused on the use of the Loss Distribution Approach (LDA) to calculate operational risk exposure estimates. 1.Risk-Based Capital Standards: Advanced Capital Adequacy Framework – Basel II; Final Rule (2007), Federal Register 72(235), 69407 – 408. 
Also, see Operational Risk – Supervisory Guidelines for the Advanced Measurement Approaches; BIS June 2011. 2. See the Appendix for examples of each loss event type 4
  • 5. Overview of the Loss Distribution Approach (LDA)  Under the LDA, banks must segment their loss Segment Loss Data data to obtain datasets that are not demonstrably into Homogenous heterogeneous. Loss Datasets  These datasets are referred to as units of measure or UOMs Loss Distribution Approach  These datasets are used for subsequent modeling within the LDA Frequency  The LDA models two primary components of Distribution operational loss data: λ  Loss Frequency Aggregate Loss # of loss events per year Distribution  The banking industry has widely accepted a Poisson distribution as an appropriate distribution. Internal and/or Monte Carlo  Loss Severity Simulation External  Fitting a parametric distribution to operational loss Loss data is one of the biggest challenges in measuring Data operational risk exposure. Severity Distribution Operational Risk Exposure  Monte Carlo Simulation is then utilized to compound the is estimated as the 99.9th two distributions. percentile of the aggregate loss distribution; a 1/1,000  A large number of simulations must be run to observe a sufficient $ value of loss event year event * number of losses to reasonably assess what a 1 in 1,000 year event might look like…More on this shortly * Banks typically sum the VaR estimates for their UOMs and perform diversification modeling to move away from the assumption of positive correlation. 5
  • 6. Segmenting Loss Data into Units of Measure  Banks must segment their loss data to obtain datasets that are not demonstrably heterogeneous.  Which data classifications captured in the bank’s operational loss database best characterize the bank’s operational risk exposure? Banks often capture a variety of details about individual loss events – E.g. Region of occurrence, Business Line, Basel Event Type  How granular should classification be?  Once an appropriate set of classifying variables has been identified, a natural starting point to narrow in on homogenous datasets is to look at loss frequency and loss severity within the identified variables Example Data: Basel Loss Event Types  In this example, the bank has Loss Counts by Business CPBP BDSF IF EPWS EF DPA EDPM Total determined that 4 Business Lines Line and Event Type and the 7 Basel Loss Event Types are Commercial Banking 20 50 - 20 550 50 - 690 in % of Total 2.90% 7.25% 0.00% 2.90% 79.71% 7.25% 0.00% 100.00% a reasonable representation of Payment and Settlement in % of Total 10 1.27% 30 3.80% 1.27% 10 15 1.90% 440 260 55.70% 32.91% 25 790 3.16% 100.00% operational risk exposure. Agency Services 80 130 10 30 850 10 5 1,115 in % of Total 7.17% 11.66% 0.90% 2.69% 76.23% 0.90% 0.45% 100.00%  Not every business line has a large Other - 5 5 30 10 5 5 60 number of loss events. 
in % of Total 0.00% 8.33% 8.33% 50.00% 16.67% 8.33% 8.33% 100.00% Total 110 215 25 95 1,850 325 35 2,655  Within a business line, not every in % of Total 4.14% 8.10% 0.94% 3.58% 69.68% 12.24% 1.32% 100.00% Basel Event Type classification level Basel Loss Event Types has a large number of data Loss Amounts by Business Line and Event Type (in $ MM) CPBP BDSF IF EPWS EF DPA EDPM Total  Basel Loss Event Type: Commercial Banking $ 3.00 $ 18.00 $ 1.00 $ 10.00 $ 90.00 $ 1.00 $ - $ 123.00  BDSF and EF look distinct in % of Total 2.44% 14.63% 0.81% 8.13% 73.17% 0.81% 0.00% 100.00% Payment and Settlement $ 1.00 $ 7.00 $ 1.00 $ 4.00 $ 70.00 $ 25.00 $ 25.00 $ 133.00 in % of Total 0.75% 5.26% 0.75% 3.01% 52.63% 18.80% 18.80% 100.00%  Business Line: Agency Services $ 6.00 $ 240.00 $ 3.00 $ 2.00 $ 225.00 $ 5.00 $ 150.00 $ 631.00 in % of Total 0.95% 38.03% 0.48% 0.32% 35.66% 0.79% 23.77% 100.00%  All 4 Business Lines might be distinct Other $ - $ 3.00 $ 1.00 $ 15.00 $ 1.00 $ 3.00 $ 10.00 $ 33.00 in % of Total Total $ 0.00% 10.00 $ 9.09% 268.00 $ 3.03% 6.00 $ 45.45% 31.00 $ 3.03% 386.00 $ 9.09% 34.00 $ 30.30% 185.00 $ 100.00% 920.00  Additional testing is required to in % of Total 1.09% 29.13% 0.65% 3.37% 41.96% 3.70% 20.11% 100.00% identify homogenous datasets Note: The data above are fiction, created for this example 6
  • 7. Using R to Identify a Homogenous Loss Dataset  R can produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether loss data should be merged (homogenous) or separated (heterogeneous). Example: Is Business Disruptions & Systems Failure Loss Event Type statistically distinct from the Commercial Banking Business Line? Quantiles Datasets Count Mean(Log) SD(Log) 50.0% 75.0% 90.0% 95.0% 98.0% 99.0% 99.5% Max Commercial Banking 640 9.40 1.08 $ 7.89 $ 16.96 $ 65.39 $ 120.79 $ 281.52 $ 423.09 $ 777.20 $ 3,376.89 Business Disruptions & Systems Failure 215 11.55 2.08 $ 74.94 $ 424.44 $ 1,520.39 $ 5,904.35 $ 12,743.90 $ 17,255.26 $ 19,750.42 $ 19,954.35 - Quantiles in $ Thousands Test Statistic pValue Kolmogorov-Smirnov 0.53 0 Chi- Square 270.57 3.36674E-50 Anderson Darling 151.16 0 Conclusion: The preponderance of evidence suggests that Commercial Banking and BDSF are statistically distinct. A separate risk exposure estimate should be calculated for each of these datasets. 7
  • 8. Using R to Identify Homogenous Loss Datasets  R offers the capability to produce a variety of descriptive statistics, graphics, and hypothesis tests that are useful to evaluate whether loss data should be merged (homogenous) or separated (heterogeneous). Example: Is Other Business Line statistically distinct from the Commercial Banking Business Line? Quantiles Datasets Count Mean(Log) SD(Log) 50.0% 75.0% 90.0% 95.0% 98.0% 99.0% 99.5% Max Commercial Banking 640 9.40 1.08 $ 7.81 $ 17.61 $ 51.60 $ 106.22 $ 390.62 $ 617.59 $ 1,162.25 $ 3,142.99 Other 55 9.32 1.02 $ 7.70 $ 14.58 $ 40.38 $ 91.85 $ 138.79 $ 434.62 $ 607.40 $ 780.18 - Quantiles in $ Thousands Test Statistic pValue Kolmogorov-Smirnov 0.061 0.992 Chi- Square 6.580 0.884 Anderson Darling -0.998 0.627 Conclusion: The preponderance of evidence suggests that we cannot conclude the Commercial Banking and ‘Other’ Business Lines are statistically distinct. These data can be aggregated into a single data set for frequency and severity modeling. If a business rationale exists to keep these data sets separate, banks may do so. 8
  • 9. Frequency Distribution Fitting in R Fitting a Frequency Distribution:  The banking industry has focused on the use of a Poisson distribution to model the frequency of operational loss events.  The Poisson is parameterized by one parameter, λ, which is equivalent to the average frequency over the time horizon being estimated (1 year).  Various methods are used to parameterize the Poisson distribution Simple Annual Average Bank identified internal and Regression Analysis based on internal/external variables – See the function lm() in R external data characteristics might Poisson Regression based on internal/external variables – See the function glm() in R help explain operational loss frequency Commercial Banking Year Loss Counts 2005 76  Once a parameter estimate, λ ,has been identified, 2006 82 obtaining the density, distribution function, quantile 2007 94 function and random generation for the Poisson 2008 64 distribution is quite easy: 2009 90 2010 103  See dpois, ppois, rpois in R for more details. 2011 96 2012 85 Total 690 Average 86.25 9
  • 10. Fitting a Severity Distribution
      Fitting a Severity Distribution:
      - Many authors have published overviews of the process for severity distribution fitting within the context of an LDA model*.
      - The industry currently practices a variety of loss severity modeling techniques:
          - Fitting a single parametric distribution to the entire dataset (e.g., lognormal, Pareto, log-gamma, Weibull)
          - Fitting a mixture of parametric distributions to the loss severity data
          - Fitting multiple parametric distributions that have non-overlapping ranges ("splicing")
          - Extreme Value Theory (EVT) and the Peaks Over Thresholds method
      - Challenges associated with fitting a severity distribution include:
          1. The Final Rule asks banks to estimate a 1 in 1,000 year event based on fewer than 15 years of operational loss data.
          2. Data collection thresholds: should shifted distributions or truncated distributions be used?
          3. Operational loss databases are often "living": loss severities, loss data classifications, and risk types can be modified.
          4. Data paucity: in many cases banks have units of measure with a small number of observations (< 1,000).
          5. Undetected heterogeneity of datasets: tests performed to identify heterogeneous datasets are not perfect at doing so, and small data sets can impede this effort.
          6. Fat-tailed data: banks are faced with UOMs that have a small number of observations which are often best described by a heavily skewed distribution. Limited data in the tail can result in volatile capital estimates (e.g., capital can swing upwards or downwards by hundreds of millions of dollars) based on the inclusion of a few events. Volatile results can present subsequent challenges for obtaining senior management buy-in on risk exposure estimates.
      * Please see the references slide at the end of this presentation for a short list of books and papers that provide additional detail on operational risk modeling.
  • 11. Fitting a Severity Distribution in R
      Fitting a Severity Distribution:
      - A variety of optimization routines exist in R that are capable of fitting a severity distribution to loss data.
      - Using optim() in R, one needs to specify:
          1. Objective function (the negative log-likelihood): -sum(densityFunction(x=data, log=TRUE))
          2. Starting parameters: contingent upon the distribution being fit
          3. Optimization routine: Nelder-Mead, BFGS, SANN, etc.
      - See B. Bolker for more on optimization routines in R beyond the optim() function.
      - Fitting truncated severity distributions:
          - The actuar package provides density, distribution, and quantile functions as well as random number generators for fat-tailed distributions.
          - See Nadarajah and Kotz for code that will facilitate the construction of a truncated density, distribution function, quantile function, and random number generator.
      - Identifying a "best-fit" severity distribution to the loss data:
          - QQ-plot of the empirical data against the fitted distributions: plot(), qqplot()
          - Plot the empirical CDF against the fitted distribution: ecdf()
          - See the truncgof R package and A. Chernobai, S. T. Rachev, F. J. Fabozzi for goodness-of-fit tests and some adjusted exploratory tools that work with left-truncated data.
      - Many packages exist that perform EVT severity distribution fitting:
          - See A. J. McNeil, R. Frey, P. Embrechts and the evir package in R.
      - Fitting and evaluating mixture distributions are more complex endeavors:
          - See the GAMLSS package in R and http://www.gamlss.org/
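The three optim() ingredients listed on this slide can be illustrated with a left-truncated lognormal fit in base R. This is a sketch under stated assumptions: the losses are simulated, the collection threshold of 10,000 is hypothetical, and the truncated likelihood is written out by hand rather than taken from the Nadarajah and Kotz code or the actuar package.

```r
# Hypothetical data: lognormal losses, recorded only above a threshold.
set.seed(123)
threshold <- 10000                                   # assumed threshold
raw       <- rlnorm(5000, meanlog = 9.4, sdlog = 1.1)
losses    <- raw[raw > threshold]

# 1. Objective: negative log-likelihood of a lognormal left-truncated at a.
#    Truncated density f(x) / (1 - F(a)), so on the log scale we subtract
#    the log survival probability at the threshold.
neg_loglik <- function(par, x, a) {
  mu    <- par[1]
  sigma <- exp(par[2])          # optimise log(sigma) so sigma stays positive
  log_dens <- dlnorm(x, meanlog = mu, sdlog = sigma, log = TRUE)
  log_surv <- plnorm(a, meanlog = mu, sdlog = sigma,
                     lower.tail = FALSE, log.p = TRUE)
  -sum(log_dens - log_surv)
}

# 2. Starting parameters: naive moments of the observed log-losses.
start <- c(mean(log(losses)), log(sd(log(losses))))

# 3. Optimization routine: Nelder-Mead (the optim() default).
fit <- optim(start, neg_loglik, x = losses, a = threshold,
             method = "Nelder-Mead")

est <- c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))
print(est)
fit$convergence   # 0 indicates that optim() reports successful convergence
```

Ignoring the truncation and fitting a plain lognormal to `losses` would bias `meanlog` upward, which is one reason the threshold question appears among the severity-fitting challenges.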
  • 12. Overview of the Loss Distribution Approach (LDA)
      - Thus far we have discussed:
          - Segmentation of loss data to obtain datasets that are not demonstrably heterogeneous
          - Loss frequency modeling
          - Loss severity modeling
      - We have not yet discussed Monte Carlo simulation:
          - Many simulations containing millions of iterations must be run to observe a sufficient number of losses to reasonably assess what a 1 in 1,000 year event might look like.
          - This can result in multiple days being lost waiting for code to complete.
      - Northern Trust explored opportunities to parallelize Monte Carlo simulation with Revolution Analytics.

      [Diagram: Loss Distribution Approach. Elements: Internal and/or External Loss Data; Segment Loss Data into Homogenous Loss Datasets; Frequency Distribution (λ, # of loss events per year); Severity Distribution ($ value of loss event); Monte Carlo Simulation; Aggregate Loss Distribution. Operational Risk Exposure is estimated as the 99.9th percentile of the aggregate loss distribution; a 1/1,000 year event.]

      Example Code:
        # Randomly draw n frequency observations from a Poisson distribution,
        # then draw random severities from the specified truncated severity
        # distribution, truncated at point a. Sum up the individual loss amounts.
        f_tr <- function() {
          sum(do.call("rtrunc", c(n=rpois(1, lambda), spec=distName, a=a, parList)))
        }
        # Simulate a large number of iterations and replicate the simulation
        # a number of times to reduce sample noise
        simuMatrix <- replicate(30, replicate(1e+6, f_tr()))
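The slide's example depends on objects that are not shown (`lambda`, `distName`, `a`, `parList`, and an `rtrunc` function along the lines of Nadarajah and Kotz). The sketch below is a self-contained base-R variant under assumed inputs: λ is the average from the frequency slide, while the lognormal parameters, the threshold, and the helper names `rtrunc_lnorm` and `simulate_year` are hypothetical. It also uses far fewer iterations than the slide so that it runs in seconds.

```r
set.seed(2012)

lambda  <- 86.25    # annual frequency (average from the frequency slide)
meanlog <- 9.4      # assumed severity parameter
sdlog   <- 1.1      # assumed severity parameter
a       <- 10000    # assumed left-truncation point

# Inverse-CDF sampler for a lognormal left-truncated at a:
# draw uniforms on (F(a), 1) and map them through the quantile function.
rtrunc_lnorm <- function(n, meanlog, sdlog, a) {
  p_a <- plnorm(a, meanlog, sdlog)
  qlnorm(runif(n, min = p_a, max = 1), meanlog, sdlog)
}

# One simulated year: Poisson number of losses, summed severities.
simulate_year <- function() {
  n <- rpois(1, lambda)
  sum(rtrunc_lnorm(n, meanlog, sdlog, a))
}

# Aggregate loss distribution (1e4 iterations here versus 1e6 on the slide).
n_iter        <- 1e4
annual_losses <- replicate(n_iter, simulate_year())

# Exposure estimate: 99.9th percentile of the aggregate loss distribution.
exposure <- quantile(annual_losses, probs = 0.999)
print(exposure)
```

With only 10,000 iterations, roughly ten simulated years lie above the 99.9th percentile, so the estimate is noisy; that sampling noise is what the slide's millions of iterations and 30 replications are meant to reduce.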
  • 13. Monte Carlo Simulation Benchmarking Analysis
      - Northern Trust and Revolution Analytics evaluated various methods to enhance Monte Carlo simulation:
          - Use a different build of R: 32-bit or 64-bit (e.g., update your operating system)
          - Use various parallelization packages: doSNOW, doRSR, and doSMP (doRSR and doSMP are Revolution Analytics product offerings)
          - Use multiple processors and/or machines:
              - Single node with multiple cores
              - Cluster of CPUs with multiple cores
      - Hardware environments:
          - 4-core laptop
          - 3-node High Performing Cluster (HPC) on Amazon Cloud
              - Configured and run with 8 cores on each node
              - Each node was restricted from 16 to 8 cores
      - Metrics used to evaluate each method:
          - Elapsed time by step
          - Memory usage
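To give a sense of what parallelizing the simulation involves, here is a minimal sketch that uses the `parallel` package bundled with R. This is a substitute for illustration only: it is not one of the packages benchmarked in the study (doSNOW, doRSR, doSMP), and the severity parameters, threshold, worker count, and the function name `simulate_block` are assumptions.

```r
library(parallel)

lambda  <- 86.25    # annual frequency (average from the frequency slide)
meanlog <- 9.4      # assumed severity parameter
sdlog   <- 1.1      # assumed severity parameter
a       <- 10000    # assumed left-truncation point

# Simulate a block of years on one worker. All inputs are passed as
# arguments so the function does not rely on variables in the master session.
simulate_block <- function(n_iter, lambda, meanlog, sdlog, a) {
  p_a <- plnorm(a, meanlog, sdlog)
  vapply(seq_len(n_iter), function(i) {
    n <- rpois(1, lambda)
    sum(qlnorm(runif(n, min = p_a, max = 1), meanlog, sdlog))
  }, numeric(1))
}

n_workers <- 2                          # assumed; match to available cores
cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, iseed = 2012)   # independent RNG stream per worker

# Each worker simulates 5,000 years; timing covers the parallel step only.
elapsed <- system.time(
  blocks <- parLapply(cl, rep(5000, n_workers), simulate_block,
                      lambda = lambda, meanlog = meanlog,
                      sdlog = sdlog, a = a)
)
stopCluster(cl)

annual_losses <- unlist(blocks)
print(elapsed["elapsed"])
print(quantile(annual_losses, probs = 0.999))
```

Because the simulated years are independent, the work splits cleanly across workers; `clusterSetRNGStream()` is used so that the workers do not draw overlapping random number sequences.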
  • 14. Monte Carlo Benchmarking Highlights
      - Revolution Analytics' parallelization can be easily scaled up from a laptop/server to the cluster using Revolution Analytics' distributed computing capabilities.
      - Parallelization greatly improves simulation performance:
          - Elapsed time is linear in the number of iterations
          - Performance improves with the number of cores
          - Revo ~ CRAN within a node (no MKL impact in this study)
          - doRSR slightly better than doSMP on a single server
          - 64-bit marginally better than 32-bit
          - Performance scales with cluster resources
      - Memory use is driven just by the number of iterations.
      [Chart annotations: 64-bit is better; doRSR ~ doSMP within a node; Memory Trends; Scales with # Cores]
  • 15. Take-Aways, Next Steps, and Contacts
      Parallelization Offers Business Enhancements:
      - Less time spent waiting on programs to complete
          - Means more time to analyze drivers of change (e.g., underlying data changes)
      - More efficient management of computing resources
          - No need to manually manage/schedule programs
      - Scalability of the solution to available resources
          - Revolution Analytics' parallelization routines are scalable to the resources available
      Contact Information:
      - Dave Humke, Northern Trust, Vice President (dh98@ntrs.com)
      - Derek Norton, Revolution Analytics (derek.norton@revolutionanalytics.com)
  • 16. Appendix – Basel Loss Event Type Definition
      Event Type Category (Level 1): Internal Fraud
      Definition: Loss due to acts of a type intended to defraud, misappropriate property or circumvent regulations, the law or company policy, excluding diversity / discrimination events, which involves at least one internal party.
      Categories (Level 2) and Activity Examples (Level 3):
        - Unauthorized Activity: Transactions not reported (intentional); Transaction type unauthorized (with monetary loss); Mismarking of position (intentional)
        - Theft and Fraud: Fraud / credit fraud / worthless deposits; Theft / extortion / embezzlement / robbery; Misappropriation of assets; Forgery; Check kiting; Smuggling; Account take-over / impersonation, etc.; Tax non-compliance / evasion (willful); Bribes / kickbacks; Insider trading (not on firm's account)

      Event Type Category (Level 1): External Fraud
      Definition: Losses due to acts of a type intended to defraud, misappropriate property or circumvent the law, by a third party.
      Categories (Level 2) and Activity Examples (Level 3):
        - Theft and Fraud: Theft / robbery; Forgery; Check kiting
        - Systems Security: Hacking damage; Theft of information (with monetary loss)

      Event Type Category (Level 1): Employment Practices and Workplace Safety
      Definition: Losses arising from acts inconsistent with employment, health or safety laws or agreements, from payment of personal injury claims, or from diversity / discrimination events.
      Categories (Level 2) and Activity Examples (Level 3):
        - Employee Relations: Compensation, benefit, termination issues; Organized labor activities
        - Safe Environment: General liability (slips and falls, etc.); Employee health & safety rules and events; Workers compensation
        - Diversity & Discrimination: All discrimination types
  • 17. Appendix – Basel Loss Event Type Definitions (Continued)
      Event Type Category (Level 1): Clients, Products & Business Practice
      Definition: Losses arising from an unintentional or negligent failure to meet a professional obligation to specific clients (including fiduciary and suitability requirements), or from the nature or design of a product.
      Categories (Level 2) and Activity Examples (Level 3):
        - Suitability, Disclosure & Fiduciary: Fiduciary breaches / guideline violations; Suitability / disclosure issues (KYC, etc.); Retail consumer disclosure violations; Breach of privacy; Aggressive sales; Account churning; Misuse of confidential information; Lender liability
        - Improper Business or Market Practices: Antitrust; Improper trade / market practice; Market manipulation; Insider trading (on firm's account); Unlicensed activity; Money laundering
        - Product Flaws: Product defects (unauthorized, etc.); Model errors
        - Selection, Sponsorship & Exposure: Failure to investigate client per guidelines; Exceeding client exposure limits
        - Advisory Activities: Disputes over performance of advisory activities

      Event Type Category (Level 1): Damage to Physical Assets
      Definition: Losses arising from loss or damage to physical assets from natural disaster or other events.
      Categories (Level 2) and Activity Examples (Level 3):
        - Disasters and Other Events: Natural disaster losses; Human losses from external sources (terrorism, vandalism)

      Event Type Category (Level 1): Business Disruption & Systems Failures
      Definition: Losses arising from disruption of business or system failures.
      Categories (Level 2) and Activity Examples (Level 3):
        - Systems: Hardware; Software; Telecommunications; Utility outage / disruptions
  • 18. Appendix – Basel Loss Event Type Definitions (Continued)
      Event Type Category (Level 1): Execution, Delivery & Process Management
      Definition: Losses from failed transaction processing or process management, from relations with trade counterparties and vendors.
      Categories (Level 2) and Activity Examples (Level 3):
        - Transaction Capture, Execution & Maintenance: Miscommunication; Data entry, maintenance or loading error; Missed deadline or responsibility; Model / system misoperation; Accounting error / entity attribution error; Other task misperformance; Delivery failure; Collateral management failure; Reference data maintenance
        - Monitoring & Reporting: Failed mandatory reporting obligation; Inaccurate external report (loss incurred)
        - Customer Intake & Documentation: Client permissions / disclaimers missed; Legal documents missing / incomplete
        - Customer / Client Account Management: Unapproved access given to accounts; Incorrect client records (loss incurred); Negligent loss or damage of client assets
        - Trade Counterparties: Non-client counterparty misperformance; Misc. non-client counterparty disputes
        - Vendors & Suppliers: Outsourcing; Vendor disputes
  • 19. Appendix - References
      References on Loss Distribution Approach Modeling, Frequency and Severity Fitting, and Monte Carlo Simulation:
      1. A. Chernobai, S. T. Rachev, F. J. Fabozzi (2005), Composite Goodness-of-Fit Tests for Left-Truncated Samples, Technical report, University of California Santa Barbara.
      2. A. J. McNeil, R. Frey, P. Embrechts (2005), Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University Press, Princeton.
      3. B. Bolker (2007), Optimization and All That, Draft of Chapter 7 of B. Bolker (2008), Ecological Models and Data in R, Princeton University Press, Princeton.
      4. G. J. McLachlan, D. Peel (2000), Finite Mixture Models, Wiley & Sons, New York.
      5. H. Panjer (2006), Operational Risk: Modeling Analytics, Wiley & Sons, New York, p. 293.
      6. K. Dutta, J. Perry (2006), A Tale of Tails: An Empirical Analysis of Loss Distribution Models for Estimating Operational Risk Capital, Working Paper No. 06-13, Federal Reserve Bank of Boston.
      7. M. Moscadelli (2004), The Modelling of Operational Risk: Experience with the Analysis of the Data Collected by the Basel Committee, Temi di Discussione No. 517, Banca d'Italia.
      8. P. de Fontnouvelle, E. Rosengren, J. Jordan (2007), Implications of Alternative Operational Risk Modeling Techniques, In: M. Carey and R. M. Stulz (eds), The Risks of Financial Institutions, University of Chicago Press, pp. 475-512.
      9. S. A. Klugman, H. H. Panjer, G. E. Willmot (2008), Loss Models: From Data to Decisions, 3rd ed., Wiley & Sons, Hoboken, NJ.
      10. S. Nadarajah, S. Kotz (2006), R Programs for Computing Truncated Distributions, Journal of Statistical Software 16(2).