SlideShare ist ein Scribd-Unternehmen logo
1 von 40
1© Cloudera, Inc. All rights reserved.
Estimating Financial Risk with
Spark
Sandy Ryza | Senior Data Scientist
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/spark-financial-modeling
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
2© Cloudera, Inc. All rights reserved.
3© Cloudera, Inc. All rights reserved.
4© Cloudera, Inc. All rights reserved.
In reasonable
circumstances, what’s
the most you can expect
to lose?
5© Cloudera, Inc. All rights reserved.
6© Cloudera, Inc. All rights reserved.
def valueAtRisk(
portfolio,
timePeriod,
pValue
): Double = { ... }
7© Cloudera, Inc. All rights reserved.
def valueAtRisk(
portfolio,
2 weeks,
0.05
) = $1,000,000
8© Cloudera, Inc. All rights reserved.
Probability
density
Portfolio return ($) over the time
period
9© Cloudera, Inc. All rights reserved.
VaR estimation approaches
• Variance-covariance
• Historical
• Monte Carlo
10© Cloudera, Inc. All rights reserved.
11© Cloudera, Inc. All rights reserved.
12© Cloudera, Inc. All rights reserved.
Market Risk Factors
• Indexes (S&P 500, NASDAQ)
• Prices of commodities
• Currency exchange rates
• Treasury bonds
13© Cloudera, Inc. All rights reserved.
Predicting Instrument Returns from Factor Returns
• Train a linear model on the factors for each instrument
14© Cloudera, Inc. All rights reserved.
Fancier
• Add features that are non-linear transformations of the market risk factors
• Decision trees
• For options, use Black-Scholes
15© Cloudera, Inc. All rights reserved.
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
// Load the instruments and factors
val factorReturns: Array[Array[Double]] = ...
val instrumentReturns: RDD[Array[Double]] = ...
// Fit a model to each instrument
val models: Array[Array[Double]] =
instrumentReturns.map { instrument =>
val regression = new OLSMultipleLinearRegression()
regression.newSampleData(instrument, factorReturns)
regression.estimateRegressionParameters()
}.collect()
16© Cloudera, Inc. All rights reserved.
How to sample factor returns?
• Need to be able to generate sample vectors where each component is a factor
return.
• Factors returns are usually correlated.
17© Cloudera, Inc. All rights reserved.
Distribution of US treasury bond two-week returns
18© Cloudera, Inc. All rights reserved.
Distribution of crude oil two-week returns
19© Cloudera, Inc. All rights reserved.
The Multivariate Normal Distribution
• Probability distribution over vectors of length N
• Given all the variables but one, that variable is distributed according to a univariate
normal distribution
• Models correlations between variables
20© Cloudera, Inc. All rights reserved.
21© Cloudera, Inc. All rights reserved.
import org.apache.commons.math3.stat.correlation.Covariance
// Compute means
val factorMeans: Array[Double] = transpose(factorReturns)
.map(factor => factor.sum / factor.size)
// Compute covariances
val factorCovs: Array[Array[Double]] = new Covariance(factorReturns)
.getCovarianceMatrix().getData()
22© Cloudera, Inc. All rights reserved.
Fancier
• Multivariate normal often a poor choice compared to more sophisticated options
• Fatter tails: Multivariate T Distribution
• Filtered historical simulation
• ARMA
• GARCH
23© Cloudera, Inc. All rights reserved.
Running the simulations
• Create an RDD of seeds
• Use each seed to generate a set of simulations
• Aggregate results
24© Cloudera, Inc. All rights reserved.
// Broadcast the factor return -> instrument return models
val bModels = sc.broadcast(models)
// Generate a seed for each task
val seeds = (baseSeed until baseSeed + parallelism)
val seedRdd = sc.parallelize(seeds, parallelism)
// Create an RDD of trials
val trialReturns: RDD[Double] = seedRdd.flatMap { seed =>
trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs)
}
25© Cloudera, Inc. All rights reserved.
def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = {
val trialFactorReturns = factorDist.sample()
var totalReturn = 0.0
for (model <- models) {
// Add the returns from the instrument to the total trial return
for (i <- until trialFactorsReturns.length) {
totalReturn += trialFactorReturns(i) * model(i)
}
}
totalReturn
}
26© Cloudera, Inc. All rights reserved.
Executor
Executor
Time
Model
parameters
Model
parameters
27© Cloudera, Inc. All rights reserved.
Executor
Task Task
Trial Trial TrialTrial
Task Task
Trial Trial Trial Trial
Executor
Task Task
Trial Trial TrialTrial
Task Task
Trial Trial Trial Trial
Time
Model
parameters
Model
parameters
28© Cloudera, Inc. All rights reserved.
// Cache for reuse
trialReturns.cache()
val numTrialReturns = trialReturns.count().toInt
// Compute value at risk
val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last
// Compute expected shortfall
val expectedShortfall =
trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
29© Cloudera, Inc. All rights reserved.
30© Cloudera, Inc. All rights reserved.
VaR
31© Cloudera, Inc. All rights reserved.
So why Spark?
32© Cloudera, Inc. All rights reserved.
Ease of use
• Parallel computing for 5-year olds
• Scala, Python, and R REPLs
33© Cloudera, Inc. All rights reserved.
• Cleaning data
• Fitting models
• Running simulations
• Storing results
• Analyzing results
Single platform for
34© Cloudera, Inc. All rights reserved.
But it’s CPU-bound and we’re using Java?
• Computational bottlenecks are normally in matrix operations, which can be BLAS-
ified
• Can call out to GPUs just like in C++
• Memory access patterns aren’t high-GC inducing
35© Cloudera, Inc. All rights reserved.
Want to do this yourself?
36© Cloudera, Inc. All rights reserved.
spark-timeseries
• https://github.com/cloudera/spark-timeseries
• Everything here + some fancier stuff
• Patches welcome!
37© Cloudera, Inc. All rights reserved.
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations/
spark-financial-modeling

Weitere ähnliche Inhalte

Andere mochten auch

The monte carlo method
The monte carlo methodThe monte carlo method
The monte carlo method
Saurabh Sood
 
Monte Carlo Simulations
Monte Carlo SimulationsMonte Carlo Simulations
Monte Carlo Simulations
gfbreaux
 

Andere mochten auch (7)

Estimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache SparkEstimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache Spark
 
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock
 
The monte carlo method
The monte carlo methodThe monte carlo method
The monte carlo method
 
Improving Forecasts with Monte Carlo Simulations
Improving Forecasts with Monte Carlo Simulations  Improving Forecasts with Monte Carlo Simulations
Improving Forecasts with Monte Carlo Simulations
 
Monte Carlo Simulations
Monte Carlo SimulationsMonte Carlo Simulations
Monte Carlo Simulations
 
How to perform a Monte Carlo simulation
How to perform a Monte Carlo simulation How to perform a Monte Carlo simulation
How to perform a Monte Carlo simulation
 

Ähnlich wie Financial Modeling with Apache Spark: Calculating Value at Risk

Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Altair
 
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
yishengxi
 

Ähnlich wie Financial Modeling with Apache Spark: Calculating Value at Risk (20)

MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
 
Microservices Chaos Testing at Jet
Microservices Chaos Testing at JetMicroservices Chaos Testing at Jet
Microservices Chaos Testing at Jet
 
[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud
 
MySQL Performance Schema : fossasia
MySQL Performance Schema : fossasiaMySQL Performance Schema : fossasia
MySQL Performance Schema : fossasia
 
Patterns & Practices for Cloud-based Microservices
Patterns & Practices for Cloud-based MicroservicesPatterns & Practices for Cloud-based Microservices
Patterns & Practices for Cloud-based Microservices
 
Academic and industry research collaboration – the Mathworks suite
Academic and industry research collaboration – the Mathworks suiteAcademic and industry research collaboration – the Mathworks suite
Academic and industry research collaboration – the Mathworks suite
 
Puppet on a string
Puppet on a stringPuppet on a string
Puppet on a string
 
Load Testing: See a Bigger Picture
Load Testing: See a Bigger PictureLoad Testing: See a Bigger Picture
Load Testing: See a Bigger Picture
 
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
 
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
 
Create Agile, Automated and Predictable IT Infrastructure in the Cloud
Create Agile, Automated and Predictable IT Infrastructure in the CloudCreate Agile, Automated and Predictable IT Infrastructure in the Cloud
Create Agile, Automated and Predictable IT Infrastructure in the Cloud
 
DevOps Roadshow - cloud services for development
DevOps Roadshow - cloud services for developmentDevOps Roadshow - cloud services for development
DevOps Roadshow - cloud services for development
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
 
Mechatronics engineer
Mechatronics engineerMechatronics engineer
Mechatronics engineer
 
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
 
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
 
Rich Assad Resume
Rich Assad ResumeRich Assad Resume
Rich Assad Resume
 
Deep Learning - Continuous Operations
Deep Learning - Continuous Operations Deep Learning - Continuous Operations
Deep Learning - Continuous Operations
 

Mehr von C4Media

Mehr von C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Financial Modeling with Apache Spark: Calculating Value at Risk

  • 1. 1© Cloudera, Inc. All rights reserved. Estimating Financial Risk with Spark Sandy Ryza | Senior Data Scientist
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /spark-financial-modeling
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. 2© Cloudera, Inc. All rights reserved.
  • 5. 3© Cloudera, Inc. All rights reserved.
  • 6. 4© Cloudera, Inc. All rights reserved. In reasonable circumstances, what’s the most you can expect to lose?
  • 7. 5© Cloudera, Inc. All rights reserved.
  • 8. 6© Cloudera, Inc. All rights reserved. def valueAtRisk( portfolio, timePeriod, pValue ): Double = { ... }
  • 9. 7© Cloudera, Inc. All rights reserved. def valueAtRisk( portfolio, 2 weeks, 0.05 ) = $1,000,000
  • 10. 8© Cloudera, Inc. All rights reserved. Probability density Portfolio return ($) over the time period
  • 11. 9© Cloudera, Inc. All rights reserved. VaR estimation approaches • Variance-covariance • Historical • Monte Carlo
  • 12. 10© Cloudera, Inc. All rights reserved.
  • 13. 11© Cloudera, Inc. All rights reserved.
  • 14. 12© Cloudera, Inc. All rights reserved. Market Risk Factors • Indexes (S&P 500, NASDAQ) • Prices of commodities • Currency exchange rates • Treasury bonds
  • 15. 13© Cloudera, Inc. All rights reserved. Predicting Instrument Returns from Factor Returns • Train a linear model on the factors for each instrument
  • 16. 14© Cloudera, Inc. All rights reserved. Fancier • Add features that are non-linear transformations of the market risk factors • Decision trees • For options, use Black-Scholes
  • 17. 15© Cloudera, Inc. All rights reserved. import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression // Load the instruments and factors val factorReturns: Array[Array[Double]] = ... val instrumentReturns: RDD[Array[Double]] = ... // Fit a model to each instrument val models: Array[Array[Double]] = instrumentReturns.map { instrument => val regression = new OLSMultipleLinearRegression() regression.newSampleData(instrument, factorReturns) regression.estimateRegressionParameters() }.collect()
  • 18. 16© Cloudera, Inc. All rights reserved. How to sample factor returns? • Need to be able to generate sample vectors where each component is a factor return. • Factors returns are usually correlated.
  • 19. 17© Cloudera, Inc. All rights reserved. Distribution of US treasury bond two-week returns
  • 20. 18© Cloudera, Inc. All rights reserved. Distribution of crude oil two-week returns
  • 21. 19© Cloudera, Inc. All rights reserved. The Multivariate Normal Distribution • Probability distribution over vectors of length N • Given all the variables but one, that variable is distributed according to a univariate normal distribution • Models correlations between variables
  • 22. 20© Cloudera, Inc. All rights reserved.
  • 23. 21© Cloudera, Inc. All rights reserved. import org.apache.commons.math3.stat.correlation.Covariance // Compute means val factorMeans: Array[Double] = transpose(factorReturns) .map(factor => factor.sum / factor.size) // Compute covariances val factorCovs: Array[Array[Double]] = new Covariance(factorReturns) .getCovarianceMatrix().getData()
  • 24. 22© Cloudera, Inc. All rights reserved. Fancier • Multivariate normal often a poor choice compared to more sophisticated options • Fatter tails: Multivariate T Distribution • Filtered historical simulation • ARMA • GARCH
  • 25. 23© Cloudera, Inc. All rights reserved. Running the simulations • Create an RDD of seeds • Use each seed to generate a set of simulations • Aggregate results
  • 26. 24© Cloudera, Inc. All rights reserved. // Broadcast the factor return -> instrument return models val bModels = sc.broadcast(models) // Generate a seed for each task val seeds = (baseSeed until baseSeed + parallelism) val seedRdd = sc.parallelize(seeds, parallelism) // Create an RDD of trials val trialReturns: RDD[Double] = seedRdd.flatMap { seed => trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs) }
  • 27. 25© Cloudera, Inc. All rights reserved. def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = { val trialFactorReturns = factorDist.sample() var totalReturn = 0.0 for (model <- models) { // Add the returns from the instrument to the total trial return for (i <- until trialFactorsReturns.length) { totalReturn += trialFactorReturns(i) * model(i) } } totalReturn }
  • 28. 26© Cloudera, Inc. All rights reserved. Executor Executor Time Model parameters Model parameters
  • 29. 27© Cloudera, Inc. All rights reserved. Executor Task Task Trial Trial TrialTrial Task Task Trial Trial Trial Trial Executor Task Task Trial Trial TrialTrial Task Task Trial Trial Trial Trial Time Model parameters Model parameters
  • 30. 28© Cloudera, Inc. All rights reserved. // Cache for reuse trialReturns.cache() val numTrialReturns = trialReturns.count().toInt // Compute value at risk val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last // Compute expected shortfall val expectedShortfall = trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
  • 31. 29© Cloudera, Inc. All rights reserved.
  • 32. 30© Cloudera, Inc. All rights reserved. VaR
  • 33. 31© Cloudera, Inc. All rights reserved. So why Spark?
  • 34. 32© Cloudera, Inc. All rights reserved. Ease of use • Parallel computing for 5-year olds • Scala, Python, and R REPLs
  • 35. 33© Cloudera, Inc. All rights reserved. • Cleaning data • Fitting models • Running simulations • Storing results • Analyzing results Single platform for
  • 36. 34© Cloudera, Inc. All rights reserved. But it’s CPU-bound and we’re using Java? • Computational bottlenecks are normally in matrix operations, which can be BLAS- ified • Can call out to GPUs just like in C++ • Memory access patterns aren’t high-GC inducing
  • 37. 35© Cloudera, Inc. All rights reserved. Want to do this yourself?
  • 38. 36© Cloudera, Inc. All rights reserved. spark-timeseries • https://github.com/cloudera/spark-timeseries • Everything here + some fancier stuff • Patches welcome!
  • 39. 37© Cloudera, Inc. All rights reserved.
  • 40. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/ spark-financial-modeling