Real Time Big Data Applications for
Investment Banks & Financial Institutions
Dev Lakhani
• 15 years Software Architecture & Development Experience
• 7 Years of Big Data Experience
• Big Data Architectures for Banks, Telecom, Retail, Media
• Deutsche Telekom
• ASOS
• Tier 1 Investment Banks in Canary Wharf
• Dentsu Aegis
• Contributor to Hadoop, Spark, Tachyon, HBase, Ignite
• uk.linkedin.com/in/devlakhani
• Overview of Big Data in financial
institutions
• Architectural constraints in investment
banking
• Implementation challenges
• Data model
• Future for financial applications
Introduction
• This talk has a technical focus
• This presentation is not representative of any client
• Real time re-definition for Big Data
• Vendor neutral talk
Disclaimers
Real Time Definition
[AS MODIFIER] Computing Relating to a system in which input data is
processed within milliseconds so that it is available virtually immediately as
feedback to the process from which it is coming, e.g. in a missile guidance
system: real-time signal processing; real-time software
http://www.oxforddictionaries.com/definition/english/real-time
Real Time Definition (Modified)
[AS MODIFIER] Computing Relating to a system in which input data is
processed within a guaranteed response time, using up-to-date
(latest version) information and available on demand as feedback to
the process from which it is coming.
Problem Domain
Big Data Drivers for Investment Banking &
Financial Institutions
• Capturing billions of trades
• Quantifying risk and exposure
• Regulatory requirements
• Response to news and events
• Detect fraud, rogue trading and anomalies
• Performing simulations & algorithmic trading
• Business analysis - P&L
• Capital reserves and forecasting
Why Use Big Data?
[Diagram: Service Layer (load balanced / cached), with boxes labelled TRADES, REFERENCE DATA and TRADES]
High Level Architecture
• Disaster avoidance (not recovery) through
replication and redundancy
• High availability
• "Chinese Wall" policy and segmentation of
information
• Within the bank
• External to the bank
• Security & role based segmentation
• Responsiveness and throughput
• API or service based architecture,
transparent to quants/end users
• Data completeness: one lost trade can shift the
VaR estimate by between $1 and $10 million
Constraints
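The data-completeness constraint above lends itself to a mechanical check. The following is a minimal sketch, assuming each upstream system can report the trade IDs it sent; the function name and the sample IDs are hypothetical and not taken from the talk.

```python
from typing import Iterable, Set


def missing_trades(sent_ids: Iterable[str], ingested_ids: Iterable[str]) -> Set[str]:
    """Return trade IDs that the source reports but the cluster never stored."""
    return set(sent_ids) - set(ingested_ids)


# Hypothetical IDs, for illustration only.
sent = ["T-001", "T-002", "T-003", "T-004"]
ingested = ["T-001", "T-002", "T-004"]

lost = missing_trades(sent, ingested)
if lost:
    # One missing trade is enough to distort downstream risk numbers,
    # so the check reports it explicitly rather than ignoring it.
    print(f"completeness check failed, missing: {sorted(lost)}")
```

Running a reconciliation like this per feed and per time window keeps the comparison sets small enough to hold in memory.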
• Distributed File System, ingest raw data
• Regulatory compliance & archiving
• Last option disaster recovery
• Direct access to "power-users" for modelling and
analysis
Big Data Solution Architecture Components
• Distributed Warehouse
• Not always highly transactional
• Trading exchange worries about the
trade/transaction
• Eventual consistency is sufficient
• SQL vs NoSQL
• MPP (Massively Parallel Processing)
• In memory vs on disk tuning
Big Data Solution Architecture Components
• Analytics and Serving Layer
• Perform descriptive stats
• Trade summaries
• Risk Calculation
• Monte Carlo Simulation
• Machine learning
• Expose APIs
• Report/Aggregate/Present
Big Data Solution Architecture Components
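As an illustration of the descriptive statistics and trade summaries the serving layer would expose, here is a minimal single-process sketch. The trade records and the `summarise` helper are hypothetical; on a real cluster the same group-and-aggregate step would run in the warehouse or in Spark rather than in plain Python.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical trades, for illustration only.
trades = [
    {"type": "spotfxusd", "value": 4999.0},
    {"type": "spotfxusd", "value": 5001.0},
    {"type": "spotfxeur", "value": 1200.0},
]


def summarise(records: List[dict]) -> Dict[str, dict]:
    """Group trade values by type and compute count, total and mean."""
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["type"]].append(rec["value"])
    return {
        trade_type: {"count": len(vals), "total": sum(vals), "mean": mean(vals)}
        for trade_type, vals in by_type.items()
    }


summary = summarise(trades)
```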
Physical Processes and Daemons
• HDFS
• DataNodes - store the data
• JournalNodes - shared edits (HA)
• Primary and secondary NameNode (HA)
• ZooKeeper - coordinates between NameNodes
• YARN
• Resource manager x 2
• Node managers x (number of nodes)
• Job history servers
Lower Level Architecture Components
Physical Processes and Daemons
• HBase (1.0.0)
• N x HBase ZooKeepers
• 2 x HBase masters
• 2 x HBase master -regionservers
• N x Regionservers
• Spark
• Master (No HA)
• N x slaves
• Monitoring
• JMX monitoring
Lower Level Architecture Components
{"book": [
  {
    "trade:id": "8400000-8cf0-11bd-b23e-10b96e4ef00d",
    "timestamp": "2015-04-04T14:56:45+00:00",
    "type": "spotfxusd",
    "value": "4999"
  }
]}
• 20+ interbank systems, 100s of reference sets (e.g.
exchange rates)
• Billions of these per day, 100TB+
Data model
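A record of this shape can be turned into a typed object before it is stored or aggregated. The sketch below is one possible reading of the sample, not the talk's actual schema: the `Trade` class, its field names and the choice of `Decimal` for the value are assumptions.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import List

SAMPLE = """
{"book": [
  {
    "trade:id": "8400000-8cf0-11bd-b23e-10b96e4ef00d",
    "timestamp": "2015-04-04T14:56:45+00:00",
    "type": "spotfxusd",
    "value": "4999"
  }
]}
"""


@dataclass(frozen=True)
class Trade:
    trade_id: str
    timestamp: datetime  # timezone-aware, parsed from ISO 8601
    trade_type: str
    value: Decimal       # Decimal avoids float rounding on monetary amounts


def parse_book(payload: str) -> List[Trade]:
    """Parse a JSON 'book' document into a list of Trade objects."""
    return [
        Trade(
            trade_id=rec["trade:id"],
            timestamp=datetime.fromisoformat(rec["timestamp"]),
            trade_type=rec["type"],
            value=Decimal(rec["value"]),
        )
        for rec in json.loads(payload)["book"]
    ]


parsed = parse_book(SAMPLE)
```

Parsing into a frozen dataclass means a malformed record raises an error at ingest time rather than surfacing later inside a risk calculation.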
• Estimate Value at Risk
• Over a given timeframe: week, month or
year
• A confidence level, 95%-99%
• A loss amount, e.g. £1m
What is the maximum potential
loss >£1m over that time?
• Using Spark calculate the covariance
matrix of past returns
• Use RDDs and parallel data structures to
simulate various conditions
• Sum, aggregate and take bottom 5%
Analytics, Machine Learning & Simulation
Towards Real Time/ Streaming VaR
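The VaR steps on this slide (covariance matrix of past returns, simulated scenarios, then the bottom 5% of outcomes) can be sketched without a cluster. The version below uses plain Python in place of Spark so that it runs anywhere; in Spark the simulation loop would be spread across partitions. The return history, the positions, and the assumption of zero-mean, normally distributed returns are all hypothetical.

```python
import math
import random


def covariance_matrix(returns):
    """Sample covariance between columns; `returns` is a list of per-day rows."""
    n, k = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / n for j in range(k)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in returns) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_var(returns, positions, confidence=0.95, n_sims=20_000, seed=42):
    """Monte Carlo VaR: the loss that is not exceeded with the given confidence."""
    rng = random.Random(seed)
    chol = cholesky(covariance_matrix(returns))
    k = len(positions)
    pnl = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated zero-mean returns: r = L z
        r = [sum(chol[i][m] * z[m] for m in range(i + 1)) for i in range(k)]
        pnl.append(sum(p * ri for p, ri in zip(positions, r)))
    pnl.sort()
    # The bottom (1 - confidence) share of outcomes holds the worst losses.
    cutoff = pnl[int((1.0 - confidence) * n_sims)]
    return -cutoff


# Hypothetical daily returns for two assets and positions in GBP.
history = [
    (0.010, 0.004), (-0.006, -0.002), (0.003, 0.007),
    (-0.012, -0.009), (0.008, -0.001), (-0.002, 0.003),
]
positions = [1_000_000, 500_000]

var_95 = simulate_var(history, positions, confidence=0.95)
var_99 = simulate_var(history, positions, confidence=0.99)
```

Because both calls share a seed, they sort the same simulated outcomes, so the 99% figure always comes out larger than the 95% figure.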
• Keys have to be distributed evenly
• Encoding and compression choices have to be
made
• Codecs: LZO, Gzip, Snappy
• Serialization choices and memory tuning
• Java objects/JSON objects/JSON to Java
• Replication has to be managed and tested
• Cross cluster replication
• Cross data center replication
• Availability and throughput during replication
• Rolling restarts and upgrades
Performance Challenges
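One common way to meet the "keys have to be distributed evenly" requirement is to prefix each row key with a hash-derived bucket, often called salting. The following is a minimal sketch under that assumption; the bucket count, the key layout and the helper name are hypothetical rather than anything specified in the talk.

```python
import hashlib
from collections import Counter

N_BUCKETS = 16  # hypothetical: chosen to match the number of pre-split regions


def salted_key(trade_id: str, timestamp: str) -> str:
    """Prefix the key with a stable bucket so time-ordered writes spread out."""
    digest = hashlib.sha256(trade_id.encode("utf-8")).digest()
    bucket = digest[0] % N_BUCKETS
    return f"{bucket:02d}|{timestamp}|{trade_id}"


# All trades share one timestamp here, which would otherwise hit a single region.
keys = [salted_key(f"trade-{i}", "2015-04-04T14:56:45+00:00") for i in range(10_000)]
per_bucket = Counter(key.split("|", 1)[0] for key in keys)
```

The trade-off is on the read side: a scan over a time range has to fan out across every bucket and merge the results.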
• In memory tuning, off heap and on heap, region sizes
• Java tuning, heap, permgen, generation (for 20+ daemons!)
• HBase requires a functioning and performant HDFS cluster
• Cassandra requires tuning for compaction, replication
• Spark needs correct partitioning and persistence strategies
• Allocation of resources to nodes, network, disk etc.
• Role and table based segmentation - maintaining the Chinese
Wall
Performance Challenges
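Role and table based segmentation reduces to a deny-by-default lookup. The sketch below only illustrates the idea: the role names and table names are hypothetical, and in practice enforcement would sit in the storage or service layer rather than in application code.

```python
from typing import Dict, FrozenSet

# Hypothetical roles and tables, for illustration only.
ROLE_TABLES: Dict[str, FrozenSet[str]] = {
    "fx_trader": frozenset({"fx_trades", "reference_rates"}),
    "m_and_a_advisory": frozenset({"deal_pipeline"}),
    "risk_quant": frozenset({"fx_trades", "reference_rates", "var_results"}),
}


def can_read(role: str, table: str) -> bool:
    """Deny by default: unknown roles and unlisted tables get no access."""
    return table in ROLE_TABLES.get(role, frozenset())


# Traders must not see advisory data, and vice versa.
wall_holds = (
    not can_read("fx_trader", "deal_pipeline")
    and not can_read("m_and_a_advisory", "fx_trades")
)
```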
Once you solve that...
• Distributed File System for ingested/archived
data
• MPP warehouse for querying and analytics
• Quant layer for machine learning and prediction
• Service layer to expose APIs for VaR, stress tests
• Response guarantees for real time Big Data
Questions?
dl@batchinsights.com
blog.batchinsights.com
livedemo.batchinsights.com
batchinsights.com

Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applications for Investment Banks & Financial Institutions"

  • 1. Real Time Big Data Applications for Investment Banks & Financial Institutions
  • 2. Dev Lakhani • 15 years Software Architecture & Development Experience • 7 Years of Big Data Experience • Big Data Architectures for Banks, Telecom, Retail, Media • Deutsche Telekom • ASOS • Tier 1 Investment Banks in Canary Wharf • Dentsu Aegis • Contributor to Hadoop, Spark, Tachyon, HBase, Ignite • uk.linkedin.com/in/devlakhani
  • 3. Introduction: • Overview of Big Data in financial institutions • Architectural constraints in investment banking • Implementation challenges • Data model • Future for financial applications
  • 4. Disclaimers: • This talk has a technical focus • This presentation is not representative of any client • Real time re-definition for Big Data • Vendor neutral talk
  • 5. Real Time Definition [AS MODIFIER] Computing Relating to a system in which input data is processed within milliseconds so that it is available virtually immediately as feedback to the process from which it is coming, e.g. in a missile guidance system: "real-time signal processing", "real-time software". Source: http://www.oxforddictionaries.com/definition/english/real-time
  • 6. Real Time Definition (Modified) [AS MODIFIER] Computing Relating to a system in which input data is processed within a guaranteed response time, using up-to-date (latest version) information and available on demand as feedback to the process from which it is coming.
  • 8. Why Use Big Data? Big Data Drivers for Investment Banking & Financial Institutions: • Capturing billions of trades • Quantifying risk and exposure • Regulatory requirements • Response to news and events • Detecting fraud, rogue trading and anomalies • Performing simulations & algorithmic trading • Business analysis (PNL) • Capital reserves and forecasting
  • 9. High Level Architecture (diagram; labels: Service Layer, load balanced/cached; Trades; Reference Data)
  • 10. Constraints: • Disaster avoidance (not recovery) through replication and redundancy • High availability • "Chinese Wall" policy and segmentation of information • Within the bank • External to the bank • Security & role based segmentation • Responsiveness and throughput • API or service based architecture, transparent to quants/end users • Data completeness: one lost trade can shift the VaR estimate by anywhere between $1 and $10 million
  • 11. Big Data Solution Architecture Components: • Distributed File System, ingest raw data • Regulatory compliance & archiving • Last option disaster recovery • Direct access to "power-users" for modelling and analysis
  • 12. Big Data Solution Architecture Components: • Distributed Warehouse • Not always highly transactional • Trading exchange worries about the trade/transaction • Eventually consistent sufficient • SQL vs No-SQL • MPP (Massively Parallel Processing) • In memory vs on disk tuning
  • 13. Big Data Solution Architecture Components: • Analytics and Serving Layer • Perform descriptive stats • Trade summaries • Risk Calculation • Monte Carlo Simulation • Machine learning • Expose APIs • Report/Aggregate/Present
  • 14. Lower Level Architecture Components: Physical Processes and Daemons • HDFS • Datanodes - store the data • Journalnodes - shared edits (HA) • Primary and Secondary namenode (HA) • Zookeeper - coordinate between Namenodes • YARN • Resource manager x 2 • Node managers x (number of nodes) • Job history servers
  • 15. Lower Level Architecture Components: Physical Processes and Daemons • HBase (1.0.0) • N x HBase zookeepers • 2 x HBase masters • 2 x HBase master-regionservers • N x Regionservers • Spark • Master (No HA) • N x slaves • Monitoring • JMX monitoring
  • 16. Data model: {"book":[{"trade:id":"8400000-8cf0-11bd-b23e-10b96e4ef00d", "timestamp":"2015-04-04T14:56:45+00:00", "type":"spotfxusd", "value":"4999"}]} • 20+ interbank systems, 100s of reference sets (e.g. exchange rates) • Billions of these per day, 100TB+
  • 17. Analytics, Machine Learning & Simulation: • Estimate Value at Risk (VaR) • Over a given timeframe: week, month, year • At a confidence level of 95%-99% • For a loss amount, e.g. £1m • What is the maximum potential loss (>£1m) over that time? • Using Spark, calculate the covariance matrix of past returns • Use RDDs and parallel data structures to simulate various conditions • Sum, aggregate and take the bottom 5%
  • 18. Towards Real Time/ Streaming VaR
  • 19. Performance Challenges: • Keys have to be distributed evenly • Encoding and compression choices have to be made • LZO, GZ, Snappy, Codecs • Serialization choices and memory tuning • Java objects/JSON objects/JSON to Java • Replication has to be managed and tested • Cross cluster replication • Cross data center replication • Availability and throughput during replication • Rolling restarts and upgrades
  • 20. Performance Challenges: • In memory tuning, off heap and on heap, region sizes • Java tuning: heap, permgen, generation (for 20+ daemons!) • HBase requires a functioning and performant HDFS cluster • Cassandra requires tuning for compaction, replication • Spark needs correct partitioning and persistence strategies • Allocation of resources to nodes, network, disk etc. • Role and table based segmentation - maintaining the Chinese Wall
  • 21. Once you solve that... • Distributed File System for ingested/archived data • MPP warehouse for querying and analytics • Quant layer for machine learning and prediction • Service layer to expose APIs for VaR, stress tests • Response guarantees for real time Big Data
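
The trade record on slide 16 stores every field as a string, so an ingest step would typically normalise types before the data reaches the warehouse. The following is a minimal sketch of that step in plain Python, not the pipeline used in the talk. The function name `parse_book` and the output field names are hypothetical; only the JSON layout comes from the slide.

```python
import json
from datetime import datetime
from decimal import Decimal

# Sample message copied from the slide-16 data model.
RAW = (
    '{"book":[{"trade:id":"8400000-8cf0-11bd-b23e-10b96e4ef00d",'
    '"timestamp":"2015-04-04T14:56:45+00:00",'
    '"type":"spotfxusd","value":"4999"}]}'
)


def parse_book(raw):
    """Return a list of typed trade dicts from one book message."""
    trades = []
    for record in json.loads(raw)["book"]:
        # Strip stray whitespace from keys and values before converting.
        clean = {key.strip(): value.strip() for key, value in record.items()}
        trades.append({
            "trade_id": clean["trade:id"],
            # ISO 8601 timestamp with an explicit UTC offset.
            "timestamp": datetime.fromisoformat(clean["timestamp"]),
            "type": clean["type"],
            # Decimal avoids binary floating point error on monetary values.
            "value": Decimal(clean["value"]),
        })
    return trades


trades = parse_book(RAW)
```

Keeping `value` as a `Decimal` rather than a `float` is a design choice that matters once billions of records are summed.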
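
The VaR steps on slide 17 (covariance matrix of past returns, simulate conditions, take the bottom 5%) can be sketched on a single machine as follows. This is an illustrative single-process Python version, not the Spark/RDD implementation the talk describes. The return history, position sizes, and all function names are made-up assumptions, and returns are assumed to be normally distributed.

```python
import math
import random


def column_means(returns):
    """Mean return per asset; `returns` holds one row per day."""
    n = len(returns)
    return [sum(row[j] for row in returns) / n for j in range(len(returns[0]))]


def covariance_matrix(returns):
    """Sample covariance matrix of the asset return columns."""
    n, k = len(returns), len(returns[0])
    mu = column_means(returns)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in returns) / (n - 1)
             for j in range(k)] for i in range(k)]


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def monte_carlo_var(returns, positions, confidence=0.95, n_sims=20000, seed=42):
    """One-period VaR, reported as a positive loss amount."""
    rng = random.Random(seed)
    k = len(positions)
    mu = column_means(returns)
    low = cholesky(covariance_matrix(returns))
    pnl = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated return for asset i: mu[i] + sum_j L[i][j] * z[j].
        sim = [mu[i] + sum(low[i][j] * z[j] for j in range(i + 1))
               for i in range(k)]
        pnl.append(sum(p * r for p, r in zip(positions, sim)))
    pnl.sort()
    # The (1 - confidence) quantile of simulated P&L is the VaR cut-off.
    return -pnl[int((1 - confidence) * n_sims)]


# Hypothetical daily returns for two assets and position values in GBP.
HISTORY = [[0.010, 0.004], [-0.006, 0.002], [0.003, -0.005],
           [-0.012, -0.001], [0.007, 0.006], [-0.002, -0.003]]
POSITIONS = [1_000_000, 500_000]

var95 = monte_carlo_var(HISTORY, POSITIONS, confidence=0.95)
var99 = monte_carlo_var(HISTORY, POSITIONS, confidence=0.99)
```

In a distributed setting the simulation loop is the part that parallelises naturally, since each simulated scenario is independent of the others.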
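
Slide 19 notes that keys have to be distributed evenly. One common way to achieve this in a key-ordered store is to prefix each row key with a hash-derived bucket ("salting"), so that sequential trades do not all land on the same region. The sketch below illustrates the idea in Python; the talk does not say which scheme was used, and the bucket count, key layout, and function name are assumptions.

```python
import hashlib
from collections import Counter

N_BUCKETS = 16  # assumption: roughly one bucket per region/partition


def salted_key(trade_id, timestamp):
    """Build a row key whose prefix spreads writes across N_BUCKETS."""
    digest = hashlib.md5(trade_id.encode("utf-8")).digest()
    # The first digest byte is close to uniform, so the bucket is too.
    bucket = digest[0] % N_BUCKETS
    return f"{bucket:02d}|{timestamp}|{trade_id}"


# Count how many of 10,000 hypothetical trade ids fall into each bucket.
counts = Counter(
    salted_key(f"trade-{i}", "2015-04-04T14:56:45+00:00").split("|")[0]
    for i in range(10_000)
)
```

The trade-off is that a range scan by timestamp now needs one scan per bucket, so the bucket count should stay small relative to the number of regions.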