Dev Lakhani, Data Scientist at Batch Insights, talks on "Real Time Big Data Applications for Investment Banks and Financial Institutions" at the first Big Data Frankfurt event, held at Die Zentrale and organised by Dataconomy Media.
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applications for Investment Banks & Financial Institutions"
1. Real Time Big Data Applications for
Investment Banks & Financial Institutions
2. Dev Lakhani
• 15 years Software Architecture & Development Experience
• 7 Years of Big Data Experience
• Big Data Architectures for Banks, Telecom, Retail, Media
• Deutsche Telekom
• ASOS
• Tier 1 Investment Banks in Canary Wharf
• Dentsu Aegis
• Contributor to Hadoop, Spark, Tachyon, HBase, Ignite
• uk.linkedin.com/in/devlakhani
3. • Overview of Big Data in financial
institutions
• Architectural constraints in investment
banking
• Implementation challenges
• Data model
• Future for financial applications
Introduction
4. • This talk has a technical focus
• This presentation is not representative of any client
• Real time re-definition for Big Data
• Vendor neutral talk
Disclaimers
5. Real Time Definition
[AS MODIFIER] Computing Relating to a system in which input data is
processed within milliseconds so that it is available virtually immediately as
feedback to the process from which it is coming, e.g. in a missile guidance
system: real-time signal processing; real-time software
http://www.oxforddictionaries.com/definition/english/real-time
6. Real Time Definition (Modified)
[AS MODIFIER] Computing Relating to a system in which input data is
processed within a guaranteed response time, using up-to-date
(latest version) information and available on demand as feedback to
the process from which it is coming.
8. Big Data Drivers for Investment Banking &
Financial Institutions
• Capturing billions of trades
• Quantifying risk and exposure
• Regulatory requirements
• Response to news and events
• Detect fraud, rogue trading and anomalies
• Performing simulations & algorithmic trading
• Business analysis - P&L
• Capital reserves and forecasting
Why Use Big Data?
10. • Disaster avoidance (not recovery) through
replication and redundancy
• High availability
• "Chinese Wall" policy and segmentation of
information
• Within the bank
• External to the bank
• Security & role based segmentation
• Responsiveness and throughput
• API or service based architecture, transparent to quants/end users
• Data completeness: 1 lost trade = $1 < x < $10 million of error in the VaR estimate
Constraints
• Distributed File System, ingest raw data
• Regulatory compliance & archiving
• Last option disaster recovery
• Direct access to "power-users" for modelling and
analysis
Big Data Solution Architecture Components
• Distributed Warehouse (write/read sketch after this slide)
• Not always highly transactional
• The trading exchange worries about the trade/transaction
• Eventual consistency is sufficient
• SQL vs NoSQL
• MPP (Massively Parallel Processing)
• In memory vs on disk tuning
Big Data Solution Architecture Components
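To make the warehouse bullet concrete, here is a minimal, hypothetical sketch of writing and reading a trade record in HBase (one of the stores named later in the deck) through the happybase client. The Thrift host, table name, column family and row-key layout are illustrative assumptions, not details from the talk.

```python
# Hypothetical warehouse write/read: storing a trade in HBase via happybase.
# The Thrift host, table name, column family and row key are assumptions.
import happybase

connection = happybase.Connection("hbase-thrift-host")  # assumes a Thrift gateway
trades = connection.table("trades")                     # assumed table name

# Write one trade; HBase stores all cells as raw bytes.
trades.put(b"20150601#EQ#000123", {
    b"d:instrument": b"VOD.L",
    b"d:quantity":   b"10000",
    b"d:price":      b"2.31",
    b"d:trader":     b"desk-7",
})

# Read the latest version back on demand - the "up-to-date, on demand"
# behaviour from the modified real-time definition earlier in the deck.
row = trades.row(b"20150601#EQ#000123")
print(row[b"d:price"])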
• Analytics and Serving Layer
• Perform descriptive stats
• Trade summaries
• Risk Calculation
• Monte Carlo Simulation
• Machine learning
• Expose APIs (see the service sketch after this slide)
• Report/Aggregate/Present
Big Data Solution Architecture Components
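As a sketch of the "expose APIs" idea, the serving layer might hand back precomputed results over HTTP so quants and end users never touch the cluster directly. The endpoint name, book identifier and numbers below are hypothetical.

```python
# Hypothetical sketch of the serving layer's "expose APIs" bullet: a thin HTTP
# service returning precomputed analytics; names and numbers are illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for results produced by the analytics layer and read back from the
# warehouse; hard-coded here purely for illustration.
VAR_BY_BOOK = {"equities-desk-7": 1250000.0}

@app.route("/var/<book>")
def value_at_risk(book):
    var = VAR_BY_BOOK.get(book)
    if var is None:
        return jsonify({"error": "unknown book"}), 404
    return jsonify({"book": book, "var_95_1d": var})

if __name__ == "__main__":
    app.run(port=8080)
```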
14. Physical Processes and Daemons
• HDFS
• Datanodes - store the data
• Journalnodes - shared edits (HA)
• Active and standby NameNodes (HA)
• Zookeeper - coordinate between Namenodes
• YARN
• Resource manager x 2
• Node managers x (number of nodes)
• Job history servers
Lower Level Architecture Components
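A hedged illustration of checking that the daemons listed above are actually running on a node, using the JDK's jps tool to list local JVM processes; the expected set mirrors the slide and would differ per node role.

```python
# Illustrative check (not from the talk) that the daemons listed above are
# running on a node, using the JDK's `jps` tool to list local JVM processes.
import subprocess

EXPECTED = {
    "NameNode", "DataNode", "JournalNode",
    "QuorumPeerMain",        # ZooKeeper
    "ResourceManager", "NodeManager", "JobHistoryServer",
}

def running_daemons():
    out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    # jps prints lines like "12345 NameNode"
    return {parts[1] for parts in (l.split() for l in out.splitlines()) if len(parts) > 1}

missing = EXPECTED - running_daemons()
print("missing daemons:", ", ".join(sorted(missing)) or "none")
```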
15. Physical Processes and Daemons
• HBase (1.0.0)
• N x HBase ZooKeepers
• 2 x HBase masters
• 2 x HBase master regionservers
• N x RegionServers
• Spark
• Master (No HA)
• N x slaves
• Monitoring
• JMX monitoring (polling sketch after this slide)
Lower Level Architecture Components
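Hadoop daemons publish their JMX MBeans as JSON over an HTTP /jmx endpoint, so a basic monitoring probe can be a few lines of Python. The host, port (50070 is the Hadoop 2.x NameNode web UI) and the chosen MBean below are assumptions for illustration.

```python
# Sketch of the JMX monitoring bullet: Hadoop daemons expose their JMX MBeans
# as JSON over HTTP at /jmx. Host, port (50070 = Hadoop 2.x NameNode web UI)
# and the chosen MBean are assumptions for illustration.
import json
import urllib.request

URL = ("http://namenode-host:50070/jmx"
       "?qry=Hadoop:service=NameNode,name=FSNamesystemState")

with urllib.request.urlopen(URL) as resp:
    beans = json.load(resp)["beans"]

state = beans[0]
print("live datanodes:", state["NumLiveDataNodes"])
print("capacity used :", state["CapacityUsed"])
```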
17. • Estimate Value at Risk
• Over a given timeframe, week, month,
year
• A confidence level 95%-99%
• A loss amount, e.g. £1m
What is the maximum potential loss (e.g. > £1m) over that timeframe at that confidence level?
• Using Spark, calculate the covariance matrix of past returns (see the sketch after this slide)
• Use RDDs and parallel data structures to
simulate various conditions
• Sum, aggregate and take bottom 5%
Analytics, Machine Learning & Simulation
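A minimal PySpark sketch of the approach on this slide: estimate the covariance of past returns, simulate correlated return scenarios in parallel on an RDD, and read the 95% VaR off the bottom 5% of simulated P&L. The returns matrix, weights, portfolio value and scenario count are made-up illustration data.

```python
# Minimal sketch of the VaR calculation described on the slide; the returns
# matrix, weights, portfolio value and scenario count are made-up toy data.
import numpy as np
from pyspark import SparkContext

# In a real deployment the master comes from the cluster / spark-submit.
sc = SparkContext("local[*]", "monte-carlo-var")

# Past daily returns: rows = days, columns = instruments (toy data; in
# practice these would be read from the warehouse layer).
past_returns = np.random.normal(0.0, 0.01, size=(250, 4))
weights = np.array([0.4, 0.3, 0.2, 0.1])   # portfolio weights
portfolio_value = 1000000.0                # GBP

mean = past_returns.mean(axis=0)
cov = np.cov(past_returns, rowvar=False)   # covariance matrix of past returns

def simulate(seed):
    """One Monte Carlo scenario: draw correlated instrument returns and
    return the resulting portfolio P&L."""
    rng = np.random.default_rng(seed)
    scenario = rng.multivariate_normal(mean, cov)
    return float(portfolio_value * weights.dot(scenario))

n_scenarios = 100000
pnl = sc.parallelize(range(n_scenarios), numSlices=100).map(simulate)

# 95% one-day VaR: the loss at the boundary of the bottom 5% of simulated P&L.
worst_5pct = pnl.takeOrdered(int(n_scenarios * 0.05))
var_95 = -worst_5pct[-1]
print("95%% 1-day VaR: %.0f GBP" % var_95)

sc.stop()
```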
• Keys have to be distributed evenly (row-key sketch after this slide)
• Encoding and compression choices have to be
made
• LZO, GZ, Snappy, Codecs
• Serialization choices and memory tuning
• Java objects/JSON objects/JSON to Java
• Replication has to be managed and tested
• Cross cluster replication
• Cross data center replication
• Availability and throughput during replication
• Rolling restarts and upgrades
Performance Challenges
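One common way to keep keys evenly distributed (an assumed technique, not one the talk names) is to salt a monotonically increasing row key with a hash-derived bucket prefix so writes spread across regions rather than hotspotting one. A plain-Python sketch:

```python
# Illustration (assumed technique, not from the talk): monotonically increasing
# keys such as timestamps or trade IDs hotspot a single region, so prefix
# ("salt") the key with a hash-derived bucket to spread writes evenly.
import hashlib

NUM_BUCKETS = 16   # typically matched to the number of pre-split regions

def salted_key(trade_id: str) -> bytes:
    digest = hashlib.md5(trade_id.encode()).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    return ("%02d#%s" % (bucket, trade_id)).encode()

# Point lookups still work: recompute the salt from the trade ID.
print(salted_key("20150601-000123"))   # something like b'07#20150601-000123'
```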
20. • In memory tuning, off heap and on heap, region sizes
• Java tuning - heap, permgen, GC generations (for 20+ daemons!)
• HBase requires a functioning and performant HDFS cluster
• Cassandra requires tuning for compaction, replication
• Spark needs correct partitioning and persistence strategies (sketch after this slide)
• Allocation of resources to nodes, network, disk etc.
• Role and table based segmentation - maintaining the Chinese
Wall
Performance Challenges
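A hedged PySpark sketch of the partitioning and persistence bullet: repartition a skewed RDD and choose an explicit storage level rather than the default cache. The paths, partition counts and memory settings are illustrative only, not recommendations.

```python
# Sketch of the "correct partitioning and persistence strategies" bullet:
# spread a skewed RDD across executors and pick an explicit storage level.
# Paths, partition counts and memory settings are illustrative, not advice.
from pyspark import SparkConf, SparkContext, StorageLevel

conf = (SparkConf()
        .setMaster("local[*]")                     # from the cluster in practice
        .setAppName("partition-and-persist")
        .set("spark.executor.memory", "8g")        # resource allocation per node
        .set("spark.default.parallelism", "200"))
sc = SparkContext(conf=conf)

trades = sc.textFile("hdfs:///data/trades/*.csv")  # assumed path

# Repartition so work is spread evenly, then keep the parsed RDD around for
# repeated analytics, spilling to disk instead of recomputing.
parsed = (trades.map(lambda line: line.split(","))
                .repartition(200)
                .persist(StorageLevel.MEMORY_AND_DISK))

print(parsed.count())
sc.stop()
```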
21. Once you solve that...
• Distributed File System for ingested/archived data
• MPP warehouse for querying and analytics
• Quant layer for machine learning and prediction
• Service layer to expose APIs for VaR, stress tests
• Response guarantees for real time Big Data