MongoDB Tick Data Presentation

MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB including
•MongoDB 2.6
•Sharding
•Replication
•Aggregation
http://world.mongodb.com
Save $200 with discount code THANKYOU

3
• What is MongoDB
- The Company
- The Product
• MongoDB for Tick Data
• Case Study
Agenda

4
MongoDB Overview
350+ employees
1,000+ customers
Over $231 million in funding
13 offices around the world

5
7,000,000+ MongoDB Downloads
150,000+ Online Education Registrants
35,000+ MongoDB Management Service (MMS) Users
30,000+ MongoDB User Group Members
20,000+ MongoDB Days Attendees
Global Community

7
MongoDB.
NoSQL document-based database.
Designed to build today's applications.
•Fast to build.
•Quick to adapt.
•Easy to scale.
•Lessons learned from 40 years of RDBMS.

8
Relational Model
EmpID  Name            Dept  Title  Manager  Payband
9950   Dunham, Justin  500   1500   6531     C

DeptID  Department
500     Marketing

TitleID  Title
1500     Product Manager

EmpBenPlanID  EmpFK  PlanFK
1             9950   100
2             9950   200

PlanID  BenFK  Plan
100     1      PPO Plus
200     2      Standard

BenID  Benefit
1      Health
2      Dental

9
Document Model
EmpID  Name            Dept       Title            Manager  Payband  Benefits
9950   Dunham, Justin  Marketing  Product Manager  6531     C        Health: PPO Plus; Dental: Standard

EmpBenPlanID  EmpFK  PlanFK
1             9950   100
2             9950   200

PlanID  BenFK   Plan
100     Health  PPO Plus
200     Dental  Standard
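A minimal sketch of the idea on this slide, in plain JavaScript: the rows from the relational slide are folded into a single employee document with embedded benefits. The helper name toEmployeeDocument and the lower-case keys inside Benefits are assumptions made for illustration; they do not come from the deck.

```javascript
// Rows as they appear on the "Relational Model" slide.
const employees = [{ EmpID: 9950, Name: "Dunham, Justin", Dept: 500, Title: 1500, Manager: 6531, Payband: "C" }];
const departments = [{ DeptID: 500, Department: "Marketing" }];
const titles = [{ TitleID: 1500, Title: "Product Manager" }];
const benefits = [{ BenID: 1, Benefit: "Health" }, { BenID: 2, Benefit: "Dental" }];
const plans = [{ PlanID: 100, BenFK: 1, Plan: "PPO Plus" }, { PlanID: 200, BenFK: 2, Plan: "Standard" }];
const empBenPlans = [{ EmpBenPlanID: 1, EmpFK: 9950, PlanFK: 100 }, { EmpBenPlanID: 2, EmpFK: 9950, PlanFK: 200 }];

// Hypothetical helper: resolve the foreign keys once and embed the results.
function toEmployeeDocument(emp) {
  const dept = departments.find(d => d.DeptID === emp.Dept);
  const title = titles.find(t => t.TitleID === emp.Title);
  const embedded = empBenPlans
    .filter(link => link.EmpFK === emp.EmpID)
    .map(link => {
      const plan = plans.find(p => p.PlanID === link.PlanFK);
      const benefit = benefits.find(b => b.BenID === plan.BenFK);
      return { benefit: benefit.Benefit, plan: plan.Plan };
    });
  return {
    EmpID: emp.EmpID,
    Name: emp.Name,
    Dept: dept.Department,
    Title: title.Title,
    Manager: emp.Manager,
    Payband: emp.Payband,
    Benefits: embedded,
  };
}

const employeeDoc = toEmployeeDocument(employees[0]);
console.log(JSON.stringify(employeeDoc, null, 2));
```

Reading the employee together with the benefits then needs one lookup by EmpID and no joins across the six tables.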

10
MongoDB - Agility
Dynamic Schemas
V 1.0 V 1.1 V 2.0
EmpID  Name            Dept       Title            Manager  Payband  Benefits
9950   Dunham, Justin  Marketing  Product Manager  6531     C        Health: PPO Plus; Dental: Standard

EmpID  Name       Title  Payband  Bonus
9952   Joe White  CEO    E        20,000

EmpID  Name            Dept       Title     Manager  Payband  Shares
9531   Nearey, Graham  Marketing  Director  9952     D        5000
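As a rough illustration of dynamic schemas, the three records on this slide can sit side by side as plain JavaScript objects with different fields. The helper withField is hypothetical; it mimics, in memory, what a query with the $exists operator does against a collection.

```javascript
// Three employee documents with different fields, as on the slide.
const staff = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: "Marketing", Title: "Product Manager", Manager: 6531, Payband: "C",
    Benefits: [{ benefit: "Health", plan: "PPO Plus" }, { benefit: "Dental", plan: "Standard" }] },
  { EmpID: 9952, Name: "Joe White", Title: "CEO", Payband: "E", Bonus: 20000 },
  { EmpID: 9531, Name: "Nearey, Graham", Dept: "Marketing", Title: "Director", Manager: 9952, Payband: "D", Shares: 5000 },
];

// Hypothetical helper: select documents that carry a given field,
// similar in spirit to a {field: {$exists: true}} query.
function withField(docs, field) {
  return docs.filter(doc => Object.prototype.hasOwnProperty.call(doc, field));
}

const shareholders = withField(staff, "Shares").map(doc => doc.Name);
const managed = withField(staff, "Manager").map(doc => doc.EmpID);
console.log(shareholders, managed);
```

No migration step is needed when a new field such as Shares appears; older documents simply do not have it.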

11
MongoDB - Usability
Shell
Command-line shell for interacting directly with database
> db.collection.insert({product: "MongoDB", type: "Document Database"})
>
> db.collection.findOne()
{
    "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
    "product" : "MongoDB",
    "type" : "Document Database"
}
Drivers
Drivers for most popular programming languages and frameworks
Java, Python, Perl, Ruby, Haskell, JavaScript

12
MongoDB - Utility
• Complex Indexed Queries
Age > 65 AND Male living near Lyon
• Aggregation
Age    Profit Margin
1-17   0
18-35  20
36-50  80
51-65  50
66+    5

13
MongoDB - Scalability
• High Availability
• Auto Sharding
• Enterprise Monitoring
• Grid file storage

14
Column Family
Key/Value Store
Relational
Document Store
Options for building an Operational Database

15
MongoDB & Hadoop
• Multi-source analytics
• Interactive & Batch
• Data lake
• Online, Real-time
• High concurrency & HA
• Live analytics
Operational
Post Processing and
MongoDB Connector for Hadoop

17
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability

18
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bidPrice: 55.37,
offerPrice: 55.58,
bidQuantity: 500,
offerQuantity: 700
}
> db.ticks.find( {symbol: "DIS",
bidPrice: {$gt: 55.36} } )
Flexible Data Model
Easy Onboarding – e.g. Equities

19
{
symbol : "DIS",
bidPrices: [55.37, 55.36, 55.35],
offerPrices: [55.58, 55.59, 55.60],
bidQuantities: [500, 1000, 2000],
offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find( {bidPrices: {$gt: 55.36} } )
Flexible Data Model
Easy Onboarding – e.g. Depth of Book
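The depth-of-book query above relies on how MongoDB matches a comparison against an array field: the document matches when at least one element satisfies the condition. A small in-memory sketch of that rule follows; the helper anyGreaterThan is hypothetical and only mirrors the matching behaviour, it is not a driver call.

```javascript
const book = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
};

// Hypothetical helper mirroring {field: {$gt: threshold}} on an array field:
// the document matches when any element is greater than the threshold.
function anyGreaterThan(doc, field, threshold) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.some(v => v > threshold);
}

const matchesBid = anyGreaterThan(book, "bidPrices", 55.36);  // 55.37 > 55.36
const matchesHigh = anyGreaterThan(book, "bidPrices", 55.40); // no bid above 55.40
console.log(matchesBid, matchesHigh);
```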

20
{
symbol : "DIS",
title: "Disney Earnings…",
body: "Walt Disney Company reported…",
tags: ["earnings", "media", "walt disney"]
}
Flexible Data Model
Easy Onboarding – e.g. News

21
{
twitterHandle: "jdoe",
tweet: "Heard @DisneyPictures is releasing…",
usernamesIncluded: ["DisneyPictures"],
hashTags: ["movierumors", "disney"]
}
Flexible Data Model
Easy Onboarding – e.g. Social Networking

23
Architecture for Querying Data
Higher Latency Trading Applications
Backtesting Applications
Research & Analysis Applications

24
// Compound indexes
> db.ticks.ensureIndex({symbol: 1, timestamp: 1})
// Index on arrays
> db.ticks.ensureIndex({bidPrices: -1})
// Index on any depth
> db.ticks.ensureIndex({"bids.price": 1})
// Full text search
> db.ticks.ensureIndex({tweet: "text"})
Flexible Querying and Indexing
Index any field [or arrays]

25
// Ticks for last month for media companies
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    // Both bounds go in one object; a repeated timestamp key would keep only the last one
    timestamp: {$gt: new ISODate("2013-01-01"), $lte: new ISODate("2013-01-31")}})
// Ticks when Disney's bid breached 55.50 this month
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gt: new ISODate("2013-02-01")}})
Flexible Querying and Indexing
Rich Query Language

27
// Aggregate minute bars for Disney for February
db.ticks.aggregate([
    // Keep only Disney ticks from February 1 onward
    { $match: {symbol: "DIS", timestamp: {$gt: new ISODate("2013-02-01")}}},
    // Break the timestamp into the parts used for grouping
    { $project: {
        year: {$year: "$timestamp"},
        month: {$month: "$timestamp"},
        day: {$dayOfMonth: "$timestamp"},
        hour: {$hour: "$timestamp"},
        minute: {$minute: "$timestamp"},
        second: {$second: "$timestamp"},
        timestamp: 1,
        price: 1}},
    // Sort by time so $first and $last give open and close
    { $sort: {timestamp: 1}},
    { $group: {
        _id: {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
        open: {$first: "$price"},
        high: {$max: "$price"},
        low: {$min: "$price"},
        close: {$last: "$price"}}}
])
Aggregation Framework
Parallel execution across cluster
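To make the result of the pipeline concrete, here is a sketch in plain JavaScript of the same sort-then-group logic over an in-memory array. The sample ticks and the function name minuteBars are assumptions for illustration; this is not the server's implementation and it does not touch a database.

```javascript
// Hypothetical sample ticks; the values are illustrative only.
const ticks = [
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.37 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.41 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.30 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.45 },
];

function minuteBars(input) {
  // Sort by time first so that "first" and "last" mean open and close.
  const sorted = [...input].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map();
  for (const tick of sorted) {
    // Truncate the ISO timestamp to the minute: "YYYY-MM-DDTHH:MM".
    const key = tick.timestamp.toISOString().slice(0, 16);
    const bar = bars.get(key);
    if (!bar) {
      bars.set(key, { minute: key, open: tick.price, high: tick.price, low: tick.price, close: tick.price });
    } else {
      bar.high = Math.max(bar.high, tick.price);
      bar.low = Math.min(bar.low, tick.price);
      bar.close = tick.price;
    }
  }
  return [...bars.values()];
}

const bars = minuteBars(ticks);
console.log(bars);
```

Each output object corresponds to one group of the $group stage: open and close come from the first and last tick in time order, high and low from the maximum and minimum price.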

29
Pre-aggregation pattern
Real-time and continuous state
All Ticks Collection
{
    _id : ObjectId("4e2e3f92268cdda473b628f6"),
    symbol : "DIS",
    timestamp: ISODate("2013-02-15 10:00"),
    bidPrices: [55.37, 55.36, 55.35],
    …
}
{
    _id : ObjectId("4e2e3f92268cdda473b628f6"),
    symbol : "DIS",
    timestamp: ISODate("2013-02-15
    …
}
Pre-aggregated State
{
    _id : ObjectId("4e2e3f92268cdda473b628f6"),
    symbol : "DIS",
    Daily_high: 66.1,
    Daily_low: 57.1,
    Daily_volume: 100222
}
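One way the pre-aggregated state could be kept current is to apply a small update for every incoming tick. The sketch below builds such an update with the MongoDB update operators $max, $min and $inc and applies it to an in-memory object so that it runs without a server. The helper names, the tick fields price and quantity, and the sample ticks are assumptions; the deck does not show how the state document is maintained.

```javascript
// Hypothetical helper: build the update for one incoming tick.
// $max, $min and $inc are MongoDB update operators; the Daily_* field names
// follow the slide, while tick.price and tick.quantity are assumptions.
function buildDailyUpdate(tick) {
  return {
    $max: { Daily_high: tick.price },
    $min: { Daily_low: tick.price },
    $inc: { Daily_volume: tick.quantity },
  };
}

// Tiny in-memory stand-in for an upsert, so the sketch runs without a server.
function applyUpdate(state, update) {
  const next = { ...state };
  for (const [field, value] of Object.entries(update.$max)) {
    next[field] = field in next ? Math.max(next[field], value) : value;
  }
  for (const [field, value] of Object.entries(update.$min)) {
    next[field] = field in next ? Math.min(next[field], value) : value;
  }
  for (const [field, value] of Object.entries(update.$inc)) {
    next[field] = (next[field] || 0) + value;
  }
  return next;
}

// Illustrative ticks only.
const incoming = [
  { symbol: "DIS", price: 57.1, quantity: 500 },
  { symbol: "DIS", price: 66.1, quantity: 700 },
  { symbol: "DIS", price: 60.0, quantity: 300 },
];

let daily = { symbol: "DIS" };
for (const tick of incoming) {
  daily = applyUpdate(daily, buildDailyUpdate(tick));
}
console.log(daily);
```

Against a real collection, the object returned by buildDailyUpdate could be passed as the update document of an upsert keyed on symbol and trading day, so readers always see the latest high, low and volume without scanning the tick collection.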

31
Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs
– No need to go through HDFS
• Leverage power of Hadoop ecosystem against operational data in MongoDB

33
Why MongoDB Is Fast and Scalable
Better data locality
Relational MongoDB
In-Memory Caching
Auto-Sharding
Read/write scaling

35
Easy On-boarding
Easy On-boarding of all Financial Data
Problem
• Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms like Bloomberg and Reuters
Shapes
- Time Series News
- Event
- Sentiment
Sizes
- 1MB 1x a day price data
- 1GB x 1000s data matrices
- 40GB 1-minute data
- 30TB Tick data
- Even bigger << options data
• On-boarding can take weeks in a relational model with complex schema designs and ETL
• An FX Option can be an 80+ table schema
• Relational technology is a scale-up architecture and did not meet the performance requirements of AHL
Why MongoDB
• Dynamic schema: can on-board data of any shape or size almost instantly, without having to go through a typical “ETL” lifecycle
• Performance: Quant researchers want data rendered in <1s for up to 20 years of historical data for back-testing trading strategies
• Replication: a team of 40 quant researchers relies on this system being up
• Sharding: can scale seamlessly and accommodate data of any shape and size

36
Low latency:
- 1xDay data: 4ms for 10,000 rows (vs. 2,210ms from SQL)
- OneMinute / Tick data: 1s for 3.5M rows Python (vs. 15s – 40s+ from OtherTick)
- 1s for 15M rows Java
Parallel Access:
- Cluster with 256+ concurrent data access
- Consistent throughput – little load on the Mongo server
Efficient:
- 10-15x reduction in network load
- Negligible decompression cost (lz4: 1.8Gb/s)
Easy On-boarding
Results
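The latency figures on this slide can be turned into ratios with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are made up for illustration, and the "about 1 s" readings are taken at face value.

```javascript
// Figures quoted on the slide.
const dailyRows = 10000;
const dailySqlMs = 2210;      // 10,000 rows from SQL
const dailyMongoMs = 4;       // 10,000 rows from MongoDB
const tickRowsPython = 3.5e6; // rows read in about 1 s from Python
const tickRowsJava = 15e6;    // rows read in about 1 s from Java

// Speedup for 1xDay data: 2,210 ms / 4 ms.
const dailySpeedup = dailySqlMs / dailyMongoMs;
// Rows per second for 1xDay data at 4 ms per 10,000 rows.
const dailyRowsPerSecond = (dailyRows * 1000) / dailyMongoMs;
// Java reads roughly this many times more rows per second than Python.
const javaVsPython = tickRowsJava / tickRowsPython;

console.log(dailySpeedup, dailyRowsPerSecond, javaVsPython.toFixed(2));
```

On these figures the daily-data read is about 550 times faster than the SQL baseline, and the 15 s to 40 s+ quoted for OtherTick against 1 s corresponds to a 15x to 40x difference or more.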

39
James (AHL) Presentation Links
• Slides: http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform
• YouTube: James Blackburn - Python and MongoDB as a Platform for Financial Market Data
