2. MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB, including:
•MongoDB 2.6
•Sharding
•Replication
•Aggregation
http://world.mongodb.com
Save $200 with discount code THANKYOU
3.
Agenda
• What is MongoDB
- The Company
- The Product
• MongoDB for Tick Data
• Case Study
5.
Global Community
• 7,000,000+ MongoDB Downloads
• 150,000+ Online Education Registrants
• 35,000+ MongoDB Management Service (MMS) Users
• 30,000+ MongoDB User Group Members
• 20,000+ MongoDB Days Attendees
7.
MongoDB
A NoSQL, document-based database, designed for building today’s applications:
•Fast to build.
•Quick to adapt.
•Easy to scale.
•Lessons learned from 40 years of RDBMS.
8.
Relational Model
EmpID  Name            Dept  Title  Manager  Payband
9950   Dunham, Justin  500   1500   6531     C

EmpBenPlanID  EmpFK  PlanFK
1             9950   100
2             9950   200

PlanID  BenFK  Plan
100     1      PPO Plus
200     2      Standard

BenID  Benefit
1      Health
2      Dental

DeptID  Department
500     Marketing

TitleID  Title
1500     Product Manager
9.
Document Model
EmpID  Name            Dept       Title            Manager  Payband  Benefits
9950   Dunham, Justin  Marketing  Product Manager  6531     C        Health PPO Plus, Dental Standard

EmpBenPlanID  EmpFK  PlanFK
1             9950   100
2             9950   200

PlanID  BenFK   Plan
100     Health  PPO Plus
200     Dental  Standard
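To make the contrast with the relational tables concrete, here is a minimal JavaScript sketch that resolves the foreign keys once and embeds the results in a single document. It assumes the rows are already loaded into in-memory arrays; the helper names `lookup` and `toEmployeeDocument`, the lower-case document field names and the choice of `EmpID` as `_id` are illustrative assumptions, not taken from the slides.

```javascript
// Rows from the relational model, held in plain arrays (hypothetical in-memory copy).
const employeeRows = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: 500, Title: 1500, Manager: 6531, Payband: "C" },
];
const departmentRows = [{ DeptID: 500, Department: "Marketing" }];
const titleRows = [{ TitleID: 1500, Title: "Product Manager" }];
const benefitRows = [
  { BenID: 1, Benefit: "Health" },
  { BenID: 2, Benefit: "Dental" },
];
const planRows = [
  { PlanID: 100, BenFK: 1, Plan: "PPO Plus" },
  { PlanID: 200, BenFK: 2, Plan: "Standard" },
];
const empBenPlanRows = [
  { EmpBenPlanID: 1, EmpFK: 9950, PlanFK: 100 },
  { EmpBenPlanID: 2, EmpFK: 9950, PlanFK: 200 },
];

// Find exactly one row by key; a dangling foreign key is treated as an error.
function lookup(rows, key, value) {
  const row = rows.find((r) => r[key] === value);
  if (!row) throw new Error(`No row with ${key} = ${value}`);
  return row;
}

// Resolve every foreign key once and embed the results in a single document.
function toEmployeeDocument(emp) {
  const benefits = empBenPlanRows
    .filter((link) => link.EmpFK === emp.EmpID)
    .map((link) => {
      const plan = lookup(planRows, "PlanID", link.PlanFK);
      const benefit = lookup(benefitRows, "BenID", plan.BenFK);
      return { benefit: benefit.Benefit, plan: plan.Plan };
    });
  return {
    _id: emp.EmpID,
    name: emp.Name,
    dept: lookup(departmentRows, "DeptID", emp.Dept).Department,
    title: lookup(titleRows, "TitleID", emp.Title).Title,
    manager: emp.Manager,
    payband: emp.Payband,
    benefits,
  };
}

const employeeDoc = toEmployeeDocument(employeeRows[0]);
console.log(JSON.stringify(employeeDoc, null, 2));
```

The join work moves from read time to write time, which is what lets a single read return the whole employee.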
10.
MongoDB - Agility
Dynamic Schemas
V 1.0
EmpID  Name            Dept       Title            Manager  Payband  Benefits
9950   Dunham, Justin  Marketing  Product Manager  6531     C        Health PPO Plus, Dental Standard

V 1.1
EmpID  Name       Title  Payband  Bonus
9952   Joe White  CEO    E        20,000

V 2.0
EmpID  Name            Dept       Title     Manager  Payband  Shares
9531   Nearey, Graham  Marketing  Director  9952     D        5000
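Because the collection does not enforce one schema, the three employee versions on this slide can sit side by side, and a query can target fields that only some documents have. The sketch holds the three documents as plain objects (field names copied from the slide; the layout of the `Benefits` array is an assumption) and uses a hypothetical `hasField` helper to illustrate, in memory, what a `{Bonus: {$exists: true}}` filter would select. It is not a MongoDB call.

```javascript
// The three employee documents from the slide; each has a different shape.
// The layout of the Benefits array is an illustrative assumption.
const versionedEmployees = [
  {
    EmpID: 9950, Name: "Dunham, Justin", Dept: "Marketing", Title: "Product Manager",
    Manager: 6531, Payband: "C",
    Benefits: [{ benefit: "Health", plan: "PPO Plus" }, { benefit: "Dental", plan: "Standard" }],
  },
  { EmpID: 9952, Name: "Joe White", Title: "CEO", Payband: "E", Bonus: 20000 },
  {
    EmpID: 9531, Name: "Nearey, Graham", Dept: "Marketing", Title: "Director",
    Manager: 9952, Payband: "D", Shares: 5000,
  },
];

// In-memory stand-in for the intent of {field: {$exists: true}}:
// keep only documents that carry the field at all.
const hasField = (field) => (doc) => Object.prototype.hasOwnProperty.call(doc, field);

console.log(versionedEmployees.filter(hasField("Bonus")).map((e) => e.Name)); // [ 'Joe White' ]
console.log(versionedEmployees.filter(hasField("Shares")).map((e) => e.Name)); // [ 'Nearey, Graham' ]
```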
11.
MongoDB - Usability
Shell
Command-line shell for interacting directly with the database
> db.collection.insert({product: "MongoDB", type: "Document Database"})
>
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "product" : "MongoDB",
  "type" : "Document Database"
}
Drivers
Drivers for most popular programming languages and frameworks:
Java, Python, Perl, Ruby, Haskell, JavaScript
12.
MongoDB - Utility
• Complex Indexed Queries
e.g. Age > 65 AND Male, living near Lyon
• Aggregation
e.g. profit margin by age band:
Age    Profit Margin
1-17   0
18-35  20
36-50  80
51-65  50
66+    5
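The example query on this slide, “Age > 65 AND Male living near Lyon”, maps onto a single filter document that combines a range predicate, an equality predicate and a geospatial predicate. The sketch below only builds that filter. The field names (`age`, `gender`, `location`), the collection name `people`, the 50 km radius and the approximate Lyon coordinates are assumptions for illustration, and `$near` on GeoJSON points requires a `2dsphere` index on `location`.

```javascript
// Approximate longitude/latitude of Lyon (GeoJSON order is [lng, lat]).
const LYON = [4.8357, 45.764];

// Build a filter combining a range, an equality and a proximity predicate.
// Field names are hypothetical.
function seniorMenNear(coordinates, maxDistanceMeters) {
  return {
    age: { $gt: 65 },
    gender: "M",
    location: {
      $near: {
        $geometry: { type: "Point", coordinates },
        $maxDistance: maxDistanceMeters, // metres, for GeoJSON points
      },
    },
  };
}

const filter = seniorMenNear(LYON, 50000);
// In the mongo shell this would be passed as: db.people.find(filter)
console.log(JSON.stringify(filter));
```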
13.
MongoDB - Scalability
• High Availability
• Auto Sharding
• Enterprise Monitoring
• GridFS file storage
15.
MongoDB & Hadoop
Operational: MongoDB
• Online, Real-time
• High concurrency & HA
• Live analytics
Post Processing: Hadoop
• Multi-source analytics
• Interactive & Batch
• Data lake
Connected by the MongoDB Connector for Hadoop
17.
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability
18.
Flexible Data Model
Easy Onboarding – e.g. Equities
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}
> db.ticks.find( {symbol: "DIS",
                  bidPrice: {$gt: 55.36} } )
19.
Flexible Data Model
Easy Onboarding – e.g. Depth of Book
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find( {bidPrices: {$gt: 55.36} } )
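On an array field such as `bidPrices`, a condition like `{$gt: 55.36}` matches the document when at least one element satisfies it. The plain-JavaScript sketch below illustrates that matching rule only (it is not how the server evaluates queries) and contrasts it with `$elemMatch`, which requires a single element to satisfy every condition at once. The helper names are hypothetical.

```javascript
const tick = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.6],
};

// Illustrates {field: {$gt: x, $lt: y}} on an array:
// each condition may be satisfied by a different element.
function matchesEachConditionSeparately(values, gt, lt) {
  return values.some((v) => v > gt) && values.some((v) => v < lt);
}

// Illustrates {field: {$elemMatch: {$gt: x, $lt: y}}}:
// one element must satisfy every condition.
function matchesElemMatch(values, gt, lt) {
  return values.some((v) => v > gt && v < lt);
}

// {bidPrices: {$gt: 55.36}} matches because 55.37 > 55.36.
console.log(tick.bidPrices.some((v) => v > 55.36)); // true
// No single bid lies strictly between 55.365 and 55.369 ...
console.log(matchesEachConditionSeparately(tick.bidPrices, 55.365, 55.369)); // true (55.37 > 55.365, 55.36 < 55.369)
// ... so the $elemMatch form does not match.
console.log(matchesElemMatch(tick.bidPrices, 55.365, 55.369)); // false
```

Use `$elemMatch` when a price band must be hit by one level of the book rather than by two different levels.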
20.
Flexible Data Model
Easy Onboarding – e.g. News
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  title: "Disney Earnings…",
  body: "Walt Disney Company reported…",
  tags: ["earnings", "media", "walt disney"]
}
21.
Flexible Data Model
Easy Onboarding – e.g. Social Networking
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  timestamp: ISODate("2013-02-15 10:00"),
  twitterHandle: "jdoe",
  tweet: "Heard @DisneyPictures is releasing…",
  usernamesIncluded: ["DisneyPictures"],
  hashTags: ["movierumors", "disney"]
}
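Fields such as `usernamesIncluded` and `hashTags` are derived from the tweet text before the document is stored. A minimal sketch of that onboarding step follows. The helper name `toTweetDocument`, the simplified regular expressions, the lower-casing of hashtags and the sample tweet text are all assumptions (the slide’s tweet is truncated); real Twitter entity rules are richer.

```javascript
// Build a social-networking document from raw tweet text.
// Simplified patterns: @mentions and #hashtags made of word characters only.
function toTweetDocument(handle, text, timestamp) {
  const mentions = [...text.matchAll(/@(\w+)/g)].map((m) => m[1]);
  const hashTags = [...text.matchAll(/#(\w+)/g)].map((m) => m[1].toLowerCase());
  return {
    timestamp,
    twitterHandle: handle,
    tweet: text,
    usernamesIncluded: [...new Set(mentions)], // de-duplicated, first-seen order
    hashTags: [...new Set(hashTags)],
  };
}

// Hypothetical sample tweet.
const tweetDoc = toTweetDocument(
  "jdoe",
  "Heard @DisneyPictures is releasing a sequel #MovieRumors #Disney",
  new Date(Date.UTC(2013, 1, 15, 10, 0))
);
console.log(tweetDoc.usernamesIncluded, tweetDoc.hashTags);
```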
22.
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability
23.
Architecture for Querying Data
• Higher Latency Trading Applications
• Backtesting Applications
• Research & Analysis Applications
24.
Flexible Querying and Indexing
Index any field (including arrays)
// Compound indexes
> db.ticks.ensureIndex({symbol: 1, timestamp: 1})
// Index on arrays
> db.ticks.ensureIndex({bidPrices: -1})
// Index at any depth (embedded fields)
> db.ticks.ensureIndex({"bids.price": 1})
// Full text search
> db.ticks.ensureIndex({tweet: "text"})
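The compound index `{symbol: 1, timestamp: 1}` can bound a scan only on its leading fields. The helper below is a simplified sketch of that prefix rule, assuming equality-style predicates and ignoring sorts, index intersection and the query planner’s cost model; `boundedIndexFields` is a hypothetical name.

```javascript
// Walk the index keys in order and keep those the query constrains,
// stopping at the first key the query does not mention.
// Only these leading keys can contribute index bounds.
function boundedIndexFields(indexSpec, queryFields) {
  const constrained = new Set(queryFields);
  const bounded = [];
  for (const key of Object.keys(indexSpec)) { // insertion order = index key order
    if (!constrained.has(key)) break;
    bounded.push(key);
  }
  return bounded;
}

const tickIndex = { symbol: 1, timestamp: 1 };
console.log(boundedIndexFields(tickIndex, ["symbol", "timestamp"])); // [ 'symbol', 'timestamp' ]
console.log(boundedIndexFields(tickIndex, ["symbol", "bidPrice"])); // [ 'symbol' ]
console.log(boundedIndexFields(tickIndex, ["timestamp"])); // []
```

A query on `timestamp` alone therefore gets no bounds from this index and would need an index that leads with `timestamp`.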
25.
Flexible Querying and Indexing
Rich Query Language
// Ticks from last month for media companies
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    timestamp: {$gte: new ISODate("2013-01-01"),
                $lt: new ISODate("2013-02-01")}})
// Ticks when Disney’s bid breached 55.50 this month
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gte: new ISODate("2013-02-01")}})
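A filter that lists `timestamp` as two separate keys keeps only the last condition, because in a JavaScript object literal a repeated key silently replaces the earlier one. The sketch below shows that pitfall and a hypothetical helper, `monthRangeFilter`, that puts both bounds on one key using a half-open interval from the start of the month to the start of the next month. Plain `Date` objects (UTC) stand in for the shell’s `ISODate`.

```javascript
// Pitfall: the second "timestamp" key overwrites the first.
const broken = {
  timestamp: { $gt: new Date("2013-01-01T00:00:00Z") },
  timestamp: { $lte: new Date("2013-01-31T00:00:00Z") },
};
console.log(Object.keys(broken.timestamp)); // [ '$lte' ]

// All ticks of the given symbols in one calendar month (UTC); month is 1-12.
// Date.UTC rolls month index 12 over into January of the next year.
function monthRangeFilter(symbols, year, month) {
  return {
    symbol: { $in: symbols },
    timestamp: {
      $gte: new Date(Date.UTC(year, month - 1, 1)),
      $lt: new Date(Date.UTC(year, month, 1)),
    },
  };
}

const january = monthRangeFilter(["DIS", "VIA", "CBS"], 2013, 1);
console.log(january.timestamp.$gte.toISOString()); // 2013-01-01T00:00:00.000Z
console.log(january.timestamp.$lt.toISOString()); // 2013-02-01T00:00:00.000Z
```

The half-open range also includes ticks from the last day of the month, which a `$lte` on midnight of the 31st would drop.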
28.
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
29.
Pre-aggregation pattern
Real-time and continuous state
All Ticks Collection
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  …
}
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15
  …
}
Pre-aggregated State
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  Daily_high: 66.1,
  Daily_low: 57.1,
  Daily_volume: 100222
}
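With the pre-aggregation pattern, each incoming tick is stored in the ticks collection and also folded into the per-symbol state document with a single upsert. A minimal sketch, assuming the `$max`, `$min` and `$inc` update operators (`$max`/`$min` arrived in MongoDB 2.6), hypothetical tick fields `price` and `quantity`, a hypothetical `day` key and a hypothetical `daily` collection: `buildStateUpdate` builds the update document, and `applyUpdate` replays its effect in memory so the resulting state can be checked without a server.

```javascript
// Update document for one tick (hypothetical tick fields: price, quantity).
// In the shell: db.daily.update({symbol: "DIS", day: "2013-02-15"}, update, {upsert: true})
function buildStateUpdate(tick) {
  return {
    $max: { Daily_high: tick.price },
    $min: { Daily_low: tick.price },
    $inc: { Daily_volume: tick.quantity },
  };
}

// In-memory replay of $max/$min/$inc semantics: a missing field is set to
// the given value, otherwise it is compared (max/min) or incremented.
function applyUpdate(state, update) {
  const next = { ...state };
  for (const [field, value] of Object.entries(update.$max || {})) {
    next[field] = field in next ? Math.max(next[field], value) : value;
  }
  for (const [field, value] of Object.entries(update.$min || {})) {
    next[field] = field in next ? Math.min(next[field], value) : value;
  }
  for (const [field, value] of Object.entries(update.$inc || {})) {
    next[field] = (next[field] || 0) + value;
  }
  return next;
}

// Hypothetical ticks for one symbol on one day.
const ticks = [
  { symbol: "DIS", price: 55.4, quantity: 500 },
  { symbol: "DIS", price: 55.9, quantity: 700 },
  { symbol: "DIS", price: 55.1, quantity: 300 },
];

let state = { symbol: "DIS", day: "2013-02-15" };
for (const tick of ticks) {
  state = applyUpdate(state, buildStateUpdate(tick));
}
console.log(state.Daily_high, state.Daily_low, state.Daily_volume); // 55.9 55.1 1500
```

Because each update is commutative per field, the state document stays correct regardless of the order in which ticks arrive.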
31.
Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs
– No need to go through HDFS
• Leverage the power of the Hadoop ecosystem against operational data in MongoDB
32.
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & High Scalability
33.
Why MongoDB Is Fast and Scalable
• Better data locality (Relational vs. MongoDB)
• In-Memory Caching
• Auto-Sharding: Read/write scaling
35.
Easy On-boarding
Easy On-boarding of All Financial Data
Problem
• Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms such as Bloomberg and Reuters
Shapes
- Time Series
- News
- Event
- Sentiment
Sizes
- 1MB: once-a-day price data
- 1GB x 1000s: data matrices
- 40GB: 1-minute data
- 30TB: tick data
- Even bigger: options data
• On-boarding can take weeks in a relational model, with complex schema designs and ETL
• An FX option can require an 80+ table schema
• Relational technology is a scale-up architecture and did not meet AHL’s performance requirements
Why MongoDB
• Dynamic schema: can on-board data of any shape or size almost instantly, without having to go through a typical “ETL” lifecycle
• Performance: quant researchers want data rendered in <1s for up to 20 years of historical data when back-testing trading strategies
• Replication: a team of 40 quant researchers relies on this system being up
• Sharding: can scale seamlessly and accommodate data of any shape and size
36.
Easy On-boarding
Results
Low latency:
- 1x/day data: 4ms for 10,000 rows (vs. 2,210ms from SQL)
- One-minute / tick data: 1s for 3.5M rows in Python (vs. 15s to 40s+ from OtherTick)
- 1s for 15M rows in Java
Parallel access:
- Cluster with 256+ concurrent data accesses
- Consistent throughput, with little load on the MongoDB server
Efficient:
- 10-15x reduction in network load
- Negligible decompression cost (lz4: 1.8Gb/s)
39.
James (AHL) Presentation Links
• Slides:
• http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform
• YouTube:
• James Blackburn - Python and MongoDB as a Platform for Financial Market Data
MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
----- Meeting Notes (11/02/2014 12:00) -----
Dotted line is the natural boundary of what is possible today. For example, Oracle (ORCL) lives far out on the right and does things NoSQL vendors will never do. These things come at the expense of some degree of scale and performance.
NoSQL was born out of wanting greater scalability and performance, but we think NoSQL vendors overreacted by giving up too much. For example, caching layers give up many things, and key-value stores are super fast but give up a rich data model and a rich query model.
MongoDB gives up some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (roughly 80%) with much better scalability and performance.
Start with an RDBMS and ask what we could do to scale: take out complex transactions and joins. How? Change the data model. >> Segue to data model section.
May need to revise the graphic: either remove the line or put all points on the line.
To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally, in an RDBMS you would denormalize the data or tell the system more about how data relates to one another. Another, more intuitive way is to use a document data model. It is more intuitive because it is closer to the way we develop applications today with object-oriented languages and platforms such as Java, .NET, Ruby, and Node.js.
Document data model is a good segue to the next section >> Data Model
Makes MongoDB a Hadoop-enabled file system
Read and write to live data, in-place
Copy data between Hadoop and MongoDB
Uses MongoDB indexes to filter data
Full support for data processing: Hive, MapReduce, Pig, Streaming