Learn why MongoDB is spreading across capital markets (and many other industries), then focus on how financial firms use it as a tick database for capturing and analyzing tick data, citing developer productivity, low TCO, and horizontal scale. This webinar illustrates how MongoDB can store variable data formats, such as top of book and depth of book, multiple asset classes, and even news and social networking feeds. It explores aggregating and analyzing tick data in real time for automated trading or in batch for research and analysis, and how auto-sharding lets MongoDB scale out on commodity hardware as storage and performance requirements grow.
Webinar: How Banks Use MongoDB as a Tick Database
1. How Capital Markets Firms Use
MongoDB as a Tick Database
Matt Kalan, Sr. Solution Architect
Email: Matt.kalan@10gen.com
Twitter: @matthewkalan
2. Agenda
• MongoDB Introduction
• FS Use Cases
• Writing/Capturing Market Data
• Reading/Analyzing Market Data
• Performance, Scalability, & High Availability
• Q&A
3. Introduction
10gen is the company behind MongoDB –
the leading next generation database
Document-Oriented
General Purpose
Open-Source
4. 10gen Overview
200+ employees
500+ customers
Over $81 million in funding
Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
8. Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. Risk Analysis & Reporting
4. Trade Repository
5. Portfolio Reporting
9. Tick Data Capture & Analysis -
Requirements
• Capture real-time market data (multi-asset, top of
book, depth of book, even news)
• Load historical data
• Aggregate data into bars, daily, monthly intervals
• Enable queries & analysis on raw ticks or
aggregates
• Drive backtesting or automated signals
10. Tick Data Capture & Analysis –
Why MongoDB?
• High throughput => can capture real-time feeds for all
products/asset classes needed
• High scalability => all data and depth for all historical time periods
can be captured
• Flexible & Range-based indexing => fast querying on time ranges
and any fields
• Aggregation Framework => can shape raw data into aggregates
(e.g. ticks to bars)
• Map-reduce capability (Native MR or Hadoop Connector) => batch
analysis looking for patterns and opportunities
• Easy to use => native language drivers and JSON expressions that
you can apply for most operational database needs as well
• Low TCO => Low software license cost and commodity hardware
22. Architecture for Querying Data
[Diagram: a store of ticks, bars, and other analysis feeds three kinds of consumers: Research & Analysis Applications, Backtesting Applications, and Higher Latency Trading Applications]
23. Index any fields: arrays, nested, etc
// Compound indexes
> db.ticks.ensureIndex({symbol: 1, timestamp: 1})
// Index on arrays
> db.ticks.ensureIndex({bidPrices: -1})
// Index on any depth
> db.ticks.ensureIndex({"bids.price": 1})
// Full text search
> db.ticks.ensureIndex({tweet: "text"})
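The indexes on this slide assume tick documents that carry arrays and nested sub-documents. As a rough sketch in plain JavaScript (no MongoDB needed), the helper below collects the values found at a dotted path, fanning out over arrays, which is approximately what a multikey index on a path like "bids.price" has entries for. The `indexedValues` helper, the `size` field, and the sample values are assumptions for illustration; only `symbol`, `timestamp`, `bidPrices`, and `bids.price` come from the slide.

```javascript
// Hypothetical depth-of-book tick; field values are illustrative only.
const tick = {
  symbol: "DIS",
  timestamp: new Date("2013-02-01T14:30:00Z"),
  bidPrices: [55.5, 55.49, 55.48],
  bids: [
    { price: 55.5, size: 300 },
    { price: 55.49, size: 1200 },
  ],
};

// Collect every value reachable at a dotted path, expanding arrays along
// the way, so each array element contributes its own entry.
function indexedValues(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // Flatten a trailing array (e.g. bidPrices) into individual entries.
  return current.flat();
}

const nestedPrices = indexedValues(tick, "bids.price");
const arrayPrices = indexedValues(tick, "bidPrices");
```

Here `nestedPrices` holds one value per embedded bid and `arrayPrices` one value per array element, which is why a single document can produce several index entries.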
24. Query for ticks by time; price
threshold
// Ticks for last month for media companies
> db.ticks.find({
symbol: {$in: ["DIS", "VIA", "CBS"]},
// both bounds go under one timestamp key; a repeated key would override the first
timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")}})
// Ticks when Disney's bid breached 55.50 this month
> db.ticks.find({
symbol: "DIS",
bidPrice: {$gt: 55.50},
timestamp: {$gte: new ISODate("2013-02-01")}})
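When building time-range filters in application code, both bounds need to sit under a single `timestamp` key, because a JavaScript object literal that repeats a key keeps only the last value. A minimal sketch, assuming a hypothetical helper named `tickRangeFilter` and a half-open range [start, end):

```javascript
// Build a filter document for ticks of the given symbols in [start, end).
// A half-open range keeps ticks from late on the last day of the month.
function tickRangeFilter(symbols, start, end) {
  return {
    symbol: { $in: symbols },
    // Both bounds live in ONE timestamp sub-document.
    timestamp: { $gte: start, $lt: end },
  };
}

const januaryMedia = tickRangeFilter(
  ["DIS", "VIA", "CBS"],
  new Date("2013-01-01T00:00:00Z"),
  new Date("2013-02-01T00:00:00Z")
);
// In the mongo shell this could be passed as db.ticks.find(januaryMedia).
```

The helper only constructs the filter document; running it against a collection is left to whichever driver or shell is in use.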
25. Analyzing/Aggregating Options
• Custom application code
– Run your queries, compute your results
• Aggregation framework
– Declarative, pipeline-based approach
• Native Map/Reduce in MongoDB
– JavaScript functions distributed across cluster
• Hadoop Connector
– Offline batch processing/computation
27. Add analysis on the bars
…
// then count the number of down bars
{ $project: {
downBar: {$lt: ["$close", "$open"]},
timestamp: 1,
open: 1, high: 1, low: 1, close: 1}},
{ $group: {
_id: "$downBar",
sum: {$sum: 1}}})
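To make the ticks-to-bars idea concrete without a database, here is an in-memory sketch in plain JavaScript of what such a pipeline computes: ticks are bucketed into fixed intervals to form open/high/low/close bars, and bars are then counted by whether they closed below their open. The function names, the one-minute interval, and the sample prices are assumptions; this is not the pipeline from the webinar, only an illustration of the same calculation.

```javascript
// Group ticks into fixed-width bars keyed by the start of each interval.
// ticks: [{ timestamp: Date, bidPrice: number }], assumed sorted by time.
function ticksToBars(ticks, intervalMs) {
  const bars = new Map();
  for (const { timestamp, bidPrice } of ticks) {
    const start = Math.floor(timestamp.getTime() / intervalMs) * intervalMs;
    const bar = bars.get(start);
    if (!bar) {
      bars.set(start, {
        timestamp: new Date(start),
        open: bidPrice, high: bidPrice, low: bidPrice, close: bidPrice,
      });
    } else {
      bar.high = Math.max(bar.high, bidPrice);
      bar.low = Math.min(bar.low, bidPrice);
      bar.close = bidPrice; // last tick seen in the interval
    }
  }
  return [...bars.values()];
}

// Count bars by whether they closed below their open (a "down bar").
function countDownBars(bars) {
  const counts = { true: 0, false: 0 };
  for (const bar of bars) counts[String(bar.close < bar.open)] += 1;
  return counts;
}

const sampleTicks = [
  { timestamp: new Date("2013-02-01T14:30:05Z"), bidPrice: 55.5 },
  { timestamp: new Date("2013-02-01T14:30:40Z"), bidPrice: 55.4 },
  { timestamp: new Date("2013-02-01T14:31:10Z"), bidPrice: 55.4 },
  { timestamp: new Date("2013-02-01T14:31:50Z"), bidPrice: 55.6 },
];
const bars = ticksToBars(sampleTicks, 60 * 1000);
const counts = countDownBars(bars);
```

With this sample data the first minute forms a down bar and the second does not, so `counts` has one bar under each key.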
28. Map-Reduce Example: Sum
var mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
var reduceFunction = function (symbol, priceList) {
  return Array.sum(priceList);
};
> db.ticks.mapReduce(
mapFunction, reduceFunction, {out: "tickSums"})
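For readers who want to see the map and reduce steps without a running cluster, the following plain JavaScript sketch emulates them in memory. `runMapReduce`, `mapTick`, and `sumPrices` are hypothetical names, and the sample ticks are made up. Note that `Array.sum` is a mongo shell helper rather than standard JavaScript, so the sketch uses `reduce` instead. In MongoDB itself the reduce function may be invoked more than once for the same key, which this single-pass emulation does not model.

```javascript
// Emulate map-reduce in memory: map emits (key, value) pairs, values are
// grouped by key, and reduce folds each group into a single result.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const results = {};
  for (const [key, values] of groups) results[key] = reduce(key, values);
  return results;
}

// Emit each tick's bid price under its symbol, then sum per symbol.
const mapTick = (tick, emit) => emit(tick.symbol, tick.bidPrice);
const sumPrices = (symbol, priceList) => priceList.reduce((a, b) => a + b, 0);

const tickSums = runMapReduce(
  [
    { symbol: "DIS", bidPrice: 55.5 },
    { symbol: "DIS", bidPrice: 55.25 },
    { symbol: "CBS", bidPrice: 43 },
  ],
  mapTick,
  sumPrices
);
// tickSums is { DIS: 110.75, CBS: 43 }
```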
29. Process Data on Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop
jobs
– No need to go through HDFS
• Leverage power of Hadoop ecosystem against
operational data in MongoDB
33. Auto-Sharding for Horizontal Scale
mongod: Key Range, Symbol A…J
mongod: Key Range, Symbol K…Z
Read/Write Scalability
34. Sharding
mongod: Key Range, Symbol A…F
mongod: Key Range, Symbol G…J
mongod: Key Range, Symbol K…O
mongod: Key Range, Symbol P…Z
Read/Write Scalability
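The key-range idea on these slides can be sketched as a lookup from a symbol to the shard whose range covers it. This is only an illustration in plain JavaScript: the `chunks` table, the shard names, and `shardForSymbol` are hypothetical, and the ranges are simplified to inclusive first letters. In a real deployment the router consults cluster metadata and ranges are defined over full shard key values.

```javascript
// Hypothetical chunk table mirroring the slide: each shard owns a range of
// first letters, [min, max] inclusive.
const chunks = [
  { min: "A", max: "F", shard: "shard0" },
  { min: "G", max: "J", shard: "shard1" },
  { min: "K", max: "O", shard: "shard2" },
  { min: "P", max: "Z", shard: "shard3" },
];

// Return the shard whose key range covers the symbol's first letter.
function shardForSymbol(symbol) {
  const first = symbol[0].toUpperCase();
  const chunk = chunks.find((c) => first >= c.min && first <= c.max);
  if (!chunk) throw new Error(`no chunk covers symbol ${symbol}`);
  return chunk.shard;
}

const disShard = shardForSymbol("DIS"); // falls in A…F
const viaShard = shardForSymbol("VIA"); // falls in P…Z
```

Because writes and reads for different symbols land on different shards, adding shards and splitting ranges spreads both load and storage.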
35. Application
MongoS, MongoS, MongoS
Key Range (Symbol, Time): A…F | G…J | K…O | P…Z
Each key range: Primary, Secondary, Secondary
36. 10gen Products and Services
Subscriptions
Professional Support, Enterprise Edition and Commercial License
Consulting
Expert Resources for All Phases of MongoDB Implementations
Training
Online and In-Person, for Developers and Administrators
37. Summary
• MongoDB is high performance for tick data
• Scales horizontally automatically by auto-
sharding
• Fast, flexible querying, analysis, & aggregation
• Dynamic schema can handle any data types
• MongoDB has all these features with low TCO
• 10gen can support you with anything discussed
38. For More Information
Resource: Location
MongoDB Downloads: www.mongodb.org/download
Free Online Training: education.10gen.com
Webinars and Events: www.10gen.com/events
White Papers: www.10gen.com/white-papers
Customer Case Studies: www.10gen.com/customers
Presentations: www.10gen.com/presentations
Documentation: docs.mongodb.org
Additional Info: info@10gen.com
Editor's Notes
Mention tick databases
JSON document – contains key value pairs, different types, values can also be arrays and other documents
because of the way MongoDB lets you update documents atomically we can be sure totals and list of voters will stay in sync
comments is an array of JSON documents. We can query by fields inside embedded documents as well as array members.
Secondary indexes, compound indexes, multikey indexes. Why is it important to have all of a document together? Data locality.
Fewer reads because data is stored together. Memory-mapped files, with caching handled by the OS, naturally leave the most frequently accessed data in RAM (have enough RAM to fit indexes and the working data set for best performance). Horizontal scaling is built into the product by design from the start.
Full deployment. As many mongoS processes as you have app servers (for example); Config DBs are small but hold the critical information about where ranges of data are located on disk/shards.