1
Distributed, fault-tolerant, transactional
Real-Time Integration: MongoDB and SQL Databases
Eugene Dvorkin
Architect, WebMD
2
WebMD: A lot of data; a lot of traffic
~900 million page views a month
~100 million unique visitors a month
3
How We Use MongoDB
User Activity
4
Why Move Data to RDBMS?
Preserve existing investment in BI and data warehouse
Use an analytical database such as Vertica
Use SQL
5
Why Move Data In Real-time?
Batch processing is slow
No ad-hoc queries
No real-time reports
6
Challenge in moving data
Transform Document to Relational Structure
Insert into RDBMS at high rate
7
Challenge in moving data
Scale easily as data volume and velocity increase
8
Our Solution to move data in Real-time: Storm
Storm: open source, distributed, real-time computation system
Developed by Nathan Marz at BackType, which was acquired by Twitter
9
Hadoop vs. Storm
Our Solution to move data in Real-time: Storm
10
Why STORM?
JVM-based framework
Guaranteed data processing
Supports development in multiple languages
Scalable and transactional
11
Overview of Storm cluster
Master node (Nimbus): distributes code and assigns tasks
Cluster coordination (ZooKeeper)
Worker nodes (Supervisors): run worker processes
12
Storm Abstractions
Tuples, Streams, Spouts, Bolts and Topologies
13
Tuples
("ns:events", "email:edvorkin@gmail.com")
Ordered list of elements
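To make the tuple idea concrete, here is a minimal plain-Java sketch, not Storm's own Tuple class: an ordered list of values paired with declared field names, so a value can be read either by position or by name (the named lookup mirrors what getValueByField does in the bolt code later in the deck). SimpleTuple and the field names "ns" and "user" are hypothetical.

```java
import java.util.List;

// Hypothetical stand-in for a Storm tuple: ordered values plus the field
// names declared by the emitting component. Not Storm's API.
final class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        this.fields = fields;
        this.values = values;
    }

    // Positional access: a tuple is an ordered list
    Object getValue(int i) {
        return values.get(i);
    }

    // Named access resolves the field name to its position
    Object getValueByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(i);
    }
}
```

Because the names are declared once per stream rather than stored in each tuple, every tuple only carries its values.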
14
Stream
Unbounded sequence of tuples
Example: a stream of messages from a message queue
15
Spout
Reads from a stream of data: queues, web logs, API calls, MongoDB oplog
Emits documents as tuples
Source of streams
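The spout's role can be sketched in plain Java, assuming an in-memory queue as the data source. The method names nextTuple, ack, and fail follow Storm's spout interface, but QueueSpoutSketch and its emitter callback are hypothetical, not Storm code: emitted messages stay in a pending map until acknowledged, and a failed message is queued again, which is the basic idea behind guaranteed processing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Plain-Java sketch of a spout: emits one message per nextTuple() call,
// tracks it until ack(), and replays it after fail().
final class QueueSpoutSketch {
    private final Deque<String> source = new ArrayDeque<>();    // stand-in for a message queue
    private final Map<Long, String> pending = new HashMap<>();  // emitted but not yet acked
    private final BiConsumer<Long, String> emitter;             // stand-in for the output collector
    private long nextId = 0;

    QueueSpoutSketch(BiConsumer<Long, String> emitter) {
        this.emitter = emitter;
    }

    void offer(String message) {
        source.addLast(message);
    }

    // Emit at most one message and return immediately if none is available
    void nextTuple() {
        String msg = source.pollFirst();
        if (msg == null) {
            return;
        }
        long id = nextId++;
        pending.put(id, msg);
        emitter.accept(id, msg);
    }

    // Fully processed downstream: stop tracking it
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Failed or timed out: put it back at the front so it is emitted again next
    void fail(long msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            source.addFirst(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Keeping nextTuple non-blocking matters because Storm calls it repeatedly in a loop on the same thread that delivers ack and fail.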
16
Bolts
Process tuples and create new streams
17
Bolts
Apply functions/transforms
Calculate and aggregate data (word count!)
Access DBs, APIs, etc.
Filter data
Map/Reduce
Process tuples and create new streams
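The "word count" bolt mentioned above can be sketched in plain Java: each input tuple (a line of text) updates running counts, and the bolt emits a new (word, count) tuple downstream. WordCountBoltSketch is hypothetical and models emitting as a returned list rather than calling Storm's collector.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a stateful word-count bolt (not Storm's API).
final class WordCountBoltSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // Process one input tuple and return the tuples this bolt would emit
    List<Map.Entry<String, Integer>> execute(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" when the line starts with a separator
            }
            int count = counts.merge(word, 1, Integer::sum);
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(word, count));
        }
        return emitted;
    }
}
```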
18
Topology
19
Topology
Storm transforms and moves data
21
MongoDB
How To Read All Incoming Data
from MongoDB?
Use MongoDB OpLog
22
What is OpLog?
Replication mechanism in MongoDB
It is a capped collection
23
Spout: reading from OpLog
Located in the local database, oplog.rs collection
24
Spout: reading from OpLog
Operations (op field): insert (i), update (u), delete (d)
25
Spout: reading from OpLog
Namespace (ns field): the collection name, which maps to a table
26
Spout: reading from OpLog
Data object (o field): the full document for inserts, the modification for updates, the _id for deletes
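The three oplog fields above can be interpreted with a small sketch. It models an oplog entry as a plain Map rather than a driver document type; OplogEntry and its table-naming choice are hypothetical. The oplog also records commands (c) and no-ops (n), which a pipeline like this would typically skip. The ns value is "database.collection", so the sketch splits at the first dot (database names cannot contain dots) and also accepts the bare collection names used in the deck's examples.

```java
import java.util.Map;

// Sketch of reading the op, ns, and o fields of one oplog entry.
final class OplogEntry {
    enum Op {
        INSERT, UPDATE, DELETE, COMMAND, NOOP;

        static Op fromCode(String code) {
            switch (code) {
                case "i": return INSERT;
                case "u": return UPDATE;
                case "d": return DELETE;
                case "c": return COMMAND;
                case "n": return NOOP;
                default: throw new IllegalArgumentException("unknown op: " + code);
            }
        }
    }

    final Op op;
    final String database;
    final String collection;             // mapped to the target table name
    final Map<String, Object> document;  // contents of the "o" field

    @SuppressWarnings("unchecked")
    OplogEntry(Map<String, Object> raw) {
        this.op = Op.fromCode((String) raw.get("op"));
        String ns = (String) raw.get("ns");
        int dot = ns.indexOf('.');
        this.database = dot < 0 ? "" : ns.substring(0, dot);
        this.collection = dot < 0 ? ns : ns.substring(dot + 1);
        this.document = (Map<String, Object>) raw.get("o");
    }
}
```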
27
Sharded cluster
28
Automatic discovery of sharded cluster
29
Example: Shard vs Replica set discovery
30
Example: Shard discovery
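The slides do not show the discovery code itself. One possible approach, assumed here: in a sharded cluster each shard keeps its own oplog, so the spout tails every shard separately, and the shard list can be read from the "shards" collection of the config database, whose host field looks like "rs0/db1:27017,db2:27017" for a replica-set shard or "db1:27017" for a standalone one. ShardAddress is a hypothetical helper that only parses that string into a connection string; it is not MongoDB driver API.

```java
import java.util.Arrays;
import java.util.List;

// Parses a shard "host" value into the address used to tail that shard's oplog.
final class ShardAddress {
    final String replicaSet;   // null for a standalone shard
    final List<String> hosts;

    private ShardAddress(String replicaSet, List<String> hosts) {
        this.replicaSet = replicaSet;
        this.hosts = hosts;
    }

    static ShardAddress parse(String host) {
        int slash = host.indexOf('/');
        String rs = slash < 0 ? null : host.substring(0, slash);
        String members = slash < 0 ? host : host.substring(slash + 1);
        return new ShardAddress(rs, Arrays.asList(members.split(",")));
    }

    // MongoDB connection string for this shard (replica set name as an option)
    String connectionString() {
        String base = "mongodb://" + String.join(",", hosts) + "/";
        return replicaSet == null ? base : base + "?replicaSet=" + replicaSet;
    }
}
```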
32
Spout: Reading data from OpLog
How to Read data continuously
from OpLog?
Use Tailable Cursor
33
Example: Tailable cursor, like tail -f
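The tailable-cursor behavior can be illustrated with an in-memory analogy; this is not MongoDB code and not the deck's spout. CappedLogSketch is hypothetical: like a capped collection it keeps only the newest entries in insertion order, and its reader, like a tailable await-data cursor, blocks briefly for new data instead of ending when it reaches the end.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-memory analogy of a capped collection read through a tailing reader.
final class CappedLogSketch {
    static final class Entry {
        final long seq;
        final String value;
        Entry(long seq, String value) { this.seq = seq; this.value = value; }
    }

    private final int capacity;
    private final Deque<Entry> entries = new ArrayDeque<>();
    private long nextSeq = 1;

    CappedLogSketch(int capacity) { this.capacity = capacity; }

    // Insertion order is preserved; the oldest entry is dropped when full
    synchronized void append(String value) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(new Entry(nextSeq++, value));
        notifyAll(); // wake any waiting reader
    }

    // Entries newer than lastSeen, waiting up to timeoutMs if there are none yet
    synchronized List<Entry> awaitAfter(long lastSeen, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            List<Entry> out = new ArrayList<>();
            for (Entry e : entries) {
                if (e.seq > lastSeen) {
                    out.add(e);
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (!out.isEmpty() || remaining <= 0) {
                return out;
            }
            try {
                wait(remaining);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return out;
            }
        }
    }
}
```

The analogy also shows the main risk: a reader that falls too far behind never sees entries that were already overwritten, which is why the oplog must be large enough to cover the pipeline's lag.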
34
Manage timestamps
Use the ts field (timestamp of the oplog entry) to track processed records
If the system restarts, resume from the recorded ts
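The checkpointing rule can be sketched as follows. An oplog ts is a BSON timestamp: seconds since the epoch plus an increment that orders operations within the same second, so comparison goes seconds first, then increment. OplogTs and Checkpoint are hypothetical classes, and where the checkpoint is persisted (the deck's opslog_progress) is left abstract.

```java
// Ordered oplog timestamp: seconds first, then the per-second increment.
final class OplogTs implements Comparable<OplogTs> {
    final long seconds;
    final long increment;

    OplogTs(long seconds, long increment) {
        this.seconds = seconds;
        this.increment = increment;
    }

    @Override
    public int compareTo(OplogTs other) {
        int c = Long.compare(seconds, other.seconds);
        return c != 0 ? c : Long.compare(increment, other.increment);
    }
}

// Remembers the last processed ts and skips anything at or before it.
final class Checkpoint {
    private OplogTs last; // null until the first entry has been processed

    boolean shouldProcess(OplogTs ts) {
        return last == null || ts.compareTo(last) > 0;
    }

    // Record progress only after the entry was handled successfully
    void record(OplogTs ts) {
        if (shouldProcess(ts)) {
            last = ts;
        }
    }
}
```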
35
Spout: reading from OpLog
36
Spout: Code Example
37
TOPOLOGY
38
Working With Embedded Arrays
An array represents a one-to-many relationship in an RDBMS
39
Example: Working with embedded arrays
40
Example: Working with embedded arrays
{ _id: 1,
  ns: "people_awards",
  o: { award: 'National Medal of Science',
       year: 1975,
       by: 'National Science Foundation' }
}
{ _id: 1,
  ns: "people_awards",
  o: { award: 'Turing Award',
       year: 1977,
       by: 'ACM' }
}
41
Example: Working with embedded arrays
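public void execute(Tuple tuple) {
    .........
    if (field instanceof BasicDBList) {
        // processArray is the author's helper (not shown on the slide)
        BasicDBObject arrayElement = processArray(field);
        ......
        // emit(streamId, anchor, values): anchoring to the input tuple
        // lets Storm replay it if downstream processing fails
        outputCollector.emit("documents", tuple, new Values(arrayElement));
    }
    ......
}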
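The array-to-rows mapping from the awards example can be sketched in plain Java. The assumed naming rule, taken from the example output, is that each array element becomes one row in "collection_field", keyed by the parent _id, with columns prefixed by the array field name. ArrayRows is hypothetical and uses plain Maps and Lists instead of BasicDBObject and BasicDBList.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Expands one array field of a document into child-table rows.
final class ArrayRows {
    static final class Row {
        final String table;
        final Map<String, Object> columns;
        Row(String table, Map<String, Object> columns) {
            this.table = table;
            this.columns = columns;
        }
    }

    static List<Row> expand(String collection, Map<String, Object> doc, String arrayField) {
        List<Row> rows = new ArrayList<>();
        Object value = doc.get(arrayField);
        if (!(value instanceof List)) {
            return rows; // nothing to expand
        }
        for (Object element : (List<?>) value) {
            Map<String, Object> cols = new LinkedHashMap<>();
            cols.put("_id", doc.get("_id")); // foreign key back to the parent row
            if (element instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) element).entrySet()) {
                    cols.put(arrayField + "_" + e.getKey(), e.getValue());
                }
            } else {
                cols.put(arrayField, element); // array of scalar values
            }
            rows.add(new Row(collection + "_" + arrayField, cols));
        }
        return rows;
    }
}
```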
42
Parse documents with Bolt
43
Input oplog entry:
{"ns": "people", "op": "i",
 "o": {
   "_id": 1,
   "name": { "first": "John", "last": "Backus" },
   "birth": "Dec 03, 1924"
 }
}
Flattened output tuple:
["ns": "people", "op": "i",
 ["_id": 1,
  "name_first": "John",
  "name_last": "Backus",
  "birth": "Dec 03, 1924"
 ]
]
Parse documents with Bolt
44
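@Override
public void execute(Tuple tuple) {
    ......
    // The incoming tuple carries the oplog entry in its "document" field
    final BasicDBObject oplogObject =
            (BasicDBObject) tuple.getValueByField("document");
    // The changed document itself is stored under the oplog's "o" key
    final BasicDBObject document = (BasicDBObject) oplogObject.get("o");
    ......
    // flattenDocument (not shown) maps nested fields to flat columns
    outputValues.add(flattenDocument(document));
    // Anchored emit: a downstream failure replays the input tuple
    outputCollector.emit(tuple, outputValues);
    ......
}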
Parse documents with Bolt
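The flattening shown above (name.first becoming name_first) can be sketched as a recursive walk over nested maps. Flattener is a hypothetical stand-in for the deck's flattenDocument helper, which is not shown; it assumes arrays were already split out by the array-extractor bolt, so only nested sub-documents need handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Flattens nested sub-documents into columns joined with "_".
final class Flattener {
    static Map<String, Object> flatten(Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> doc, Map<String, Object> out) {
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            String column = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(column, child, out); // recurse into the sub-document
            } else {
                out.put(column, value);
            }
        }
    }
}
```

Using a LinkedHashMap keeps the columns in document order, which makes the generated SQL easier to read and test.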
45
Write to SQL with SQLWriter Bolt
46
Write to SQL with SQLWriter Bolt
["ns": "people", "op": "i",
 ["_id": 1,
  "name_first": "John",
  "name_last": "Backus",
  "birth": "Dec 03, 1924"
 ]
]
insert into people (_id, name_first, name_last, birth)
  values (1, 'John', 'Backus', 'Dec 03, 1924');
insert into people_awards (_id, awards_award, awards_year, awards_by)
  values (1, 'Turing Award', 1977, 'ACM');
insert into people_awards (_id, awards_award, awards_year, awards_by)
  values (1, 'National Medal of Science', 1975, 'National Science Foundation');
47
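@Override
public void prepare(.....) {
    ....
    try {
        // Load the Vertica JDBC driver and open the connection this bolt task reuses
        Class.forName("com.vertica.jdbc.Driver");
        con = DriverManager.getConnection(dBUrl, username, password);
    } catch (ClassNotFoundException | SQLException e) {
        throw new RuntimeException("Cannot connect to " + dBUrl, e);
    }
}

@Override
public void execute(Tuple tuple) {
    // createInsertStatement (not shown) builds the SQL text from the flattened tuple
    String insertStatement = createInsertStatement(tuple);
    // try-with-resources closes the Statement even if execute() throws
    try (Statement stmt = con.createStatement()) {
        stmt.execute(insertStatement);
    } catch (SQLException e) {
        // Assumed handling: fail the tuple so Storm replays it
        outputCollector.fail(tuple);
        return;
    }
    ....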
Write to SQL with SQLWriter Bolt
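The SQLWriter bolt builds SQL text from the tuple. A variant worth considering, sketched here as an alternative rather than the deck's method, generates a parameterized INSERT so values are never concatenated into SQL: the placeholders would be bound with PreparedStatement.setObject(i, value) and, for the high insert rates mentioned earlier, grouped with addBatch() and executeBatch(). InsertSql is a hypothetical helper, and it assumes table and column names come from trusted document structure, since identifiers cannot be bound as parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Builds "insert into T (c1, c2) values (?, ?)" plus the ordered parameter list.
final class InsertSql {
    final String sql;
    final List<Object> params;

    private InsertSql(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    static InsertSql forRow(String table, Map<String, Object> row) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        List<Object> params = new ArrayList<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            cols.add(e.getKey());
            marks.add("?");
            params.add(e.getValue());
        }
        return new InsertSql("insert into " + table + " " + cols + " values " + marks, params);
    }
}
```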
48
Topology Definition
Config conf = new Config();
TopologyBuilder builder = new TopologyBuilder();
// define our spout
builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
// Parallelism hint of 5 for the bolt that splits embedded arrays
builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5)
        .shuffleGrouping(spoutId);
// Subscribe only to the extractor's "documents" stream
builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt())
        .shuffleGrouping(arrayExtractorId, documentsStreamId);
builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword))
        .shuffleGrouping(mongoDocParserId);
// Run in-process for local testing
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
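Every connection in this topology uses shuffle grouping. The sketch below illustrates, without reproducing Storm's implementation, how that differs from a fields grouping. Shuffle grouping spreads tuples evenly with no key affinity; Storm randomizes it, and plain round-robin is used here for clarity. A fields grouping on _id would instead send every change to the same document to the same task, a possible option if per-document order of inserts and updates matters. GroupingSketch is hypothetical, and its hash is not Storm's exact function.

```java
// Illustrative task selection for two grouping strategies.
final class GroupingSketch {
    private int counter = 0;

    // Even spread across tasks, no key affinity (round-robin stand-in)
    int shuffleTask(int numTasks) {
        return Math.floorMod(counter++, numTasks);
    }

    // The same key always maps to the same task index
    static int fieldsTask(Object key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }
}
```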
51
Topology Definition
Config conf = new Config();
TopologyBuilder builder = new TopologyBuilder();
// define our spout
builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5)
        .shuffleGrouping(spoutId);
builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt())
        .shuffleGrouping(arrayExtractorId, documentsStreamId);
builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword))
        .shuffleGrouping(mongoDocParserId);
// Deploy to a running Storm cluster instead of an in-process LocalCluster
StormSubmitter.submitTopology("OfflineEventProcess", conf, builder.createTopology());
52
Lesson learned
By combining the MongoDB oplog (or another capped collection), a tailable cursor, and the Storm framework, you can build a fast, scalable, real-time data processing pipeline.
55
Questions
Eugene Dvorkin, Architect, WebMD edvorkin@webmd.net
Twitter: @edvorkin LinkedIn: eugenedvorkin
58
Next Sessions at 2:50
5th Floor:
West Side Ballroom 3&4: Data Modeling Examples from the Real World
West Side Ballroom 1&2: Growing Up MongoDB
Juilliard Complex: Business Track: MetLife Leapfrogs Insurance Industry with MongoDB-Powered Big Data Application
Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session
7th Floor:
Empire Complex: How We Fixed Our MongoDB Problems
SoHo Complex: High Performance, High Scale MongoDB on AWS: A Hands-On Guide

 

KĂŒrzlich hochgeladen (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Real-Time Integration Between MongoDB and SQL Databases

  • 1. Distributed, fault-tolerant, transactional Real-Time Integration: MongoDB and SQL Databases. Eugene Dvorkin, Architect, WebMD
  • 2. WebMD: A lot of data; a lot of traffic. ~900 million page views a month; ~100 million unique visitors a month
  • 3. How We Use MongoDB: User Activity
  • 4. Why Move Data to RDBMS? Preserve existing investment in BI and data warehouse; use an analytical database such as Vertica; use SQL
  • 5. Why Move Data In Real-time? Batch processing is slow; no ad-hoc queries; no real-time reports
  • 6. Challenge in moving data: Transform documents to a relational structure; insert into the RDBMS at a high rate
  • 7. Challenge in moving data: Scale easily as data volume and velocity increase
  • 8. Our Solution to move data in Real-time: Storm. Storm is an open source, distributed, real-time computation system, developed by Nathan Marz at BackType, which was acquired by Twitter
  • 9. Hadoop, Storm: Our Solution to move data in Real-time: Storm
  • 10. Why STORM? JVM-based framework; guaranteed data processing; supports development in multiple languages; scalable and transactional
  • 11. Overview of Storm cluster: master node (Nimbus), cluster coordination (ZooKeeper), worker nodes (Supervisors) that run worker processes
  • 12. Storm Abstractions: Tuples, Streams, Spouts, Bolts and Topologies
  • 14. Stream: Unbounded sequence of tuples. Example: a stream of messages from a message queue
  • 15. Spout: Source of streams. Reads from a stream of data (queues, web logs, API calls, MongoDB oplog) and emits documents as tuples
  • 16. Bolts: Process tuples and create new streams
  • 17. Bolts: Process tuples and create new streams. Apply functions/transforms; calculate and aggregate data (word count!); access DB, API, etc.; filter data; Map/Reduce
  • 20. MongoDB: How To Read All Incoming Data from MongoDB?
  • 21. MongoDB: How To Read All Incoming Data from MongoDB? Use the MongoDB OpLog
  • 22. What is OpLog? The replication mechanism in MongoDB. It is a capped collection
  • 23. Spout: reading from OpLog. Located in the local database, oplog.rs collection
  • 24. Spout: reading from OpLog. Operations: Insert, Update, Delete
  • 25. Spout: reading from OpLog. Namespace: the collection name maps to the table name
  • 26. Spout: reading from OpLog. Data object (the o field):
  • 28. Automatic discovery of sharded cluster
  • 29. Example: Shard vs. replica set discovery
  • 31. Spout: Reading data from OpLog. How to read data continuously from the OpLog?
  • 32. Spout: Reading data from OpLog. How to read data continuously from the OpLog? Use a tailable cursor
  • 33. Example: Tailable cursor, like tail -f
  • 34. Manage timestamps: Use the ts field (timestamp in the oplog entry) to track processed records. If the system restarts, start from the recorded ts
  • 38. Working With Embedded Arrays: An array represents a one-to-many relationship in RDBMS terms
  • 39. Example: Working with embedded arrays
  • 40. Example: Working with embedded arrays: {_id: 1, ns: "person_awards", o: {award: 'National Medal of Science', year: 1975, by: 'National Science Foundation'}} {_id: 1, ns: "person_awards", o: {award: 'Turing Award', year: 1977, by: 'ACM'}}
  • 41. Example: Working with embedded arrays
      public void execute(Tuple tuple) {
          ...
          if (field instanceof BasicDBList) {
              BasicDBObject arrayElement = processArray(field);
              ...
              // emit(streamId, anchor, values) expects a List<Object>, so wrap the element in Values
              outputCollector.emit("documents", tuple, new Values(arrayElement));
          }
      }
  • 43. Parse documents with Bolt: {"ns": "people", "op": "i", o: {_id: 1, name: {first: 'John', last: 'Backus'}, birth: 'Dec 03, 1924'}} is parsed into ["ns": "people", "op": "i", ["_id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]
  • 44. Parse documents with Bolt
      @Override
      public void execute(Tuple tuple) {
          ...
          final BasicDBObject oplogObject = (BasicDBObject) tuple.getValueByField("document");
          final BasicDBObject document = (BasicDBObject) oplogObject.get("o");
          ...
          outputValues.add(flattenDocument(document));
          outputCollector.emit(tuple, outputValues);
      }
  • 45. Write to SQL with SQLWriter Bolt
  • 46. Write to SQL with SQLWriter Bolt: ["ns": "people", "op": "i", ["_id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]] produces
      insert into people (_id,name_first,name_last,birth) values (1,'John','Backus','Dec 03, 1924');
      insert into people_awards (_id,awards_award,awards_year,awards_by) values (1,'Turing Award',1977,'ACM');
      insert into people_awards (_id,awards_award,awards_year,awards_by) values (1,'National Medal of Science',1975,'National Science Foundation');
  • 47. Write to SQL with SQLWriter Bolt
      @Override
      public void prepare(...) {
          ...
          Class.forName("com.vertica.jdbc.Driver");
          con = DriverManager.getConnection(dBUrl, username, password);
      }
      @Override
      public void execute(Tuple tuple) {
          String insertStatement = createInsertStatement(tuple);
          // try-with-resources closes the Statement even if execute() throws
          try (Statement stmt = con.createStatement()) {
              stmt.execute(insertStatement);
          } catch (SQLException e) {
              ...
          }
      }
  • 48. Topology Definition
      TopologyBuilder builder = new TopologyBuilder();
      // define our spout
      builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
      builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5).shuffleGrouping(spoutId);
      builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt()).shuffleGrouping(arrayExtractorId, documentsStreamId);
      builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword)).shuffleGrouping(mongoDocParserId);
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("test", conf, builder.createTopology());
  • 49. Topology Definition (same code as slide 48)
  • 50. Topology Definition (same code as slide 48)
  • 51. Topology Definition
      TopologyBuilder builder = new TopologyBuilder();
      // define our spout
      builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
      builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5).shuffleGrouping(spoutId);
      builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt()).shuffleGrouping(arrayExtractorId, documentsStreamId);
      builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword)).shuffleGrouping(mongoDocParserId);
      StormSubmitter.submitTopology("OfflineEventProcess", conf, builder.createTopology());
  • 52. Lesson learned: By leveraging the MongoDB oplog (or another capped collection), a tailable cursor, and the Storm framework, you can build a fast, scalable, real-time data processing pipeline.
  • 53. Resources: Book: Getting Started with Storm; Storm Project wiki; Storm starter project; Storm contributions project; Running a Multi-Node Storm cluster tutorial; Implementing real-time trending topic; A Hadoop Alternative: Building a real-time data pipeline with Storm; Storm Use cases
  • 54. Resources (cont'd): Understanding the Parallelism of a Storm Topology; Trident: high-level Storm abstraction; A practical Storm's Trident API; Storm online forum; Mongo connector from 10gen Labs; MoSQL streaming Translator in Ruby; Project source code; New York City Storm Meetup
  • 55. Questions: Eugene Dvorkin, Architect, WebMD. edvorkin@webmd.net; Twitter: @edvorkin; LinkedIn: eugenedvorkin
  • 58. Next Sessions at 2:50. 5th Floor: West Side Ballroom 3&4: Data Modeling Examples from the Real World; West Side Ballroom 1&2: Growing Up MongoDB; Juilliard Complex: Business Track: MetLife Leapfrogs Insurance Industry with MongoDB-Powered Big Data Application; Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session. 7th Floor: Empire Complex: How We Fixed Our MongoDB Problems; SoHo Complex: High Performance, High Scale MongoDB on AWS: A Hands On Guide
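The spout and bolt abstractions on slides 12-17 can be illustrated without a Storm dependency. The sketch below is a minimal, JDK-only model of a linear topology: a spout emits sentence tuples, one bolt splits them into word tuples, and a second bolt aggregates counts (the word-count example slide 17 mentions). The names MiniTopology, sentenceSpout, splitBolt and CountBolt are hypothetical and are not Storm's API; in real Storm these roles would be IRichSpout/IRichBolt implementations wired together with a TopologyBuilder.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical, Storm-free model of a linear topology: spout -> split bolt -> count bolt.
// A "tuple" is an ordered List<Object>; each Consumer plays the role of a stream.
final class MiniTopology {

    // Spout: the source of the stream; here it simply replays a fixed list of sentences.
    static void sentenceSpout(List<String> sentences, Consumer<List<Object>> emit) {
        for (String sentence : sentences) {
            emit.accept(List.of(sentence));
        }
    }

    // Bolt 1: transforms each sentence tuple into one tuple per word (a new stream).
    static void splitBolt(List<Object> tuple, Consumer<List<Object>> emit) {
        String sentence = (String) tuple.get(0);
        for (String word : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(List.of(word));
            }
        }
    }

    // Bolt 2: aggregates a running count per word.
    static final class CountBolt {
        final Map<String, Integer> counts = new TreeMap<>();

        void execute(List<Object> tuple) {
            counts.merge((String) tuple.get(0), 1, Integer::sum);
        }
    }

    // Wires the components together and drains the (finite) stream.
    static Map<String, Integer> run(List<String> sentences) {
        CountBolt counter = new CountBolt();
        sentenceSpout(sentences, tuple -> splitBolt(tuple, counter::execute));
        return counter.counts;
    }
}
```

Unlike this single-threaded chain, Storm runs each spout and bolt as many parallel tasks and moves tuples between them over the network.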
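Slide 22 describes the oplog as a capped collection. As a rough in-memory analogy (the class CappedLog is hypothetical, not a MongoDB API), the sketch below keeps entries in insertion order and evicts the oldest one once a fixed capacity is reached. Real capped collections are bounded by size in bytes (optionally also by document count), so the count-based limit here is a simplification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical analogy of a capped collection: fixed capacity, insertion order preserved,
// oldest entries evicted when full. Capacity is a count here, not a byte size.
final class CappedLog<T> {
    private final int capacity;
    private final Deque<T> entries = new ArrayDeque<>();

    CappedLog(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void append(T entry) {
        if (entries.size() == capacity) {
            entries.removeFirst(); // drop the oldest entry to make room
        }
        entries.addLast(entry);
    }

    // Snapshot in insertion order, as a reader scanning the log would see it.
    List<T> snapshot() {
        return new ArrayList<>(entries);
    }
}
```

The eviction is why the spout's read position matters: if a reader falls further behind than the oplog window, the entries it has not yet read are overwritten and can no longer be tailed.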
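Slides 23-26 list the parts of an oplog entry the spout works with: the operation, the namespace, and the data object. The sketch below is a hypothetical, JDK-only model of such an entry (OplogEntry, sqlVerb and tableName are illustrative names, not driver API). It assumes ts can be treated as a single ordered long, whereas in MongoDB it is a BSON timestamp, and it ignores extra fields such as o2 on update entries.

```java
import java.util.Map;

// Hypothetical, simplified model of one entry from local.oplog.rs.
final class OplogEntry {
    final long ts;                 // ordering position (a BSON timestamp in MongoDB)
    final String op;               // "i" insert, "u" update, "d" delete ("c", "n" also occur)
    final String ns;               // namespace: "<database>.<collection>"
    final Map<String, Object> o;   // the data object

    OplogEntry(long ts, String op, String ns, Map<String, Object> o) {
        this.ts = ts;
        this.op = op;
        this.ns = ns;
        this.o = o;
    }

    // Translate the op code into the kind of SQL statement a writer would issue.
    String sqlVerb() {
        switch (op) {
            case "i": return "INSERT";
            case "u": return "UPDATE";
            case "d": return "DELETE";
            default:  return null; // commands ("c") and no-ops ("n") are not row changes
        }
    }

    // Database names cannot contain '.', so everything after the first '.' is the
    // collection name, which this sketch uses directly as the SQL table name.
    String tableName() {
        int dot = ns.indexOf('.');
        return dot < 0 ? ns : ns.substring(dot + 1);
    }
}
```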
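For the automatic discovery on slides 28-29, one known signal is that a mongos answers the isMaster command with msg: "isdbgrid". The shard list can then be read from the config.shards collection, whose host field has the form "replicaSetName/host:port,host:port". Because each shard replica set keeps its own oplog and mongos has none, a spout would presumably open one oplog reader per shard. The sketch below only parses that host string so it runs with the JDK alone; the class name ShardHost is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical parser for the "host" value of a config.shards document,
// e.g. "rs0/h1:27018,h2:27018" (replica-set name, '/', comma-separated seed list).
final class ShardHost {
    final String replicaSet;     // null when the string has no "name/" prefix
    final List<String> members;  // host:port seeds for that shard

    private ShardHost(String replicaSet, List<String> members) {
        this.replicaSet = replicaSet;
        this.members = members;
    }

    static ShardHost parse(String host) {
        int slash = host.indexOf('/');
        String rs = slash < 0 ? null : host.substring(0, slash);
        String seeds = host.substring(slash + 1); // slash == -1 yields the whole string
        return new ShardHost(rs, Collections.unmodifiableList(Arrays.asList(seeds.split(","))));
    }
}
```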
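Slides 31-34 describe tailing the oplog with a tailable cursor and recording the ts of processed entries so that a restart resumes where it left off. A real spout would query local.oplog.rs with a filter like {ts: {$gt: lastTs}} on a tailable (await-data) cursor; to stay runnable with the JDK alone, the sketch below replaces the oplog with a list of long timestamps and stores the checkpoint in a plain text file. TsCheckpoint and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checkpoint for the last processed oplog ts, persisted to a file.
final class TsCheckpoint {
    private final Path file;

    TsCheckpoint(Path file) {
        this.file = file;
    }

    // Last recorded ts, or Long.MIN_VALUE if nothing has been processed yet.
    long load() {
        try {
            if (!Files.exists(file)) return Long.MIN_VALUE;
            return Long.parseLong(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void save(long ts) {
        try {
            Files.write(file, Long.toString(ts).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Handle every entry newer than the checkpoint, saving progress after each one.
    List<Long> resume(List<Long> oplogTimestamps) {
        long last = load();
        List<Long> processed = new ArrayList<>();
        for (long ts : oplogTimestamps) {
            if (ts <= last) continue;  // already handled before the restart
            processed.add(ts);         // stand-in for emitting the entry as a tuple
            save(ts);
        }
        return processed;
    }
}
```

Because the checkpoint is saved after the entry is handed on, a crash between those two steps replays that entry on restart (at-least-once delivery), so the SQL side should tolerate duplicates. In Storm, one option is to advance the checkpoint only when the corresponding tuple is acked.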
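Slides 38-41 turn each element of an embedded array into its own document that carries the parent's _id, giving a one-to-many child table. The sketch below shows that step over plain Map/List values instead of BasicDBObject/BasicDBList so it runs with the JDK alone; ArrayExtractor, ChildDoc and the "<parent>_<field>" namespace rule (which matches the people_awards table on slide 46) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical array-extraction step: one child document per array element,
// each keyed by the parent's _id so the rows can be joined back in SQL.
final class ArrayExtractor {

    static final class ChildDoc {
        final String ns;               // target table, e.g. "people_awards"
        final Map<String, Object> o;   // child row, including the parent's _id

        ChildDoc(String ns, Map<String, Object> o) {
            this.ns = ns;
            this.o = o;
        }
    }

    static List<ChildDoc> extract(String parentNs, Map<String, Object> doc) {
        List<ChildDoc> children = new ArrayList<>();
        Object parentId = doc.get("_id");
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            if (!(field.getValue() instanceof List)) continue; // only arrays become child rows
            String childNs = parentNs + "_" + field.getKey();
            for (Object element : (List<?>) field.getValue()) {
                if (!(element instanceof Map)) continue; // this sketch handles arrays of sub-documents only
                Map<String, Object> child = new LinkedHashMap<>();
                child.put("_id", parentId); // foreign key back to the parent row
                for (Map.Entry<?, ?> e : ((Map<?, ?>) element).entrySet()) {
                    child.put(String.valueOf(e.getKey()), e.getValue());
                }
                children.add(new ChildDoc(childNs, child));
            }
        }
        return children;
    }
}
```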
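Slides 43-44 call a flattenDocument helper that turns nested sub-documents into flat columns such as name_first. Its body is not shown, so the sketch below is an assumed recursive version over plain Map values, joining keys with '_' as in the slide's output; DocumentFlattener is a hypothetical name, and arrays are skipped on the assumption that the array-extraction step handles them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattening step: {name: {first: "John"}} -> {name_first: "John"}.
final class DocumentFlattener {

    static Map<String, Object> flatten(Map<String, Object> document) {
        Map<String, Object> columns = new LinkedHashMap<>(); // keeps field order for column lists
        flattenInto("", document, columns);
        return columns;
    }

    private static void flattenInto(String prefix, Map<?, ?> doc, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : doc.entrySet()) {
            String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "_" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                flattenInto(key, (Map<?, ?>) value, out); // descend into the sub-document
            } else if (!(value instanceof List)) {
                out.put(key, value);                      // scalar becomes a column
            }
        }
    }
}
```

Distinct paths can collide (a top-level a_b and a nested a.b both become a_b); this sketch lets the later one win, so a production version may want to detect that case.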
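Slide 47 builds the INSERT text by concatenation (createInsertStatement) and runs it with a plain Statement. A commonly preferred alternative is a parameterized statement with bound values, which avoids quoting bugs and SQL injection. The sketch below uses only java.sql; SqlInsertWriter and its methods are hypothetical, and it assumes table and column names come from trusted, validated field names, since identifiers cannot be bound as parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collection;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical parameterized writer for one flattened row.
final class SqlInsertWriter {

    // ("people", [_id, name_first]) -> "INSERT INTO people (_id, name_first) VALUES (?, ?)"
    static String insertSql(String table, Collection<String> columns) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        for (String c : columns) {
            cols.add(c);
            marks.add("?");
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + marks;
    }

    // The Connection would be opened once (e.g. in the bolt's prepare()) and reused.
    static void write(Connection con, String table, Map<String, Object> row) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(insertSql(table, row.keySet()))) {
            int i = 1;
            for (Object value : row.values()) { // same iteration order as keySet()
                ps.setObject(i++, value);
            }
            ps.executeUpdate();
        }
    }
}
```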

Editor's Notes

  1. Leading source of health and medical information.
  2. Data is raw. Data is immutable, data is true. Dynamic personalized marketing campaigns.
  3. The oplog is a capped collection that lives in a database called local on every replicating node and records all changes to the data. Every time a client writes to the primary, an entry with enough information to reproduce the write is automatically added to the primary's oplog. Once the write is replicated to a given secondary, that secondary's oplog also stores a record of the write. Each oplog entry is identified with a BSON timestamp, and all secondaries use the timestamp to keep track of the latest entry they've applied.
  4. How do you know if you are connected to a sharded cluster?
  5. Use the MongoDB oplog as a queue.
  6. The spout implements a Storm spout interface.
  7. The awards array in the person document is converted into two documents, each with the _id of the parent document.
  8. The awards array is converted into two documents, each with the _id of the parent document. The namespace is used later to insert data into the correct table on the SQL side.
  9. In Java, an array is an instance of BasicDBList.
  10. Flatten out your document structure: use a loop or recursion to flatten it. Hopefully you don't have deeply nested documents, which go against MongoDB schema design guidelines.
  11. Use tick tuples and update in batches.
  12. Local mode vs prod mode
  13. Increasing parallelization of the bolt. Say you want 5 bolts to process your array because it is a more time-consuming operation, or you want more SQLWriterBolts because inserting data takes a long time: use the parallelism hint parameter in the bolt definition. Storm will create the corresponding number of executors (threads) to process your tuples.
  14. Local mode vs prod mode
  15. Local mode vs prod mode
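Speaker note 11 suggests using tick tuples and updating in batches. The sketch below shows the buffering logic on its own: rows are collected and flushed as one batch when the buffer is full or when a periodic tick arrives. In Storm the tick would be a system tick tuple enabled through Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS; here onTick() is simply called directly, and the flush action (for example a JDBC addBatch/executeBatch call) is passed in as a callback. BatchingBuffer is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical size-or-time batching buffer for a SQL-writing bolt.
final class BatchingBuffer<T> {
    private final int maxBatch;
    private final Consumer<List<T>> flushAction; // e.g. bind rows and executeBatch()
    private final List<T> pending = new ArrayList<>();

    BatchingBuffer(int maxBatch, Consumer<List<T>> flushAction) {
        if (maxBatch <= 0) throw new IllegalArgumentException("maxBatch must be positive");
        this.maxBatch = maxBatch;
        this.flushAction = flushAction;
    }

    // Called for every regular data tuple.
    void add(T row) {
        pending.add(row);
        if (pending.size() >= maxBatch) {
            flush(); // size-based flush
        }
    }

    // Called when a tick arrives, so a slow trickle of rows is not held indefinitely.
    void onTick() {
        flush();
    }

    private void flush() {
        if (pending.isEmpty()) return;
        flushAction.accept(new ArrayList<>(pending)); // hand over a copy of the batch
        pending.clear();
    }
}
```

If the bolt anchors and acks tuples, acking them only after their batch has been flushed keeps Storm's replay guarantee intact when a batch insert fails.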