©Continuent 2014
Getting Serious about
MySQL and Hadoop at
Continuent
Robert Hodges, CEO

Why should MySQL users care about Hadoop?

What is a Hadoop?
Hadoop Distributed File System (HDFS)
MapReduce
Spark
Hive
Storm
Pig
Shark
Mahout
HBase
Oozie
Avro
HCatalog
Scalding
Stinger
Impala
Sqoop
Ambari
Cassandra
Zookeeper

With this much funding it must be good
(Sources: ZDNet, jaxenter.com, forbes.com, 451 Group)

Hadoop analyzes any type of data
Server logs
Social media feeds
Geolocation data
Clickstreams
Sensor readings
Business transactions
Analytic reports

Hadoop data loading is simple

mysql> select * into
    -> outfile '/tmp/sakila.rental.csv'
    -> fields terminated by ','
    -> lines terminated by '\n'
    -> from sakila.rental;
Query OK, 16044 rows affected (0.03 sec)

mysql> quit
Bye
$ hadoop fs -put /tmp/sakila.rental.csv

Hadoop exploits downward cost of storing and processing data
[Chart: Disk Storage -- Average Cost Per Gigabyte; cost axis from $0.01 to $10,000.00 in powers of ten; time axis 1990 to 2014]
(Source: John McCallum, http://www.jcmit.com)

Hadoop is shifting from batch to real-time analytics
[Chart: Cycle time for different iterative algorithms (Page Rank, K-Means Clustering, Logistic Regression); series: Core Hadoop, Spark; axis 0 to 160; plotted values: 0.96, 4.1, 14, 110, 155, 80]
(Source: Pat McDonough, http://spark-summit.org/2013)

Hadoop is becoming the way that users "see" data

What does it mean to integrate with Hadoop?

Three integration problems
1. Continuous, high-performance loading
2. Meaningful analytics on Hadoop
3. Optimized operation for large-scale deployment

Thesis: Snapshots
Data volumes?
System load?
Latency?
Change history?
Dump/load

MySQL does not do it that way...
Binlog
Replication

Antithesis: Real-time replication
Raw files?
Overwrite/append?
Replication
Binlog

Synthesis: Snapshots + real-time replication
[Diagram labels: Dump/load, Binlog, Replication, Buffered Transactions, CSV Files]

We can implement that!
[Diagram labels: MySQL (binlog_format=row), MySQL Binlog, Tungsten 3.0 Master, Tungsten 3.0 Slave, hadoop, CSV Files, Apache Sqoop/ETL]
Fast data filtering
Buffered CSV
Programmable load scripts
Parallel apply
Parallel table dumps
Low impact replication from the binlog

How do you like your data?
(Your data stored in MySQL)
+---------+--------------------+-------------+--------+
| film_id | title              | rental_rate | length |
+---------+--------------------+-------------+--------+
|     556 | MALTESE HOPE       |        4.99 |    127 |
|     557 | MANCHURIAN CURTAIN |        2.99 |    177 |
|     558 | MANNEQUIN WORST    |        2.99 |     71 |
|     559 | MARRIED GO         |        2.99 |    114 |
+---------+--------------------+-------------+--------+

Does it really look better like this?
(Your data stored in Hadoop)
556,MALTESE HOPE,4.99,127\n
557,MANCHURIAN CURTAIN,3.99,177\n
558,MANNEQUIN WORST,2.99,71\n
559,MARRIED GO,2.99,114\n
Things to decide:
field separator
record separator
file partitioning
compression
type conversions
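
The choices listed on this slide (field separator, record separator, type conversions) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the names `to_csv_field` and `to_csv_line` are hypothetical, `\N` is assumed as the NULL marker (Hive's default for delimited text files), and the backslash-escaping rule is one possible convention, not necessarily what any particular loader does.

```python
from datetime import datetime
from decimal import Decimal

FIELD_SEP = ","      # field separator
RECORD_SEP = "\n"    # record separator
NULL_MARKER = "\\N"  # assumed NULL representation (backslash + N)


def to_csv_field(value):
    """Convert one Python value to its text form for a delimited file."""
    if value is None:
        return NULL_MARKER
    if isinstance(value, datetime):
        # Assumed convention: timestamps are written as 'YYYY-MM-DD HH:MM:SS'.
        return value.strftime("%Y-%m-%d %H:%M:%S")
    text = str(value)
    # Escape the backslash first, then anything that collides with a separator.
    text = text.replace("\\", "\\\\")
    text = text.replace(FIELD_SEP, "\\" + FIELD_SEP)
    text = text.replace("\n", "\\n")
    return text


def to_csv_line(row):
    """Join the converted fields and terminate the record."""
    return FIELD_SEP.join(to_csv_field(v) for v in row) + RECORD_SEP
```

A row such as `(556, "MALTESE HOPE", Decimal("4.99"), 127)` comes out in the same shape as the sample lines on the slide, while a title containing a comma would be escaped rather than silently splitting into two fields.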

Or this?
(INSERT)
I,57,556,2014-03-27 21:04:24.000,556,MALTESE HOPE,4.99,127\n
(UPDATE)
D,57,557,2014-03-27 21:04:24.000,557,\N,\N,\N\n
I,57,558,2014-03-27 21:04:24.000,557,MANCHURIAN CURTAIN,2.99,177\n
(DELETE)
D,57,559,2014-03-27 21:04:24.000,558,\N,\N,\N\n
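
One way to read the rows above: each line starts with an operation code (I or D), then what appear to be a transaction sequence number, a row number, and a commit timestamp, followed by the table columns. An UPDATE shows up as a delete of the key followed by an insert of the new values. The sketch below reproduces that layout. The column interpretation is inferred from the sample rows, and the function names are hypothetical.

```python
NULL = "\\N"  # NULL marker used for the non-key columns of a delete row


def _line(opcode, seqno, row_id, commit_ts, key, values):
    """Format one change-log record: metadata columns, key, then the remaining columns."""
    fields = [opcode, str(seqno), str(row_id), commit_ts, str(key)]
    fields += [NULL if v is None else str(v) for v in values]
    return ",".join(fields) + "\n"


def encode_change(kind, seqno, row_id, commit_ts, key, values):
    """Return the change-log lines for one row change.

    kind   -- "insert", "update" or "delete"
    values -- the non-key column values; for a delete only the count is used
    """
    blanks = [None] * len(values)
    if kind == "insert":
        return [_line("I", seqno, row_id, commit_ts, key, values)]
    if kind == "delete":
        return [_line("D", seqno, row_id, commit_ts, key, blanks)]
    if kind == "update":
        # An update is logged as delete-then-insert, using two consecutive row numbers.
        return [
            _line("D", seqno, row_id, commit_ts, key, blanks),
            _line("I", seqno, row_id + 1, commit_ts, key, values),
        ]
    raise ValueError("unknown change kind: %r" % (kind,))
```

Calling `encode_change("update", 57, 557, "2014-03-27 21:04:24.000", 557, ["MANCHURIAN CURTAIN", "2.99", 177])` yields the two UPDATE lines shown on the slide.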

One more thing to replicate...
[Diagram labels: Dump/load, Binlog, Replication, Buffered Transactions, CSV Files, Table metadata]
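
Replicating table metadata is what lets Hive put column names and types on top of raw CSV files. As a rough sketch of the idea (not the actual Continuent tooling), the Python below turns a list of MySQL column definitions into a Hive external-table statement. The type mapping is an assumption and deliberately incomplete, unknown types fall back to STRING, and the HDFS location is a hypothetical path.

```python
import re

# Assumed, incomplete MySQL -> Hive type mapping; anything else becomes STRING.
TYPE_MAP = {
    "tinyint": "TINYINT",
    "smallint": "SMALLINT",
    "int": "INT",
    "bigint": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "datetime": "TIMESTAMP",
    "timestamp": "TIMESTAMP",
}

_TYPE_RE = re.compile(r"([a-z]+)(\(\s*\d+\s*(?:,\s*\d+\s*)?\))?")


def hive_type(mysql_type):
    """Map a MySQL column type such as 'decimal(4,2)' to a Hive type name."""
    match = _TYPE_RE.match(mysql_type.strip().lower())
    if match is None:
        raise ValueError("unrecognized type: %r" % (mysql_type,))
    base, args = match.group(1), match.group(2) or ""
    if base == "decimal":
        # Keep precision and scale; assumes a Hive version that accepts them.
        return "DECIMAL" + args.replace(" ", "")
    return TYPE_MAP.get(base, "STRING")


def hive_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement over comma-delimited text files."""
    cols = ",\n".join("  %s %s" % (name, hive_type(t)) for name, t in columns)
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS %s.%s (\n" % (database, table)
        + cols + "\n"
        + ")\n"
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        + "LINES TERMINATED BY '\\n'\n"
        + "STORED AS TEXTFILE\n"
        + "LOCATION '%s';" % (location,)
    )


# Hypothetical column list and HDFS path, for illustration only.
FILM_COLUMNS = [
    ("film_id", "smallint"),
    ("title", "varchar(255)"),
    ("rental_rate", "decimal(4,2)"),
    ("length", "smallint"),
]
FILM_DDL = hive_ddl("sakila", "film", FILM_COLUMNS, "/user/example/staging/sakila/film")
```

The mapping ignores details such as unsigned integers and character sets, which a real schema-generation tool would have to handle.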

A more civilized view of data
(Your data viewed through Hive)
556  MALTESE HOPE        4.99  127
557  MANCHURIAN CURTAIN  3.99  177
558  MANNEQUIN WORST     2.99   71
559  MARRIED GO          2.99  114

Are we done yet?
Transaction logs Snapshot
????

Introducing a useful MapReduce trick...
Inputs: Transaction logs, Snapshot
MAP: UNION ALL
SHUFFLE: Sort by key(s), transaction order
REDUCE: Emit last row per key if not a delete
Output: Materialized view including all updates
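
The trick on this slide can be sketched in a few lines of plain Python, with no Hadoop involved, to show the logic of each phase. This is a minimal illustration under assumptions: snapshot rows are treated as inserts that precede every logged change, the tuple layout `(opcode, seqno, row_id, key, values)` is hypothetical, and keys are assumed to be sortable values such as integers.

```python
from itertools import groupby
from operator import itemgetter


def materialize(snapshot_rows, change_rows):
    """Merge a snapshot with a change log into a current view.

    snapshot_rows -- iterable of (key, values)
    change_rows   -- iterable of (opcode, seqno, row_id, key, values),
                     where opcode is "I" or "D" and seqno is non-negative
    Returns a list of (key, values) sorted by key.
    """
    # MAP (UNION ALL): tag snapshot rows as inserts that sort before any change.
    records = [("I", -1, -1, key, values) for key, values in snapshot_rows]
    records.extend(change_rows)

    # SHUFFLE: sort by key, then by transaction order (seqno, row_id).
    records.sort(key=lambda r: (r[3], r[1], r[2]))

    # REDUCE: keep the last record per key unless that record is a delete.
    view = []
    for key, group in groupby(records, key=itemgetter(3)):
        last = list(group)[-1]
        if last[0] != "D":
            view.append((key, last[4]))
    return view
```

Because the view is recomputed from the snapshot plus the full log, the log files themselves are only ever appended to. Nothing is updated in place, which is what makes the properties on the next slide possible.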

...With some amazing properties
[Diagram labels: Apache Sqoop, Tungsten Replication, Buffered CSV Files, CSV Files, Table metadata]
No replication failures due to consistency
Reconstruct consistent views at will
No locks
No transactions
No need to pause processing
Reprovision any table at will

We can implement that too!!
Continuent Hadoop Tools
https://github.com/continuent/continuent-tools-hadoop
Schema creation
Materialized view generation
Data comparison
Apache 2.0 licensing

Optimizing large scale deployments
[Diagram labels: Replicator (four instances); m1, m2, m3 (master); m1, m2, m3 (slave); RBR (three links)]

Where we want to be
[Diagram labels: Binlog, Buffered Transactions, CSV Files, Single path loading]

Tungsten 3.0 Roadmap for Hadoop
Q1 2014 Q2 2014
Features
• Parallel extractor
• Polished MapReduce tools
• Improved schema change handling
• Binary data conversion
• HortonWorks 2.0
Features
• Scripted load
• Better block commit
• Hive CSV format
• Hive DDL generation
• Partitioned files
• Auto-recovery
• Parallel batch apply
• Sqoop integration
• Cloudera 4.x/5.0

How can we prepare for Hadoop integration?

Users can prepare...
• Use Unicode/UTF8
• Standardize on UTC for time
• Enable row replication
• Cluster your data in a way that supports restarts

MySQL can prepare...
By being MySQL

The MySQL community can prepare...
• Fast heterogeneous replication and loading
• Innovative projects to make relational data easy to consume on Hadoop
• Competing solutions that improve life for users

Conclusion
• Hadoop is for real and the MySQL community needs to adapt
• The challenge is to move data to Hadoop and make it easy to integrate into analytics
• MySQL can be *the* preferred RDBMS to use with Hadoop

Thanks to our many customers

Wed 2:20pm Ballroom B - Hadoop for MySQL People
Thurs 1pm Ballroom D - From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication
We’re Hiring!
http://www.continuent.com

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Simplilearn
 

Was ist angesagt? (20)

Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Pptx present
Pptx presentPptx present
Pptx present
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 

Andere mochten auch

Andere mochten auch (6)

OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco Vasquez
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
Hadoop to spark_v2
Hadoop to spark_v2Hadoop to spark_v2
Hadoop to spark_v2
 

Ähnlich wie Keynote: Getting Serious about MySQL and Hadoop at Continuent

Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
 

Ähnlich wie Keynote: Getting Serious about MySQL and Hadoop at Continuent (20)

Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
 
Set Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into HadoopSet Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into Hadoop
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platform
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applications
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
 
Manuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4octManuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4oct
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
Scylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi Kivity
 

Mehr von Continuent

Continuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition Webinar
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Continuent
 
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Continuent
 
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Continuent
 
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Continuent
 

Mehr von Continuent (20)

Tungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondTungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and Beyond
 
Continuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition Webinar
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS AuroraWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
 
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
 
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
 
Webinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent FailoverWebinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
 
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
 
Training Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten DashboardTraining Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten Dashboard
 
Training Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & KafkaTraining Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & Kafka
 
Training Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data WarehousesTraining Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data Warehouses
 
Training Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a ClusterTraining Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a Cluster
 
Training Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMITraining Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMI
 
Training Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMITraining Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMI
 
Training Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProTraining Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a Pro
 
Training Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & TroubleshootingTraining Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & Troubleshooting
 
Training Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSLTraining Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSL
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Keynote: Getting Serious about MySQL and Hadoop at Continuent

  • 1. ©Continuent 2014 Getting Serious about MySQL and Hadoop at Continuent Robert Hodges, CEO
  • 2. ©Continuent 2014 Why should MySQL users care about Hadoop? 2
  • 3. ©Continuent 2014 What is a Hadoop? 3 Hadoop Distributed File System (HDFS) MapReduce Spark Hive Storm Pig Shark Mahout HBase Oozie Avro HCatalog Scalding Stinger Impala Sqoop Ambari Cassandra Zookeeper
  • 4. ©Continuent 2014 With this much funding it must be good 4 (ZDNet) (jaxenter.com) (forbes.com) (451 Group)
  • 5. ©Continuent 2014 Hadoop analyzes any type of data 5 Server Logs Social media feeds Geolocation data Clickstreams Sensor readings Business transactions Analytic reports
  • 6. ©Continuent 2014 Hadoop data loading is simple ! mysql> select * into -> outfile '/tmp/sakila.rental.csv' -> fields terminated by ',' -> lines terminated by 'n' -> from sakila.rental; Query OK, 16044 rows affected (0.03 sec) ! mysql> quit Bye $ hadoop fs -put /tmp/sakila.rental.csv 6
  • 7. ©Continuent 2014 Hadoop exploits downward cost of storing and processing data 7 Disk Storage -- Average Cost Per Gigabyte $0.01 $0.10 $1.00 $10.00 $100.00 $1,000.00 $10,000.00 1990 1993 1996 1999 2002 2005 2008 2011 2014 (Source: John McCallum, http://www.jcmit.com)
  • 8. ©Continuent 2014 Hadoop is shifting from batch to real- time analytics 8 Cycle time for different iterative algorithms Page Rank K-Means Clustering Logistic Regression 0 40 80 120 160 0.96 4.1 14 110 155 80 Core Hadoop Spark (Source: Pat McDonough, http://spark-summit.org/2013)
  • 9. ©Continuent 2014 Hadoop is becoming the way that users œš‘“›⁸see’”⁹ data 9
  • 10. ©Continuent 2014 What does it mean to integrate with Hadoop? 10
  • 11. ©Continuent 2014 Three integration problems 11 1.Continuous, high-performance loading 2.Meaningful analytics on Hadoop 3.Optimized operation for large-scale deployment
  • 12. ©Continuent 2014 Thesis: Snapshots 12 Data volumes? System load? Latency? Change history? Dump/load
  • 13. ©Continuent 2014 MySQL does not do it that way... 13 Binlog Replication
  • 14. ©Continuent 2014 Antithesis: Real-time replication 14 Raw files? Overwrite/append? Replication Binlog
  • 15. ©Continuent 2014 Synthesis: Snapshots + real-time replication 15 Replication CSV Files CSV Files Buffered Transactions Binlog Dump/load
  • 16. ©Continuent 2014 We can implement that! 16 MySQL binlog_format=row MySQL Binlog Tungsten 3.0 Master hadoop Tungsten 3.0 Slave hadoop CSV Files CSV Files CSV Files CSV FilesCSV Apache Sqoop/ETL Fast data filtering Buffered CSV Programmable load scripts Parallel apply Parallel table dumps Low impact replication from the binlog
  • 17. ©Continuent 2014 How do you like your data? (Your data stored in MySQL) 17
    +---------+--------------------+-------------+--------+
    | film_id | title              | rental_rate | length |
    +---------+--------------------+-------------+--------+
    |     556 | MALTESE HOPE       |        4.99 |    127 |
    |     557 | MANCHURIAN CURTAIN |        2.99 |    177 |
    |     558 | MANNEQUIN WORST    |        2.99 |     71 |
    |     559 | MARRIED GO         |        2.99 |    114 |
    +---------+--------------------+-------------+--------+
  • 18. ©Continuent 2014 Does it really look better like this? (Your data stored in Hadoop) 18
    556,MALTESE HOPE,4.99,127\n
    557,MANCHURIAN CURTAIN,3.99,177\n
    558,MANNEQUIN WORST,2.99,71\n
    559,MARRIED GO,2.99,114\n
    Callouts: field separator, file partitioning, record separator, compression, type conversions
  • 19. ©Continuent 2014 Or this? 19
    (INSERT)
    I,57,556,2014-03-27 21:04:24.000,556,MALTESE HOPE,4.99,127\n
    (UPDATE)
    D,57,557,2014-03-27 21:04:24.000,557,\N,\N,\N\n
    I,57,558,2014-03-27 21:04:24.000,557,MANCHURIAN CURTAIN,2.99,177\n
    (DELETE)
    D,57,559,2014-03-27 21:04:24.000,558,\N,\N,\N\n
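A minimal sketch of reading change rows like those on slide 19, in Python. The field meanings are assumptions inferred from the sample, not something the slide states: an opcode (`I` or `D`), a transaction sequence number, a row counter, a commit timestamp, then the table's own columns. It also assumes NULL columns are written as `\N`, the usual MySQL export marker. `ChangeRow` and `parse_change_rows` are hypothetical names used only for illustration.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class ChangeRow(NamedTuple):
    opcode: str                   # "I" = insert, "D" = delete (an update is a D plus an I)
    seqno: int                    # assumed: transaction sequence number
    row_id: int                   # assumed: row counter within the load
    commit_time: str              # commit timestamp, kept as text
    values: List[Optional[str]]   # table columns; None where the file has \N


def parse_change_rows(text: str) -> List[ChangeRow]:
    """Parse comma-separated change rows into ChangeRow records."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec:
            continue  # skip blank lines
        opcode, seqno, row_id, commit_time, *cols = rec
        # Map the \N marker to None so deletes carry only the key column.
        values = [None if c == r"\N" else c for c in cols]
        rows.append(ChangeRow(opcode, int(seqno), int(row_id), commit_time, values))
    return rows


sample = (
    "I,57,556,2014-03-27 21:04:24.000,556,MALTESE HOPE,4.99,127\n"
    "D,57,557,2014-03-27 21:04:24.000,557,\\N,\\N,\\N\n"
    "I,57,558,2014-03-27 21:04:24.000,557,MANCHURIAN CURTAIN,2.99,177\n"
    "D,57,559,2014-03-27 21:04:24.000,558,\\N,\\N,\\N\n"
)
rows = parse_change_rows(sample)
```

Every column value stays a string here; converting `rental_rate` or `length` to numeric types is the "type conversions" concern from slide 18 and is left out of this sketch.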
  • 20. ©Continuent 2014 One more thing to replicate... 20 Dump/load Replication CSV Files CSV Files Buffered Transactions Binlog Table metadata
  • 21. ©Continuent 2014 A more civilized view of data (Your data viewed through Hive) 21
    556  MALTESE HOPE        4.99  127
    557  MANCHURIAN CURTAIN  3.99  177
    558  MANNEQUIN WORST     2.99  71
    559  MARRIED GO          2.99  114
  • 22. ©Continuent 2014 Are we done yet? 22 Transaction logs Snapshot ????
  • 23. ©Continuent 2014 Introducing a useful MapReduce trick... 23 MAP: UNION ALL of transaction logs and snapshot. SHUFFLE: sort by key(s), transaction order. REDUCE: emit last row per key if not a delete. Result: materialized view including all updates
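The trick on slide 23 can be sketched in plain Python, with a list standing in for Hadoop's shuffle. This is an illustration under stated assumptions, not the Continuent implementation: snapshot rows are given sequence number 0 so they sort before any logged change, the first column is taken to be the primary key, and the sample data (including the insert of film 559 at sequence number 58) is invented for the example. `materialize` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def materialize(snapshot, changes):
    """Rebuild the current table from a snapshot plus change rows.

    snapshot: iterable of row tuples whose first element is the key.
    changes:  iterable of (opcode, seqno, row_id, row) with opcode "I" or "D".
    """
    # MAP: UNION ALL of both inputs, tagged as (key, seqno, row_id, opcode, row).
    # Assumption: snapshot rows get seqno 0 so they sort before every change.
    unioned = [(row[0], 0, 0, "I", row) for row in snapshot]
    unioned += [(row[0], seqno, row_id, op, row) for op, seqno, row_id, row in changes]

    # SHUFFLE: sort by key, then by transaction order.
    unioned.sort(key=itemgetter(0, 1, 2))

    # REDUCE: keep the last row per key unless that row is a delete.
    view = []
    for _key, group in groupby(unioned, key=itemgetter(0)):
        *_, last = group
        if last[3] != "D":
            view.append(last[4])
    return view


# Illustrative data only.
snapshot = [
    (556, "MALTESE HOPE", 4.99, 127),
    (557, "MANCHURIAN CURTAIN", 3.99, 177),
    (558, "MANNEQUIN WORST", 2.99, 71),
]
changes = [
    ("D", 57, 557, (557, None, None, None)),               # update = delete ...
    ("I", 57, 558, (557, "MANCHURIAN CURTAIN", 2.99, 177)),  # ... plus insert
    ("D", 57, 559, (558, None, None, None)),               # plain delete
    ("I", 58, 560, (559, "MARRIED GO", 2.99, 114)),        # plain insert
]
view = materialize(snapshot, changes)
```

Because the reduce step only looks at the last row per key, rerunning it over the same inputs gives the same view, which is why the following slide can claim that consistent views can be reconstructed at will.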
  • 24. ©Continuent 2014 ...With some amazing properties 24 Apache Sqoop Tungsten Replication CSV Files CSV Files Buffered CSV Files No replication failures due to consistency Reconstruct consistent views at will No locks No transactions No need to pause processing Reprovision any table at will Table metadata
  • 25. ©Continuent 2014 We can implement that too!! 25 https://github.com/continuent/continuent-tools-hadoop Continuent Hadoop Tools Schema creation Materialized view generation Data comparison Apache 2.0 licensing
  • 26. ©Continuent 2014 Optimizing large scale deployments 26 Replicator m1 (slave) m2 (slave) m3 (slave) Replicator m1 (master) m2 (master) m3 (master) Replicator Replicator RBR RBR RBR
  • 27. ©Continuent 2014 Where we want to be 27 Single path loading CSV Files CSV Files Buffered Transactions Binlog
  • 28. ©Continuent 2014 Where we want to be 28 Single path loading CSV Files CSV Files Buffered Transactions Binlog
  • 29. ©Continuent 2014 Tungsten 3.0 Roadmap for Hadoop 29 Q1 2014 Q2 2014 Features • Parallel extractor • Polished MapReduce tools • Improved schema change handling • Binary data conversion • HortonWorks 2.0 Features • Scripted load • Better block commit • Hive CSV format • Hive DDL generation • Partitioned files • Auto-recovery • Parallel batch apply • Sqoop integration • Cloudera 4.x/5.0
  • 30. ©Continuent 2014 How can we prepare for Hadoop integration? 30
  • 31. ©Continuent 2014 Users can prepare... • Use Unicode/UTF8 • Standardize on UTC for time • Enable row replication • Cluster your data in a way that supports restarts 31
  • 32. ©Continuent 2014 MySQL can prepare... 32 By being MySQL
  • 33. ©Continuent 2014 The MySQL community can prepare... • Fast heterogeneous replication and loading • Innovative projects to make relational data easy to consume on Hadoop • Competing solutions that improve life for users 33
  • 34. ©Continuent 2014 Conclusion • Hadoop is for real and the MySQL community needs to adapt • The challenge is to move data to Hadoop and make it easy to integrate into analytics • MySQL can be *the* preferred RDBMS to use with Hadoop 34
  • 35. ©Continuent 2014 Thanks to our many customers 35 23
  • 36. ©Continuent 2014 Wed 2:20pm Ballroom B - Hadoop for MySQL People; Thurs 1pm Ballroom D - From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication. We’re Hiring! http://www.continuent.com