SlideShare ist ein Scribd-Unternehmen logo
1 von 19
CAN THE ELEPHANTS HANDLE
THE NO-SQL ONSLAUGHT?


AUNG THU RHA HEIN
G5537871
OUTLINE
 Introduction
 Background
 Evaluation
    Traditional DSS Workload: Hive vs PDW
    Modern OLTP Workload: MongoDB vs SQL Server
 Discussion & Conclusion
INTRODUCTION

 Motivation
How does the performance and scalability of RDBMs solutions compare
to the NoSQL systems?
 Proposition
compare MongoDB(AS/CS) with SQL Server and Hive with SQL PWD,
and analyze the performance and scalability aspects on two workloads
(decision support analysis and interactive data-serving).
 Use YCSB and TPC-H DSS benchmarks respectively
BACKGROUND
 Parallel Data Warehouse (PDW)
    shared-nothing parallel database system built on top of SQL
     Server
    multiple compute nodes, a single control node and other
     administrative service nodes.
 Hive
    an open-source data warehouse built on top of Hadoop
    a structured data model for data that is stored in the Hadoop
     Distributed Filesystem (HDFS), and a SQL-like declarative query
     language called HiveQL
BACKGROUND(CONT.)
 MongoDB
  Features

   a document-oriented storage layer, indexing in the form of B-
    trees, auto-sharding, asynchronous replication of data between
    servers.
   Data stored in collections which contain documents
   Each document is serialized using BSON

  For implementation, it is created two types of MongoDB servers:
             MongoDB-CS (with client-side sharding )
             MongoDB-AS (Auto-Sharding)
EVALUATION
 Make hardware and software configuration for all four systems
 For PDW and Hive, use 8 disks to store the data
 For YCSB benchmark, 8 nodes are used as servers and another 8 for
  client-benchmarks
Hive and Hadoop
 Use RCFile format to store data
 All TPC-H tables are stored in Gzip RcCile format
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Workload Description
 use TPC-H at 4 scale factors (250,500,1000,4000,16000 GBs)
 TPC-H generator doesn’t produce correct result at 16000 scale
 Executed all 22 TPC-H queries
 But leave 2 TPC-H refresh functions
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Data Layout in
Hive and PDW
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Data Preparation and Load Times
Hive
 Generated dataset across 16 nodes
 Create one hive table for each TPC-H table
 Data is loaded in 2 phases:
    data files loaded onto each node
    data is converted from text to RCfile format.
PDW
 Load data into landed node
 Create necessary tables
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Performance Analysis
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Performance Analysis(cont.)
 PDW is faster than Hive in for all TPC-H queries
 The average speedup of PDW over Hive is greater for small datasets
     Hive has high overheads for small datasets.

Scalability Analysis
 Hive scales better than PDW
 Hive scales well as the dataset size increases.
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER
Workload description




Extends YCSB into 2 ways:
 added support for multiple instances on many database servers
 Supports for Stored procedures in YCSB JBDC driver
ran the YCSB benchmark on a database that consists of 640 million records
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER
Data Preparation
 Mongo-AS can automatically manage the shards by using a
  “balancer” process
 The loading time for SQL-CS and Mongo-CS was 146 and 45
   minutes respectively
 SQL load time take longer because a bulk insert method was not
  used
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER
Experimental Evaluation


“Read-Only” workload
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER

95% Read
5% Update Workload
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER

50% Read &
50% Update workload
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER

95% Read
5% Append Workload
DISCUSSION & CONCLUSION
 This evaluation shows that NoSQL systems are still behind RDBMS in
  performance.
 PDW is also 9 times faster than Hive running TPC-H at 16TB scale
   SQL-CS was able to achieve higher throughput than MongoDB
AUTHORS
 Avrilia Floratou
University of Wisconsin-Madison
 Nikhil Teletia
Microsoft Jim Gray Systems Lab
 David J. DeWitt
Microsoft Jim Gray Systems Lab
 Jignesh M. Patel
University of Wisconsin-Madison
 Donghui Zhang
Paradigm4

Weitere ähnliche Inhalte

Was ist angesagt?

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardMatthew Blair
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-Boyang Niu
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeMichael Stack
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudShubham Tagra
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMydbops
 
Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in DepthSyed Hadoop
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloudAvinash Ramineni
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudDatabricks
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSqlOmid Vahdaty
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow CacheAlluxio, Inc.
 
Hybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioHybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioThai Bui
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesScyllaDB
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation confShridhar Joshi
 

Was ist angesagt? (20)

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
 
RubiX
RubiXRubiX
RubiX
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
 
Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in Depth
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
 
Hybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioHybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxio
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation conf
 

Ähnlich wie Can the elephants handle the no sql onslaught

It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?Srihari Srinivasan
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Ahmed Rashwan
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataNicolas Poggi
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackAndrew Brust
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive Rupak Roy
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...Principled Technologies
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopContinuent
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemNETWAYS
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 

Ähnlich wie Can the elephants handle the no sql onslaught (20)

It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?
 
Hive
HiveHive
Hive
 
Apache drill
Apache drillApache drill
Apache drill
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
מיכאל
מיכאלמיכאל
מיכאל
 
Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Nosql Introduction, Basics
Nosql Introduction, BasicsNosql Introduction, Basics
Nosql Introduction, Basics
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage System
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 

Mehr von Aung Thu Rha Hein

Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists Aung Thu Rha Hein
 
Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)Aung Thu Rha Hein
 
Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)Aung Thu Rha Hein
 
Private Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic OpportunityPrivate Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic OpportunityAung Thu Rha Hein
 
Digital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeDigital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeAung Thu Rha Hein
 
Survey & Review of Digital Forensic
Survey & Review of Digital ForensicSurvey & Review of Digital Forensic
Survey & Review of Digital ForensicAung Thu Rha Hein
 
Partitioned Based Regression Verification
Partitioned Based Regression VerificationPartitioned Based Regression Verification
Partitioned Based Regression VerificationAung Thu Rha Hein
 
CRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web ApplicationsCRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web ApplicationsAung Thu Rha Hein
 
Web application security: Threats & Countermeasures
Web application security: Threats & CountermeasuresWeb application security: Threats & Countermeasures
Web application security: Threats & CountermeasuresAung Thu Rha Hein
 
Fuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentFuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentAung Thu Rha Hein
 

Mehr von Aung Thu Rha Hein (19)

Writing with ease
Writing with easeWriting with ease
Writing with ease
 
Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists
 
Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)
 
Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)
 
Private Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic OpportunityPrivate Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic Opportunity
 
Network switching
Network switchingNetwork switching
Network switching
 
Digital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeDigital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research Challenge
 
Survey & Review of Digital Forensic
Survey & Review of Digital ForensicSurvey & Review of Digital Forensic
Survey & Review of Digital Forensic
 
Partitioned Based Regression Verification
Partitioned Based Regression VerificationPartitioned Based Regression Verification
Partitioned Based Regression Verification
 
CRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web ApplicationsCRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web Applications
 
Botnets 101
Botnets 101Botnets 101
Botnets 101
 
Session initiation protocol
Session initiation protocolSession initiation protocol
Session initiation protocol
 
TPC-H in MongoDB
TPC-H in MongoDBTPC-H in MongoDB
TPC-H in MongoDB
 
Web application security: Threats & Countermeasures
Web application security: Threats & CountermeasuresWeb application security: Threats & Countermeasures
Web application security: Threats & Countermeasures
 
Cloud computing security
Cloud computing securityCloud computing security
Cloud computing security
 
Fuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentFuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessment
 
Link state routing protocol
Link state routing protocolLink state routing protocol
Link state routing protocol
 
Chat bot analysis
Chat bot analysisChat bot analysis
Chat bot analysis
 
Data mining & column stores
Data mining & column storesData mining & column stores
Data mining & column stores
 

Kürzlich hochgeladen

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Kürzlich hochgeladen (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Can the elephants handle the no sql onslaught

  • 1. CAN THE ELEPHANTS HANDLE THE NO-SQL ONSLAUGHT? AUNG THU RHA HEIN G5537871
  • 2. OUTLINE  Introduction  Background  Evaluation  Traditional DSS Workload: Hive vs PDW  Modern OLTP Workload: MongoDB vs SQL Server  Discussion & Conclusion
  • 3. INTRODUCTION  Motivation How does the performance and scalability of RDBMs solutions compare to the NoSQL systems?  Proposition compare MongoDB(AS/CS) with SQL Server and Hive with SQL PWD, and analyze the performance and scalability aspects on two workloads (decision support analysis and interactive data-serving).  Use YCSB and TPC-H DSS benchmarks respectively
  • 4. BACKGROUND  Parallel Data Warehouse (PDW)  shared-nothing parallel database system built on top of SQL Server  multiple compute nodes, a single control node and other administrative service nodes.  Hive  an open-source data warehouse built on top of Hadoop  a structured data model for data that is stored in the Hadoop Distributed Filesystem (HDFS), and a SQL-like declarative query language called HiveQL
  • 5. BACKGROUND(CONT.)  MongoDB Features  a document-oriented storage layer, indexing in the form of B- trees, auto-sharding, asynchronous replication of data between servers.  Data stored in collections which contain documents  Each document is serialized using BSON For implementation, it is created two types of MongoDB servers:  MongoDB-CS (with client-side sharding )  MongoDB-AS (Auto-Sharding)
  • 6. EVALUATION  Make hardware and software configuration for all four systems  For PDW and Hive, use 8 disks to store the data  For YCSB benchmark, 8 nodes are used as servers and another 8 for client-benchmarks Hive and Hadoop  Use RCFile format to store data  All TPC-H tables are stored in Gzip RcCile format
  • 7. TRADITIONAL DSS WORKLOAD: HIVE VS PDW Workload Description  use TPC-H at 4 scale factors (250,500,1000,4000,16000 GBs)  TPC-H generator doesn’t produce correct result at 16000 scale  Executed all 22 TPC-H queries  But leave 2 TPC-H refresh functions
  • 8. TRADITIONAL DSS WORKLOAD: HIVE VS PDW Data Layout in Hive and PDW
  • 9. TRADITIONAL DSS WORKLOAD: HIVE VS PDW Data Preparation and Load Times Hive  Generated dataset across 16 nodes  Create one hive table for each TPC-H table  Data is loaded in 2 phases:  data files loaded onto each node  data is converted from text to RCfile format. PDW  Load data into landed node  Create necessary tables
  • 10. TRADITIONAL DSS WORKLOAD: HIVE VS PDW Performance Analysis
  • 11. TRADITIONAL DSS WORKLOAD: HIVE VS PDW Performance Analysis(cont.)  PDW is faster than Hive in for all TPC-H queries  The average speedup of PDW over Hive is greater for small datasets  Hive has high overheads for small datasets. Scalability Analysis  Hive scales better than PDW  Hive scales well as the dataset size increases.
  • 12. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER Workload description Extends YCSB into 2 ways:  added support for multiple instances on many database servers  Supports for Stored procedures in YCSB JBDC driver ran the YCSB benchmark on a database that consists of 640 million records
  • 13. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER Data Preparation  Mongo-AS can automatically manage the shards by using a “balancer” process  The loading time for SQL-CS and Mongo-CS was 146 and 45 minutes respectively  SQL load time take longer because a bulk insert method was not used
  • 14. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER Experimental Evaluation “Read-Only” workload
  • 15. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER 95% Read 5% Update Workload
  • 16. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER 50% Read & 50% Update workload
  • 17. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER 95% Read 5% Append Workload
  • 18. DISCUSSION & CONCLUSION  This evaluation shows that NoSQL systems are still behind RDBMS in performance.  PDW is also 9 times faster than Hive running TPC-H at 16TB scale  SQL-CS was able to achieve higher throughput than MongoDB
  • 19. AUTHORS  Avrilia Floratou University of Wisconsin-Madison  Nikhil Teletia Microsoft Jim Gray Systems Lab  David J. DeWitt Microsoft Jim Gray Systems Lab  Jignesh M. Patel University of Wisconsin-Madison  Donghui Zhang Paradigm4