@daniel_abadi
Yale University
* The Big Data phenomenon is the best thing that could have happened to the database community
* Despite other definitions related to the ‘3 Vs’, Big Data means BIG Data

* Which means we need scalable database systems

* Still two main components of Big Data:
  * Performing data analysis at scale
  * Performing requests on data at scale

* The database community has won the battle

* Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
* Just about every MapReduce vendor has abandoned this goal
* Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest
* Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution

* The database community is losing the battle

* NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
* No race to add DB technology. Why?

* Don’t blame CAP: CAP is only relevant when there’s a network partition
* We never figured out how to do ACID and active replication at scale
* Many new proposals make simplifying assumptions in order to handle scale

* It’s been 30 years. Why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?

Big Data and the Database Community
