Big Data Strategy
for the Relational World
Embracing Disruption, Avoiding Regression
Andrew J. Brust
Founder & CEO, Blue Badge Insights
Big Data correspondent, ZDNet
Big Data Analyst, GigaOM Research
Bio
• CEO and Founder, Blue Badge Insights
• Big Data blogger for ZDNet
• Microsoft Regional Director, MVP
• Co-chair, Visual Studio Live! and 18 years as a speaker
• Founder, Microsoft BI User Group of NYC
– http://www.msbinyc.com
• Co-moderator, NYC .NET Developers Group
– http://www.nycdotnetdev.com
• “Redmond Review” columnist for
Visual Studio Magazine and Redmond Developer News
• Twitter: @andrewbrust
Andrew on ZDNet (bit.ly/bigondata)
Read all about it!
Big Data: Why Should You Care?
• Because analytics (i.e. BI) has always been
important, but it was expensive and obscure
• Because the economics of processing and
storage make Big Data feasible
Big Data: Why Should You Be Cautious?
• Too many vendors; too much churn
• Designed for the lab, not for mainstream
business
• Immature technology and tooling
– Results in serious recruiting and dev costs
• So, you can’t ignore Big Data, but you can’t
just pursue it with abandon, either.
– That’s hard!
Agenda
• Trends
• Technologies
– NoSQL
– Hadoop
– SQL Convergence
– NewSQL
– In-Memory
• Forecasts
• Risks
• Recommendations
Database Trends
• NoSQL: Mongo and Cassandra, primarily
• Late-bound schema: aka “unstructured data”
• File-based table handling: especially HDFS
• Columnar storage: and Massively Parallel Processing
• Co-existence with RDBMS, OLAP databases: very few throwing them away
• Little change in tools/clients: still expect tables or cubes
NoSQL
Key-Value Store
• Couchbase
• Riak
• Redis
• Voldemort
• DynamoDB
• Azure tables
Document Store
• MongoDB
• CouchDB
• Cloudant
• Couchbase
Wide Column Store
• HBase
• Cassandra
Graph Database
• Neo4J
Consistency
• CAP Theorem
– Databases may only excel at two of the following
three attributes: consistency, availability and partition
tolerance
• NoSQL does not offer “ACID” guarantees
– Atomicity, consistency, isolation and durability
• Instead offers “eventual consistency”
– Similar to DNS propagation
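To make “eventual consistency” concrete, here is a minimal sketch in Python. It is a toy model, not how any particular NoSQL product replicates data: the `Replica` and `EventuallyConsistentStore` classes, the replica names and the explicit `propagate()` step are all assumptions made for illustration. A write lands on one replica first, so a read from another replica can return stale data until the update has propagated, much as a DNS change takes time to reach every resolver.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the key-value data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and is queued for the others."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # (target replica, key, value) not yet applied

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value
        for replica in others:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        """Deliver every queued update to its target replica."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("price", 10)
stale = store.read(2, "price")  # replica "c" has not seen the write yet
store.propagate()
fresh = store.read(2, "price")  # after propagation all replicas agree
```

An ACID-style system would instead make the write visible everywhere (or nowhere) before acknowledging it, which is the trade-off the CAP discussion is about.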
CAP Theorem
[Diagram: the three CAP attributes (Consistency, Availability, Partition Tolerance), with Relational and NoSQL databases placed among them]
NoSQL Upside
• Distributed by default
• Open source lets you peg costs to personnel,
more than to customers
• Developer enthusiasm
Hadoop
• Open source, petabyte-scale data analysis and
processing framework
• Runs on commodity hardware
• Large ecosystem of related tools
• Two main components:
– Hadoop Distributed File System (HDFS)
– MapReduce engine
Why MapReduce is Cool
• Extremely flexible – full power of a procedural
programming language
• Map step, essentially, allows ad hoc ETL
• With Reduce step, aggregation is a first-class
concept
• Growing ecosystem of tools that generate
MapReduce code
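The Map and Reduce steps can be sketched in plain Python, without Hadoop itself. This is only an illustration of the programming model: the log format, the function names and the in-process `shuffle` are assumptions, and a real job would run distributed across a cluster. The map step does the ad hoc ETL (parsing and filtering raw lines), and the reduce step does the aggregation.

```python
from collections import defaultdict

# Hypothetical raw log lines; the format is an assumption for this sketch.
LOG_LINES = [
    "2013-05-01 store=NYC amount=20",
    "2013-05-01 store=SFO amount=5",
    "malformed line",
    "2013-05-02 store=NYC amount=7",
]


def map_step(line):
    """Ad hoc ETL: parse one raw line and emit (key, value) pairs."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    if "store" in fields and "amount" in fields:
        yield fields["store"], int(fields["amount"])


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Aggregation: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_step(line))
    return dict(reduce_step(k, v) for k, v in shuffle(pairs).items())


totals = run_job(LOG_LINES)
```

Because `map_step` is ordinary procedural code, it can clean up or discard messy input (the malformed line here) before anything is aggregated.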
Why MapReduce Sucks
• It’s a batch mode technology
• It’s not declarative
• Most BI products don’t work with MR natively
– They connect via Hive instead (by and large)
• It’s good for a group of use cases, but it’s not a
good general framework
The Google DNA
• Hadoop and HBase are modeled on Google technology
– Hadoop: MapReduce, GFS
– HBase: BigTable
• That technology was built for Google’s use cases, and
Google doesn’t use it as extensively now
• So why is the world going Hadoop-crazy?
Benefits of Schema-Free
• Variable schema is accommodated
– Great for product catalogs, content management
and the like
• Simple for archival storage
• For analysis:
– Avoids politics of achieving consensus on
structure
– Allows different schema for different applications
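A small Python sketch of the idea, using plain dictionaries in place of a document store. The catalog contents, the field names and the `project` helper are all hypothetical. Each document carries its own fields, and each application applies the schema it cares about when it reads the data.

```python
# Hypothetical product catalog; field names are assumptions for this sketch.
catalog = [
    {"sku": "B-100", "type": "book", "title": "SQL Basics", "pages": 320},
    {"sku": "S-200", "type": "shirt", "title": "Logo Tee", "size": "M", "color": "blue"},
    {"sku": "C-300", "type": "cable", "title": "USB Cable", "length_m": 2},
]


def project(documents, fields, default=None):
    """Apply a schema at read time: keep only the fields one application wants."""
    return [{f: doc.get(f, default) for f in fields} for doc in documents]


# Two applications impose two different schemas on the same stored documents.
storefront_view = project(catalog, ["sku", "title"])
apparel_view = project(
    [d for d in catalog if d["type"] == "shirt"], ["sku", "size", "color"]
)
```

No one had to agree up front that every product has a `size` or a `pages` field; a missing field simply comes back as the default.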
Cloud Effect
• Database as a service and SaaS BI/Analytics get
companies excited
– Cloudant
– Amazon: DynamoDB, RDS, Redshift, Jaspersoft
• Elastic capabilities of cloud provide small customers
with access to huge clusters
– Amazon EMR, Microsoft Windows Azure HDInsight now
– Google Compute Engine, Rackspace/Hortonworks to come
• Cloud-borne reference data adds value
• But casualties emerging: e.g. Xeround
SQL Skillset and Ecosystem
• DBAs, most devs know it
– Making recruiting faster and cheaper
• ORMs expect it
• Reporting/analysis tools are premised on it
– Even if they also talk to MDX and NoSQL sources
• Companies are invested in it
• Abandoning it is naive
MPP is Big Data
(via acquisition)
• Teradata: acquired Aster Data
• Netezza: acquired by IBM
• Vertica: acquired by HP
• Pivotal/Greenplum: acquired by EMC
• ParAccel: acquired by Actian
• SQL Server Parallel Data Warehouse: from Microsoft’s DATAllegro acquisition
SQL – BD Convergence
• Brings the SQL language and data warehouse
products, on one side, together with Hadoop, on
the other
• Goal is to make Hadoop interactive, non-batch
• May involve Hive and its APIs
• May involve direct access to HDFS
– Bypassing MapReduce
• Think of the “database” as HDFS, and MapReduce
as merely an access method.
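The “one repository, several access methods” idea can be illustrated with a small Python sketch. Everything here is a stand-in: `RAW_ROWS` plays the role of files in HDFS, `batch_totals` plays the role of a MapReduce-style job, and SQLite plays the role of a SQL engine. Unlike the products discussed here, this sketch copies the rows into SQLite rather than querying them in place; it only shows that the same data can answer the same question through a procedural path and a declarative one.

```python
import sqlite3
from collections import defaultdict

# Stand-in for files in the shared repository; contents are made up for this sketch.
RAW_ROWS = ["NYC,20", "SFO,5", "NYC,7"]


def batch_totals(rows):
    """Access method 1: procedural, MapReduce-style pass over the raw rows."""
    totals = defaultdict(int)
    for row in rows:
        store, amount = row.split(",")
        totals[store] += int(amount)
    return dict(totals)


def sql_totals(rows):
    """Access method 2: declarative SQL over the same rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(s, int(a)) for s, a in (row.split(",") for row in rows)],
    )
    result = dict(conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store"))
    conn.close()
    return result


batch_result = batch_totals(RAW_ROWS)
sql_result = sql_totals(RAW_ROWS)
```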
One Repository, Multiple Access Methods
SQL – BD Convergence
• HCatalog
• Cloudera Impala (v1.0 shipped April 30)
• Hortonworks “Stinger” initiative
– Make Hive 100x faster
• EMC Pivotal
• Microsoft PolyBase, Data Explorer
• Teradata Aster SQL-H
• ParAccel (Actian) ODI
NewSQL Entrants
• NuoDB
• VoltDB
• Clustrix
• TransLattice
Dremel and Drill
• Dremel is Google’s column store analytical database
– Proprietary; available publicly as BigQuery
• Hierarchical/nested too
– Allows schema variance without anarchy
• “…scales to thousands of CPUs and petabytes of data,
and has thousands of users at Google.”
• Uses SQL, has growing BI tool support
• Petabyte scale
• Drill is to Dremel as Hadoop is to MapReduce + GFS
• And then there’s Spanner
In-Memory
• SAP HANA
– And Sybase IQ
• Data Warehouse Appliances
• VoltDB
• Oracle TimesTen
• IBM solidDB
– Also TM1 (in-memory OLAP)
• Coming: SQL Server’s “Hekaton” engine
The Truth About In-Memory
• Judicious use of in-memory database technology can
speed analytical queries
– Combine with columnar technology, rinse, repeat
• Can also eliminate need for deferred writes
• A RAM-only strategy like HANA’s seems impractical
• Keep in mind:
– SSD is memory too. It’s slower, but it’s memory.
– Conversely, L1, L2 and L3 caches are faster than RAM. Single
Instruction, Multiple Data (SIMD) makes things faster still.
• Hybrid approaches are most sensible
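Why columnar layout pairs well with in-memory analytics can be shown with a short Python sketch. The records and field names are hypothetical, and Python lists only approximate the contiguous, compressed column segments a real engine would use. The point is that an aggregate over one field needs to touch only that field’s column, not every whole row.

```python
# Hypothetical sales records; the field names are assumptions for this sketch.
rows = [
    {"store": "NYC", "amount": 20, "note": "gift"},
    {"store": "SFO", "amount": 5, "note": ""},
    {"store": "NYC", "amount": 7, "note": "rush"},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    return {name: [r[name] for r in records] for name in records[0]}


columns = to_columns(rows)

# Row layout: every whole record is visited to total one field.
row_total = sum(r["amount"] for r in rows)

# Column layout: the query reads just the one "amount" list.
column_total = sum(columns["amount"])
```

Less data scanned per query means more of the working set fits in RAM or CPU cache, which is where the speedup for analytical queries comes from.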
What’s Ahead?
• Consolidation! We can’t have this many vendors:
– Some will go out of business
– Some will get acquired
– A few will stay independent (but may merge with each
other)
• Hadoop recedes into the service layer
• NoSQL shakes out, matures, coexists
• NewSQL gets adopted or acquired
• In-memory becomes a standard option
Risks and Considerations
• Pick an esoteric database now and you may be
forced to migrate later
• SQL Server and Oracle could add features that
make the specialty products superfluous
– Or release new products
• Conversely, NoSQL products may acquire
ACID-like features themselves
• More convergence
Recommendations
• NoSQL has its use cases. But it also has its
abuses.
• Look carefully at the number of customers
• Look also at how widely deployed the product
is within those customer companies
Recommendations
• If you haven’t looked seriously at Hadoop, do so.
But remember, it’s infrastructure.
• You can reach out to Big Data now, or you can
wait for it to reach out to you
– Cost/benefit of earlier adoption vs. late following
• For repeatable big problems, MapReduce works
well; for iterative queries, “SQL” technologies are
much better
– akin to standard reports versus ad hoc queries
Parting Thoughts
• NoSQL and Big Data are disruptive
• You ignore them at your peril
• But if they can’t, ultimately, blend into current
technology environments then they’re
destined to fail
• You can embrace the change without being
sacrificed. Just watch your back.
Thank You!
• Email
– andrew.brust@bluebadgeinsights.com
• Blog
– http://www.zdnet.com/blog/big-data
• Twitter
– @andrewbrust

Axa Assurance Maroc - Insurer Innovation Award 2024
 

Big Data Strategy for the Relational World

  • 1. Big Data Strategy for the Relational World Embracing Disruption, Avoiding Regression Andrew J. Brust Founder & CEO, Blue Badge Insights Big Data correspondent, ZDNet Big Data Analyst, GigaOM Research
  • 2. Bio • CEO and Founder, Blue Badge Insights • Big Data blogger for ZDNet • Microsoft Regional Director, MVP • Co-chair, Visual Studio Live! and 18 years as a speaker • Founder, Microsoft BI User Group of NYC – http://www.msbinyc.com • Co-moderator, NYC .NET Developers Group – http://www.nycdotnetdev.com • “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News • Twitter: @andrewbrust
  • 3. Andrew on ZDNet (bit.ly/bigondata)
  • 5. Big Data: Why Should You Care? • Because analytics (i.e. BI) has always been important, but it was expensive and obscure • Because the economics of processing and storage make Big Data feasible
  • 6. Big Data: Why Should You be Cautious? • Too many vendors; too much churn • Designed for the lab, not for mainstream business • Immature technology and tooling – Results in serious recruiting and dev costs • So, you can’t ignore Big Data, but you can’t just pursue with abandon, either. – That’s hard!
  • 7. Agenda • Trends • Technologies – NoSQL – Hadoop – SQL Convergence – NewSQL – In-Memory • Forecasts • Risks • Recommendations
  • 8. Database Trends • NoSQL (Mongo and Cassandra, primarily) • Late-bound schema (aka “unstructured data”) • File-based table handling (especially HDFS) • Columnar storage (and Massively Parallel Processing) • Co-existence with RDBMS, OLAP databases (very few throwing them away) • Little change in tools/clients (still expect tables or cubes)
  • 9. NoSQL • Key-Value Store: Couchbase, Riak, Redis, Voldemort, DynamoDB, Azure tables • Document Store: MongoDB, CouchDB, Cloudant, Couchbase • Wide Column Store: HBase, Cassandra • Graph Database: Neo4J
  • 10. Consistency • CAP Theorem –Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance • NoSQL does not offer “ACID” guarantees –Atomicity, consistency, isolation and durability • Instead offers “eventual consistency” –Similar to DNS propagation
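To make “eventual consistency” concrete, here is a minimal sketch, assuming a last-write-wins rule keyed on a version number. The `Replica` class, the `sync` function and the key names are hypothetical illustrations, not the API of any product named in the deck. A read against a replica that has not yet heard about a write returns stale data; after a propagation pass, the replicas agree (much like DNS).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding key -> (version, value). Hypothetical, for illustration only."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, version):
        # Accept the write only if it is newer than what this replica already holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Propagation pass: exchange entries so both replicas keep the newest version."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


def demo():
    east, west = Replica("east"), Replica("west")
    east.write("cart:42", ["book"], version=1)
    sync(east, west)
    # A newer write lands on one replica only.
    west.write("cart:42", ["book", "pen"], version=2)
    stale = east.read("cart:42")  # east has not seen version 2 yet
    sync(east, west)
    fresh = east.read("cart:42")  # after propagation the replicas agree
    return stale, fresh
```

Real systems resolve conflicting writes with more care (vector clocks, quorums and so on); the sketch only shows the window in which a read can be stale.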
  • 12. NoSQL Upside • Distributed by default • Open source lets you peg costs to personnel, more than to customers • Developer enthusiasm
  • 13. Hadoop • Open source, petabyte-scale data analysis and processing framework • Runs on commodity hardware • Lots of ecosystem • Two main components: – Hadoop Distributed File System (HDFS) – MapReduce engine
  • 15. Why MapReduce is Cool • Extremely flexible – full power of a procedural programming language • Map step, essentially, allows ad hoc ETL • With Reduce step, aggregation is a first-class concept • Growing ecosystem of tools that generate MapReduce code
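The map and reduce roles described on this slide can be sketched in plain Python. This is a single-process illustration of the programming model, not Hadoop's actual API; the `region,product,amount` record layout and all function names are assumptions made for the example. The map step does the ad hoc ETL (parsing, cleaning, skipping bad records), and the reduce step does the aggregation.

```python
from collections import defaultdict


def map_step(line):
    """Ad hoc ETL: parse one raw line and emit a (key, value) pair.

    The 'region,product,amount' layout is a made-up example format.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return  # skip malformed records instead of failing the job
    region, _product, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return  # skip records whose amount is not numeric
    yield region.upper(), value


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_step(line))
    return dict(reduce_step(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map calls run in parallel across the nodes holding the data, and the shuffle moves intermediate pairs over the network; here everything runs in one process.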
  • 16. Why MapReduce Sucks • It’s a batch mode technology • It’s not declarative • Most BI products don’t work with MR natively – They connect via Hive instead (by and large) • It’s good for a group of use cases, but it’s not a good general framework
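The “not declarative” point is easier to see next to a declarative equivalent. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a SQL engine such as Hive (an assumption for illustration; the table and column names are made up). A per-key aggregation is one `GROUP BY` statement, and the engine decides how to execute it.

```python
import sqlite3


def totals_by_region(rows):
    """Load (region, amount) rows into an in-memory table and aggregate with one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

This is also why most BI tools connect through a SQL layer: they generate statements like the one above rather than procedural job code.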
  • 17. The Google DNA • Hadoop and HBase are modeled on Google’s designs – Hadoop: MapReduce, GFS – HBase: BigTable • Those designs were built for Google’s use cases, and Google doesn’t use them as extensively now • So why is the world going Hadoop-crazy?
  • 18. Benefits of Schema-Free • Variable schema is accommodated – Great for product catalogs, content management and the like • Simple for archival storage • For analysis: – Avoids politics of achieving consensus on structure – Allows different schema for different applications
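A minimal sketch of what “different schema for different applications” can look like in practice, assuming documents are stored as raw JSON and each consumer applies its own projection at read time. The sample documents, field names and helper functions are all hypothetical.

```python
import json

# Variable-schema product catalog: each document carries only the fields it needs.
RAW_DOCS = [
    '{"sku": "A1", "name": "Kettle", "price": 30, "voltage": 230}',
    '{"sku": "B2", "name": "Novel", "price": 12, "author": "J. Doe", "pages": 320}',
    '{"sku": "C3", "name": "T-shirt", "price": 9, "size": "M"}',
]


def read_with_schema(raw_docs, fields, defaults=None):
    """Apply a schema at read time: keep the requested fields, fill in the gaps."""
    defaults = defaults or {}
    for raw in raw_docs:
        doc = json.loads(raw)
        yield {name: doc.get(name, defaults.get(name)) for name in fields}


def pricing_view(raw_docs):
    # The pricing application only cares about SKU and price.
    return list(read_with_schema(raw_docs, ["sku", "price"]))


def catalog_view(raw_docs):
    # The catalog application wants an author column, even for non-books.
    return list(read_with_schema(raw_docs, ["sku", "name", "author"], {"author": "n/a"}))
```

Neither view required agreeing on a single table structure up front, which is the consensus-avoiding benefit the slide describes. The cost is that every reader has to handle missing fields itself.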
  • 19. Cloud Effect • Database as a service and SaaS BI/Analytics gets companies excited – Cloudant – Amazon: DynamoDB, RDS, RedShift, Jaspersoft • Elastic capabilities of cloud provide small customers with access to huge clusters – Amazon EMR, Microsoft Windows Azure HDInsight now – Google Compute Engine, Rackspace/Hortonworks to come • Cloud-borne reference data adds value • But casualties emerging: e.g. Xeround
  • 20. SQL Skillset and Ecosystem • DBAs, most devs know it (making recruiting faster and cheaper) • ORMs expect it • Reporting/analysis tools are premised on it (even if they also talk to MDX and NoSQL sources) • Companies are invested in it • Abandoning it is naive
  • 21. MPP is Big Data (via acquisition) • Teradata (acquired Aster Data) • Netezza (IBM) • Vertica (HP) • Pivotal/Greenplum (EMC) • ParAccel (Actian) • SQL Server Parallel Data Warehouse (Microsoft-DATAllegro acquisition)
  • 22. SQL – BD Convergence • Brings the SQL language and data warehouse products, on one side, together with Hadoop, on the other • Goal is to make Hadoop interactive, non-batch • May involve Hive and its APIs • May involve direct access to HDFS – Bypassing MapReduce • Think of the “database” as HDFS, and MapReduce as merely an access method.
  • 23. One Repository, Multiple Access Methods HCatalog
  • 24. SQL – BD Convergence • Cloudera Impala (v1.0 shipped April 30) • Hortonworks “Stinger” initiative – Make Hive 100x faster • EMC Pivotal • Microsoft PolyBase, Data Explorer • Teradata Aster SQL-H • ParAccel (Actian) ODI
  • 27. Dremel and Drill • Dremel is Google’s column store analytical database – Proprietary; available publicly as BigQuery • Hierarchical/nested too – Allows schema variance without anarchy • “…scales to thousands of CPUs and petabytes of data, and has thousands of users at Google.” • Uses SQL, has growing BI tool support • Petabyte scale • Drill:Dremel as Hadoop:MapReduce+GFS • And then there’s Spanner
  • 28. In-Memory • SAP HANA – And Sybase IQ • Data Warehouse Appliances • VoltDB • Oracle TimesTen • IBM solidDB – Also TM1 (in-memory OLAP) • Coming: SQL Server’s “Hekaton” engine
  • 29. The Truth About In-Memory • Judicious use of in-memory database technology can speed analytical queries – Combine with columnar technology, rinse, repeat • Can also eliminate need for deferred writes • A RAM-only strategy like HANA’s seems impractical • Keep in mind: – SSD is memory too. It’s slower, but it’s memory. – Conversely, L1, L2 and L3 cache is faster than RAM. Single Instruction, Multiple Data (SIMD) makes things faster still. • Hybrid approaches are most sensible
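Why columnar layout pairs well with in-memory analytics can be sketched as follows. This is an illustration of the data layout only, with made-up function and column names; pure Python will not show the cache or SIMD gains mentioned on the slide. The point is that an aggregate over one column scans one contiguous array and never touches the others.

```python
from array import array


def to_columns(rows, names):
    """Pivot row tuples into one contiguous array of doubles per column."""
    columns = {name: array("d") for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


def column_sum(columns, name):
    # Only the requested column is scanned; the other columns are never read.
    return sum(columns[name])
```

For example, `column_sum(to_columns(rows, ["qty", "amount"]), "amount")` reads only the `amount` array, whereas a row store would walk every field of every row.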
  • 30. What’s Ahead? • Consolidation! We can’t have this many vendors: – Some will go out of business – Some will get acquired – A few will stay independent (but may merge with each other) • Hadoop recedes into the service layer • NoSQL shakes out, matures, coexists • NewSQL gets adopted or acquired • In-memory becomes a standard option
  • 31. Risks and Considerations • Pick an esoteric database now and you may be forced to migrate later • SQL Server and Oracle could add features that make the specialty products superfluous – Or new products • Conversely, NoSQL products may acquire ACID-like features themselves • More convergence
  • 32. Recommendations • NoSQL has its use cases. But it also has its abuses. • Look carefully at the number of customers • Look also at how widely deployed the product is within those customer companies
  • 33. Recommendations • If you haven’t looked seriously at Hadoop, do so. But remember, it’s infrastructure. • You can reach out to Big Data now, or you can wait for it to reach out to you – Cost/benefit of earlier adoption vs. late following • For repeatable big problems, MapReduce works well; for iterative query, “SQL” technologies are much better – akin to standard reports versus ad hoc queries
  • 34. Parting Thoughts • NoSQL and Big Data are disruptive • You ignore them at your peril • But if they can’t, ultimately, blend into current technology environments then they’re destined to fail • You can embrace the change without being sacrificed. Just watch your back.
  • 35. Thank You! • Email: andrew.brust@bluebadgeinsights.com • Blog: http://www.zdnet.com/blog/big-data • Twitter: @andrewbrust