Selecting the Right SQL-on-Hadoop Solution: 
What You Need to Know 
© 2014 MapR Technologies
Rick F. van der Lans 
Rick F. van der Lans is an independent consultant, lecturer, and author. He 
specializes in data warehousing, business intelligence, database technology, 
and data virtualization. He is managing director of R20/Consultancy B.V. Rick 
has been involved in various projects in which data warehousing and 
integration technology were applied. 
Rick van der Lans is an internationally acclaimed lecturer. He has lectured 
professionally for the last twenty-five years in many European and 
Middle Eastern countries, the USA, South America, and Australia. He has been 
invited by several major software vendors to present keynote speeches. 
He is the author of several books on computing, including his new Data 
Virtualization for Business Intelligence Systems. Some of these books are 
available in different languages. The popular Introduction to 
SQL is available in English, Dutch, Italian, Chinese, and German and is sold 
worldwide. He also authored The SQL Guide to Ingres and SQL for MySQL 
Developers. 
As an author for BeyeNetwork.com, writer of whitepapers, chairman of the 
annual European Enterprise Data and Business Intelligence Conference, and 
columnist for a few IT magazines, he has close contacts with many 
vendors. 
R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl. You can get in touch with Rick via: 
Email: rick@r20.nl 
Twitter: @Rick_vanderlans 
LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223 
Copyright © 1991 - 2014 R20/Consultancy B.V., The Hague, The Netherlands 2
Self-Service Data Exploration with Apache Drill 
The MapR Distribution including Apache Hadoop 
Top Ranked
Exponential Growth
500+ Customers
Premier Investors
• >2x annual bookings
• 90% software licenses
• 80% of accounts expand 3X
• < 1% lifetime churn
• > $1B in incremental revenue generated by 1 customer
The Power of the Open Source Community
[Diagram: APACHE HADOOP AND OSS ECOSYSTEM on top of the MapR Data Platform (MapR-FS, MapR-DB)]
Group labels: Execution Engines; Data Governance and Operations; Batch; Streaming; SQL; NoSQL & Search; ML, Graph; Provisioning & coordination; Workflow & Data Governance; Data Integration & Access; Management; Security
Components shown: Spark, Cascading, Pig, MapReduce v1 & v2, YARN, Tez*, Spark Streaming, Storm*, Drill, Shark, Impala, Hive, HBase, Solr, Accumulo*, GraphX, MLLib, Mahout, Savannah*, Juju, Whirr, ZooKeeper, Oozie, Falcon*, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*
* Certification/support planned for 2014
Today’s Data Comes in Different Shapes… 
Social Media 
Messages 
Audio 
Sensors 
Mobile Data 
Email 
Clickstream
Real-World Data Modeling and Transformations 
Evolution Towards Self-Service Data Exploration 
Approach | Data Modeling and Transformation | Data Visualization
Traditional BI w/ RDBMS | IT-driven | IT-driven
Self-Service BI w/ RDBMS | IT-driven | Self-service
SQL-on-Hadoop | IT-driven | Self-service
Self-Service Data Exploration (zero-day analytics) | Not needed | Self-service
Why Decrease the Distance to Data? 
Improve time to value. Reduce the burden on IT. 
• Enable rapid data exploration and 
application development 
• IT should provide a valuable 
service without “getting in the way” 
• Can’t add DBAs to keep up with 
the exponential data growth 
• Minimize “unnecessary work” so IT 
can focus on value-added 
activities and become a partner to 
the business users
APACHE DRILL 
• Pioneering Data Agility for Hadoop 
• Apache open source project 
• Scale-out execution engine for low-latency queries 
• Unified SQL-based API for analytics & operational applications 
• 40+ contributors 
• 150+ years of experience building databases and distributed systems
Drill Supports Schema Discovery On-The-Fly 
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ) 
• Fixed schema 
• Leverage schema in centralized repository (Hive Metastore) 
Schema Discovered On-The-Fly (SCHEMA ON THE FLY) 
• Fixed schema, evolving schema or schema-less 
• Leverage schema in centralized repository or self-describing data 
MapR Optimized Data Architecture 
[Diagram: data flows from sources through the MapR Distribution for Hadoop to analytics and operational applications]
Sources: relational, SaaS, mainframe; documents, emails; blogs, tweets, link data; log files, clickstreams; sensors
MapR Distribution for Hadoop: Streaming (Spark Streaming, Storm); Batch / Search (MR, Spark, Hive, Pig, …); NoSQL ODBMS (HBase, Accumulo, …)
MapR Data Platform: MapR-FS, MapR-DB
Data Transformation, Enrichment and Integration
Data warehouse, connected through Data Movement and Data Access
Analytics: Search; Schema-less data exploration; BI, reporting; Ad-hoc integrated analytics
Machine Learning
Operational Apps: Recommendations; Fraud Detection; Logistics
(1) Self-Describing Data is Ubiquitous 
Flat files in DFS 
• Complex data (Thrift, Avro, protobuf) 
• Columnar data (Parquet, ORC) 
• Loosely defined (JSON) 
• Traditional files (CSV, TSV) 
Data stored in NoSQL stores 
• Relational-like (rows, columns) 
• Sparse data (NoSQL maps) 
• Embedded blobs (JSON) 
• Document stores (nested objects) 
{ 
name: { 
first: Michael, 
last: Smith 
}, 
hobbies: [ski, soccer], 
district: Los Altos 
}{ 
name: { 
first: Jennifer, 
last: Gates 
}, 
hobbies: [sing], 
preschool: CCLC 
}
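The idea of self-describing data can be illustrated with a minimal Python sketch. It is not how Drill works internally; it only shows that the field names and types travel with each record, so a reader can discover a schema while reading. The records are modeled on the two objects above, and the function names `field_paths` and `discover_schema` are hypothetical.

```python
# Hypothetical sample records modeled on the two objects shown on the slide.
RECORDS = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]


def field_paths(value, prefix=""):
    """Return {dotted field path: type name} for one record, descending into nested objects."""
    paths = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths.update(field_paths(child, child_prefix))
    else:
        paths[prefix] = type(value).__name__
    return paths


def discover_schema(records):
    """Merge the per-record field paths into one schema, discovered while reading."""
    schema = {}
    for record in records:
        for path, type_name in field_paths(record).items():
            schema.setdefault(path, set()).add(type_name)
    return schema


schema = discover_schema(RECORDS)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Note that `district` and `preschool` each occur in only one record; nothing had to be declared in advance for either of them.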
(2) Drill’s Data Model is Flexible 
[Diagram: data formats placed on two axes of flexibility, from fixed schema to schema-less and from flat to complex. Formats shown: HBase, JSON, BSON, CSV, TSV, Parquet, Avro]
RDBMS/SQL-on-Hadoop table 
Name Gender Age 
Michael M 6 
Jennifer F 3 
Apache Drill table 
{ 
name: { 
first: Michael, 
last: Smith 
}, 
hobbies: [ski, soccer], 
district: Los Altos 
}{ 
name: { 
first: Jennifer, 
last: Gates 
}, 
hobbies: [sing], 
preschool: CCLC 
}
Quick Tour 
Self-Service Data Exploration with Apache Drill 
Zero to Results in 2 Minutes (3 Commands) 
Install: 
$ tar xzf apache-drill.tar.gz 
Launch shell (embedded mode): 
$ apache-drill/bin/sqlline -u jdbc:drill:zk=local 
Query: 
0: jdbc:drill:zk=local> 
SELECT count(*) AS incidents, columns[1] AS category 
FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv` 
GROUP BY columns[1] 
ORDER BY incidents DESC; 
Results: 
+------------+----------------+ 
| incidents  | category       | 
+------------+----------------+ 
| 8372       | LARCENY/THEFT  | 
| 4247       | OTHER OFFENSES | 
| 3765       | NON-CRIMINAL   | 
| 2502       | ASSAULT        | 
... 
35 rows selected (0.847 seconds) 
Data Source is in the Query 
SELECT timestamp, message 
FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` 
WHERE errorLevel > 2
dfs1: a storage engine instance 
- DFS 
- HBase 
- Hive Metastore/HCatalog 
logs: a workspace 
- Sub-directory 
- Hive database 
`AppServerLogs/2014/Jan/p001.parquet`: a table 
- pathnames 
- HBase table 
- Hive table 
Query Directory Trees 
-- Query file: How many errors per level in Jan 2014? 
SELECT errorLevel, count(*) 
FROM dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` 
GROUP BY errorLevel; 
-- Query directory sub-tree: How many errors per level? 
SELECT errorLevel, count(*) 
FROM dfs.logs.`/AppServerLogs` 
GROUP BY errorLevel; 
-- Query some partitions: How many errors per level by month from 2012? 
SELECT errorLevel, count(*) 
FROM dfs.logs.`/AppServerLogs` 
WHERE dirs[1] >= 2012 
GROUP BY errorLevel, dirs[2];
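The third query can be mimicked in a few lines of Python. This is a rough sketch, assuming (as the slide's usage suggests) that `dirs[n]` refers to the n-th directory component of each file's path, so that `dirs[1]` is the year and `dirs[2]` the month. The file names and error levels are hypothetical, and the code says nothing about how Drill itself evaluates such a query.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical files: path -> errorLevel values stored in that file.
FILES = {
    "/AppServerLogs/2011/Dec/part0001.parquet": [1, 3],
    "/AppServerLogs/2012/Jan/part0001.parquet": [2, 2, 5],
    "/AppServerLogs/2014/Jan/part0001.parquet": [3, 3],
}


def dirs(path):
    """Directory components of a file path, without the root and the file name."""
    return list(PurePosixPath(path).parts[1:-1])


def errors_per_level_by_month(files, min_year):
    """Count records per (errorLevel, month), skipping files in earlier year directories."""
    counts = Counter()
    for path, levels in files.items():
        d = dirs(path)
        if int(d[1]) < min_year:
            continue  # the whole file is skipped based on its directory name alone
        for level in levels:
            counts[(level, d[2])] += 1
    return counts


result = errors_per_level_by_month(FILES, 2012)
for (level, month), n in sorted(result.items()):
    print(level, month, n)
```

Because the year is part of the path, files from 2011 are excluded without reading their contents. As in the slide's query, months from different years fall into the same group.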
Works with HBase and Embedded Blobs 
-- Query an HBase table directly (no schemas) 
SELECT cf1.month, cf1.year 
FROM hbase.table1; 
-- Embedded JSON value inside column profileBlob inside column family cf1 of the HBase table users 
SELECT profile.name, count(profile.children) 
FROM ( 
  SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile 
  FROM hbase.users 
) 
Combine Data Sources on the Fly 
-- Join log directory with JSON file (user profiles) to identify the 
-- name and email address for anyone associated with an error message. 
SELECT DISTINCT users.name, users.emails.work 
FROM dfs.logs.`/data/logs` logs, 
     dfs.users.`/profiles.json` users 
WHERE logs.uid = users.id AND 
      logs.errorLevel > 5; 
-- Join a Hive table and an HBase table (without Hive metadata) 
-- to determine the number of tweets per user 
SELECT users.name, count(*) AS tweetCount 
FROM hive.social.tweets tweets, 
     hbase.users users 
WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8') 
GROUP BY tweets.userId, users.name;
SQL Technologies Available on MapR 
Category | Data Exploration | SQL-on-Hadoop (schema) | SQL-on-Hadoop (schema) | SQL-on-Hadoop (schema) | Advanced SQL & Analytics
Technology | Drill 0.5 | Hive 0.13 w/ Tez | Impala 1.x | Shark 0.9 | Vertica
Latency | Low | Medium | Low | Low (for in-memory), Med for on disk | Low
Files | Yes (all Hive formats) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (all Hive file formats) | Proprietary; all Hive file formats can be used as external tables
HBase/MapR-DB | Yes | Yes, performance issues | Yes, performance issues | Yes, performance issues | No
Hive compatibility | High | High | Medium | High | NA
Schema | Schema-less or Hive or HBase | Hive | Hive | Hive | Proprietary or Hive
SQL support | ANSI SQL | HiveQL | HiveQL (subset) | HiveQL | ANSI SQL + advanced analytics
Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC, ADO.NET, …
Large datasets | Yes | Yes | Limited | Yes | Yes
Nested data | Yes | Limited | No | Limited | Limited
Machine learning | No | No | No | Yes | No
Transactions | No | No | No | No | Yes
Optimizer | Limited | Limited | Limited | Limited | High
Concurrency | Medium | Medium | Medium | Limited | High
Q&A 
Engage with us! 
• SQL-on-Hadoop engines explained 
– http://info.mapr.com/wp-sql-on-hadoop-engines-explained 
• Get demo and tutorials on Apache Drill 
– https://www.mapr.com/products/apache-drill 
• Apache Drill 0.5 available now 
– Download and play: http://incubator.apache.org/drill/ 
– Ask questions: drill-user@incubator.apache.org 
– Contribute: http://github.com/apache/incubator-drill/ 
• Contact / follow us 
– @rick_vanderlans – Rick van der Lans 
– @swooledge – Steve Wooledge 
Copyright © 1991 - 2014 R20/Consultancy B.V., 
The Hague, The Netherlands. All rights 
reserved. No part of this material may be 
reproduced, stored in a retrieval system, or 
transmitted in any form or by any means, 
electronic, mechanical, photographic, or 
otherwise, without the explicit written permission 
of the copyright owners. 
SQL‐on‐Hadoop 
Explained 
by 
Rick F. van der Lans 
R20/Consultancy BV 
Twitter @rick_vanderlans 
www.r20.nl
It’s All About Analytics 
Requirements for Data Storage Technology 
High data storage scalability 
High data processing scalability 
High performance 
Low price/performance ratio 
All data types 
High schema flexibility 
Fast loading 
Enterprise-grade 
Comparison Data Storage Technologies 
Requirement | Hadoop | Classic SQL DB
High data storage scalability | Yes | Less
High data processing scalability | Yes | Less
High performance | Yes | Less
Low price/performance ratio | Yes | No
All data types | Yes | Most data types
High schema flexibility | Depends | No
Fast loading | Yes | Yes
Enterprise-grade | Depends | Yes
Manipulating Hadoop Data 
[Diagram: three access stacks, each listed from top to bottom]
• Apache HBase API / Apache HBase / Apache HDFS API / Apache HDFS
• Apache MapReduce API / Apache MapReduce / Apache HDFS API / Apache HDFS
• Apache HDFS API / Apache HDFS
Performance Dominates in Hadoop 
Productivity 
Maintainability 
Performance 
Time-to-market 
Scalability 
Availability 
The Need for SQL‐on‐Hadoop 
Add high productivity and maintainability, while 
retaining high performance and scalability 
Advantages of SQL-on-Hadoop 
• Well-known database language (especially in the BI 
community) 
• Large target audience 
• High productivity and maintainability 
• Openness to many reporting and analytical tools 
Productivity is as Important 
Productivity 
Maintainability 
Time-to-market 
Performance 
Scalability 
Availability 
Different Solutions 
[Diagram: four stacks, each listed from top to bottom]
• Apache HiveQL / Apache Hive / Apache MapReduce API / Apache MapReduce / Apache HDFS API / Apache HDFS
• A SQL Dialect / SQL‐on‐Hadoop / Apache HBase API / Apache HBase / Apache HDFS API / Apache HDFS
• A SQL Dialect / SQL‐on‐Hadoop / Apache HDFS API / Any HDFS
• A SQL Dialect / SQL‐on‐Hadoop / Apache HDFS API / Any HDFS
Not all SQL‐on‐Hadoop Engines are Created Equal 
• Batch-oriented query environment (data mining) 
• Interactive query environment (OLAP, self-service BI, data visualization) 
• Point-queries (retrieving individual objects) 
• Investigative analytics (data science) 
• Operational intelligence (real-time analytics) 
• Transactional (production systems) 
Technological Challenges 
Non-SQL-to-SQL transformational changes 
• Nested data 
• Variable data 
• Schema-less data 
• Self-describing data 
Architectural Challenges 
• Managing concurrent queries/users 
• Parallel execution of complex operations 
• Running complex analytical functions 
• Cost-based optimization 
Is All Data Relational Data? 
[Diagram: two scenarios for data stored in HDFS]
• Scenario 1: SQL-on-Hadoop creates the table, inserts the data, and selects the data. 
• Scenario 2: another application creates the file and inserts the data; SQL-on-Hadoop only selects the data. 
Transforming Nested Data (1) 
CUSTOMER_ID LAST_NAME FIRST_NAME CUSTOMER_ORDERS 
75295 Sylvian David 
    CUSTOMER_ORDER_ID ORDER_TIMESTAMP 
    203699 2008-01-16 
    306892 2008-07-21 
    477047 2008-12-09 
103819 Scaggs Boz 
    CUSTOMER_ORDER_ID ORDER_TIMESTAMP 
    70675 2008-10-19 
    530223 2008-12-01 
132171 Rundgren Todd 
    CUSTOMER_ORDER_ID ORDER_TIMESTAMP 
    210220 2008-04-21 
    485584 2008-10-14 
    718579 2008-11-23 
    741912 2008-12-24 
Transforming Nested Data (2) 
Alternative 1: 
CUSTOMER_ID LAST_NAME FIRST_NAME CUSTOMER_ORDERS 
75295 Sylvian David {203699,2008-01-16},{306892,2008-07-21},{477047,2008-12-09} 
103819 Scaggs Boz {70675,2008-10-19},{530223,2008-12-01} 
132171 Rundgren Todd {210220,2008-04-21},{485584,2008-10-14},{718579,2008-11-23},{741912,2008-12-24} 
Alternative 2: 
CUSTOMER_ID CUSTOMER_ORDER_ID ORDER_TIMESTAMP LAST_NAME FIRST_NAME 
75295 203699 2008-01-16 Sylvian David 
75295 306892 2008-07-21 Sylvian David 
75295 477047 2008-12-09 Sylvian David 
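The two alternatives can be sketched in Python. This is only an illustration of the transformations, assuming the nested records are available as dictionaries; the field names and the functions `to_single_column` and `to_rows` are hypothetical, and a SQL-on-Hadoop engine may implement this quite differently.

```python
# Two of the nested customer records from the slide, as Python structures.
CUSTOMERS = [
    {"customer_id": 75295, "last_name": "Sylvian", "first_name": "David",
     "customer_orders": [(203699, "2008-01-16"), (306892, "2008-07-21"),
                         (477047, "2008-12-09")]},
    {"customer_id": 103819, "last_name": "Scaggs", "first_name": "Boz",
     "customer_orders": [(70675, "2008-10-19"), (530223, "2008-12-01")]},
]


def to_single_column(customer):
    """Alternative 1: keep one row per customer and pack all orders into one text column."""
    orders = ",".join(
        f"{{{order_id},{timestamp}}}"
        for order_id, timestamp in customer["customer_orders"]
    )
    return (customer["customer_id"], customer["last_name"],
            customer["first_name"], orders)


def to_rows(customer):
    """Alternative 2: one row per order, repeating the customer columns."""
    return [
        (customer["customer_id"], order_id, timestamp,
         customer["last_name"], customer["first_name"])
        for order_id, timestamp in customer["customer_orders"]
    ]


for customer in CUSTOMERS:
    print(to_single_column(customer))
    for row in to_rows(customer):
        print(row)
```

Alternative 1 keeps the row count but hides the orders inside a string that plain SQL cannot easily query. Alternative 2 is fully relational but repeats the customer data on every row.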
Transforming Variable Data (1) 
Example 1: 
CUSTOMER_ID CUSTOMER_ORDER_ID ORDER_TIMESTAMP 
75295 203699 2008-01-16 
75295 306892 2008-07-21 
75295 477047 2008-12-09 
CUSTOMER_ID CUSTOMER_ORDER_ID ORDER_TIMESTAMP ORDER_PROCESSED 
463281 203643 2008-01-16 2008-01-20 
CUSTOMER_ID CUSTOMER_ORDER_ID ORDER_TIMESTAMP ORDER_CANCELLED 
463246 285825 2008-01-19 2008-10-20 
……………… 
Example 2: 
CUSTOMER_ID CUSTOMER_NAME TELEPHONE_NUMBERS 
463246 O’Keefe {5157818, 2362436} 
463249 Zappa {1234567, 3262836, 4374777} 
463350 Donahue {3854757} 
Transforming Variable Data (2) 
Alternative 1: 
CUSTOMER_ID CUSTOMER_NAME TELEPHONE_1 TELEPHONE_2 TELEPHONE_3 
463246 O’Keefe 5157818 2362436 ? 
463249 Zappa 1234567 3262836 4374777 
463350 Donahue 3854757 ? ? 
Alternative 2: 
CUSTOMER_ID CUSTOMER_NAME TELEPHONE_NUMBER 
463246 O’Keefe 5157818 
463246 O’Keefe 2362436 
463249 Zappa 1234567 
463249 Zappa 3262836 
463249 Zappa 4374777 
463350 Donahue 3854757 
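Both alternatives for the telephone numbers can be written as small Python functions. This is a sketch under the assumption that each customer arrives with a variable-length list of numbers; the function names are hypothetical, and `None` plays the role of the `?` (null) shown in the table.

```python
# The three customers from the slide, each with a variable-length list of numbers.
CUSTOMERS = [
    (463246, "O'Keefe", [5157818, 2362436]),
    (463249, "Zappa", [1234567, 3262836, 4374777]),
    (463350, "Donahue", [3854757]),
]


def to_fixed_columns(customers, width=None):
    """Alternative 1: TELEPHONE_1..TELEPHONE_n columns, padded with None (null)."""
    if width is None:
        width = max((len(numbers) for _, _, numbers in customers), default=0)
    rows = []
    for customer_id, name, numbers in customers:
        kept = list(numbers[:width])  # numbers beyond `width` would be lost
        padded = kept + [None] * (width - len(kept))
        rows.append((customer_id, name, *padded))
    return rows


def to_one_row_per_number(customers):
    """Alternative 2: one row per telephone number, repeating the customer columns."""
    return [
        (customer_id, name, number)
        for customer_id, name, numbers in customers
        for number in numbers
    ]


for row in to_fixed_columns(CUSTOMERS):
    print(row)
for row in to_one_row_per_number(CUSTOMERS):
    print(row)
```

The fixed-column form has to pick a maximum number of telephone columns up front. The row-per-number form handles any count but turns three customers into six rows.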
Transforming Schema‐Less Data 
Weblog record 
datestamp ip request 6/1/2012 11:10:19 AM 107.1.187.170 GET 
/x.php?u=http://studio-5.financialcontent.com/synacor?Page=QUOTE&Ticke 
HTTP/1.1 6/1/2012 5:53:49 AM 107.1.2.180 GET /tv/3/player/vendor/Chef% 
/player/fiveminute/content/steak/asset/gnrc_15879500 HTTP/1.1 6/1/2012 
107.34.51.63 GET /tv/3/search/content/The%20Andy%20Griffith%20Show/s/T 
Andy%20Griffith%20Show HTTP/1.1 6/1/2012 3:12:43 PM 107.5.115.117 GET 
/tv/3/search/content/Kathie%20Lee%20Gifford's%20epic%20'Today'%20gaffe 
%20Lee%20Gifford's%20epic%20'Today'%20gaffe HTTP/1.1 6/1/2012 4:48:35 
108.225.132.245 GET /tv/3/search/content/Deadliest%20Catch/s/Deadliest 
HTTP/1.1 6/1/2012 10:25:12 AM 108.246.20.125 GET /x.php?u=http://studi 
5.financialcontent.com/synacor?Page=QUOTE&Ticker=DJ:DJI HTTP/1.1 
6/1/2012 1:58:14 AM 108.246.25.117 GET /tv/3/player/vendor/Chef%20Tips 
/fiveminute/content/steak/asset/gnrc_15879500 HTTP/1.1 
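To query such a log relationally, some reader has to impose the three columns (datestamp, ip, request) on the raw text. A minimal Python sketch of that step follows. The regular expression is an assumption derived from the visible record layout, and the sample text uses shortened, hypothetical request paths because the records on the slide are cut off at the right edge.

```python
import re
from datetime import datetime

# Assumed record layout: "<m/d/yyyy h:mm:ss AM|PM> <IPv4 address> <method> <url> HTTP/<version>"
RECORD = re.compile(
    r"(?P<datestamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<request>\S+ \S+ HTTP/\d\.\d)"
)

# Hypothetical sample in the same run-together shape as the weblog on the slide.
SAMPLE = (
    "6/1/2012 11:10:19 AM 107.1.187.170 GET /x.php?page=QUOTE HTTP/1.1 "
    "6/1/2012 3:12:43 PM 107.5.115.117 GET /tv/3/search/content HTTP/1.1"
)


def parse(text):
    """Split run-together log text into rows with datestamp, ip and request columns."""
    rows = []
    for match in RECORD.finditer(text):
        rows.append({
            "datestamp": datetime.strptime(match["datestamp"], "%m/%d/%Y %I:%M:%S %p"),
            "ip": match["ip"],
            "request": match["request"],
        })
    return rows


for row in parse(SAMPLE):
    print(row["datestamp"].isoformat(), row["ip"], row["request"])
```

The schema lives entirely in the parsing code here, not in the data, which is what makes this kind of source schema-less.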
Transforming Self‐Describing Data 
ID VALUE 
75295 { “employee” : { 
“number” : “6”, 
“name” : “Manzarek”, 
“initials”: “R”, 
“street” : “Haseltine Lane”} 
} 
103819 { “employee” : { 
“number” : “7”, 
“name” : “Metheny”, 
“initials”: “P”, 
“street” : “Brownstreet”} 
} 
132171 { “employee” : { 
“number” : “15”, 
“name” : “Metheny”, 
“initials”: “M”} 
} 
ID EMPLOYEE_NUMBER EMPLOYEE_NAME EMPLOYEE_INITIALS EMPLOYEE_STREET 
75295 6 Manzarek R Haseltine Lane 
103819 7 Metheny P Brownstreet 
132171 15 Metheny M ? 
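The mapping from the JSON values to the flat table can be sketched in Python. This assumes the VALUE column holds valid JSON text with straight quotes; the column list and the function `flatten` are hypothetical. A missing key becomes `None`, corresponding to the `?` in the last row.

```python
import json

# The (ID, VALUE) rows from the slide, with VALUE as JSON text.
ROWS = [
    (75295, '{"employee": {"number": "6", "name": "Manzarek", '
            '"initials": "R", "street": "Haseltine Lane"}}'),
    (103819, '{"employee": {"number": "7", "name": "Metheny", '
             '"initials": "P", "street": "Brownstreet"}}'),
    (132171, '{"employee": {"number": "15", "name": "Metheny", "initials": "M"}}'),
]

# Target columns EMPLOYEE_NUMBER, EMPLOYEE_NAME, EMPLOYEE_INITIALS, EMPLOYEE_STREET.
COLUMNS = ("number", "name", "initials", "street")


def flatten(row_id, value):
    """Turn one (ID, JSON document) pair into a flat row; absent keys become None."""
    employee = json.loads(value)["employee"]
    return (row_id, *(employee.get(column) for column in COLUMNS))


for row_id, value in ROWS:
    print(flatten(row_id, value))
```

The employee numbers stay strings because they are quoted in the source documents; converting them to integers would be a separate, explicit step.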
Architectural Challenges 
• Managing concurrent queries/users 
• Parallel execution of complex operations 
• Running complex analytical functions 
• Cost-based optimization 
Use Cases of SQL‐on‐Hadoop 
• Traditional Interactive Reporting and Analytics 
• Self-Service Business Intelligence 
• Batch Reporting 
• Point Queries 
• Operational Processing 
• Investigative Analytics 
• Data Stream Processing 
• Storage of Cold Data Warehouse Data 
• Storage of External Data 
• Fast Staging Area 
• ETL (Pre)Processing Platform 
• New Use Cases and Non-Relational Data 
• … 
Watch out for Big Data Silos! 
[Diagram: seven separate silos (Silo 1 to Silo 7), one per workload]
Workloads shown: batch processing, investigative analytics, point queries, operational processing, interactive reporting, data stream processing, analytics 
The Integration Labyrinth 
[Diagram: each workload is connected through its own dedicated integration solution]
Workloads shown: analytics, batch processing, point queries, interactive reporting, operational processing, investigative analytics, data stream processing 
One Platform to Rule Them All 
[Diagram: all workloads run on One Data Management Platform]
Workloads shown: batch processing, investigative analytics, point queries, operational processing, interactive reporting, data stream processing, analytics 
Closing Remarks 
• SQL offers standardization and independence 
• SQL increases productivity and eases maintenance 
• Many SQL-on-Hadoop engines available 
• One platform 
• Being able to process all types of data is important 
Productivity 
Maintainability 
Time-to-market 
Performance 
Scalability 
Availability 

Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 

Ähnlich wie Webinar: Selecting the Right SQL-on-Hadoop Solution (20)

2014 08-20-pit-hug
2014 08-20-pit-hug2014 08-20-pit-hug
2014 08-20-pit-hug
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big Data
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Sureh hadoop 3 years t
Sureh hadoop 3 years tSureh hadoop 3 years t
Sureh hadoop 3 years t
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache Drill
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
CCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialCCD-410 Cloudera Study Material
CCD-410 Cloudera Study Material
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 

Mehr von MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications MapR Technologies
 

Mehr von MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Webinar: Selecting the Right SQL-on-Hadoop Solution

  • 1. Selecting the Right SQL-on-Hadoop Solution: What You Need to Know © 2014 MapR Technologies
  • 2. Rick F. van der Lans
    Rick F. van der Lans is an independent consultant, lecturer, and author. He specializes in data warehousing, business intelligence, database technology, and data virtualization. He is managing director of R20/Consultancy B.V. Rick has been involved in various projects in which data warehousing and integration technology were applied.
    Rick van der Lans is an internationally acclaimed lecturer. He has lectured professionally for the last twenty-five years in many European and Middle Eastern countries, the USA, South America, and Australia. He has been invited by several major software vendors to present keynote speeches.
    He is the author of several books on computing, including his new Data Virtualization for Business Intelligence Systems. Some of these books are available in different languages. Books such as the popular Introduction to SQL are available in English, Dutch, Italian, Chinese, and German and are sold worldwide. He also authored The SQL Guide to Ingres and SQL for MySQL Developers.
    As an author for BeyeNetwork.com, writer of whitepapers, chairman of the annual European Enterprise Data and Business Intelligence Conference, and columnist for a few IT magazines, he has close contacts with many vendors.
    R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl. You can get in touch with Rick via:
    Email: rick@r20.nl
    Twitter: @Rick_vanderlans
    LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223
    Copyright © 1991 - 2014 R20/Consultancy B.V., The Hague, The Netherlands
  • 3. Self-Service Data Exploration with Apache Drill
  • 4. The MapR Distribution including Apache Hadoop
    Top Ranked; Exponential Growth; 500+ Customers; Premier Investors
    >2x annual bookings; 90% software licenses; 80% of accounts expand 3X; <1% lifetime churn; >$1B in incremental revenue generated by 1 customer
  • 5. The Power of the Open Source Community
    Diagram: Apache Hadoop and OSS ecosystem on the MapR Data Platform (MapR-FS, MapR-DB)
    Section labels: Execution Engines; Data Governance and Operations; Management
    Category labels: Provisioning & coordination; Workflow & Data Governance; Data Integration & Access; Streaming; NoSQL & Search; Security; SQL; Batch; ML, Graph
    Components shown: Savannah*, Hue, HttpFS, Flume, Knox*, Falcon*, Whirr, Storm*, Solr, Drill, Shark, Impala, YARN, Spark, Cascading, Pig, Spark Streaming, HBase, Juju, GraphX, MLLib, Mahout, MapReduce v1 & v2, Tez*, Accumulo*, Hive, Sqoop, Sentry*, Oozie, ZooKeeper
    * Certification/support planned for 2014
  • 6. Today’s Data Comes in Different Shapes…
    Social Media, Messages, Audio, Sensors, Mobile Data, Email, Clickstream
  • 7. Real-World Data Modeling and Transformations
  • 9. Evolution Towards Self-Service Data Exploration
    Approach | Data Modeling and Transformation | Data Visualization
    Traditional BI w/ RDBMS | IT-driven | IT-driven
    Self-Service BI w/ RDBMS | IT-driven | Self-service
    SQL-on-Hadoop | IT-driven | Self-service
    Self-Service Data Exploration | Not needed | Self-service
    Slide label: Zero-day analytics
  • 10. Why Decrease the Distance to Data?
    Improve time to value
      • Enable rapid data exploration and application development
      • IT should provide a valuable service without “getting in the way”
    Reduce the burden on IT
      • Can’t add DBAs to keep up with the exponential data growth
      • Minimize “unnecessary work” so IT can focus on value-added activities and become a partner to the business users
  • 11. APACHE DRILL
      • Pioneering Data Agility for Hadoop
      • Apache open source project
      • Scale-out execution engine for low-latency queries
      • Unified SQL-based API for analytics & operational applications
    40+ contributors; 150+ years of experience building databases and distributed systems
  • 12. Drill Supports Schema Discovery On-The-Fly
    Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
      • Fixed schema
      • Leverage schema in centralized repository (Hive Metastore)
    Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
      • Fixed schema, evolving schema or schema-less
      • Leverage schema in centralized repository or self-describing data
  • 13. MapR Optimized Data Architecture
    Diagram labels:
    Sources: relational, SaaS, mainframe; documents, emails; blogs, tweets, link data; log files, clickstreams; sensors
    MapR Distribution for Hadoop: Streaming (Spark Streaming, Storm); Batch / Search (MR, Spark, Hive, Pig, …); NoSQL ODBMS (HBase, Accumulo, …); MapR Data Platform (MapR-DB, MapR-FS)
    Data Warehouse
    Data Movement; Data Access; Data Transformation, Enrichment and Integration
    Machine Learning; Analytics; Search; Schema-less data exploration; BI, reporting; Ad-hoc integrated analytics
    Operational Apps: Recommendations; Fraud Detection; Logistics
  • 14. (1) Self-Describing Data is Ubiquitous
    Flat files in DFS
      • Complex data (Thrift, Avro, protobuf)
      • Columnar data (Parquet, ORC)
      • Loosely defined (JSON)
      • Traditional files (CSV, TSV)
    Data stored in NoSQL stores
      • Relational-like (rows, columns)
      • Sparse data (NoSQL maps)
      • Embedded blobs (JSON)
      • Document stores (nested objects)
    Example records:
      { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos }
      { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
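To make "self-describing" concrete, the following is a minimal Python sketch of discovering a schema from the records themselves instead of declaring it up front. It is an illustration of the idea only, not how Drill is implemented. The two records are the ones from the slide, rewritten as valid JSON (the slide omits the quotes), and the function names `discover_schema` and `merge_schemas` are hypothetical.

```python
import json

# The two records from the slide, written as valid JSON (quotes added).
RECORDS = [
    '{"name": {"first": "Michael", "last": "Smith"}, '
    '"hobbies": ["ski", "soccer"], "district": "Los Altos"}',
    '{"name": {"first": "Jennifer", "last": "Gates"}, '
    '"hobbies": ["sing"], "preschool": "CCLC"}',
]


def discover_schema(value, prefix=""):
    """Return {dotted_path: type_name} for one parsed JSON value."""
    if isinstance(value, dict):
        schema = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            schema.update(discover_schema(child, path))
        return schema
    if isinstance(value, list):
        # Describe a list by the sorted set of its element types.
        kinds = sorted({type(item).__name__ for item in value})
        return {prefix: f"list<{'|'.join(kinds)}>"}
    return {prefix: type(value).__name__}


def merge_schemas(records):
    """Union the per-record schemas and note which records carry each field."""
    merged = {}
    for index, raw in enumerate(records):
        for path, kind in discover_schema(json.loads(raw)).items():
            entry = merged.setdefault(path, {"types": set(), "records": []})
            entry["types"].add(kind)
            entry["records"].append(index)
    return merged


schema = merge_schemas(RECORDS)
for path in sorted(schema):
    print(path, sorted(schema[path]["types"]), schema[path]["records"])
```

Fields such as `district` and `preschool` appear in only one record each, which is the situation a fixed, pre-declared schema handles poorly and a schema read from the data handles naturally.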
  • 15. (2) Drill’s Data Model is Flexible
    Flexibility axes: Fixed schema to Schema-less; Flat to Complex
    Formats shown: HBase, JSON, BSON, CSV, TSV, Parquet, Avro
    RDBMS/SQL-on-Hadoop table:
      Name     | Gender | Age
      Michael  | M      | 6
      Jennifer | F      | 3
    Apache Drill table:
      { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos }
      { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
  • 16. Quick Tour: Self-Service Data Exploration with Apache Drill
  • 17. Zero to Results in 2 Minutes (3 Commands)
    Install:
      $ tar xzf apache-drill.tar.gz
    Launch shell (embedded mode):
      $ apache-drill/bin/sqlline -u jdbc:drill:zk=local
    Query:
      0: jdbc:drill:zk=local> SELECT count(*) AS incidents, columns[1] AS category
                              FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv`
                              GROUP BY columns[1]
                              ORDER BY incidents DESC;
    Results:
      +------------+-----------------+
      | incidents  | category        |
      +------------+-----------------+
      | 8372       | LARCENY/THEFT   |
      | 4247       | OTHER OFFENSES  |
      | 3765       | NON-CRIMINAL    |
      | 2502       | ASSAULT         |
      ...
      35 rows selected (0.847 seconds)
  • 18. Data Source is in the Query
      SELECT timestamp, message
      FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
      WHERE errorLevel > 2
      The three parts of the FROM clause:
        • dfs1: a storage engine instance (DFS, HBase, Hive Metastore/HCatalog)
        • logs: a workspace (sub-directory, Hive database)
        • `AppServerLogs/2014/Jan/p001.parquet`: a table (pathnames, HBase table, Hive table)
  • 19. Query Directory Trees
      # Query file: How many errors per level in Jan 2014?
      SELECT errorLevel, count(*)
      FROM dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
      GROUP BY errorLevel;
      # Query directory sub-tree: How many errors per level?
      SELECT errorLevel, count(*)
      FROM dfs.logs.`/AppServerLogs`
      GROUP BY errorLevel;
      # Query some partitions: How many errors per level by month from 2012?
      SELECT errorLevel, count(*)
      FROM dfs.logs.`/AppServerLogs`
      WHERE dirs[1] >= 2012
      GROUP BY errorLevel, dirs[2];
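The third query treats directory names as data. The Python sketch below imitates that idea under one stated assumption: that `dirs[0]` is the queried directory (`AppServerLogs`), `dirs[1]` the year, and `dirs[2]` the month, which is how the slide's query reads. The file contents and the helper names `dirs_of` and `errors_by_level_and_month` are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical files: path -> error levels of the records in that file.
FILES = {
    "/AppServerLogs/2011/Dec/part0001.parquet": [1, 3],
    "/AppServerLogs/2012/Jan/part0001.parquet": [2, 2, 5],
    "/AppServerLogs/2014/Jan/part0001.parquet": [5],
}


def dirs_of(path):
    """Directory components of a path, without the root and the file name."""
    return PurePosixPath(path).parts[1:-1]


def errors_by_level_and_month(files, min_year):
    """Count records per (errorLevel, month), skipping older partitions."""
    counts = Counter()
    for path, levels in files.items():
        dirs = dirs_of(path)
        if int(dirs[1]) < min_year:  # WHERE dirs[1] >= min_year
            continue                 # the whole file is skipped unread
        for level in levels:
            counts[(level, dirs[2])] += 1  # GROUP BY errorLevel, dirs[2]
    return counts


counts = errors_by_level_and_month(FILES, 2012)
```

Because the filter looks only at the path, files in excluded partitions never need to be opened, which is the point of querying "some partitions".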
  • 20. Works with HBase and Embedded Blobs
      # Query an HBase table directly (no schemas)
      SELECT cf1.month, cf1.year
      FROM hbase.table1;
      # Embedded JSON value inside column profileBlob inside column family cf1 of the HBase table users
      SELECT profile.name, count(profile.children)
      FROM (
        SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile
        FROM hbase.users
      )
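The second query decodes a JSON document stored as raw bytes inside an HBase cell. A minimal Python sketch of that decoding step follows; the in-memory `USERS` structure and its contents are invented for illustration, and `len(...)` on the children list is used here as a simple stand-in, which is not necessarily the same semantics as the SQL `count(...)` on the slide.

```python
import json

# Hypothetical HBase-like rows: row key -> {column family -> {column -> bytes}}.
USERS = {
    b"u1": {"cf1": {"profileBlob":
                    b'{"name": "Michael", "children": ["Ann", "Bob"]}'}},
    b"u2": {"cf1": {"profileBlob":
                    b'{"name": "Jennifer", "children": []}'}},
}


def profiles(table):
    """Decode the JSON blob in cf1.profileBlob of every row."""
    for cells in table.values():
        blob = cells["cf1"]["profileBlob"]
        yield json.loads(blob.decode("utf-8"))


# Name and number of children per decoded profile.
summary = [(p["name"], len(p["children"])) for p in profiles(USERS)]
```

The store itself knows nothing about the structure of the blob; the structure only appears once the bytes are interpreted as JSON at query time.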
  • 21. Combine Data Sources on the Fly
      # Join log directory with JSON file (user profiles) to identify the name and email address for anyone associated with an error message.
      SELECT DISTINCT users.name, users.emails.work
      FROM dfs.logs.`/data/logs` logs, dfs.users.`/profiles.json` users
      WHERE logs.uid = users.id AND logs.errorLevel > 5;
      # Join a Hive table and an HBase table (without Hive metadata) to determine the number of tweets per user
      SELECT users.name, count(*) as tweetCount
      FROM hive.social.tweets tweets, hbase.users users
      WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8')
      GROUP BY tweets.userId, users.name;
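The first join on this slide can be pictured as a simple hash join between log records and user profiles. The sketch below is a Python illustration with made-up records and e-mail addresses; `users_with_errors` is a hypothetical helper, not part of any Drill API.

```python
# Hypothetical log records and user profiles.
LOGS = [
    {"uid": 1, "errorLevel": 7},
    {"uid": 2, "errorLevel": 3},
    {"uid": 1, "errorLevel": 9},
]
USERS = [
    {"id": 1, "name": "Michael", "emails": {"work": "michael@example.com"}},
    {"id": 2, "name": "Jennifer", "emails": {"work": "jennifer@example.com"}},
]


def users_with_errors(logs, users, min_level):
    """Distinct (name, work email) of users with an error above min_level."""
    by_id = {user["id"]: user for user in users}  # build side of the join
    found = set()                                 # a set gives DISTINCT
    for entry in logs:                            # probe side of the join
        user = by_id.get(entry["uid"])
        if user is not None and entry["errorLevel"] > min_level:
            found.add((user["name"], user["emails"]["work"]))
    return sorted(found)


matches = users_with_errors(LOGS, USERS, 5)
```

The nested access `user["emails"]["work"]` corresponds to `users.emails.work` in the query: the profile file is nested JSON, and the join reaches into it directly.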
  • 22. SQL Technologies Available on MapR
      Categories shown: Data Exploration; Simple SQL-on-Hadoop (schema); Advanced SQL & Analytics
      | | Drill 0.5 | Hive 0.13 w/ Tez | Impala 1.x | Shark 0.9 | Vertica |
      | Latency | Low | Medium | Low | Low (for in-memory), Med (for on disk) | Low |
      | Files | Yes (all Hive formats) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (all Hive file formats) | Proprietary; all Hive file formats can be used as external tables |
      | HBase/MapR-DB | Yes | Yes, performance issues | Yes, performance issues | Yes, performance issues | No |
      | Hive compatibility | High | High | Medium | High | NA |
      | Schema | Schema-less or Hive or HBase | Hive | Hive | Hive | Proprietary or Hive |
      | SQL support | ANSI SQL | HiveQL | HiveQL (subset) | HiveQL | ANSI SQL + advanced analytics |
      | Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC, ADO.NET, … |
      | Large datasets | Yes | Yes | Limited | Yes | Yes |
      | Nested data | Yes | Limited | No | Limited | Limited |
      | Machine learning | No | No | No | Yes | No |
      | Transactions | No | No | No | No | Yes |
      | Optimizer | Limited | Limited | Limited | Limited | High |
      | Concurrency | Medium | Medium | Medium | Limited | High |
  • 23. Q&A: Engage with us!
      • SQL-on-Hadoop engines explained: http://info.mapr.com/wp-sql-on-hadoop-engines-explained
      • Get demo and tutorials on Apache Drill
        – https://www.mapr.com/products/apache-drill
      • Apache Drill 0.5 available now
        – Download and play: http://incubator.apache.org/drill/
        – Ask questions: drill-user@incubator.apache.org
        – Contribute: http://github.com/apache/incubator-drill/
      • Contact / follow us
        – @rick_vanderlans – Rick van der Lans
        – @swooledge – Steve Wooledge
  • 24. Copyright © 1991 - 2014 R20/Consultancy B.V., The Hague, The Netherlands. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic, or otherwise, without the explicit written permission of the copyright owners. SQL‐on‐Hadoop Explained by Rick F. van der Lans R20/Consultancy BV Twitter @rick_vanderlans www.r20.nl
  • 29. It’s All About Analytics
  • 30. Requirements for Data Storage Technology
      • High data storage scalability
      • High data processing scalability
      • High performance
      • Low price/performance ratio
      • All data types
      • High schema flexibility
      • Fast loading
      • Enterprise-grade
  • 31. Comparison Data Storage Technologies
      | Requirement | Hadoop | Classic SQL DB |
      | High data storage scalability | Yes | Less |
      | High data processing scalability | Yes | Less |
      | High performance | Yes | Less |
      | Low price/performance ratio | Yes | No |
      | All data types | Yes | Most data types |
      | High schema flexibility | Depends | No |
      | Fast loading | Yes | Yes |
      | Enterprise-grade | Depends | Yes |
  • 32. Manipulating Hadoop Data [diagram of three access stacks, top to bottom]
      • Apache HBase API, Apache HBase, Apache HDFS API, Apache HDFS
      • Apache MapReduce API, Apache MapReduce, Apache HDFS API, Apache HDFS
      • Apache HDFS API, Apache HDFS
  • 33. Performance Dominates in Hadoop [figure labels: Productivity, Maintainability, Performance, Time-to-market, Scalability, Availability]
  • 34. The Need for SQL‐on‐Hadoop
      Add high productivity and maintainability, while retaining high performance and scalability
      Advantages SQL-on-Hadoop
        • Well-known database language (especially in the BI community)
        • Large target audience
        • High productivity and maintainability
        • Openness to many reporting and analytical tools
  • 35. Productivity is as Important [figure labels: Productivity, Maintainability, Time-to-market, Performance, Scalability, Availability]
  • 36. Different Solutions [diagram of four stacks, top to bottom]
      • Apache HiveQL, Apache Hive, Apache MapReduce API, Apache MapReduce, Apache HDFS API, Apache HDFS
      • A SQL Dialect, SQL‐on‐Hadoop, Apache HBase API, Apache HBase, Apache HDFS API, Apache HDFS
      • A SQL Dialect, SQL‐on‐Hadoop, Apache HDFS API, Any HDFS
      • A SQL Dialect, SQL‐on‐Hadoop, Apache HDFS API, Any HDFS
  • 37. Not all SQL‐on‐Hadoop Engines are Created Equal
      • Batch-oriented query environment (data mining)
      • Interactive query environment (OLAP, self-service BI, data visualization)
      • Point-queries (retrieving individual objects)
      • Investigative analytics (data science)
      • Operational intelligence (real-time analytics)
      • Transactional (production systems)
  • 38. Technological Challenges
      Non-SQL-to-SQL transformational changes
        • Nested data
        • Variable data
        • Schema-less data
        • Self-describing data
      Architectural Challenges
        • Managing concurrent queries/users
        • Parallel execution of complex operations
        • Running complex analytical functions
        • Cost-based optimization
  • 39. Is All Data Relational Data? [two diagrams]
      • Diagram 1 (SQL-on-Hadoop on HDFS): create table; insert data; select data
      • Diagram 2 (SQL-on-Hadoop and other application on HDFS): create file; insert data; select data
  • 40. Transforming Nested Data (1)
      CUSTOMER_ID | LAST_NAME | FIRST_NAME | CUSTOMER_ORDERS (CUSTOMER_ORDER_ID, ORDER_TIMESTAMP)
      75295 | Sylvian | David | (203699, 2008-01-16), (306892, 2008-07-21), (477047, 2008-12-09)
      103819 | Scaggs | Boz | (70675, 2008-10-19), (530223, 2008-12-01)
      132171 | Rundgren | Todd | (210220, 2008-04-21), (485584, 2008-10-14), (718579, 2008-11-23), (741912, 2008-12-24)
  • 41. Transforming Nested Data (2)
      Alternative 1:
        CUSTOMER_ID | LAST_NAME | FIRST_NAME | CUSTOMER_ORDERS
        75295 | Sylvian | David | {203699,2008-01-16},{306892,2008-07-21},{477047,2008-12-09}
        103819 | Scaggs | Boz | {70675,2008-10-19},{530223,2008-12-01}
        132171 | Rundgren | Todd | {210220,2008-04-21},{485584,2008-10-14},{718579,2008-12-24}
      Alternative 2:
        CUSTOMER_ID | CUSTOMER_ORDER_ID | ORDER_TIMESTAMP | LAST_NAME | FIRST_NAME
        75295 | 203699 | 2008-01-16 | Sylvian | David
        75295 | 306892 | 2008-07-21 | Sylvian | David
        75295 | 477047 | 2008-12-09 | Sylvian | David
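Alternative 2 can be sketched in a few lines of Python: each nested order becomes its own row, and the customer columns are repeated on every row. The dictionary layout and the function name `flatten` are assumptions made for this illustration; the values are the first two customers from the slides.

```python
# Nested customer records, using values from the slides.
CUSTOMERS = [
    {"customer_id": 75295, "last_name": "Sylvian", "first_name": "David",
     "customer_orders": [(203699, "2008-01-16"),
                         (306892, "2008-07-21"),
                         (477047, "2008-12-09")]},
    {"customer_id": 103819, "last_name": "Scaggs", "first_name": "Boz",
     "customer_orders": [(70675, "2008-10-19"),
                         (530223, "2008-12-01")]},
]


def flatten(customers):
    """Emit one flat row per order, repeating the customer columns."""
    rows = []
    for customer in customers:
        for order_id, timestamp in customer["customer_orders"]:
            rows.append((customer["customer_id"], order_id, timestamp,
                         customer["last_name"], customer["first_name"]))
    return rows


rows = flatten(CUSTOMERS)
```

One consequence of this form is that a customer with no orders produces no rows at all, much like an inner join; whether that is acceptable depends on the query.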
  • 42. Transforming Variable Data (1)
      Example 1 (records with different columns):
        CUSTOMER_ID | CUSTOMER_ORDER_ID | ORDER_TIMESTAMP
        75295 | 203699 | 2008-01-16
        75295 | 306892 | 2008-07-21
        75295 | 477047 | 2008-12-09
        CUSTOMER_ID | CUSTOMER_ORDER_ID | ORDER_TIMESTAMP | ORDER_PROCESSED
        463281 | 203643 | 2008-01-16 | 2008-01-20
        CUSTOMER_ID | CUSTOMER_ORDER_ID | ORDER_TIMESTAMP | ORDER_CANCELLED
        463246 | 285825 | 2008-01-19 | 2008-10-20
        ………………
      Example 2 (a variable number of values in one column):
        CUSTOMER_ID | CUSTOMER_NAME | TELEPHONE_NUMBERS
        463246 | O’Keefe | {5157818, 2362436}
        463249 | Zappa | {1234567, 3262836, 4374777}
        463350 | Donahue | {3854757}
  • 43. Transforming Variable Data (2)
      Alternative 1:
        CUSTOMER_ID | CUSTOMER_NAME | TELEPHONE_1 | TELEPHONE_2 | TELEPHONE_3
        463246 | O’Keefe | 5157818 | 2362436 | ?
        463249 | Zappa | 1234567 | 3262836 | 4374777
        463350 | Donahue | 3854757 | ? | ?
      Alternative 2:
        CUSTOMER_ID | CUSTOMER_NAME | TELEPHONE_NUMBER
        463246 | O’Keefe | 5157818
        463246 | O’Keefe | 2362436
        463249 | Zappa | 1234567
        463249 | Zappa | 3262836
        463249 | Zappa | 4374777
        463350 | Donahue | 3854757
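Both alternatives are easy to express in Python, which makes the trade-off visible. In this sketch `None` stands for the `?` (missing value) on the slide, and the function names `to_fixed_columns` and `to_rows` are hypothetical.

```python
# Customers with a variable number of telephone numbers (slide data).
CUSTOMERS = [
    (463246, "O'Keefe", [5157818, 2362436]),
    (463249, "Zappa", [1234567, 3262836, 4374777]),
    (463350, "Donahue", [3854757]),
]


def to_fixed_columns(customers, width):
    """Alternative 1: a fixed number of telephone columns, padded with None."""
    table = []
    for customer_id, name, phones in customers:
        # Numbers beyond `width` are dropped; missing ones become None.
        padded = phones[:width] + [None] * (width - len(phones))
        table.append((customer_id, name, *padded))
    return table


def to_rows(customers):
    """Alternative 2: one row per telephone number."""
    return [(customer_id, name, phone)
            for customer_id, name, phones in customers
            for phone in phones]


wide = to_fixed_columns(CUSTOMERS, 3)
tall = to_rows(CUSTOMERS)
```

Alternative 1 has to pick a maximum width up front and silently loses any numbers beyond it, while Alternative 2 keeps every number at the cost of repeating the customer columns.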
  • 44. Transforming Schema‐Less Data
      Weblog record, to be split into the columns datestamp, ip, request:
        6/1/2012 11:10:19 AM 107.1.187.170 GET /x.php?u=http://studio-5.financialcontent.com/synacor?Page=QUOTE&Ticke HTTP/1.1
        6/1/2012 5:53:49 AM 107.1.2.180 GET /tv/3/player/vendor/Chef% /player/fiveminute/content/steak/asset/gnrc_15879500 HTTP/1.1
        6/1/2012 107.34.51.63 GET /tv/3/search/content/The%20Andy%20Griffith%20Show/s/T Andy%20Griffith%20Show HTTP/1.1
        6/1/2012 3:12:43 PM 107.5.115.117 GET /tv/3/search/content/Kathie%20Lee%20Gifford's%20epic%20'Today'%20gaffe %20Lee%20Gifford's%20epic%20'Today'%20gaffe HTTP/1.1
        6/1/2012 4:48:35 108.225.132.245 GET /tv/3/search/content/Deadliest%20Catch/s/Deadliest HTTP/1.1
        6/1/2012 10:25:12 AM 108.246.20.125 GET /x.php?u=http://studi 5.financialcontent.com/synacor?Page=QUOTE&Ticker=DJ:DJI HTTP/1.1
        6/1/2012 1:58:14 AM 108.246.25.117 GET /tv/3/player/vendor/Chef%20Tips /fiveminute/content/steak/asset/gnrc_15879500 HTTP/1.1
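Splitting such records into datestamp, ip, and request columns means imposing a structure at read time. The Python sketch below uses a regular expression that tolerates the irregularities visible on the slide (a missing time, a missing AM/PM marker). The pattern and the function name `parse` are assumptions for illustration, and the requests in the sample lines are simplified, not copied from the slide.

```python
import re

# Date, optional time, optional AM/PM, IPv4 address, then the request.
RECORD = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4})"
    r"(?:\s+(?P<time>\d{1,2}:\d{2}:\d{2}))?"
    r"(?:\s+(?P<ampm>AM|PM))?"
    r"\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
    r"\s+(?P<request>.+)$"
)


def parse(line):
    """Return a dict of columns, or None if the line does not fit."""
    match = RECORD.match(line)
    return match.groupdict() if match else None


# Simplified sample lines in the three shapes seen on the slide.
full = parse("6/1/2012 11:10:19 AM 107.1.187.170 GET /x.php HTTP/1.1")
no_time = parse("6/1/2012 107.34.51.63 GET /tv/3/search HTTP/1.1")
no_ampm = parse("6/1/2012 4:48:35 108.225.132.245 GET /tv/3/search HTTP/1.1")
```

Fields that are absent in a record come back as `None`, so every record yields the same set of columns even though the raw lines differ.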
  • 45. Transforming Self‐Describing Data
      Source:
        ID | VALUE
        75295 | { "employee" : { "number" : "6", "name" : "Manzarek", "initials": "R", "street": "Haseltine Lane"} }
        103819 | { "employee" : { "number" : "7", "name" : "Metheny", "initials": "P", "street" : "Brownstreet"} }
        132171 | { "employee" : { "number" : "15", "name" : "Metheny", "initials": "M"} }
      Result:
        ID | EMPLOYEE_NUMBER | EMPLOYEE_NAME | EMPLOYEE_INITIALS | EMPLOYEE_STREET
        75295 | 6 | Manzarek | R | Haseltine Lane
        103819 | 7 | Metheny | P | Brownstreet
        132171 | 15 | Metheny | M | ?
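The transformation on this slide can be reproduced with a short Python sketch: each JSON value is parsed, and a fixed list of columns is read from it, with `None` standing in for the `?` where a field is absent. The column list `COLUMNS` and the function name `to_table` are assumptions made for this example.

```python
import json

# (ID, VALUE) pairs from the slide; the third record has no "street".
ROWS = [
    (75295, '{"employee": {"number": "6", "name": "Manzarek", '
            '"initials": "R", "street": "Haseltine Lane"}}'),
    (103819, '{"employee": {"number": "7", "name": "Metheny", '
             '"initials": "P", "street": "Brownstreet"}}'),
    (132171, '{"employee": {"number": "15", "name": "Metheny", '
             '"initials": "M"}}'),
]

# Target columns: EMPLOYEE_NUMBER, EMPLOYEE_NAME, EMPLOYEE_INITIALS, EMPLOYEE_STREET.
COLUMNS = ("number", "name", "initials", "street")


def to_table(rows):
    """Turn each JSON value into a flat row; missing fields become None."""
    table = []
    for row_id, value in rows:
        employee = json.loads(value)["employee"]
        table.append((row_id, *(employee.get(col) for col in COLUMNS)))
    return table


table = to_table(ROWS)
```

The numbers stay strings here because that is how they appear in the JSON; converting them to integers would be a separate, explicit decision.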
  • 46. Architectural Challenges
      • Managing concurrent queries/users
      • Parallel execution of complex operations
      • Running complex analytical functions
      • Cost-based optimization
  • 47. Use Cases of SQL‐on‐Hadoop Traditional Interactive Reporting and Analytics Self-Service Business Intelligence Batch Reporting Point Queries Operational Processing Investigative Analytics Data Stream Processing Storage Cold Data Warehouse Data Storage of External Data Fast Staging Area ETL (Pre)Processing Platform New Use Cases and Non-Relational Data …
  • 48. Watch out for Big Data Silos! [diagram: seven workloads, each in its own silo (Silo 1 to Silo 7)]
      Workloads: batch processing; investigative analytics; point queries; operational processing; interactive reporting; data stream; analytics processing
  • 49. The Integration Labyrinth [diagram: each workload has its own dedicated integration solution]
      Workloads: analytics processing; batch processing; point queries; interactive reporting; operational processing; investigative analytics; data stream
  • 50. One Platform to Rule Them All [diagram: all workloads on One Data Management Platform]
      Workloads: batch processing; investigative analytics; point queries; operational processing; interactive reporting; data stream; analytics processing
  • 51. Closing Remarks
      • SQL offers standardization and independence
      • SQL increases productivity and eases maintenance
      • Many SQL-on-Hadoop engines available
      • One platform
      • Being able to process all types of data is important
      [figure labels: Productivity, Maintainability, Time-to-market, Performance, Scalability, Availability]