Discover HDP 2.2: 
Even Faster SQL Queries 
with Apache Hive & Stinger.next 
Page 1 © Hortonworks Inc. 2014 
Hortonworks. We do Hadoop.
Speakers 
Justin Sears 
Hortonworks Product Marketing Manager 
Alan Gates 
Hortonworks Co-Founder and Apache Hive Committer & PMC Member 
Raj Bains 
Hortonworks Senior Manager of Product Management for Apache Hive
Agenda 
• Introduction to Stinger.next 
• New Innovation in Apache Hive 0.14 
§ SQL: Transactions with ACID semantics 
§ Speed: Cost based optimizer for star and bushy joins 
§ Scale: Dynamic query optimizations 
• The Road Ahead for Stinger.next 
• Q & A 
We’ll move quickly: 
• Attendee phone lines are muted 
• Text any questions to Raj Bains using Webex chat 
• Questions answered at the end 
• Unanswered questions and answers in upcoming blog post 
Big Data, Hadoop & Data Center Re-platforming 
Business Drivers 
• From reactive analytics to proactive interactions 
• Insights that drive competitive advantage & optimal returns 
Financial Drivers 
• Cost of data systems, as % of IT spend, continues to grow 
• Cost advantages of commodity hardware & open source software 
Technical Drivers 
• Data is growing exponentially & existing systems are overwhelmed 
• Predominantly driven by NEW types of data that can inform analytics 
There is an inequitable balance between vendor and customer in the market
Hadoop Value: New Types of Data 
• Clickstream: Capture and analyze website visitors' data trails and optimize your website 
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines 
• Server Logs: Research logs to diagnose process failures and prevent security breaches 
• Sentiment: Understand how your customers feel about your brand and products, right now 
• Geographic: Analyze location-based data to manage operations where they occur 
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
A Shift from Reactive to Proactive Interactions 
HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision) 
• A shift in Advertising: From mass branding …to 1x1 Targeting 
• A shift in Financial Services: From Educated Investing …to Automated Algorithms 
• A shift in Healthcare: From mass treatment …to Designer Medicine 
• A shift in Retail: From static branding …to Real-time Personalization 
• A shift in Telco: From break then fix …to repair before break
Enterprise Goals for the Modern Data Architecture 
• Consolidate siloed data sets, structured and unstructured 
• Central data set on a single cluster 
• Multiple workloads across batch, interactive and real time 
• Central services for security, governance and operation 
• Preserve existing investment in current tools and platforms 
• Single view of the customer, product, supply chain 
[Diagram: SOURCES (existing systems such as CRM, ERP and other; clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) feed the DATA SYSTEM (RDBMS, EDW, MPP, and Hadoop with YARN: Data Operating System on HDFS, Hadoop Distributed File System, running batch, interactive and real-time workloads), which serves APPLICATIONS (business analytics, custom applications, packaged applications)]
YARN Transformed Hadoop & Opened a New Era 
YARN: The Architectural Center of Hadoop 
• Common data platform, many applications 
• Support multi-tenant access & processing 
• Batch, interactive & real-time use cases 
[Diagram: batch, interactive & real-time data access engines running on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines). Tez and Slider appear as intermediate layers under several engines]
YARN Extends Hadoop to Other Data Center Leaders 
YARN: The Architectural Center of Hadoop 
• Common data platform, many applications 
• Support multi-tenant access & processing 
• Batch, interactive & real-time use cases 
• Supports 3rd-party ISV tools (ex. SAS, Syncsort, Actian, etc.) 
YARN Ready Applications 
Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing "YARN Ready" solutions 
[Diagram: batch, interactive & real-time data access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines) running on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System)]
Enterprise Hadoop: Central Set of Services 
Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for: 
• Governance: Load data and manage according to policy 
• Operations: Deploy and effectively manage the platform (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie) 
• Security: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection 
Everything that plugs into Hadoop inherits these services 
[Diagram: GOVERNANCE, SECURITY and OPERATIONS services surrounding the batch, interactive & real-time data access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines) on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)]
Hortonworks Development Investment for the Enterprise 
Vertical Integration with YARN and HDFS 
[Diagram: the Enterprise Hadoop stack (governance, security and operations services around the data access engines), highlighting the vertical path from each engine down through YARN: Data Operating System (Cluster Resource Management) to HDFS (Hadoop Distributed File System)] 
• Ensure engines can run reliably and respectfully in a YARN based cluster 
• Implement features throughout the stack to accommodate
Hortonworks Development Investment for the Enterprise 
Horizontal Integration for Enterprise Services 
[Diagram: the Enterprise Hadoop stack, highlighting the governance, security and operations services that span horizontally across all data access engines on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)] 
• Ensure consistent enterprise services are applied across the entire Hadoop stack 
• Integrate with and extend existing data center solutions for these key requirements
HDP Delivers Enterprise Hadoop 
Hortonworks Data Platform 2.2 
YARN is the architectural center of HDP 
• Common data set across all applications 
• Batch, interactive & real-time workloads 
• Multi-tenant access & processing 
Provides comprehensive enterprise capabilities 
• Governance 
• Security 
• Operations 
Enables broad ecosystem adoption 
• ISVs can plug directly into Hadoop 
The widest range of deployment options 
• Linux & Windows 
• On premises & cloud 
[Diagram: Hortonworks Data Platform 2.2 stack. 
GOVERNANCE (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. 
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System). 
SECURITY (Authentication, Authorization, Audit, Data Protection): Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox. 
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). 
Deployment Choice: Linux, Windows, On-Premises, Cloud]
Introduction to Stinger.next 
Stinger.next – Enterprise SQL at Hadoop Scale 
Stinger (Hive 0.13, Tez, ORC File) 
• Scale to Petabytes 
• Batch to Interactive Queries 
• Read-Only Data 
• Substantial SQL Support 
• Single Tool for Multiple SQL workloads: Interactive, Reporting and ETL 
• MapReduce, Tez Engines 
Stinger.next 
• Scale to Petabytes 
• Sub-Second Queries 
• Modify Data with Transactions 
• Comprehensive SQL:2011 Analytics 
• Single Tool for Multiple SQL workloads: Interactive, Reporting, ETL, ML 
• MapReduce, Tez, Spark Engines
SQL in Hive 0.14: 
Transactions with ACID Semantics 
Transaction Use Cases 
Reporting with Analytics (YES) 
• Reporting on data with occasional updates 
• Corrections to the fact tables, evolving dimension tables 
• Low concurrency updates, low TPS 
Operational Reporting (YES, next) 
• High throughput ingest from operational (OLTP) database 
• Periodic inserts every 5-30 minutes 
• Requires tool support and changes in our Transactions 
Operational (OLTP) Database (NO) 
• Small Transactions, each doing single line inserts 
• High Concurrency - Hundreds to thousands of connections 
[Diagram: Reporting with Analytics: analytics modifications applied to Hive. Operational Reporting: OLTP data replicated into Hive. Operational Database: high-concurrency OLTP against Hive]
Deep Dive: Transactions 
Transaction Support in Hive with ACID semantics 
• Hive native support for INSERT, UPDATE, DELETE. 
• Split Into Phases: 
§ Phase 1: Hive Streaming Ingest (append) [Done] 
§ Phase 2: INSERT / UPDATE / DELETE Support [HDP 2.2] 
§ Phase 3: BEGIN / COMMIT / ROLLBACK Txn [Next] 
1. Original File: Task reads the latest read-optimized ORCFile 
2. Edits Made: Task reads the ORCFile and merges the delta file with the edits 
3. Edits Merged: Task reads the updated ORCFile 
Hive ACID Compactor periodically merges the delta files into the read-optimized ORCFile in the background.
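The three read states above can be imitated with a small in-memory model. This is only a sketch under stated assumptions: plain Python dictionaries stand in for the ORCFile and the delta files, the row ids are hypothetical, and `read_merged` and `compact` are illustrative names, not Hive APIs.

```python
# Toy model of merge-on-read: a base file plus ordered delta files.
# Rows are keyed by a hypothetical row id; a value of None marks a delete.

def read_merged(base, deltas):
    """Return the rows a reader sees: base rows with deltas applied in order."""
    rows = dict(base)
    for delta in deltas:
        for row_id, value in delta.items():
            if value is None:
                rows.pop(row_id, None)   # DELETE
            else:
                rows[row_id] = value     # INSERT or UPDATE
    return rows

def compact(base, deltas):
    """Fold all deltas into a new base file and return it with an empty delta list."""
    return read_merged(base, deltas), []

base = {1: "alice", 2: "bob", 3: "carol"}   # 1. original file
deltas = [
    {2: "robert"},                          # 2. edits made: UPDATE row 2
    {3: None, 4: "dan"},                    #    DELETE row 3, INSERT row 4
]

before = read_merged(base, deltas)          # reader merges base and deltas
new_base, new_deltas = compact(base, deltas)
after = read_merged(new_base, new_deltas)   # 3. edits merged: reader sees only the base
print(before == after)
```

The point of the sketch is that a reader gets the same rows before and after compaction; compaction only moves the merge work out of the read path.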
Speed in Hive 0.14: 
Cost Based Optimizer 
TPC-DS 
Query 17 
SELECT i_item_id, i_item_desc, s_state, 
Count(ss_quantity) AS store_sales_quantitycount, 
Avg(ss_quantity) AS store_sales_quantityave, 
Stddev_samp(ss_quantity) AS store_sales_quantitystdev, 
Stddev_samp(ss_quantity) / Avg(ss_quantity) AS store_sales_quantitycov, 
Count(sr_return_quantity) as_store_returns_quantitycount, 
Avg(sr_return_quantity) as_store_returns_quantityave, 
Stddev_samp(sr_return_quantity) as_store_returns_quantitystdev, 
Stddev_samp(sr_return_quantity) / Avg(sr_return_quantity) AS store_returns_quantitycov, 
Count(cs_quantity) AS catalog_sales_quantitycount, 
Avg(cs_quantity) AS catalog_sales_quantityave, 
Stddev_samp(cs_quantity) / Avg(cs_quantity) AS catalog_sales_quantitystdev, 
Stddev_samp(cs_quantity) / Avg(cs_quantity) AS catalog_sales_quantitycov 
FROM store_sales, 
store_returns, 
catalog_sales, 
date_dim d1, 
date_dim d2, 
date_dim d3, 
store, 
item 
WHERE d1.d_quarter_name = '2000Q1' 
AND d1.d_date_sk = store_sales.ss_sold_date_sk 
AND ss_sold_date BETWEEN '2000-01-01' AND '2000-03-31' 
AND item.i_item_sk = store_sales.ss_item_sk 
AND store.s_store_sk = store_sales.ss_store_sk 
AND store_sales.ss_customer_sk = store_returns.sr_customer_sk 
AND store_sales.ss_item_sk = store_returns.sr_item_sk 
AND store_sales.ss_ticket_number = store_returns.sr_ticket_number 
AND store_returns.sr_returned_date_sk = d2.d_date_sk 
AND d2.d_quarter_name IN ( '2000Q1', '2000Q2', '2000Q3' ) 
AND sr_returned_date BETWEEN '2000-01-01' AND '2000-09-01' 
AND store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk 
AND store_returns.sr_item_sk = catalog_sales.cs_item_sk 
AND catalog_sales.cs_sold_date_sk = d3.d_date_sk 
AND d3.d_quarter_name IN ( '2000Q1', '2000Q2', '2000Q3' ) 
AND cs_sold_date BETWEEN '2000-01-01' AND '2000-09-31' 
GROUP BY i_item_id, 
i_item_desc, 
s_state 
ORDER BY i_item_id, 
i_item_desc, 
s_state 
LIMIT 100;
CBO on Selected Queries – 17 
[Diagram: join graph for Query 17. 
• store_sales, store_returns and catalog_sales each carry a date filter and join to their own date_dim instance (d1, d2, d3, each filtered on quarter) on date_sk 
• store_sales joins store_returns on customer_sk, item_sk and ticket_number 
• store_returns joins catalog_sales on customer_sk and item_sk 
• store_sales joins item on item_sk and store on store_sk]
OLD: Left Deep Plan 
• Map 1: Table_scan d1, filter 
• Map 2: Table_scan catalog_sales 
• Map 6: Table_scan d2, filter 
• Map 7: Table_scan d3, filter 
• Map 8: Table_scan store 
• Map 9: Table_scan store_sales 
• Map 11: Table_scan item 
• Map 12: Table_scan store_returns 
• Reducer 10: Merge join 12, 9 
• Reducer 3: Merge join 2 & 10, Map join 1, Map join 6, Map join 7, Map join 8 (store), Map join 11 (item), Filter, Group By, Reduce 
• Reducer 4: Group_By, Reduce 
• Reducer 5: Limit 
Large fact tables are joined together without filters
NEW: Complex Bushy Plan 
• Map 1: Table_scan d1, filter 
• Map 2: Table_scan d1, filter 
• Map 9: Table_scan d1, filter 
• Map 3: store_sales, Map join 
• Map 8: store_returns, Map join 
• Map 11: catalog_sales, Map join 
• Map 10: Table_scan store 
• Map 12: Table_scan item 
• Reducer 4: Merge join 3 & 8, Map join store, Map join item, Reduce 
• Reducer 5: Merge_Join, Group_By, Reduce 
• Reducer 6: Group by, Reduce 
• Reducer 7: Limit 
All 3 large fact tables are joined with the date dimension first, limiting data to a few quarters
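Why filtering before the fact-to-fact join helps can be seen with a toy cost model. The sketch below is not Hive's optimizer: the row counts, the selectivity values and the cost formula (rows shuffled into the join plus rows produced) are all made-up assumptions used only to compare the two plan shapes.

```python
# Hypothetical row counts and selectivities, chosen only for illustration.
FACT_ROWS = {"store_sales": 1_000_000, "store_returns": 100_000}
DATE_FILTER_SELECTIVITY = 0.05   # fraction of fact rows in the selected quarters
JOIN_MATCH_RATE = 0.1            # fraction of store_sales rows with a matching return

def left_deep_cost():
    """Join the fact tables first; the date filters only apply afterwards."""
    shuffled = FACT_ROWS["store_sales"] + FACT_ROWS["store_returns"]
    joined = FACT_ROWS["store_sales"] * JOIN_MATCH_RATE
    return shuffled + joined

def bushy_cost():
    """Reduce each fact table through its date dimension, then join the results."""
    sales = FACT_ROWS["store_sales"] * DATE_FILTER_SELECTIVITY
    returns = FACT_ROWS["store_returns"] * DATE_FILTER_SELECTIVITY
    joined = sales * JOIN_MATCH_RATE
    return sales + returns + joined

def pick_plan():
    """Return the cheaper plan name together with both estimated costs."""
    costs = {"left_deep": left_deep_cost(), "bushy": bushy_cost()}
    return min(costs, key=costs.get), costs

plan, costs = pick_plan()
print(plan, costs)
```

A cost based optimizer makes the same kind of comparison, but from table and column statistics rather than hard-coded numbers.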
Performance Improvement – Query 17 
Scale = 30TB 
Input records ~186mil 
CBO | Elapsed Time (sec) | Elapsed Time | Intermediate data (GB) | Output and Intermediate Records 
OFF | 10,683 | ~3 hrs | 5,017 | 135,647,792,123 
ON | 1,284 | ~20 mins | 275 | 8,543,232,360
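The improvement factors implied by these figures can be worked out directly. The snippet below only divides the numbers reported on the slide; the dictionary keys are illustrative names.

```python
# Figures copied from the Query 17 comparison (CBO off vs. on).
off = {"elapsed_sec": 10_683, "intermediate_gb": 5_017, "records": 135_647_792_123}
on = {"elapsed_sec": 1_284, "intermediate_gb": 275, "records": 8_543_232_360}

# Ratio of the CBO-off value to the CBO-on value for each metric.
ratios = {name: off[name] / on[name] for name in off}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x lower with CBO on")
```

With these inputs the elapsed time drops by a factor of about 8, intermediate data by about 18, and output and intermediate records by about 16.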
Scale in Hive 0.14: 
Dynamic Query Optimization 
Auto Reducer Parallelism 
Use dynamic data volume during execution, rather than estimates from query compilation, to determine the number of reducers 
Leads to faster query execution and better resource utilization 
[Diagram: timeline between the tasks for a single map vertex and the tasks for a single reduce vertex, coordinated by the Vertex Manager and Vertex State Machine in the App Master: 1. Data size statistics, 2. Set parallelism, 3. Re-route, 4. Cancel task, 5. Tasks Completed, 6. Start Tasks, 7. Start]
Auto Reducer Parallelism 
use tpcds_bin_partitioned_orc_30000; 
set hive.tez.auto.reducer.parallelism=true; 
set hive.tez.min.partition.factor=0.125; 
SELECT ss_promo_sk, 
Sum(ss_sales_price), 
Count(*) 
FROM store_sales 
WHERE ss_sold_date < '1998-03-01' 
GROUP BY ss_promo_sk 
ORDER BY 2 DESC 
LIMIT 10; 
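One way to picture what the two settings above control is a clamp around the compile-time reducer estimate. This is a deliberately simplified sketch, not the algorithm Tez or Hive actually runs: `choose_reducers`, its parameters, the 256 MB target and the 2.0 upper factor are assumptions made for illustration; only the 0.125 lower factor comes from the slide.

```python
import math

def choose_reducers(observed_bytes, bytes_per_reducer, compile_time_estimate,
                    min_factor=0.125, max_factor=2.0):
    """Pick a reducer count from observed map output, bounded around the estimate."""
    # Never go below min_factor of the compile-time estimate (and at least 1).
    lower = max(1, math.floor(compile_time_estimate * min_factor))
    # Never go above max_factor of the compile-time estimate.
    upper = max(lower, math.ceil(compile_time_estimate * max_factor))
    # Reducers needed if each one handles bytes_per_reducer of observed data.
    wanted = math.ceil(observed_bytes / bytes_per_reducer)
    return min(upper, max(lower, wanted))

MB = 1024 * 1024
GB = 1024 * MB

small = choose_reducers(1 * GB, 256 * MB, compile_time_estimate=80)
medium = choose_reducers(8 * GB, 256 * MB, compile_time_estimate=80)
large = choose_reducers(100 * GB, 256 * MB, compile_time_estimate=80)
print(small, medium, large)
```

When the map output turns out to be much smaller than estimated, the count falls toward the lower bound, which is the case the selective date filter in the query above is meant to trigger.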
Dynamic Partition Pruning 
Table Definition 
create table store_sales 
(...) 
partitioned by (ss_sold_date_sk int) 
stored as orc; 
Example: join of 
• a large fact table with multiple partitions 
• with a dimension table that has a filter 
[Diagram: store_sales partitions d1, d2, d3, d4, … joined to the filtered date_dim on ss_sold_date_sk = date_sk] 
The ss_sold_date_sk partitions that can be pruned away at join time are not known until the filter is applied at runtime 
Compile Time Design 
• Insert synthetic conditions for each join representing "x in (keys of other side in join)". The optimizer will push them as far down as possible 
• If the condition hits a table scan and the column involved is a partition column: 
§ Set up an operator to send key events to the AM 
• else: 
§ Remove the synthetic predicate 
[Diagram: 1. Send events for partition pruning, from the tasks for a single map vertex through the App Master (Vertex Manager, Vertex State Machine) to the tasks for another single map vertex]
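The runtime half of this design can be sketched in a few lines. The code below is an in-memory illustration under stated assumptions, not Hive code: the sample rows are made up, a dictionary keyed by ss_sold_date_sk stands in for the partitioned table, and `dimension_keys` and `scan_with_pruning` are hypothetical helper names.

```python
# Made-up dimension rows; d_moy is the month of year, as in TPC-DS date_dim.
date_dim = [
    {"d_date_sk": 1, "d_moy": 11},
    {"d_date_sk": 2, "d_moy": 12},
    {"d_date_sk": 3, "d_moy": 12},
    {"d_date_sk": 4, "d_moy": 1},
]

# store_sales partitioned by ss_sold_date_sk: partition key -> rows.
store_sales_partitions = {
    1: [{"ss_ext_sales_price": 10.0}],
    2: [{"ss_ext_sales_price": 20.0}, {"ss_ext_sales_price": 5.0}],
    3: [{"ss_ext_sales_price": 7.5}],
    4: [{"ss_ext_sales_price": 99.0}],
}

def dimension_keys(rows, predicate):
    """Apply the dimension filter and collect the surviving join keys."""
    return {row["d_date_sk"] for row in rows if predicate(row)}

def scan_with_pruning(partitions, keep_keys):
    """Scan only partitions whose key is in keep_keys (the synthetic 'x in (...)')."""
    scanned = sorted(key for key in partitions if key in keep_keys)
    rows = [row for key in scanned for row in partitions[key]]
    return scanned, rows

keys = dimension_keys(date_dim, lambda row: row["d_moy"] == 12)
scanned, rows = scan_with_pruning(store_sales_partitions, keys)
total = sum(row["ss_ext_sales_price"] for row in rows)
print(scanned, total)
```

Only the partitions whose keys survive the dimension filter are read; the others are never opened, which is the saving that the key events sent to the AM make possible.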
Dynamic Pruning 
TPC-DS Query 3 
SELECT dt.d_year, 
item.i_brand_id brand_id, 
item.i_brand brand, 
Sum(ss_ext_sales_price) sum_agg 
FROM date_dim dt, 
store_sales, 
item 
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk 
AND store_sales.ss_item_sk = item.i_item_sk 
AND item.i_manufact_id = 436 
AND dt.d_moy = 12 
GROUP BY dt.d_year, 
item.i_brand, 
item.i_brand_id 
ORDER BY dt.d_year, 
sum_agg DESC, 
brand_id 
LIMIT 100; 
Stinger.next: The Road Ahead 
Stinger.next - Delivery Themes 
Beyond Read-Only (2nd Half 2014) 
• Transactions with ACID allowing insert, update and delete 
• Temporary Tables 
• Cost Based Optimizer optimizes star and bushy join queries 
Sub-Second (1st Half 2015) 
• Sub-Second queries with LLAP 
• Hive-Spark Machine Learning integration 
• Operational reporting with Hive Streaming Ingest and Transactions 
• Replication and SQL/CBO improvements 
Richer Analytics (2nd Half 2015) 
• Toward SQL:2011 Analytics 
• Materialized Views 
• Cross-Geo Queries 
• Workload Management via YARN and LLAP integration
Q & A 
Thank you! 
Learn more at: 
hortonworks.com/hadoop/hive/ 
Register for the remaining 6 
Discover HDP 2.2 Webinars 
Hortonworks.com/webinars

Más contenido relacionado

Was ist angesagt?

Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun ConnollyHortonworks
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHortonworks
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHortonworks
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Hortonworks
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 

Was ist angesagt? (20)

Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 

Andere mochten auch

Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataYahoo Developer Network
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarHortonworks
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...Hortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next

  • 1. Discover HDP 2.2: Even Faster SQL Queries with Apache Hive & Stinger.next Page 1 © Hortonworks Inc. 2014 Hortonworks. We do Hadoop.
  • 2. Speakers
      • Justin Sears, Hortonworks Product Marketing Manager
      • Alan Gates, Hortonworks Co-Founder and Apache Hive Committer & PMC Member
      • Raj Bains, Hortonworks Senior Manager of Product Management for Apache Hive
  • 3. Agenda
      • Introduction to Stinger.next
      • New Innovation in Apache Hive 0.14
          § SQL: Transactions with ACID semantics
          § Speed: Cost based optimizer for star and bushy joins
          § Scale: Dynamic query optimizations
      • The Road Ahead for Stinger.next
      • Q & A
    We'll move quickly:
      • Attendee phone lines are muted
      • Text any questions to Raj Bains using Webex chat
      • Questions answered at the end
      • Unanswered questions and answers in upcoming blog post
  • 4. Big Data, Hadoop & Data Center Re-platforming
      Business Drivers
        • From reactive analytics to proactive interactions
        • Insights that drive competitive advantage & optimal returns
      Financial Drivers
        • Cost of data systems, as % of IT spend, continues to grow
        • Cost advantages of commodity hardware & open source software
      Technical Drivers
        • Data is growing exponentially & existing systems overwhelmed
        • Predominantly driven by NEW types of data that can inform analytics
      There is an inequitable balance between vendor and customer in the market
  • 5. Hadoop Value: New Types of Data
      • Clickstream: Capture and analyze website visitors' data trails and optimize your website
      • Sensors: Discover patterns in data streaming automatically from remote sensors and machines
      • Server Logs: Research logs to diagnose process failures and prevent security breaches
      • Sentiment: Understand how your customers feel about your brand and products – right now
      • Geographic: Analyze location-based data to manage operations where they occur
      • Unstructured: Understand patterns in files across millions of web pages, emails, and documents
  • 6. A Shift from Reactive to Proactive Interactions
      HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision)
      • A shift in Advertising: From mass branding …to 1x1 Targeting
      • A shift in Financial Services: From Educated Investing …to Automated Algorithms
      • A shift in Healthcare: From mass treatment …to Designer Medicine
      • A shift in Retail: From static branding …to Real-time Personalization
      • A shift in Telco: From break then fix …to repair before break
  • 7. Enterprise Goals for the Modern Data Architecture
      • Consolidate siloed data sets, structured and unstructured
      • Central data set on a single cluster
      • Multiple workloads across batch, interactive and real time
      • Central services for security, governance and operation
      • Preserve existing investment in current tools and platforms
      • Single view of the customer, product, supply chain
      [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM (RDBMS, EDW, MPP alongside YARN: Data Operating System on HDFS (Hadoop Distributed File System), nodes 1 to N, serving Batch, Interactive and Real-Time workloads); SOURCES (EXISTING Systems: CRM, ERP, Other; plus Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured)]
  • 8. YARN Transformed Hadoop & Opened a New Era
      YARN: The Architectural Center of Hadoop
        • Common data platform, many applications
        • Support multi-tenant access & processing
        • Batch, interactive & real-time use cases
      [Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider layers, all on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System)]
  • 9. YARN Extends Hadoop to Other Data Center Leaders
      YARN: The Architectural Center of Hadoop
        • Common data platform, many applications
        • Support multi-tenant access & processing
        • Batch, interactive & real-time use cases
        • Supports 3rd-party ISV tools (ex. SAS, Syncsort, Actian, etc.)
      YARN Ready Applications: Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing "YARN Ready" solutions
      [Diagram: the same data access stack as slide 8, on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System)]
  • 10. Enterprise Hadoop: Central Set of Services
      Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:
        • Governance: Load data and manage according to policy
        • Operations: Deploy and effectively manage the platform (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie)
        • Security: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
      Everything that plugs into Hadoop inherits these services
      [Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines, with Tez and Slider layers) on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System), flanked by GOVERNANCE, SECURITY and OPERATIONS]
  • 11. Hortonworks Development Investment for the Enterprise: Vertical Integration with YARN and HDFS
      • Ensure engines can run reliably and respectfully in a YARN based cluster
      • Implement features throughout the stack to accommodate
      [Diagram: same stack and GOVERNANCE, SECURITY, OPERATIONS services as slide 10]
  • 12. Hortonworks Development Investment for the Enterprise: Horizontal Integration for Enterprise Services
      • Ensure consistent enterprise services are applied across the entire Hadoop stack
      • Integrate with and extend existing data center solutions for these key requirements
      [Diagram: same stack and GOVERNANCE, SECURITY, OPERATIONS services as slide 10]
  • 13. HDP Delivers Enterprise Hadoop: Hortonworks Data Platform 2.2
      YARN is the architectural center of HDP
        • Common data set across all applications
        • Batch, interactive & real-time workloads
        • Multi-tenant access & processing
      Provides comprehensive enterprise capabilities
        • Governance
        • Security
        • Operations
      Enables broad ecosystem adoption
        • ISVs can plug directly into Hadoop
      The widest range of deployment options
        • Linux & Windows
        • On premises & cloud
      [Diagram: GOVERNANCE (Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS); BATCH, INTERACTIVE & REAL-TIME DATA ACCESS (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines, with Tez and Slider layers) on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System); SECURITY (Authentication, Authorization, Audit, Data Protection; Storage: HDFS, Resources: YARN, Access: Hive, Pipeline: Falcon, Cluster: Ranger, Cluster: Knox); OPERATIONS (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie); Deployment Choice: Linux, Windows, On-Premises, Cloud]
  • 14. HDP Delivers Enterprise Hadoop: Hortonworks Data Platform 2.2 (same bullets and diagram as slide 13)
  • 15. Introduction to Stinger.next
  • 16. Stinger.next – Enterprise SQL at Hadoop Scale
      Stinger (Hive 0.13, Tez, ORC File)
        • Scale to Petabytes
        • Batch to Interactive Queries
        • Read-Only Data
        • Substantial SQL Support
        • Single Tool for Multiple SQL workloads – Interactive, Reporting and ETL
        • MapReduce, Tez Engines
      Stinger.next
        • Scale to Petabytes
        • Sub-Second Queries
        • Modify Data with Transactions
        • Comprehensive SQL:2011 Analytics
        • Single Tool for Multiple SQL workloads – Interactive, Reporting, ETL, ML
        • MapReduce, Tez, Spark Engines
  • 17. SQL in Hive 0.14: Transactions with ACID Semantics
  • 18. Transaction Use Cases
      Reporting with Analytics (YES)
        • Reporting on data with occasional updates
        • Corrections to the fact tables, evolving dimension tables
        • Low concurrency updates, low TPS
      Operational Reporting (YES, next)
        • High throughput ingest from operational (OLTP) database
        • Periodic inserts every 5-30 minutes
        • Requires tool support and changes in our Transactions
      Operational (OLTP) Database (NO)
        • Small Transactions, each doing single line inserts
        • High Concurrency - Hundreds to thousands of connections
      [Diagram labels: Analytics Modifications, Hive; OLTP, Replication, Hive; High Concurrency OLTP, Hive]
  • 19. Deep Dive: Transactions
      Transaction Support in Hive with ACID semantics
        • Hive native support for INSERT, UPDATE, DELETE.
        • Split Into Phases:
            • Phase 1: Hive Streaming Ingest (append) [Done]
            • Phase 2: INSERT / UPDATE / DELETE Support [HDP 2.2]
            • Phase 3: BEGIN / COMMIT / ROLLBACK Txn [Next]
      How tasks read the data (diagram):
        1. Original File: Task reads the latest read-optimized ORCFile
        2. Edits Made: Task reads the ORCFile and merges the delta file with the edits
        3. Edits Merged: Task reads the updated ORCFile
      Hive ACID Compactor periodically merges the delta files in the background.
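The three read states on slide 19 (original file, edits in a delta file, edits merged by the compactor) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: rows are dictionaries keyed by a made-up row id, a delta is an ordered list of edits, and the names `merge_on_read` and `compact` are hypothetical. It does not reflect Hive's actual ORC or delta file layout.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# One edit = (operation, row_id, new_row). Hypothetical shape, for illustration only.
Edit = Tuple[str, int, Optional[Row]]


def merge_on_read(base: Dict[int, Row], deltas: List[List[Edit]]) -> Dict[int, Row]:
    """Return the view a reader sees: base rows with every delta applied in order."""
    view = dict(base)
    for delta in deltas:
        for op, row_id, row in delta:
            if op in ("insert", "update"):
                view[row_id] = row
            elif op == "delete":
                view.pop(row_id, None)
            else:
                raise ValueError(f"unknown operation: {op}")
    return view


def compact(base: Dict[int, Row], deltas: List[List[Edit]]):
    """Fold all deltas into a new base so later reads have nothing left to merge."""
    return merge_on_read(base, deltas), []


# 1. Original file
base = {1: {"item": "a", "qty": 5}, 2: {"item": "b", "qty": 7}}
# 2. Edits made: two delta files written by later transactions
deltas = [
    [("update", 1, {"item": "a", "qty": 6})],
    [("delete", 2, None), ("insert", 3, {"item": "c", "qty": 1})],
]
before = merge_on_read(base, deltas)
# 3. Edits merged: the compactor rewrites the base and drops the deltas
new_base, remaining = compact(base, deltas)
after = merge_on_read(new_base, remaining)
print(before == after, len(remaining))
```

The point of the sketch is that readers see the same rows before and after compaction; compaction only moves the merge work from read time to a background step.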
  • 20. Speed in Hive 0.14: Cost Based Optimizer
  • 21. TPC-DS Query 17
      SELECT i_item_id,
             i_item_desc,
             s_state,
             Count(ss_quantity) AS store_sales_quantitycount,
             Avg(ss_quantity) AS store_sales_quantityave,
             Stddev_samp(ss_quantity) AS store_sales_quantitystdev,
             Stddev_samp(ss_quantity) / Avg(ss_quantity) AS store_sales_quantitycov,
             Count(sr_return_quantity) as_store_returns_quantitycount,
             Avg(sr_return_quantity) as_store_returns_quantityave,
             Stddev_samp(sr_return_quantity) as_store_returns_quantitystdev,
             Stddev_samp(sr_return_quantity) / Avg(sr_return_quantity) AS store_returns_quantitycov,
             Count(cs_quantity) AS catalog_sales_quantitycount,
             Avg(cs_quantity) AS catalog_sales_quantityave,
             Stddev_samp(cs_quantity) / Avg(cs_quantity) AS catalog_sales_quantitystdev,
             Stddev_samp(cs_quantity) / Avg(cs_quantity) AS catalog_sales_quantitycov
      FROM store_sales, store_returns, catalog_sales, date_dim d1, date_dim d2, date_dim d3, store, item
      WHERE d1.d_quarter_name = '2000Q1'
        AND d1.d_date_sk = store_sales.ss_sold_date_sk
        AND ss_sold_date BETWEEN '2000-01-01' AND '2000-03-31'
        AND item.i_item_sk = store_sales.ss_item_sk
        AND store.s_store_sk = store_sales.ss_store_sk
        AND store_sales.ss_customer_sk = store_returns.sr_customer_sk
        AND store_sales.ss_item_sk = store_returns.sr_item_sk
        AND store_sales.ss_ticket_number = store_returns.sr_ticket_number
        AND store_returns.sr_returned_date_sk = d2.d_date_sk
        AND d2.d_quarter_name IN ( '2000Q1', '2000Q2', '2000Q3' )
        AND sr_returned_date BETWEEN '2000-01-01' AND '2000-09-01'
        AND store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
        AND store_returns.sr_item_sk = catalog_sales.cs_item_sk
        AND catalog_sales.cs_sold_date_sk = d3.d_date_sk
        AND d3.d_quarter_name IN ( '2000Q1', '2000Q2', '2000Q3' )
        AND cs_sold_date BETWEEN '2000-01-01' AND '2000-09-31'
      GROUP BY i_item_id, i_item_desc, s_state
      ORDER BY i_item_id, i_item_desc, s_state
      LIMIT 100;
  • 22. CBO on Selected Queries – 17
      [Join graph: fact tables store_sales, store_returns and catalog_sales, each with a date filter; dimension tables date_dim d1, date_dim d2 and date_dim d3, each with a quarter filter; plus item and store. Join keys shown: customer_sk, ticket_number, item_sk, date_sk, store_sk]
  • 23. OLD: Left Deep Plan
      Large Fact tables joined together without filters
        • Map 1: Table_scan d1, filter
        • Map 2: Table_scan catalog_sales
        • Map 6: Table_scan d2, filter
        • Map 7: Table_scan d3, filter
        • Map 8: Table_scan store
        • Map 9: Table_scan store_sales
        • Map 11: Table_scan item
        • Map 12: Table_scan Store_returns
        • Reducer 10: Merge join 12, 9
        • Reducer 3: Merge join 2 & 10; Map join 1; Map join 6; Map Join 7; Map Join 8 store; Map Join 11 item; Filter; Group By; Reduce
        • Reducer 4: Group_By, Reduce
        • Reducer 5: Limit
      [Several edges in the plan diagram are labeled B]
  • 24. NEW: Complex Bushy Plan
      All 3 Large Fact tables joined with date dimension limiting data to few quarters
        • Map 1: Table_scan d1, filter
        • Map 2: Table_scan d1, filter
        • Map 3: Store_sales, Map join
        • Map 8: Store_returns, Map join
        • Map 9: Table_scan d1, filter
        • Map 10: table_scan store
        • Map 11: catalog_sales, Map Join
        • Map 12: Table_scan item
        • Reducer 4: Merge join 3 & 8; Map join store; Map join item; Reduce
        • Reducer 5: Merge_Join, Group_By, Reduce
        • Reducer 6: Group by, Reduce
        • Reducer 7: Limit
      [Several edges in the plan diagram are labeled B]
  • 25. Performance Improvement – Query 17
      Scale = 30TB, input records ~186mil

      CBO   Elapsed Time (sec)   Elapsed Time   Intermediate data (GB)   Output and Intermediate Records
      OFF   10,683               ~3 hrs         5,017                    135,647,792,123
      ON    1,284                ~20 mins       275                      8,543,232,360
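The figures on slide 25 are easier to compare as ratios. The short calculation below uses only the numbers on the slide; the dictionary layout and the `ratio` helper are illustrative names, not anything from the deck.

```python
# Numbers copied from the slide 25 table for TPC-DS Query 17 at 30TB scale.
runs = {
    "OFF": {"elapsed_sec": 10_683, "intermediate_gb": 5_017, "records": 135_647_792_123},
    "ON": {"elapsed_sec": 1_284, "intermediate_gb": 275, "records": 8_543_232_360},
}


def ratio(metric: str) -> float:
    """How many times larger the CBO-off value is than the CBO-on value."""
    return runs["OFF"][metric] / runs["ON"][metric]


speedup = ratio("elapsed_sec")
data_reduction = ratio("intermediate_gb")
record_reduction = ratio("records")

# Cross-check the rounded wall-clock labels on the slide.
hours_off = runs["OFF"]["elapsed_sec"] / 3600
minutes_on = runs["ON"]["elapsed_sec"] / 60

print(f"elapsed time: {speedup:.1f}x faster")
print(f"intermediate data: {data_reduction:.1f}x smaller")
print(f"output and intermediate records: {record_reduction:.1f}x fewer")
print(f"CBO off: {hours_off:.2f} h, CBO on: {minutes_on:.1f} min")
```

With the cost based optimizer on, the query runs roughly 8 times faster while moving roughly 18 times less intermediate data, which fits the slide's explanation that the bushy plan filters each fact table by date before the large joins.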
  • 26. Scale in Hive 0.14: Dynamic Query Optimization
  • 27. Auto Reducer Parallelism
      • Use dynamic data volume during execution rather than estimates from query compilation to determine the number of reducers
      • Leads to faster query execution, better resource utilization
      Sequence shown in the diagram (Vertex Manager, Vertex State Machine, App Master; tasks for a single map vertex and a single reduce vertex, over time):
        1. Data size statistics
        2. Set parallelism
        3. Re-route
        4. Cancel task
        5. Tasks Completed
        6. Start Tasks
        7. Start
  • 28. Auto Reducer Parallelism
      use tpcds_bin_partitioned_orc_30000;
      set hive.tez.auto.reducer.parallelism=true;
      set hive.tez.min.partition.factor=0.125;
      SELECT ss_promo_sk, Sum(ss_sales_price), Count(*)
      FROM store_sales
      WHERE ss_sold_date < '1998-03-01'
      GROUP BY ss_promo_sk
      ORDER BY 2 DESC
      LIMIT 10;
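The idea behind auto reducer parallelism (slides 27 and 28) is to size the reduce stage from the data volume actually produced by the map tasks rather than from a compile-time estimate. The sketch below is a simplified model of that decision, not Hive's or Tez's real logic: the function name, the `bytes_per_reducer` parameter and the exact clamping rule are assumptions made for illustration. Only the `min_partition_factor` argument is meant to mirror the `hive.tez.min.partition.factor` setting on slide 28.

```python
import math


def choose_reducers(compile_time_estimate: int,
                    observed_bytes: int,
                    bytes_per_reducer: int,
                    min_partition_factor: float = 0.125) -> int:
    """Pick a reducer count from observed map output size (hypothetical model).

    The count never exceeds the compile-time estimate and never drops below
    compile_time_estimate * min_partition_factor (or 1, whichever is larger).
    """
    lower_bound = max(1, int(compile_time_estimate * min_partition_factor))
    needed = max(1, math.ceil(observed_bytes / bytes_per_reducer))
    return min(compile_time_estimate, max(lower_bound, needed))


MB = 1024 * 1024
planned = 1000  # reducers estimated at query compilation (made-up number)

small = choose_reducers(planned, observed_bytes=10 * 256 * MB, bytes_per_reducer=256 * MB)
medium = choose_reducers(planned, observed_bytes=500 * 256 * MB, bytes_per_reducer=256 * MB)
large = choose_reducers(planned, observed_bytes=5000 * 256 * MB, bytes_per_reducer=256 * MB)
print(small, medium, large)
```

In this model a selective filter such as the date predicate on slide 28 produces little map output, so the reduce stage shrinks toward the lower bound instead of launching every planned reducer.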
  • 29. Dynamic Partition Pruning
      Table Definition
        create table store_sales (...) partitioned by (ss_sold_date_sk int) stored as orc;
      Example: join of
        • a large Fact table with multiple partitions
        • with a dimension table that has a filter
      [Diagram: store_sales partitions d1, d2, d3, d4, … joined on ss_sold_date_sk = date_sk with a filtered date_dim d1]
      The ss_sold_date_sk partitions that can be pruned away at join time are not known until the filter is applied at runtime
      Compile Time Design
        • Insert synthetic conditions for each join representing "x in (keys of other side in join)". Optimizer will push it as far down as possible
        • If the condition hits a table scan and the column involved is a partition column: set up Operator to send key events to AM
        • Else: remove synthetic predicate
      Runtime (diagram: Vertex Manager, Vertex State Machine, App Master; tasks for two map vertices):
        1. Send events for partition pruning
  • 30. Dynamic Pruning TPC-DS Query 3
      SELECT dt.d_year,
             item.i_brand_id brand_id,
             item.i_brand brand,
             Sum(ss_ext_sales_price) sum_agg
      FROM date_dim dt, store_sales, item
      WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
        AND store_sales.ss_item_sk = item.i_item_sk
        AND item.i_manufact_id = 436
        AND dt.d_moy = 12
      GROUP BY dt.d_year, item.i_brand, item.i_brand_id
      ORDER BY dt.d_year, sum_agg DESC, brand_id
      LIMIT 100;
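Dynamic partition pruning, as described on slide 29, can be pictured as two steps at runtime: evaluate the dimension filter first, then scan only the fact partitions whose key survived. The toy example below follows the shape of the `d_moy = 12` filter in Query 3; the in-memory tables, the data values and the function names are all invented for illustration and say nothing about how Hive or Tez pass the keys between tasks.

```python
from typing import Dict, List, Set, Tuple

# Fact table stored as partitions keyed by the partition column ss_sold_date_sk.
store_sales: Dict[int, List[dict]] = {
    1: [{"ss_item_sk": 10, "ss_ext_sales_price": 5.0}],
    2: [{"ss_item_sk": 11, "ss_ext_sales_price": 7.5}],
    3: [{"ss_item_sk": 10, "ss_ext_sales_price": 2.5}],
    4: [{"ss_item_sk": 12, "ss_ext_sales_price": 1.0}],
}
date_dim = [
    {"d_date_sk": 1, "d_moy": 11},
    {"d_date_sk": 2, "d_moy": 12},
    {"d_date_sk": 3, "d_moy": 12},
    {"d_date_sk": 4, "d_moy": 1},
]


def surviving_keys(dim: List[dict], moy: int) -> Set[int]:
    """Apply the dimension filter and collect the join keys that remain."""
    return {row["d_date_sk"] for row in dim if row["d_moy"] == moy}


def scan_with_pruning(fact: Dict[int, List[dict]], keys: Set[int]) -> Tuple[List[int], List[dict]]:
    """Read only the partitions whose key is in `keys`; skip the rest entirely."""
    scanned = sorted(k for k in fact if k in keys)
    rows = [dict(r, ss_sold_date_sk=k) for k in scanned for r in fact[k]]
    return scanned, rows


keys = surviving_keys(date_dim, moy=12)           # known only at runtime
scanned, rows = scan_with_pruning(store_sales, keys)
total = sum(r["ss_ext_sales_price"] for r in rows)
print(scanned, total)
```

Here only two of the four partitions are read. On a fact table partitioned by date, the saving grows with the number of partitions the dimension filter excludes.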
  • 31. Stinger.next: The Road Ahead
  • 32. Stinger.next - Delivery Themes
      Beyond Read-Only (2nd Half 2014)
        • Transactions with ACID allowing insert, update and delete
        • Temporary Tables
        • Cost Based Optimizer optimizes star and bushy join queries
      Sub-Second (1st Half 2015)
        • Sub-Second queries with LLAP
        • Hive-Spark Machine Learning integration
        • Operational reporting with Hive Streaming Ingest and Transactions
        • Replication and SQL/CBO improvements
      Richer Analytics (2nd Half 2015)
        • Toward SQL:2011 Analytics
        • Materialized Views
        • Cross-Geo Queries
        • Workload Management via YARN and LLAP integration
  • 33. Q & A
  • 34. Thank you! Learn more at: hortonworks.com/hadoop/hive/ Register for the remaining 6 Discover HDP 2.2 Webinars: Hortonworks.com/webinars