Page 1 © Hortonworks Inc. 2014
SQL on HBase with Phoenix
Agenda
What Is Apache HBase
•  High Level Overview.
•  Technical Detail.
What Is Apache Phoenix
•  Overview.
•  What’s New.
•  Secondary Index Demo.
New Data Requires a New Data Architecture
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
OLTP, ERP, CRM Systems
Unstructured documents, emails
Clickstream
Server logs
Sentiment, Web Data
Sensor, Machine Data
Geolocation
Modern Database Needs
•  More Scalable
•  Handle New Data Types
•  Intelligent and Predictive
What Is Apache HBase?
100% Open Source
Store and Process Petabytes of Data
Flexible Schema
Scale out on Commodity Servers
High Performance, High Availability
Integrated with YARN
SQL and NoSQL Interfaces
[Diagram: HBase RegionServers 1 through N running on YARN (Data Operating System), on top of HDFS (Permanent Data Storage).]
Dynamic Schema
Scales Horizontally to PB of Data
Directly Integrated with Hadoop
Kinds of Apps Built with HBase
Interested? See HBase Case Studies later in this document.
Write Heavy Low-Latency
•  Search / Indexing
•  Messaging
•  Audit / Log Archive
•  Advertising
•  Data Cubes
•  Time Series
•  Sensor / Device
HBase is Deeply Integrated with Hadoop
•  Data is stored in HDFS. You can store more data and re-use existing HDFS expertise.
•  HBase is integrated with YARN.
•  Analytics in-place using Hive, Pig, Spark and more.
Who’s Using HBase?
HBase Technical Details
Spring 2014
Version 1.0
HBase Technical Details
Based on Google BigTable
•  Dynamic schema.
•  Good for very sparse datasets.
•  All data is range-partitioned for trivial horizontal scaling across commodity hardware.
Directly integrated with HDFS and Hadoop
•  Analyze data in HBase with any Hadoop ecosystem tools (Hive, Pig, MapReduce, Tez, etc.)
•  Re-use existing Hadoop skills to run HBase.
Logical Architecture
Distributed, persistent partitions of a BigTable
[Diagram: Table A, with rowkeys a through p, is split into Region 1 through Region 4, which are hosted as follows.]
Region Server 7
•  Table A, Region 1
•  Table A, Region 2
•  Table G, Region 1070
•  Table L, Region 25
Region Server 86
•  Table A, Region 3
•  Table C, Region 30
•  Table F, Region 160
•  Table F, Region 776
Region Server 367
•  Table A, Region 4
•  Table C, Region 17
•  Table E, Region 52
•  Table P, Region 1116
Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
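A minimal sketch of how range partitioning can route a rowkey to a region and then to a server. The split keys ("e", "i", "m") are assumptions made for illustration; the region-to-server mapping follows the Table A entries listed on the slide. This is not HBase's actual lookup code.

```python
from bisect import bisect_right

# Assumed start keys for Table A: region n covers [start_n, start_n+1).
REGION_START_KEYS = ["", "e", "i", "m"]  # Region 1, 2, 3, 4

# Table A placement as listed on the slide.
REGION_TO_SERVER = {
    1: "Region Server 7",
    2: "Region Server 7",
    3: "Region Server 86",
    4: "Region Server 367",
}


def region_for(rowkey: str) -> int:
    """Return the 1-based region whose key range contains rowkey."""
    return bisect_right(REGION_START_KEYS, rowkey)


def server_for(rowkey: str) -> str:
    """Return the server hosting the region that owns rowkey."""
    return REGION_TO_SERVER[region_for(rowkey)]
```

Because rowkeys are sorted, finding the owning region is a binary search over the start keys, which is why adding regions (and servers) scales horizontally without rehashing existing data.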
Logical Data Model
A sparse, multi-dimensional, sorted map
Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can be multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
Table A
rowkey  column family  column qualifier  timestamp   value
a       cf1            "bar"             1368394583  7
a       cf1            "bar"             1368394261  "hello"
a       cf1            "foo"             1368394583  22
a       cf1            "foo"             1368394925  13.6
a       cf1            "foo"             1368393847  "world"
a       cf2            "2011-07-04"      1368396302  "fourth of July"
a       cf2            1.0001            1368387684  "almost the loneliest number"
b       cf2            "thumb"           1368387247  [3.6 kb png data]
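The data model above can be sketched as nested maps: rowkey, then column family, then qualifier, then timestamp. The class and method names below are hypothetical and chosen for illustration; real HBase stores everything as bytes and keeps the data sorted on disk.

```python
from collections import defaultdict


class SparseSortedTable:
    """Toy model: rowkey -> family -> qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(
            lambda: defaultdict(lambda: defaultdict(dict))
        )

    def put(self, rowkey, family, qualifier, timestamp, value):
        """Store one version of a cell; absent cells cost nothing (sparse)."""
        self._rows[rowkey][family][qualifier][timestamp] = value

    def get(self, rowkey, family, qualifier, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cell = self._rows.get(rowkey, {}).get(family, {}).get(qualifier, {})
        return sorted(cell.items(), reverse=True)[:versions]

    def scan(self, start=None, stop=None):
        """Yield rowkeys in sorted order within [start, stop)."""
        for key in sorted(self._rows):
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key
```

For example, after storing the three versions of row a, cf1, "foo" from the slide, `get("a", "cf1", "foo")` returns only the newest version, and `versions=3` returns all three ordered by timestamp.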
HBase HA Overview (Introduced in HDP 2.1)
[Diagram: HBase HA topology.]
HMaster and ZooKeeper: Cluster Topology, Data Placement
Clients: Low-Latency Reads and Writes
HBase RegionServer (In-Memory Cache)
•  Region: 0-99 (Primary)
•  Region: 100-199 (Standby)
•  Region: 200-299 (Standby)
HBase RegionServer (In-Memory Cache)
•  Region: 100-199 (Primary)
•  Region: 200-299 (Primary)
•  Region: 0-99 (Standby)
HBase HA: Real-Time Replication
HFiles: Data Stored to HDFS
Hive, Pig, MapReduce: Read or Write Directly from Hadoop Tools
Apache Phoenix
Spring 2014
Version 1.0
The SQL Skin for HBase
Apache Phoenix
A SQL Skin for HBase
•  Provides a SQL interface for managing data in HBase.
•  Large subset of the SQL:1999 mandatory feature set.
•  Create tables, insert and update data and perform low-latency point lookups through JDBC.
•  Phoenix JDBC driver easily embeddable in any app that supports JDBC.
Phoenix Makes HBase Better
•  Oriented toward online / semi-transactional apps.
•  If HBase is a good fit for your app, Phoenix makes it even better.
•  Phoenix gets you out of the “one table per query” model many other NoSQL stores force you into.
Apache Phoenix: Current Capabilities
Feature                               Supported?
Common SQL Datatypes                  Yes
Inserts and Updates                   Yes
SELECT, DISTINCT, GROUP BY, HAVING    Yes
NOT NULL and Primary Key constraints  Yes
Inner and Outer JOINs                 Yes
Views                                 Yes
Subqueries                            HDP 2.2
Robust Secondary Indexes              HDP 2.2
Apache Phoenix: Future Capabilities
Feature                         Supported?
Multi-Table Transactions        Future
Scalable Joins (Fact-to-Fact)   Future
Analytics, Windowing Functions  Future
Phoenix Provides Familiar SQL Constructs
Compare: Phoenix versus Native API
Code:
// HBase Native API.
HBaseAdmin hbase = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("us_population");
HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
desc.addFamily(state);
desc.addFamily(city);
desc.addFamily(population);
hbase.createTable(desc);

// Phoenix DDL.
CREATE TABLE us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city));
Notes:
•  Familiar SQL syntax.
•  Provides additional constraint checking.
Phoenix: Architecture
[Diagram: a User Application (a Java Application embedding the Phoenix JDBC Driver) talks to an HBase Cluster that hosts Phoenix Coprocessors.]
Phoenix Performance
Phoenix Performance Characterization:
•  Suitable for tens of thousands of point lookups per second.
•  Suitable for thousands of aggregations / filtered searches per second.
•  Supports extremely high concurrency.
Phoenix Performance Optimizations
•  Column skipping.
•  Table salting.
•  Skip scans.
Performance characteristics:
•  Index point lookups in milliseconds.
•  Aggregation and Top-N queries in a few seconds over large datasets.
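Table salting, listed among the optimizations above, can be sketched as follows: a one-byte salt derived from a hash of the rowkey is prepended so that sequential keys spread across buckets instead of piling onto one region. The bucket count and the use of CRC32 are assumptions for illustration only; Phoenix uses its own hash function.

```python
import zlib

SALT_BUCKETS = 4  # hypothetical bucket count


def salted_key(rowkey: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend a one-byte salt computed from a hash of the rowkey."""
    salt = zlib.crc32(rowkey) % buckets
    return bytes([salt]) + rowkey


def scan_prefixes(buckets: int = SALT_BUCKETS) -> list:
    """Salt prefixes a range scan has to visit: one per bucket."""
    return [bytes([b]) for b in range(buckets)]
```

The trade-off is that writes are spread evenly, while a range scan over the original keys has to be issued once per bucket and the results merged.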
Phoenix Use Cases
Phoenix is for:
•  Rapidly and easily building an application backed by HBase.
•  Making use of your existing SQL skills and investment.
•  High-performing aggregations of moderately sized datasets inside HBase.
Phoenix is not for:
•  Sophisticated SQL queries involving large joins or advanced SQL features.
•  Queries requiring large scans that do not use indexes.
•  ETL.
Phoenix: Futures
Short-term focus:
•  Transactions.
•  Scalable joins.
•  Analytical capabilities.
Long-term focus: Primary interface for HBase.
•  Build HBase applications using Phoenix.
•  Configure cluster security and replication using Phoenix.
•  Integration with BI tools like MicroStrategy.
What’s New in Apache Phoenix
Phoenix in HDP 2.2
•  Based on Apache Phoenix 4.2.
•  8 new features, 143 total improvements and fixes.
Notable new features.
•  Robust secondary indexes.
•  Sub-joins.
•  Basic window functions.
•  Bulk loader improvements.
Robust Secondary Index
Background / Refresher
•  Phoenix supports local and global secondary indexes.
•  Updating a global index may require coordination with another RegionServer.
•  See Phoenix docs if you need info on which to use when.
Before Phoenix 4.1 (HDP 2.1):
•  With global indexes, if the RegionServer serving the index key was down, RegionServers would abort.
•  Note: Does not affect local indexes.
Phoenix 4.1+:
•  If the global index cannot be updated:
•  The index is temporarily disabled.
•  Background job is launched to rebuild the index.
•  Reads will go directly to base tables rather than accessing the index.
•  Writes will continue to update the index.
•  Controlled by: phoenix.index.failure.handling.rebuild
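The Phoenix 4.1+ behavior described above can be sketched as a small state machine. The class and method names are hypothetical; `rebuild_enabled` stands in for the `phoenix.index.failure.handling.rebuild` setting, and what happens when it is off (the index simply stays disabled) is an assumption of this sketch.

```python
class GlobalIndexState:
    """Toy model of global-index failure handling as the slide describes it."""

    def __init__(self, rebuild_enabled: bool = True):
        self.rebuild_enabled = rebuild_enabled
        self.active = True            # index is usable for reads
        self.rebuild_pending = False  # background rebuild scheduled

    def on_index_write_failure(self) -> None:
        """An index update failed: disable the index instead of aborting."""
        self.active = False
        if self.rebuild_enabled:
            self.rebuild_pending = True

    def read_target(self) -> str:
        """Reads fall back to the base table while the index is disabled."""
        return "index" if self.active else "base table"

    def finish_rebuild(self) -> None:
        """The background job completed: re-enable the index."""
        if self.rebuild_pending:
            self.rebuild_pending = False
            self.active = True
```

The point of the design is that a failed index update degrades read performance for a while rather than taking RegionServers down.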
Improved SQL: Sub Joins
Example:
select * from A
  left join (B join C on B.bc_id = C.bc_id)
  on A.ab_id = B.ab_id and A.ac_id = C.ac_id;
Caveats related to joins still apply:
•  Still broadcast joins only.
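To make the "broadcast joins only" caveat concrete, here is a minimal sketch of a broadcast hash join over lists of dicts. It shows the general technique, not Phoenix's implementation; the function and parameter names are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(big_rows, small_rows, key):
    """Inner-join two lists of dicts on `key`.

    The small side is loaded into an in-memory hash table (the part that
    would be shipped, or "broadcast", to every server holding the big side),
    then each big-side row probes that table.
    """
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for row in big_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined
```

Since the whole small side is held in memory on each server, this approach works when one side is modest in size, which is why fact-to-fact joins are listed as future work.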
Phoenix: Basic Window Functions
FIRST_VALUE, LAST_VALUE, NTH_VALUE
•  No OVER or PARTITION BY.
•  Function applied to each group based on GROUP BY.
Example:
SELECT FIRST_VALUE("column1") WITHIN GROUP (ORDER BY column2 ASC)
FROM table
GROUP BY column3;
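A rough sketch of what the query above computes, written over plain dicts: for each GROUP BY group, take column1 from the row that sorts first by column2. The function name and parameters are hypothetical, and tie-breaking among equal column2 values (first row seen wins here) is an assumption.

```python
def first_value_within_group(rows, value_col, order_col, group_col):
    """Per group, return value_col of the row with the smallest order_col."""
    best_row = {}
    for row in rows:
        group = row[group_col]
        current = best_row.get(group)
        if current is None or row[order_col] < current[order_col]:
            best_row[group] = row
    return {group: row[value_col] for group, row in best_row.items()}
```

LAST_VALUE would follow the same shape with the comparison reversed; no OVER or PARTITION BY is involved because the grouping comes from GROUP BY.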
  
ENCODE, DECODE
DECODE
•  Supports hexadecimal format.
DECODE('000000008512af277ffffff8', 'hex')
ENCODE
•  Supports hexadecimal and Base62.
ENCODE(1, 'base62')
What is Base62?
•  Used to encode data using only letters and numbers.
•  Commonly used for things like URL shorteners.
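A minimal sketch of Base62 encoding of a non-negative integer. The alphabet order (digits, then lowercase, then uppercase) is an assumption; Phoenix's ENCODE may order its characters differently, so outputs here are not guaranteed to match it.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using only letters and digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode for the same alphabet."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the output contains no punctuation, it can be dropped into a URL path without escaping, which is why URL shorteners favor it.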
Demo
Phoenix Secondary Indexes
Secondary Index Recap
Index Management via JDBC:
•  CREATE INDEX my_index ON my_table (v1);
•  DROP INDEX my_index ON my_table;
•  ALTER INDEX my_index ON my_table DISABLE / REBUILD;
Index population during bulk import:
•  Uses the CsvBulkLoadTool utility (not psql.py).
•  Adds the --index-table argument to specify your target index.
HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf \
hadoop jar phoenix-4.0.0.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table EXAMPLE --input /data/example.csv

Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

SQL on HBase Made Easy with Phoenix

  • 1. Page 1 © Hortonworks Inc. 2014 SQL on HBase with Phoenix
  • 2. Agenda
      What Is Apache HBase
        •  High-Level Overview.
        •  Technical Detail.
      What Is Apache Phoenix
        •  Overview.
        •  What’s New.
        •  Secondary Index Demo.
  • 3. New Data Requires a New Data Architecture (Source: IDC)
        •  2.8 ZB in 2012.
        •  85% from New Data Types.
        •  15x Machine Data by 2020.
        •  40 ZB by 2020.
      OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation.
      Modern Database Needs
        •  More Scalable.
        •  Handle New Data Types.
        •  Intelligent and Predictive.
  • 4. What Is Apache HBase?
        •  100% Open Source.
        •  Store and Process Petabytes of Data.
        •  Flexible Schema.
        •  Scale out on Commodity Servers.
        •  High Performance, High Availability.
        •  Integrated with YARN.
        •  SQL and NoSQL Interfaces.
      [Diagram: HBase RegionServers 1 to N running on YARN (Data Operating System), on top of HDFS (Permanent Data Storage).]
      Dynamic Schema. Scales Horizontally to PB of Data. Directly Integrated with Hadoop.
  • 5. Kinds of Apps Built with HBase
      Interested? See HBase Case Studies later in this document.
      Write Heavy, Low-Latency, Search / Indexing, Messaging, Audit / Log Archive, Advertising, Data Cubes, Time Series, Sensor / Device.
  • 6. HBase is Deeply Integrated with Hadoop
        •  Data is stored in HDFS. You can store more data and re-use existing HDFS expertise.
        •  HBase is integrated with YARN.
        •  Analytics in-place using Hive, Pig, Spark and more.
  • 7. Who’s Using HBase?
  • 8. HBase Technical Details (Spring 2014, Version 1.0)
  • 9. HBase Technical Details
      Based on Google BigTable
        •  Dynamic schema.
        •  Good for very sparse datasets.
        •  All data is range-partitioned for trivial horizontal scaling across commodity hardware.
      Directly integrated with HDFS and Hadoop
        •  Analyze data in HBase with any Hadoop ecosystem tools (Hive, Pig, MapReduce, Tez, etc.)
        •  Re-use existing Hadoop skills to run HBase.
  • 11. Logical Architecture
      Distributed, persistent partitions of a BigTable
      [Diagram: Table A, with rowkeys a through p, is split into Region 1, Region 2, Region 3 and Region 4.]
        Region Server 7:   Table A, Region 1; Table A, Region 2; Table G, Region 1070; Table L, Region 25
        Region Server 86:  Table A, Region 3; Table C, Region 30; Table F, Region 160; Table F, Region 776
        Region Server 367: Table A, Region 4; Table C, Region 17; Table E, Region 52; Table P, Region 1116
      Legend:
        - A single table is partitioned into Regions of roughly equal size.
        - Regions are assigned to Region Servers across the cluster.
        - Region Servers host roughly the same number of regions.
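The range partitioning on this slide can be sketched as a sorted map from region start keys to locations. This is only an illustrative sketch, not HBase's client code: the class `RegionLookupSketch`, its methods, and the split points (`e`, `i`, `m`) are assumptions made for the example, and rowkeys are modelled as ASCII strings rather than raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of range-partitioned lookup; not part of the HBase API.
final class RegionLookupSketch {
    // Region start key -> location label. The empty start key marks the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String location) {
        regionsByStartKey.put(startKey, location);
    }

    // A rowkey belongs to the region with the greatest start key <= rowkey.
    String locate(String rowkey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowkey);
        if (entry == null) {
            throw new IllegalStateException("no region covers rowkey " + rowkey);
        }
        return entry.getValue();
    }

    // Table A from the slide, with assumed split points every four letters.
    static RegionLookupSketch tableA() {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "Table A, Region 1 @ Region Server 7");
        table.addRegion("e", "Table A, Region 2 @ Region Server 7");
        table.addRegion("i", "Table A, Region 3 @ Region Server 86");
        table.addRegion("m", "Table A, Region 4 @ Region Server 367");
        return table;
    }

    public static void main(String[] args) {
        RegionLookupSketch table = tableA();
        for (String rowkey : new String[] {"a", "h", "i", "p"}) {
            System.out.println(rowkey + " -> " + table.locate(rowkey));
        }
    }
}
```

Because the lookup depends only on sorted start keys, splitting a region amounts to adding one more start key, which is what makes horizontal scaling straightforward in this model.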
  • 12. Logical Data Model
      A sparse, multi-dimensional, sorted map
      Legend:
        - Rows are sorted by rowkey.
        - Within a row, values are located by column family and qualifier.
        - Values also carry a timestamp; there can be multiple versions of a value.
        - Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
      Table A:
        rowkey   column family   column qualifier   timestamp    value
        a        cf1             "bar"              1368394583   7
        a        cf1             "bar"              1368394261   "hello"
        a        cf1             "foo"              1368394583   22
        a        cf1             "foo"              1368394925   13.6
        a        cf1             "foo"              1368393847   "world"
        a        cf2             "2011-07-04"       1368396302   "fourth of July"
        a        cf2             1.0001             1368387684   "almost the loneliest number"
        b        cf2             "thumb"            1368387247   [3.6 kb png data]
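The "sparse, multi-dimensional, sorted map" can be approximated with nested sorted maps. The sketch below is an assumption-laden illustration rather than HBase's storage format: `SparseSortedMapSketch` and its methods are hypothetical, family and qualifier are joined into one `family:qualifier` key for brevity, and values are strings instead of arbitrary bytes.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical sketch of the logical data model; not part of the HBase API.
final class SparseSortedMapSketch {
    // rowkey -> "family:qualifier" -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String rowkey, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(family + ":" + qualifier,
                             c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell was never written (sparse).
    String getLatest(String rowkey, String family, String qualifier) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(rowkey);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(family + ":" + qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    String firstRowkey() {
        return rows.firstKey();
    }

    // A few of the cells shown on the slide.
    static SparseSortedMapSketch example() {
        SparseSortedMapSketch table = new SparseSortedMapSketch();
        table.put("b", "cf2", "thumb", 1368387247L, "[3.6 kb png data]");
        table.put("a", "cf1", "foo", 1368393847L, "world");
        table.put("a", "cf1", "foo", 1368394583L, "22");
        table.put("a", "cf1", "foo", 1368394925L, "13.6");
        table.put("a", "cf1", "bar", 1368394261L, "hello");
        table.put("a", "cf1", "bar", 1368394583L, "7");
        return table;
    }

    public static void main(String[] args) {
        SparseSortedMapSketch table = example();
        System.out.println("latest a/cf1:foo = " + table.getLatest("a", "cf1", "foo"));
        System.out.println("latest b/cf1:foo = " + table.getLatest("b", "cf1", "foo"));
    }
}
```

Cells that were never written occupy no space in this structure, which is the sense in which the map is sparse.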
  • 13. HBase HA Overview (Introduced in HDP 2.1)
      [Diagram: Clients perform low-latency reads and writes against two HBase RegionServers, each with an in-memory cache. HMaster and ZooKeeper handle cluster topology and data placement. One RegionServer hosts Region 0-99 (Primary), Region 100-199 (Standby) and Region 200-299 (Standby); the other hosts Region 100-199 (Primary), Region 0-99 (Standby) and Region 200-299 (Primary). HBase HA: real-time replication. Data is stored to HDFS as HFiles. Hive, Pig and MapReduce can read or write directly from Hadoop tools.]
  • 14. Apache Phoenix (Spring 2014, Version 1.0): The SQL Skin for HBase
  • 15. Apache Phoenix
      A SQL Skin for HBase
        •  Provides a SQL interface for managing data in HBase.
        •  Large subset of the SQL:1999 mandatory feature set.
        •  Create tables, insert and update data and perform low-latency point lookups through JDBC.
        •  Phoenix JDBC driver easily embeddable in any app that supports JDBC.
      Phoenix Makes HBase Better
        •  Oriented toward online / semi-transactional apps.
        •  If HBase is a good fit for your app, Phoenix makes it even better.
        •  Phoenix gets you out of the “one table per query” model many other NoSQL stores force you into.
  • 16. Apache Phoenix: Current Capabilities
        Feature                                  Supported?
        Common SQL Datatypes                     Yes
        Inserts and Updates                      Yes
        SELECT, DISTINCT, GROUP BY, HAVING       Yes
        NOT NULL and Primary Key constraints     Yes
        Inner and Outer JOINs                    Yes
        Views                                    Yes
        Subqueries                               HDP 2.2
        Robust Secondary Indexes                 HDP 2.2
  • 17. Apache Phoenix: Future Capabilities
        Feature                                  Supported?
        Multi-Table Transactions                 Future
        Scalable Joins (Fact-to-Fact)            Future
        Analytics, Windowing Functions           Future
  • 18. Phoenix Provides Familiar SQL Constructs
      Compare: Phoenix versus Native API
      Code:
        // HBase Native API.
        HBaseAdmin hbase = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("us_population");
        HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
        HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
        HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
        desc.addFamily(state);
        desc.addFamily(city);
        desc.addFamily(population);
        hbase.createTable(desc);

        // Phoenix DDL.
        CREATE TABLE us_population (
            state CHAR(2) NOT NULL,
            city VARCHAR NOT NULL,
            population BIGINT
            CONSTRAINT my_pk PRIMARY KEY (state, city));
      Notes:
        •  Familiar SQL syntax.
        •  Provides additional constraint checking.
  • 19. Phoenix: Architecture
      [Diagram: a User Application (Java Application with the Phoenix JDBC Driver) connects to an HBase Cluster running Phoenix Coprocessors.]
  • 20. Phoenix Performance
      Phoenix Performance Characterization:
        •  Suitable for tens of thousands of point lookups per second.
        •  Suitable for thousands of aggregations / filtered searches per second.
        •  Supports extremely high concurrency.
      Phoenix Performance Optimizations
        •  Column skipping.
        •  Table salting.
        •  Skip scans.
      Performance characteristics:
        •  Index point lookups in milliseconds.
        •  Aggregation and Top-N queries in a few seconds over large datasets.
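Table salting, listed among the optimizations above, prepends a bucket byte derived from a hash of the rowkey so that sequential keys spread across regions. The following is a minimal sketch of that idea only: `SaltingSketch` and its methods are hypothetical, and `Arrays.hashCode` is a stand-in hash chosen for the example, not the function Phoenix itself uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of rowkey salting; the hash function is an assumption.
final class SaltingSketch {
    // Derives the bucket for a rowkey; buckets must be between 1 and 256.
    static byte saltByte(byte[] rowkey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return (byte) Math.floorMod(Arrays.hashCode(rowkey), buckets);
    }

    // Returns a new key: one salt byte followed by the original rowkey bytes.
    static byte[] salted(byte[] rowkey, int buckets) {
        byte[] result = new byte[rowkey.length + 1];
        result[0] = saltByte(rowkey, buckets);
        System.arraycopy(rowkey, 0, result, 1, rowkey.length);
        return result;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            byte[] key = ("row-" + i).getBytes(StandardCharsets.US_ASCII);
            System.out.println("row-" + i + " -> bucket " + (salted(key, 8)[0] & 0xFF));
        }
    }
}
```

The trade-off is that a range scan over the original key order now has to consult every bucket, so salting mainly suits write-heavy tables with monotonically increasing keys.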
  • 21. Phoenix Use Cases
      Phoenix is for:
        •  Rapidly and easily building an application backed by HBase.
        •  Making use of your existing SQL skills and investment.
        •  High-performing aggregations of moderately sized datasets inside HBase.
      Phoenix is not for:
        •  Sophisticated SQL queries involving large joins or advanced SQL features.
        •  Queries requiring large scans that do not use indexes.
        •  ETL.
  • 22. Phoenix: Futures
      Short-term focus:
        •  Transactions.
        •  Scalable joins.
        •  Analytical capabilities.
      Long-term focus: Primary interface for HBase.
        •  Build HBase applications using Phoenix.
        •  Configure cluster security and replication using Phoenix.
        •  Integration with BI tools like MicroStrategy.
  • 23. What’s New in Apache Phoenix
  • 24. What’s New in Apache Phoenix
      Phoenix in HDP 2.2
        •  Based on Apache Phoenix 4.2.
        •  8 new features, 143 total improvements and fixes.
      Notable new features:
        •  Robust secondary indexes.
        •  Sub-joins.
        •  Basic window functions.
        •  Bulk loader improvements.
  • 25. Robust Secondary Index
      Background / Refresher
        •  Phoenix supports local and global secondary indexes.
        •  Updating a global index may require coordination with another RegionServer.
        •  See the Phoenix docs if you need info on which to use when.
      Before Phoenix 4.1 (HDP 2.1):
        •  Using global indexes, if the RegionServer serving the index key was down, RegionServers would abort.
        •  Note: Does not affect local indexes.
      Phoenix 4.1+:
        •  If the global index cannot be updated:
        •  The index is temporarily disabled.
        •  Background job is launched to rebuild the index.
        •  Reads will go directly to base tables rather than accessing the index.
        •  Writes will continue to update the index.
        •  Controlled by: phoenix.index.failure.handling.rebuild
  • 26. Improved SQL: Sub Joins
      Example:
        select * from A
          left join (B join C on B.bc_id = C.bc_id)
          on A.ab_id = B.ab_id and A.ac_id = C.ac_id;
      Caveats related to joins still apply:
        •  Still broadcast joins only.
  • 27. Phoenix: Basic Window Functions
      FIRST_VALUE, LAST_VALUE, NTH_VALUE
        •  No OVER or PARTITION BY.
        •  Function applied to each group based on GROUP BY.
      Example:
        SELECT
          FIRST_VALUE("column1")
          WITHIN GROUP
            (ORDER BY column2 ASC)
        FROM
          table
        GROUP BY
          column3;
  • 28. ENCODE, DECODE
      DECODE
        •  Supports hexadecimal format.
           DECODE('000000008512af277ffffff8', 'hex')
      ENCODE
        •  Supports hexadecimal and Base62.
           ENCODE(1, 'base62')
      What is Base62?
        •  Used to encode data using only letters and numbers.
        •  Commonly used for things like URL shorteners.
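To make the Base62 idea concrete, here is a small sketch that converts a non-negative number to a string of digits and letters and back. It is not Phoenix's implementation: `Base62Sketch` is a hypothetical class, and the alphabet order (digits, then uppercase, then lowercase) is an assumption, so the output need not match what `ENCODE(..., 'base62')` returns.

```java
// Hypothetical sketch of Base62 encoding; the alphabet order is an assumption.
final class Base62Sketch {
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Encodes a non-negative value using only digits and letters.
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are out of scope for this sketch");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder digits = new StringBuilder();
        while (value > 0) {
            digits.append(ALPHABET.charAt((int) (value % 62)));
            value /= 62;
        }
        // Digits were produced least-significant first.
        return digits.reverse().toString();
    }

    // Decodes a string produced by encode(); overflow is not checked.
    static long decode(String text) {
        long value = 0;
        for (int i = 0; i < text.length(); i++) {
            int digit = ALPHABET.indexOf(text.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a Base62 character: " + text.charAt(i));
            }
            value = value * 62 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        long id = 123456789L;
        String shortForm = encode(id);
        System.out.println(id + " -> " + shortForm + " -> " + decode(shortForm));
    }
}
```

Since the output contains no punctuation, it can be dropped into a URL path without escaping, which is why the scheme is popular for URL shorteners.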
  • 29. Demo: Phoenix Secondary Indexes
  • 30. Secondary Index Recap
      Index Management via JDBC:
        •  CREATE INDEX my_index ON my_table (v1);
        •  DROP INDEX my_index ON my_table;
        •  ALTER INDEX my_index ON my_table DISABLE / REBUILD;
      Index population during bulk import:
        •  Uses the CsvBulkLoadTool utility (not psql.py).
        •  Adds the --index-table argument to specify your target index.
        HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf \
          hadoop jar phoenix-4.0.0.jar \
          org.apache.phoenix.mapreduce.CsvBulkLoadTool \
          --table EXAMPLE --input /data/example.csv