HiveServer2
Oct. 2013
Schubert Zhang
Hive Evolution
• Original
  • Let users express their queries in a high-level language without having to write MapReduce programs.
  • Mainly targeted at ad-hoc queries.
  • As a data tool, usually used in CLI mode.

• Now, more …
  • A parallel SQL DBMS that happens to use Hadoop for its storage and execution layers.
  • Ad-hoc + regular (recurring) queries
  • As a service …
Introduction
• Limitations of HiveServer1
  • Concurrency
  • Security
  • Client Interface
  • Stability

• Sessions/Concurrency
  • The old Thrift API and server implementation didn't support concurrency.
• xDBC
  • The old Thrift API didn't support the common xDBC (JDBC/ODBC) interfaces.
• Authentication/Authorization
  • Incomplete implementations.
• Auditing/Logging
HiveServer2:
• Available from hive-0.11 / CDH4.1.
• Redesigned and reimplemented (HIVE-2935).
• HiveServer2 is a container for the Hive execution engine (Driver).
• For each client connection, it creates a new execution context (Connection and Session) that serves Hive SQL requests from the client.
• The new RPC interface enables the server to associate this Hive execution context with the thread serving the client's request.
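The idea of associating an execution context with the thread that serves a request can be sketched with a `ThreadLocal`. This is a minimal illustration under assumptions, not HiveServer2's code: `ExecutionContext`, `ContextBinder`, and `runInContext` are hypothetical names, and the context is reduced to a session id plus a settings map.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical per-connection context: a session id plus session-scoped settings.
final class ExecutionContext {
    final UUID sessionId = UUID.randomUUID();
    final Map<String, String> conf = new ConcurrentHashMap<>();
}

// Hypothetical helper that binds a context to whichever pooled thread serves the request.
final class ContextBinder {
    private static final ThreadLocal<ExecutionContext> CURRENT = new ThreadLocal<>();

    private ContextBinder() {}

    // Runs one client request with ctx attached to the current thread.
    static <T> T runInContext(ExecutionContext ctx, Supplier<T> request) {
        CURRENT.set(ctx);
        try {
            return request.get();
        } finally {
            // Pooled threads are reused, so never leak one client's context to the next.
            CURRENT.remove();
        }
    }

    // Code deep inside the engine can look up "its" session without it being passed around.
    static ExecutionContext current() {
        ExecutionContext ctx = CURRENT.get();
        if (ctx == null) {
            throw new IllegalStateException("no execution context bound to this thread");
        }
        return ctx;
    }
}
```

The `finally` block matters because worker threads come from a pool: without `remove()`, the next client served by the same thread would see the previous client's session.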
Architecture
[Figures: System Arch. and Authentication Arch. (the latter is not discussed here). Source: Cloudera, http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/]
• In fact, the Driver lives in the Operation context.
Architecture: Internal
• Core contexts
  • Connections
  • Sessions
  • Operations
• Operation path …

[Figure: internal architecture, summarized top to bottom]
• hiveServer2 (main entry): start
• Client-1, Client-2, … → Thrift RPC Iface
• thriftCLIService (TThreadPoolServer, implements the client RPC Iface): listen()s for and accept()s new client connections and processes each one in its own thread (threads for client connections)
• → call (ICLIService internal interface)
• cliService (the real implementations of the various operations): opens/closes sessions, runs operations in existing sessions …
• sessionManager (handleToSessionMap): sessions, each implementing the HiveSession interface and holding its own HiveConf and SessionState
• backgroundOperationPool (threads for async operations): runs operations submitted with runAsync
• operationManager (handleToOperationMap): creates and runs operations (op, op, …), sync or async
  • SQLOp creates and runs a Hive Driver
  • Operation types: SQLOp/SetOp/DfsOp/AddResourceOp/DeleteResourceOp …, GetTypeInfoOp/GetCatalogsOp/GetSchemasOp/GetTablesOp/GetTableTypesOp/GetColumnsOp/GetFunctionsOp …
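The session/operation bookkeeping in the diagram (a handle-to-session map, a handle-to-operation map, and a background pool for runAsync) can be sketched as follows. This is a heavily simplified, hypothetical model: all `Sketch*` class names are invented for illustration, handles are plain `UUID`s, and the "work" is a string stand-in for running a Hive Driver.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.*;

// Hypothetical, heavily simplified model; not Hive's actual classes.
final class SketchOperation {
    final UUID handle = UUID.randomUUID();
    final String statement;
    volatile Future<String> result;

    SketchOperation(String statement) { this.statement = statement; }
}

final class SketchSession {
    final UUID handle = UUID.randomUUID();
    final Map<String, String> conf = new ConcurrentHashMap<>(); // per-session settings
    final Set<UUID> opHandles = ConcurrentHashMap.newKeySet();
}

final class SketchOperationManager {
    private final Map<UUID, SketchOperation> handleToOperation = new ConcurrentHashMap<>();

    SketchOperation newExecuteStatementOperation(String sql) {
        SketchOperation op = new SketchOperation(sql);
        handleToOperation.put(op.handle, op);
        return op;
    }

    SketchOperation getOperation(UUID handle) { return handleToOperation.get(handle); }

    void closeOperation(UUID handle) {
        SketchOperation op = handleToOperation.remove(handle);
        if (op != null && op.result != null) op.result.cancel(true); // stop background work if still running
    }
}

final class SketchSessionManager {
    private final Map<UUID, SketchSession> handleToSession = new ConcurrentHashMap<>();
    final SketchOperationManager operationManager = new SketchOperationManager();
    private final ExecutorService backgroundOperationPool;

    SketchSessionManager(int asyncThreads) {
        backgroundOperationPool = Executors.newFixedThreadPool(asyncThreads);
    }

    UUID openSession(Map<String, String> overlay) {
        SketchSession s = new SketchSession();
        s.conf.putAll(overlay);
        handleToSession.put(s.handle, s);
        return s.handle;
    }

    UUID executeStatement(UUID sessionHandle, String sql, boolean runAsync) {
        SketchSession s = handleToSession.get(sessionHandle);
        if (s == null) throw new IllegalArgumentException("invalid session handle");
        SketchOperation op = operationManager.newExecuteStatementOperation(sql);
        s.opHandles.add(op.handle);
        Callable<String> work = () -> "executed: " + sql; // stand-in for running a Hive Driver
        if (runAsync) {
            op.result = backgroundOperationPool.submit(work); // returns immediately
        } else {
            FutureTask<String> task = new FutureTask<>(work);
            task.run(); // runs on the caller's (connection) thread
            op.result = task;
        }
        return op.handle;
    }

    String awaitResult(UUID opHandle) {
        try {
            return operationManager.getOperation(opHandle).result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Closing a session also closes every operation it still owns.
    void closeSession(UUID sessionHandle) {
        SketchSession s = handleToSession.remove(sessionHandle);
        if (s == null) return;
        for (UUID op : s.opHandles) operationManager.closeOperation(op);
    }

    void shutdown() { backgroundOperationPool.shutdown(); }
}
```

The synchronous path runs the work on the connection thread, while the async path hands it to the shared fixed pool, so a long query does not pin a connection thread.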
Architecture: Server Context
• Client
• Connection (Thread)
• Session (-> HiveConf, SessionState)
• Operation (-> Driver)
• Usually, a client opens only one Session per Connection (see HiveConnection in the JDBC HiveDriver).

[Figure: Client-1 → Connection-1 (Thread) → Session-11, Session-12; Session-12 → Op-121 (SQL, Driver), Op-122, Op-123 (SQL, Driver); Client-2 → Connection-2 (Thread) → Session]

New Client API
• TCLIService.thrift
  • Complete API
  • Complete database API
    • Think about JDBC/ODBC
    • To be compatible with existing DB software
  • Hive-specific API
• Best practice
  • Client API vs. internal API
  • Converting and isolation

Session
• OpenSession: The client requests a new session. The server creates a new HiveSession and returns a unique SessionHandle (UUID). All other calls depend on this session.
• CloseSession: The client requests to close the session. This also closes and removes all operations in the session.

SQL and Hive command operations
• ExecuteStatement: Executes a HiveQL statement.
  • SQL statements run as a SQLOp. A statement can be tagged "runAsync"; it is then executed in a dedicated background thread and the call returns immediately.
  • Hive commands run as SetOp, DfsOp, AddResourceOp or DeleteResourceOp.

DB metadata operations
• GetInfo *: Gets various global Hive variables (key type -> value).
• GetTypeInfo: Gets the detailed description and constraints of each data type.
• GetCatalogs: Does nothing so far.
• GetSchemas: Gets the schemas (databases) from the metastore.
• GetTables: Gets table metadata from the metastore.
• GetTableTypes: Gets the table types, e.g. MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, INDEX_TABLE.
• GetColumns: Gets the columns of a table from the metastore.
• GetFunctions: Gets the UDFs.

Operations on operations
• GetOperationStatus: Gets the state of an operation by its OperationHandle: INITIALIZED/RUNNING/FINISHED/CANCELED/CLOSED/ERROR/UNKNOWN/PENDING.
• CancelOperation: Cancels a RUNNING or PENDING operation by its OperationHandle. For a SQLOp it also cleans up: closes and destroys the Hive Driver, deletes temporary output files, and cancels the task running in the background thread…
• CloseOperation: Removes and closes the operation: for a SQLOp, cleans up as above; for a HiveCommandOp, calls tearDownSessionIO.

Get result
• GetResultSetMetadata: Gets the result set's schema, such as the column names.
• FetchResults: Fetches the result rows from the underlying result set.
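The operation states returned by GetOperationStatus, together with the rule that only RUNNING or PENDING operations can be canceled, suggest a small state machine. The sketch below uses an assumed transition table (`OpState` and `OpLifecycle` are hypothetical names); Hive's own transition rules may differ in detail.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// States as listed for GetOperationStatus.
enum OpState { INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, CLOSED, ERROR, UNKNOWN }

final class OpLifecycle {
    // Assumed transition table (a sketch; Hive's own rules may differ in detail).
    private static final Map<OpState, EnumSet<OpState>> ALLOWED = new EnumMap<>(OpState.class);

    static {
        ALLOWED.put(OpState.INITIALIZED, EnumSet.of(OpState.PENDING, OpState.RUNNING, OpState.CLOSED));
        ALLOWED.put(OpState.PENDING, EnumSet.of(OpState.RUNNING, OpState.FINISHED,
                OpState.CANCELED, OpState.ERROR, OpState.CLOSED));
        ALLOWED.put(OpState.RUNNING, EnumSet.of(OpState.FINISHED, OpState.CANCELED,
                OpState.ERROR, OpState.CLOSED));
        ALLOWED.put(OpState.FINISHED, EnumSet.of(OpState.CLOSED));
        ALLOWED.put(OpState.CANCELED, EnumSet.of(OpState.CLOSED));
        ALLOWED.put(OpState.ERROR, EnumSet.of(OpState.CLOSED));
        ALLOWED.put(OpState.CLOSED, EnumSet.noneOf(OpState.class));
        ALLOWED.put(OpState.UNKNOWN, EnumSet.noneOf(OpState.class));
    }

    private OpState state = OpState.INITIALIZED;

    synchronized OpState getState() { return state; }

    synchronized void transitionTo(OpState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException("illegal transition " + state + " -> " + next);
        }
        state = next;
    }

    // CancelOperation: only a RUNNING or PENDING operation may be canceled.
    synchronized void cancel() {
        if (state != OpState.RUNNING && state != OpState.PENDING) {
            throw new IllegalStateException("cannot cancel an operation in state " + state);
        }
        transitionTo(OpState.CANCELED);
    }
}
```

Centralizing the legal transitions in one table keeps CancelOperation and CloseOperation from leaving an operation in an inconsistent state when they race with the background thread.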
Code
• Packages
  • org.apache.hive.service … (under the org.apache.hive namespace; Hive is a top-level Apache project …)
• Pros
  • Clear implementation
  • Decoupling of HiveServer2 and HiveCore
  • Decoupling of the Thrift client API and the internal code
• Cons
  • Too many design patterns.
  • Inconsistent principles in some places.
  • HiveServer2 and HiveCore are still not completely decoupled.
  • The JDBC driver package/jar still depends on a lot of other core code, such as Hive -> Hadoop and their libraries … (possibly because of the support for embedded mode).
Code
[Class diagram, summarized]
• Service (+state)
  • AbstractService: +HiveConf (global, set by init()), +init(), +start(), +stop(), +register() (StateChangeListener)
    • CompositeService: +serviceList, +addService(), +removeService()
      • HiveServer2: +main() (entry point)
• TCLIService.Iface
  • ThriftCLIService: +cliService, +OpenSession(), +CloseSession(), +GetInfo(), +ExecuteStatement(), +...(), +FetchResults()
    • ThriftBinaryCLIService: runs a TThreadPoolServer
• ICLIService
  • CLIService: +sessionManager, +openSession(), +closeSession(), +getInfo(), +executeStatement(), +...(), +fetchResults()
• SessionManager: +handleToSession (HashMap), +operationManager, +backgroundOperationPool (FixedThreadPool), +openSession(), +closeSession(), +getSession(), +...(), +submitBackgroundOperation()
• OperationManager: +handleToOperation (HashMap), +newExecuteStatementOperation(), +newGetTypeInfoOperation(), +...(), +addOperation(), +removeOperation(), +getOperation(), +getOperationState(), +cancelOperation(), +closeOperation(), +getOperationNextRowSet(), +...()
• HiveSession
  • HiveSessionImpl: +sessionHandle, +hiveConf (new for each session), +sessionState (new for each session), +opHandleSet, +getSessionHandle(), +getInfo(), +executeStatement(), +executeStatementAsync(), +...(), +fetchResults()
• Operation: +opHandle, +parentSession, +state, +getState(), +setState(), +run(), +getNextRowSet(), +close(), +cancel(), +...()
  • GetInfoOperation, ExecuteStatementOperation, SQLOperation, AddResourceOperation, DeleteResourceOperation, DfsOperation, SetOperation, GetSchemasOperation, XXXOperation

This is just a quick view; it may not be exact in some details, and it intentionally leaves out some less important things.
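The Service / AbstractService / CompositeService pattern above can be sketched as a lifecycle with ordered children. This is a hypothetical illustration (names prefixed `Lifecycle`/`Simple` to make clear they are not Hive's classes); the state names NOTINITED, INITED, STARTED and STOPPED and the reverse-order shutdown are assumptions of the sketch, and register(StateChangeListener) is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the service pattern; names mirror the slide, not Hive's exact code.
interface LifecycleService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }
    void init(Map<String, String> conf);
    void start();
    void stop();
    State getState();
}

abstract class AbstractLifecycleService implements LifecycleService {
    private final String name;
    private volatile State state = State.NOTINITED;
    protected Map<String, String> conf; // the global configuration, set by init()

    AbstractLifecycleService(String name) { this.name = name; }

    @Override public void init(Map<String, String> conf) {
        expect(State.NOTINITED);
        this.conf = conf;
        state = State.INITED;
    }

    @Override public void start() {
        expect(State.INITED);
        state = State.STARTED;
    }

    @Override public void stop() {
        state = State.STOPPED; // idempotent
    }

    @Override public State getState() { return state; }

    private void expect(State required) {
        if (state != required) {
            throw new IllegalStateException(name + " is " + state + ", expected " + required);
        }
    }
}

class CompositeLifecycleService extends AbstractLifecycleService {
    private final List<LifecycleService> serviceList = new ArrayList<>();

    CompositeLifecycleService(String name) { super(name); }

    void addService(LifecycleService s) { serviceList.add(s); }
    boolean removeService(LifecycleService s) { return serviceList.remove(s); }

    // Children are initialized and started in registration order...
    @Override public void init(Map<String, String> conf) {
        for (LifecycleService s : serviceList) s.init(conf);
        super.init(conf);
    }

    @Override public void start() {
        for (LifecycleService s : serviceList) s.start();
        super.start();
    }

    // ...and stopped in reverse order, so dependents shut down before what they depend on.
    @Override public void stop() {
        for (int i = serviceList.size() - 1; i >= 0; i--) serviceList.get(i).stop();
        super.stop();
    }
}

// A leaf service with no children, e.g. standing in for CLIService or ThriftCLIService.
class SimpleService extends AbstractLifecycleService {
    SimpleService(String name) { super(name); }
}
```

In this model HiveServer2 would be the composite, registering the CLI service before the Thrift service so the RPC front end only starts accepting calls once the backend is ready.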
HiveCore and the Dependent Hive Environment?
• HiveConf
  • A global instance.
  • An instance for each session.
    • The client can inject additional key-value configurations when calling OpenSession.
    • Set an explicit session name (id) to control the name of the download directory.
• Hive SessionState
  • An instance for each session.
• Hive Driver
  • An instance for each SQL operation.
• Global static variables?
  • ??
• SetOperation -> SetProcessor
  • set env: variables cannot be set.
  • set system: global, via System.getProperties().setProperty(..)
    • Should we forbid system settings? Or let only administrators change them?
  • set hiveconf: per instance (session).
  • set hivevar: per instance (session).
  • set: per instance (session).
• AddResourceOperation and DeleteResourceOperation
  • SessionState.add_resource/delete_resource
  • DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir", System.getProperty("java.io.tmpdir") + File.separator + "${hive.session.id}_resources")
• DfsOperation
  • Auth. with HDFS?
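The split between a global HiveConf and a per-session instance, and the way `${hive.session.id}` ends up in the download directory name, can be sketched as a layered key-value view. This is a simplified, hypothetical model (`SessionConf` is an invented name): it does a single substitution pass and ignores system properties, whereas HiveConf's real variable substitution is richer.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a per-session view layered over shared server-wide defaults.
final class SessionConf {
    // Mirrors the slide's DOWNLOADED_RESOURCES_DIR default value.
    static final String DEFAULT_RESOURCES_DIR =
            System.getProperty("java.io.tmpdir") + File.separator + "${hive.session.id}_resources";

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    private final Map<String, String> global;                  // shared, never written by a session
    private final Map<String, String> local = new HashMap<>(); // this session's overrides

    // overlay = key/value settings the client passed with OpenSession
    SessionConf(Map<String, String> global, Map<String, String> overlay) {
        this.global = global;
        this.local.putAll(overlay);
    }

    // "set k=v" / "set hiveconf:k=v": affects this session only.
    void set(String key, String value) {
        local.put(key, value);
    }

    String getRaw(String key) {
        String v = local.get(key);
        return v != null ? v : global.get(key);
    }

    // Expands ${name} references against this session's view (single pass; unknown names kept).
    String get(String key) {
        String raw = getRaw(key);
        if (raw == null) return null;
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = getRaw(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because a session's `set` only writes to its own layer, two sessions with different `hive.session.id` values resolve the same global default to different resource directories. By contrast, `set system:` would modify JVM-wide state, which is why the slide asks whether it should be restricted.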
Handle (Identifier)
• SessionHandle
• OperationHandle
• Use UUID
Thrift IDL:

    struct THandleIdentifier {
      // 16 byte globally unique identifier
      // This is the public ID of the handle and
      // can be used for reporting.
      1: required binary guid,

      // 16 byte secret generated by the server
      // and used to verify that the handle is not
      // being hijacked by another user.
      2: required binary secret,
    }

• Currently only the public ID is used, which is OK.
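A handle identifier of this shape (a 16-byte public guid plus a 16-byte server secret) can be sketched in Java as below. This is an illustration under assumptions, not Hive's HandleIdentifier: `HandleId` is a hypothetical name, the guid is a random UUID packed into 16 bytes, and the secret comes from `SecureRandom`. Since the slide notes that only the public ID is used today, `verify` shows what checking the secret could look like.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical sketch of a THandleIdentifier-like value: 16-byte public guid + 16-byte secret.
final class HandleId {
    private static final SecureRandom RANDOM = new SecureRandom();

    private final byte[] guid;   // public ID, safe to log or report
    private final byte[] secret; // known only to the server and the owning client

    private HandleId(byte[] guid, byte[] secret) {
        this.guid = guid;
        this.secret = secret;
    }

    static HandleId create() {
        byte[] secret = new byte[16];
        RANDOM.nextBytes(secret);
        return new HandleId(toBytes(UUID.randomUUID()), secret);
    }

    // Packs a UUID into 16 bytes, most-significant half first.
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    static UUID toUuid(byte[] sixteen) {
        ByteBuffer bb = ByteBuffer.wrap(sixteen);
        return new UUID(bb.getLong(), bb.getLong()); // arguments evaluate left to right
    }

    UUID publicId() { return toUuid(guid); }

    byte[] secretCopy() { return secret.clone(); }

    // Constant-time comparison, so a hijacker cannot guess the secret byte by byte via timing.
    boolean verify(byte[] presentedSecret) {
        return MessageDigest.isEqual(secret, presentedSecret);
    }
}
```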
Configurations and Run
Config:
• hive.server2.transport.mode = binary | http | https
• hive.server2.thrift.port = 10000
• hive.server2.thrift.bind.host
• hive.server2.thrift.min.worker.threads = 5
• hive.server2.thrift.max.worker.threads = 500
• hive.server2.async.exec.threads = 50
• hive.server2.async.exec.shutdown.timeout = 10 (seconds)
• hive.support.concurrency = true ???
• hive.zookeeper.quorum =
• …

Run:
• Start HiveServer2
  • bin/hiveserver2 &
• Start the CLI (uses standard JDBC)
  • bin/beeline
  • !connect jdbc:hive2://localhost:10000
  • show tables;
  • select * from tablename limit 10;
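The thread-related settings above can be pictured as parameters of JDK thread pools. The sketch below is a hypothetical mapping (`Hs2Pools` and its methods are invented names; the defaults are the values listed above): a connection pool bounded by the min/max worker threads, a fixed pool for runAsync operations, and a shutdown that waits up to the configured timeout. It reads settings from a plain map rather than HiveConf.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: how the thread-related settings above could map onto JDK thread pools.
final class Hs2Pools {
    private Hs2Pools() {}

    static int intConf(Map<String, String> conf, String key, int defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    // One worker thread per client connection, between min and max threads.
    // A SynchronousQueue means no queuing: a new connection needs a free or new thread.
    static ThreadPoolExecutor connectionWorkers(Map<String, String> conf) {
        int min = intConf(conf, "hive.server2.thrift.min.worker.threads", 5);
        int max = intConf(conf, "hive.server2.thrift.max.worker.threads", 500);
        return new ThreadPoolExecutor(min, max, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
    }

    // Fixed-size pool for operations submitted with runAsync.
    static ExecutorService asyncOperations(Map<String, String> conf) {
        return Executors.newFixedThreadPool(intConf(conf, "hive.server2.async.exec.threads", 50));
    }

    // Stop accepting work, then wait up to the configured timeout (seconds) for running tasks.
    static boolean shutdown(ExecutorService pool, Map<String, String> conf) {
        int timeout = intConf(conf, "hive.server2.async.exec.shutdown.timeout", 10);
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a thread-per-connection server, max.worker.threads effectively caps the number of concurrent client connections. The async pool size, by contrast, caps how many runAsync statements execute at once.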
Interface and Clients
• RPC (TCLIService.thrift)
  • Binary protocol
  • HTTP/HTTPS protocol (to be researched)
• New JDBC driver
  • org.apache.hive.jdbc.HiveDriver
  • URL: jdbc:hive2://hostname:10000/dbname… (e.g. jdbc:hive2://localhost:10000/default)
  • Implements more of the JDBC API.
• Third-party clients over JDBC:
  • CLI
    • Beeline, based on SQLine
  • IDE: SQuirreL SQL Client
  • Web client (e.g. H2 Web, etc.)
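A plain JDBC client for the new driver might look like the sketch below. It uses only standard `java.sql` calls together with the driver class and URL format given above; `Hs2Client`, `url`, and `showTables` are hypothetical names, and the user name "hive" is a placeholder. Running `main` requires the Hive JDBC jar on the classpath and a HiveServer2 listening on localhost:10000.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class Hs2Client {
    // Builds a URL of the form jdbc:hive2://host:port/db.
    static String url(String host, int port, String db) {
        if (port <= 0 || port > 65535) throw new IllegalArgumentException("bad port: " + port);
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    // Opens a connection (one server-side session), runs one statement, prints the first column.
    static void showTables(String url, String user) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show tables")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Explicit registration; harmless if the driver also registers itself.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        showTables(url("localhost", 10000, "default"), "hive"); // "hive" is a placeholder user
    }
}
```

Try-with-resources closes the ResultSet, Statement and Connection in reverse order. On the server side, this should map to closing the operation and then the session, which releases the per-session HiveConf and SessionState.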
Client Tools: CLI
[Screenshot: SQLine, Beeline]
Client Tools: IDE
[Screenshot: SQuirreL SQL Client]
Client Tools: Web Client
[Screenshot]
Think More …
• Thinking of XX as a platform
  • Standard JDBC/ODBC
  • RESTful API over HTTP, web services
  • AWS Redshift, SimpleDB …
• Hive as a service?
  • http://www.qubole.com/
  • Request a cluster; run SQL both ad hoc and regularly; workflows and scheduling.
• Language
  • SQL, R, Pig
• Computation of estimates, probabilities …
Thank You!

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

HiveServer2

  • 2. Hive Evolution
    • Original
      • Let users express their queries in a high-level language without having to write MapReduce programs.
      • Mainly targeted at ad-hoc queries.
      • As a data tool, usually used in CLI mode.
    • Now, more …
      • A parallel SQL DBMS that happens to use Hadoop for its storage and execution layers.
      • Ad-hoc + regular queries.
      • As a service …
  • 3. Introduction
    • Limitations of HiveServer1
      • Concurrency
      • Security
      • Client interface
      • Stability
    • Sessions/concurrency: the old Thrift API and server implementation didn't support concurrency.
    • xDBC: the old Thrift API didn't support common xDBC (JDBC/ODBC) interfaces.
    • Authentication/authorization: incomplete implementations.
    • Auditing/logging.
    • HiveServer2:
      • Available from hive-0.11 / CDH4.1.
      • Reconstructed and re-implemented (HIVE-2935).
      • HiveServer2 is a container for the Hive execution engine (Driver).
      • For each client connection, it creates a new execution context (connection and session) that serves Hive SQL requests from the client.
      • The new RPC interface enables the server to associate this Hive execution context with the thread serving the client's request.
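The last point, associating an execution context with the thread that serves a client request, can be pictured with a thread-local. The sketch below is a minimal illustration under stated assumptions: `ExecutionContext` and `ContextBinder` are hypothetical names standing in for the HiveConf/SessionState pair, and the session map is a simplification, not HiveServer2's actual code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical per-session execution context (stands in for HiveConf + SessionState).
final class ExecutionContext {
    final UUID sessionId;
    final Map<String, String> conf = new ConcurrentHashMap<>();

    ExecutionContext(UUID sessionId) { this.sessionId = sessionId; }
}

// Hypothetical binder: attaches a session's context to whichever worker thread serves the call.
final class ContextBinder {
    private static final ThreadLocal<ExecutionContext> CURRENT = new ThreadLocal<>();
    private final Map<UUID, ExecutionContext> sessions = new ConcurrentHashMap<>();

    UUID openSession() {
        UUID id = UUID.randomUUID();
        sessions.put(id, new ExecutionContext(id));
        return id;
    }

    // Runs one request on the current thread with the session's context attached.
    <T> T serve(UUID sessionId, Supplier<T> request) {
        ExecutionContext ctx = sessions.get(sessionId);
        if (ctx == null) throw new IllegalArgumentException("unknown session " + sessionId);
        CURRENT.set(ctx);
        try {
            return request.get();
        } finally {
            CURRENT.remove(); // a pooled thread may serve a different session next
        }
    }

    // Code deeper in the call stack can read the context without it being passed explicitly.
    static ExecutionContext current() { return CURRENT.get(); }
}
```

Removing the thread-local in `finally` matters because server threads are pooled; a leftover context would leak one client's session into another client's request.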
  • 4. Architecture
    • System architecture diagram (annotation: in fact, the Driver lives in the operation context).
    • Authentication architecture diagram (not discussed here).
    • Source: http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/ (@Cloudera)
  • 5. Architecture: Internal
    • Core contexts
      • Connections
      • Sessions
      • Operations
    • Operation path …
      • hiveServer2 (main entry) starts thriftCLIService.
      • thriftCLIService (TThreadPoolServer, implements the client Thrift RPC Iface) calls listen() and accept() for new client connections (Client-1, Client-2, …) and processes each in its own thread.
      • Each connection thread calls cliService through the internal ICLIService interface.
      • cliService holds the real implementations of the various operations: open/close sessions and run operations in existing sessions, through the HiveSession interface.
      • sessionManager (handleToSessionMap) holds the sessions, each with its own HiveConf and SessionState, plus the backgroundOperationPool: threads for async operations (runAsync).
      • operationManager (handleToOperationMap) creates and runs operations.
      • SQLOp runs sync or async and creates and runs a Hive Driver.
      • Other operations: SetOp/DfsOp/AddResourceOp/DeleteResourceOp …, GetTypeInfoOp/GetCatalogsOp/GetSchemasOp/GetTablesOp/GetTableTypesOp/GetColumnsOp/GetFunctionsOp …
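A much-simplified sketch of this call path (cliService -> sessionManager -> operationManager -> operation). It assumes a string result stands in for a Hive Driver run; `MiniCliService` and its methods are hypothetical. What it tries to show is the two handle maps and the fixed-size background pool used when a statement is tagged runAsync.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical, simplified model of the server-side call path.
final class MiniCliService {
    private final Map<UUID, String> handleToSessionMap = new ConcurrentHashMap<>(); // session handle -> user
    private final Map<UUID, CompletableFuture<String>> handleToOperationMap = new ConcurrentHashMap<>();
    private final ExecutorService backgroundOperationPool;

    MiniCliService(int asyncThreads) {
        backgroundOperationPool = Executors.newFixedThreadPool(asyncThreads);
    }

    UUID openSession(String user) {
        UUID handle = UUID.randomUUID();
        handleToSessionMap.put(handle, user);
        return handle;
    }

    // Creates an operation; with runAsync the work goes to the background pool and the call returns at once.
    UUID executeStatement(UUID session, String sql, boolean runAsync) {
        if (!handleToSessionMap.containsKey(session)) {
            throw new IllegalArgumentException("invalid session handle");
        }
        Supplier<String> driverRun = () -> "executed: " + sql; // stand-in for running a Hive Driver
        CompletableFuture<String> result = runAsync
                ? CompletableFuture.supplyAsync(driverRun, backgroundOperationPool)
                : CompletableFuture.completedFuture(driverRun.get());
        UUID opHandle = UUID.randomUUID();
        handleToOperationMap.put(opHandle, result);
        return opHandle;
    }

    boolean isFinished(UUID opHandle) { return handleToOperationMap.get(opHandle).isDone(); }

    // Blocks until an async operation completes, then returns its result.
    String fetchResults(UUID opHandle) { return handleToOperationMap.get(opHandle).join(); }

    void shutdown() { backgroundOperationPool.shutdown(); }
}
```

A synchronous statement is already finished when `executeStatement` returns, while an async one is polled or fetched later, which is the behavior the slide attributes to runAsync.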
  • 6. Architecture: Server Context
    • Client connection (thread)
    • Session (-> HiveConf, SessionState)
    • Operation (-> Driver)
    • Usually, a client opens only one session per connection (see the JDBC HiveDriver: HiveConnection).
    • Diagram: Client-1 and Client-2, each served by its own connection thread (Connection-1, Connection-2); sessions Session-11 and Session-12; operations Op-121 (SQL, with a Driver), Op-122, Op-123 (SQL, with a Driver).
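The containment described above (connection -> sessions -> operations, with a Driver per SQL operation) can be sketched as plain objects. The class names `ServerConnection`, `ServerSession` and `ServerOperation` are hypothetical; the cascading close follows the CloseSession behavior described for the client API, and everything else is a simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operation: SQL operations would own a Hive Driver.
final class ServerOperation {
    final String id;
    final boolean sql;
    boolean closed;

    ServerOperation(String id, boolean sql) { this.id = id; this.sql = sql; }

    void close() { closed = true; } // a real SQL operation would also close its Driver here
}

// Hypothetical session: owns its operations (and, in Hive, its HiveConf and SessionState).
final class ServerSession {
    final String id;
    final Map<String, ServerOperation> ops = new LinkedHashMap<>();

    ServerSession(String id) { this.id = id; }

    ServerOperation newOperation(String opId, boolean sql) {
        ServerOperation op = new ServerOperation(opId, sql);
        ops.put(opId, op);
        return op;
    }

    // Closing a session also closes and removes every operation it owns; returns their ids.
    List<String> close() {
        List<String> closed = new ArrayList<>(ops.keySet());
        ops.values().forEach(ServerOperation::close);
        ops.clear();
        return closed;
    }
}

// Hypothetical connection: one thread per connection, usually one session per connection.
final class ServerConnection {
    final String id;
    final Map<String, ServerSession> sessions = new LinkedHashMap<>();

    ServerConnection(String id) { this.id = id; }

    ServerSession openSession(String sessionId) {
        ServerSession s = new ServerSession(sessionId);
        sessions.put(sessionId, s);
        return s;
    }
}
```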
  • 7. New Client API
    • TCLIService.thrift
    • Complete API
    • Complete database API
      • Think about JDBC/ODBC.
      • To be compatible with existing DB software.
    • Hive-specific API
    • Best practice
    • Client API vs. internal API
      • Converting and isolation.
    • Session
      • OpenSession: the client requests a new session. A new HiveSession is created on the server, which returns a unique SessionHandle (UUID). All other calls depend on this session.
      • CloseSession: the client requests to close the session. This also closes and removes all operations in the session.
    • SQL and Hive command operations
      • ExecuteStatement: execute an HQL statement.
        • SQLOp: some SQL statements can be tagged "runAsync"; they are then executed in a dedicated thread and the call returns immediately.
        • Hive command operations: SetOp, DfsOp, AddResourceOp, DeleteResourceOp.
    • DB metadata operations
      • GetInfo *: get various global variables of Hive (Key-Type -> Value).
      • GetTypeInfo: get the detailed description and constraints of the data types.
      • GetCatalogs: does nothing so far.
      • GetSchemas: get schemas from the metastore.
      • GetTables: get table schemas from the metastore.
      • GetTableTypes: get the table types, e.g. MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, INDEX_TABLE.
      • GetColumns: get the columns of a table from the metastore.
      • GetFunctions: get the UDFs.
    • Operations on operations
      • GetOperationStatus: get the state of an operation by its OperationHandle: INITIALIZED/RUNNING/FINISHED/CANCELED/CLOSED/ERROR/UNKNOWN/PENDING.
      • CancelOperation: cancel a RUNNING or PENDING operation by its OperationHandle. For SQLOp, do cleanup: close and destroy the Hive Driver, delete temporary output files, and cancel the task running in the background thread …
      • CloseOperation: remove the operation and close it: for SQLOp, do cleanup; for HiveCommandOp, tearDownSessionIO.
    • Get result
      • GetResultSetMetadata: get the result set's schema, such as the column titles.
      • FetchResults: fetch the result rows from the actual result set.
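The operation states listed for GetOperationStatus, together with the rule that only RUNNING or PENDING operations can be canceled, suggest a small state machine. The sketch below is illustrative only: `OpState` and `OpStatus` are hypothetical names, and the transition table is an assumption chosen for the example, not Hive's actual rules.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// The states named on the slide; the allowed transitions below are an assumption.
enum OpState {
    INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, CLOSED, ERROR, UNKNOWN;

    private static final Map<OpState, EnumSet<OpState>> NEXT = new EnumMap<>(OpState.class);

    static {
        NEXT.put(INITIALIZED, EnumSet.of(PENDING, RUNNING, CANCELED, CLOSED));
        NEXT.put(PENDING, EnumSet.of(RUNNING, FINISHED, CANCELED, ERROR, CLOSED));
        NEXT.put(RUNNING, EnumSet.of(FINISHED, CANCELED, ERROR, CLOSED));
        NEXT.put(FINISHED, EnumSet.of(CLOSED));
        NEXT.put(CANCELED, EnumSet.of(CLOSED));
        NEXT.put(ERROR, EnumSet.of(CLOSED));
        NEXT.put(CLOSED, EnumSet.noneOf(OpState.class));
        NEXT.put(UNKNOWN, EnumSet.noneOf(OpState.class));
    }

    boolean canMoveTo(OpState target) { return NEXT.get(this).contains(target); }
}

// Hypothetical holder for one operation's state.
final class OpStatus {
    private OpState state = OpState.INITIALIZED;

    OpState get() { return state; }

    // Rejects transitions that the (assumed) table does not allow.
    void transition(OpState target) {
        if (!state.canMoveTo(target)) {
            throw new IllegalStateException(state + " -> " + target);
        }
        state = target;
    }

    // CancelOperation only applies to RUNNING or PENDING operations.
    boolean cancel() {
        if (state != OpState.RUNNING && state != OpState.PENDING) return false;
        transition(OpState.CANCELED);
        return true;
    }
}
```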
  • 8. Code
    • Packages: org.apache.hive.service …, top project of apache…
    • Pros
      • Clear implementation.
      • Decoupling of HiveServer2 and HiveCore.
      • Decoupling of the Thrift client API and internal code.
    • Cons
      • Too many design patterns.
      • Inconsistent principles in places.
      • HiveServer2 and HiveCore are still not completely decoupled.
      • The JDBC driver package/jar still relies on a lot of other core code, such as Hive -> Hadoop and their libs … (maybe because of the support for embedded mode).
  • 9. Code (class overview)
    • Service (+init(), +start(), +stop(), +register(): StateChangeListener)
      • AbstractService (+state)
        • CompositeService (+serviceList, +addService(), +removeService())
          • HiveServer2 (+HiveConf: global, set by init(); +main(): entry point)
    • TCLIService.Iface
      • ThriftCLIService (+cliService: ICLIService; +OpenSession(), +CloseSession(), +GetInfo(), +ExecuteStatement(), +...(), +FetchResults())
        • ThriftBinaryCLIService (TThreadPoolServer)
    • ICLIService
      • CLIService (+sessionManager; +openSession(), +closeSession(), +getInfo(), +executeStatement(), +...(), +fetchResults())
    • SessionManager (+handleToSession: HashMap, +operationManager, +backgroundOperationPool: FixedThreadPool; +openSession(), +closeSession(), +getSession(), +...(), +submitBackgroundOperation())
    • OperationManager (+handleToOperation: HashMap; +newExecuteStatementOperation(), +newGetTypeInfoOperation(), +...(), +addOperation(), +removeOperation(), +getOperation(), +getOperationState(), +cancelOperation(), +closeOperation(), +getOperationNextRowSet(), +...())
    • HiveSession
      • HiveSessionImpl (+sessionHandle, +hiveConf: new for each session, +sessionState: new for each session, +opHandleSet; +getSessionHandle(), +getInfo(), +executeStatement(), +executeStatementAsync(), +...(), +fetchResults())
    • Operation (+opHandle, +parentSession, +state; +getState(), +setState(), +run(), +getNextRowSet(), +close(), +cancel(), +...())
      • GetInfoOperation
      • ExecuteStatementOperation
        • SQLOperation, AddResourceOperation, DeleteResourceOperation, DfsOperation, SetOperation
      • GetSchemasOperation
      • XXXOperation
    • This is only a quick view: it may be inexact in some details and intentionally omits less important parts.
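The Service / AbstractService / CompositeService part of the diagram describes a lifecycle (init, start, stop) plus a composite that drives its child services. The sketch below models that idea with hypothetical names (`LifecycleService`, `BaseService`, `LeafService`, `CompositeLifecycle`); the four state names and the reverse-order stop are assumptions made for the example, not a copy of Hive's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lifecycle contract, loosely modeled on the Service interface in the diagram.
interface LifecycleService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    void init(Map<String, String> conf);
    void start();
    void stop();
    State getState();
    String getName();
}

// Tracks state; subclasses override init/start/stop and call super (like AbstractService).
abstract class BaseService implements LifecycleService {
    private final String name;
    private State state = State.NOTINITED;
    protected Map<String, String> conf;

    BaseService(String name) { this.name = name; }

    public void init(Map<String, String> conf) { this.conf = conf; state = State.INITED; }
    public void start() { state = State.STARTED; }
    public void stop() { state = State.STOPPED; }
    public State getState() { return state; }
    public String getName() { return name; }
}

// A leaf service, e.g. standing in for CLIService or ThriftCLIService.
class LeafService extends BaseService {
    LeafService(String name) { super(name); }
}

// Like CompositeService: init/start children in insertion order, stop them in reverse order.
class CompositeLifecycle extends BaseService {
    private final List<LifecycleService> serviceList = new ArrayList<>();
    final List<String> events = new ArrayList<>();

    CompositeLifecycle(String name) { super(name); }

    void addService(LifecycleService s) { serviceList.add(s); }

    @Override public void init(Map<String, String> conf) {
        for (LifecycleService s : serviceList) s.init(conf); // children share the global conf
        super.init(conf);
    }

    @Override public void start() {
        for (LifecycleService s : serviceList) {
            s.start();
            events.add("start " + s.getName());
        }
        super.start();
    }

    @Override public void stop() {
        for (int i = serviceList.size() - 1; i >= 0; i--) {
            LifecycleService s = serviceList.get(i);
            s.stop();
            events.add("stop " + s.getName());
        }
        super.stop();
    }
}
```

Stopping in reverse order is a common choice so that a service is never stopped while something started after it (and possibly depending on it) is still running.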
  • 10. HiveCore and the Hive Environment It Depends On?
    • HiveConf
      • A global instance.
      • An instance for each session.
      • The client can inject additional key/value configurations when calling OpenSession.
      • Set an explicit session name (ID) to control the name of the download directory.
    • Hive SessionState: an instance for each session.
    • Hive Driver: an instance for each SQL operation.
    • Global static variables? ??
    • SetOperation -> SetProcessor
      • set env: variables cannot be set.
      • set system: global, via System.getProperties().setProperty(..). Should system settings be forbidden, or allowed only for administrators?
      • set hiveconf: per-session instance.
      • set hivevar: per-session instance.
      • set: per-session instance.
    • AddResourceOperation and DeleteResourceOperation
      • SessionState.add_resource / delete_resource
      • DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir", System.getProperty("java.io.tmpdir") + File.separator + "${hive.session.id}_resources")
    • DfsOperation
      • Auth. with HDFS?
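The per-session HiveConf and the `${hive.session.id}` placeholder in `hive.downloaded.resources.dir` can be illustrated with a small overlay map. `SessionConf` is a hypothetical class: it copies the global values, applies the key/value pairs a client injects at OpenSession, and expands `${name}` references from the session's own values. This is a sketch of the idea, not how HiveConf performs substitution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-session configuration: global defaults plus overrides injected at OpenSession.
final class SessionConf {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private final Map<String, String> values;

    SessionConf(Map<String, String> global, Map<String, String> overlay) {
        values = new HashMap<>(global); // copy, so a session never mutates the global instance
        values.putAll(overlay);         // client-supplied key/values win
    }

    String get(String key) { return expand(values.get(key)); }

    // Expands ${name} using this session's values; unknown names are left untouched.
    private String expand(String raw) {
        if (raw == null) return null;
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String v = values.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(v != null ? v : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With an explicit session ID injected as `hive.session.id`, each session resolves its own download directory while the global template stays unchanged.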
  • 11. Handle (Identifier)
    • SessionHandle
    • OperationHandle
    • Use UUID.
    • Thrift IDL:
        struct THandleIdentifier {
          // 16 byte globally unique identifier
          // This is the public ID of the handle and
          // can be used for reporting.
          1: required binary guid,

          // 16 byte secret generated by the server
          // and used to verify that the handle is not
          // being hijacked by another user.
          2: required binary secret,
        }
    • Currently only the public ID is used, which is OK.
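A handle shaped like THandleIdentifier needs two 16-byte values, which is exactly the size of a UUID. The sketch below assumes both parts are random UUIDs serialized big-endian into 16 bytes; `HandleIdentifier` is a hypothetical class name here, and the constant-time secret check illustrates what verifying the secret could look like, not what HiveServer2 does today (the slide notes only the public ID is used).

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.UUID;

// Sketch of a handle with a 16-byte public guid and a 16-byte server-side secret.
final class HandleIdentifier {
    final UUID publicId;
    private final UUID secretId;

    HandleIdentifier() {
        publicId = UUID.randomUUID();
        secretId = UUID.randomUUID(); // random UUIDs come from a cryptographically strong generator
    }

    // Serializes a UUID into the 16-byte binary form used by the IDL fields.
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        return new UUID(buf.getLong(), buf.getLong()); // arguments evaluate left to right
    }

    byte[] guid() { return toBytes(publicId); }

    byte[] secret() { return toBytes(secretId); }

    // Checks a client-presented secret without leaking timing information.
    boolean verify(byte[] presentedSecret) {
        return MessageDigest.isEqual(secret(), presentedSecret);
    }
}
```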
  • 12. Configurations and Run
    • Config:
      • hive.server2.transport.mode = binary | http | https
      • hive.server2.thrift.port = 10000
      • hive.server2.thrift.bind.host
      • hive.server2.thrift.min.worker.threads = 5
      • hive.server2.thrift.max.worker.threads = 500
      • hive.server2.async.exec.threads = 50
      • hive.server2.async.exec.shutdown.timeout = 10 (seconds)
      • hive.support.concurrency = true ???
      • hive.zookeeper.quorum =
      • …
    • Run:
      • Start HiveServer2: bin/hiveserver2 &
      • Start the CLI (uses standard JDBC): bin/beeline
        • !connect jdbc:hive2://localhost:10000
        • show tables;
        • select * from tablename limit 10;
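To make the thread settings above concrete, here is a hypothetical mapping of them onto plain `java.util.concurrent` pools. `ServerPools` is an illustrative name, and the 60-second idle keep-alive and the `SynchronousQueue` hand-off for connection threads are assumptions for the example; the defaults are the values listed on the slide.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical wiring of the HiveServer2 thread settings onto standard Java pools.
final class ServerPools {
    final ThreadPoolExecutor connectionWorkers; // one worker thread per client connection
    final ExecutorService asyncOperations;      // background pool for runAsync statements
    final long shutdownTimeoutSeconds;

    ServerPools(Properties conf) {
        int min = Integer.parseInt(conf.getProperty("hive.server2.thrift.min.worker.threads", "5"));
        int max = Integer.parseInt(conf.getProperty("hive.server2.thrift.max.worker.threads", "500"));
        int async = Integer.parseInt(conf.getProperty("hive.server2.async.exec.threads", "50"));
        shutdownTimeoutSeconds =
                Long.parseLong(conf.getProperty("hive.server2.async.exec.shutdown.timeout", "10"));
        // Assumed: no queueing, so each new connection gets a thread until max is reached.
        connectionWorkers = new ThreadPoolExecutor(min, max, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        asyncOperations = Executors.newFixedThreadPool(async);
    }

    // Stops accepting work, waits up to the configured timeout, then interrupts stragglers.
    void shutdown() {
        connectionWorkers.shutdown();
        asyncOperations.shutdown();
        try {
            if (!asyncOperations.awaitTermination(shutdownTimeoutSeconds, TimeUnit.SECONDS)) {
                asyncOperations.shutdownNow();
            }
        } catch (InterruptedException e) {
            asyncOperations.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```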
  • 13. Interface and Clients
    • RPC (TCLIService.thrift)
      • Binary protocol
      • HTTP/HTTPS protocol (to be researched)
    • New JDBC driver
      • org.apache.hive.jdbc.HiveDriver
      • URL: jdbc:hive2://hostname:10000/dbname… (e.g. jdbc:hive2://localhost:10000/default)
      • Implements more API features.
    • Third-party clients over JDBC:
      • CLI: Beeline, based on SQLine
      • IDE: SQuirreL SQL Client
      • Web client (e.g. H2 Web, etc.)
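A minimal JDBC client for the new driver, doing roughly what the Beeline session above does (`select * from tablename limit 10`). It assumes the hive-jdbc jar and its dependencies are on the classpath and that the server accepts a user name with an empty password; `Hive2Client` and its methods are hypothetical helpers. The table name is concatenated into the SQL, so only pass trusted names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper around the HiveServer2 JDBC driver.
final class Hive2Client {
    static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Builds a URL of the form jdbc:hive2://hostname:port/dbname.
    static String url(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    // Prints up to `limit` rows of `table`, tab-separated.
    static void printFirstRows(String url, String user, String table, int limit)
            throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER); // explicit driver registration
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from " + table + " limit " + limit)) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) row.append('\t');
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }
}
```

Usage, assuming a server on the default port: `Hive2Client.printFirstRows(Hive2Client.url("localhost", 10000, "default"), "hive", "tablename", 10);`.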
  • 15. Client Tools: IDE SQuirreL SQL Client
  • 17. Think More …
    • Thinking of XX as a platform
      • Standard JDBC/ODBC
      • RESTful API over HTTP, web services
      • AWS Redshift, SimpleDB …
    • Hive as a service?
      • http://www.qubole.com/
      • Request a cluster; run SQL ad hoc and on a regular basis, with workflows and scheduling.
    • Language
      • SQL, R, Pig
      • Estimation and probabilistic computing …