12 SQL-ON-HADOOP TOOLS
Saggi Neumann - CTO and co-founder, Xplenty
BRINGING SQL TO HADOOP
In our recent post, 8 SQL-on-Hadoop challenges, we quickly listed several
tools that help to bridge the gap between the two technologies without
going into details. This time we’ll dive in and learn about 12 tools that
bring SQL to Hadoop in various ways.
OPEN SOURCE SQL-ON-HADOOP TOOLS
APACHE HIVE
Initially developed by Facebook, Apache Hive is a data warehouse
infrastructure that is built on top of Hadoop. It allows querying data
stored on HDFS for analysis via HQL, an SQL-like language that is
translated to MapReduce jobs. Although it seems to provide SQL
functionality, Hive performs batch processing on Hadoop and does not
provide interactive querying. It stores metadata in a relational database
and requires maintaining a schema for the data. Only four file formats are
supported by Hive: text, SequenceFile, ORC and RCFile. Hive supports
processing compressed data on Hadoop as well as user-defined functions.
▪ Bottom line - batch processing on Hadoop with an SQL-like language
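To make the idea of translating an SQL-like query into MapReduce more concrete, here is a minimal sketch in plain Python. It is not Hive's planner, only an illustration of how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be split into map, shuffle, and reduce phases. The table, its columns, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept)
EMPLOYEES = [
    ("ann", "sales"),
    ("bob", "eng"),
    ("cat", "eng"),
    ("dan", "sales"),
    ("eve", "eng"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate the values of each group; here the aggregate is COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Chain the three phases to answer the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(EMPLOYEES))
```

On a real cluster each phase runs as distributed tasks that are scheduled and started per job, which is one reason this style of execution is batch-oriented rather than interactive.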
APACHE SQOOP
Apache Sqoop moves data between relational databases and Hadoop via JDBC,
the standard API for connecting to databases with Java. It can also work
without JDBC as long as the database’s own tools allow bulk import/export
of data. Sqoop works by running a query on the relational database and
writing the resulting rows to files in one of these formats: text, binary,
Avro, or SequenceFile. These files can then be saved on Hadoop’s HDFS and
later exported from Hadoop back into a relational database. Finally, Sqoop
integrates with HCatalog, a table and storage management service for
Hadoop that allows querying Sqoop’s imported files via Hive or Pig. See
our Sqoop blog post for more info.
▪ Bottom line - import/export data from SQL databases to/from Apache
Hadoop
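The import flow described above (run a query, write the resulting rows to files) can be sketched with Python's standard library. This is only an illustration of the idea, not Sqoop itself: an in-memory SQLite database stands in for the relational source, a local temporary directory stands in for HDFS, and the table, file name, and function names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def import_query_to_text(conn, query, out_path):
    """Run `query` and write each result row as one delimited text line.

    Returns the number of rows written.
    """
    cursor = conn.execute(query)
    written = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for row in cursor:
            writer.writerow(row)
            written += 1
    return written


def demo():
    # In-memory SQLite stands in for the source relational database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "acme", 10.5), (2, "globex", 20.0), (3, "acme", 7.25)],
    )
    # A local temp directory stands in for a target directory on HDFS.
    out_path = os.path.join(tempfile.mkdtemp(), "part-00000.csv")
    count = import_query_to_text(
        conn, "SELECT id, customer, total FROM orders ORDER BY id", out_path
    )
    conn.close()
    with open(out_path, newline="") as handle:
        lines = handle.read().splitlines()
    return count, lines


if __name__ == "__main__":
    print(demo())
```

The sketch writes a single file from a single process and covers only the import direction.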
BIGSQL
BigSQL is a pre-made package of PostgreSQL and Hadoop that you can
easily download and install to try out on your local machine. Aside from
Apache Hadoop and PostgreSQL, it also includes Cassandra, Tez, Hive,
ZooKeeper, and HadoopFDW. Extra components such as Pig, Sqoop, and
HBase can be downloaded separately.
▪ Bottom line - pre-made package for trying out Hadoop with PostgreSQL
on your machine
LINGUAL
While other tools provide SQL-like syntax, Cascading’s Lingual claims to
provide a full ANSI SQL interface for Hadoop, allowing easier integration
with existing BI tools and helping SQL-skilled personnel use Hadoop
immediately. Lingual supports JDBC and also includes an SQL shell. Despite
the SQL interface, it still executes queries on Hadoop as batch jobs.
▪ Bottom line - ANSI SQL interface for Hadoop
APACHE PHOENIX
Apache Phoenix is an SQL skin for interactive queries over HBase. It
compiles SQL queries into a series of HBase scans and produces JDBC
result sets. Note that it requires maintaining a schema, which can be
built from scratch or mapped from an existing HBase table. Furthermore,
Phoenix lacks several features: full transaction support, derived tables,
relational operators, and miscellaneous built-in functions (although these
can be added manually). The project is mainly maintained
by Salesforce, Intel, and Hortonworks.
▪ Bottom line - interactive SQL over HBase
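As a rough illustration of what compiling SQL into scans can mean, the sketch below maps a primary-key range predicate onto a range scan over a table kept sorted by row key. It is a toy model in plain Python, not Phoenix's compiler or the HBase client API; the class, function names, and row keys are hypothetical.

```python
import bisect


class SortedKeyValueTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self, rows):
        self._rows = sorted(rows.items())
        self._keys = [key for key, _ in self._rows]

    def scan(self, start_key, stop_key):
        """Return rows with start_key <= key < stop_key, like a range scan."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return self._rows[lo:hi]


def select_where_key_between(table, low, high):
    """Sketch of: SELECT * FROM t WHERE pk >= low AND pk < high.

    The predicate on the primary key becomes the scan boundaries, so only
    the matching key range is read instead of the whole table.
    """
    return table.scan(low, high)


if __name__ == "__main__":
    users = SortedKeyValueTable(
        {
            "user#001": {"name": "ann"},
            "user#002": {"name": "bob"},
            "user#003": {"name": "cat"},
            "user#004": {"name": "dan"},
        }
    )
    print(select_where_key_between(users, "user#002", "user#004"))
```

Because the rows are ordered by key, both scan boundaries are found by binary search, which is why key-range predicates are a good fit for this style of execution.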
IMPALA
Cloudera’s Impala is a query engine that runs on top of Hadoop and
executes interactive SQL queries on HDFS and HBase. While Hive runs
queries as batch jobs, Impala runs them in real time, which lets
SQL-based business intelligence tools work with Hadoop. Although Cloudera
is the main developer behind this tool, it is fully open source. It
supports the following file formats: text, LZO, SequenceFile, Avro and
RCFile. Impala can also run in the cloud via Amazon’s Elastic MapReduce.
▪ Bottom line - Cloudera’s solution for interactive SQL queries over HDFS
and HBase
PRESTO
Presto is also an interactive SQL query engine. It runs on top of Hive,
HBase, and even relational databases and proprietary data stores, thus
combining data from multiple sources across the organization. Facebook is
the main developer behind Presto and the company uses it to query
internal data stores, including a 300PB data warehouse. Airbnb and
Dropbox also use Presto, so it seems tried and tested for the enterprise.
▪ Bottom line - Facebook’s solution for interactive SQL queries over Hive
and HBase
CITUSDB
CitusDB (not to be confused with CitrusDB) is another interactive querying
engine with SQL-like functionality that works over Hadoop. It’s based on
Dremel, Google’s real-time analytics database for processing Big Data,
and unlike Impala and Presto it uses PostgreSQL as the SQL engine behind
the scenes. CitusDB can run on-premises or in the cloud and supports
features such as full-text search and geo search as well as ODBC/JDBC
compatibility. However, being an analytical database, it only supports
loading data in batches.
▪ Bottom line - SQL on Hadoop interactive querying with PostgreSQL
INFINIDB
InfiniDB is a columnar database that integrates with HDFS to perform
real-time analytics on Hadoop with MySQL compatibility. The data is stored
on disk in InfiniDB’s own columnar format, with support for MySQL’s major
data types. Other formats and non-relational data structures aren’t
supported, although Parquet is on the long-term roadmap. The InfiniDB team
recently ran benchmarks against other open source SQL-on-Hadoop engines
and claims much better performance than Hive and Presto. InfiniDB also
supports windowing functions for analytics.
COMMERCIAL SQL-ON-HADOOP TOOLS
HADAPT
Hadapt is a commercial product that brings a native SQL implementation
to Hadoop. Because it combines Hadoop with a storage layer of a
relational database, it allows querying Hadoop via SQL interactively rather
than as a batch process. It can handle structured and unstructured
data without a predefined schema.
▪ Bottom line - interactive SQL querying on Hadoop
JETHRO DATA
Jethro claims the title of "fastest SQL on Hadoop" by providing an SQL
engine for Hadoop that automatically indexes the data as soon as it is
written to Hadoop. According to the company, it executes queries 100
times faster than Hive and 10 times faster than Impala. Jethro can be
added to an existing Hadoop cluster and is meant to be non-intrusive:
it isn’t installed on any of the Hadoop storage nodes.
▪ Bottom line - fast non-intrusive SQL-on-Hadoop via auto-indexing
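The general idea of indexing data as it is written can be sketched as follows. This is a toy model only: the slides do not describe Jethro's actual index structures, so the class, its methods, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


class AutoIndexedTable:
    """Toy table that updates a per-column index on every write."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []
        # index[column][value] -> positions of the rows holding that value
        self.index = {col: defaultdict(list) for col in self.columns}

    def append(self, row):
        """Store the row and index all of its column values immediately."""
        position = len(self.rows)
        self.rows.append(row)
        for col in self.columns:
            self.index[col][row[col]].append(position)

    def select_eq(self, column, value):
        """Sketch of: SELECT * FROM t WHERE column = value, via the index."""
        positions = self.index[column].get(value, [])
        return [self.rows[pos] for pos in positions]


if __name__ == "__main__":
    events = AutoIndexedTable(["country", "device"])
    events.append({"country": "US", "device": "mobile"})
    events.append({"country": "DE", "device": "desktop"})
    events.append({"country": "US", "device": "desktop"})
    print(events.select_eq("country", "US"))
```

An equality filter then touches only the rows listed in the index instead of scanning every row, at the cost of extra work and storage on each write.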
HAWQ
HAWQ (HAdoop With Query) is a commercial SQL-on-Hadoop platform by
Pivotal, a subsidiary of EMC. It provides a parallel SQL query engine using
Pivotal’s Greenplum Analytic Database and Hadoop’s HDFS for data
storage. The engine is intended for analytics, offers full transaction
support, and supports creating external tables on HDFS that read text,
Hive, HBase, and soon Parquet. About a year ago Pivotal was criticized on
the grounds that HAWQ is not a true Hadoop product: the company claims to
have over 300 engineers working on Hadoop, yet none of them contribute to
any of the Hadoop-related projects. As these lines are written, that’s
still true.
▪ Bottom line - Pivotal’s SQL-on-Hadoop
XPLENTY
WWW.XPLENTY.COM
