Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
Talk at the CS department in IIT 04/02/09.
1.
Hadoop and Hive: Large Scale Data Processing using Commodity HW/SW
Joydeep Sen Sarma
2.
3.
4.
5.
Looks like this ..
[Figure: six nodes, each with attached disks, connected by network links labeled 1 Gigabit and 4-8 Gigabit. Node = DataNode + Map-Reduce.]
6.
7.
In pictures ..
[Figure: NameNode (disks, 32GB RAM), Secondary NameNode (disks, 32GB RAM), six DataNodes, and a DFS Client. Arrows at the DFS Client are labeled getLocations and locations.]
8.
9.
10.
Map/Reduce Dataflow
11.
12.
13.
HIVE: Components
[Figure: component diagram. Labels: HDFS; Map Reduce; Hive CLI (DDL, Queries, Browsing); Mgmt. Web UI; Thrift API; MetaStore; Hive QL (Parser, Planner, Execution); SerDe (Thrift, Jute, JSON..).]
14.
Data Model
[Figure: the clicks table as recorded in the MetaStore and laid out in HDFS.]
MetaStore: Schema Library, Partitioning Cols, Bucketing Info (#Buckets=32)
HDFS layout:
  Tables                /hive/clicks
  Logical Partitioning  /hive/clicks/ds=2008-03-25
  Hash Partitioning     /hive/clicks/ds=2008-03-25/0 …
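The directory layout on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `partition_path` and its parameters are hypothetical, and `zlib.crc32` stands in for whatever hash Hive actually uses to pick a bucket, which the slide does not state.

```python
import zlib


def partition_path(table, ds, key, num_buckets=32, root="/hive"):
    """Build an illustrative HDFS path: table dir / ds partition / bucket number."""
    # Assumption: crc32 is used only because it is stable across runs.
    # The slide does not say which hash function assigns rows to buckets.
    bucket = zlib.crc32(str(key).encode("utf-8")) % num_buckets
    return f"{root}/{table}/ds={ds}/{bucket}"


path = partition_path("clicks", "2008-03-25", key=111)
print(path)
```

The point of the layout is that a query restricted to one `ds` value only needs to read one directory, and a row's bucket file follows from hashing its key.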
15.
16.
17.
18.
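Hive QL – Join in Map Reduce
Stages: Map, Shuffle, Sort, Reduce

page_view (input)
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
  1       222     9:08:14

user (input)
  userid  age  gender
  111     25   female
  222     32   male

Map output (from page_view)
  key  value
  111  <1,1>
  111  <1,2>
  222  <1,1>

Map output (from user)
  key  value
  111  <2,25>
  222  <2,32>

After Shuffle and Sort (first reducer)
  key  value
  111  <1,1>
  111  <1,2>
  111  <2,25>

After Shuffle and Sort (second reducer)
  key  value
  222  <1,1>
  222  <2,32>

pv_users (Reduce output, first reducer)
  pageid  age
  1       25
  2       25

pv_users (Reduce output, second reducer)
  pageid  age
  1       32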
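The join on this slide can be reproduced with a small in-memory simulation. This is a sketch of the reduce-side join idea using plain Python, not Hive's implementation; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical. The tag values 1 (page_view) and 2 (user) follow the slide.

```python
from collections import defaultdict

# Input tables copied from the slide.
page_view = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]
user = [(111, 25, "female"), (222, 32, "male")]


def map_phase():
    """Emit (userid, <tag, payload>) pairs; the tag records the source table."""
    for pageid, userid, _time in page_view:
        yield userid, (1, pageid)
    for userid, age, _gender in user:
        yield userid, (2, age)


def shuffle(pairs):
    """Group values by key and sort them, standing in for Shuffle and Sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}


def reduce_phase(groups):
    """For each userid, pair every pageid (tag 1) with every age (tag 2)."""
    for _userid, values in groups.items():
        pageids = [payload for tag, payload in values if tag == 1]
        ages = [payload for tag, payload in values if tag == 2]
        for pageid in pageids:
            for age in ages:
                yield pageid, age


pv_users = list(reduce_phase(shuffle(map_phase())))
print(pv_users)
```

Tagging each value with its source table is what lets a single reducer tell page_view rows from user rows after both have been shuffled to the same key.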
19.
20.
21.
22.
23.
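Hive QL – Group By in Map Reduce
Stages: Map, Shuffle, Sort, Reduce

pv_users (input, first split)
  pageid  age
  1       25
  2       25

pv_users (input, second split)
  pageid  age
  1       32
  2       25

Map output (first split)
  key     value
  <1,25>  1
  <2,25>  1

Map output (second split)
  key     value
  <1,32>  1
  <2,25>  1

After Shuffle and Sort (first reducer)
  key     value
  <1,25>  1
  <1,32>  1

After Shuffle and Sort (second reducer)
  key     value
  <2,25>  1
  <2,25>  1

Reduce output (first reducer)
  pageid  age  count
  1       25   1
  1       32   1

Reduce output (second reducer)
  pageid  age  count
  2       25   2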
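The group-by flow on this slide can be simulated the same way. This is a minimal sketch in plain Python rather than Hive's actual execution plan; the names `map_split` and `group_by_count` are hypothetical, and a single dictionary stands in for the two reducers shown.

```python
from collections import defaultdict

# The two input splits of pv_users, copied from the slide.
splits = [
    [(1, 25), (2, 25)],
    [(1, 32), (2, 25)],
]


def map_split(rows):
    """Emit (<pageid, age>, 1) for every input row."""
    for pageid, age in rows:
        yield (pageid, age), 1


def group_by_count(all_splits):
    """Shuffle map output by key, then sum the ones for each key."""
    shuffled = defaultdict(list)
    for rows in all_splits:
        for key, one in map_split(rows):
            shuffled[key].append(one)
    return [
        (pageid, age, sum(ones))
        for (pageid, age), ones in sorted(shuffled.items())
    ]


counts = group_by_count(splits)
print(counts)
```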
24.
25.
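Hive QL – Group By with Distinct in Map Reduce
Stages: Shuffle and Sort, Reduce, then a second Map Reduce

page_view (input, first split)
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13

page_view (input, second split)
  pageid  userid  time
  1       222     9:08:14
  2       111     9:08:20

After Shuffle and Sort (first reducer)
  key      v
  <1,111>
  <2,111>
  <2,111>

After Shuffle and Sort (second reducer)
  key      v
  <1,222>

Reduce output (first reducer)
  pageid  count
  1       1
  2       1

Reduce output (second reducer)
  pageid  count
  1       1

Output of the second Map Reduce
  pageid  count
  1       2
  2       1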
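The two-stage plan on this slide can be sketched as follows. This is an illustration in plain Python, not Hive's implementation; `stage_one` and `stage_two` are hypothetical names. It assumes rows are routed to reducers by userid, which matches the grouping shown on the slide but is not stated there.

```python
from collections import defaultdict

# page_view rows from both input splits on the slide.
page_view = [
    (1, 111, "9:08:01"),
    (2, 111, "9:08:13"),
    (1, 222, "9:08:14"),
    (2, 111, "9:08:20"),
]


def stage_one(rows):
    """Shuffle on <pageid, userid>, drop duplicates, count per pageid per reducer."""
    partitions = defaultdict(set)
    for pageid, userid, _time in rows:
        # Assumption: one reducer per userid. The set removes repeated keys.
        partitions[userid].add((pageid, userid))
    partials = []
    for userid in sorted(partitions):
        per_page = defaultdict(int)
        for pageid, _userid in partitions[userid]:
            per_page[pageid] += 1
        partials.append(sorted(per_page.items()))
    return partials


def stage_two(partials):
    """Second Map Reduce: sum the partial counts for each pageid."""
    totals = defaultdict(int)
    for partial in partials:
        for pageid, count in partial:
            totals[pageid] += count
    return sorted(totals.items())


partials = stage_one(page_view)
distinct_counts = stage_two(partials)
print(distinct_counts)
```

Because each distinct <pageid, userid> pair lands on exactly one reducer, the partial counts can be added in the second stage without counting any user twice.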
26.
27.
28.
29.
30.
31.
32.
Data Warehousing at Facebook Today
[Figure: data flow diagram with components Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL.]
33.
34.
In Pictures
35.
36.
37.
Editor's Notes
Offline and near-real-time data processing, not online.