Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
Talk at the CS department in IIT 04/02/09.
1.
Hadoop and Hive
Large Scale Data Processing using Commodity HW/SW
Joydeep Sen Sarma
2.
3.
4.
5.
Looks like this ..
[Figure: six nodes, each with attached disks, joined by network links labeled "1 Gigabit" and "4-8 Gigabit". Node = DataNode + Map-Reduce]
6.
7.
In pictures ..
[Figure: NameNode (disks, 32GB RAM), Secondary NameNode (disks, 32GB RAM), six DataNodes, and a DFS Client; labels on the client exchange: getLocations, locations]
8.
9.
10.
Map/Reduce DataFlow
11.
12.
13.
HIVE: Components
[Figure: Hive CLI (DDL, Queries, Browsing); Mgmt. Web UI; Hive QL (Parser, Planner, Execution); MetaStore with Thrift API; SerDe (Thrift, Jute, JSON..); Map Reduce; HDFS]
14.
Data Model
Tables: clicks
  HDFS: /hive/clicks
Logical Partitioning:
  HDFS: /hive/clicks/ds=2008-03-25
Hash Partitioning:
  HDFS: /hive/clicks/ds=2008-03-25/0 …
MetaStore: Schema Library, Partitioning Cols, Bucketing Info, #Buckets=32
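The path layout on this slide can be sketched in a few lines of Python. This is only an illustration of the naming scheme shown (table directory, then a ds= partition directory, then a numbered bucket); the function names are hypothetical, and CRC32 is a stand-in hash chosen for determinism, not the hash function Hive itself uses.

```python
import zlib

NUM_BUCKETS = 32  # "#Buckets=32" from the slide


def partition_dir(table: str, ds: str, root: str = "/hive") -> str:
    """Directory for one logical partition, e.g. /hive/clicks/ds=2008-03-25."""
    return f"{root}/{table}/ds={ds}"


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket by hashing the key (assumption: CRC32, not Hive's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucket_path(table: str, ds: str, bucket: int, root: str = "/hive") -> str:
    """File for one hash bucket inside a partition directory."""
    return f"{partition_dir(table, ds, root)}/{bucket}"


if __name__ == "__main__":
    print(partition_dir("clicks", "2008-03-25"))
    print(bucket_path("clicks", "2008-03-25", 0))
```

Because the partition value is part of the directory name, a query restricted to one ds only needs to read that directory.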
15.
16.
17.
18.
Hive QL – Join in Map Reduce

page_view
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
  1       222     9:08:14

user
  userid  age  gender
  111     25   female
  222     32   male

Map (page_view)
  key  value
  111  <1,1>
  111  <1,2>
  222  <1,1>

Map (user)
  key  value
  111  <2,25>
  222  <2,32>

Shuffle Sort
  key  value
  111  <1,1>
  111  <1,2>
  111  <2,25>

  key  value
  222  <1,1>
  222  <2,32>

Reduce (pv_users)
  pageid  age
  1       25
  2       25

  pageid  age
  1       32
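The join on this slide can be simulated in plain Python. The sketch below is not Hive's implementation; it assumes the first element of each value is a tag for the source table (1 for page_view, 2 for user), which is how the figure reads, and the function names are made up for illustration.

```python
from collections import defaultdict

# Rows taken from the slide.
page_view = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]
user = [(111, 25, "female"), (222, 32, "male")]


def map_page_view(row):
    """Emit userid as key, and <1, pageid> as the tagged value."""
    pageid, userid, _time = row
    return userid, (1, pageid)


def map_user(row):
    """Emit userid as key, and <2, age> as the tagged value."""
    userid, age, _gender = row
    return userid, (2, age)


def shuffle_sort(pairs):
    """Group values by key and sort them, as the shuffle/sort phase does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}


def reduce_join(values):
    """Pair every page_view value with every user value for one userid."""
    pageids = [v for tag, v in values if tag == 1]
    ages = [v for tag, v in values if tag == 2]
    return [(pageid, age) for pageid in pageids for age in ages]


def join():
    pairs = [map_page_view(r) for r in page_view] + [map_user(r) for r in user]
    pv_users = []
    for _userid, values in shuffle_sort(pairs).items():
        pv_users.extend(reduce_join(values))
    return pv_users


if __name__ == "__main__":
    for pageid, age in join():
        print(pageid, age)
```

Tagging the values is what lets a single reducer tell the two inputs apart after they have been merged under the same key.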
19.
20.
21.
22.
23.
Hive QL – Group By in Map Reduce

pv_users (first split)
  pageid  age
  1       25
  2       25

pv_users (second split)
  pageid  age
  1       32
  2       25

Map (first split)
  key     value
  <1,25>  1
  <2,25>  1

Map (second split)
  key     value
  <1,32>  1
  <2,25>  1

Shuffle Sort
  key     value
  <1,25>  1
  <1,32>  1

  key     value
  <2,25>  1
  <2,25>  1

Reduce
  pageid  age  count
  1       25   1
  1       32   1

  pageid  age  count
  2       25   2
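As a rough sketch of the same flow in Python (an illustration using the slide's rows, not Hive's generated plan; the function names are hypothetical): each mapper emits the grouping columns as the key with a value of 1, and the reducer sums the values per key.

```python
from collections import defaultdict

# The two input splits of pv_users shown on the slide.
splits = [
    [(1, 25), (2, 25)],
    [(1, 32), (2, 25)],
]


def map_row(row):
    """Emit <pageid, age> as key and 1 as value."""
    return row, 1


def group_by_count(input_splits):
    """Map every row, shuffle by key, then sum the ones in the reduce step."""
    shuffled = defaultdict(list)
    for split in input_splits:
        for row in split:
            key, one = map_row(row)
            shuffled[key].append(one)
    return [
        (pageid, age, sum(ones))
        for (pageid, age), ones in sorted(shuffled.items())
    ]


if __name__ == "__main__":
    for pageid, age, count in group_by_count(splits):
        print(pageid, age, count)
```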
24.
25.
Hive QL – Group By with Distinct in Map Reduce

page_view (first split)
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13

page_view (second split)
  pageid  userid  time
  1       222     9:08:14
  2       111     9:08:20

Shuffle and Sort
  key      v
  <1,111>
  <2,111>
  <2,111>

  key      v
  <1,222>

Reduce
  pageid  count
  1       1
  2       1

  pageid  count
  1       1

Map Reduce
  pageid  count
  1       2

  pageid  count
  2       1
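The figure reads as a two-stage count of distinct userids per pageid, and that reading can be sketched in Python as follows. This is an illustration under assumptions, not Hive's actual plan: the partitioner (userid modulo the number of reducers) and the function names are hypothetical, chosen so that identical <pageid, userid> pairs always reach the same reducer.

```python
from collections import Counter, defaultdict

# The two input splits of page_view shown on the slide.
splits = [
    [(1, 111, "9:08:01"), (2, 111, "9:08:13")],
    [(1, 222, "9:08:14"), (2, 111, "9:08:20")],
]


def stage1(input_splits, num_reducers=2):
    """Shuffle on <pageid, userid>, drop duplicates, emit partial counts."""
    reducers = defaultdict(list)
    for split in input_splits:
        for pageid, userid, _time in split:
            # Assumed partitioner: equal pairs always land on the same reducer.
            reducers[userid % num_reducers].append((pageid, userid))
    partials = []
    for idx in sorted(reducers):
        distinct_pairs = sorted(set(reducers[idx]))
        counts = Counter(pageid for pageid, _userid in distinct_pairs)
        partials.append(sorted(counts.items()))
    return partials


def stage2(partials):
    """Second map-reduce pass: sum the partial counts per pageid."""
    totals = Counter()
    for partial in partials:
        for pageid, count in partial:
            totals[pageid] += count
    return sorted(totals.items())


if __name__ == "__main__":
    for pageid, count in stage2(stage1(splits)):
        print(pageid, count)
```

The duplicate <2,111> pair collapses in the first stage, so page 2 ends with a count of 1 even though it was viewed twice.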
26.
27.
28.
29.
30.
31.
32.
Data Warehousing at Facebook Today
[Figure: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL]
33.
34.
In Pictures
35.
36.
37.
Editor's Notes
Offline and near-real-time data processing, not online.