BIG DATA PROJECT –
ROLLER COASTER TYCOON
DATA ANALYSIS
Priyansh Gupta – 19CSU225
Radhika Mongia – 19CSU233
Rashi Gupta – 19CSU241
OBJECTIVE
The objective of the project is to analyze the
Roller Coaster dataset and answer a set of
queries using Hive.
DATA FORMAT: ALL THE FIELDS ARE COMMA-DELIMITED
1. park_id int,
2. theme string,
3. rollercoaster_type string,
4. custom_design int,
5. excitement double,
6. excitement_rating string,
7. intensity double,
8. intensity_rating string,
9. nausea double,
10. nausea_rating string,
11. max_speed double,
12. avg_speed double,
13. ride_time int,
14. ride_length int,
15. max_pos_gs double,
16. max_neg_gs double,
17. max_lateral_gs double,
18. total_air_time double,
19. drops int,
20. highest_drop_height int,
21. inversions int
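A table matching this layout could be declared roughly as follows. This is a sketch, not the statement the authors ran: the column names and types come from the list above and the table name from the load step, while the storage clauses are assumptions about a plain comma-separated text file. If the CSV starts with a header line, the table property "skip.header.line.count"="1" is one way to skip it; a simple delimited format also does not handle commas inside quoted fields.

```sql
-- Sketch: columns follow the data format list; the storage clauses are assumed.
create table if not exists rollercoaster (
  park_id int,
  theme string,
  rollercoaster_type string,
  custom_design int,
  excitement double,
  excitement_rating string,
  intensity double,
  intensity_rating string,
  nausea double,
  nausea_rating string,
  max_speed double,
  avg_speed double,
  ride_time int,
  ride_length int,
  max_pos_gs double,
  max_neg_gs double,
  max_lateral_gs double,
  total_air_time double,
  drops int,
  highest_drop_height int,
  inversions int
)
row format delimited
fields terminated by ','
stored as textfile;
```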
DATA SET
● Load the data into the Hive table from the local file system
Load Data Local Inpath
'/home/cloudera/Desktop/rollercoasters.csv'
Overwrite Into table rollercoaster;
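After the load, a quick check along these lines may help confirm that the rows arrived and the columns parsed. It is only a sketch: the aliases are illustrative, and it relies on Hive returning NULL for a text field that cannot be read as the declared type.

```sql
-- Total rows versus rows whose first column parsed as an int;
-- a difference can point to a header line or malformed rows.
select count(*) as total_rows, count(park_id) as rows_with_park_id
from rollercoaster;

-- Inspect a few rows to check that the values line up with the columns.
select * from rollercoaster limit 5;
```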
Question 1
● Count the roller coasters for each combination of excitement rating and nausea rating, and also print the theme name
select theme, excitement_rating, nausea_rating , count(rollercoaster_type) from
rollercoaster group by
excitement_rating, nausea_rating,theme;
Question 2
● Count the roller coasters, grouped by excitement rating and drop height,
a) where the excitement rating is the highest (Very High) and highest_drop_height > 50
b) where the excitement rating is High and highest_drop_height > 50, also printing the park_id.
a) select excitement_rating, highest_drop_height , count(rollercoaster_type) from
rollercoaster
where excitement_rating = 'Very High' and
highest_drop_height > 50 group by excitement_rating, highest_drop_height;
b) select park_id, excitement_rating, highest_drop_height,
count(rollercoaster_type)
from rollercoaster
where excitement_rating = 'High' and highest_drop_height > 50
group by park_id, excitement_rating, highest_drop_height;
Question 3
a) Find the rollercoaster_type, excitement rating, intensity rating and nausea rating of the row where
total_air_time is maximum, and
b) Find the total_air_time of the rows whose excitement, intensity and nausea ratings match those of
the row where total_air_time is maximum.
a) select distinct rollercoaster_type,excitement_rating,intensity_rating ,
nausea_rating
from rollercoaster as r1 where r1.total_air_time in (select
max(r2.total_air_time) from rollercoaster as r2);
b) select r1.total_air_time from rollercoaster as r1 inner join
(select distinct excitement_rating,intensity_rating , nausea_rating from
rollercoaster as rc where
rc.total_air_time in (select max(total_air_time) from rollercoaster)) as t
on t.excitement_rating = r1.excitement_rating
and t.intensity_rating = r1.intensity_rating and t.nausea_rating = r1.nausea_rating;
Question 4
a) Find the rollercoaster_type, excitement rating, intensity rating and
nausea rating of the row where avg_speed is maximum, and
b) Compare the max_speed of the rows whose excitement, intensity and
nausea ratings match those of the row where avg_speed is maximum.
a) select rollercoaster_type,excitement_rating,intensity_rating,
nausea_rating from rollercoaster r1 where r1.avg_speed in
(select max(r2.avg_speed) from rollercoaster r2);
b) select r3.max_speed from rollercoaster as r3 inner join (select
excitement_rating,intensity_rating,
nausea_rating from rollercoaster r1 where r1.avg_speed
in(select max(r2.avg_speed) from rollercoaster r2) ) as t
on r3.intensity_rating = t.intensity_rating
and r3.excitement_rating = t.excitement_rating and r3.nausea_rating =
t.nausea_rating;
Question 5
● Find the park_id and rollercoaster_type where the number of drops is greater than
10, grouping rows that share the same excitement rating.
select x.park_id,x.rollercoaster_type from (select park_id,rollercoaster_type
,excitement_rating
from rollercoaster where drops>10 group by park_id,rollercoaster_type
,excitement_rating) as x;
Question 6
● Group rollercoaster_type by custom_design where the excitement rating
is High.
select custom_design ,rollercoaster_type from rollercoaster where
excitement_rating = 'High' group by
custom_design,rollercoaster_type;
Question 7
● What are the excitement and nausea ratings when ride_length is greater than 2000 and
max_speed is greater than 50?
Select distinct excitement_rating,nausea_rating from rollercoaster where
ride_length > 2000 and max_speed >50;
Question 8
● Find the park names (theme) where at least 2 rides have a High excitement rating.
Select x.theme from (select theme,count(excitement_rating) from rollercoaster
where excitement_rating = 'High'
group by theme having count(excitement_rating)>=2 ) as x;
Question 9
● Which roller coaster has both the highest excitement rating (Very High) and the highest avg_speed?
Select rollercoaster_type from rollercoaster r1 where excitement_rating = 'Very High'
and r1.avg_speed in (select max(r2.avg_speed ) from rollercoaster as r2 );
Question 10
● Name the roller coasters where total_air_time is greater than 5 but the excitement rating is still not
Very High.
Select rollercoaster_type from rollercoaster where total_air_time>5 and
excitement_rating <> 'Very High';
Question 11
● For rides with ride_length greater than 2000 and avg_speed greater than 10, find the avg_speed and
excitement rating, grouped by excitement rating and avg_speed.
select excitement_rating,avg_speed from rollercoaster
where ride_length >2000 and avg_speed>10 group by excitement_rating,avg_speed;
Question 12
● When max_pos_gs > 3 and max_neg_gs > -2, name the roller coasters whose
intensity is greater than their excitement.
Select rollercoaster_type from rollercoaster where max_pos_gs>3 and
max_neg_gs>-2 and intensity > excitement;
Question 13
● When max_pos_gs >= 3 and max_neg_gs >= -2, count the roller coasters, grouped according to whether
a) intensity is greater than, equal to, or less than excitement (the query shown covers the greater-than case), and
b) Find the same when the condition max_pos_gs >= 4 and max_neg_gs >= 1 is not true.
a) Select count(distinct(rollercoaster_type)) ,intensity_rating from rollercoaster
where max_pos_gs>=3 and max_neg_gs>=-2 and intensity > excitement group by
intensity_rating;
b) Select count(distinct(rollercoaster_type)) ,intensity_rating from
rollercoaster
where not (max_pos_gs>=4 and max_neg_gs>=1) and intensity > excitement group by
intensity_rating;
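Part a) of the question mentions intensity being greater than, equal to, or less than excitement, while the queries here filter on the greater-than case only. One possible way to count all three cases in a single pass, offered as a sketch rather than the authors' method, is to group by a CASE expression. The labels and aliases below are illustrative.

```sql
-- Sketch: label each row by how intensity compares with excitement,
-- then count distinct coaster types per label.
-- Rows where either value is NULL fall into a NULL group.
select
  case
    when intensity > excitement then 'greater'
    when intensity = excitement then 'equal'
    when intensity < excitement then 'less'
  end as intensity_vs_excitement,
  count(distinct rollercoaster_type) as coaster_types
from rollercoaster
where max_pos_gs >= 3 and max_neg_gs >= -2
group by
  case
    when intensity > excitement then 'greater'
    when intensity = excitement then 'equal'
    when intensity < excitement then 'less'
  end;
```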
Question 14
● What are the excitement ratings when the nausea rating is Low?
select distinct excitement_rating from rollercoaster where nausea_rating =
'Low';
Question 15
● Group rollercoaster_type by custom_design where the intensity rating is Very High and ride_length is
greater than 2000.
Select custom_design, rollercoaster_type from rollercoaster where intensity_rating=
'Very High'
and ride_length>2000 group by custom_design , rollercoaster_type ;
THANK YOU
