FlyData: Amazon Redshift BENCHMARK Series 01
Amazon Redshift is 10x faster and cheaper than Hadoop + Hive
Comparisons of speed and cost efficiency
www.flydata.com
Amazon Redshift took 155 seconds to run our query on 1.2TB of data
Hadoop + Hive took 1491 seconds to run our query on 1.2TB of data
Amazon Redshift was about 10X faster
Amazon Redshift cost $20 to run a query every 30 minutes
Hadoop + Hive cost $210 to run a query every 30 minutes
Amazon Redshift was about 10X more cost effective
Amazon Redshift is a new data warehouse for big data on the cloud. Before Redshift, users had to turn to Hadoop to query terabytes of data.
We ran benchmarks comparing Redshift to Hadoop (Amazon Elastic MapReduce), both on AWS, specifically to show the differences for advertising agencies, whose workloads typically look like this:
• Data between 100GB and ~50TB
• Frequent queries (more than once an hour)
• Short turnaround time required
Prerequisite - Data
TSV files, gzip compressed

• imp_log
  – Size: 1) 300GB / 300M records, 2) 1.2TB / 1.2B records
  – Columns:
      date            datetime
      publisher_id    integer
      ad_campaign_id  integer
      bid_price       real
      country         varchar(30)
      attr1-4         varchar(255)
• click_log
  – Size: 1) 1.4GB / 1.5M records, 2) 5.6GB / 6M records
  – Columns:
      date            datetime
      publisher_id    integer
      ad_campaign_id  integer
      country         varchar(30)
      attr1-4         varchar(255)
• ad_campaign: 100MB / 100k records
• publisher: 10MB / 10k records
• advertiser: 10MB / 10k records

1) for 1 month, 2) for 4 months

We use 5 tables to run a query that joins the tables and creates a report.
1. Query Speed
• Redshift takes 155 seconds to complete our query for 1.2TB
• Hadoop takes 1491 seconds to complete our query for 1.2TB
• Redshift is about 10 times faster than Hadoop for this query
Here, we are comparing Hadoop and Redshift servers of the same cost (Hadoop: c1.xlarge vs Redshift: dw.hs1.xlarge).
[Chart: processing time. 300GB: Hadoop 672 sec, Redshift 38 sec. 1.2TB: Hadoop 1491 sec, Redshift 155 sec.]
* The query used can be referenced in our Appendix
2. Total Cost
• Redshift costs $20 per day to run queries every 30 minutes
• Hadoop costs $210 per day to run queries every 30 minutes
• Redshift is about 10 times cheaper than Hadoop to run this job
Here, we are comparing Hadoop and Redshift servers running the same query for the same duration of time.
* The query used can be referenced in our Appendix
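The arithmetic behind the cost comparison can be checked with a short sketch. It uses only figures quoted in this deck ($20 and $210 from the slide above, $20.40 per day from the Redshift result table); the variable names are illustrative and not part of the benchmark.

```python
HOURS_PER_DAY = 24

# Figures quoted in the deck.
redshift_headline_cost = 20      # dollars, Total Cost slide
hadoop_headline_cost = 210       # dollars, Total Cost slide
redshift_cost_per_day = 20.40    # dollars, Redshift Query Result table

# Running a query every 30 minutes means this many runs per day.
runs_per_day = HOURS_PER_DAY * 60 // 30

# Hourly rate implied by the per-day server cost in the Redshift table.
redshift_hourly_rate = redshift_cost_per_day / HOURS_PER_DAY

# Ratio behind the "about 10 times cheaper" claim.
cost_ratio = hadoop_headline_cost / redshift_headline_cost

print(f"runs per day: {runs_per_day}")
print(f"implied Redshift hourly rate: ${redshift_hourly_rate:.2f}")
print(f"Hadoop / Redshift cost ratio: {cost_ratio:.1f}")
```

The ratio comes out slightly above 10, which is consistent with the "about 10 times" wording.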
Redshift Query Result
Data Size  Instance Type  Number of Instances  Trial  Processing Time  Average  Server Cost Per Day
300GB      dw.hs1.xlarge  1                    1      58s              38s      $20.40
                                               2      43s
                                               3      31s
                                               4      30s
                                               5      30s
1.2TB      dw.hs1.xlarge  1                    1      164s             155s     $20.40
                                               2      149s
                                               3      158s
                                               4      156s
                                               5      150s
* The query used can be referenced in our Appendix
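As a sanity check on the averages in the table above, the five trial times per data size can be averaged directly. This is a minimal sketch using the table's numbers; the dictionary layout is just a convenient way to hold them.

```python
from statistics import mean

# Trial processing times in seconds, copied from the Redshift Query Result table.
trials = {
    "300GB": [58, 43, 31, 30, 30],
    "1.2TB": [164, 149, 158, 156, 150],
}

# Mean per data size, plus the spread between fastest and slowest trial.
averages = {size: mean(times) for size, times in trials.items()}
spreads = {size: max(times) - min(times) for size, times in trials.items()}

for size in trials:
    print(f"{size}: mean {averages[size]:.1f}s, spread {spreads[size]}s")
```

Rounded to whole seconds, the means match the 38s and 155s reported in the table.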
Hadoop Query Result
Data Size  Instance Type  Number of Instances  Processing Time  Server Cost Per Day
300GB      c1.xlarge      1                    1h 23m 2s        $0.80
300GB      c1.medium      10                   37m 48s          $0.89
300GB      c1.xlarge      10                   11m 12s          $1.06
1.2TB      m1.xlarge      1                    6h 43m 24s       $3.22
1.2TB      c1.medium      4                    5h 14m 0s        $3.04
1.2TB      c1.xlarge      10                   37m 7s           $3.58
1.2TB      c1.xlarge      20                   24m 51s          $4.64
* The query used can be referenced in our Appendix
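To compare the Hadoop timings with the Redshift figures on a common scale, the durations can be converted to seconds. The sketch below is one way to do that. The helper `to_seconds` and the field names are hypothetical; the data comes from the Hadoop table above, the Redshift averages (38s and 155s) from the Redshift results, and the 30-minute interval from the deck's query schedule.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}

def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h 23m 2s' into seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", duration)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)

# (data size, instance type, number of instances, processing time),
# copied from the Hadoop Query Result table.
hadoop_runs = [
    ("300GB", "c1.xlarge", 1, "1h 23m 2s"),
    ("300GB", "c1.medium", 10, "37m 48s"),
    ("300GB", "c1.xlarge", 10, "11m 12s"),
    ("1.2TB", "m1.xlarge", 1, "6h 43m 24s"),
    ("1.2TB", "c1.medium", 4, "5h 14m 0s"),
    ("1.2TB", "c1.xlarge", 10, "37m 7s"),
    ("1.2TB", "c1.xlarge", 20, "24m 51s"),
]

# Average Redshift processing times in seconds, as reported in the deck.
redshift_avg = {"300GB": 38, "1.2TB": 155}
INTERVAL_SECONDS = 30 * 60  # "query every 30 minutes"

results = []
for size, instance_type, count, duration in hadoop_runs:
    seconds = to_seconds(duration)
    results.append({
        "size": size,
        "cluster": f"{count} x {instance_type}",
        "seconds": seconds,
        # How many times longer the Hadoop run took than Redshift's average.
        "slowdown_vs_redshift": round(seconds / redshift_avg[size], 1),
        # Whether one run finishes before the next one is due.
        "fits_interval": seconds <= INTERVAL_SECONDS,
    })

for r in results:
    print(r)
```

With these numbers, the 20-instance c1.xlarge cluster is the only 1.2TB configuration that finishes inside a 30-minute window. Its 24m 51s equals the 1491 seconds quoted for Hadoop on the Query Speed slide.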
Discussion
• Consider Redshift
– If your data is big (>TB) and you need to run your queries more than once an hour
– If you want to get quick results
• Consider Hadoop (EMR)
– If your data is too big (>PB)
– If your queries run only once a day, week or month
– If you have already invested in Hadoop technology specialists
APPENDIX – Sample Query
select
  ac.ad_campaign_id as ad_campaign_id,
  adv.advertiser_id as advertiser_id,
  cs.spending as spending,
  ims.imp_total as imp_total,
  cs.click_total as click_total,
  click_total/imp_total as CTR,
  spending/click_total as CPC,
  spending/(imp_total/1000) as CPM
from
  ad_campaigns ac
join
  advertisers adv
  on (ac.advertiser_id = adv.advertiser_id)
join
  (select
     il.ad_campaign_id,
     count(*) as imp_total
   from
     imp_logs il
   group by
     il.ad_campaign_id
  ) ims on (ims.ad_campaign_id = ac.ad_campaign_id)
join
  (select
     cl.ad_campaign_id,
     sum(cl.bid_price) as spending,
     count(*) as click_total
   from
     click_logs cl
   group by
     cl.ad_campaign_id
  ) cs on (cs.ad_campaign_id = ac.ad_campaign_id);
The query generates a basic ad campaign performance report: impression and click counts, advertiser spending, CTR, CPC and CPM.
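To see what the report looks like on a few rows, the query can be tried against a small in-memory SQLite database. This is only a sketch under stated assumptions: SQLite stands in for Redshift and Hive, the toy tables hold just the columns the query touches (including a `bid_price` column on `click_logs`, which the query reads), and the sample values are made up. One deliberate change from the query above: the ratios are computed with an explicit cast and `1000.0`, because in SQLite (and in other engines where dividing two integers truncates) `click_total/imp_total` would come out as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table advertisers  (advertiser_id integer);
create table ad_campaigns (ad_campaign_id integer, advertiser_id integer);
create table imp_logs     (ad_campaign_id integer, bid_price real);
create table click_logs   (ad_campaign_id integer, bid_price real);
""")

# Made-up sample data: one advertiser, one campaign,
# 2000 impressions and 50 clicks priced at 0.25 each.
conn.execute("insert into advertisers values (7)")
conn.execute("insert into ad_campaigns values (1, 7)")
conn.executemany("insert into imp_logs values (1, ?)", [(0.25,)] * 2000)
conn.executemany("insert into click_logs values (1, ?)", [(0.25,)] * 50)

REPORT_SQL = """
select
  ac.ad_campaign_id,
  adv.advertiser_id,
  cs.spending,
  ims.imp_total,
  cs.click_total,
  cast(cs.click_total as real) / ims.imp_total as ctr,
  cs.spending / cs.click_total as cpc,
  cs.spending / (ims.imp_total / 1000.0) as cpm
from ad_campaigns ac
join advertisers adv on (ac.advertiser_id = adv.advertiser_id)
join (select il.ad_campaign_id, count(*) as imp_total
      from imp_logs il group by il.ad_campaign_id
     ) ims on (ims.ad_campaign_id = ac.ad_campaign_id)
join (select cl.ad_campaign_id, sum(cl.bid_price) as spending,
             count(*) as click_total
      from click_logs cl group by cl.ad_campaign_id
     ) cs on (cs.ad_campaign_id = ac.ad_campaign_id)
"""

report = conn.execute(REPORT_SQL).fetchone()
print(report)

# Integer division in SQLite truncates, which is why the cast is used above.
truncated_ctr = conn.execute("select 50 / 2000").fetchone()[0]
print(truncated_ctr)
```

Each output row carries the campaign and advertiser ids, total spending, impression and click counts, and the three derived metrics.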
APPENDIX - Additional Comments
• Redshift is good for aggregate calculations such as sum, average, max, min, etc. because it is a columnar database
• Importing large amounts of data takes a lot of time
– 17 hours for 1.2TB in our case
– Continuous importing is useful
• Redshift supports only delimiter-separated formats such as CSV and TSV
– JSON is not supported
• Redshift supports only primitive data types
– 11 types: INT, DOUBLE, BOOLEAN, VARCHAR, DATE, ... (as of Feb. 17, 2013)
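The point about columnar storage and aggregates can be illustrated with a toy example. The sketch below is not how Redshift stores data internally; it only contrasts a row-oriented layout with a column-oriented one for the same made-up records and counts how many values an aggregate over one column has to touch in each case.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"ad_campaign_id": 1, "country": "US", "bid_price": 0.5},
    {"ad_campaign_id": 1, "country": "JP", "bid_price": 0.25},
    {"ad_campaign_id": 2, "country": "US", "bid_price": 0.75},
    {"ad_campaign_id": 2, "country": "DE", "bid_price": 1.0},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Summing one field from rows means walking every full record.
row_total = sum(row["bid_price"] for row in rows)
values_read_row_layout = len(rows) * len(rows[0])

# Summing from the columnar layout reads only the one column involved.
col_total = sum(columns["bid_price"])
values_read_column_layout = len(columns["bid_price"])

print(row_total, values_read_row_layout)
print(col_total, values_read_column_layout)
```

Both layouts give the same sum, but the columnar one reads a third of the values here. The gap widens as tables gain more columns, such as the attr1-4 fields in the benchmark's log tables.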
APPENDIX – Additional Information
• All resources for our benchmark are in our GitHub repository
– https://github.com/hapyrus/redshift-benchmark
– The dataset we use is open on S3, so you can reproduce the benchmark
About Us - FlyData
• FlyData Enterprise
– Enables continuous, real-time data loading to Amazon Redshift
– Automated ETL process with multiple supported data formats
– Auto scaling, data integrity and high durability
– FlyData Sync feature allows real-time replication from RDBMS to Amazon Redshift
Contact us at: info@flydata.com
We are an official data integration partner of Amazon Redshift
Formerly known as Hapyrus
Check us out!
-> http://flydata.com
sales@flydata.com
Toll Free: 1-855-427-9787
