© Hortonworks Inc. 2011 – 2017 All Rights Reserved
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Steve Loughran
Sanjay Radia
April 2017
Why?
• No upfront hardware costs
• Data ingress for IoT, mobile apps
• Cost effective if sufficiently agile
Object Stores are a key part of agile cloud applications
⬢ It's the only persistent store for in-cloud clusters
⬢ Object stores are the source and final destination of work
⬢ Cross-application data storage
⬢ Asynchronous data exchange
⬢ External data sources
Also useful in physical clusters!
Cloud Storage Integration: Evolution for Agility
[Diagram: Application and HDFS, with cloud storage used for input, output, backup/restore, upload and tmp data. Panels: Azure, AWS (today); Goal: evolution towards cloud storage as the persistent Data Lake]
Elastic ETL
[Diagram: inbound data and external sources processed by an elastic ETL cluster using HDFS; results stored as ORC datasets]
Notebooks
[Diagram: notebooks working with datasets, a library and external data]
Streaming
Danger:
Object stores are not filesystems∗
Cost & Geo-distribution over
Consistency and Performance
∗for more information, please re-read
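A Filesystem: Directories, Files → Data
[Diagram: directory tree / → work → pending (part-00, part-01) and complete; file data is held in blocks (00, 01) stored on several servers; after the rename, part-01 sits under /work/complete]
rename("/work/pending/part-01", "/work/complete")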
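To make the point of this slide concrete, here is a minimal Python model of a hierarchical filesystem (a hypothetical toy, not HDFS code; the classes `Dir` and `File` and the functions `lookup` and `rename` are illustrative names). Directories map names to child nodes and a file only references its blocks, so a rename relinks one directory entry and copies no data.

```python
# Toy model of a hierarchical filesystem (hypothetical, not HDFS code):
# directories map names to child nodes; a file only references its blocks.

class Dir:
    def __init__(self):
        self.children = {}

class File:
    def __init__(self, blocks):
        self.blocks = blocks  # block IDs; the block data lives elsewhere

def lookup(root, path):
    node = root
    for name in [p for p in path.split("/") if p]:
        node = node.children[name]
    return node

def rename(root, src, dest_dir):
    """Move src into dest_dir by relinking a single directory entry.
    No block is copied, so the cost does not depend on file size;
    HDFS performs this atomically on the NameNode."""
    parent_path, name = src.rsplit("/", 1)
    node = lookup(root, parent_path).children.pop(name)
    lookup(root, dest_dir).children[name] = node
    return node

# Build /work/pending/{part-00, part-01} and an empty /work/complete.
root = Dir()
work = Dir()
pending = Dir()
complete = Dir()
root.children["work"] = work
work.children["pending"] = pending
work.children["complete"] = complete
pending.children["part-00"] = File(["00"])
pending.children["part-01"] = File(["01"])

moved = rename(root, "/work/pending/part-01", "/work/complete")
```

The same `File` object is now reachable under `/work/complete`; only the parent directories changed.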
Object Store: hash(name)⇒data
[Diagram: the data of part-00 and part-01 spread across storage servers s01, s02, s03, s04]
hash("/work/pending/part-01") → ["s02", "s03", "s04"]
copy("/work/pending/part-01", "/work/complete/part-01")
delete("/work/pending/part-01")
hash("/work/pending/part-00") → ["s01", "s02", "s04"]
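The copy() + delete() sequence above can be sketched with a flat key/value map. This is a hypothetical toy (`ObjectStore` and `rename_prefix` are illustrative names, not a real client API): "directories" are only key prefixes, so renaming one means copying every object under the prefix and deleting the original, with the work growing with the bytes moved and no atomicity.

```python
# Toy object store (hypothetical): a flat map from key to bytes.
# There are no directories; "/" is just a character inside the key.

class ObjectStore:
    def __init__(self):
        self.objects = {}
        self.bytes_copied = 0  # the work a "rename" really costs

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dest):
        data = self.objects[src]
        self.objects[dest] = data
        self.bytes_copied += len(data)  # a real store copies every byte

    def delete(self, key):
        self.objects.pop(key, None)

    def list(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))

def rename_prefix(store, src_prefix, dest_prefix):
    """Emulate a directory rename as copy + delete of each object.
    Not atomic: a failure part-way leaves objects under both prefixes."""
    for key in store.list(src_prefix):
        store.copy(key, dest_prefix + key[len(src_prefix):])
        store.delete(key)

store = ObjectStore()
store.put("/work/pending/part-00", b"0" * 1000)
store.put("/work/pending/part-01", b"1" * 3000)
rename_prefix(store, "/work/pending/", "/work/complete/")
```

After the call, `store.bytes_copied` is 4000: every byte of both files was moved, unlike the filesystem rename.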
Often: Eventually Consistent
[Diagram: copies of part-00 and part-01 held on storage servers s01, s02, s03, s04]
DELETE /work/pending/part-00 → 200
GET /work/pending/part-00 → 200
GET /work/pending/part-00 → 200
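The request trace above can be reproduced with a small simulation. This is a sketch under stated assumptions, not how any particular store is built: three replicas, a write lands on replica 0 first and reaches the others only when the hypothetical `propagate()` method runs (a stand-in for background replication), and each `get` names the replica it happens to hit.

```python
# Toy eventually consistent store (hypothetical model): writes go to
# replica 0 at once and reach the other replicas only on propagate().

class EventuallyConsistentStore:
    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]
        self.backlog = []  # (key, data) updates not yet on replicas 1..n-1

    def _apply(self, replica, key, data):
        if data is None:          # None marks a delete
            replica.pop(key, None)
        else:
            replica[key] = data

    def put(self, key, data):
        self._apply(self.replicas[0], key, data)
        self.backlog.append((key, data))
        return 200

    def delete(self, key):
        self._apply(self.replicas[0], key, None)
        self.backlog.append((key, None))
        return 200

    def get(self, key, replica):
        """Return the HTTP status a reader served by `replica` would see."""
        return 200 if key in self.replicas[replica] else 404

    def propagate(self):
        for key, data in self.backlog:
            for replica in self.replicas[1:]:
                self._apply(replica, key, data)
        self.backlog.clear()

KEY = "/work/pending/part-00"
store = EventuallyConsistentStore()
store.put(KEY, b"data")
store.propagate()  # the object is now on every replica

# DELETE succeeds, but reads served by lagging replicas still find the object.
trace = [store.delete(KEY), store.get(KEY, replica=1), store.get(KEY, replica=2)]
```

`trace` is `[200, 200, 200]`, matching the slide; only after `propagate()` do all replicas return 404.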
The dangers of Eventual Consistency
⬢ Temp Data leftovers
⬢ List inconsistency means new data may not be visible
⬢ Lack of atomic rename() can leave output directories inconsistent
You can get bad data and not even notice
org.apache.hadoop.fs.FileSystem
hdfs s3a wasb adl swift gs
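All of these clients sit behind the same `org.apache.hadoop.fs.FileSystem` API, and Hadoop chooses the implementation from the URI scheme; an explicit `fs.<scheme>.impl` setting overrides the default. A minimal Python model of that lookup follows. The class names are, to our knowledge, those shipped with Hadoop and Google's GCS connector (check them against your version); the function name `filesystem_class` is hypothetical, and Hadoop's additional ServiceLoader-based discovery is omitted.

```python
from urllib.parse import urlparse

# Default scheme -> FileSystem implementation for the clients on the slide.
DEFAULT_IMPLS = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "wasb": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
    "adl": "org.apache.hadoop.fs.adl.AdlFileSystem",
    "swift": "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
    "gs": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
}

def filesystem_class(uri, conf=None):
    """Return the FileSystem class name for a URI (hypothetical helper).
    An explicit fs.<scheme>.impl entry in conf wins over the defaults."""
    scheme = urlparse(uri).scheme
    conf = conf or {}
    impl = conf.get(f"fs.{scheme}.impl") or DEFAULT_IMPLS.get(scheme)
    if impl is None:
        raise ValueError(f"no FileSystem for scheme {scheme!r}")
    return impl
```

The application code stays the same whether a path starts with `hdfs://` or `s3a://`; that uniformity is also why object stores are easy to mistake for filesystems.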
History of Object Storage Support (timeline, 2006 to 2017)
⬢ s3:// “inode on S3”
⬢ s3n:// “Native” S3
⬢ s3a:// Replaces s3n
⬢ swift:// OpenStack
⬢ wasb:// Azure WASB
⬢ oss:// Aliyun
⬢ gs:// Google Cloud
⬢ adl:// Azure Data Lake
⬢ s3:// Amazon EMR S3 (proprietary)
Phase I: Stabilize S3A
Phase II: speed & scale
Phase III: scale & consistency
Make Apache Hadoop at home in the cloud
Step 1: Hadoop runs great on Azure ✔
Step 2: Beat EMR on EC2 ✔
Problem: S3 Analytics is too slow/broken
1. Analyze benchmarks and bug-reports
2. Fix Read path for Columnar Data
3. Fix Write path
4. Improve query partitioning
5. The Commitment Problem
[Chart: LLAP (single node) on AWS, TPC-DS queries at 200 GB scale; calls shown: getFileStatus(), read(), readFully(pos)]
HDP 2.6/Hadoop 2.8 transforms I/O performance!
// forward seek by skipping stream
fs.s3a.readahead.range=256K
// faster backward seek for Columnar Storage
fs.s3a.experimental.input.fadvise=random
// enhanced data upload
fs.s3a.fast.upload=true
—see HADOOP-11694 for lots more!
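Why these two read settings help columnar formats can be shown with a simplified model. This is a sketch under our own assumptions, not the S3AInputStream code: a read either reuses the open HTTP GET (draining bytes on a short forward seek, up to the readahead range) or opens a new GET. Under a sequential policy a new GET asks for the rest of the file; under `random` it asks only for what is needed, at least the readahead range. `ModelS3Stream` and `orc_like_reads` are hypothetical names.

```python
# Simplified model of S3 read-path tuning (hypothetical, not S3A's code).

READAHEAD = 256 * 1024  # fs.s3a.readahead.range=256K

class ModelS3Stream:
    def __init__(self, length, policy="sequential", readahead=READAHEAD):
        self.length = length
        self.policy = policy      # models fs.s3a.experimental.input.fadvise
        self.readahead = readahead
        self.pos = None           # current offset in the open GET, if any
        self.end = 0              # end (exclusive) of the open GET's range
        self.gets = []            # every (start, end) range requested
        self.skipped = 0          # bytes read and discarded on forward seeks

    def read(self, offset, n):
        if (self.pos is not None and self.pos <= offset
                and offset - self.pos <= self.readahead
                and offset + n <= self.end):
            # Short forward seek: draining bytes is cheaper than reopening.
            self.skipped += offset - self.pos
            self.pos = offset + n
            return "reuse"
        # Backward or long forward seek: abort and issue a new GET.
        if self.policy == "random":
            end = min(self.length, offset + max(n, self.readahead))
        else:
            end = self.length  # sequential: request the rest of the file
        self.gets.append((offset, end))
        self.pos, self.end = offset + n, end
        return "new GET"

def orc_like_reads(policy, size=10 * 1024 * 1024):
    """Footer at the end, back to the first stripe, then a short skip."""
    s = ModelS3Stream(size, policy)
    results = [s.read(size - 16 * 1024, 16 * 1024),
               s.read(0, 64 * 1024),
               s.read(100 * 1024, 1024)]
    return s, results

seq, seq_results = orc_like_reads("sequential")
rnd, rnd_results = orc_like_reads("random")
```

Both policies issue two GETs, but under the sequential policy the backward seek requests the whole 10 MB file, while under `random` it requests only 256 KB; the short forward skip reuses the open stream in both cases.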
benchmarks !=
your queries
your data
your VMs
your directory tree
…but we think we've made a good start
S3 Data Source, 1 TB TPC-DS: LLAP vs Hive 1.x
[Chart: LLAP-1TB-TPCDS vs Hive-1-1TB-TPCDS, y-axis 0 to 2,500]
1 TB TPC-DS ORC dataset
3 x i2.4xlarge (16 CPU x 122 GB RAM x 4 SSD)
Apache Spark
Object store work applies
Needs tuning
Commits to S3 "trouble"
spark-defaults.conf
spark.hadoop.fs.s3a.readahead.range 256K
spark.hadoop.fs.s3a.experimental.input.fadvise random
spark.sql.orc.filterPushdown true
spark.sql.orc.splits.include.file.footer true
spark.sql.orc.cache.stripe.details.size 10000
spark.sql.hive.metastorePartitionPruning true
spark.sql.parquet.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.hadoop.parquet.enable.summary-metadata false
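The `spark.hadoop.` entries in this file are not Spark settings as such: Spark copies every `spark.hadoop.*` property into the Hadoop configuration with the prefix removed, which is how the S3A options from the previous slide reach the S3A client. A small Python sketch of that mapping follows; the helper names are hypothetical, and the parser handles only the whitespace-separated `key value` form used here.

```python
SPARK_DEFAULTS = """
# excerpt of the settings above
spark.hadoop.fs.s3a.readahead.range 256K
spark.hadoop.fs.s3a.experimental.input.fadvise random
spark.sql.orc.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.hadoop.parquet.enable.summary-metadata false
"""

def parse_spark_defaults(text):
    """Parse 'key<whitespace>value' lines, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf

def hadoop_view(spark_conf, prefix="spark.hadoop."):
    """What the Hadoop configuration sees: spark.hadoop.* keys, prefix removed."""
    return {k[len(prefix):]: v for k, v in spark_conf.items() if k.startswith(prefix)}

spark_conf = parse_spark_defaults(SPARK_DEFAULTS)
hadoop_conf = hadoop_view(spark_conf)
```

So `spark.hadoop.fs.s3a.readahead.range 256K` ends up as `fs.s3a.readahead.range=256K`, while the `spark.sql.*` options stay with Spark SQL.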
The S3 Commitment Problem
⬢ rename() depended upon for atomic transaction
⬢ Time to copy() + delete() proportional to data * files
⬢ Compared to Azure Storage, S3 is slow (6-10+ MB/s)
⬢ Intermediate data may be visible
⬢ Failures leave storage in unknown state
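A back-of-the-envelope estimate shows why a rename-based commit hurts on S3. This is a rough linear model of our own: copy time is total data divided by copy bandwidth (the 6 to 10 MB/s from the slide), plus a hypothetical fixed cost per object for requests and the delete. It assumes objects are copied one after another, with no parallelism.

```python
def copy_rename_seconds(total_mb, files, copy_mb_per_s, per_file_s=0.0):
    """Rough time for a 'rename' implemented as copy() + delete() on S3.

    total_mb / copy_mb_per_s: time to copy every byte.
    files * per_file_s: hypothetical fixed cost per object.
    Assumes sequential copies, no parallelism."""
    return total_mb / copy_mb_per_s + files * per_file_s

# 10 GB of output in 1,000 files, at either end of the 6-10 MB/s range.
fast = copy_rename_seconds(10 * 1024, 1000, 10)  # 1024 s, about 17 minutes
slow = copy_rename_seconds(10 * 1024, 1000, 6)   # about 1707 s
with_overhead = copy_rename_seconds(10 * 1024, 1000, 10, per_file_s=0.1)
```

On HDFS the same rename is a single metadata operation, so the commit step is effectively free there; on S3 it can take longer than the job that produced the data, and every second of it is a window in which a failure leaves partial output.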
Spark's Direct Output Committer? Risk of data corruption
S3Guard
Fast, consistent S3 metadata
HADOOP-13445
DynamoDB as fast, consistent metadata store
[Diagram: object store servers s01, s02, s03, s04 holding part-00 and part-01, with DynamoDB alongside as the metadata store]
DELETE part-00 → 200
HEAD part-00 → 200
HEAD part-00 → 404
PUT part-00 → 200
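The request sequence above can be sketched as a consistent metadata table consulted before the eventually consistent store. This is a toy in the spirit of S3Guard, not its DynamoDB schema or code: `StaleS3` stands in for S3 where a delete has not yet become visible, and `GuardedS3` (a hypothetical name) records each delete as a tombstone so that HEAD answers 404 even while S3 itself still answers 200. Real S3Guard also keeps directory listings consistent; this sketch covers only HEAD.

```python
# Toy model in the spirit of S3Guard (hypothetical names and schema).

class StaleS3:
    """S3 seen through eventual consistency: deletes are not yet visible."""
    def __init__(self):
        self.visible = set()

    def put(self, path):
        self.visible.add(path)
        return 200

    def delete(self, path):
        return 200  # accepted, but readers still see the object for now

    def head(self, path):
        return 200 if path in self.visible else 404

class GuardedS3:
    def __init__(self, s3):
        self.s3 = s3
        self.meta = {}  # stand-in for the DynamoDB table: path -> exists?

    def put(self, path):
        status = self.s3.put(path)
        self.meta[path] = True
        return status

    def delete(self, path):
        status = self.s3.delete(path)
        self.meta[path] = False  # tombstone
        return status

    def head(self, path):
        if path in self.meta:  # the metadata store is authoritative
            return 200 if self.meta[path] else 404
        return self.s3.head(path)  # unknown path: fall back to S3

raw = StaleS3()
raw.put("part-00")  # object written before the metadata store tracked it
guarded = GuardedS3(raw)
trace = [
    guarded.delete("part-00"),  # DELETE part-00
    raw.head("part-00"),        # HEAD straight to S3: stale 200
    guarded.head("part-00"),    # HEAD via the metadata store: 404
    guarded.put("part-00"),     # PUT part-00
    guarded.head("part-00"),    # visible again
]
```

`trace` is `[200, 200, 404, 200, 200]`: the tombstone hides the stale object until the new PUT.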
Netflix Staging Committer
1. Saves output to local files file://
2. Task commit: upload to S3A as multipart PUT —but do not complete it
3. Job committer completes all uploads from successful tasks; cancels others.
Outcome:
⬢ No work visible until job is committed
⬢ Task commit time = data/bandwidth
⬢ Job commit time = POST * #files
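The three steps of the staging committer can be sketched against a toy multipart-upload API. `MultipartS3`, `commit_task` and `commit_job` are hypothetical names, not the Netflix committer's classes: parts uploaded at task commit stay invisible, job commit completes the uploads of successful tasks (one POST each), and the uploads of any other attempts are aborted. How the pending upload IDs travel from tasks to the job committer is left out; here they are simply returned.

```python
import itertools

class MultipartS3:
    """Toy multipart-upload API: parts stay invisible until complete()."""
    def __init__(self):
        self.objects = {}
        self.pending = {}  # upload id -> (key, [parts])
        self._ids = itertools.count(1)

    def initiate(self, key):
        upload_id = f"upload-{next(self._ids)}"
        self.pending[upload_id] = (key, [])
        return upload_id

    def upload_part(self, upload_id, data):
        self.pending[upload_id][1].append(data)

    def complete(self, upload_id):  # the single POST per file
        key, parts = self.pending.pop(upload_id)
        self.objects[key] = b"".join(parts)

    def abort(self, upload_id):
        del self.pending[upload_id]

def commit_task(s3, local_files):
    """Task commit: upload files staged on local disk, but do not complete."""
    ids = []
    for key, data in local_files.items():
        upload_id = s3.initiate(key)
        s3.upload_part(upload_id, data)
        ids.append(upload_id)
    return ids

def commit_job(s3, succeeded, failed):
    """Job commit: complete uploads of successful tasks; abort the rest."""
    for ids in succeeded:
        for upload_id in ids:
            s3.complete(upload_id)
    for ids in failed:
        for upload_id in ids:
            s3.abort(upload_id)

s3 = MultipartS3()
task0 = commit_task(s3, {"out/part-00": b"rows from task 0"})
task1 = commit_task(s3, {"out/part-01": b"rows from task 1"})
failed_attempt = commit_task(s3, {"out/part-02": b"partial rows"})
visible_before_job_commit = sorted(s3.objects)
commit_job(s3, [task0, task1], [failed_attempt])
```

Nothing is visible before `commit_job`; afterwards only the two successful files exist, and no upload is left pending.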
Availability
⬢ Read + Write in HDP 2.6 and Apache Hadoop 2.8
⬢ S3Guard: preview of DynamoDB integration soon
⬢ Zero-rename commit: work in progress
Look to HDCloud for the latest work!
Big thanks to:
Rajesh Balamohan
Mingliang Liu
Chris Nauroth
Dominik Bialek
Ram Venkatesh
Everyone in QE, RE
+ everyone who reviewed and tested patches, added their own, filed bug reports and measured performance
Questions?
stevel@hortonworks.com @steveloughran
sanjay@hortonworks.com @srr

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Performance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache SparkPerformance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache Spark
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
 
ORC 2015
ORC 2015ORC 2015
ORC 2015
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 

Ähnlich wie Dancing Elephants: Working with Object Storage in Apache Spark and Hive

Ähnlich wie Dancing Elephants: Working with Object Storage in Apache Spark and Hive (20)

Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
 
Accelerate Spark Workloads on S3
Accelerate Spark Workloads on S3Accelerate Spark Workloads on S3
Accelerate Spark Workloads on S3
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 

Mehr von Steve Loughran

2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
Steve Loughran
 

Mehr von Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
YARN Services
YARN ServicesYARN Services
YARN Services
 
Datacentre stack
Datacentre stackDatacentre stack
Datacentre stack
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!
 
2014 01-02-patching-workflow
2014 01-02-patching-workflow2014 01-02-patching-workflow
2014 01-02-patching-workflow
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
 
Hoya for Code Review
Hoya for Code ReviewHoya for Code Review
Hoya for Code Review
 
Hadoop: Beyond MapReduce
Hadoop: Beyond MapReduceHadoop: Beyond MapReduce
Hadoop: Beyond MapReduce
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 

Kürzlich hochgeladen

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 

Kürzlich hochgeladen (20)

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 

Dancing Elephants: Working with Object Storage in Apache Spark and Hive

  • 1. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Dancing Elephants: Working with Object Storage in Apache Spark and Hive Steve Loughran Sanjay Radia April 2017
  • 2. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Why? • No upfront hardware costs • Data ingress for IoT, mobile apps • Cost effective if sufficiently agile
  • 3. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Object Stores are a key part of agile cloud applications ⬢ It's the only persistent store for in-cloud clusters ⬢ Object stores are the source and final destination of work ⬢ Cross-application data storage ⬢ Asynchronous data exchange ⬢ External data sources Also useful in physical clusters!
  • 4. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Cloud Storage Integration: Evolution for Agility HDFS Application HDFS Application GoalEvolution towards cloud storage as the persistent Data Lake Input Output Backup Restore Input Output Upload HDFS Application Input Output tmp AzureAWS –today
  • 5. © Hortonworks Inc. 2011 – 2017 All Rights Reserved ORC datasets inbound Elastic ETL HDFS external
  • 6. © Hortonworks Inc. 2011 – 2017 All Rights Reserved datasets external Notebooks library
  • 7. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Streaming
  • 8. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Danger: Object stores are not filesystems∗ Cost & Geo-distribution over Consistency and Performance ∗for more information, please re-read
  • 9. © Hortonworks Inc. 2011 – 2017 All Rights Reserved A Filesystem: Directories, Files  Data / work pending part-00 part-01 00 00 00 01 01 01 complete part-01 rename("/work/pending/part-01", "/work/complete")
  • 10. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Object Store: hash(name)⇒data 00 00 00 01 01 s01 s02 s03 s04 hash("/work/pending/part-01") ["s02", "s03", "s04"] copy("/work/pending/part-01", "/work/complete/part01") 01 01 01 01 delete("/work/pending/part-01") hash("/work/pending/part-00") ["s01", "s02", "s04"]
  • 11. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Often: Eventually Consistent 00 00 00 01 01 s01 s02 s03 s04 01 DELETE /work/pending/part-00 GET /work/pending/part-00 GET /work/pending/part-00 200 200 200
  • 12. © Hortonworks Inc. 2011 – 2017 All Rights Reserved The dangers of Eventual Consistency ⬢ Temp Data leftovers ⬢ List inconsistency means new data may not be visible ⬢ Lack of atomic rename() can leave output directories inconsistent You can get bad data and not even notice
  • 13. © Hortonworks Inc. 2011 – 2017 All Rights Reserved org.apache.hadoop.fs.FileSystem hdfs s3awasb adlswift gs
  • 14. © Hortonworks Inc. 2011 – 2017 All Rights Reserved s3:// —“inode on S3” s3n:// “Native” S3 s3a:// Replaces s3n swift:// OpenStack wasb:// Azure WASB Phase I: Stabilize S3A oss:// Aliyun gs:// Google Cloud Phase II: speed & scale adl:// Azure Data Lake 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 s3:// Amazon EMR S3 History of Object Storage Support Phase III: scale & consistency (proprietary)
  • 15. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Make Apache Hadoop at home in the cloud Step 1: Hadoop runs great on Azure Step 2: Beat EMR on EC2 ✔ ✔ ✔
  • 16. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Problem: S3 Analytics is too slow/broken 1. Analyze benchmarks and bug-reports 2. Fix Read path for Columnar Data 3. Fix Write path 4. Improve query partitioning 5. The Commitment Problem
  • 17. getFileStatus() read() LLAP (single node) on AWS TPC-DS queries at 200 GB scale readFully(pos)
  • 18. © Hortonworks Inc. 2011 – 2017 All Rights Reserved HDP 2.6/Hadoop 2.8 transforms I/O performance! // forward seek by skipping stream fs.s3a.readahead.range=256K // faster backward seek for Columnar Storage fs.s3a.experimental.input.fadvise=random // enhanced data upload fs.s3a.fast.output.enabled=true —see HADOOP-11694 for lots more!
  • 19. © Hortonworks Inc. 2011 – 2017 All Rights Reserved benchmarks != your queries your data your VMs your directory tree …but we think we've made a good start
  • 20. © Hortonworks Inc. 2011 – 2017 All Rights Reserved S3 Data Source 1TB TPCDS LLAP- vs Hive 1.x: 0 500 1,000 1,500 2,000 2,500 LLAP-1TB-TPCDS Hive-1-1TB-TPCDS 1 TB TPC-DS ORC DataSet 3 x i2x4x Large (16 CPU x 122 GB RAM x 4 SSD)
  • 21. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Apache Spark Object store work applies Needs tuning Commits to S3 "trouble"
  • 22. © Hortonworks Inc. 2011 – 2017 All Rights Reserved spark-default.conf spark.hadoop.fs.s3a.readahead.range 256K spark.hadoop.fs.s3a.experimental.input.fadvise random spark.sql.orc.filterPushdown true spark.sql.orc.splits.include.file.footer true spark.sql.orc.cache.stripe.details.size 10000 spark.sql.hive.metastorePartitionPruning true spark.sql.parquet.filterPushdown true spark.sql.parquet.mergeSchema false spark.hadoop.parquet.enable.summary-metadata false
  • 23. © Hortonworks Inc. 2011 – 2017 All Rights Reserved The S3 Commitment Problem ⬢ rename() depended upon for atomic transaction ⬢ Time to copy() + delete() proportional to data * files ⬢ Compared to Azure Storage, S3 is slow (6-10+ MB/s) ⬢ Intermediate data may be visible ⬢ Failures leave storage in unknown state
  • 24. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Spark's Direct Output Committer? Risk of Corruption of data
  • 25. © Hortonworks Inc. 2011 – 2017 All Rights Reserved S3guard Fast, consistent S3 metadata HADOOP-13445
  • 26. © Hortonworks Inc. 2011 – 2017 All Rights Reserved DynamoDB as fast, consistent metadata store 00 00 00 01 01 s01 s02 s03 s04 01 DELETE part-00 200 HEAD part-00 200 HEAD part-00 404 PUT part-00 200 00
  • 27. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Netflix Staging Committer 1. Saves output to local files file:// 2. Task commit: upload to S3A as multipart PUT —but do not complete it 3. Job committer completes all uploads from successful tasks; cancels others. Outcome: ⬢ No work visible until job is commited ⬢ Task commit time = data/bandwidth ⬢ Job commit time = POST * #files
  • 28. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Availability  Read + Write in HDP 2.6 and Apache Hadoop 2.8  S3Guard: preview of DDB integration soon  Zero-rename commit: work in progress Look to HDCloud for the latest work!
  • 29. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Big thanks to: Rajesh Balamohan Mingliang Liu Chris Nauroth Dominik Bialek Ram Venkatesh Everyone in QE, RE + everyone who reviewed/tested, patches and added their own, filed bug reports and measured performance
  • 30. Questions? stevel@hortonworks.com @steveloughran sanjay@hortonworks.com @srr
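The settings on slide 22 are in `spark-defaults.conf` syntax: one key and value per line, separated by whitespace. Properties with the `spark.hadoop.` prefix are passed through to the Hadoop configuration, which is how the `fs.s3a.*` options reach the S3A connector. Where the cluster-wide file can't be edited, the same settings can be given per job with `spark-submit --conf key=value`. The following is a minimal sketch of a helper, with hypothetical function names, that parses such a fragment and builds those arguments; it is not part of Spark.

```python
import shlex

# The slide-22 settings in spark-defaults.conf syntax:
# one "key value" pair per line, separated by whitespace.
SPARK_DEFAULTS = """
spark.hadoop.fs.s3a.readahead.range 256K
spark.hadoop.fs.s3a.experimental.input.fadvise random
spark.sql.orc.filterPushdown true
spark.sql.orc.splits.include.file.footer true
spark.sql.orc.cache.stripe.details.size 10000
spark.sql.hive.metastorePartitionPruning true
spark.sql.parquet.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.hadoop.parquet.enable.summary-metadata false
"""


def parse_spark_defaults(text: str) -> dict[str, str]:
    """Parse "key value" lines, skipping blank lines and '#' comments (hypothetical helper)."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"missing value in line: {line!r}")
        key, value = parts
        conf[key] = value.strip()
    return conf


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Build spark-submit arguments of the form --conf key=value."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    settings = parse_spark_defaults(SPARK_DEFAULTS)
    print(" ".join(shlex.quote(a) for a in ["spark-submit", *to_submit_args(settings)]))
```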
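Slide 23's point about rename cost can be turned into back-of-the-envelope arithmetic. On S3, "renaming" a job's output means copying every byte and then deleting each source object, so the time grows with both the amount of data and the number of files. The sketch below assumes a simple additive model with a hypothetical function name, uses the slide's 6-10+ MB/s as the copy rate, and adds an illustrative per-file request overhead that is a guess rather than a measured figure.

```python
def estimate_rename_seconds(total_bytes: int, num_files: int,
                            copy_mb_per_s: float = 6.0,
                            per_file_overhead_s: float = 0.1) -> float:
    """Estimate how long an S3 "rename" (COPY every object, then DELETE it) takes.

    Illustrative assumptions, not measurements: the copy runs at
    copy_mb_per_s (slide 23 quotes 6-10+ MB/s) and each file adds a fixed
    request overhead for its COPY and DELETE.
    """
    if copy_mb_per_s <= 0:
        raise ValueError("copy_mb_per_s must be positive")
    copy_seconds = total_bytes / (copy_mb_per_s * 1_000_000)
    request_seconds = num_files * per_file_overhead_s
    return copy_seconds + request_seconds


if __name__ == "__main__":
    # 10 GB of output in 100 files, copied at 10 MB/s.
    print(round(estimate_rename_seconds(10_000_000_000, 100, copy_mb_per_s=10.0)))  # 1010
```

Under these assumptions, committing 10 GB of output in 100 files takes about 1010 seconds, roughly 17 minutes, for what is a metadata-only rename on HDFS.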
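Slides 23 and 24 warn that intermediate data may be visible and that failures leave storage in an unknown state. A toy model shows the difference between writing task output straight into the destination, which is what a direct output committer does to avoid the slow rename, and staging output so it is published only at job commit. The function name and the in-memory dictionaries are hypothetical; real committers also have to handle retried and speculative task attempts, which this sketch ignores.

```python
from typing import Optional


def run_job(outputs: list[str], fail_at: Optional[int], direct: bool) -> dict[str, str]:
    """Run one toy task per entry in `outputs`; return what readers of the destination see.

    direct=True: each task writes straight into the destination.
    direct=False: tasks write to a staging area that is published only
    after every task has succeeded (the job commit).
    """
    destination: dict[str, str] = {}
    staging: dict[str, str] = {}
    target = destination if direct else staging
    for task_id, data in enumerate(outputs):
        if fail_at is not None and task_id == fail_at:
            # The job aborts here; nothing cleans up the destination.
            return destination
        target[f"part-{task_id:05d}"] = data
    if not direct:
        destination.update(staging)  # job commit makes all output visible together
    return destination


if __name__ == "__main__":
    rows = ["a", "b", "c", "d"]
    print(sorted(run_job(rows, fail_at=2, direct=True)))   # ['part-00000', 'part-00001']
    print(sorted(run_job(rows, fail_at=2, direct=False)))  # []
```

In the direct case, a reader of the destination after the failure sees two of four files and has no way to tell that the dataset is incomplete.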
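The slide 26 diagram shows the consistency problem S3Guard addresses: after `DELETE part-00` succeeds, a `HEAD part-00` can still return 200 until the deletion propagates, and only later returns 404. The sketch below is a toy model of that sequence with hypothetical class names. It assumes new objects are visible immediately and only deletes lag; the guarded wrapper records every PUT and DELETE in a consistent table, a dictionary standing in for DynamoDB, and answers HEAD from that table first. The real S3Guard keeps far more metadata, such as directory listings, than this.

```python
class EventuallyConsistentStore:
    """Toy object store in which a DELETE is not immediately seen by HEAD."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}  # authoritative contents
        self._visible: set[str] = set()       # keys HEAD currently reports as present

    def put(self, key: str, data: bytes) -> int:
        self._objects[key] = data
        self._visible.add(key)  # in this model, new objects are visible at once
        return 200

    def delete(self, key: str) -> int:
        self._objects.pop(key, None)  # HEAD keeps answering 200 until propagate()
        return 200

    def head(self, key: str) -> int:
        return 200 if key in self._visible else 404

    def propagate(self) -> None:
        """Let pending deletes become visible, as eventually happens."""
        self._visible = set(self._objects)


class GuardedStore:
    """Checks a consistent metadata table (standing in for DynamoDB) before the store."""

    def __init__(self, store: EventuallyConsistentStore) -> None:
        self.store = store
        self.exists: dict[str, bool] = {}  # key -> present?; False acts as a tombstone

    def put(self, key: str, data: bytes) -> int:
        status = self.store.put(key, data)
        self.exists[key] = True
        return status

    def delete(self, key: str) -> int:
        status = self.store.delete(key)
        self.exists[key] = False
        return status

    def head(self, key: str) -> int:
        if key in self.exists:  # the table is authoritative for keys it has recorded
            return 200 if self.exists[key] else 404
        return self.store.head(key)


if __name__ == "__main__":
    raw = EventuallyConsistentStore()
    raw.put("part-00", b"data")
    print(raw.delete("part-00"), raw.head("part-00"))          # 200 200 (stale)
    guarded = GuardedStore(EventuallyConsistentStore())
    guarded.put("part-00", b"data")
    print(guarded.delete("part-00"), guarded.head("part-00"))  # 200 404
```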
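The three steps of the staging committer on slide 27 can be sketched against an in-memory stand-in for S3's multipart upload API. `FakeMultipartStore`, its method names, `task_commit` and `job_commit` are all hypothetical; the task's local output from step 1 is represented as a `bytes` value, and the 4-byte part size is only for the demo (S3 requires every part except the last to be at least 5 MB). The model shows that uploaded parts stay invisible until the job commit completes each upload, and that uploads from failed tasks are aborted.

```python
import uuid


class FakeMultipartStore:
    """In-memory stand-in for an object store's multipart-upload API (hypothetical names)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}                     # visible objects
        self.pending: dict[str, tuple[str, list[bytes]]] = {}  # upload id -> (key, parts)

    def initiate(self, key: str) -> str:
        upload_id = uuid.uuid4().hex
        self.pending[upload_id] = (key, [])
        return upload_id

    def upload_part(self, upload_id: str, data: bytes) -> None:
        self.pending[upload_id][1].append(data)

    def complete(self, upload_id: str) -> None:
        key, parts = self.pending.pop(upload_id)
        self.objects[key] = b"".join(parts)  # only now does the object appear

    def abort(self, upload_id: str) -> None:
        self.pending.pop(upload_id)  # discard the uploaded parts


def task_commit(store: FakeMultipartStore, key: str, local_data: bytes,
                part_size: int = 4) -> str:
    """Upload a task's locally written output as parts, leaving the upload incomplete."""
    upload_id = store.initiate(key)
    for offset in range(0, len(local_data), part_size):
        store.upload_part(upload_id, local_data[offset:offset + part_size])
    return upload_id


def job_commit(store: FakeMultipartStore, succeeded: list[str], failed: list[str]) -> None:
    """Complete uploads from successful tasks (one request per file); abort the rest."""
    for upload_id in succeeded:
        store.complete(upload_id)
    for upload_id in failed:
        store.abort(upload_id)


if __name__ == "__main__":
    store = FakeMultipartStore()
    good = task_commit(store, "out/part-00000", b"hello world")
    bad = task_commit(store, "out/part-00001", b"from a failed task")
    print(store.objects)  # {} : nothing is visible before the job commit
    job_commit(store, succeeded=[good], failed=[bad])
    print(store.objects)  # {'out/part-00000': b'hello world'}
```

Task commit time scales with data over bandwidth because the parts are uploaded then; job commit costs one completion request per file, matching the outcome listed on the slide.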

Editor's Notes

  1. on prem = 1 azure @ 3 AWS @ 2
  2. This is one of the simplest deployments in the cloud: scheduled/dynamic ETL. Incoming data sources save to an object store; a Spark cluster is brought up for ETL. Either direct cleanup/filter or multistep operations, but either way: an ETL pipeline. HDFS on the VMs is used for transient storage, and the object store is the destination for data, now in a more efficient format such as ORC or Parquet.
  3. Notebooks on demand: the notebook talks to Spark in the cloud, which then does the work against external and internal data. Your notebook itself can be saved to the object store, for persistence and sharing.
  4. Example: streaming on Azure + on LHS add streaming
  5. In all the examples, object stores take a role which replaces HDFS. But this is dangerous, because...
  6. Everything uses the Hadoop APIs to talk to HDFS, Hadoop-compatible filesystems and object stores: the Hadoop FS API. There are actually two: one with a clean split between client side and "driver side", and the older one, which is a direct connect. Most applications use the latter, and in terms of opportunities for object store integration tweaking, this is the one where we can innovate most easily; that is, there's nothing in the way. Under the FS API go filesystems and object stores. HDFS is a "real" filesystem; WASB/Azure is close enough. What is "real"? Best test: whether it can support HBase.
  7. This is the history
  8. Simple goal: make ASF Hadoop at home in cloud infrastructure. It's always been a bit of a mixed bag, and there's a lot around agility we need to address: things fail differently. Step 1: Azure. That's the work with Microsoft on wasb://; you can use Azure as a drop-in replacement for HDFS in Azure. Step 2: EMR. More specifically, have the ASF Hadoop codebase get higher benchmark numbers than EMR.
  9. Here's a flamegraph of LLAP (single node) with AWS + HDC for a set of TPC-DS queries at 200 GB scale; we should put this up online. Only about 2% of the time (in optimised code) is spent doing S3 IO. Something at the start is partitioning data.
  10. Without going into the details, here are the things you will want for Hadoop 2.8. They are in HDP 2.5, and possibly in the next CDH release. The first two boost input performance by reducing the cost of seeking, which is expensive because it breaks and then re-opens the HTTPS connection. Readahead means that hundreds of KB can be read and discarded instead of reconnecting (yes, reconnecting can take that long). The experimental fadvise random feature speeds up backward reads at the expense of pure forward file reads; it is significantly faster for reading optimized binary formats like ORC and Parquet. The last one is a successor to fast upload in Hadoop 2.7, which buffers on heap, needs careful tuning, and has memory needs that conflict with RDD caching. The new version defaults to buffering as files on local disk, so it won't run out of memory. It offers the potential of significantly more effective use of bandwidth; the resulting partitioned files may also offer higher read performance. (No data there, just hearsay.)
  11. Don't run off saying "hey, 2x speedup". I'm confident we got HDP faster; EMR is still something we'd need to look at more. Data layout is still a major problem here; I think we are still understanding the implications of sharding and throttling. What we do know is that deep/shallow trees are pathological for recursive treewalks, and they end up storing data in the same S3 nodes, so adjacent requests get throttled.
  12. And the result: yes, currently we are faster in these benchmarks. Does that match the outside world? If you use ORC and Hive, you will gain from the work we've done. There are still things which are pathologically bad, especially deep directory trees with few files.
  13. This invariably ends up reaching us on JIRA, to the extent that I've got a document somewhere explaining the problem in detail. It was taken away because it can corrupt your data without you noticing. This is generally considered harmful.
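Note 10 explains why readahead helps: seeking on an S3A stream can mean breaking and reopening the HTTPS connection, whereas a short forward skip can be served by reading and discarding bytes from the open stream. The sketch below counts GET requests under that simplified model. The class is hypothetical and is not the S3A input stream's actual logic; the fadvise random policy is not modeled.

```python
from typing import Optional


class SeekModel:
    """Count how many GET requests a sequence of reads would open (simplified model).

    Assumptions: an open stream can skip forward by reading and discarding up
    to `readahead` bytes; a backward seek, or a forward seek further than
    `readahead`, aborts the connection and opens a new GET at the new offset.
    """

    def __init__(self, readahead: int) -> None:
        self.readahead = readahead
        self.position: Optional[int] = None  # None: no stream open yet
        self.opens = 0

    def read(self, offset: int, length: int) -> None:
        gap = None if self.position is None else offset - self.position
        if gap is None or gap < 0 or gap > self.readahead:
            self.opens += 1  # (re)open the HTTPS stream at `offset`
        self.position = offset + length


if __name__ == "__main__":
    # A small read, a forward skip of about 100 KB, then a jump back.
    reads = [(0, 100), (100_000, 100), (10, 100)]
    for readahead in (0, 256 * 1024):
        model = SeekModel(readahead)
        for offset, length in reads:
            model.read(offset, length)
        print(readahead, model.opens)  # 0 3, then 262144 2
```

With a 256K readahead range, matching the value on slide 22, the forward skip no longer costs a reconnect; the backward seek still does, which is the case the fadvise random setting targets.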
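Notes 11 and 12 single out deep directory trees with few files as pathological for recursive treewalks. Counting LIST requests makes the reason concrete: a walk needs at least one request per directory, while a single listing of every key under a common prefix, which S3's LIST API supports, needs one request per 1000 keys. The sketch below is a rough model with hypothetical function names; it ignores subdirectory entries within pages, request latency and throttling, and makes no claim about which code path Hadoop or Hive takes.

```python
import math

PAGE_SIZE = 1000  # S3 returns at most 1000 keys per LIST response


def treewalk_requests(files_per_dir: list[int]) -> int:
    """Directory-by-directory walk: at least one LIST per directory, plus extra pages."""
    return sum(max(1, math.ceil(n / PAGE_SIZE)) for n in files_per_dir)


def flat_listing_requests(files_per_dir: list[int]) -> int:
    """One paged listing of every key under the table's common prefix."""
    return max(1, math.ceil(sum(files_per_dir) / PAGE_SIZE))


if __name__ == "__main__":
    deep_sparse = [2] * 10_000  # 10,000 partition directories holding 2 files each
    print(treewalk_requests(deep_sparse), flat_listing_requests(deep_sparse))  # 10000 20
```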