SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Yuan Zhou
Oct, 2018
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Legal Disclaimer & Optimization Notice
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTYRIGHTS IS GRANTED BY
THIS DOCUMENT. INTEL ASSUMES NO LIABILITYWHATSOEVERAND INTEL DISCLAIMS ANYEXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITYOR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY,OR INFRINGEMENT OF ANYPATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTYRIGHT.
Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other
countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
2
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Agenda
▪ Background
▪ Bigdata Analytics on Cloud Reference Architecture
▪ Bigdata Analytics on Cloud with Alluxio
▪ PersistentMemory for Alluxio
▪ Summary
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
5
background
Get a bigger cluster
for many teams to share.
Give each team their
own dedicated cluster,
each with a copy of
PBs of data.
Give teams ability to
spin-up/spin-down
clusters which can
share data sets.
#1 #2 #3
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
6
Benefits of compute and storage disaggregation
Independent scale
of CPU and storage
capacity
• Rightsized HW for
each layer
• Reduce resource
wastage
• Cost saving
Single copy of data
• Multiple compute
cluster share
common data
repo/lake
• Simplified data
management
• Reduced
provisioning
overhead
• Improve security
Enable Agile
application
development
• In-memory cloning
• Snapshot service
• Quick & efficient
copies
Hybrid cloud
deployment
• Mix and match
resources
depending on
workload nature
and life cycle
Simple and flexible
software
management
• Avoid software
version
management
• Upgrade compute
software only
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
7
Bigdata Analytics over Disaggregate Storage
20162006 201520142008
oss://adl://
wasb://gs://s3:// s3n://
s3a://
Hadoop Compatible File System abstraction layer: Unified storage API interface Hadoop fs –ls s3a://job/
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Workloads
Simple Read/Write
▪ DFSIO: TestDFSIO is the canonical example of a benchmarkthat attempts to measure the Storage's capacity for
reading and writing bulk data.
▪ Terasort: a popular benchmark that measures the amount of time to sort one terabyte of randomly distributed data
on a given computer system.
Batch Analytics
▪ To consistently executinganalytical process to process large set of data.
▪ Leveraging54 derivedfrom TPC-DS * queries with intensivereads across objects in different buckets
Data Transformation(not covered in this talk)
▪ ETL: Taking data as it is originally generatedand transforming it to a format (Parquet, ORC) that more tuned for
analytical workloads.
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
5x Compute Node
• Intel® Xeon™ processor E5-2699 v4 @
2.2GHz, 128GB mem
• 2x10G 82599 10Gb NIC
• 2x SSDs
• 3x Data storage (can be emliminated)
Software:
• Hadoop 2.8.1
• Spark 2.2.0
• Hive 2.2.1
• CentOS 7.5
5x Storage Node, 5 RGW nodes(co-located)
• Intel(R) Xeon(R) CPU E5-2699v4 2.20GHz
• 128GB Memory
• 3x 82599 10Gb NIC
• 7x 1TB HDD as data drive
• 1 OSD instances one each HDD
• CentOS 7.5
• Ceph Mimic
*Other names and brands may be
claimed as the property of others.
RGW1
OSD1
MON
OSD1 OSD7…
Hadoop
Hive
Spark
1x10Gb NIC
RGW2
OSD2
RGW3
OSD3
RGW4
OSD4
RGW5
OSD5
2x10Gb NIC
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Head
DNS Server
10
System configurations
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Bigdata on Object Storage Performance Overview
• In Terasort workload, S3 showed 60% perf gap comparing with local HDFS since all
it’s very IO intensive
• In TPC-DS workload, S3 showed 30% perf gap, and in Kmeans workload, S3
showed 40% perf gap
1.0 1.0 1.0
0.7
0.4
0.6
0.9
1.3
1.0
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
TPC-DS (54 quiries) TERASORT 1T KMEANS 374g
Performance Overview(Normalized)
spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + REMOTE HDFS (HDD)
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
System Configuration(Alluxio) 5x Compute Node
Hardware:
• ntel® Xeon™ processor Gold 6140
@ 2.3GHz, 384GB Memory
• 1x 82599 10Gb NIC
• 5x P4500 SSD (2 for spark-shuffle)
Software:
• Hadoop 2.8.1
• Spark 2.2.0
• Hive 2.2.1
• CentOS 7.5
5x Storage Node
• Intel(R) Xeon(R) CPU Gold 6140 @
2.30GHz, 192GB Memory
• 2x 82599 10Gb NIC
• 7x 1TB HDD for Ceph bluestore or
HDFS namenode and datanode
Software:
• Hadoop 2.8.1
• Ceph 12.2.7
• CentOS 7.5
*Other names and brands may be
claimed as the property of others.
Hadoop
Hive
Spark
Alluxio
DNS Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
CEPH
MON
RGW
REMOTE
HDFS
NN
1x10Gb NIC
Alluxio Alluxio Alluxio Alluxio
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
Alluxio Acceleration Layer
• 200GB Mem for mem mode
• 1TB SSD(P4500) for SSD mode
Software:
• Alluxio 1.8.0
Intel Confidential 13
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Performance w/ Alluxio Acceleration
• Alluxio based in memory acceleration layers provides significant performance boost for analytics
workloads with disaggregated storage
• Up to 3.25x for terasort
• Up to 1.8x compared with local HDFS
Intel Confidential 14
1.0 1.0 1.0
0.7
0.4
0.6
0.9
1.3
1.0
1.1
1.3 1.31.2
0.0
1.0
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Batch Query (54 quiries) TERASORT 1T KMEANS 374g
Alluxio Accleration for Disaggregated analytics storage with different workloads (Normalized)
spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + REMOTE HDFS (HDD)
spark(yarn) + alluxio(SSD) + S3 (HDD) spark(yarn) + alluxio(MEM) + S3 (HDD)
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Spark-SQL Performance Overview
• The performance of first several queries in Warmup are worse than S3 baseline
• For most queries SSD cache does improved the performance
Spark-SQL performance W/ S3
S3 Warmup Hot
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Compute-side caching for I/O intensive queries
16
• Compute-side caching brings better efficiency(10% - 30%) for I/O intensive
queries
1.00 1.00 1.00 1.00 1.00 1.00 1.00
0.85
0.71
0.90
0.75 0.75
0.78
0.66
0.00
0.20
0.40
0.60
0.80
1.00
1.20
query19.sql query42.sql query43.sql query52.sql query55.sql query63.sql query68.sql
Seconds
I/O intensive query performance(normalized, lower is better)
S3 Compute-side Caching
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
0
20
40
60
80
100
120
0
365
730
1095
1460
1825
2190
2555
2920
3285
3650
4015
4380
AxisTitle
CPU Utilization
Average of
Average of
Average of
Average of
Average of
Average of
0
20
40
60
80
100
120
0
555
1110
1665
2220
2775
3330
3885
4440
4995
5551
6106
6661
AxisTitle
Compute CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
0
20
40
60
80
100
120
0
475
950
1425
1900
2375
2850
3325
3800
4275
4750
5225
5700
AxisTitle
Compute CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
Spark-SQL Performance Analysis
-CPU utilization
0
20
40
60
80
100
120
0
555
1110
1665
2220
2775
3331
3886
4441
4996
5551
6106
6661
AxisTitle
Storage CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
0
20
40
60
80
100
120
0
476
951
1426
1902
2377
2852
3327
3802
4277
4752
5227
5702
AxisTitle
Storage CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
0
20
40
60
80
100
120
0
365
730
1095
1460
1825
2190
2555
2920
3285
3650
4015
4380
AxisTitle
Storage CPU Utilization
Average of
Average of
Average of
Average of
Average of
Average of
S3 Warmup Hot
• CPU% on Compute nodes are higher
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
0
20
40
60
80
100
120
5
350
695
1040
1385
1730
2076
2421
2766
3111
3456
3801
4146
4491
4836
5181
5526
5871
AxisTitle
Compute Memory Utilization
Total
Spark-SQL Performance Analysis
-Memory utilization
21.4
21.6
21.8
22
22.2
22.4
22.6
22.8
5
405
805
1205
1605
2005
2405
2805
3205
3605
4005
4405
4805
5205
5605
6005
6405
6805
AxisTitle
Storage Memory Utilization
Total
21.4
21.6
21.8
22
22.2
22.4
22.6
22.8
23
23.2
5
350
695
1040
1385
1730
2075
2421
2766
3111
3456
3801
4146
4491
4836
5181
5526
5871
AxisTitle
Storage Memory Utilization
Total
21.89
21.9
21.91
21.92
21.93
21.94
21.95
21.96
21.97
21.98
5
285
565
845
1125
1405
1685
1965
2245
2525
2805
3085
3365
3645
3925
4205
4485
AxisTitle
Storage Memory Utilization
Total
0
20
40
60
80
100
120
5
270
535
800
1065
1330
1595
1860
2125
2390
2655
2920
3185
3450
3715
3980
4245
4510
AxisTitle
Compute Memory Utilization
Total
S3 Warmup Hot
0
10
20
30
40
50
60
70
80
90
5
385
765
1145
1525
1905
2285
2665
3045
3425
3805
4185
4565
4946
5326
5706
6086
6466
6846
AxisTitle
Compute Memory Utilization
Total
• MEM% on Compute nodes are higher
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Spark-SQL Performance Analysis
-Disk utilization
0
100000
200000
300000
400000
500000
600000
700000
-5
550
1105
1660
2215
2770
3325
3880
4435
4990
5545
6100
6655
AxisTitle
Compute Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
50000
100000
150000
200000
250000
300000
-5
550
1105
1660
2215
2770
3325
3881
4436
4991
5546
6101
6656
AxisTitle
Storage Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
50000
100000
150000
200000
250000
300000
350000
400000
-5
470
945
1420
1895
2371
2846
3321
3796
4271
4746
5221
5696
AxisTitle
Storage Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
10000
20000
30000
40000
50000
60000
70000
80000
-5
360
725
1090
1455
1820
2185
2550
2915
3280
3645
4010
4375
AxisTitle
Storage Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
200000
400000
600000
800000
1000000
1200000
-5
390
785
1180
1575
1970
2365
2760
3155
3550
3945
4340
AxisTitle
Compute Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
200000
400000
600000
800000
1000000
1200000
-5
510
1025
1540
2055
2570
3085
3600
4115
4630
5146
5661
AxisTitle
Compute Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
S3 Warmup Hot
• Local cache saved lots disk IO on
storage nodes
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Spark-SQL Performance Analysis
-Network utilization
0
200000
400000
600000
800000
1000000
1200000
0
555
1110
1665
2220
2775
3330
3885
4440
4995
5551
6106
6661
AxisTitle
Storage Network IO
Sum of rxkB/s
Sum of txkB/s
0
200000
400000
600000
800000
1000000
1200000
1400000
0
516
1031
1546
2061
2577
3092
3607
4122
4637
5152
5667
AxisTitle
Storage Network IO
Sum of rxkB/s
Sum of txkB/s
0
500
1000
1500
2000
2500
3000
0
340
680
1020
1360
1700
2040
2380
2720
3060
3400
3741
4081
4421
AxisTitle
Storage Network IO
Sum
Sum
0
200000
400000
600000
800000
1000000
1200000
0
365
730
1095
1460
1825
2190
2555
2920
3285
3650
4015
4380
AxisTitle
Compute Network IO
Sum
Sum
0
200000
400000
600000
800000
1000000
1200000
0
555
1110
1665
2220
2775
3330
3885
4440
4995
5550
6101
6656
AxisTitle
Compute Network IO
Sum of rxkB/s
Sum of txkB/s
0
200000
400000
600000
800000
1000000
1200000
0
515
1030
1545
2060
2575
3090
3605
4120
4635
5150
5665
AxisTitle
Compute Network IO
Sum of rxkB/s
Sum of txkB/s
S3 Warmup Hot
• Local cache saved a lots Network IO
between compute and storage nodes
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
• New S3a committer is designed for the “rename” issue
• Instead of using COPY + DELETE, the new committers relies MPU mechanism to
implement atomic rename on S3
• Staging committer is more simpler in general and also more mature as it’s from
production code of Netflix
• Hadoop 3.1.0 is released with “staging” and “magic” committer
• For the staging committer, Netflix has been used in production for years
• The Magic committer should be better by design, but it’s still WIP
S3a committer
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
• New S3a staging committer improved performance by ~50% than w/
original committer
S3a staging committer performance
1.0
1.3
0.4
0.6
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + REMOTE HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + S3 (HDD) + S3a
committer
S3a committer performance with Terasort 1T(normalized)
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Persistent Memory
• New media technologies that can offer byte-addressability persistence, along
with DRAM-like performance
• Different types
• NVDIMM
• Intel Optane DC Persistent Memory
• Intel Optane DC SSD(disk)
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Intel Optane DC Memory
Source:
https://www.snia.org/sites/default/files/SDC/2018/presentations/PM/Andy_Rudoff_Update
_SNIA_Persistent_Memory_Programming_Model.pdf
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Performance of DCPPM vs. NAND
Source: https://software.intel.com/en-us/persistent-memory
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
A new PMEM tier for Alluxio
Mem
SSD
HDD
Mem
PMEM
SSD
HDD
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
New PMEM tier for Alluxio
PMEM
Our Approach
worker
POSIX
IO
filesystem
Pagecache
DAX
filesystem
Memory
mapped file
client
Context Switches
POSIX
IO
Context Switches
workerclient workerclient
Userspace
Load/Store
PMDK
To bypass filesystem overhead and context switches
reader/writer reader/writer
reader/writer
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Summary
• Bigdata on cloud is functionality ready validated by industry standard
decision making workloads TPC-DS
• Bigdata on the Cloud delivers on-par performance with remote HDFS for
batch analytics, intensive write operations still need further optimizations
• The new S3a committer does improve the write performance a lot
• Compute-side caching(Alluxio) can provide significant performance improve
for various workloads
• Persistent memory opens a new era for storage and it can further improve
the performance
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
 
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidSpeeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
Orchestrate a Data Symphony
Orchestrate a Data SymphonyOrchestrate a Data Symphony
Orchestrate a Data Symphony
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspective
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMPowering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
 
Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio + Spark: Accelerating Auto Data Tagging in WeRideAlluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 

Ähnlich wie Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and New Opportunities with Persistent Memory

Light-weighted HDFS disaster recovery
Light-weighted HDFS disaster recoveryLight-weighted HDFS disaster recovery
Light-weighted HDFS disaster recovery
DataWorks Summit
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Databricks
 
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
IBM India Smarter Computing
 

Ähnlich wie Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and New Opportunities with Persistent Memory (20)

QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and future
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
Light-weighted HDFS disaster recovery
Light-weighted HDFS disaster recoveryLight-weighted HDFS disaster recovery
Light-weighted HDFS disaster recovery
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 
Accelerate Ceph performance via SPDK related techniques
Accelerate Ceph performance via SPDK related techniques Accelerate Ceph performance via SPDK related techniques
Accelerate Ceph performance via SPDK related techniques
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and Ceph
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
 
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
The Importance of Fast, Scalable Storage for Today’s HPC
The Importance of Fast, Scalable Storage for Today’s HPCThe Importance of Fast, Scalable Storage for Today’s HPC
The Importance of Fast, Scalable Storage for Today’s HPC
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
 
Big Data Intel® Platform
Big Data Intel® PlatformBig Data Intel® Platform
Big Data Intel® Platform
 
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
 

Mehr von Alluxio, Inc.

Mehr von Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Kürzlich hochgeladen

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 

Kürzlich hochgeladen (20)

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 

Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and New Opportunities with Persistent Memory

  • 2. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Legal Disclaimer & Optimization Notice Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTYRIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITYWHATSOEVERAND INTEL DISCLAIMS ANYEXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITYOR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY,OR INFRINGEMENT OF ANYPATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTYRIGHT. Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 2
  • 3. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Agenda ▪ Background ▪ Bigdata Analytics on Cloud Reference Architecture ▪ Bigdata Analytics on Cloud with Alluxio ▪ PersistentMemory for Alluxio ▪ Summary
  • 4. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 5. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 5 background Get a bigger cluster for many teams to share. Give each team their own dedicated cluster, each with a copy of PBs of data. Give teams ability to spin-up/spin-down clusters which can share data sets. #1 #2 #3
  • 6. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 6 Benefits of compute and storage disaggregation Independent scale of CPU and storage capacity • Rightsized HW for each layer • Reduce resource wastage • Cost saving Single copy of data • Multiple compute cluster share common data repo/lake • Simplified data management • Reduced provisioning overhead • Improve security Enable Agile application development • In-memory cloning • Snapshot service • Quick & efficient copies Hybrid cloud deployment • Mix and match resources depending on workload nature and life cycle Simple and flexible software management • Avoid software version management • Upgrade compute software only
  • 7. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 7 Bigdata Analytics over Disaggregate Storage 20162006 201520142008 oss://adl:// wasb://gs://s3:// s3n:// s3a:// Hadoop Compatible File System abstraction layer: Unified storage API interface Hadoop fs –ls s3a://job/
  • 8. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 9. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Workloads Simple Read/Write ▪ DFSIO: TestDFSIO is the canonical example of a benchmarkthat attempts to measure the Storage's capacity for reading and writing bulk data. ▪ Terasort: a popular benchmark that measures the amount of time to sort one terabyte of randomly distributed data on a given computer system. Batch Analytics ▪ To consistently executinganalytical process to process large set of data. ▪ Leveraging54 derivedfrom TPC-DS * queries with intensivereads across objects in different buckets Data Transformation(not covered in this talk) ▪ ETL: Taking data as it is originally generatedand transforming it to a format (Parquet, ORC) that more tuned for analytical workloads.
  • 10. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 5x Compute Node • Intel® Xeon™ processor E5-2699 v4 @ 2.2GHz, 128GB mem • 2x10G 82599 10Gb NIC • 2x SSDs • 3x Data storage (can be emliminated) Software: • Hadoop 2.8.1 • Spark 2.2.0 • Hive 2.2.1 • CentOS 7.5 5x Storage Node, 5 RGW nodes(co-located) • Intel(R) Xeon(R) CPU E5-2699v4 2.20GHz • 128GB Memory • 3x 82599 10Gb NIC • 7x 1TB HDD as data drive • 1 OSD instances one each HDD • CentOS 7.5 • Ceph Mimic *Other names and brands may be claimed as the property of others. RGW1 OSD1 MON OSD1 OSD7… Hadoop Hive Spark 1x10Gb NIC RGW2 OSD2 RGW3 OSD3 RGW4 OSD4 RGW5 OSD5 2x10Gb NIC Hadoop Hive Spark Hadoop Hive Spark Hadoop Hive Spark Hadoop Hive Spark Head DNS Server 10 System configurations
  • 11. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Bigdata on Object Storage Performance Overview • In Terasort workload, S3 showed 60% perf gap comparing with local HDFS since all it’s very IO intensive • In TPC-DS workload, S3 showed 30% perf gap, and in Kmeans workload, S3 showed 40% perf gap 1.0 1.0 1.0 0.7 0.4 0.6 0.9 1.3 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 TPC-DS (54 quiries) TERASORT 1T KMEANS 374g Performance Overview(Normalized) spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + REMOTE HDFS (HDD)
  • 12. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 13. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice System Configuration(Alluxio) 5x Compute Node Hardware: • ntel® Xeon™ processor Gold 6140 @ 2.3GHz, 384GB Memory • 1x 82599 10Gb NIC • 5x P4500 SSD (2 for spark-shuffle) Software: • Hadoop 2.8.1 • Spark 2.2.0 • Hive 2.2.1 • CentOS 7.5 5x Storage Node • Intel(R) Xeon(R) CPU Gold 6140 @ 2.30GHz, 192GB Memory • 2x 82599 10Gb NIC • 7x 1TB HDD for Ceph bluestore or HDFS namenode and datanode Software: • Hadoop 2.8.1 • Ceph 12.2.7 • CentOS 7.5 *Other names and brands may be claimed as the property of others. Hadoop Hive Spark Alluxio DNS Hadoop Hive Spark Hadoop Hive Spark Hadoop Hive Spark Hadoop Hive Spark CEPH MON RGW REMOTE HDFS NN 1x10Gb NIC Alluxio Alluxio Alluxio Alluxio OSD DN CEPH RGW REMOTE HDFS OSD DN CEPH RGW REMOTE HDFS OSD DN CEPH RGW REMOTE HDFS OSD DN CEPH RGW REMOTE HDFS OSD DN Alluxio Acceleration Layer • 200GB Mem for mem mode • 1TB SSD(P4500) for SSD mode Software: • Alluxio 1.8.0 Intel Confidential 13
  • 14. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Performance w/ Alluxio Acceleration • Alluxio based in memory acceleration layers provides significant performance boost for analytics workloads with disaggregated storage • Up to 3.25x for terasort • Up to 1.8x compared with local HDFS Intel Confidential 14 1.0 1.0 1.0 0.7 0.4 0.6 0.9 1.3 1.0 1.1 1.3 1.31.2 0.0 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 Batch Query (54 quiries) TERASORT 1T KMEANS 374g Alluxio Accleration for Disaggregated analytics storage with different workloads (Normalized) spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + REMOTE HDFS (HDD) spark(yarn) + alluxio(SSD) + S3 (HDD) spark(yarn) + alluxio(MEM) + S3 (HDD)
  • 15. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Spark-SQL Performance Overview • The performance of first several queries in Warmup are worse than S3 baseline • For most queries SSD cache does improved the performance Spark-SQL performance W/ S3 S3 Warmup Hot
  • 16. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Compute-side caching for I/O intensive queries 16 • Compute-side caching brings better efficiency(10% - 30%) for I/O intensive queries 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.85 0.71 0.90 0.75 0.75 0.78 0.66 0.00 0.20 0.40 0.60 0.80 1.00 1.20 query19.sql query42.sql query43.sql query52.sql query55.sql query63.sql query68.sql Seconds I/O intensive query performance(normalized, lower is better) S3 Compute-side Caching
  • 17. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 0 20 40 60 80 100 120 0 365 730 1095 1460 1825 2190 2555 2920 3285 3650 4015 4380 AxisTitle CPU Utilization Average of Average of Average of Average of Average of Average of 0 20 40 60 80 100 120 0 555 1110 1665 2220 2775 3330 3885 4440 4995 5551 6106 6661 AxisTitle Compute CPU Utilization Average of %idle Average of %steal Average of %iowait Average of %nice Average of %system Average of %user 0 20 40 60 80 100 120 0 475 950 1425 1900 2375 2850 3325 3800 4275 4750 5225 5700 AxisTitle Compute CPU Utilization Average of %idle Average of %steal Average of %iowait Average of %nice Average of %system Average of %user Spark-SQL Performance Analysis -CPU utilization 0 20 40 60 80 100 120 0 555 1110 1665 2220 2775 3331 3886 4441 4996 5551 6106 6661 AxisTitle Storage CPU Utilization Average of %idle Average of %steal Average of %iowait Average of %nice Average of %system Average of %user 0 20 40 60 80 100 120 0 476 951 1426 1902 2377 2852 3327 3802 4277 4752 5227 5702 AxisTitle Storage CPU Utilization Average of %idle Average of %steal Average of %iowait Average of %nice Average of %system Average of %user 0 20 40 60 80 100 120 0 365 730 1095 1460 1825 2190 2555 2920 3285 3650 4015 4380 AxisTitle Storage CPU Utilization Average of Average of Average of Average of Average of Average of S3 Warmup Hot • CPU% on Compute nodes are higher
  • 18. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 0 20 40 60 80 100 120 5 350 695 1040 1385 1730 2076 2421 2766 3111 3456 3801 4146 4491 4836 5181 5526 5871 AxisTitle Compute Memory Utilization Total Spark-SQL Performance Analysis -Memory utilization 21.4 21.6 21.8 22 22.2 22.4 22.6 22.8 5 405 805 1205 1605 2005 2405 2805 3205 3605 4005 4405 4805 5205 5605 6005 6405 6805 AxisTitle Storage Memory Utilization Total 21.4 21.6 21.8 22 22.2 22.4 22.6 22.8 23 23.2 5 350 695 1040 1385 1730 2075 2421 2766 3111 3456 3801 4146 4491 4836 5181 5526 5871 AxisTitle Storage Memory Utilization Total 21.89 21.9 21.91 21.92 21.93 21.94 21.95 21.96 21.97 21.98 5 285 565 845 1125 1405 1685 1965 2245 2525 2805 3085 3365 3645 3925 4205 4485 AxisTitle Storage Memory Utilization Total 0 20 40 60 80 100 120 5 270 535 800 1065 1330 1595 1860 2125 2390 2655 2920 3185 3450 3715 3980 4245 4510 AxisTitle Compute Memory Utilization Total S3 Warmup Hot 0 10 20 30 40 50 60 70 80 90 5 385 765 1145 1525 1905 2285 2665 3045 3425 3805 4185 4565 4946 5326 5706 6086 6466 6846 AxisTitle Compute Memory Utilization Total • MEM% on Compute nodes are higher
  • 19. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Spark-SQL Performance Analysis -Disk utilization 0 100000 200000 300000 400000 500000 600000 700000 -5 550 1105 1660 2215 2770 3325 3880 4435 4990 5545 6100 6655 AxisTitle Compute Disk Bandwidth Sum of rkB/s Sum of wkB/s 0 50000 100000 150000 200000 250000 300000 -5 550 1105 1660 2215 2770 3325 3881 4436 4991 5546 6101 6656 AxisTitle Storage Disk Bandwidth Sum of rkB/s Sum of wkB/s 0 50000 100000 150000 200000 250000 300000 350000 400000 -5 470 945 1420 1895 2371 2846 3321 3796 4271 4746 5221 5696 AxisTitle Storage Disk Bandwidth Sum of rkB/s Sum of wkB/s 0 10000 20000 30000 40000 50000 60000 70000 80000 -5 360 725 1090 1455 1820 2185 2550 2915 3280 3645 4010 4375 AxisTitle Storage Disk Bandwidth Sum of rkB/s Sum of wkB/s 0 200000 400000 600000 800000 1000000 1200000 -5 390 785 1180 1575 1970 2365 2760 3155 3550 3945 4340 AxisTitle Compute Disk Bandwidth Sum of rkB/s Sum of wkB/s 0 200000 400000 600000 800000 1000000 1200000 -5 510 1025 1540 2055 2570 3085 3600 4115 4630 5146 5661 AxisTitle Compute Disk Bandwidth Sum of rkB/s Sum of wkB/s S3 Warmup Hot • Local cache saved lots disk IO on storage nodes
  • 20. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Spark-SQL Performance Analysis -Network utilization 0 200000 400000 600000 800000 1000000 1200000 0 555 1110 1665 2220 2775 3330 3885 4440 4995 5551 6106 6661 AxisTitle Storage Network IO Sum of rxkB/s Sum of txkB/s 0 200000 400000 600000 800000 1000000 1200000 1400000 0 516 1031 1546 2061 2577 3092 3607 4122 4637 5152 5667 AxisTitle Storage Network IO Sum of rxkB/s Sum of txkB/s 0 500 1000 1500 2000 2500 3000 0 340 680 1020 1360 1700 2040 2380 2720 3060 3400 3741 4081 4421 AxisTitle Storage Network IO Sum Sum 0 200000 400000 600000 800000 1000000 1200000 0 365 730 1095 1460 1825 2190 2555 2920 3285 3650 4015 4380 AxisTitle Compute Network IO Sum Sum 0 200000 400000 600000 800000 1000000 1200000 0 555 1110 1665 2220 2775 3330 3885 4440 4995 5550 6101 6656 AxisTitle Compute Network IO Sum of rxkB/s Sum of txkB/s 0 200000 400000 600000 800000 1000000 1200000 0 515 1030 1545 2060 2575 3090 3605 4120 4635 5150 5665 AxisTitle Compute Network IO Sum of rxkB/s Sum of txkB/s S3 Warmup Hot • Local cache saved a lots Network IO between compute and storage nodes
  • 21. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice • New S3a committer is designed for the “rename” issue • Instead of using COPY + DELETE, the new committers relies MPU mechanism to implement atomic rename on S3 • Staging committer is more simpler in general and also more mature as it’s from production code of Netflix • Hadoop 3.1.0 is released with “staging” and “magic” committer • For the staging committer, Netflix has been used in production for years • The Magic committer should be better by design, but it’s still WIP S3a committer
  • 22. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice • New S3a staging committer improved performance by ~50% than w/ original committer S3a staging committer performance 1.0 1.3 0.4 0.6 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + REMOTE HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + S3 (HDD) + S3a committer S3a committer performance with Terasort 1T(normalized)
  • 23. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 24. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Persistent Memory • New media technologies that can offer byte-addressability persistence, along with DRAM-like performance • Different types • NVDIMM • Intel Optane DC Persistent Memory • Intel Optane DC SSD(disk)
  • 25. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Intel Optane DC Memory Source: https://www.snia.org/sites/default/files/SDC/2018/presentations/PM/Andy_Rudoff_Update _SNIA_Persistent_Memory_Programming_Model.pdf
  • 26. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Performance of DCPPM vs. NAND Source: https://software.intel.com/en-us/persistent-memory
  • 27. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 28. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice A new PMEM tier for Alluxio Mem SSD HDD Mem PMEM SSD HDD
  • 29. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice New PMEM tier for Alluxio PMEM Our Approach worker POSIX IO filesystem Pagecache DAX filesystem Memory mapped file client Context Switches POSIX IO Context Switches workerclient workerclient Userspace Load/Store PMDK To bypass filesystem overhead and context switches reader/writer reader/writer reader/writer
  • 30. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 31. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Summary • Bigdata on cloud is functionality ready validated by industry standard decision making workloads TPC-DS • Bigdata on the Cloud delivers on-par performance with remote HDFS for batch analytics, intensive write operations still need further optimizations • The new S3a committer does improve the write performance a lot • Compute-side caching(Alluxio) can provide significant performance improve for various workloads • Persistent memory opens a new era for storage and it can further improve the performance
  • 32. Copyright © 2018, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice