Weitere ähnliche Inhalte
Ähnlich wie Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and New Opportunities with Persistent Memory (20)
Mehr von Alluxio, Inc. (20)
Kürzlich hochgeladen (20)
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and New Opportunities with Persistent Memory
- 2. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Legal Disclaimer & Optimization Notice
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTYRIGHTS IS GRANTED BY
THIS DOCUMENT. INTEL ASSUMES NO LIABILITYWHATSOEVERAND INTEL DISCLAIMS ANYEXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITYOR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY,OR INFRINGEMENT OF ANYPATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTYRIGHT.
Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other
countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
2
- 3. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Agenda
▪ Background
▪ Bigdata Analytics on Cloud Reference Architecture
▪ Bigdata Analytics on Cloud with Alluxio
▪ PersistentMemory for Alluxio
▪ Summary
- 4. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
- 5. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
5
background
Get a bigger cluster
for many teams to share.
Give each team their
own dedicated cluster,
each with a copy of
PBs of data.
Give teams ability to
spin-up/spin-down
clusters which can
share data sets.
#1 #2 #3
- 6. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
6
Benefits of compute and storage disaggregation
Independent scale
of CPU and storage
capacity
• Rightsized HW for
each layer
• Reduce resource
wastage
• Cost saving
Single copy of data
• Multiple compute
cluster share
common data
repo/lake
• Simplified data
management
• Reduced
provisioning
overhead
• Improve security
Enable Agile
application
development
• In-memory cloning
• Snapshot service
• Quick & efficient
copies
Hybrid cloud
deployment
• Mix and match
resources
depending on
workload nature
and life cycle
Simple and flexible
software
management
• Avoid software
version
management
• Upgrade compute
software only
- 7. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
7
Bigdata Analytics over Disaggregate Storage
20162006 201520142008
oss://adl://
wasb://gs://s3:// s3n://
s3a://
Hadoop Compatible File System abstraction layer: Unified storage API interface Hadoop fs –ls s3a://job/
- 8. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
- 9. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Workloads
Simple Read/Write
▪ DFSIO: TestDFSIO is the canonical example of a benchmarkthat attempts to measure the Storage's capacity for
reading and writing bulk data.
▪ Terasort: a popular benchmark that measures the amount of time to sort one terabyte of randomly distributed data
on a given computer system.
Batch Analytics
▪ To consistently executinganalytical process to process large set of data.
▪ Leveraging54 derivedfrom TPC-DS * queries with intensivereads across objects in different buckets
Data Transformation(not covered in this talk)
▪ ETL: Taking data as it is originally generatedand transforming it to a format (Parquet, ORC) that more tuned for
analytical workloads.
- 10. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
5x Compute Node
• Intel® Xeon™ processor E5-2699 v4 @
2.2GHz, 128GB mem
• 2x10G 82599 10Gb NIC
• 2x SSDs
• 3x Data storage (can be emliminated)
Software:
• Hadoop 2.8.1
• Spark 2.2.0
• Hive 2.2.1
• CentOS 7.5
5x Storage Node, 5 RGW nodes(co-located)
• Intel(R) Xeon(R) CPU E5-2699v4 2.20GHz
• 128GB Memory
• 3x 82599 10Gb NIC
• 7x 1TB HDD as data drive
• 1 OSD instances one each HDD
• CentOS 7.5
• Ceph Mimic
*Other names and brands may be
claimed as the property of others.
RGW1
OSD1
MON
OSD1 OSD7…
Hadoop
Hive
Spark
1x10Gb NIC
RGW2
OSD2
RGW3
OSD3
RGW4
OSD4
RGW5
OSD5
2x10Gb NIC
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Head
DNS Server
10
System configurations
- 11. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Bigdata on Object Storage Performance Overview
• In Terasort workload, S3 showed 60% perf gap comparing with local HDFS since all
it’s very IO intensive
• In TPC-DS workload, S3 showed 30% perf gap, and in Kmeans workload, S3
showed 40% perf gap
1.0 1.0 1.0
0.7
0.4
0.6
0.9
1.3
1.0
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
TPC-DS (54 quiries) TERASORT 1T KMEANS 374g
Performance Overview(Normalized)
spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + REMOTE HDFS (HDD)
- 12. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
- 13. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
System Configuration(Alluxio) 5x Compute Node
Hardware:
• ntel® Xeon™ processor Gold 6140
@ 2.3GHz, 384GB Memory
• 1x 82599 10Gb NIC
• 5x P4500 SSD (2 for spark-shuffle)
Software:
• Hadoop 2.8.1
• Spark 2.2.0
• Hive 2.2.1
• CentOS 7.5
5x Storage Node
• Intel(R) Xeon(R) CPU Gold 6140 @
2.30GHz, 192GB Memory
• 2x 82599 10Gb NIC
• 7x 1TB HDD for Ceph bluestore or
HDFS namenode and datanode
Software:
• Hadoop 2.8.1
• Ceph 12.2.7
• CentOS 7.5
*Other names and brands may be
claimed as the property of others.
Hadoop
Hive
Spark
Alluxio
DNS Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
Hadoop
Hive
Spark
CEPH
MON
RGW
REMOTE
HDFS
NN
1x10Gb NIC
Alluxio Alluxio Alluxio Alluxio
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
CEPH
RGW
REMOTE
HDFS
OSD DN
Alluxio Acceleration Layer
• 200GB Mem for mem mode
• 1TB SSD(P4500) for SSD mode
Software:
• Alluxio 1.8.0
Intel Confidential 13
- 14. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Performance w/ Alluxio Acceleration
• Alluxio based in memory acceleration layers provides significant performance boost for analytics
workloads with disaggregated storage
• Up to 3.25x for terasort
• Up to 1.8x compared with local HDFS
Intel Confidential 14
1.0 1.0 1.0
0.7
0.4
0.6
0.9
1.3
1.0
1.1
1.3 1.31.2
0.0
1.0
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Batch Query (54 quiries) TERASORT 1T KMEANS 374g
Alluxio Accleration for Disaggregated analytics storage with different workloads (Normalized)
spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + REMOTE HDFS (HDD)
spark(yarn) + alluxio(SSD) + S3 (HDD) spark(yarn) + alluxio(MEM) + S3 (HDD)
- 15. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Spark-SQL Performance Overview
• The performance of first several queries in Warmup are worse than S3 baseline
• For most queries SSD cache does improved the performance
Spark-SQL performance W/ S3
S3 Warmup Hot
- 16. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Compute-side caching for I/O intensive queries
16
• Compute-side caching brings better efficiency(10% - 30%) for I/O intensive
queries
1.00 1.00 1.00 1.00 1.00 1.00 1.00
0.85
0.71
0.90
0.75 0.75
0.78
0.66
0.00
0.20
0.40
0.60
0.80
1.00
1.20
query19.sql query42.sql query43.sql query52.sql query55.sql query63.sql query68.sql
Seconds
I/O intensive query performance(normalized, lower is better)
S3 Compute-side Caching
- 17. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
0
20
40
60
80
100
120
0
365
730
1095
1460
1825
2190
2555
2920
3285
3650
4015
4380
AxisTitle
CPU Utilization
Average of
Average of
Average of
Average of
Average of
Average of
0
20
40
60
80
100
120
0
555
1110
1665
2220
2775
3330
3885
4440
4995
5551
6106
6661
AxisTitle
Compute CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
0
20
40
60
80
100
120
0
475
950
1425
1900
2375
2850
3325
3800
4275
4750
5225
5700
AxisTitle
Compute CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
Spark-SQL Performance Analysis
-CPU utilization
0
20
40
60
80
100
120
0
555
1110
1665
2220
2775
3331
3886
4441
4996
5551
6106
6661
AxisTitle
Storage CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
0
20
40
60
80
100
120
0
476
951
1426
1902
2377
2852
3327
3802
4277
4752
5227
5702
AxisTitle
Storage CPU Utilization
Average of %idle
Average of %steal
Average of %iowait
Average of %nice
Average of %system
Average of %user
0
20
40
60
80
100
120
0
365
730
1095
1460
1825
2190
2555
2920
3285
3650
4015
4380
AxisTitle
Storage CPU Utilization
Average of
Average of
Average of
Average of
Average of
Average of
S3 Warmup Hot
• CPU% on Compute nodes are higher
- 18. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
0
20
40
60
80
100
120
5
350
695
1040
1385
1730
2076
2421
2766
3111
3456
3801
4146
4491
4836
5181
5526
5871
AxisTitle
Compute Memory Utilization
Total
Spark-SQL Performance Analysis
-Memory utilization
21.4
21.6
21.8
22
22.2
22.4
22.6
22.8
5
405
805
1205
1605
2005
2405
2805
3205
3605
4005
4405
4805
5205
5605
6005
6405
6805
AxisTitle
Storage Memory Utilization
Total
21.4
21.6
21.8
22
22.2
22.4
22.6
22.8
23
23.2
5
350
695
1040
1385
1730
2075
2421
2766
3111
3456
3801
4146
4491
4836
5181
5526
5871
AxisTitle
Storage Memory Utilization
Total
21.89
21.9
21.91
21.92
21.93
21.94
21.95
21.96
21.97
21.98
5
285
565
845
1125
1405
1685
1965
2245
2525
2805
3085
3365
3645
3925
4205
4485
AxisTitle
Storage Memory Utilization
Total
0
20
40
60
80
100
120
5
270
535
800
1065
1330
1595
1860
2125
2390
2655
2920
3185
3450
3715
3980
4245
4510
AxisTitle
Compute Memory Utilization
Total
S3 Warmup Hot
0
10
20
30
40
50
60
70
80
90
5
385
765
1145
1525
1905
2285
2665
3045
3425
3805
4185
4565
4946
5326
5706
6086
6466
6846
AxisTitle
Compute Memory Utilization
Total
• MEM% on Compute nodes are higher
- 19. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Spark-SQL Performance Analysis
-Disk utilization
0
100000
200000
300000
400000
500000
600000
700000
-5
550
1105
1660
2215
2770
3325
3880
4435
4990
5545
6100
6655
AxisTitle
Compute Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
50000
100000
150000
200000
250000
300000
-5
550
1105
1660
2215
2770
3325
3881
4436
4991
5546
6101
6656
AxisTitle
Storage Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
50000
100000
150000
200000
250000
300000
350000
400000
-5
470
945
1420
1895
2371
2846
3321
3796
4271
4746
5221
5696
AxisTitle
Storage Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
10000
20000
30000
40000
50000
60000
70000
80000
-5
360
725
1090
1455
1820
2185
2550
2915
3280
3645
4010
4375
AxisTitle
Storage Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
200000
400000
600000
800000
1000000
1200000
-5
390
785
1180
1575
1970
2365
2760
3155
3550
3945
4340
AxisTitle
Compute Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
0
200000
400000
600000
800000
1000000
1200000
-5
510
1025
1540
2055
2570
3085
3600
4115
4630
5146
5661
AxisTitle
Compute Disk Bandwidth
Sum of rkB/s
Sum of wkB/s
S3 Warmup Hot
• Local cache saved lots disk IO on
storage nodes
- 20. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Spark-SQL Performance Analysis
-Network utilization
0
200000
400000
600000
800000
1000000
1200000
0
555
1110
1665
2220
2775
3330
3885
4440
4995
5551
6106
6661
AxisTitle
Storage Network IO
Sum of rxkB/s
Sum of txkB/s
0
200000
400000
600000
800000
1000000
1200000
1400000
0
516
1031
1546
2061
2577
3092
3607
4122
4637
5152
5667
AxisTitle
Storage Network IO
Sum of rxkB/s
Sum of txkB/s
0
500
1000
1500
2000
2500
3000
0
340
680
1020
1360
1700
2040
2380
2720
3060
3400
3741
4081
4421
AxisTitle
Storage Network IO
Sum
Sum
0
200000
400000
600000
800000
1000000
1200000
0
365
730
1095
1460
1825
2190
2555
2920
3285
3650
4015
4380
AxisTitle
Compute Network IO
Sum
Sum
0
200000
400000
600000
800000
1000000
1200000
0
555
1110
1665
2220
2775
3330
3885
4440
4995
5550
6101
6656
AxisTitle
Compute Network IO
Sum of rxkB/s
Sum of txkB/s
0
200000
400000
600000
800000
1000000
1200000
0
515
1030
1545
2060
2575
3090
3605
4120
4635
5150
5665
AxisTitle
Compute Network IO
Sum of rxkB/s
Sum of txkB/s
S3 Warmup Hot
• Local cache saved a lots Network IO
between compute and storage nodes
- 21. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
• New S3a committer is designed for the “rename” issue
• Instead of using COPY + DELETE, the new committers relies MPU mechanism to
implement atomic rename on S3
• Staging committer is more simpler in general and also more mature as it’s from
production code of Netflix
• Hadoop 3.1.0 is released with “staging” and “magic” committer
• For the staging committer, Netflix has been used in production for years
• The Magic committer should be better by design, but it’s still WIP
S3a committer
- 22. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
• New S3a staging committer improved performance by ~50% than w/
original committer
S3a staging committer performance
1.0
1.3
0.4
0.6
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
spark(yarn) + LOCAL HDFS (HDD) spark(yarn) + REMOTE HDFS (HDD) spark(yarn) + S3 (HDD) spark(yarn) + S3 (HDD) + S3a
committer
S3a committer performance with Terasort 1T(normalized)
- 23. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
- 24. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Persistent Memory
• New media technologies that can offer byte-addressability persistence, along
with DRAM-like performance
• Different types
• NVDIMM
• Intel Optane DC Persistent Memory
• Intel Optane DC SSD(disk)
- 25. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Intel Optane DC Memory
Source:
https://www.snia.org/sites/default/files/SDC/2018/presentations/PM/Andy_Rudoff_Update
_SNIA_Persistent_Memory_Programming_Model.pdf
- 26. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Performance of DCPPM vs. NAND
Source: https://software.intel.com/en-us/persistent-memory
- 27. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
- 28. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
A new PMEM tier for Alluxio
Mem
SSD
HDD
Mem
PMEM
SSD
HDD
- 29. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
New PMEM tier for Alluxio
PMEM
Our Approach
worker
POSIX
IO
filesystem
Pagecache
DAX
filesystem
Memory
mapped file
client
Context Switches
POSIX
IO
Context Switches
workerclient workerclient
Userspace
Load/Store
PMDK
To bypass filesystem overhead and context switches
reader/writer reader/writer
reader/writer
- 30. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
- 31. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Summary
• Bigdata on cloud is functionality ready validated by industry standard
decision making workloads TPC-DS
• Bigdata on the Cloud delivers on-par performance with remote HDFS for
batch analytics, intensive write operations still need further optimizations
• The new S3a committer does improve the write performance a lot
• Compute-side caching(Alluxio) can provide significant performance improve
for various workloads
• Persistent memory opens a new era for storage and it can further improve
the performance
- 32. Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice