SlideShare ist ein Scribd-Unternehmen logo
1 von 48
Downloaden Sie, um offline zu lesen
Memory-Speed Virtual Distributed Storage: Latest Features and Use Cases
August 2016 @ Strata+Hadoop World Beijing
Bin Fan, Yupeng Fu
1
About Us
•  Bin Fan 
•  Software Engineer @ Alluxio, Inc.
•  Alluxio PMC
•  Worked at Google, MSR
•  CMU, CUHK, USTC
•  Wechat @ apc999_fb
•  Yupeng Fu
•  Software Engineer @ Alluxio, Inc.
•  Alluxio PMC
•  Worked at Palantir, Google
•  UCSD, Tsinghua 
•  Wechat @richbird9
2
•  Team consists of Alluxio creators and top committers
•  Invested by

•  Committed to Alluxio Open Source

•  http://www.alluxio.com
Alluxio Inc.
3
Outline
•  What is Alluxio
•  Example use cases
1.  Off-heap memory to alleviate resource pressure
2.  Fast data sharing between jobs
3.  Accelerate access to remote storage
4.  Unified namespace
•  Summary

4
Non-persistent
data-storage
software.
ATTRIBUTES
Memory-Speed
 Virtual
 Distributed
 Storage
Scale out
architecture.
Virtualized across different
storage types under a
unified namespace.
Memory-speed
access to data.
5
OPEN SOURCE ALLUXIO: History
• Started in the AMPLab at UC Berkeley
–  Summer of 2012
• Open Sourced as Tachyon (renamed to Alluxio)
–  Apache 2.0 License
–  Latest Release: version 1.2.0 (July, 2016)
6
OPEN SOURCE ALLUXIO
v0.2
 v0.3
v0.4
v0.5
v0.6
v0.7
v0.8
v1.0
v1.1
7
OPEN SOURCE ALLUXIO
The fastest growing
open-source project
in the big data
ecosystem.

Currently over 300
contributors from
over 100
organizations.
8
Outline
•  What is Alluxio
•  Example use cases
1.  Off-heap memory to alleviate resource pressure
2.  Fast data sharing between jobs
3.  Accelerate access to remote storage
4.  Unified namespace
•  Summary

9
Application 1
Alluxio
CASE 1:
Off-heap Memory Storage to
Alleviate Resource Pressure
10
HDFS / Amazon S3
Consolidating Memory
Spark Job1
Spark
Memory
block 1
block 3
Spark Job2
Spark
Memory
block 3
block 1
block 1
block 3
block 2
block 4
storage engine & 
execution engine
same process
Data duplicated at memory-level
11
Spark Job1
Spark mem
Spark Job2
Spark mem
HDFS / Amazon S3
block 1
block 3
block 2
block 4
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
Data not duplicated at memory-level
Consolidating Memory
storage engine & 
execution engine
same process
12
Data Resilience during Crashes
Spark Task
Spark Memory
block manager
block 1
block 3
HDFS / Amazon S3
block 1
block 3
block 2
block 4
Process crash requires network and/or disk I/O to re-read the
data
storage engine & 
execution engine
same process
13
Data Resilience during Crashes
Crash
Spark Memory
block manager
block 1
block 3
HDFS / Amazon S3
block 1
block 3
block 2
block 4
Process crash requires network and/or disk I/O to re-read the
data
storage engine & 
execution engine
same process
14
HDFS / Amazon S3
Data Resilience during Crashes
block 1
block 3
block 2
block 4
Crash
storage engine & 
execution engine
same process
Process crash requires network and/or disk I/O to re-read the
data
15
Spark Task
Spark Memory
block manager
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
Process crash only needs memory I/O to re-read the
data
Data Resilience during Crashes
storage engine & 
execution engine
same process
16
Crash
Process crash only needs memory I/O to re-read the data
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
Data Resilience during Crashes
storage engine & 
execution engine
same process
17
CASE STUDY
Thanks to Alluxio, we now have the raw
data immediately available at every
iteration and we can skip the costs of
loading in terms of time waiting,
network traffic, and RDBMS activity.

- Henry Powell, Barclays
RESULTS
•  Barclays workflow iteration time decreased
from hours to seconds

•  Alluxio enabled workflows that were
impossible before

•  By keeping data only in memory, the I/O cost of
loading and storing in Alluxio is now on the
order of seconds
Relational Database
Share Data Across Jobs at Memory
Speed
•  Reducing time from hours to
seconds
•  Solving issues of Teradata as
under storage
18
Barclays Previous Architecture
Spark
Teradata
Slow ETL,
30+ minutes
Each context restart
requires ETL process
again
h"ps://dzone.com/ar1cles/Accelerate-­‐In-­‐Memory-­‐Processing-­‐with-­‐Spark-­‐from-­‐Hours-­‐to-­‐Seconds-­‐With-­‐Tachyon	
  
19
Barclays Architecture with Alluxio
Spark
Teradata
Alluxio
ETL only once
Data still available in Alluxio
memory after Job crash
20
CASE 2:
Fast Data sharing between apps
Application 1
Alluxio
Application2
HDFS
21
Spark Job
Spark 
Memory
block 1
block 3
Hadoop MR Job
YARN
HDFS / Amazon S3
block 1
block 3
block 2
block 4
storage engine & 
execution engine
same process
Data Sharing Between Frameworks
Inter-process sharing slowed down by network and/or disk I/O
22
Data Sharing Between Frameworks
Spark Job
Spark Memory
Hadoop MR Job
YARN
HDFS / Amazon S3
block 1
block 3
block 2
block 4
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
storage engine & 
execution engine
same process
Inter-process sharing can happen at memory
speed
23
CASE STUDY
We’ve been running Alluxio in
production for over 9 months, resulting
in 15x speedup on average, and 300x
speedup at peak service times.

- Xueyan Li, Qunar
RESULTS

•  Multiple computing frameworks can share data
at memory speed with Alluxio

•  Improved the performance of their system with
15x – 300x speedups

•  Alluxio provides various user-friendly APIs,
which simplifies the transition to Alluxio.

Sharing Data Across Different
Compute Frameworks
•  6 billion logs (4.5 TB) daily
•  Mix of Memory + HDD
24
Qunar Previous Architecture
Flink
Streaming
Spark
Spark
Streaming
Flink
HDFS Cluster 
Some jobs cannot finish
Slow access to the storage
25
Qunar Architecture with Alluxio
Flink
Streaming
HDFS Cluster 
Spark
Spark
Streaming
Flink
Alluxio
Share in-memory data
across frameworks
26
CASE 3:
Accelerate access to remote storage
Application 
Alluxio
Remote
Storage
27
•  Different compute and storage hardware requirements
•  Scale compute and storage resources independently
•  Cost effective object stores (Amazon S3, Google GCS, Microsoft Azure
Blob Store) are inherently decoupled
Decoupling Compute from Storage
Accessing data
requires remote I/O!
28
Remote I/O problems
Application
Remote
Storage
every data operation requires
data transfer, sometimes over
the WAN
high latency, network
throughput
29
Visualizing the Stack with Alluxio
FAST 
104 - 105 MB/s
Application
Remote Storage
MODERATE 
103 - 104 MB/s
SLOW
102 - 103 MB/s
Alluxio MEM
Often
Only When Necessary

Alluxio SSD/HDD
Limited

30
What if memory capacity is still
not enough?
31
Alluxio Manages All Local Storage
MEM
SSD
HDD
Faster
Higher Capacity
32
Configurable Storage Tiers
MEM only
MEM + HDD
SSD only
33
Pluggable Tier Management Policies
Evict stale data to
slower tier
Promote hot data
to faster tier
34
CASE STUDY
Baidu File System
The performance was amazing. With
Spark SQL alone, it took 100-150 seconds
to finish a query; using Alluxio, where
data may hit local or remote Alluxio
nodes, it took 10-15 seconds.

- Shaoshan Liu, Baidu
RESULTS
•  Data queries are now 30x faster with Alluxio

•  1000-worker Alluxio cluster run stably,
providing over 50TB of RAM space

•  By using Alluxio, batch queries usually lasting
over 15 minutes were transformed into an
interactive query taking less than 30 seconds
Accelerate Access to Remote
Storage
•  1000+ nodes deployment
•  2+ petabytes of storage
•  Mix of memory + HDD
•  1000-worker cluster
35
CASE 4:
Unified Name Space
36	
  
Application 
Alluxio
AWS S3
 HDFS
Common Scenarios
•  At large organizations, data spans many storage
systems

•  Legacy storage systems
•  Application logic needs to integrate with different
storage systems

37
Unified Name Space
An abstraction for applications to access different storage
systems through the same interface

•  Benefits
–  Application written once with Allluxio API
–  Transparently add new storage systems
–  Seamlessly integrate with existing systems

38
Transparent Naming
Operations over persisted Alluxio objects mapped
transparently to underlying storage

alluxio://Data/Sales <-> hdfs://Data/Sales
Alluxio
 Storage System (HDFS, S3, …)
alluxio://host:port/
Data
 Users
Reports
 Sales
 Alice
 Bob
hdfs://host:port/
Data
 Users
Reports
 Sales
 Alice
 Bob
39
mount
Multiple Storage Systems
•  Unified namespace for multiple data sources
alluxio://Data/Sales -> s3://bucket/Sales
alluxio://Users/Alice -> hdfs://Users/Alice
•  API for on-the-fly mounting / unmounting

Alluxio
 Storage System A

alluxio://host:port/
Data
 Users
Alice
 Bob
hdfs://host:port/
Users
Alice
 Bob
Storage System B

s3://host/bucket
Reports
 Sales
Reports
 Sales
40
mount
mount
A Proliferation of Under Storage Systems
MORE…
41
CASE STUDY
RESULTS
•  Alluxio’s unified namespace enables different
applications and frameworks to easily interact
with their data from different storage systems

•  Global data centers across North America and
Asia

•  Tiered storage feature manages various
storage resources including memory, SSD and
disk
Transparently Manage Data
Across Different Storage Systems
•  Storage medium: MEM
•  Clusters in different regions
An Internet Company
Machine Learning Pipelines
42
42
Outline
•  What is Alluxio
•  Example use cases
1.  Off-heap memory to alleviate resource pressure
2.  Fast data sharing between jobs
3.  Accelerate access to remote storage
4.  Unified namespace
•  Summary

43
Takeaway: When to use Alluxio
•  Two or more jobs access the same dataset
•  Job(s) may not always succeed
•  Spark with memory/GC pressure
•  Jobs/Apps are pipelined
•  Resulting data does not need to be immediately
persisted
•  Interact with remote data
•  Data stored across different storage systems


44
Try out Alluxio 1.2.0
http://www.alluxio.org/releases
45
First China Meetups!
•  Nanjing (Saturday, 7/30)
•  Shanghai (Sunday, 7/31)


•  Beijing (Sunday, 8/7)
46
Resources
•  Alluxio Project: http://www.alluxio.org
•  Development: https://github.com/Alluxio/alluxio
•  Meet Friends: http://www.meetup.com/Alluxio
•  Alluxio, Inc.: http://www.alluxio.com
•  Training: http://www.alluxio.com/alluxio-training/
•  Contact us: info@alluxio.com
•  Alluxio Wechat: Alluxio
47
Alluxio Use Cases at Strata+Hadoop World Beijing 2016

Weitere ähnliche Inhalte

Was ist angesagt?

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Alluxio, Inc.
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio, Inc.
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio, Inc.
 
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio, Inc.
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Alluxio, Inc.
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...Alluxio, Inc.
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
The Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersThe Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Alluxio, Inc.
 
Tachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage SystemTachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage SystemTachyon Nexus, Inc.
 
Alluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Alluxio, Inc.
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAlluxio, Inc.
 
Accelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAccelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAlluxio, Inc.
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioAlluxio, Inc.
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio, Inc.
 

Was ist angesagt? (20)

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016
 
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
 
The Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersThe Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand Clusters
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
 
Tachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage SystemTachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage System
 
Alluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for Dask
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified Namespace
 
Accelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAccelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with Alluxio
 
Tachyon workshop 2015-07-19
Tachyon workshop 2015-07-19Tachyon workshop 2015-07-19
Tachyon workshop 2015-07-19
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
 

Andere mochten auch

Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Tachyon Nexus, Inc.
 
Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015Tachyon Nexus, Inc.
 
Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)Tachyon Nexus, Inc.
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleSpark Summit
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaAlluxio, Inc.
 
U2 product For Wiseeco
U2 product For WiseecoU2 product For Wiseeco
U2 product For Wiseeco호진 하
 
AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3Keeyong Han
 

Andere mochten auch (9)

Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015
 
Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015
 
Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
U2 product For Wiseeco
U2 product For WiseecoU2 product For Wiseeco
U2 product For Wiseeco
 
AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3
 

Ähnlich wie Alluxio Use Cases at Strata+Hadoop World Beijing 2016

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Data Con LA
 
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Spark Summit
 
Running Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with AlluxioRunning Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with Alluxiothelabdude
 
Improving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using AlluxioImproving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using AlluxioAlluxio, Inc.
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudAlluxio, Inc.
 
Accelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAccelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAlluxio, Inc.
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Community
 
A Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage SystemA Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage SystemAlluxio, Inc.
 
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksRunning Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksLucidworks
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangSpark Summit
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsAlluxio, Inc.
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Speeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China UnicomSpeeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China UnicomAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Alluxio, Inc.
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio, Inc.
 

Ähnlich wie Alluxio Use Cases at Strata+Hadoop World Beijing 2016 (18)

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
 
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
 
Running Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with AlluxioRunning Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with Alluxio
 
Improving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using AlluxioImproving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using Alluxio
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any Cloud
 
Accelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAccelerate Cloud Training with Alluxio
Accelerate Cloud Training with Alluxio
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
 
A Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage SystemA Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage System
 
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksRunning Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene Pang
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Speeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China UnicomSpeeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China Unicom
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 

Mehr von Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

Mehr von Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Kürzlich hochgeladen

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Alluxio Use Cases at Strata+Hadoop World Beijing 2016

  • 1. Memory-Speed Virtual Distributed Storage: Latest Features and Use Cases August 2016 @ Strata+Hadoop World Beijing Bin Fan, Yupeng Fu 1
  • 2. About Us •  Bin Fan •  Software Engineer @ Alluxio, Inc. •  Alluxio PMC •  Worked at Google, MSR •  CMU, CUHK, USTC •  Wechat @ apc999_fb •  Yupeng Fu •  Software Engineer @ Alluxio, Inc. •  Alluxio PMC •  Worked at Palantir, Google •  UCSD, Tsinghua •  Wechat @richbird9 2
  • 3. •  Team consists of Alluxio creators and top committers •  Invested by •  Committed to Alluxio Open Source •  http://www.alluxio.com Alluxio Inc. 3
  • 4. Outline •  What is Alluxio •  Example use cases 1.  Off-heap memory to alleviate resource pressure 2.  Fast data sharing between jobs 3.  Accelerate access to remote storage 4.  Unified namespace •  Summary 4
  • 5. Non-persistent data-storage software. ATTRIBUTES Memory-Speed Virtual Distributed Storage Scale out architecture. Virtualized across different storage types under a unified namespace. Memory-speed access to data. 5
  • 6. OPEN SOURCE ALLUXIO: History • Started in the AMPLab at UC Berkeley –  Summer of 2012 • Open Sourced as Tachyon (renamed to Alluxio) –  Apache 2.0 License –  Latest Release: version 1.2.0 (July, 2016) 6
  • 7. OPEN SOURCE ALLUXIO v0.2 v0.3 v0.4 v0.5 v0.6 v0.7 v0.8 v1.0 v1.1 7
  • 8. OPEN SOURCE ALLUXIO The fastest growing open-source project in the big data ecosystem. Currently over 300 contributors from over 100 organizations. 8
  • 9. Outline •  What is Alluxio •  Example use cases 1.  Off-heap memory to alleviate resource pressure 2.  Fast data sharing between jobs 3.  Accelerate access to remote storage 4.  Unified namespace •  Summary 9
  • 10. Application 1 Alluxio CASE 1: Off-heap Memory Storage to Alleviate Resource Pressure 10
  • 11. HDFS / Amazon S3 Consolidating Memory Spark Job1 Spark Memory block 1 block 3 Spark Job2 Spark Memory block 3 block 1 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data duplicated at memory-level 11
  • 12. Spark Job1 Spark mem Spark Job2 Spark mem HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Data not duplicated at memory-level Consolidating Memory storage engine & execution engine same process 12
  • 13. Data Resilience during Crashes Spark Task Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 Process crash requires network and/or disk I/O to re-read the data storage engine & execution engine same process 13
  • 14. Data Resilience during Crashes Crash Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 Process crash requires network and/or disk I/O to re-read the data storage engine & execution engine same process 14
  • 15. HDFS / Amazon S3 Data Resilience during Crashes block 1 block 3 block 2 block 4 Crash storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 15
  • 16. Spark Task Spark Memory block manager HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Process crash only needs memory I/O to re-read the data Data Resilience during Crashes storage engine & execution engine same process 16
  • 17. Crash Process crash only needs memory I/O to re-read the data HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Data Resilience during Crashes storage engine & execution engine same process 17
  • 18. CASE STUDY Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity. - Henry Powell, Barclays RESULTS •  Barclays workflow iteration time decreased from hours to seconds •  Alluxio enabled workflows that were impossible before •  By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds Relational Database Share Data Across Jobs at Memory Speed •  Reducing time from hours to seconds •  Solving issues of Teradata as under storage 18
  • 19. Barclays Previous Architecture Spark Teradata Slow ETL, 30+ minutes Each context restart requires ETL process again h"ps://dzone.com/ar1cles/Accelerate-­‐In-­‐Memory-­‐Processing-­‐with-­‐Spark-­‐from-­‐Hours-­‐to-­‐Seconds-­‐With-­‐Tachyon   19
  • 20. Barclays Architecture with Alluxio Spark Teradata Alluxio ETL only once Data still available in Alluxio memory after Job crash 20
  • 21. CASE 2: Fast Data sharing between apps Application 1 Alluxio Application2 HDFS 21
  • 22. Spark Job Spark Memory block 1 block 3 Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data Sharing Between Frameworks Inter-process sharing slowed down by network and/or disk I/O 22
  • 23. Data Sharing Between Frameworks Spark Job Spark Memory Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 storage engine & execution engine same process Inter-process sharing can happen at memory speed 23
  • 24. CASE STUDY We’ve been running Alluxio in production for over 9 months, resulting in 15x speedup on average, and 300x speedup at peak service times. - Xueyan Li, Qunar RESULTS •  Multiple computing frameworks can share data at memory speed with Alluxio •  Improved the performance of their system with 15x – 300x speedups •  Alluxio provides various user-friendly APIs, which simplifies the transition to Alluxio. Sharing Data Across Different Compute Frameworks •  6 billion logs (4.5 TB) daily •  Mix of Memory + HDD 24
  • 25. Qunar Previous Architecture Flink Streaming Spark Spark Streaming Flink HDFS Cluster Some jobs cannot finish Slow access to the storage 25
  • 26. Qunar Architecture with Alluxio Flink Streaming HDFS Cluster Spark Spark Streaming Flink Alluxio Share in-memory data across frameworks 26
  • 27. CASE 3: Accelerate access to remote storage Application Alluxio Remote Storage 27
  • 28. •  Different compute and storage hardware requirements •  Scale compute and storage resources independently •  Cost effective object stores (Amazon S3, Google GCS, Microsoft Azure Blob Store) are inherently decoupled Decoupling Compute from Storage Accessing data requires remote I/O! 28
  • 29. Remote I/O problems Application Remote Storage every data operation requires data transfer, sometimes over the WAN high latency, network throughput 29
  • 30. Visualizing the Stack with Alluxio FAST 104 - 105 MB/s Application Remote Storage MODERATE 103 - 104 MB/s SLOW 102 - 103 MB/s Alluxio MEM Often Only When Necessary Alluxio SSD/HDD Limited 30
  • 31. What if memory capacity is still not enough? 31
  • 32. Alluxio Manages All Local Storage MEM SSD HDD Faster Higher Capacity 32
  • 33. Configurable Storage Tiers MEM only MEM + HDD SSD only 33
  • 34. Pluggable Tier Management Policies Evict stale data to slower tier Promote hot data to faster tier 34
  • 35. CASE STUDY Baidu File System The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds. - Shaoshan Liu, Baidu RESULTS •  Data queries are now 30x faster with Alluxio •  1000-worker Alluxio cluster run stably, providing over 50TB of RAM space •  By using Alluxio, batch queries usually lasting over 15 minutes were transformed into an interactive query taking less than 30 seconds Accelerate Access to Remote Storage •  1000+ nodes deployment •  2+ petabytes of storage •  Mix of memory + HDD •  1000-worker cluster 35
  • 36. CASE 4: Unified Name Space 36   Application Alluxio AWS S3 HDFS
  • 37. Common Scenarios •  At large organizations, data spans many storage systems •  Legacy storage systems •  Application logic needs to integrate with different storage systems 37
  • 38. Unified Name Space An abstraction for applications to access different storage systems through the same interface •  Benefits –  Application written once with Allluxio API –  Transparently add new storage systems –  Seamlessly integrate with existing systems 38
  • 39. Transparent Naming Operations over persisted Alluxio objects mapped transparently to underlying storage alluxio://Data/Sales <-> hdfs://Data/Sales Alluxio Storage System (HDFS, S3, …) alluxio://host:port/ Data Users Reports Sales Alice Bob hdfs://host:port/ Data Users Reports Sales Alice Bob 39 mount
  • 40. Multiple Storage Systems •  Unified namespace for multiple data sources alluxio://Data/Sales -> s3://bucket/Sales alluxio://Users/Alice -> hdfs://Users/Alice •  API for on-the-fly mounting / unmounting Alluxio Storage System A alluxio://host:port/ Data Users Alice Bob hdfs://host:port/ Users Alice Bob Storage System B s3://host/bucket Reports Sales Reports Sales 40 mount mount
  • 41. A Proliferation of Under Storage Systems MORE… 41
  • 42. CASE STUDY RESULTS •  Alluxio’s unified namespace enables different applications and frameworks to easily interact with their data from different storage systems •  Global data centers across North America and Asia •  Tiered storage feature manages various storage resources including memory, SSD and disk Transparently Manage Data Across Different Storage Systems •  Storage medium: MEM •  Clusters in different regions An Internet Company Machine Learning Pipelines 42 42
  • 43. Outline •  What is Alluxio •  Example use cases 1.  Off-heap memory to alleviate resource pressure 2.  Fast data sharing between jobs 3.  Accelerate access to remote storage 4.  Unified namespace •  Summary 43
  • 44. Takeaway: When to use Alluxio •  Two or more jobs access the same dataset •  Job(s) may not always succeed •  Spark with memory/GC pressure •  Jobs/Apps are pipelined •  Resulting data does not need to be immediately persisted •  Interact with remote data •  Data stored across different storage systems 44
  • 45. Try out Alluxio 1.2.0 http://www.alluxio.org/releases 45
  • 46. First China Meetups! •  Nanjing (Saturday, 7/30) •  Shanghai (Sunday, 7/31) •  Beijing (Sunday, 8/7) 46
  • 47. Resources •  Alluxio Project: http://www.alluxio.org •  Development: https://github.com/Alluxio/alluxio •  Meet Friends: http://www.meetup.com/Alluxio •  Alluxio, Inc.: http://www.alluxio.com •  Training: http://www.alluxio.com/alluxio-training/ •  Contact us: info@alluxio.com •  Alluxio Wechat: Alluxio 47