4. Challenges of Deep Learning
Variety of storage options for users
• More data typically leads to better
modeling performance
• Distributed vs Cloud vs Local
Data Engineers and Data Movement
• Separation of Storage and Compute
• Latency increases costs and training
times
• Data warehouses, data marts, and data lakes have had
limited success: the cost of copying and moving data
is substantial
7. Alluxio: Intelligent Cache
• Local performance from remote data using multi-tier storage
• Storage tiers: RAM (hot), SSD (warm), HDD (cold)
• Read & write buffering
• Transparent to the application
• Policies for pinning, promotion/demotion, and TTL
3/25/19 7
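The tiering idea on this slide can be illustrated with a toy in-memory model. This is a minimal sketch, not Alluxio's implementation: the class name `TieredCache`, its methods, and the least-recently-used demotion rule are all assumptions made for illustration. It shows pinning, promotion on read, demotion on overflow, and TTL expiry across ordered tiers.

```python
import time
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier cache (hypothetical, not Alluxio code).

    Tier 0 is the fastest tier (think RAM); the last tier is the slowest
    (think HDD). Entries evicted from the last tier are dropped.
    """

    def __init__(self, capacities, ttl_seconds=None, clock=time.monotonic):
        self.capacities = list(capacities)
        self.tiers = [OrderedDict() for _ in self.capacities]
        self.ttl = ttl_seconds
        self.clock = clock
        self.pinned = set()

    def pin(self, key):
        # Pinned keys are never chosen as demotion victims.
        self.pinned.add(key)

    def put(self, key, value):
        # New data always lands in the fastest tier.
        self._remove(key)
        self._insert(0, key, (value, self.clock()))

    def get(self, key):
        for tier in self.tiers:
            if key in tier:
                value, stored_at = tier.pop(key)
                if self.ttl is not None and self.clock() - stored_at > self.ttl:
                    return None  # TTL expired: the entry is discarded
                # Promotion: a read moves the entry back to the fastest tier.
                self._insert(0, key, (value, stored_at))
                return value
        return None

    def tier_of(self, key):
        for level, tier in enumerate(self.tiers):
            if key in tier:
                return level
        return None

    def _remove(self, key):
        for tier in self.tiers:
            tier.pop(key, None)

    def _insert(self, level, key, entry):
        tier = self.tiers[level]
        tier[key] = entry
        while len(tier) > self.capacities[level]:
            # Demotion: push the least recently used unpinned entry down.
            victim = next((k for k in tier if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; tolerate the overflow
            victim_entry = tier.pop(victim)
            if level + 1 < len(self.tiers):
                self._insert(level + 1, victim, victim_entry)
```

With capacities `[1, 1, 1]`, writing `a`, `b`, `c` in order leaves `c` in tier 0, `b` in tier 1, and `a` in tier 2; reading `a` promotes it back to tier 0.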
8. Alluxio: Common Data Access API
• Convert from Client-side Interface to Storage API
• Client-side interfaces: Bigdata Filesystem API, POSIX Filesystem API
• Storage connectors: HDFS Connector, S3A Connector, Swift Connector,
Google Cloud Connector
9. A Common File System Abstraction
• Common interface across apps
• HDFS-compatible interface:
change hdfs://foo/bar to
alluxio://foo/bar
• Other interfaces: Native Alluxio Java
FS, POSIX and S3.
• Cloud storage becomes “hidden”
to apps
• Greater Flexibility
• Compute zone: standalone or managed with Mesos or YARN
• Applications: TensorFlow, Presto, MR
• Access through the HDFS API or POSIX API
• Storage in a different availability zone: either on-prem or cloud
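The scheme change described on this slide (hdfs://foo/bar to alluxio://foo/bar) can be sketched as a small path-rewriting helper. This is an illustrative sketch only: `to_alluxio_uri` is a hypothetical function, not part of any Alluxio client, and the optional `alluxio_authority` argument is an assumption for deployments where the Alluxio master address differs from the original HDFS authority.

```python
from urllib.parse import urlsplit, urlunsplit


def to_alluxio_uri(uri, alluxio_authority=None):
    """Rewrite an hdfs:// URI to the alluxio:// scheme (hypothetical helper).

    URIs with any other scheme are returned unchanged. If alluxio_authority
    is given, it replaces the host:port part; otherwise the original
    authority is kept, as in the slide's hdfs://foo/bar example.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        return uri
    authority = parts.netloc if alluxio_authority is None else alluxio_authority
    return urlunsplit(("alluxio", authority, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_alluxio_uri("hdfs://foo/bar"))
    # "master-host:19998" is a placeholder address, not taken from the slides.
    print(to_alluxio_uri("hdfs://namenode:8020/data/train", "master-host:19998"))
```

Because only the URI changes, application code that already reads through an HDFS-compatible client would not need other modifications, which is the point the slide makes about the storage becoming "hidden" to apps.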
10. Alluxio: FUSE
Through Alluxio-FUSE, you can mount Alluxio and expose it as
a local file system on Unix
Applications can interact with Alluxio using standard POSIX
APIs (open, write, read) without any custom client integration
Note: Since Alluxio is a write-once/read-many file system, the
mounted file system will not support all POSIX workloads
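A short sketch of what "standard POSIX APIs without custom client integration" means in practice: ordinary file calls against a directory. The helper names `write_once` and `read_all` are hypothetical, and the demo runs against a temporary directory standing in for an Alluxio-FUSE mount point, so nothing here is verified against a real mount. The access pattern is chosen to fit a write-once/read-many file system: each file is created once and written sequentially, then read any number of times, with no appends or in-place updates.

```python
import os
import tempfile


def write_once(root, relative_path, data):
    """Create a new file under root and write it sequentially in one pass.

    Mode "xb" raises FileExistsError if the file already exists, so an
    existing file is never modified in place.
    """
    path = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "xb") as f:
        f.write(data)
    return path


def read_all(root, relative_path, chunk_size=1 << 20):
    """Read a file sequentially in fixed-size chunks and return its bytes."""
    path = os.path.join(root, relative_path)
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # A temporary directory stands in for the FUSE mount point (assumption).
    with tempfile.TemporaryDirectory() as mount_root:
        write_once(mount_root, "train/part-0.bin", b"example bytes")
        print(read_all(mount_root, "train/part-0.bin"))
```

On a real deployment, `mount_root` would be replaced by the directory where Alluxio-FUSE is mounted; workloads that rely on random writes or appends may not work there, per the note above.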
12. Make Distributed Data Available Locally
• FUSE Interface makes all enterprise data available locally
SUPPORTS
• HDFS
• NFS
• OpenStack
• Ceph
• Amazon S3
• Azure
• Google Cloud
IT OPS FRIENDLY
• Storage mounted into
Alluxio by central IT
• Security in Alluxio mirrors
source data
• Authentication through
LDAP/AD
• Wireline encryption
Mounted sources shown: HDFS #1, object store, NFS, HDFS #2
13. Overcomes I/O bottleneck on Cloud
More details at
https://www.alluxio.com/blog/flexible-and-fast-storage-for-deep-learning-with-alluxio
14. Conclusion
• Alluxio: Unified data access layer for
big data and ML applications
• Serve ML apps through the FUSE-based
POSIX API, which presents and locally
caches large data sets from the cloud
• Try it out: www.alluxio.org/download