Pyramid: A large-scale
array-oriented active storage
system
Viet-Trung TRAN, Nicolae Bogdan,
Gabriel Antoniu, Luc Bougé


 KerData Team
 Inria, Rennes, France             02 09 2011
Outline


1. Motivation

2. Architecture

3. Preliminary evaluation

4. Conclusion




Viet-TrungTran              02 09 2011 - 2
1
Motivation
Why array-oriented storage?




Context: Data-intensive large-scale HPC
simulations
• The scalability of data management is becoming
   a critical issue
• Mismatch between the storage model and the application
   data model
• Application data model
     - Multidimensional typed arrays, images, etc.
• Storage model
     - Parallel file systems: simple, flat I/O
       model (a sequence of bytes)
     - Relational model: ill-suited for scientific data
• Additional layers are needed to map the application
   model to the storage model

[M. Stonebraker] The one-storage-fits-all-needs
approach has reached its limits
• Parallel I/O stack:
     - Performance of non-contiguous I/O vs. data
       atomicity
• Relational data model:
     - Simulating arrays on top of tables performs
       poorly
     - Scalability of join queries
• Need to specialize the I/O stack to match the
   applications' requirements
     - Array-oriented storage for the array data model
• Example: SciDB with ArrayStore.

[Figure: the parallel I/O stack, top to bottom: Application (VisIt, Tornado simulation), Data model (HDF5, NetCDF), MPI-IO middleware, Parallel file systems]

Our approach



• Multi-dimensional aware chunking
• Lock-free, distributed chunk indexing
• Array versioning
• Active storage support
• Versioning array-oriented access interface




Multi-dimensional aware chunking

• Split the array into equal chunks and distribute them over storage elements
     - Simplifies load balancing among storage elements
     - Keeps neighboring cells in the same chunk
• Shared-nothing architecture
     - Easier to handle data consistency
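
To make the chunking idea concrete, here is a minimal sketch that maps a cell to the chunk holding it and then to a storage element. The function names, the row-major chunk numbering and the round-robin placement are assumptions made for this example; the slides do not state Pyramid's actual placement policy.

```python
from typing import Sequence, Tuple


def chunk_of(cell: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Return the coordinates of the chunk that contains `cell`."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))


def chunk_rank(chunk: Sequence[int], array_shape: Sequence[int],
               chunk_shape: Sequence[int]) -> int:
    """Row-major rank of a chunk in the grid of chunks (assumed numbering)."""
    rank = 0
    for c, n, s in zip(chunk, array_shape, chunk_shape):
        chunks_along_dim = -(-n // s)  # ceiling division
        rank = rank * chunks_along_dim + c
    return rank


def storage_element(chunk: Sequence[int], array_shape: Sequence[int],
                    chunk_shape: Sequence[int], n_servers: int) -> int:
    """Round-robin placement: an assumption, not Pyramid's documented policy."""
    return chunk_rank(chunk, array_shape, chunk_shape) % n_servers
```

With an 8x16 array and 4x4 chunks, the neighboring cells (4, 8), (5, 9) and (7, 11) all fall into chunk (1, 2), so a request for that neighborhood touches a single storage element.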




Lock-free, distributed chunk indexing

• Indexing multi-dimensional information
     - R-tree, XD-tree, Quad-tree, etc.
     - Designed and optimized for centralized management
• A centralized metadata management scheme may not scale
     - Bottleneck under high concurrency
• Our approach:
     - Port quad-tree-like structures to a distributed environment
     - Use shadowing on the quad-tree to enable lock-free
       concurrent updates




Array versioning



• Scientific applications need array versioning (VLDB 2009)
     - Checkpointing
     - Cloning
     - Provenance
• Keep data and metadata immutable
     - An update to a chunk is handled at the metadata level using
       shadowing




Active storage support

• Move computation on the data to the storage elements
     - Conserves bandwidth
     - Better workload parallelization
• Allow users to send user-defined handlers to storage servers
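
The bandwidth argument can be illustrated with a toy, in-process stand-in for a storage server. The class and method names below are hypothetical and only show the difference between shipping a chunk to the client and shipping a handler to the data.

```python
from typing import Callable, Dict, Tuple

ChunkId = Tuple[int, int]


class ActiveStorageServer:
    """Toy stand-in for a storage server that can run handlers (hypothetical API)."""

    def __init__(self) -> None:
        self.chunks: Dict[ChunkId, bytes] = {}

    def put(self, chunk_id: ChunkId, data: bytes) -> None:
        self.chunks[chunk_id] = data

    def get(self, chunk_id: ChunkId) -> bytes:
        # Passive path: the whole chunk travels to the client.
        return self.chunks[chunk_id]

    def run(self, chunk_id: ChunkId, handler: Callable[[bytes], object]) -> object:
        # Active path: the handler runs next to the data, only its result travels.
        return handler(self.chunks[chunk_id])


server = ActiveStorageServer()
server.put((0, 0), bytes([1, 2, 3, 250]))

# A user-defined handler: count the values above a threshold.
above_two = server.run((0, 0), lambda chunk: sum(b > 2 for b in chunk))
largest = server.run((0, 0), max)
```

In both calls the client receives a single integer instead of the full chunk, which is the saving the slide refers to.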




Versioning array-oriented access interface

• Basic primitives
     - id = CREATE(n, sizes[], defval)
     - READ(id, v, offsets[], sizes[], buffer)
     - w = WRITE(id, offsets[], sizes[], buffer)
     - w = SEND_COMPUTATION(id, v, offsets[], sizes[], f)
• Other primitives, such as cloning and filtering, can mostly be implemented on
   top of the primitives above
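
The following single-process mock shows one way the four primitives could behave. It is a sketch under stated assumptions, not Pyramid's implementation: the class name is invented, each version is stored as a full copy, and SEND_COMPUTATION is assumed to apply f to the selected sub-domain of version v and write the result back as a new version.

```python
import itertools
from typing import Callable, List, Sequence


class ToyArrayStore:
    """In-memory mock of CREATE / READ / WRITE / SEND_COMPUTATION (illustrative)."""

    def __init__(self) -> None:
        self._arrays: List[dict] = []

    def create(self, n: int, sizes: Sequence[int], defval) -> int:
        assert len(sizes) == n
        # Version 0 is the empty array: every cell reads as `defval`.
        self._arrays.append({"sizes": tuple(sizes), "defval": defval, "versions": [{}]})
        return len(self._arrays) - 1

    @staticmethod
    def _cells(offsets: Sequence[int], sizes: Sequence[int]):
        # Cells of the sub-domain in row-major order.
        return itertools.product(*(range(o, o + s) for o, s in zip(offsets, sizes)))

    def read(self, array_id: int, v: int, offsets, sizes, buffer: list) -> None:
        arr = self._arrays[array_id]
        snapshot = arr["versions"][v]
        buffer[:] = [snapshot.get(c, arr["defval"]) for c in self._cells(offsets, sizes)]

    def write(self, array_id: int, offsets, sizes, buffer) -> int:
        arr = self._arrays[array_id]
        # Toy full copy; a real system would share the unchanged parts instead.
        new_snapshot = dict(arr["versions"][-1])
        new_snapshot.update(zip(self._cells(offsets, sizes), buffer))
        arr["versions"].append(new_snapshot)
        return len(arr["versions"]) - 1

    def send_computation(self, array_id: int, v: int, offsets, sizes,
                         f: Callable[[list], list]) -> int:
        # Assumed semantics: apply f to the sub-domain of version v, store the result.
        data: list = []
        self.read(array_id, v, offsets, sizes, data)
        return self.write(array_id, offsets, sizes, f(data))


store = ToyArrayStore()
a = store.create(2, [4, 4], 0)
w1 = store.write(a, [1, 1], [2, 2], [1, 2, 3, 4])
w2 = store.send_computation(a, w1, [1, 1], [2, 2], lambda xs: [10 * x for x in xs])

old, new = [], []
store.read(a, w1, [1, 1], [2, 2], old)  # the earlier snapshot stays readable
store.read(a, w2, [1, 1], [2, 2], new)
```

Because every write returns a new version number and older versions are never modified, a reader that holds w1 keeps seeing the same data after w2 has been produced.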




2
Pyramid: Architecture




Architecture

• Pyramid is inspired by our previous work: BlobSeer [JPDC 2011]
• Version managers
     - Ensure concurrency control
• Metadata managers
     - Store index tree nodes
• Storage manager
     - Monitors the storage servers
     - Applies a load-balancing strategy when placing chunks on storage servers
• Active storage servers
     - Store chunks and run handlers on them
• Clients
     - Perform I/O accesses




Read

[Figure: sequence diagram of steps I-III between the client, the storage servers, the metadata managers and the version managers]

•   I: optionally ask the version manager for
    the latest published version
•   II: fetch the corresponding metadata from
    the metadata managers
•   III: contact the storage servers in parallel and
    fetch the chunks into the local buffer

Write

[Figure: sequence diagram of steps I-V between the client, the storage servers, the metadata managers, the version manager and the storage manager]

•   I: get a list of storage servers that are
    able to store the chunks, one for each
    chunk
•   II: contact the storage servers in parallel and
    write each chunk to its assigned server
•   III: get a version number for the update
•   IV: add new metadata to consolidate the
    new version
•   V: report that the new version is ready for
    publication.
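
The write steps, together with the three read steps, can be sketched with in-process stand-ins for the five roles. Everything below is illustrative: the class name, the round-robin placement and the flat per-version metadata dictionary are assumptions, and the toy handles one writer at a time.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class ToyPyramid:
    """In-process stand-ins for the roles; all names here are hypothetical."""

    def __init__(self, n_servers: int) -> None:
        self.servers = [dict() for _ in range(n_servers)]  # storage servers
        self.metadata = {0: {}}  # metadata managers: version -> {chunk: (server, key)}
        self.assigned = 0        # version manager: last assigned version
        self.published = 0       # version manager: last published version
        self._rr = itertools.count()   # storage manager: round-robin cursor (assumed)
        self._uid = itertools.count()  # fresh keys keep stored chunks immutable

    def write(self, chunks: dict) -> int:
        # I: ask the storage manager for one server per chunk.
        placement = {c: next(self._rr) % len(self.servers) for c in chunks}
        keys = {c: (c, next(self._uid)) for c in chunks}

        # II: write the chunks to their servers in parallel.
        def store(chunk):
            self.servers[placement[chunk]][keys[chunk]] = chunks[chunk]
            return chunk, (placement[chunk], keys[chunk])

        with ThreadPoolExecutor() as pool:
            locations = dict(pool.map(store, chunks))

        # III: get a version number for the update.
        self.assigned += 1
        version = self.assigned
        # IV: add metadata; untouched chunks keep pointing at older data.
        self.metadata[version] = {**self.metadata[version - 1], **locations}
        # V: report that the version is ready; the version manager publishes it.
        self.published = version
        return version

    def read(self, chunk_keys, version=None) -> dict:
        # I: optionally ask the version manager for the latest published version.
        if version is None:
            version = self.published
        # II: fetch the metadata that describes this version.
        index = self.metadata[version]

        # III: fetch the chunks from the storage servers in parallel.
        def fetch(chunk):
            server, key = index[chunk]
            return chunk, self.servers[server][key]

        with ThreadPoolExecutor() as pool:
            return dict(pool.map(fetch, chunk_keys))


p = ToyPyramid(n_servers=3)
v1 = p.write({(0, 0): b"a", (0, 1): b"b"})
v2 = p.write({(0, 1): b"B"})
latest = p.read([(0, 0), (0, 1)])
older = p.read([(0, 1)], version=v1)
```

Data is never overwritten in place: the second write stores a new chunk under a fresh key, and only the metadata of version 2 points to it.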

Lock-free, distributed chunk indexing

• Organized as a quad-tree to index 2D arrays
• Each tree node has at most 4 children, each covering one of the four quadrants
• The root covers the whole array
• Each leaf corresponds to a chunk and holds information about its location
• Tree nodes are immutable and uniquely identified by the version number and the
   sub-domain they cover
• A DHT distributes the tree nodes over the metadata managers
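
A small sketch of how such node identifiers and their placement could look. The type names, the SHA-1 based placement and the quadrant order are assumptions for illustration; the slides only state that a node is identified by its version and sub-domain and that a DHT spreads nodes over the metadata managers.

```python
import hashlib
from typing import List, NamedTuple


class Subdomain(NamedTuple):
    x: int     # origin of the covered square, in chunks
    y: int
    size: int  # side length in chunks, assumed to be a power of two


class NodeKey(NamedTuple):
    version: int
    subdomain: Subdomain


def quadrants(d: Subdomain) -> List[Subdomain]:
    """The four sub-domains covered by the children of a node (assumed order)."""
    h = d.size // 2
    return [Subdomain(d.x + dx, d.y + dy, h) for dy in (0, h) for dx in (0, h)]


def path_to_chunk(root: Subdomain, cx: int, cy: int) -> List[Subdomain]:
    """Sub-domains visited from the root down to the leaf of chunk (cx, cy)."""
    path, d = [root], root
    while d.size > 1:
        h = d.size // 2
        d = Subdomain(d.x + (h if cx >= d.x + h else 0),
                      d.y + (h if cy >= d.y + h else 0), h)
        path.append(d)
    return path


def manager_for(key: NodeKey, n_managers: int) -> int:
    """DHT-style placement: hash the node key to choose a metadata manager."""
    digest = hashlib.sha1(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_managers
```

Since the key depends only on the version and the sub-domain, any client can compute where a node lives without asking a central directory.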




Tree shadowing to update

• Write the newly created chunks to storage servers
• Build the quad-tree associated with the new snapshot in a bottom-up fashion
     - Write the leaves to the DHT
     - Inner nodes may point to nodes of previous snapshots (which implies
       synchronizing the quad-tree generation)
     - Avoid this synchronization by giving each updater additional information
       about the other concurrent updaters (possible because the IDs of tree
       nodes can be computed)
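
Shadowing can be sketched on an in-memory quad-tree as follows. This is a single-process illustration with Python object references; in the system described here the nodes live in a DHT and are referenced by computed IDs. The names and the string-valued chunk locations are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    version: int
    x: int
    y: int
    size: int
    children: Tuple[Optional["Node"], ...] = ()  # empty for leaves
    location: Optional[str] = None               # leaves: where the chunk is stored


def shadow(old: Optional[Node], version: int, x: int, y: int, size: int,
           updates: Dict[Tuple[int, int], str]) -> Optional[Node]:
    """Root of the new snapshot's subtree covering the square (x, y, size)."""
    touched = {k: v for k, v in updates.items()
               if x <= k[0] < x + size and y <= k[1] < y + size}
    if not touched:
        return old  # share the whole subtree of the previous snapshot
    if size == 1:
        return Node(version, x, y, 1, location=touched[(x, y)])
    h = size // 2
    kids = []
    for i, (dx, dy) in enumerate(((0, 0), (h, 0), (0, h), (h, h))):
        old_child = old.children[i] if old is not None else None
        kids.append(shadow(old_child, version, x + dx, y + dy, h, touched))
    # Children exist before their parent is created (bottom-up construction).
    return Node(version, x, y, size, children=tuple(kids))


def lookup(node: Optional[Node], cx: int, cy: int) -> Optional[str]:
    """Location of chunk (cx, cy) in the snapshot rooted at `node`, if any."""
    while node is not None and node.size > 1:
        h = node.size // 2
        i = (1 if cx >= node.x + h else 0) + (2 if cy >= node.y + h else 0)
        node = node.children[i]
    return node.location if node is not None else None


v1 = shadow(None, 1, 0, 0, 4, {(0, 0): "server1:a", (3, 3): "server2:b"})
v2 = shadow(v1, 2, 0, 0, 4, {(3, 3): "server3:c"})
```

Only the nodes on the path from the root to the updated leaf are created for version 2; the quadrant holding chunk (0, 0) is the very same object in both snapshots.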




Efficient parallel updating

[Figure: sequence diagram with two clients (#1 and #2), the storage servers, the metadata managers and the version manager; both updates end with a Publish step]

• Chunks are written concurrently
• Versions are assigned in the order the
   clients finish writing
• Clients get additional information about
   the other concurrent writers
• Tree nodes are written in a lock-free manner
• Versions are published in the order they
   were assigned
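
The ordering rules for versions can be sketched with a toy version manager. The class and method names are hypothetical, and the short critical section only stands in for whatever serialization the real version manager uses; it does not cover the lock-free writing of tree nodes.

```python
import threading
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Chunk = Tuple[int, int]


class VersionManager:
    """Toy version manager: assigns versions and publishes them in order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._assigned = 0
        self._published = 0
        self._ready: Set[int] = set()
        self._in_flight: Dict[int, FrozenSet[Chunk]] = {}

    def assign(self, touched: Iterable[Chunk]):
        """Called once a client has written its chunks.

        Returns the new version and the chunks touched by writers that hold
        an earlier version which is not published yet.
        """
        with self._lock:
            self._assigned += 1
            version = self._assigned
            others = dict(self._in_flight)
            self._in_flight[version] = frozenset(touched)
            return version, others

    def report_ready(self, version: int) -> int:
        """Mark a version's metadata as complete; return the latest published one."""
        with self._lock:
            self._ready.add(version)
            # Publish strictly in assignment order, never skipping a version.
            while self._published + 1 in self._ready:
                self._published += 1
                self._ready.discard(self._published)
                self._in_flight.pop(self._published, None)
            return self._published

    def latest(self) -> int:
        with self._lock:
            return self._published


vm = VersionManager()
va, others_a = vm.assign({(0, 0)})
vb, others_b = vm.assign({(1, 1)})
after_b = vm.report_ready(vb)  # version 2 is ready first but must wait
after_a = vm.report_ready(va)  # now versions 1 and 2 are both published
```

The second writer learns which chunks the first one is updating, which is the kind of information it needs to build its tree without waiting for the first writer's metadata.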




Some more I/O primitives

• Easily implemented thanks to immutable data and metadata blocks
• Cheap I/O operators
• Clone a sub-domain
     - Follow the metadata tree of a specific snapshot
     - Create a new metadata tree and publish it as a newly created array
• Filtering and compression can be done locally and in parallel at the active
   storage servers by introducing user-defined handlers




3
Preliminary evaluation
Experiments run on Grid'5000 (G5K, www.grid5000.fr)




Experimental setup

Simulate a common access pattern of scientific applications: array dicing


• Up to 130 nodes of the Graphene cluster on G5K
     - 1 Gbps Ethernet interconnect
     - Pyramid and the competing system, PVFS, are deployed on 49 nodes
• Array dicing
     - Each client accesses a dedicated sub-array
     - 1 GB per client, consisting of 32x32 chunks (chunk size: 1024x1024 bytes)
     - Concurrent reading/writing
• Measure the performance and scalability
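
As a check of the workload figures, the arithmetic below confirms that 32x32 chunks of 1024x1024 bytes amount to 1 GiB per client. The helper that places each client's sub-array assumes clients are tiled row by row over the global array; the slides do not describe the actual layout.

```python
CHUNK_SIDE = 1024     # bytes along one side of a chunk (1024 x 1024 bytes per chunk)
CHUNKS_PER_SIDE = 32  # each client owns 32 x 32 chunks


def bytes_per_client() -> int:
    """Size of one client's sub-array in bytes."""
    return (CHUNKS_PER_SIDE * CHUNK_SIDE) ** 2


def subarray_origin(client: int, clients_per_row: int):
    """Top-left corner of a client's sub-array (row-by-row tiling is assumed)."""
    side = CHUNKS_PER_SIDE * CHUNK_SIDE
    row, col = divmod(client, clients_per_row)
    return (row * side, col * side)


total = bytes_per_client()
origin = subarray_origin(5, clients_per_row=4)
```

Each sub-array is 32768 bytes on a side, so in a flat row-major file the rows of one client's sub-array are far apart, which is the non-contiguous pattern the evaluation stresses.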




Aggregated throughput achieved under
concurrency
• PVFS suffers from the non-contiguous
   access pattern caused by serializing
   the array to a flat file
• Pyramid
     - Throughput increases
       steadily
     - Promising scalability of both
       the data and the metadata
       organization




4
Conclusion




Conclusion

• Pyramid is an array-oriented active storage system
• It offers support for
     - Parallel array processing for both read and write workloads
     - Data versioning
     - Distributed metadata management, with shadowing to reflect updates
• Preliminary evaluation shows promising scalability

• Future work
     - Planned integration with HDF5
     - Pyramid as a storage engine for SciDB?
     - Investigate keeping data at quad-tree nodes
            Could be used to store the array at different resolutions (map applications)


Thank you



   INRIA – KerData Research Team

   www.irisa.fr/kerdata

Weitere ähnliche Inhalte

Was ist angesagt?

IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageTony Pearson
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...xKinAnx
 
IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageTony Pearson
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security Sandeep Patil
 
1.2 build cloud_fabric_final
1.2 build cloud_fabric_final1.2 build cloud_fabric_final
1.2 build cloud_fabric_finalPaulo Freitas
 
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...xKinAnx
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic StoragePatrick Bouillaud
 
Gluster Blog 11.15.2010
Gluster Blog 11.15.2010Gluster Blog 11.15.2010
Gluster Blog 11.15.2010GlusterFS
 
Geek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionGeek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionIDERA Software
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...xKinAnx
 
IMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM RevolutionIMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM RevolutionIn-Memory Computing Summit
 
Network Attached Storage (NAS) Initiative
Network Attached Storage (NAS) Initiative Network Attached Storage (NAS) Initiative
Network Attached Storage (NAS) Initiative Gary Wilhelm
 
Maginatics Cloud Storage Platform
Maginatics Cloud Storage PlatformMaginatics Cloud Storage Platform
Maginatics Cloud Storage PlatformMaginatics
 
Storage Enhancements in Windows 2012 R2
Storage Enhancements in Windows 2012 R2Storage Enhancements in Windows 2012 R2
Storage Enhancements in Windows 2012 R2Michael Rüefli
 

Was ist angesagt? (20)

IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object Storage
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
 
IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object Storage
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security
 
1.2 build cloud_fabric_final
1.2 build cloud_fabric_final1.2 build cloud_fabric_final
1.2 build cloud_fabric_final
 
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic Storage
 
Gluster Blog 11.15.2010
Gluster Blog 11.15.2010Gluster Blog 11.15.2010
Gluster Blog 11.15.2010
 
Geek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionGeek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An Introduction
 
11. dfs
11. dfs11. dfs
11. dfs
 
Storage Managment
Storage ManagmentStorage Managment
Storage Managment
 
Hot sec10 slide-suzaki
Hot sec10 slide-suzakiHot sec10 slide-suzaki
Hot sec10 slide-suzaki
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
 
IMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM RevolutionIMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
 
Network Attached Storage (NAS) Initiative
Network Attached Storage (NAS) Initiative Network Attached Storage (NAS) Initiative
Network Attached Storage (NAS) Initiative
 
Maginatics Cloud Storage Platform
Maginatics Cloud Storage PlatformMaginatics Cloud Storage Platform
Maginatics Cloud Storage Platform
 
Storage Enhancements in Windows 2012 R2
Storage Enhancements in Windows 2012 R2Storage Enhancements in Windows 2012 R2
Storage Enhancements in Windows 2012 R2
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Pbasanta@jtres06 extendedportal
Pbasanta@jtres06 extendedportalPbasanta@jtres06 extendedportal
Pbasanta@jtres06 extendedportal
 
Storage system architecture
Storage system architectureStorage system architecture
Storage system architecture
 

Ähnlich wie Pyramid: A large-scale array-oriented active storage system

Openstorage with OpenStack, by Bradley
Openstorage with OpenStack, by BradleyOpenstorage with OpenStack, by Bradley
Openstorage with OpenStack, by BradleyHui Cheng
 
Pm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstackPm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstackOpenCity Community
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
Containers 101 Meetup - VMs vs Containers
Containers 101 Meetup - VMs vs ContainersContainers 101 Meetup - VMs vs Containers
Containers 101 Meetup - VMs vs ContainersTommy Berry
 
Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4Tony Pearson
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
Using Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisUsing Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisScaleOut Software
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...
LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...
LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...LF_DPDK
 
Oracle SOA suite and Coherence dehydration
Oracle SOA suite and  Coherence dehydrationOracle SOA suite and  Coherence dehydration
Oracle SOA suite and Coherence dehydrationMichel Schildmeijer
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...raghdooosh
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
What is cloud computing
What is cloud computingWhat is cloud computing
What is cloud computingBrian Bullard
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Dharma Shukla
 

Ähnlich wie Pyramid: A large-scale array-oriented active storage system (20)

Openstorage with OpenStack, by Bradley
Openstorage with OpenStack, by BradleyOpenstorage with OpenStack, by Bradley
Openstorage with OpenStack, by Bradley
 
Pm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstackPm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstack
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Openstorage Openstack
Openstorage OpenstackOpenstorage Openstack
Openstorage Openstack
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Containers 101 Meetup - VMs vs Containers
Containers 101 Meetup - VMs vs ContainersContainers 101 Meetup - VMs vs Containers
Containers 101 Meetup - VMs vs Containers
 
Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
Using Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisUsing Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data Analysis
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...
LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...
LF_DPDK17_OpenNetVM: A high-performance NFV platforms to meet future communic...
 
Oracle SOA suite and Coherence dehydration
Oracle SOA suite and  Coherence dehydrationOracle SOA suite and  Coherence dehydration
Oracle SOA suite and Coherence dehydration
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
What is cloud computing
What is cloud computingWhat is cloud computing
What is cloud computing
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019
 
Hpts 2011 flexible_oltp
Hpts 2011 flexible_oltpHpts 2011 flexible_oltp
Hpts 2011 flexible_oltp
 

Mehr von Viet-Trung TRAN

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Viet-Trung TRAN
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreViet-Trung TRAN
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnViet-Trung TRAN
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processingViet-Trung TRAN
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookViet-Trung TRAN
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studyViet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkViet-Trung TRAN
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learningViet-Trung TRAN
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposalsViet-Trung TRAN
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents Viet-Trung TRAN
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Viet-Trung TRAN
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Viet-Trung TRAN
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learningViet-Trung TRAN
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forestsViet-Trung TRAN
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 

Mehr von Viet-Trung TRAN (20)

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on Spark
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 

Pyramid: A large-scale array-oriented active storage system

  • 1. Pyramid: A large-scale array-oriented active storage system Viet-Trung TRAN, Nicolae Bogdan, Gabriel Antoniu, Luc Bougé KerData Team Inria, Rennes, France 02 09 2011
  • 2. Outline 1. Motivation 2. Architecture 3. Preliminary evaluation 4. Conclusion Viet-TrungTran 02 09 2011 - 2
  • 4. Context: Data-intensive large-scale HPC simulations
      • The scalability of data management is becoming a critical issue
      • Mismatch between the storage model and the application data model
      • Application data model
          - Multidimensional typed arrays, images, etc.
      • Storage model
          - Parallel file systems: simple, flat I/O model (sequence of bytes)
          - Relational model: ill-suited for scientific applications
      • Additional layers are needed to map the application model to the storage model
  • 5. [M. Stonebraker] The one-storage-fits-all-needs approach has reached its limits
      • Parallel I/O stack:
          - Performance of non-contiguous I/O vs. data atomicity
      • Relational data model:
          - Simulating arrays on top of tables performs poorly
          - Scalability for join queries
      • Need to specialize the I/O stack to match the applications' requirements
          - Array-oriented storage for the array data model
      • Example: SciDB with ArrayStore.
      • Figure (parallel I/O stack, top to bottom): Application (Visit, Tornado simulation); Data model (HDF5, NetCDF); MPI-IO middleware; Parallel file systems
  • 6. Our approach
      • Multi-dimensional aware chunking
      • Lock-free, distributed chunk indexing
      • Array versioning
      • Active storage support
      • Versioning array-oriented access interface
  • 7. Multi-dimensional aware chunking
      • Split the array into equal chunks and distribute them over storage elements
          - Simplifies load balancing among storage elements
          - Keeps the neighbors of a cell in the same chunk
      • Shared-nothing architecture
          - Easier to handle data consistency
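The chunking idea on this slide can be illustrated with a minimal Python sketch. The function names `chunk_of` and `server_of` are hypothetical, and the round-robin placement is only an assumed policy: the slides say the storage manager decides where chunks go, without specifying how.

```python
from typing import Sequence, Tuple


def chunk_of(cell: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Return the grid coordinates of the chunk that contains `cell`."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))


def server_of(chunk: Sequence[int], grid_shape: Sequence[int], n_servers: int) -> int:
    """Assumed policy: place chunks round-robin by their row-major index."""
    index = 0
    for coord, extent in zip(chunk, grid_shape):
        index = index * extent + coord
    return index % n_servers


# A 4096x4096 array cut into 1024x1024 chunks gives a 4x4 chunk grid.
chunk_shape = (1024, 1024)
grid_shape = (4, 4)

a = chunk_of((10, 10), chunk_shape)       # chunk (0, 0)
b = chunk_of((11, 10), chunk_shape)       # neighbouring cell, same chunk
c = chunk_of((1500, 3000), chunk_shape)   # chunk (1, 2)
placement = server_of(c, grid_shape, n_servers=4)
```

Because chunks are equal-sized blocks in every dimension, cells that are close in any dimension usually share a chunk, which a row-by-row serialization to a flat file does not guarantee.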
  • 8. Lock-free, distributed chunk indexing
      • Indexing multi-dimensional information
          - R-tree, XD-tree, Quad-tree, etc.
          - Designed and optimized for centralized management
      • A centralized metadata management scheme may not scale
          - Bottleneck under high concurrency
      • Our approach:
          - Port quad-tree-like structures to a distributed environment
          - Apply the shadowing technique to the quad-tree to enable lock-free concurrent updates
  • 9. Array versioning
      • Scientific applications need array versioning (VLDB 2009)
          - Checkpointing
          - Cloning
          - Provenance
      • Keep data and metadata immutable
          - Updating a chunk is handled at the metadata level using the shadowing technique
  • 10. Active storage support
      • Move computation on the data to the storage elements
          - Conserves bandwidth
          - Better workload parallelization
      • Allow users to send user-defined handlers to storage servers
  • 11. Versioning array-oriented access interface
      • Basic primitives
          - id = CREATE(n, sizes[], defval)
          - READ(id, v, offsets[], sizes[], buffer)
          - w = WRITE(id, offsets[], sizes[], buffer)
          - w = SEND_COMPUTATION(id, v, offsets[], sizes[], f)
      • Other primitives, such as cloning and filtering, can mostly be implemented on top of the primitives above
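To make the semantics of the four primitives concrete, here is a toy in-memory sketch in Python. It is not the Pyramid client library: the class `ToyPyramid` is hypothetical, `read` returns a list instead of filling a caller-supplied buffer, each version is stored as a full copy for brevity, and `send_computation` is assumed to apply `f` to the selected sub-array of version `v` and write the result back as a new version.

```python
import itertools
from typing import Callable, Dict, List, Sequence


class ToyPyramid:
    """In-memory stand-in for the primitives; every name here is illustrative."""

    def __init__(self) -> None:
        self._arrays: Dict[int, dict] = {}

    def create(self, n: int, sizes: Sequence[int], defval: float) -> int:
        assert len(sizes) == n
        array_id = len(self._arrays)
        # Version 0 is the empty array: every cell reads as `defval`.
        self._arrays[array_id] = {"sizes": tuple(sizes), "defval": defval, "versions": [{}]}
        return array_id

    @staticmethod
    def _region(offsets: Sequence[int], sizes: Sequence[int]):
        # Cells of the sub-array in row-major order.
        return itertools.product(*(range(o, o + s) for o, s in zip(offsets, sizes)))

    def write(self, array_id: int, offsets, sizes, buffer: Sequence[float]) -> int:
        arr = self._arrays[array_id]
        snapshot = dict(arr["versions"][-1])  # copy: earlier versions stay untouched
        for cell, value in zip(self._region(offsets, sizes), buffer):
            snapshot[cell] = value
        arr["versions"].append(snapshot)
        return len(arr["versions"]) - 1       # the new version number w

    def read(self, array_id: int, v: int, offsets, sizes) -> List[float]:
        arr = self._arrays[array_id]
        snapshot = arr["versions"][v]
        return [snapshot.get(cell, arr["defval"]) for cell in self._region(offsets, sizes)]

    def send_computation(self, array_id: int, v: int, offsets, sizes,
                         f: Callable[[List[float]], List[float]]) -> int:
        # Assumed semantics: run f on the sub-array of version v, store the result.
        return self.write(array_id, offsets, sizes, f(self.read(array_id, v, offsets, sizes)))


store = ToyPyramid()
aid = store.create(2, [4, 4], 0)
w1 = store.write(aid, [1, 1], [2, 2], [1, 2, 3, 4])
w2 = store.send_computation(aid, w1, [1, 1], [2, 2], lambda xs: [x * 10 for x in xs])
old = store.read(aid, w1, [1, 1], [2, 2])   # version w1 is still readable
new = store.read(aid, w2, [1, 1], [2, 2])
```

Reading version `w1` after `w2` has been created still returns the original values, which is the property that checkpointing and cloning rely on.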
  • 13. Architecture
      • Pyramid is inspired by our previous work: BlobSeer [JPDC 2011]
      • Version managers
          - Ensure concurrency control
      • Metadata managers
          - Store index tree nodes
      • Storage manager
          - Monitors the storage servers
          - Ensures a load-balancing strategy for chunks among storage servers
      • Active storage servers
          - Store chunks and execute handlers on chunks
      • Clients
          - Perform I/O accesses
  • 14. Read
      • Figure (sequence diagram): Client, Storage servers, Metadata managers, Version managers
      • I: optionally ask the version manager for the latest published version
      • II: fetch the corresponding metadata from the metadata managers
      • III: contact storage servers in parallel and fetch the chunks into the local buffer
  • 15. Write
      • Figure (sequence diagram): Client, Storage servers, Metadata managers, Version manager, Storage manager
      • I: get a list of storage servers that are able to store the chunks, one for each chunk
      • II: contact storage servers in parallel and write the chunks to the corresponding providers
      • III: get a version number for the update
      • IV: add new metadata to consolidate the new version
      • V: report that the new version is ready for publication
  • 16. Lock-free, distributed chunk indexing
      • Organized as a quad-tree to index 2D arrays
      • Each tree node has at most 4 children, each covering one of the four quadrants
      • The root node covers the whole array
      • Each leaf corresponds to a chunk and holds information about its location
      • Tree nodes are immutable, uniquely identified by the version number and the sub-domain they cover
      • A DHT distributes tree nodes over the metadata managers
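The node identification scheme can be sketched as follows. This is an illustration under assumptions, not the actual metadata layout: `NodeKey`, `children`, and `manager_for` are hypothetical names, the extents are assumed to be powers of two, and hashing the key with SHA-1 modulo the number of managers stands in for whatever DHT the system really uses.

```python
import hashlib
from typing import List, NamedTuple


class NodeKey(NamedTuple):
    """A tree node is identified by its version and the sub-domain it covers."""
    version: int
    x: int
    y: int
    width: int
    height: int


def children(key: NodeKey) -> List[NodeKey]:
    """Keys of the four quadrants of `key` (extents assumed to be even)."""
    half_w, half_h = key.width // 2, key.height // 2
    return [NodeKey(key.version, key.x + dx, key.y + dy, half_w, half_h)
            for dy in (0, half_h) for dx in (0, half_w)]


def manager_for(key: NodeKey, n_managers: int) -> int:
    """Assumed DHT placement: hash the key, take it modulo the manager count."""
    digest = hashlib.sha1(repr(tuple(key)).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_managers


root = NodeKey(version=1, x=0, y=0, width=8, height=8)
kids = children(root)
owners = [manager_for(k, n_managers=4) for k in kids]
```

Since a key is derived only from the version and the covered sub-domain, any client can compute where a node lives without consulting a central index.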
  • 17. Tree shadowing to update
      • Write newly created chunks to storage servers
      • Build the quad-tree associated with the new snapshot in a bottom-up fashion
          - Write the leaves to the DHT
          - Inner nodes may point to nodes of previous snapshots (which implies synchronizing the quad-tree generation)
          - Avoid synchronization by feeding writers additional information about the other concurrent updaters (possible thanks to the computable IDs of tree nodes)
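A sequential sketch of shadowing on a quad-tree is given below. It is a simplification under stated assumptions: a plain dictionary `dht` stands in for the metadata DHT, coordinates are in chunk units on a square, power-of-two grid, chunk locations are plain strings, and the hints exchanged between concurrent writers are not modeled. The function name `shadow` is hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int, int, int]   # (version, x, y, size), in chunk units
dht: Dict[Key, object] = {}       # stand-in for the metadata DHT


def shadow(prev: Optional[Key], version: int, x: int, y: int, size: int,
           updates: Dict[Tuple[int, int], str]) -> Optional[Key]:
    """Return the key of the node covering the square (x, y, size) in `version`."""
    touched = any(x <= cx < x + size and y <= cy < y + size for cx, cy in updates)
    if not touched:
        return prev                   # unchanged subtree: reuse the old node
    key = (version, x, y, size)
    if size == 1:
        dht[key] = updates[(x, y)]    # leaf: location of the newly written chunk
        return key
    half = size // 2
    old_children = dht[prev] if prev is not None else (None,) * 4
    quadrants = [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]
    # Children are created before their parent, i.e. bottom-up.
    kids = tuple(shadow(old, version, qx, qy, half, updates)
                 for old, (qx, qy) in zip(old_children, quadrants))
    dht[key] = kids                   # new key, so no existing node is overwritten
    return key


root1 = shadow(None, 1, 0, 0, 4, {(0, 0): "server-a", (3, 3): "server-b"})
root2 = shadow(root1, 2, 0, 0, 4, {(0, 0): "server-c"})
```

Version 2 creates only the three nodes on the path to the rewritten chunk; its root points to the version-1 node for the untouched quadrant, and all version-1 nodes remain intact.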
  • 18. Efficient parallel updating
      • Figure (sequence diagram): Client #1, Client #2, Storage servers, Metadata managers, Version manager; two Publish events
      • Chunks are written concurrently
      • Versions are assigned in the order the clients finish writing
      • Clients get additional information about the other concurrent writers
      • Tree nodes are written in a lock-free manner
      • Versions are published in the order they were assigned
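The ordering rules on this slide can be sketched with a toy version manager. The class `ToyVersionManager` and its methods are hypothetical, and the sketch is single-threaded; a real version manager would also need to be safe under concurrent requests.

```python
class ToyVersionManager:
    """Assigns version numbers and publishes them strictly in assignment order."""

    def __init__(self) -> None:
        self._next = 1
        self._ready = set()
        self.published = 0            # latest version visible to readers

    def assign(self) -> int:
        """Called by a client once its chunks are written."""
        version = self._next
        self._next += 1
        return version

    def report_ready(self, version: int) -> int:
        """Called once the metadata of `version` is complete."""
        self._ready.add(version)
        # Publish every consecutive version that is ready, never skipping one.
        while self.published + 1 in self._ready:
            self.published += 1
            self._ready.remove(self.published)
        return self.published


vm = ToyVersionManager()
v1 = vm.assign()                      # client #1 finished writing chunks first
v2 = vm.assign()                      # client #2 finished second
after_v2 = vm.report_ready(v2)        # client #2 completes its metadata first
after_v1 = vm.report_ready(v1)        # now both versions can be published
```

Version 2 stays invisible until version 1 is ready, so readers always see a consistent sequence of snapshots even when writers complete out of order.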
  • 19. Some more I/O primitives
      • Easily implemented thanks to immutable data and metadata blocks
      • Cheap I/O operators
      • Clone a sub-domain
          - Follow the metadata tree of a specific snapshot
          - Create a new metadata tree and publish it as a newly created array
      • Filtering and compression can be done locally and in parallel at the active storage servers by introducing user-defined handlers
  • 20. 3. Preliminary evaluation. Experiments run on G5K (www.grid5000.fr)
  • 21. Experimental setup
      • Simulate a common access pattern exhibited by scientific applications: array dicing
      • Using at most 130 nodes of the Graphene cluster on G5K
          - 1 Gbps Ethernet interconnect
          - 49 nodes used to deploy Pyramid and the competitor system, PVFS
      • Array dicing
          - Each client accesses a dedicated sub-array
          - 1 GB per client, consisting of 32x32 chunks (chunk size 1024x1024 bytes)
          - Concurrent reading/writing
      • Measure performance and scalability
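The per-client data volume follows directly from the chunk figures on this slide, as the short Python check below shows. The transfer-time estimate is an added back-of-the-envelope bound, not a measurement from the slides: it assumes a single 1 Gbps link at full line rate with no protocol overhead.

```python
chunks_per_client = 32 * 32               # 32x32 chunks per sub-array
chunk_bytes = 1024 * 1024                 # 1024x1024 bytes per chunk
bytes_per_client = chunks_per_client * chunk_bytes

link_bytes_per_second = 1_000_000_000 / 8  # 1 Gbps expressed in bytes per second
seconds_at_line_rate = bytes_per_client / link_bytes_per_second

print(bytes_per_client)                   # 1073741824 bytes, i.e. 1 GiB
print(round(seconds_at_line_rate, 2))     # about 8.59 s per client over one link
```

Under these assumptions a single client cannot move its sub-array in much less than about 8.6 seconds, which gives a rough reference point for per-client throughput.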
  • 22. Aggregated throughput achieved under concurrency
      • PVFS suffers from the non-contiguous access pattern caused by serializing the array to a flat file
      • Pyramid
          - Throughput increases steadily
          - Promising scalability for both data and metadata organization
  • 23. 4. Conclusion
  • 24. Conclusion
      • Pyramid is an array-oriented active storage system
      • Proposed a system offering support for
          - Parallel array processing for both read and write workloads
          - Versioning data
          - Distributed metadata management, with shadowing to reflect updates
      • Preliminary evaluation shows promising scalability
      • Future work
          - Planned integration with HDF5
          - Pyramid as a storage engine for SciDB?
          - Investigate keeping data at quad-tree nodes; this could be used to store arrays at different resolutions (map applications)
  • 25. Thank you. INRIA – KerData Research Team, www.irisa.fr/kerdata