Fine-grained Scalability of Digital Library Services in the Cloud
Lebeko Poulo, Lighton Phiri and Hussein Suleman
Digital Libraries Laboratory
Department of Computer Science
University of Cape Town
Research Overview
• Digital Libraries (DLs) and Digital Library Systems (DLSes)
• Research objectives
  ◦ Develop techniques for building scalable digital information management systems based on efficient and on-demand use of generic grid-based technologies
  ◦ Explore the use of existing cloud computing resources
• Research questions
  ◦ Can a typical DL architecture be layered over an on-demand paradigm such as cloud computing?
  ◦ Is there linear scalability with increasing data and service capacity needs?
How Quickly Does Data Scale?
• Extent of data scalability
  ◦ Data growth rates estimated at 40% per year
  ◦ By 2020, data volumes will have grown to 44 times the 2009 size
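As a rough consistency check on the two figures above, the sketch below compounds a fixed annual growth rate. It assumes the 44-fold figure spans the 11 years from 2009 to 2020 and that growth compounds once per year; the function names are illustrative, not from the slides.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplier after compounding annual_rate once per year."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(total_factor: float, years: int) -> float:
    """Constant annual rate that yields total_factor after the given years."""
    return total_factor ** (1.0 / years) - 1.0


# Assumption: the baseline year is 2009 and the target year is 2020.
YEARS = 2020 - 2009

factor_at_40_percent = growth_factor(0.40, YEARS)   # about 40.5 times
rate_for_44_times = implied_annual_rate(44.0, YEARS)  # about 41% per year
```

Under these assumptions, 40% per year gives roughly a 40-fold increase, and a 44-fold increase corresponds to roughly 41% per year, so the two estimates are broadly in agreement.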
Scaling Digital Library Systems
• Key criteria for design/implementation of DLSes
  ◦ Scalability
  ◦ Preservation
• The promise of cloud computing has been demonstrated many times
  ◦ Feasibility of migrating and hosting DLs is evident
• Investigation of deep integration of DL services with cloud services is required
  ◦ Investigate efficacy of DL cloud adoption
  ◦ Verify extent of unlimited scale
  ◦ Maximise potential for cloud-service-level scalability
Prototype DLS - Design
• RQ #1: Can a typical DL architecture be layered over an on-demand paradigm?
• Prior work on potential architectural designs for utility clouds
  ◦ Emulation of parallel programming architectures
  ◦ Utility computing offers flexibility of multiple architectural models
  ◦ Potential architectures for scalable utility services
• Two architectural patterns adopted as basis for design of prototype architecture
  ◦ Proxy architectures
  ◦ Some aspects of Client-side architecture
Prototype DLS - Architecture 
[Figure: architecture diagram. Components shown: Browse Module, Search Module and OAI-PMH Harvester Module; Amazon EC2 (Instances A, B, C and D); REST API; Amazon S3 (buckets); Amazon EBS; Amazon SimpleDB (domains).]
Prototype DLS - Services 
[Figure: Web User Interface in front of the Browse Module, Search Module and OAI-PMH Harvester Module.]
• Two typical DL services, accessible via a publicly available lightweight-process Web interface
  ◦ Browse module: enables access through gradual refinement
  ◦ Search module: enables access through search queries
• OAI-PMH endpoint used to ingest data into collections
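The ingestion step can be illustrated with a minimal sketch of OAI-PMH paging. It builds `ListRecords` request URLs and extracts record identifiers and the `resumptionToken` from a response page. The base URL, the sample XML and the function names are hypothetical; only the verb, argument names and namespace come from the OAI-PMH 2.0 protocol. The prototype's actual harvester is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text.strip())
    identifiers = [el.text for el in root.iter(f"{OAI_NS}identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response page, trimmed to the elements the sketch reads.
SAMPLE_PAGE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""
```

A harvester would fetch `list_records_url(base)`, call `parse_page` on the response, and repeat with the returned token until it is `None`.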
Prototype DLS - Application Server 
[Figure: Amazon EC2 hosting Instances A, B, C and D, accessed through a REST API.]
• Amazon Elastic Compute Cloud (EC2) to provide sizeable computing capacity
  ◦ 32-bit Ubuntu Amazon Machine Images (AMIs)
  ◦ Glassfish 3.1
  ◦ Prototype DLS
Prototype DLS - Data Storage 
[Figure: Amazon S3 (buckets), Amazon EBS and Amazon SimpleDB (domains), accessed through a REST API.]
• Amazon Simple Storage Service (S3) for storage and retrieval of large numbers of data objects
• Amazon SimpleDB for querying stored structured data
• Amazon Elastic Block Store (EBS) to enable storage persistence of EC2 instances
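The split between object storage and a queryable attribute store can be sketched with in-memory stand-ins. This is an illustration of the general pattern only: the classes, field names and the choice of which fields to index are assumptions, and no AWS API is called. The slides do not describe how the prototype actually lays out its records.

```python
import json


class ObjectStore:
    """In-memory stand-in for an S3 bucket: opaque bytes addressed by key."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


class AttributeIndex:
    """In-memory stand-in for a SimpleDB domain: items with named attributes."""

    def __init__(self):
        self._items = {}

    def put_attributes(self, item_name, attributes):
        self._items.setdefault(item_name, {}).update(attributes)

    def select(self, attribute, value):
        """Return the names of items whose attribute equals value, sorted."""
        return sorted(
            name for name, attrs in self._items.items()
            if attrs.get(attribute) == value
        )


def ingest(record, bucket, domain):
    """Store the full record as an object and index a few browsable fields."""
    key = record["identifier"]
    bucket.put(key, json.dumps(record).encode("utf-8"))
    # Assumption: title, author and date are the indexed browse fields.
    domain.put_attributes(key, {f: record[f] for f in ("title", "author", "date")})


def browse(field, value, bucket, domain):
    """Query the index first, then fetch the matching full records."""
    return [json.loads(bucket.get(key)) for key in domain.select(field, value)]
```

The design choice being illustrated is that queries touch only the small attribute index, while the larger record bodies are fetched by key afterwards.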
Evaluation - Experimental Design
• RQ #2: Is there linear scalability with increasing capacity needs?
• Goals
  ◦ Evaluate potential scalability advantages associated with cloud-based DLs
• Evaluation aspects
  ◦ Data/service scalability and load testing
• Workload
  ◦ Number of user requests, number of users and collection sizes
• Metrics
  ◦ Response time
• Factors
  ◦ EC2 instances, users, requests, collection size
Evaluation - Experimental Setup
• Test dataset: NDLTD and NETD portals
  ◦ Ingested using OAI-PMH harvester module
• Execution environment
  ◦ All experimental tests conducted on EC2 cloud infrastructure
  ◦ EC2 instance of type t1.micro used for server-side processing
  ◦ 32-bit Ubuntu Amazon Machine Image (AMI) configuration
  ◦ Apache JMeter used to simulate user requests
  ◦ All measurement results based on five-run averages
Experiment #1 - Service Scalability
• Determine the time taken for browse and search service requests
• Assess the impact of varying the number of server front-ends
• Methodology
  ◦ JMeter used to simulate 50 users, each accessing a Web service ten times
  ◦ Web services hosted on four identical EC2 instances
  ◦ Experiments repeated at least five times for each service criterion
  ◦ Comparative analysis (by browsing category for the browse service) by partitioning requests into blocks of 50
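The block-based analysis described in the methodology can be sketched as follows: 50 users making ten requests each yield 500 response times per run, which are partitioned into blocks of 50 and averaged across repeated runs. The function names and the input layout (one list of response times in milliseconds per run, in request order) are assumptions rather than the authors' scripts.

```python
from statistics import mean


def block_means(samples, block_size=50):
    """Mean response time for each consecutive block of requests."""
    return [
        mean(samples[start:start + block_size])
        for start in range(0, len(samples), block_size)
    ]


def average_runs(runs, block_size=50):
    """Average the per-block means across repeated runs of equal length."""
    per_run = [block_means(run, block_size) for run in runs]
    return [mean(column) for column in zip(*per_run)]


def block_labels(num_requests, block_size=50):
    """Axis labels such as '1 - 50', '51 - 100', ..."""
    return [
        f"{start + 1} - {min(start + block_size, num_requests)}"
        for start in range(0, num_requests, block_size)
    ]
```

For a 500-request run this produces ten values, one for each of the blocks from 1 - 50 to 451 - 500.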
Experiment #1 - Browse Service 
[Figure: response time (ms) against number of requests, in blocks of 50 from 1 - 50 to 451 - 500. Series: browsing by title, browsing by date, browsing by author.]
Experiment #1 - Browse Service (2) 
[Figure: response time (ms) against request timestamp (epoch milliseconds, from 1356925111123 to 1356925119425). Series: browse by author, browse by date, browse by title.]
Experiment #1 - Browse Service (3) 
[Figure: time per block (ms) against number of requests, in blocks of 50 from 1 - 50 to 451 - 500. Series: 1 instance, 2 instances, 3 instances, 4 instances.]
Experiment #2 - Data Scalability
• Determine service performance for varying collection sizes for a fixed number of servers
• Ascertain if application can cope with increasing data volumes in DL collections
• Methodology
  ◦ JMeter set up to simulate 50 users accessing a Web service ten times
  ◦ Fixed number of identical servers with collection sizes of 4k, 8k, 16k and 32k records
  ◦ Experiments repeated at least five times for each service
  ◦ Comparative analysis by partitioning requests into blocks of 50
Experiment #2 - Browse Service 
[Figure: response time (ms) against number of requests, in blocks of 50 from 1 - 50 to 451 - 500. Series: collection sizes of 4000, 8000, 16000 and 32000 records.]
Experiment #3 - Load Testing
• Determine volume of requests application could process for increasing concurrent users
• Methodology
  ◦ JMeter set up to simulate a varying number of users accessing a Web service
  ◦ Fixed number of identical servers used
  ◦ Initially simulate five users, each accessing a Web service ten times
  ◦ Subsequent simulation of 20, 50, 100, 250 and 500 users
  ◦ Experiments repeated at least five times for each service
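The load-testing procedure can be approximated with a small thread-based harness. This is a stand-in for Apache JMeter, not a reproduction of the authors' test plan: `stub_service` is a hypothetical placeholder for a request to the browse or search service, and one thread per simulated user is an assumption of this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# User counts taken from the methodology above.
USER_LEVELS = [5, 20, 50, 100, 250, 500]


def stub_service():
    """Hypothetical placeholder for one request to a Web service."""
    time.sleep(0.001)


def simulate_users(service, num_users, requests_per_user=10):
    """Run num_users concurrent users; return all response times in ms."""

    def one_user(_user_id):
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            service()
            timings.append((time.perf_counter() - start) * 1000.0)
        return timings

    # One worker thread per simulated user, so all users run concurrently.
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        per_user = list(pool.map(one_user, range(num_users)))
    return [t for timings in per_user for t in timings]
```

Calling `simulate_users(stub_service, n)` for each `n` in `USER_LEVELS` and averaging the returned timings gives one response-time value per user level; replacing `stub_service` with a real HTTP call would be needed to measure an actual deployment.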
Experiment #3 - All Services 
[Figure: response time (ms) against number of users (5, 20, 50, 100, 250, 500). Series: search operation, browse operation.]
Conclusion
• Key findings
  ◦ Application architectural components had to be redesigned to conform to the cloud service architecture
  ◦ Results indicate that response times are not significantly affected by request complexity, collection size or request sequencing
  ◦ Noticeable time taken to connect to AWS (ramp-up time)
• Study limitations
  ◦ Single EC2 instance type (t1.micro) used
  ◦ Cloud service vendor
  ◦ Experimental dataset size
  ◦ Query optimisation
  ◦ Synthetic load used
Bibliography
• Hussein Suleman (2009). Utility-based High Performance Digital Library Systems.
• Pradeep Teregowda et al. (2010). Cloud Computing: A Digital Libraries Perspective.
• Pradeep Teregowda et al. (2010). CiteSeerx: A Cloud Perspective.
• Byung Chul Tak et al. (2011). To Move or Not to Move: The Economics of Cloud Computing.
• Jinesh Varia (2011). Architecting for The Cloud: Best Practices.
Questions? 
Additional information 
http://dl.cs.uct.ac.za
