This document evaluates AppScale, an open-source platform that allows Google App Engine applications to run on distributed data stores like HBase, Cassandra, and MongoDB. It describes AppScale's features for porting applications between data stores and compares the performance of different data stores under light, medium, and heavy loads. While AppScale provides access to many data stores, each varies in how fully it implements the App Engine APIs. The document calls for expanding AppScale's services to new domains and integrating additional databases.
An evaluation of distributed datastores using AppScale Cloud Platform
1. An Evaluation of Distributed Datastores Using the AppScale Cloud Platform
Presented by: Himanshu Ranjan Vaishnav
TE-42065 (Comp-I)
SEMINAR GUIDE - Prof. Mrs S. S. Sonawani
04/01/13
2. What is AppScale?
AppScale is an open-source implementation of the Google App Engine
cloud platform.
AppScale is an extension of the non-scalable software development kit
that Google makes available for testing and debugging applications.
AppScale currently supports HBase, Hypertable, Cassandra, Voldemort,
MongoDB, MemcacheDB, Scalaris, and MySQL Cluster datastores.
3. What Does AppScale Do?
AppScale is a robust, open source implementation of the Google App
Engine APIs that executes over private virtualized cluster resources and
cloud infrastructures including Amazon Web Services and Eucalyptus.
Users can execute their existing Google App Engine applications over
AppScale without modification.
AppScale automates deployment and simplifies configuration of
datastores that implement the API and facilitates their comparison and
evaluation on end-to-end performance using real programs (Google App
Engine applications).
4. AppScale Features
• More choices of datastores
• MapReduce
• App Engine portability
• Neptune language
• Fault tolerance
• And more
5. Google App Engine
A software development platform
Platform-as-a-service (PaaS)
GAE Datastore
Bigtable
A master/slave relationship
6. Google App Engine (continued)
The GAE Datastore API provides primitives such as the following:
• Put (k, v): Add key k and value v to the table, creating the table if needed
• Get (k): Return value associated with key k
• Delete (k): Remove key k and its value
• Query (q): Perform query q using the Google Query Language (GQL) on a
single table, returning a list of values
• Count (t): For a given query, returns the size of the list of values returned
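The primitives listed above can be sketched as a small in-memory class. This is only an illustration, not AppScale's or Google's implementation: the class name `ToyDatastore`, the explicit `table` argument, and the use of a Python predicate in place of a GQL query are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List


class ToyDatastore:
    """Hypothetical in-memory stand-in exposing the five primitives."""

    def __init__(self) -> None:
        # Maps table name -> {key: value}.
        self._tables: Dict[str, Dict[str, Any]] = {}

    def put(self, table: str, key: str, value: Any) -> None:
        # Create the table on first write, as the Put primitive describes.
        self._tables.setdefault(table, {})[key] = value

    def get(self, table: str, key: str) -> Any:
        # Return the value for key, or None if the key or table is absent.
        return self._tables.get(table, {}).get(key)

    def delete(self, table: str, key: str) -> None:
        # Remove the key and its value; deleting a missing key is a no-op.
        self._tables.get(table, {}).pop(key, None)

    def query(self, table: str, predicate: Callable[[Any], bool]) -> List[Any]:
        # Assumption: a Python predicate stands in for a GQL query
        # evaluated over a single table.
        return [v for v in self._tables.get(table, {}).values() if predicate(v)]

    def count(self, table: str, predicate: Callable[[Any], bool]) -> int:
        # Size of the list that query() would return for the same predicate.
        return len(self.query(table, predicate))
```

Keeping `count` defined in terms of `query` mirrors the slide's description of Count as the size of a query's result list; a real datastore would normally avoid materializing the list.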
7. Google App Engine APIs
Blobstore API
Channel API
Datastore API
Images API
Memcache API
Namespace API
Task Queue API
Users API
URL Fetch API
XMPP API
MapReduce Streaming API
EC2 API
8. AppScale deployment
AS – App Server
ALB – App Load Balancer
DBS – Database Slave Peer
DBM – Database Master Peer
9. Multi-tiered approach within AppScale
10. Database Services
Protocol Buffer Server (PBServer)
User/App Server (UAServer)
Blobstore service
Monitoring Services
Neptune
12. 1. Cassandra
Designed, implemented, and released by Facebook engineers
A hybrid approach, combining ideas from Google's Bigtable and Amazon's Dynamo
Consistent
Written in Java; exposes its API through the Thrift software framework
Supports range queries
13. 2. HBase
Developed and released by Powerset
An official Hadoop subproject
Employs a master-slave distributed architecture
Provides flexible column support
Written primarily in Java, with a small portion of the code base in C
HBase is deployed over the Hadoop Distributed File System (HDFS)
14. 3. Hypertable
Developed by Zvents
Provides an open-source version of Google’s Bigtable
Written in C++
RangeServer
15. 4. MemcacheDB
Developed by open-source developer Steve Chu
Employs a master-slave approach
Runs with a single master node and multiple replica nodes
Written in C and uses Berkeley DB
16. 5. MongoDB
Developed and released by 10gen
Aims to provide both speed and scalability
Written in C++
Queries are performed using hashtables
17. 6. Voldemort
Developed by, and currently in use internally at, LinkedIn
Eventual consistency
More developer-friendly
Written in Java and exposes its API via Thrift
18. 7. MySQL
A well-known relational database
Uses MySQL Cluster
Provides concurrent access to the system
Written in C and C++
19. EVALUATION
Load tables in all databases with 1000 items
Test specifics:
– On each database, put, get, delete, and no-op operations are performed
– Loads considered: light load (one thread), medium load (three concurrent threads),
heavy load (nine concurrent threads)
– Each experiment is repeated 5 times
The application is executed in an AppScale cloud
Each node runs with 2 virtual processors, 10 GB of disk (max), and 4 GB of
memory
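The test procedure above can be sketched as a small harness. This is a minimal sketch under stated assumptions, not the benchmark application used in the evaluation: a lock-protected Python dict stands in for a real datastore, and the names `make_ops`, `run_once`, `benchmark`, and the `calls_per_thread` parameter are hypothetical. Only the thread counts (1, 3, 9), the 1000 preloaded items, the four operations, and the 5 repetitions come from the slide.

```python
import statistics
import threading
import time
from typing import Callable, Dict, List

LOADS = {"light": 1, "medium": 3, "heavy": 9}  # concurrent threads per load level
REPETITIONS = 5                                # each experiment is repeated 5 times
NUM_ITEMS = 1000                               # items preloaded into the table


def make_ops(table: Dict[str, str], lock: threading.Lock) -> Dict[str, Callable[[str], None]]:
    """Build the four operations against an in-memory table (assumed stand-in)."""
    def put(key: str) -> None:
        with lock:
            table[key] = "value"

    def get(key: str) -> None:
        with lock:
            table.get(key)

    def delete(key: str) -> None:
        with lock:
            table.pop(key, None)

    def no_op(key: str) -> None:
        pass  # measures harness overhead only

    return {"put": put, "get": get, "delete": delete, "no-op": no_op}


def run_once(op: Callable[[str], None], n_threads: int, calls_per_thread: int) -> List[float]:
    """Run op from n_threads concurrent threads and return per-call latencies."""
    latencies: List[float] = []
    result_lock = threading.Lock()

    def worker(tid: int) -> None:
        local: List[float] = []
        for i in range(calls_per_thread):
            key = f"key-{(tid * calls_per_thread + i) % NUM_ITEMS}"
            start = time.perf_counter()
            op(key)
            local.append(time.perf_counter() - start)
        with result_lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


def benchmark(calls_per_thread: int = 100) -> Dict[str, Dict[str, float]]:
    """Return mean latency in seconds for each load level and operation."""
    results: Dict[str, Dict[str, float]] = {}
    for load, n_threads in LOADS.items():
        table = {f"key-{i}": "value" for i in range(NUM_ITEMS)}  # preload 1000 items
        ops = make_ops(table, threading.Lock())
        results[load] = {}
        for name, op in ops.items():
            samples: List[float] = []
            for _ in range(REPETITIONS):
                samples.extend(run_once(op, n_threads, calls_per_thread))
            results[load][name] = statistics.mean(samples)
    return results
```

Calling `benchmark()` returns a nested dict keyed by load level and operation name. Against a real deployment, the four closures in `make_ops` would be replaced by calls to the datastore under test; the no-op case is kept so that harness overhead can be subtracted from the other measurements.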
21. Limitations
Persistence: lack of retrieving the entire table to run a query
Blobstore: maximum file size
Datastore
Task Queue
Mail
Source code of the Java App Engine server has not been released
Follows a "deploy on all nodes" approach
Limited distribution supported
22. Future Work
Expand out of the web services domain
– Investigating opportunities in streaming
– Integrated MapReduce support for high-performance computing (HPC)
– Co-locate AppEngines and use shared memory
Additional databases:
– MongoDB, Scalaris, CouchDB
23. Future Work (continued)
Extending AppScale with new services for
- large-scale data analytics
- data- and computation-intensive tasks
Cloud-agnostic
Integration of mobile devices
24. CONCLUSION
Presents an open-source implementation of the Google App Engine (GAE)
Datastore API within a cloud platform called AppScale
The implementation unifies access to a wide range of open-source
distributed database technologies and automates their configuration and
deployment. However, each database differs in the degree to which it
implements the APIs.
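The idea of unified access with differing levels of API support can be illustrated with a small sketch. The class names (`DatastoreBackend`, `HashBackend`, `SortedBackend`) and the helper `supports_range_queries` are hypothetical and do not reflect AppScale's actual code; range queries are used as the example of an optional capability only because the Cassandra slide mentions them.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class DatastoreBackend(ABC):
    """Hypothetical common interface shared by all backends."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    @abstractmethod
    def put(self, key: str, value: Any) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Any:
        ...

    def range_query(self, start: str, end: str) -> List[Tuple[str, Any]]:
        # Optional capability: backends that cannot support it keep this default.
        raise NotImplementedError(
            f"{type(self).__name__} does not support range queries"
        )


class HashBackend(DatastoreBackend):
    """Key-value only: implements put/get but not range queries."""

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data.get(key)


class SortedBackend(HashBackend):
    """Adds range queries over keys in sorted order (start inclusive, end exclusive)."""

    def range_query(self, start: str, end: str) -> List[Tuple[str, Any]]:
        return [(k, self._data[k]) for k in sorted(self._data) if start <= k < end]


def supports_range_queries(backend: DatastoreBackend) -> bool:
    """Probe a backend with an empty range to see whether the call is implemented."""
    try:
        backend.range_query("", "")
        return True
    except NotImplementedError:
        return False
```

Application code written against `DatastoreBackend` runs unchanged on either backend for `put` and `get`, while a capability check such as `supports_range_queries` lets it detect where a backend's coverage of the interface stops.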