These are the slides from my presentation at CLOUDCOMP 2009 on AppScale, an open-source platform for running Google App Engine apps. See our project home page at http://appscale.cs.ucsb.edu or our code page at http://code.google.com/p/appscale
3. Terminology
Software-as-a-Service (SaaS)
e.g., SalesForce, Gmail
Provides remote application access
Platform-as-a-Service (PaaS)
e.g., Google App Engine
Provides scalable runtime stack
Infrastructure-as-a-Service (IaaS)
e.g., Amazon Web Services
Provides full system images
4. • Open-source, Platform-as-a-Service for research
and engineering of cloud computing components,
applications, and services
• Automated deployment of applications to high-performance databases
• Fine-grained control over the application environment
• Hosting of Google App Engine apps on your cluster
– Real applications
– Familiar API (that is extensible for lock-in avoidance)
– Your data and code on your resources
5. From Google App Engine (GAE)
to AppScale
• GAE Application Programming Interface
– Datastore (get/put)
– Memcache
– URL Fetching
– Mail
– Images
– Authentication
• Write Python/Java GAE app
– Use SDK locally to test and generate indexes
• APIs implemented as non-scalable, simple versions
6. From Google App Engine (GAE)
to AppScale
• GAE Application Programming Interface
– Datastore (get/put) -> BigTable
– Memcache -> Memcached
– URL Fetching
– Mail -> GMail
– Images
– Authentication -> Google Accounts
• Write Python/Java GAE app
– Use SDK locally to test and generate indexes
• APIs implemented as non-scalable, simple versions
– Upload to Google resources
• Highly scalable API implementation
7. Sandboxed Runtime
• Restricted subset of library calls
• No reading/writing from/to file system
• Data persistence only via get/put interface
• Computation bounded: 30 secs per request
• Access web services via HTTP / HTTPS
only (ports 80 and 443)
8. Recent GAE Additions
• Python and JVM SDKs
– JRuby, Clojure, etc. available through Java
• Task Queue, Cron, XMPP APIs
• New SLAs for paying customers
– $0.10 per CPU core hour
– $0.10 per GB bandwidth in
– $0.12 per GB bandwidth out
– $0.15 per GB data stored per month
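As a rough illustration of how the listed rates combine, here is a minimal Python sketch. The rates are copied from the slide; the workload numbers and the function name `monthly_cost` are hypothetical, and the sketch ignores any free quota or other billing rule not shown on the slide.

```python
from decimal import Decimal

# Rates copied from the slide (US dollars).
RATES = {
    "cpu_core_hours": Decimal("0.10"),   # per CPU core hour
    "gb_in": Decimal("0.10"),            # per GB bandwidth in
    "gb_out": Decimal("0.12"),           # per GB bandwidth out
    "gb_stored_month": Decimal("0.15"),  # per GB stored per month
}


def monthly_cost(usage):
    """Sum rate * quantity over every metered resource in `usage`."""
    return sum(
        (RATES[name] * Decimal(str(qty)) for name, qty in usage.items()),
        Decimal("0"),
    )


# Hypothetical workload, chosen only for illustration.
example_usage = {
    "cpu_core_hours": 100,
    "gb_in": 20,
    "gb_out": 50,
    "gb_stored_month": 10,
}
print(monthly_cost(example_usage))  # 10.00 + 2.00 + 6.00 + 1.50 = 19.50
```

Decimal is used instead of float so that the cent amounts add up exactly.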
9. Protocol Buffers
• Google App Engine’s internal data format
– And AppScale’s
• Similar to C-style structs:
message Person {
  required int32 id = 1;
  optional string name = 2;
}
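To make the struct comparison concrete, the following Python sketch hand-encodes the Person message above into the Protocol Buffers binary wire format. It is an illustration only: real code would use classes generated by the protobuf compiler, and the function names `encode_varint` and `encode_person` are made up for this example. It handles only non-negative ids.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_person(person_id, name=None):
    """Serialize Person{id, name} using the field numbers from the slide."""
    # Field 1 (id), wire type 0 (varint): tag = (1 << 3) | 0
    data = encode_varint((1 << 3) | 0) + encode_varint(person_id)
    if name is not None:
        raw = name.encode("utf-8")
        # Field 2 (name), wire type 2 (length-delimited): tag = (2 << 3) | 2
        data += encode_varint((2 << 3) | 2) + encode_varint(len(raw)) + raw
    return data


print(encode_person(150, "Ann").hex())  # 0896011203416e6e
```

The optional name field is simply left out of the byte stream when it is not set, which is what makes optional fields cheap on the wire.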
10. From Google App Engine (GAE)
to AppScale
• AppScale extends the GAE SDK
– Replaces the simple, non-scalable API implementation
with pluggable, distributed, scalable components
• Using open-source solutions as available/possible
• Communication over SSL
• Available as source and as system image
– Each instance can implement any component
• Self configuring as part of AppScale cloud deployment
– Deploys over
• Virtual machine monitors (Xen, KVM)
• Infrastructure (IaaS) cloud layers
11. IaaS Cloud Systems
• Amazon Web Services (AWS)
– Elastic Compute Cloud (EC2), Persistent Storage (S3, EBS)
– For-fee, as negotiated in SLA (CPU, network, storage)
– Vast resources available
• Users access small (opaque) subset, can scale-out
• Eucalyptus
– Open source implementation of the AWS APIs
– Inspiration for AppScale – familiar, widely-used API
implementation for execution on your cluster
• Limited only by the hardware you have available
12. Differences in AppScale
Deployment Options
• Xen / KVM:
– Static deployment
• Can use as many nodes as are manually configured
• Eucalyptus / EC2
– Dynamic deployment
• Can use as many nodes as the system can support (or, for EC2
deployment, as many as you pay for)
– As part of ongoing/future work: support for dynamic scaling
• Front-end (user-facing) & back-end (data management & computation)
• SLA renegotiation
13. AppScale System Layout
• AppLoadBalancer (ALB)
• AppServer (AS)
• Database Master/Slave/Peer (DB M/S/P)
[Figure: AppScale system layout. Labels: AppScale developer tools (AppScale admin), AppController, ALB, AS hosting GAE apps, DB M/P, DB S/P, users connecting over HTTPS]
14. AppController (AC)
• SOAP Server written in Ruby
– Runs on all nodes
• Middleware layer
• Controls and sets up a node for use
– Sets up configuration files (data replication)
– Sets up firewall for security
• Master AC “heartbeats” all other nodes
– Collects performance info as well
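The heartbeat idea can be sketched in a few lines of Python. This is not the AppController code (which is a Ruby SOAP server); the `probe` callable stands in for whatever remote call the master makes, and the node names and the `cpu_load` field are hypothetical.

```python
import time


def heartbeat_round(nodes, probe, now=time.time):
    """Probe every node once and record liveness plus any returned stats."""
    report = {}
    for node in nodes:
        try:
            stats = probe(node)  # hypothetical remote call returning performance info
            alive = True
        except Exception:
            stats, alive = None, False  # no answer: treat the node as down
        report[node] = {"alive": alive, "stats": stats, "checked_at": now()}
    return report


def fake_probe(node):
    """Test double: node-3 never answers, the others report a CPU load."""
    if node == "node-3":
        raise ConnectionError("no response")
    return {"cpu_load": 0.25}


report = heartbeat_round(["node-1", "node-2", "node-3"], fake_probe)
print([name for name, entry in report.items() if not entry["alive"]])  # ['node-3']
```

Passing the probe in as a parameter keeps the loop testable without a network; a master would call `heartbeat_round` on a timer.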
15. AppLoadBalancer (ALB)
• Ruby on Rails application
• Handles authentication and routing of users
to AppServers
• Three copies are deployed via Mongrel
– Load balanced via nginx
16. Database Management
• Five databases currently available:
– HBase, Hypertable: Master / Slave
– Cassandra, Voldemort: Peer / Peer
– Clustered MySQL: Relational
• Two main components
– Protocol Buffer Server: Data access / storage
– User / App Server: Authentication
17. AppServer (AS)
• Modified Google App Engine SDK
• App requests internally are Protocol Buffers
– Forwards requests to PB Server
• Minimal request set:
– Put(id)
– Get(id)
– Query: Equivalent to get_all_in_table
– Delete(id)
– Count: Total number of items in database
– GetSchema
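The minimal request set can be pictured as a small key-value interface. The Python sketch below is a toy in-memory stand-in, not the PB Server itself; the class name `InMemoryTable`, the column names, and the per-table scope of `count` are assumptions made for illustration.

```python
class InMemoryTable:
    """Toy stand-in for one datastore table, keyed by entity id."""

    def __init__(self, schema):
        self._schema = list(schema)  # column names
        self._rows = {}

    def put(self, entity_id, values):
        unknown = set(values) - set(self._schema)
        if unknown:
            raise KeyError(f"columns not in schema: {sorted(unknown)}")
        self._rows[entity_id] = dict(values)

    def get(self, entity_id):
        row = self._rows.get(entity_id)
        return dict(row) if row is not None else None

    def query(self):
        """Return every row, like get_all_in_table; callers filter afterwards."""
        return {key: dict(row) for key, row in self._rows.items()}

    def delete(self, entity_id):
        return self._rows.pop(entity_id, None) is not None

    def count(self):
        return len(self._rows)

    def get_schema(self):
        return list(self._schema)


table = InMemoryTable(schema=["author", "message"])
table.put("g1", {"author": "ann", "message": "hello"})
table.put("g2", {"author": "bob", "message": "hi"})

# Filtering happens outside the table, after the full scan.
by_ann = {k: v for k, v in table.query().items() if v["author"] == "ann"}
print(by_ann, table.count())
```

Because `query` returns the whole table, any filtering has to run on the caller's side after every row has been fetched.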
18. AppScale Tools
• Ruby scripts that initiate AppScale
deployment
– Initializes the first AppController for use
– Uploads AppEngine app
• Conceptually similar to Amazon AWS EC2
tools
– describe-instances
– upload-app: Introduce additional apps
– terminate-instances
19. Fault Tolerance
• System can survive the following failures:
– AppServer failure
– Database Slave failure
– Database Peer failure
– AppLoadBalancer failure *
– AppController failure *
20. Testing Methodology
• Load testing done via the Grinder
• Test specifics:
– Initially 3 users
– 3 users added every 5 seconds
– Done until 160 seconds have passed
• Each user navigates the page, performs
some scripted action
• Measured total transactions performed and
average response time
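The ramp-up schedule above implies a simple step function for the number of simulated users. The Python sketch below assumes that each new batch joins exactly on a 5-second boundary, including the one at 160 seconds; the slide does not say whether that last batch is added, so the final figure is an upper estimate. The function name `users_at` is made up for this example.

```python
def users_at(t_seconds, initial=3, step=3, interval=5, duration=160):
    """Number of simulated users active t_seconds into the run."""
    if not 0 <= t_seconds <= duration:
        raise ValueError("time is outside the test window")
    # One batch of `step` users joins for every full interval elapsed.
    return initial + step * (t_seconds // interval)


for t in (0, 5, 60, 160):
    print(t, users_at(t))  # 3, 6, 39 and 99 users respectively
```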
21. AppScale Evaluation Cluster
• Three Grinder nodes, four AppScale nodes
– One master, three slaves
– Virtualized via Xen
– Database: HBase (3x replication) 64 MB HDFS blocks
• PBServer via Thrift; stores entire protocol buffers
• Hardware
– Quad-core 2.66 GHz machines
– 8 GB of RAM
– Connected via Gigabit Ethernet
22. Applications Tested
• Tasks - a to-do list
– Read and write intensive (44 transactions per user)
• Cccwiki – allows users to edit web pages
– Read intensive, updates only (74 transactions per
user)
• Guestbook – allows users to post messages
– Retrieves ten most recent posts only (9 transactions
per user)
• Shell – provides an interactive Python shell
– Compute intensive (14 transactions per user)
26. Room for Improvement
• Current bottlenecks:
– Queries perform filtering server-side
– Filtering is done outside of the DB
– AppEngine, PB Server are single-threaded
– Entry point to some DBs is single-threaded
• Future work will address these problems
– Will also compare performance across DBs
– e.g., BigTable-like DBs vs. P2P DBs
27. Related Work
• AppDrop
– Proof-of-concept Rails app
• TyphoonAE
– Relatively new (alpha release)
– Runs MongoDB only
• Microsoft Azure
– Uses .NET as the platform
– Has a similar pricing model to AppEngine
28. AppScale Recap
• Distributed, multi-component system
– Deployed as a single system image (self
configuring)
• Static deployment over Xen/KVM
• Dynamic deployment over Eucalyptus/EC2
• Databases supported:
– HBase, Hypertable, MySQL, Cassandra,
Voldemort
• Fault-tolerant
29. AppScale Recap
• Open cloud research platform
– International user community
• Goals
– Easy to use and extend
– Automatic deployment of PaaS cloud and
GAE apps on resources other than Google’s
– Support real applications and users
• Experimentation and testing in real environments
• Current performance results are a baseline
30. Performance Improvements
• AppEngine now multi-process, load balanced
• PB Server now multi-threaded
• HBase and Hypertable now store data the way
Google does
– Three tables: Reference, Sort Ascending, Sort
Descending
31. Future Work
• Expand out of the web services domain
– Investigating opportunities in streaming
– Integrated MapReduce support for
high-performance computing (HPC)
– Co-locate AppEngines and use shared
memory
• Additional databases:
– MongoDB, Scalaris, CouchDB
32. Thanks!
• To the AppScale team!
– Co-lead Navraj Chohan
– Advisor Prof. Chandra Krintz
• To the open-source community
• To Google, NSF, and IBM for financial support
• To you all for coming out today
• Check us out on the web:
– http://appscale.cs.ucsb.edu