Anna Liu - Associate Professor in Services Engineering, School of Computer Science and Engineering, University of NSW. Keynote presentation at the Australian Architecture Forum 2009.
1. Architecting Cloud Applications
- the essential checklist -
Anna Liu
Associate Professor in Services Engineering
School of Computer Science and Engineering
University of New South Wales
annaliu@cse.unsw.edu.au
2. Architect's Checklist
1. Remember the 'Why'
2. Know the platform architecture
3. Appreciate differences across cloud platforms
4. Acknowledge auto-scaling is not all magic
5. Design for eventual consistency
6. Don't ignore the network layer
7. Performance attributes = application profile + platform availability + network latency
8. Plan for monitoring and management
9. Understand interoperability and standards
10. Believe that Cloud Computing is not just for the long tail
4. Why Cloud Computing
• Economies of scale
• Pay per usage
• Handling Big Data
• Service Delivery platform
• Innovative, engaging user experience
• Realising Green IT initiatives
8. Different Platforms with Different Target Audiences
• Google App Engine
  • Caters for web applications
  • < 30 sec compute time per request
  • PaaS shields you from lots of infrastructure complexity
• Microsoft Azure
  • More general purpose
  • Optimised for .NET
  • Software-plus-services strategy caters to enterprise scenarios
• Amazon EC2/S3/SimpleDB
  • Virtual compute and storage on demand
  • IaaS provides you with lots of flexibility
  • Third-party innovation on top to enhance the application development experience (e.g. Red Hat/JBoss, MySQL, IBM WebSphere, Appistry)
10. Auto-scaling behind the scenes
• Amazon EC2
  • CloudWatch – view into VM instance utilisation, operational performance, disk reads and writes, and network traffic
  • Elastic Load Balancer – distributes incoming requests across EC2 instances, in a single or across multiple Availability Zones; provisioning decisions (via Auto Scaling) are based on the monitoring data reported by CloudWatch
  • Developers specify the triggering conditions, e.g. average CPU utilisation
• Microsoft Azure
  • Azure Fabric Controller (FC) – monitors, maintains and provisions machines to host applications
  • Web roles, worker roles and the number of instances are set as configuration parameters
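Checklist item 4 is easier to appreciate once the "trigger condition" is written down. Below is a minimal Java sketch of a threshold rule of the kind developers configure: scale out above one average-CPU threshold, scale in below another, and clamp to a minimum and maximum instance count. The class name, the thresholds and the one-instance step are hypothetical illustrations, not the configuration API of CloudWatch, Auto Scaling or the Azure Fabric Controller.

```java
import java.util.List;

/** Hypothetical threshold-based scaling rule driven by average CPU utilisation. */
final class ScalingRule {
    private final double scaleOutAbovePercent;
    private final double scaleInBelowPercent;
    private final int minInstances;
    private final int maxInstances;

    ScalingRule(double scaleOutAbovePercent, double scaleInBelowPercent,
                int minInstances, int maxInstances) {
        this.scaleOutAbovePercent = scaleOutAbovePercent;
        this.scaleInBelowPercent = scaleInBelowPercent;
        this.minInstances = minInstances;
        this.maxInstances = maxInstances;
    }

    /**
     * Given one CPU utilisation sample (0-100) per running instance, returns the
     * instance count the rule asks for: one more, one fewer, or unchanged,
     * always kept within [minInstances, maxInstances].
     */
    int desiredInstances(List<Double> cpuPercentPerInstance) {
        int current = cpuPercentPerInstance.size();
        double average = cpuPercentPerInstance.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        int desired = current;
        if (average > scaleOutAbovePercent) {
            desired = current + 1;
        } else if (average < scaleInBelowPercent) {
            desired = current - 1;
        }
        return Math.max(minInstances, Math.min(maxInstances, desired));
    }
}
```

The "not all magic" part is what such a rule leaves out: the average lags behind the traffic, new instances take time to boot, and a poorly chosen pair of thresholds can make the pool oscillate.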
11. Auto-scaling behind the scenes
• Google App Engine
  • Handles auto-scaling and load balancing of application services based on web traffic
  • Request/task execution is limited to 30 seconds
  • Moved from Tomcat to Jetty to reduce memory footprint (no need for a session handler)
  • Fault tolerance and persistence of stored data through distributed replication
  • GAE serves static web content, hence no additional implementation is needed to handle checkpointing and replication to re-instantiate the execution state of processes
13. ACID no more?
“Eventual Consistency
Amazon SimpleDB keeps multiple copies of each domain.
When data is written or updated (using PutAttributes,
DeleteAttributes, CreateDomain or DeleteDomain) and
Success is returned, all copies of the data are updated.
However, it takes time for the update to propagate to all
storage locations. The data will eventually be consistent, but
an immediate read might not show the change.
Consistency is usually reached within seconds, but a high
system load or network partition might increase this time.
Repeating a read after a short time should return the updated
data.”
- Amazon Developer Guide, 2007-11-07
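The quote's last sentence ("Repeating a read after a short time should return the updated data") can be turned into a small client-side helper. A minimal Java sketch, assuming the store read is wrapped in a `Supplier<T>` and that the caller can tell whether the value it got back already reflects its own write (for instance, via a version attribute it wrote). `EventualRead` and `readUntil` are hypothetical names, not part of any SimpleDB client library.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class EventualRead {
    /**
     * Re-reads until {@code isFresh} accepts the (non-null) value, pausing between
     * attempts and doubling the pause each time. Returns empty if the update has
     * not become visible after {@code maxAttempts} reads.
     */
    static <T> Optional<T> readUntil(Supplier<T> read, Predicate<T> isFresh,
                                     int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = read.get();
            if (isFresh.test(value)) {
                return Optional.of(value);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return Optional.empty();
                }
                delay *= 2; // propagation usually takes seconds, so back off
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty `Optional` instead of throwing leaves the caller to decide whether a still-stale read is acceptable (e.g. showing slightly old reference data) or an error.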
14. CAP Theorem
• Three properties of shared-data systems
  • Consistency: once an update is made, all observers see it
  • Availability: all database transactions should be processed accurately and promptly
  • Partition tolerance: the system tolerates network partitions
• CAP Theorem
  • Only two of the three properties can be achieved at any time
  • Network partitions are a given in distributed systems
  • So we have to choose between consistency and availability
15. Relational no more?
• Google App Engine's datastore:
  • A select can be performed on one table only
  • Intentionally does not support joins
  • Joins are inefficient when queries span machines
  • Allows disks to fail without the system failing
  • Cannot easily port over an existing enterprise relational DB
• Microsoft Azure:
  • Retiring the previous SSDS (no transactional support then)
  • Azure SQL Services to replace SSDS, with relational features and transactions
• Amazon
  • S3 for big-storage scenarios
  • Have your own relational DB in the cloud!
  • Interesting to investigate failover/scalability features here...
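Without joins in the datastore, the join moves into application code. A minimal Java sketch of an in-memory hash join, assuming each entity kind has already been fetched with its own single-kind query. The `Customer`/`Order` kinds and every name here are hypothetical and not part of the App Engine datastore API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AppSideJoin {
    // Hypothetical entity kinds; each list would come from its own single-kind query.
    static final class Customer {
        final String id;
        final String name;
        Customer(String id, String name) { this.id = id; this.name = name; }
    }

    static final class Order {
        final String id;
        final String customerId;
        Order(String id, String customerId) { this.id = id; this.customerId = customerId; }
    }

    /** Hash join in application code: index customers by key, then probe with each order. */
    static List<String> ordersWithCustomerNames(List<Customer> customers, List<Order> orders) {
        Map<String, String> nameById = new HashMap<>();
        for (Customer c : customers) {
            nameById.put(c.id, c.name);
        }
        List<String> joined = new ArrayList<>();
        for (Order o : orders) {
            String name = nameById.get(o.customerId);
            if (name != null) { // inner-join semantics: skip orders without a matching customer
                joined.add(o.id + " -> " + name);
            }
        }
        return joined;
    }
}
```

The other common answer is denormalisation (copying the customer name onto each order at write time), which trades extra storage and update work for single-kind reads.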
16. What does this mean?
• Data reorganisation/restructuring is required
• Understand the design trade-offs (scalability versus portability/interoperability at the data layer)
• Shopping carts and reference data (BASE) versus transactional data/updates (ACID)
• Data portability might be tough for a while
• I'm revising my university lecture notes! So you had better re-architect your app and data!
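Shopping carts fit BASE well because two divergent copies of a cart can be reconciled after the fact. A minimal Java sketch of one possible reconciliation policy (union of items, keeping the larger quantity). The class and method names are hypothetical, and the policy is an illustrative choice rather than what any particular platform does.

```java
import java.util.Map;
import java.util.TreeMap;

final class CartMerge {
    /**
     * Reconciles two divergent replicas of the same cart (item -> quantity):
     * every item present in either replica is kept, with the larger quantity.
     * The result is the same whichever replica is passed first.
     */
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new TreeMap<>(a); // sorted keys for stable output
        b.forEach((item, qty) -> merged.merge(item, qty, Integer::max));
        return merged;
    }
}
```

The price of this simple rule is that an item removed on one replica can reappear after the merge. Transactional data such as payments cannot tolerate that kind of anomaly, which is why it stays on the ACID side.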
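18. Experiment Setup
[Figure: a Client Testing Application calls the same test service deployed on Azure, Amazon Web Services and Google App Engine; each deployment exposes a WSDL interface over HTTP (REST/SOAP).]
Interface:
    // The three operations exposed by each deployment, written as a Java
    // interface; the name TestService is illustrative only, and Result is the
    // response type returned to the client (not specified on the slide).
    public interface TestService {
        // Echo the received value back to the client (tests network response time)
        Result InstantResponse(String value);

        // Retrieve data from the DB based on the given value (tests DB read performance)
        Result Read(String value);

        // Persist the given content into the DB (tests DB write performance)
        Result Create(String content);
    }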
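On the client side, the InstantResponse measurements come down to timing round trips and summarising them. A minimal Java sketch, assuming the remote call (for example, a generated web service stub invoking InstantResponse) is wrapped in a `Runnable`. `LatencyProbe` is a hypothetical helper, and the nearest-rank percentile is one reasonable summary, not necessarily the one used in these experiments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LatencyProbe {
    /** Invokes {@code call} n times and returns each call's wall-clock latency in nanoseconds. */
    static List<Long> measure(Runnable call, int n) {
        List<Long> samples = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            call.run();
            samples.add(System.nanoTime() - start);
        }
        return samples;
    }

    /** Nearest-rank percentile (p in 1..100) of the samples. */
    static long percentile(List<Long> samples, int p) {
        if (samples.isEmpty() || p < 1 || p > 100) {
            throw new IllegalArgumentException("need samples and 1 <= p <= 100");
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size()); // 1-based rank
        return sorted.get(rank - 1);
    }
}
```

Because the timer runs on the client, each sample includes the network path as well as the platform's processing time, which is the mix that checklist item 7 points at.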
20. Questions to ponder
• This is a rather obvious conclusion
• My Gmail sometimes tells me “reconnecting in 5 sec...” and it's OK for me!
• Is the user base happy enough?
• Will our network improve?
• The situation is particularly bad for us Aussies...
• NBN discussion: is a population of 20 million not enough for vendors to invest?
• Is it a matter of just dropping a container here?
• Is there a business case for Telstra?
21. Architect's Checklist
1. Remember the 'Why'
2. Know the platform architecture
3. Appreciate differences across cloud platforms
4. Acknowledge auto-scaling is not all magic
5. Design for eventual consistency
6. Don't ignore the network layer
7. Performance attributes = application profile + platform characteristics + network latency
8. Plan for monitoring and management
9. Understand interoperability and standards
10. Believe that Cloud Computing is not just for the long tail
22. Types of Applications
Application Types
• Enterprise, Web applications
  • Business apps with a web front end to maximise user reach
• Highly connected apps
  • Web 2.0, CDN, social networking, sensor networks
• Data intensive
  • Massively parallel, Hadoop/Map-Reduce
  • Analysis yields potentially surprising results
• Compute intensive
  • Financial risk calculations
  • Compare to HPC?
Decision Dimensions
• Application profile
• Constraints and requirements on cloud platform, resource models
• Resource model -> cost
• Your business model (how you make money out of the app you deploy on the cloud)
• Saving cost or speed-up versus ability to connect, build a shared pool of metadata, discover surprising results
24. Wide Area Distributed Systems – the reality
• Scalability seems OK
  • Relatively constant individual response time despite larger request volume
• Availability is more of an issue?
  • Design for occasional unavailability
  • Plan for it
  • Try/catch, retry logic and idempotent operations are all still good!
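Retry logic and idempotency belong together: a retry is only safe if repeating the operation cannot change the outcome. A minimal Java sketch of retry with exponential backoff, assuming the operation is wrapped in a `Supplier<T>` and signals failure by throwing a `RuntimeException`. `Retry.withBackoff` is a hypothetical helper, not a library API.

```java
import java.util.function.Supplier;

final class Retry {
    /**
     * Runs {@code op}, retrying on RuntimeException up to {@code maxAttempts}
     * times in total and doubling the pause after each failure. Only use this for
     * idempotent operations: a call that failed from the client's point of view
     * may still have taken effect on the server.
     */
    static <T> T withBackoff(Supplier<T> op, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: rethrow the last failure below
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
                delay *= 2;
            }
        }
        throw last;
    }
}
```

In use, a create operation would be keyed on an ID that the client generates once (e.g. with `java.util.UUID.randomUUID()`) and reuses on every attempt. Then a retry after a "500 Server Error" whose write had actually committed overwrites the same entity instead of creating a duplicate.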
25. Pressure Tests – App Engine
App Engine Storage Create Error Rate in Pressure Test (1024 Byte)
Round    Metric         1:30    4:30    7:30   10:30   13:30   Average  All Req.  Avg. Rate
Round 0  DB Err.           0       1       0       0       2       0.6
         Sent Req.       900     857     891     900     900     889.6       900     98.84%
Round 1  DB Err.           0       4       0       0       0       0.8
         Sent Req.      2699    2134    2242    2700    2700      2495      2700     92.41%
Round 2  DB Err.           0       0       4       0       8       2.4
         Sent Req.      4500    4180    3873    4500    4032      4217      4500     93.71%
Round 3  DB Err.           3       0       0       8       3       2.8
         Sent Req.      5403    5173    5681    5792    6065    5622.8      6300     89.25%
Round 4  DB Err.           0       0       0       6       3       1.8
         Sent Req.      5572    8100    6611    4287    7111    6336.2      8100     78.22%
Round 5  DB Err.           2       3       0       4       1         2
         Sent Req.      9235    9279    5561    9112    8275    8292.4      9900     83.76%
Overall  DB Err.           5       8       4      18      17      10.4
         Sent Req.     28309   29723   24859   27291   29083     27853     32400     85.97%
         Err. Rate     0.02%   0.03%   0.02%   0.07%   0.06%     0.04%
google.appengine.api.datastore_errors.TransactionFailedError: Too much contention on these datastore entities.
500 Server Error
26. What's happening here?
• Throttling?
• A denial-of-service attack protection mechanism?
• Should end-user developers have access to a configurable parameter for setting such limits?
27. Pressure Test – Amazon SimpleDB
Amazon SimpleDB Create Error Rate in Pressure Test (1024 Byte)
Round    Metric         3:00    6:00    9:00   12:00   Average  All Req.  Avg. Rate
Round 0  DB Err.           0       0       0       0         0
         Sent Req.       900     898     900     900     899.5       900     99.94%
Round 1  DB Err.          20      10       9      15      13.5
         Sent Req.      2696    2700    2700    2699   2698.75      2700     99.95%
Round 2  DB Err.           4       7       7       7      6.25
         Sent Req.      4367    4497    4485    3879      4307      4500     95.71%
Round 3  DB Err.          17       6       7      13     10.75
         Sent Req.      5740    6193    6226    5795    5988.5      6300     95.06%
Round 4  DB Err.          13       2       3      13      7.75
         Sent Req.      7081    8005    7896    7106      7522      8100     92.86%
Round 5  DB Err.          19       9      33      16     19.25
         Sent Req.      8926    9694    7857    8195      8668      9900     87.56%
Overall  DB Err.          73      34      59      64      57.5
         Sent Req.     29710   31987   30064   28574  30083.75     32400     92.85%
         Err. Rate     0.25%   0.11%   0.20%   0.22%     0.19%
Amazon SimpleDB are currently unavailable
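The derived figures in both pressure-test tables can be recomputed from the raw counts: the Avg. Rate column matches average requests sent divided by requests attempted (All Req.), and the Err. Rate row matches DB errors divided by requests sent. That reading is inferred by recomputing the published numbers rather than stated on the slides. A small Java sketch of the arithmetic (the class and method names are hypothetical):

```java
import java.util.Locale;

final class PressureStats {
    /** Avg. Rate: average requests actually sent as a share of requests attempted. */
    static double sentRatePercent(double averageSent, double allRequests) {
        return 100.0 * averageSent / allRequests;
    }

    /** Err. Rate: datastore errors as a share of requests sent. */
    static double errorRatePercent(double dbErrors, double sent) {
        return 100.0 * dbErrors / sent;
    }

    /** Formats a percentage with two decimals, as in the tables (e.g. "98.84%"). */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }
}
```

For example, App Engine's Round 0 gives 889.6 / 900 = 98.84%, and its overall error rate is 10.4 / 27853 = 0.04%. The two rates measure different things: Avg. Rate drops when requests never get sent, while Err. Rate only counts failures among requests that were sent.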
35. Monitoring and Management
• Could be a lot better!
• We had to build a lot of monitoring code on our own
• Some cloud system status is available, but no view into your application's health status
• Service Level Agreement issues
• Existing support caters to techies and developers
• Need a dashboard view into business metrics
• Real-time view into how the application is running in the cloud
• Data points to have the commercial conversation with platform vendors
• Integration with existing enterprise monitoring capabilities?
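Home-grown monitoring code of this kind often starts with a wrapper that counts calls and failures and accumulates latency per operation. A minimal Java sketch; `CallMonitor` is a hypothetical class, and in practice its numbers would be pushed to a dashboard or an existing enterprise monitoring system rather than read in-process.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class CallMonitor {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Runs the call, recording its latency and whether it threw. */
    <T> T record(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            throw e; // monitoring must not swallow the failure
        } finally {
            calls.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long callCount() {
        return calls.get();
    }

    /** Fraction of recorded calls that threw (0.0 when nothing has been recorded). */
    double errorRate() {
        long n = calls.get();
        return n == 0 ? 0.0 : (double) failures.get() / n;
    }

    /** Mean wall-clock latency per call in milliseconds. */
    double meanLatencyMillis() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1e6 / n;
    }
}
```

The three counters are updated independently, so a reader may see a slightly inconsistent snapshot under concurrent load. That is usually acceptable for a dashboard but worth knowing.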
37. Standards and Interoperability
• Cloud Computing Interoperability Forum
(CCIF), OMG effort, The Open Group,
Open Cloud Manifesto...
• Are standards THE solution?
• Competing standards? Timing? Design by committee?
• In fact, does it make sense when cloud platform architecture varies significantly?
• Individual services are already surfaced on the internet
• We still want to orchestrate services within a long-running workflow, across/from different clouds
38. Internet Service Bus
• REST on .NET Service Bus
– Simple to implement for interop across different languages
– Less packet overhead
• SOAP on .NET Service Bus
– Currently only available for .NET Framework communications
– Other languages are not fully supported (Java can only pass Access Control on .NET Services)
– More packet overhead when communicating between C# and Java than between C# and C#
39. Architect's Checklist
1. Remember the 'Why'
2. Know the platform architecture
3. Appreciate differences across cloud platforms
4. Acknowledge auto-scaling is not all magic
5. Design for eventual consistency
6. Don't ignore the network layer
7. Performance attributes = application profile + platform availability + network latency
8. Plan for monitoring and management
9. Understand interoperability and standards
10. Is Cloud Computing just for the long tail?
40. Impediments to Enterprise Adoption of Cloud
• Security, privacy law
• Ownership of data, data retention
• Portability, fear of vendor lock-in
• Migration, integration with existing IT assets
• Value for startups does not necessarily apply to the enterprise
  • The initial capital investment has already been spent
  • Pay per use is not necessarily a business benefit
41. Some Existing Efforts and Solution Patterns
• Analyse risk profiles for your application portfolio
• Private cloud (trade off economies of scale?)
• 'de-value data', 'partitioning', 'segregation'
• Enable user choice, 'trust'
• Integration/interoperability solutions
• Security – lots of technical solutions
• Cloud Security Alliance (CSA) for some guidance on security issues
• Upcoming research collaboration with SEI CMU/US DoD
45. Getting Involved
• Collaboration with UNSW
• We are recruiting Research Fellows!
• Research residencies for architects
• Open House Lab
• Short-term contract research, advisory services
• Longer-term linkage programs (ARC, NICTA, CRC)
• Blogs.unsw.edu.au/annaliu
46. Standing on the Shoulders of Giants
• UNSW Team
• Dr Helen Paik
• Mr Liang Zhao
• Mr Xiaomin Wu
• Mr Fei Teng
• Mr Jae Choi
• NICTA Team
• Dr Jenny Liu, Markus Lachat
• Dr Mark Staples
• Industry Advisory Team
• Mr Kevin Francis (Object Consulting)
• Dr Rajiv Ranjan (Smart Service CRC)
• Milinda Kotelawele (Longscale)