David Chen presents techniques for optimizing performance on Google App Engine. He discusses analyzing logs to identify inefficient requests and using Appstats to profile RPC calls. Key optimizations include batching RPCs and making them asynchronous, caching results, optimizing datastore usage by reducing indexes and entity size, and serving dynamic content as static files. Server-side caching and tasks are also recommended to improve control flow efficiency.
4. Introduction
• Google App Engine in 2007
• PaaS (Platform as a Service)
• Total Solution
• Scalable, Easy to use (really?)
• Automatic scaling and load balancing
Saturday, 1 June, 13
5. Introduction
Hint: "I want to fight ten of them"
7. Google Cloud Platform Family
• App Engine becomes the gateway to
• Google Compute Engine (like EC2)
• Cloud SQL (like MySQL)
• BigQuery (SQL over terabytes of data)
• Cloud Storage
• Google I/O 2013
• Datastore Service
• PHP support ...
8. Why We Use App Engine
• Really Easy to configure
• Really Easy to scale
• Few or no failures
• Full Python environment
• Real Reason:
• Too Lazy to learn a complex platform
• Cheap / Powerful / Easy if you use it carefully
21. Query
• select path, count(*) as freq, avg(cpm_usd) as avg_cost from log group by path order by avg_cost desc limit 10
• select path, avg(ms), count(*) from log group by path order by avg(ms) desc limit 10
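The two queries can be tried out on a small stand-in table. The sketch below is only an illustration: the table name `log` and the columns `path`, `ms`, and `cpm_usd` come from the slide, while the sample rows, the helper names, and the use of SQLite are assumptions.

```python
import sqlite3

# Hypothetical sample rows: (path, latency in ms, cpm_usd).
# cpm_usd is taken here to be the estimated cost of 1000 such requests in USD.
SAMPLE_ROWS = [
    ("/feed", 850.0, 0.004),
    ("/feed", 950.0, 0.006),
    ("/home", 120.0, 0.001),
    ("/home", 80.0, 0.001),
    ("/report", 2400.0, 0.020),
]


def load_log(rows):
    """Create an in-memory table shaped like the one the slide queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table log (path text, ms real, cpm_usd real)")
    conn.executemany("insert into log values (?, ?, ?)", rows)
    return conn


def most_expensive_paths(conn, limit=10):
    """Paths ranked by average estimated cost, most expensive first."""
    return conn.execute(
        "select path, count(*) as freq, avg(cpm_usd) as avg_cost "
        "from log group by path order by avg_cost desc limit ?",
        (limit,),
    ).fetchall()


def slowest_paths(conn, limit=10):
    """Paths ranked by average latency, slowest first."""
    return conn.execute(
        "select path, avg(ms) as avg_ms, count(*) as freq "
        "from log group by path order by avg_ms desc limit ?",
        (limit,),
    ).fetchall()


conn = load_log(SAMPLE_ROWS)
expensive = most_expensive_paths(conn)
slowest = slowest_paths(conn)
```

Ranking by both cost and latency matters because the most expensive handler is not always the slowest one.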
28. What Appstats can tell you
• Is your application making unnecessary RPC calls?
• Should it be caching data instead of making repeated RPC calls to get the same data?
• Will your application perform better if multiple requests are executed in parallel rather than serially?
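The second question (repeated RPC calls for the same data) can be illustrated with a toy recorder. This is not the Appstats API; `RpcRecorder` and the recorded call names are hypothetical stand-ins that only show the kind of pattern a per-request RPC trace makes visible.

```python
from collections import Counter


class RpcRecorder:
    """Toy stand-in for a profiler: records every RPC made during one request."""

    def __init__(self):
        self.calls = []

    def record(self, service, call, key):
        # Each RPC is identified by its service, method, and the key it asked for.
        self.calls.append((service, call, key))

    def repeated_calls(self):
        """Return the RPCs issued more than once, with how often each occurred."""
        counts = Counter(self.calls)
        return {rpc: n for rpc, n in counts.items() if n > 1}


recorder = RpcRecorder()
# Hypothetical request that fetches the same user entity three times.
for key in ["user:1", "post:7", "user:1", "user:1"]:
    recorder.record("datastore", "Get", key)

repeated = recorder.repeated_calls()
```

Any entry in `repeated` is a candidate for caching within the request or for fetching once and passing the result along.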
34. Example: Async RPC
Hint: Async reads require careful control of the flow. Fire the RPC as early as possible and get the result as late as possible.
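The "fire early, get late" pattern can be sketched with standard-library futures. This is a minimal illustration rather than App Engine code: `fake_rpc` is a hypothetical stand-in for a remote call, and a thread pool stands in for the platform's async RPC machinery (in NDB the analogous pair would be a `get_async()` call followed later by `get_result()`).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_rpc(name, delay):
    """Hypothetical stand-in for a remote call that takes `delay` seconds."""
    time.sleep(delay)
    return name + "-result"


def handle_request(executor):
    # Fire both RPCs as early as possible; they now run concurrently.
    user_future = executor.submit(fake_rpc, "user", 0.2)
    posts_future = executor.submit(fake_rpc, "posts", 0.2)

    # Do local work that does not depend on the RPC results.
    local = sum(range(1000))

    # Get the results as late as possible, only when they are needed.
    return user_future.result(), posts_future.result(), local


with ThreadPoolExecutor(max_workers=2) as executor:
    start = time.perf_counter()
    user, posts, local = handle_request(executor)
    elapsed = time.perf_counter() - start
```

Because both calls are in flight at the same time, `elapsed` should be close to the duration of the slowest call rather than the sum of both.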
40. Google App Engine Datastore
• Schemaless (NoSQL), built for large scale
• Indexes must be configured manually
• Supports a limited SQL-like syntax (GQL), but with different behavior
• offset
• No inequality filters on more than one property
• ...
• Supports MapReduce (but expensive)
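One common way to live with the single-inequality restriction is to let the query apply one inequality and apply the second in application code. The sketch below only simulates this with in-memory dictionaries; `query_one_inequality`, the property names, and the sample entities are hypothetical.

```python
# Hypothetical entities standing in for datastore results.
ENTITIES = [
    {"name": "a", "age": 25, "score": 70},
    {"name": "b", "age": 35, "score": 90},
    {"name": "c", "age": 40, "score": 50},
    {"name": "d", "age": 18, "score": 95},
]


def query_one_inequality(entities, prop, low):
    """Stand-in for a datastore query with a single inequality filter."""
    return [e for e in entities if e[prop] > low]


def older_and_higher_scoring(entities, min_age, min_score):
    # The query itself may carry an inequality on only one property...
    candidates = query_one_inequality(entities, "age", min_age)
    # ...so the second inequality is applied in application code.
    return [e for e in candidates if e["score"] > min_score]


matches = older_and_higher_scoring(ENTITIES, min_age=30, min_score=60)
names = [e["name"] for e in matches]
```

The trade-off is that every candidate returned by the first filter is still read, so the more selective inequality should be the one placed in the query.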
49. Redefined Model with “indexed=False”
• Get: 1 read op = $0.07 / 100k
• Insert: 2 write ops = $0.2 / 100k
• Cannot query on a property with indexed=False
2x
Hint: know how you want to query your data before you define the model
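The arithmetic behind these numbers can be sketched as follows. The write price ($0.10 per 100k) is implied by the slide's "2 write ops = $0.2 / 100k". The write-op rule used here (2 base writes for a new entity, plus 2 per indexed property value and 1 per composite index value) is an assumption based on App Engine's 2013-era billing documentation, not something stated on the slide.

```python
# USD per 100k write operations, implied by the slide (2 write ops = $0.2 / 100k).
WRITE_PRICE_PER_100K = 0.10


def put_write_ops(indexed_properties, composite_index_values=0):
    """Write ops for inserting one new entity under the assumed billing rule:
    2 base writes + 2 per indexed property value + 1 per composite index value."""
    return 2 + 2 * indexed_properties + composite_index_values


def cost_per_100k_inserts(indexed_properties, composite_index_values=0):
    """Cost in USD of inserting 100k new entities."""
    ops = put_write_ops(indexed_properties, composite_index_values)
    return ops * WRITE_PRICE_PER_100K


unindexed_cost = cost_per_100k_inserts(0)   # every property has indexed=False
one_index_cost = cost_per_100k_inserts(1)   # a single indexed property
```

Under this assumed rule, indexing even one property doubles the insert cost compared with a fully unindexed model.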
52. Entity Size
Hint: Entity size won’t affect the cost (and won’t affect the performance)
53. Datastore Hints
• Make tables BIG!
• but index only when necessary
• Find alternative solutions:
• Cloud SQL + cache
• Keep the query index and tree in memory
55. More Design Principles
• Denormalizing is better than normalizing
• Think about the real use case
• MxM or Mxn or nxn?
• More reads or more writes?
• Immutable data?
• Relation or duplicate?
• Get or Query?
• ...
Zen of datastore
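The "relation or duplicate?" question can be made concrete by counting reads. The sketch below is a toy model, not datastore code: `CountingStore`, the keys, and the post and author records are hypothetical.

```python
class CountingStore:
    """Toy key-value store that counts how many gets it serves."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data[key]


AUTHORS = {"author:1": {"name": "Ann"}, "author:2": {"name": "Bob"}}

# Normalized: each post points at its author (a relation).
NORMALIZED_POSTS = [
    {"title": "t1", "author_key": "author:1"},
    {"title": "t2", "author_key": "author:2"},
    {"title": "t3", "author_key": "author:1"},
]

# Denormalized: each post carries a copy of the author name (a duplicate).
DENORMALIZED_POSTS = [
    {"title": "t1", "author_name": "Ann"},
    {"title": "t2", "author_name": "Bob"},
    {"title": "t3", "author_name": "Ann"},
]


def render_normalized(posts, store):
    # One extra get per post to resolve the author.
    return [(p["title"], store.get(p["author_key"])["name"]) for p in posts]


def render_denormalized(posts):
    # No extra gets: everything needed is already on the post.
    return [(p["title"], p["author_name"]) for p in posts]


store = CountingStore(AUTHORS)
normalized_view = render_normalized(NORMALIZED_POSTS, store)
denormalized_view = render_denormalized(DENORMALIZED_POSTS)
```

The duplicate saves reads on every page view, but renaming an author then means rewriting every post that carries the copy, which is why the read/write ratio and data immutability belong on the checklist.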
56. Google App Engine - NDB
• Automatic caching
• In-Context Cache
• Write-through Memcache cache
• The StructuredProperty class, which allows entities to have nested structure
• Asynchronous APIs which allow concurrent actions (and "synchronous" APIs if you don't need that)
• Watch out: different async behavior
Hint: use ndb
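The layered caching described on this slide can be pictured with a toy model. This is not NDB's implementation or API: `LayeredStore` is a hypothetical class in which plain dictionaries stand in for the in-context cache, Memcache, and the datastore, and the write path simply follows the slide's "write-through" description.

```python
class LayeredStore:
    """Toy model of a read path: in-context cache, then memcache, then datastore."""

    def __init__(self):
        self.context_cache = {}   # lives for one request, in process memory
        self.memcache = {}        # stand-in for the shared cache
        self.datastore = {}       # stand-in for persistent storage
        self.datastore_reads = 0

    def new_request(self):
        # The in-context cache does not survive across requests.
        self.context_cache = {}

    def put(self, key, value):
        # Write-through: persist the value and update both cache layers.
        self.datastore[key] = value
        self.memcache[key] = value
        self.context_cache[key] = value

    def get(self, key):
        if key in self.context_cache:
            return self.context_cache[key]
        if key in self.memcache:
            value = self.memcache[key]
        else:
            self.datastore_reads += 1
            value = self.datastore[key]
            self.memcache[key] = value
        self.context_cache[key] = value
        return value


store = LayeredStore()
store.datastore["k"] = "v"       # entity that already exists in storage

first = store.get("k")           # misses both caches, reads the datastore
second = store.get("k")          # served from the in-context cache
store.new_request()
third = store.get("k")           # served from memcache, no datastore read
reads_after_gets = store.datastore_reads

store.put("k2", "v2")
store.new_request()
fourth = store.get("k2")         # written through, so still no datastore read
```

In this model only the very first get reaches the datastore; every later read is absorbed by one of the two cache layers.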