1. Architecture:
Surviving the High Load
Friday, May 6, 2011
2. Who we are
Alexander Chinaryov
Lead Platform Developer
Since 2007
Alexander Hristoforov
Lead Platform Developer
Since 2009
Oleg Anastasyev
Lead Platform Developer
Since 2007
6. Load: Balance
• LVS
• One-cluster
– Weighted RR
– Pluggable Failure detectors
– Integrated with one-remote-service
– Locality groups
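The weighted round robin with pluggable failure detectors listed above can be sketched roughly as follows. This is a minimal illustration, not the One-cluster implementation: the class name `WeightedRoundRobin`, the weight map, and the `isAlive` predicate standing in for a failure detector are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: weighted round robin that skips servers
// the plugged-in failure detector reports as down.
final class WeightedRoundRobin {
    private final List<String> slots = new ArrayList<>();
    private final Predicate<String> isAlive; // stands in for a pluggable failure detector
    private int next = 0;

    WeightedRoundRobin(Map<String, Integer> weightByServer, Predicate<String> isAlive) {
        // A server with weight w occupies w slots in the rotation.
        for (Map.Entry<String, Integer> e : weightByServer.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                slots.add(e.getKey());
            }
        }
        this.isAlive = isAlive;
    }

    // Returns the next live server, trying each slot at most once.
    synchronized String pick() {
        for (int tried = 0; tried < slots.size(); tried++) {
            String candidate = slots.get(next);
            next = (next + 1) % slots.size();
            if (isAlive.test(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live servers");
    }
}
```

Expanding weights into slots keeps `pick()` trivial; for large weights a smoother interleaving scheme would spread load more evenly.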
7. Arch: Presentation
• Apache Tomcat 6
• RDK framework:
– GUI components
– Independent portlets
– AJAX updates → no full page reloads
– No javascript required
• Google Web Toolkit for Dynamics
– Toolbar, Photo pins, gifts
• Flash (Apps, players, ads)
8. Arch: Business Logic
• Odnoklassniki-ejb
– JBoss 4.2
– JTA, Stateless, Entity beans (BMP)
– Business Op handling & orchestration
– Event/handler pattern
– Component logic
– Data partitioning
– Spring (DI)
9. Arch: Business Services
• IM, discussions, feeds
– JBoss Remoting 2.2
– One remote service
– 100k+ req/sec on recent 8 core CPU
/**
 * Ex. of Remote server
 */
public interface Server extends RemoteService
{
    @RemoteMethod
    IListChunk<Friend> getFreshMyFriends(@PartitionSource long userId, IChunkProperties cp);

    @RemoteMethod(invokeAll=true,split=true,reduceStrategy=ListReduceStrategy.class)
    List<?> mapReduceMethod(@PartitionSource long userId, ... );

    @RemoteMethod(invokeAll=true,asyncMaxDelay=1000L,asyncMaxBatch=100)
    void asyncNotify(@PartitionSource long userId, ... );
}
10. Arch: Caches
• one-graph
– Social graph storage
– 30 GB, 17K ops/server at 7% CPU …
• Odnoklassniki-cache
– users, groups, photos, sessions...
– Smart
– Off heap (Unsafe) → no FGC
• Near cache
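The off-heap idea can be illustrated with a small sketch. The slide names `Unsafe`; the example below swaps in the standard `ByteBuffer.allocateDirect` instead, which also keeps the data outside the garbage-collected heap. The class `OffHeapLongArray` is hypothetical and only shows the principle: the collector sees one small buffer object rather than millions of cached entries.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a fixed-size array of longs stored off-heap.
// Uses a direct ByteBuffer as a stand-in for the Unsafe-based storage.
final class OffHeapLongArray {
    private final ByteBuffer buf;
    private final int capacity;

    OffHeapLongArray(int capacity) {
        this.capacity = capacity;
        // Direct buffers are allocated outside the Java heap and zero-filled.
        this.buf = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES));
    }

    void set(int index, long value) {
        buf.putLong(offset(index), value);
    }

    long get(int index) {
        return buf.getLong(offset(index));
    }

    boolean isOffHeap() {
        return buf.isDirect();
    }

    // Translates an element index into a byte offset, with bounds checking.
    private int offset(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * Long.BYTES;
    }
}
```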
11. Arch: Persistence
• MS SQL 2005
– High Consistency
– Flexible queries
• NoSQL: one-db
– Berkeley DB 4.5 C edition +
– JBoss remoting based server +
– Simple querying =
– noSQL storage server
• … and others are in research
12. Concept: DB Partitioning
• DB scaling is hard & expensive
• Vertical
• Horizontal
• ID:
– long ID = (uid << 8) + domain
– Domain = 0..255
– Domain → servers map
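A minimal sketch of this ID scheme, assuming the low 8 bits of every ID carry the domain and a lookup table maps domains to servers. The names `PartitionedId` and `DomainRouter` and the server labels are illustrative only. Note that in Java `+` binds tighter than `<<`, so the shift needs explicit parentheses.

```java
// Hypothetical sketch of the "uid plus 8-bit domain" ID layout.
final class PartitionedId {
    static final int DOMAIN_BITS = 8;
    static final long DOMAIN_MASK = (1L << DOMAIN_BITS) - 1; // 0xFF

    static long compose(long uid, int domain) {
        if (domain < 0 || domain > DOMAIN_MASK) {
            throw new IllegalArgumentException("domain must be 0..255");
        }
        return (uid << DOMAIN_BITS) | domain;
    }

    static int domainOf(long id) {
        return (int) (id & DOMAIN_MASK);
    }

    static long uidOf(long id) {
        return id >>> DOMAIN_BITS;
    }
}

// Hypothetical domain -> server map: moving a domain to another
// server only changes this table, never the IDs themselves.
final class DomainRouter {
    private final String[] serverByDomain = new String[256];

    void assign(int fromDomain, int toDomain, String server) {
        for (int d = fromDomain; d <= toDomain; d++) {
            serverByDomain[d] = server;
        }
    }

    String serverFor(long id) {
        return serverByDomain[PartitionedId.domainOf(id)];
    }
}
```

Because the domain travels inside the ID, any tier can route a request without consulting a directory service.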
13. Perf : SQL DB
• XA → local TA only
• Dirty reads
• DB JOIN → app server memory
• FK, SP, Triggers
• DELETE :
– No delete/insert workflow → update
– Async batch process, retry
• Indexes, clustered indexes
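Moving a JOIN from the database into application server memory might look like the sketch below: each side is fetched separately (possibly from different partitions) and matched in a hash map. The class `AppSideJoin` and the photo/user example are assumptions for illustration, not the talk's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an inner join done in application memory
// instead of in the SQL database.
final class AppSideJoin {
    // ownerByPhoto: photoId -> ownerId, loaded from one partition.
    // nameByUser:   userId  -> name,    loaded from another partition.
    static Map<Long, String> ownerNameByPhoto(Map<Long, Long> ownerByPhoto,
                                              Map<Long, String> nameByUser) {
        Map<Long, String> result = new LinkedHashMap<>();
        for (Map.Entry<Long, Long> e : ownerByPhoto.entrySet()) {
            String name = nameByUser.get(e.getValue());
            if (name != null) {
                // Like an inner join, rows without a match are dropped.
                result.put(e.getKey(), name);
            }
        }
        return result;
    }
}
```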
14. Perf: general
• Seq Access speed:
– RAM 10x > SSD 1.5x > 1Gbit eth comm 2x > disk
• Random Access speed:
– RAM 20000x (~50ns) > SSD 5-10x > disk (~5ms)
– Net roundtrip ~ 0.5 ms
• So:
– Near data/cache – fastest solution (cache coherence problem)
– Partitioned network cache
– Database access is the slowest thing
• You still have to sacrifice consistency
15. Surviving : GC
• Young GC → high CPU load
– Too much garbage (autoboxing, overlooked log.debug,...)
– FIX: find and fix code → can take weeks
• Old GC → pauses → carousel
– 2-4 GB is the limit for ParallelGC (1-4 sec pauses)
– 8-10 GB is the limit for CMS
• and it still can stop the world!
– FIX: use Unsafe (offheap memory) or partition
• Perm GC → pauses → carousel again
– Too many classes
– FIX: -XX:+CMSClassUnloadingEnabled
16. Surviving: failures
• SQL partition failure
– FIX: fault tolerance: reads return incomplete data, writes fail
• One-db
– Unstable replication → no fix :-(
– Data corruption → separate ids storage
– Random disk access → SSD, tmpfs
17. Surviving: carousel
• Reasons:
– Net problems
– Unusual activity, spammers
– Full GCs
– Cold caches
– Unexpected slowdowns, bugs
– Activity growth
• Fixes:
– Timeout = 3s
– Client-side automatic failure detectors, server cutout
– Gatekeepers
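A client-side failure detector with server cutout could be sketched like this. The threshold, the cutout period, and the class name `FailureDetector` are assumptions; the clock is injected so the behaviour can be checked without waiting. A call that exceeds the timeout would be reported through `onFailure()`.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: after N consecutive failures (e.g. timeouts),
// the client stops sending requests to a server for a cutout period.
final class FailureDetector {
    private final int maxConsecutiveFailures;
    private final long cutoutMillis;
    private final LongSupplier clock; // current time in milliseconds
    private int failures = 0;
    private long cutoutUntil = 0;

    FailureDetector(int maxConsecutiveFailures, long cutoutMillis, LongSupplier clock) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.cutoutMillis = cutoutMillis;
        this.clock = clock;
    }

    // True if the server may receive requests right now.
    synchronized boolean isAvailable() {
        return clock.getAsLong() >= cutoutUntil;
    }

    synchronized void onSuccess() {
        failures = 0;
    }

    synchronized void onFailure() {
        failures++;
        if (failures >= maxConsecutiveFailures) {
            // Cut the server out; it is retried once the period expires.
            cutoutUntil = clock.getAsLong() + cutoutMillis;
            failures = 0;
        }
    }
}
```

Cutting a slow server out quickly keeps client threads from piling up behind it, which is what lets one slow node drag the rest of the cluster down.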
18. Surviving: gatekeepers
• Fine-grained feature switches
• Used for:
– Fighting the carousel
– Smooth new functions launch
– Experiments
• Can:
– Turn on/off specific func, individual 3rd party games
– On per server basis
– On per user domain
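As a rough illustration, a gatekeeper can be a small in-memory switchboard consulted before each feature runs. The sketch below is hypothetical: the class name, the feature names, and the assumption that a user's domain is the low 8 bits of the user ID are not taken from the talk's code. Per-server control follows from each server holding its own instance.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: feature switches that can be flipped
// globally on this server or for a single user domain.
final class Gatekeeper {
    private final Set<String> disabledFeatures = ConcurrentHashMap.newKeySet();
    private final Set<String> disabledPerDomain = ConcurrentHashMap.newKeySet();

    void disable(String feature) {
        disabledFeatures.add(feature);
    }

    void enable(String feature) {
        disabledFeatures.remove(feature);
    }

    void disableForDomain(String feature, int domain) {
        disabledPerDomain.add(key(feature, domain));
    }

    void enableForDomain(String feature, int domain) {
        disabledPerDomain.remove(key(feature, domain));
    }

    boolean isEnabled(String feature, long userId) {
        int domain = (int) (userId & 0xFF); // assumption: domain = low 8 bits of the ID
        return !disabledFeatures.contains(feature)
                && !disabledPerDomain.contains(key(feature, domain));
    }

    private static String key(String feature, int domain) {
        return feature + "@" + domain;
    }
}
```

Launching a feature for one domain first and widening the rollout later uses the same calls as switching it off under load.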
20. Thank you
Questions?
We are hiring
jobs@forticom.com
21. Test yourself ;-)
• PhotoMarks table
PhotoId:long UserId:long Mark:byte timestamp
– 32p x (500M rows, 42 Gb data + 25 Gb index)
– Load (photoId, userId): 14kops, create: 1500kops
– Most load calls are check for row absence
• Rejected a priori
– Add more SQL nodes – too expensive
– Put all marks in cache – 2600 GB of RAM is not cheap either