2go ScaleConf 2012
2. Introducing 2go
• Instant messenger for Java phones
• Send messages to friends
• Share photos and files
• Connect with other IM networks
• Meet others in chat rooms
• And more...
3. Why it’s popular
• Cheaper than SMS (1 cent/message)
• Fast on slow networks
• Designed for mobile phones
• It’s social!
4. The Beginning
• Founded by Alan Wolff and Ashley Peter
• 6 months of coding and learning, not studying
• Clueless about ‘scaling’
• 1 desktop PC as the server
• Launched in March 2007
10. 2go Today
• Over 15 million users
• Users in over 150 countries
• Mostly in Africa (Nigeria, South Africa, Kenya)
• Fastest-rising Google search term in Nigeria and Kenya in 2010
• 200 million messages per day
• 20 million logins per day
• 45 thousand signups per day
11. Why we’re at ScaleConf
• Raise awareness about scaling
• Not a common problem
• Not covered in formal education
• Learn from others
• Share our lessons with you
21. Traditional Websites Data
• Users interact with their own data
• Users expect RAM speeds
• Easy to keep ‘hot’ data in RAM
• Normally 1-2% of users are concurrent
• As the website grows, add more servers
22. Traditional Websites Data
[Diagram: each user's data lives on a single server]
• John Server: John’s Data
• Sam Server: Sam’s Data
• Mary Server: Mary’s Data
• Elina Server: Elina’s Data
23. Social Networks Data
[Diagram: John Server needs data from many users]
• John Server: John’s Data, Sara’s Data, James’ Data, Julie’s Data, ..., Chris’ Data
24. Social Networks Data
[Diagram: friends’ data is spread across many servers]
• John Server: John’s Data
• Sara Server: Sara’s Data
• James Server: James’ Data
• Anni Server: Anni’s Data
• ... Chris’ Data
25. Social Networks Data
• Users have many (100+) friends
• Users interact with friends’ data
• Data access is geometric
26. Quick Example
• 600 users log in per second
• Each user has 100 friends: 600 × 100 = 60k friends per second
• Each friend needs 3 objects (name, status, image): 60k × 3
• = 180k objects per second
• Not possible on 1 or 2 servers
• Need 10+ DB servers!
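The fan-out arithmetic above can be written as a small calculation. This is a minimal sketch in Java. The per-server capacity passed to `serversNeeded` is a hypothetical input, because the slides do not say how many objects one DB server can serve per second.

```java
// Back-of-envelope model of the social fan-out on a login.
// serversNeeded() takes a HYPOTHETICAL per-server capacity; the slides give none.
final class FanOut {
    /** Objects read per second when users log in and load their friends' data. */
    static long objectsPerSecond(long loginsPerSecond, long friendsPerUser, long objectsPerFriend) {
        long friendLookups = loginsPerSecond * friendsPerUser; // 600 * 100 = 60,000
        return friendLookups * objectsPerFriend;               // 60,000 * 3 = 180,000
    }

    /** Minimum number of servers (rounded up) for an assumed per-server capacity. */
    static long serversNeeded(long objectsPerSecond, long capacityPerServer) {
        return (objectsPerSecond + capacityPerServer - 1) / capacityPerServer;
    }

    public static void main(String[] args) {
        long load = objectsPerSecond(600, 100, 3);
        System.out.println(load + " objects/s"); // 180000 objects/s
        // Assumption for illustration only: one DB server serves ~15,000 objects/s.
        System.out.println(serversNeeded(load, 15_000) + " servers"); // 12 servers
    }
}
```

The load is linear in the friend count, so the number of friends per user, not just the login rate, decides how many servers are needed.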
27. Social Networks Data
• Users have many (100+) friends
• Users interact with friends’ data
• Data access is geometric
• Accessing 180k objects in 1 second means hitting many DB servers
• Difficult to keep ‘hot’ data in RAM
28. How we store & retrieve data
• MySQL: persistent, disk-based
• Memcached: volatile, RAM-based
29. Why do we use MySQL?
• Reliable
• Never had data corruption
• Simple
• Free
• Good, helpful community
• Widely used, well understood
30. How do we scale MySQL?
• Vertical Scaling:
• Disks, RAID
• Parallelism:
• Multiple connections to MySQL
• MyISAM (default before MySQL 5.5) has table locking, use InnoDB for row-level locking
• Replication (scales reads)
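With replication, the application has to send every write to the master and can spread reads over the slaves. A minimal sketch of that routing in Java, assuming simple round-robin over the slaves; the host names are hypothetical, and a real router would hand out pooled JDBC connections rather than strings. Reads from a slave may lag slightly behind the master, since MySQL replication is asynchronous by default.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Read/write splitting for a master/slave MySQL setup.
// Host names are hypothetical; a real router would return pooled connections.
final class ReplicationRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReplicationRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    /** Writes always go to the master, which replicates them to every slave. */
    String hostForWrite() {
        return master;
    }

    /** Reads rotate over the slaves; with no slaves, the master serves reads too. */
    String hostForRead() {
        if (slaves.isEmpty()) {
            return master;
        }
        int i = Math.floorMod(next.getAndIncrement(), slaves.size()); // safe after int overflow
        return slaves.get(i);
    }

    public static void main(String[] args) {
        ReplicationRouter router =
                new ReplicationRouter("db-master", List.of("db-slave1", "db-slave2"));
        System.out.println(router.hostForWrite()); // db-master
        System.out.println(router.hostForRead());  // db-slave1
        System.out.println(router.hostForRead());  // db-slave2
        System.out.println(router.hostForRead());  // db-slave1
    }
}
```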
32. MySQL Replication
[Diagram: one master vs. a master with one slave]
• DB Master alone: 500 reads/s, 200 writes/s
• DB Master + DB Slave1: 250 reads/s and 200 writes/s on each server
33. MySQL Replication
[Diagram: a master with three slaves]
• DB Master, DB Slave1, DB Slave2, DB Slave3: 2 reads/s and 698 writes/s on each server
• Write saturation
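The two replication slides follow from one observation: every server applies every write, so only the capacity left over after writes is available for reads. A minimal model in Java, assuming each server handles a fixed 700 operations per second; that figure is inferred from the slides (500 + 200 = 2 + 698 = 700), not stated in them.

```java
// Toy model of MySQL master/slave replication capacity.
// ASSUMPTION: each server handles a fixed number of operations per second
// (700 here, inferred from the slide figures).
final class ReplicationCapacity {
    /** Reads/s one server can still serve after applying all replicated writes. */
    static int spareReadsPerServer(int opsPerServer, int writesPerSecond) {
        return Math.max(0, opsPerServer - writesPerSecond);
    }

    /** Total read capacity of one master plus the given number of slaves. */
    static int totalReadCapacity(int opsPerServer, int writesPerSecond, int slaves) {
        int servers = 1 + slaves; // writes are not divided: every server applies all of them
        return servers * spareReadsPerServer(opsPerServer, writesPerSecond);
    }

    public static void main(String[] args) {
        // 200 writes/s: each extra server adds 500 reads/s of headroom.
        System.out.println(totalReadCapacity(700, 200, 0)); // 500
        System.out.println(totalReadCapacity(700, 200, 1)); // 1000
        // 698 writes/s: write saturation, each server has only 2 reads/s left.
        System.out.println(totalReadCapacity(700, 698, 3)); // 8
    }
}
```

Under this model, a fourth slave in the saturated case would add only 2 more reads/s, which is why writes have to be scaled by splitting the data rather than by adding replicas.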
34. How do we scale MySQL?
• Parallelism:
• Multiple connections to MySQL
• MyISAM (default before MySQL 5.5) has table locking, use InnoDB for row-level locking
• Replication (scales reads)
• Horizontal Scaling:
• Split data across multiple masters, a.k.a. ‘sharding’ (scales writes)
• Bye bye relational DB: joins move to the application level
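Once users live on different masters, a query such as “names of all my friends” can no longer be one SQL JOIN. The application has to work out which shard holds each friend, query those shards, and merge the results itself. A minimal sketch in Java, assuming shards are chosen by `userId % shardCount` and using in-memory maps as hypothetical stand-ins for the MySQL masters; the slides do not say how 2go partitions its data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sharded user store: each map stands in for one MySQL master.
final class ShardedUsers {
    private final List<Map<Long, String>> shards = new ArrayList<>();

    ShardedUsers(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** ASSUMPTION: simple modulo partitioning on userId. */
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shards.size());
    }

    void putName(long userId, String name) {
        shards.get(shardFor(userId)).put(userId, name);
    }

    /** Application-level "join": group ids by shard, query each shard once, merge. */
    Map<Long, String> namesOf(List<Long> friendIds) {
        Map<Integer, List<Long>> byShard = new HashMap<>();
        for (long id : friendIds) {
            byShard.computeIfAbsent(shardFor(id), k -> new ArrayList<>()).add(id);
        }
        Map<Long, String> result = new HashMap<>();
        for (Map.Entry<Integer, List<Long>> e : byShard.entrySet()) {
            Map<Long, String> shard = shards.get(e.getKey()); // one round trip per shard
            for (long id : e.getValue()) {
                String name = shard.get(id);
                if (name != null) {
                    result.put(id, name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ShardedUsers users = new ShardedUsers(4); // hypothetical: 4 MySQL masters
        users.putName(1L, "John");
        users.putName(2L, "Sara");
        users.putName(7L, "James");
        // John's friends live on shards 2 and 3: two small queries instead of one JOIN.
        System.out.println(users.namesOf(List.of(2L, 7L)));
    }
}
```

Modulo partitioning keeps the sketch short, but changing the shard count moves most users; a lookup table or consistent hashing avoids that at the cost of extra machinery.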
35. MySQL
The biggest issue:
MySQL stores data on disk.
Users expect RAM speeds.
We have a problem...
37. What is Memcached?
• Developed by Brad Fitzpatrick at LiveJournal
• In-memory LRU distributed hash table
• ‘Hot’ data stored in the cache
• Manually managed cache
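Memcached servers do not talk to each other. The “distributed” part comes from the client hashing each key to one server, and each server evicts least recently used items when it runs out of memory. A minimal single-process sketch of both ideas in Java: `LinkedHashMap` in access order provides the LRU eviction, and a plain hash modulo picks the node. Node count and capacity are hypothetical, the nodes are not thread-safe, and real clients often use consistent hashing instead of modulo so that adding a server remaps fewer keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a memcached pool: N independent LRU nodes plus client-side key hashing.
final class LruCachePool {
    /** One cache node: a LinkedHashMap in access order that evicts its eldest entry. */
    static final class LruNode<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruNode(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // drop the least recently used entry when full
        }
    }

    private final List<LruNode<String, String>> nodes = new ArrayList<>();

    LruCachePool(int nodeCount, int capacityPerNode) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new LruNode<>(capacityPerNode));
        }
    }

    /** The client, not the server, decides which node owns a key. */
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void set(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    /** Returns null on a cache miss. */
    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    public static void main(String[] args) {
        LruCachePool pool = new LruCachePool(2, 1000);
        pool.set("user_name_1234", "John");
        System.out.println(pool.get("user_name_1234")); // John
    }
}
```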
38. Why do we use Memcached?
• It’s fast
• Really.
• Alleviates DB load
• Distributed
• Low latency
• Also, it’s fast
39. Issues with Memcached
• Manually managed
• Stale cache is bad
• Manage with caution
• Race conditions
• Serialise data
• Storing strings is inefficient
• Use binary protocol
• New connection overhead
• Use connection pools or UDP
• Multiget (fetch many keys in one request)
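Multiget means asking the cache for many keys in one request instead of paying one round trip per key, which matters when a login needs data for 100 friends. A minimal sketch in Java of a batched read-through: the `Cache` and `Database` interfaces and their method names are hypothetical stand-ins, not a real memcached client API, although client libraries generally offer some form of bulk get.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL stand-in for a memcached client.
interface Cache {
    Map<String, String> getMulti(List<String> keys); // one round trip; missing keys are absent
    void set(String key, String value);
}

// HYPOTHETICAL stand-in for MySQL, e.g. one SELECT ... WHERE id IN (...).
interface Database {
    Map<String, String> loadMulti(List<String> keys);
}

// Batched read-through: one multiget for hits, one batched DB query for all misses.
final class FriendLoader {
    private final Cache cache;
    private final Database db;

    FriendLoader(Cache cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    Map<String, String> load(List<String> keys) {
        Map<String, String> result = new HashMap<>(cache.getMulti(keys)); // cache hits
        List<String> misses = new ArrayList<>();
        for (String key : keys) {
            if (!result.containsKey(key)) {
                misses.add(key);
            }
        }
        if (!misses.isEmpty()) {
            Map<String, String> fromDb = db.loadMulti(misses);
            for (Map.Entry<String, String> e : fromDb.entrySet()) {
                cache.set(e.getKey(), e.getValue()); // warm the cache for next time
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

Loading 100 friends this way costs at most two round trips (one to the cache, one to the DB) rather than up to 200.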
40. Data Overview
String getUserName(int userId) {
    String key = "user_name_" + userId;
    // Return cached result (FAST!)
    String result = getFromMemcache(key);
    if (result != null) return result;
    // Cache miss: get from DB (slow...)
    result = getFromDB("SELECT name FROM Users WHERE userId = ?", userId);
    // Add to cache (fast next time!)
    putInMemcache(key, result);
    return result;
}
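The same cache-aside (look-aside) pattern as a self-contained Java sketch. A `ConcurrentHashMap` stands in for memcached and a `Function` stands in for the MySQL query; both are hypothetical substitutes chosen so the example runs without servers. It also shows one way to limit stale data: delete the key after updating the database. It keeps one of the races the Memcached issues slide warns about: two threads that miss at the same time will both query the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Cache-aside: read the cache first, fall back to the DB on a miss, then fill the cache.
// The map and the loader are stand-ins for memcached and MySQL.
final class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<Integer, String> db; // hypothetical: userId -> name from MySQL

    CacheAside(Function<Integer, String> db) {
        this.db = db;
    }

    String getUserName(int userId) {
        String key = "user_name_" + userId;
        String result = cache.get(key); // fast path
        if (result != null) return result;
        result = db.apply(userId);      // slow path
        if (result != null) cache.put(key, result); // ConcurrentHashMap rejects null values
        return result;
    }

    /** Call after writing a new value to the DB, so the next read reloads fresh data. */
    void invalidate(int userId) {
        cache.remove("user_name_" + userId);
    }

    public static void main(String[] args) {
        AtomicInteger dbHits = new AtomicInteger();
        CacheAside users = new CacheAside(id -> { dbHits.incrementAndGet(); return "John"; });
        users.getUserName(1234);
        users.getUserName(1234);
        System.out.println(dbHits.get()); // 1: the second call was served from the cache
    }
}
```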
42. Application Layer
• We use Java
• Works well
• Learn to tune the JVM
• Different backend services
• Some services have multiple instances
• Services communicate with a messaging protocol
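The slides do not describe 2go's messaging protocol. As an illustration only, a common way for services to exchange messages over a TCP stream is length-prefixed framing: write the payload length, then the payload bytes, so the reader knows where each message ends. A minimal Java sketch using `DataOutputStream` and `DataInputStream`; the frame layout and the sample messages are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout: [4-byte big-endian length][UTF-8 payload].
final class Frames {
    static void write(DataOutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length); // length prefix
        out.write(payload);
    }

    static String read(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until the whole frame has arrived
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        write(out, "login:1234");
        write(out, "msg:hello");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(read(in)); // login:1234
        System.out.println(read(in)); // msg:hello
    }
}
```

In-memory streams keep the sketch self-contained; with a real service the same methods would wrap a socket's input and output streams.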
43. How do we scale applications?
• Vertical Scaling:
• Multicore CPUs, faster cores, more RAM
• Parallelism:
• Multithreaded applications
• Connection pools
• Horizontal Scaling:
• Different services, split by functionality
• Some services have multiple instances
• Load balancing across instances via LVS (Linux Virtual Server)
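A connection pool keeps a fixed set of open connections and lends them to threads, avoiding the cost of opening a new connection per request (the same overhead the Memcached issues slide mentions). A minimal generic sketch in Java built on `ArrayBlockingQueue`. The `Supplier` that creates connections is a hypothetical placeholder, and production code would more likely use an established pooling library than this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Fixed-size pool: connections are created up front and reused by many threads.
final class Pool<C> {
    private final BlockingQueue<C> idle;

    Pool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // open every connection once, at startup
        }
    }

    /** Borrow a connection, waiting up to timeoutMs; returns null if none became free. */
    C borrow(long timeoutMs) throws InterruptedException {
        return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Return a borrowed connection so another thread can use it. */
    void release(C connection) {
        idle.offer(connection);
    }

    int available() {
        return idle.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int[] opened = {0};
        Pool<String> pool = new Pool<>(2, () -> "conn-" + (++opened[0])); // hypothetical connections
        String c = pool.borrow(100);
        try {
            System.out.println("using " + c); // using conn-1
        } finally {
            pool.release(c); // always give the connection back
        }
    }
}
```

Borrowing with a timeout lets a busy service fail fast instead of queueing requests forever when every connection is in use.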
45. OS & network layers
• We use Linux
• Follow kernel developments
• Apply relevant patches
• Puppet (configuration management)
• Tune Linux and shell to handle many connections
• C10K problem
• Experiment with virtualization