Presentation "From Local to Global" by Tobias Heintz at the AWS E-Business Web Day for windows applications. All videos and presentations can be found here: http://amzn.to/2ds3aMX
Who is plista?
• Data-driven platform for content and ad recommendations
• Part of
• Present in 17 markets worldwide
• Market leader in German-speaking countries
• Works with 4000+ publishers
  • spiegel.de, n-tv.de, sport1.de, etc.
A recommendation service
• Not a website!
• Collection of recommendation algorithms
  • We have Collaborative Filtering, Semantic (Solr), Simple Most Clicked, etc.
• Latency is an issue
  • Response time < 100 ms
• Lots of asynchronous backend processing
  • Recommendations are pre-calculated for many combinations drawn from a huge vector space
  • Delivery is served only from cached results
  • Truly real-time results for a few special recommenders
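A minimal sketch of the pre-calculate-then-serve pattern on this slide. Everything here is hypothetical: a plain dict stands in for the cache, the context key is assumed to be (publisher, category), and a simple most-clicked count stands in for the real algorithms. The point is that `deliver` only does a lookup, which is what keeps response time low.

```python
# Sketch only: hypothetical data and names, not plista's code.
from collections import Counter
from itertools import product

# Hypothetical click log entries: (publisher, category, item_id)
CLICKS = [
    ("spiegel.de", "sport", "a1"),
    ("spiegel.de", "sport", "a2"),
    ("spiegel.de", "sport", "a1"),
    ("n-tv.de", "politics", "b7"),
]

# A plain dict stands in for the real cache layer.
cache: dict[tuple[str, str], list[str]] = {}


def precalculate(publishers: list[str], categories: list[str], k: int = 2) -> None:
    """Asynchronous backend job: fill the cache for every context combination."""
    for pub, cat in product(publishers, categories):
        counts = Counter(item for p, c, item in CLICKS if (p, c) == (pub, cat))
        cache[(pub, cat)] = [item for item, _ in counts.most_common(k)]


def deliver(publisher: str, category: str) -> list[str]:
    """Request path: one cache lookup, no computation."""
    return cache.get((publisher, category), [])


precalculate(["spiegel.de", "n-tv.de"], ["sport", "politics"])
print(deliver("spiegel.de", "sport"))  # ['a1', 'a2']
print(deliver("sport1.de", "sport"))   # [] (nothing pre-calculated for this context)
```

The expensive work scales with the number of combinations, but it happens off the request path; a context that was never pre-calculated simply yields an empty result instead of triggering a slow computation.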
The plista system
• High Performance Bus is pub/sub
• Frontend servers only talk to cache
• Recommendations are pre-calculated and cached
• Planet is our admin system
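The flow on this slide (frontends publish events, backend consumers compute, frontends read only the cache) might look like this as an in-process sketch. The `Bus` class is a hypothetical stand-in for the High Performance Bus, not its API; topic names and handlers are likewise illustrative.

```python
# Sketch only: a hypothetical in-process stand-in for the High Performance Bus.
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Bus:
    """Minimal publish/subscribe bus: publishers never know who consumes."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


bus = Bus()
cache: dict[str, list[str]] = {}
click_counts: defaultdict[str, int] = defaultdict(int)


# Frontend: publishes events and reads the cache, nothing else.
def frontend_track_click(item_id: str) -> None:
    bus.publish("click", {"item": item_id})


def frontend_recommend() -> list[str]:
    return cache.get("most_clicked", [])


# Backend: consumes events, recomputes, publishes results.
def backend_on_click(msg: dict) -> None:
    click_counts[msg["item"]] += 1
    ranking = sorted(click_counts, key=click_counts.get, reverse=True)
    bus.publish("recommendations", {"key": "most_clicked", "items": ranking[:3]})


# Cache writer: the only component that writes to the cache.
def cache_writer(msg: dict) -> None:
    cache[msg["key"]] = msg["items"]


bus.subscribe("click", backend_on_click)
bus.subscribe("recommendations", cache_writer)

for item in ["a1", "a2", "a2"]:
    frontend_track_click(item)
print(frontend_recommend())  # ['a2', 'a1']
```

A real bus would deliver messages asynchronously across machines; the synchronous loop here only illustrates the decoupling, where the frontend never calls the backend directly.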
Challenges - organisational
• Timezones
  • It’s always five o’clock somewhere!
• Currencies
• Language
• Laws
• Privacy
• Data locality
Models for i18n
A
● No i18n at all
● All requests go to EMEA
● Total consistency
● Massive latency
B
● Clone the entire system
● No latency issues
● Double maintenance
● Too many services
C
● Minimum number of servers in APAC
● All backend processing in EMEA
● Very cost-efficient
● Still massive latency
D ✔
● Some backend processing in APAC
● As few roles duplicated as possible
● Eventual consistency
● Just right
The plista system - i18nized
• Setup still basically the same
  • Frontend servers configured exactly alike
• All backend processing still in EMEA
  • APAC servers forward data
  • Recommendations calculated in EMEA, then cached in APAC
• Each DC has its own DB master
  • With replicas in every other DC
  • Consistency using SequenceDB
Pillars of i18n architecture
• SequenceDB as centralized source of truth for database IDs
  • Ticket server à la Flickr
• Servers in the cloud for quick scaling
  • Do it in the cloud: AWS EC2 + ELB
  • Fully automated through Puppet
  • Need to be able to spin up machines fast when something breaks
• Static DNS routing
• Eventual consistency
  • Partition tolerance through statelessness
DB syncing: SequenceDB
• Needed because we have multiple DB masters and IDs need to be unique
• Ticket server system based on Percona (MySQL fork)
  • 3 masters in the same rack
  • Stored procedure that increments a row in a table
• Distributed globally
  • Master lives in EMEA DC
  • IDs are buffered locally using Redis
  • Clients never talk to the master, only to the Redis buffer
  • Cron job fills the buffer
  • Very failsafe
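A sketch of the SequenceDB idea under stated assumptions: `TicketServer` is a hypothetical stand-in for the Percona stored procedure that increments a counter row, and a `deque` stands in for the Redis list that buffers IDs in each DC. `refill` is what the cron job would run; clients only ever call `next_id`.

```python
# Sketch only: TicketServer stands in for the Percona stored procedure and
# a deque stands in for the Redis list in each DC (both hypothetical).
import threading
from collections import deque


class TicketServer:
    """Central counter (in EMEA): each call reserves the next block of IDs."""

    def __init__(self) -> None:
        self._next = 1
        self._lock = threading.Lock()

    def allocate(self, n: int) -> range:
        with self._lock:  # atomic increment, as a single DB row update would be
            start = self._next
            self._next += n
            return range(start, start + n)


class LocalIdBuffer:
    """Per-DC buffer: clients pop IDs here and never talk to the ticket server."""

    def __init__(self, server: TicketServer, low_water: int = 5, batch: int = 10) -> None:
        self.server = server
        self.low_water = low_water
        self.batch = batch
        self.ids: deque[int] = deque()

    def refill(self) -> None:
        """What the cron job does: top up the buffer when it runs low."""
        if len(self.ids) < self.low_water:
            self.ids.extend(self.server.allocate(self.batch))

    def next_id(self) -> int:
        return self.ids.popleft()  # IndexError if the buffer ran dry


server = TicketServer()
emea, apac = LocalIdBuffer(server), LocalIdBuffer(server)
emea.refill()
apac.refill()
emea_ids = [emea.next_id() for _ in range(3)]
apac_ids = [apac.next_id() for _ in range(3)]
print(emea_ids, apac_ids)  # [1, 2, 3] [11, 12, 13]
```

Because each DC holds a block of pre-allocated IDs, an outage of the central master only stops ID allocation once a buffer runs dry, which is one plausible reading of "very failsafe". The batch size and low-water mark are illustrative values.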
DB syncing: SequenceDB (cont.)
• Based on ideas from Flickr: https://goo.gl/roFVXZ
• Alternative approach at Pinterest: https://goo.gl/w7cgtP
  • Use 64-bit IDs and encode information about the datacenter into each ID
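The alternative approach avoids a central counter by packing extra information into the ID itself. A sketch with a made-up bit layout (8 bits of datacenter code, 55 bits of per-DC sequence, top bit kept 0 so the value fits a signed 64-bit column); this is not Pinterest's actual layout, and the DC codes are hypothetical.

```python
# Sketch only: a made-up 64-bit layout, not Pinterest's or plista's actual one.
DC_BITS = 8     # up to 256 datacenters
SEQ_BITS = 55   # per-DC sequence; 1 + 8 + 55 = 64, top bit stays 0
DATACENTERS = {"emea": 1, "apac": 2}  # hypothetical DC codes


def make_id(dc: str, seq: int) -> int:
    """Pack a datacenter code and a local sequence number into one integer."""
    code = DATACENTERS[dc]
    if not 0 <= code < (1 << DC_BITS):
        raise ValueError("datacenter code out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("sequence number out of range")
    return (code << SEQ_BITS) | seq


def split_id(id64: int) -> tuple[str, int]:
    """Recover the datacenter name and local sequence from an ID."""
    code = id64 >> SEQ_BITS
    seq = id64 & ((1 << SEQ_BITS) - 1)
    name = next(n for n, c in DATACENTERS.items() if c == code)
    return name, seq


# The same local sequence number yields different IDs in different DCs.
print(make_id("emea", 42), make_id("apac", 42))
print(split_id(make_id("apac", 42)))  # ('apac', 42)
```

The trade-off versus a ticket server: each DC can mint IDs with no cross-DC round trip, at the cost of fixing the bit budget up front.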
Battling latency: CDNs + Caching
• Most important issue for us: low latency!
• Caching layers built into the architecture
  • Recall the architecture diagram
• CDN for all static data is a no-brainer
  • Static JS, images and video
DNS: Static over Geo
• GeoDNS means that a domain name is resolved to different IPs depending on where the user is physically located
• Seems like the logical choice: only one domain name could serve everything
• But it has several issues
  • No control over which user goes where
  • Need to ensure that users of a site always have data regardless of their location
• Provide static DNS entries to publishers
  • farm-de.plista.com, farm-au.plista.com, etc.
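To make the GeoDNS problem concrete, a small simulation under a loud assumption: each publisher's recommendations are only available on that publisher's home farm. The publisher-to-farm table, the `publisher-au.example` name, and the GeoDNS rule are all hypothetical.

```python
# Sketch only: hypothetical table, names and GeoDNS rule.
# Assumption: a publisher's recommendations are served from its home farm.
PUBLISHER_FARM = {"spiegel.de": "de", "publisher-au.example": "au"}


def static_host(publisher: str) -> str:
    """Static DNS: the publisher embeds one fixed host name for its farm."""
    return f"farm-{PUBLISHER_FARM[publisher]}.plista.com"


def geodns_farm(user_country: str) -> str:
    """GeoDNS (simplified): the farm depends on where the reader is."""
    return "au" if user_country in {"AU", "NZ"} else "de"


def has_data(publisher: str, farm: str) -> bool:
    return PUBLISHER_FARM[publisher] == farm


# A reader of spiegel.de who happens to be in Australia:
print(has_data("spiegel.de", geodns_farm("AU")))  # False: GeoDNS picks the wrong farm
print(static_host("spiegel.de"))                  # farm-de.plista.com, always
```

With static entries, the routing decision follows the publisher rather than the reader, so it is made once, explicitly, and stays under plista's control.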
Partition tolerance
• Expect that your systems will break
  • Some companies even go as far as breaking them on purpose: https://goo.gl/CKzwSN
• Redundancy!
  • Set up systems in multiple availability zones
• Stateless applications
  • No dependency between applications
  • No need for base data in the background job engine, since everything is encoded in the task
• Provide fallbacks
  • plista can deliver recommendations even when the database goes down
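Two of the points above, self-contained tasks and a fallback, sketched together. The JSON task format, the `FALLBACK` list, and the use of `ConnectionError` to signal a database outage are assumptions for illustration only.

```python
# Sketch only: task format, FALLBACK contents and error type are assumptions.
import json
from typing import Callable

FALLBACK = ["top-1", "top-2"]  # hypothetical last-known most-clicked items, kept in memory


def make_task(publisher: str, item_id: str, text: str) -> str:
    """Encode everything the job needs, so the worker loads no base data."""
    return json.dumps({"type": "index_item", "publisher": publisher,
                       "item_id": item_id, "text": text})


def run_task(raw: str) -> str:
    """Stateless worker: its output depends only on the task payload."""
    task = json.loads(raw)
    words = len(task["text"].split())
    return f"{task['publisher']}:{task['item_id']} indexed ({words} words)"


def recommend(fetch_from_db: Callable[[], list[str]]) -> list[str]:
    """Serve from the database when possible, otherwise fall back."""
    try:
        return fetch_from_db()
    except ConnectionError:
        return FALLBACK


def db_down() -> list[str]:
    raise ConnectionError("database unreachable")


print(run_task(make_task("spiegel.de", "a1", "Hello world")))  # spiegel.de:a1 indexed (2 words)
print(recommend(lambda: ["a1", "a2"]))  # ['a1', 'a2']
print(recommend(db_down))               # ['top-1', 'top-2']
```

Since `run_task` needs nothing but its payload, any worker in any availability zone can pick it up; the fallback trades freshness for availability while the database is unreachable.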
Results
• 2 datacenters
• 17 markets
• 600+ servers
• 15 Mbit/s data rate between EMEA and APAC
• 80+ million impressions per day
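A back-of-the-envelope reading of the last two figures, assuming load spread evenly over 24 hours (real traffic peaks well above the average), the inter-DC rate sustained around the clock, and decimal units. Since the slide says "80+", the first result is a lower bound on the average:

```latex
\[
\frac{80 \times 10^{6}\ \text{impressions}}{86\,400\ \text{s}} \approx 926\ \text{impressions/s}
\]
\[
15\ \text{Mbit/s} \times 86\,400\ \text{s} = 1\,296\,000\ \text{Mbit} = 162\,000\ \text{MB} \approx 162\ \text{GB per day}
\]
```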