From Local to Global

How plista got its global infrastructure

Who am I?
Tobias Heintz
Core architecture @plista
thz@plista.com
@tobiasinfinity

Who is plista?
• Data-driven platform for content and ad recommendations
• Part of
• Present in 17 markets worldwide
• Market leader in German-speaking countries
• Works with 4000+ publishers
• spiegel.de, n-tv.de, sport1.de, etc.

A recommendation service
• Not a website!
• Collection of recommendation algorithms
• We have Collaborative Filtering, Semantic (Solr), Simple Most Clicked, etc.
• Latency is an issue
• Response time < 100 ms
• Lots of asynchronous backend processing
• Recommendations are pre-calculated for many combinations out of a huge vector space
• Actual delivery is only from cached results
• Genuinely real-time results for a few special recommenders
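
The cache-only delivery path described on this slide could look roughly like the following. This is a minimal sketch and not plista's code: the key layout, the `precompute` and `recommend` functions, the toy recommender, and the dict standing in for the cache are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical in-memory stand-in for the recommendation cache.
cache = {}


def cache_key(publisher, category, algorithm):
    """Build a cache key from one combination of request attributes."""
    return f"{publisher}:{category}:{algorithm}"


def precompute(publishers, categories, algorithms, compute):
    """Backend side: fill the cache for every combination ahead of time."""
    for combo in product(publishers, categories, algorithms):
        cache[cache_key(*combo)] = compute(*combo)


def recommend(publisher, category, algorithm):
    """Frontend side: a pure cache lookup, no computation per request."""
    return cache.get(cache_key(publisher, category, algorithm), [])


def toy_compute(publisher, category, algorithm):
    # Placeholder for a real recommender; returns fake item IDs.
    return [f"{algorithm}-{category}-{n}" for n in range(3)]


precompute(["spiegel.de"], ["sports", "politics"], ["most_clicked"], toy_compute)
print(recommend("spiegel.de", "sports", "most_clicked"))
print(recommend("spiegel.de", "weather", "most_clicked"))  # cache miss: empty list
```

The point of the split is that the request path does a single lookup, so its response time does not depend on how expensive the recommender is.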

The plista system
• High Performance Bus is pub/sub
• Frontend servers only talk to cache
• Recommendations are pre-calculated and cached
• Planet is our admin system

Challenges - organisational
• Timezones
• It’s always five o’clock somewhere!
• Currencies
• Language
• Laws
• Privacy
• Data locality

Challenges - technical
• Latency
• Data consistency
• Availability
• Partition tolerance
• Cost
• Maintainability

Models for i18n
A
● No i18n at all
● All requests go to EMEA
● Total consistency
● Massive latency
B
● Clone the entire system
● No latency issues
● Double maintenance
● Too many services
C
● Minimum amount of servers in APAC
● All backend processing in EMEA
● Very cost-efficient
● Still massive latency
D ✔
● Some backend processing in APAC
● As few roles duplicated as possible
● Eventual consistency
● Just right

The plista system - i18nized
• Setup still basically the same
• Frontend servers configured exactly alike
• All backend processing still in EMEA
• APAC servers forward data
• Recommendations calculated in EMEA, then cached in APAC
• Each DC has its own DB master
• With replicas in each of the other DCs
• Consistency using SequenceDB

Pillars of i18n architecture
• SequenceDB as centralized source of truth for database IDs
• Ticket server à la Flickr
• Servers in the cloud for quick scaling
• Do it in the cloud: AWS EC2 + ELB
• Fully automated through Puppet
• Need to be able to spin up machines fast when something breaks
• Static DNS routing
• Eventual consistency
• Partition tolerance through statelessness

DB syncing: SequenceDB
• Needed because we have multiple DB masters and IDs need to be unique
• Ticket server system based on Percona (MySQL fork)
• 3 masters in the same rack
• Stored procedure that increments a row in a table
• Distributed globally
• Master lives in EMEA DC
• IDs are buffered locally using Redis
• Clients never talk to the master, only to the Redis buffer
• Cron job fills buffer
• Very failsafe
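
The buffering idea on this slide can be sketched as follows. This is an illustration under assumptions, not plista's implementation: `TicketMaster` is an in-process counter standing in for the stored procedure on the master, `LocalIdBuffer` is a deque standing in for the Redis buffer, and the block size and low-water mark are made-up values.

```python
from collections import deque
from itertools import count


class TicketMaster:
    """Stand-in for the central ticket server (a table row in the talk)."""

    def __init__(self):
        self._next = count(1)

    def reserve(self, size):
        """Hand out `size` consecutive IDs that nobody else will receive."""
        return [next(self._next) for _ in range(size)]


class LocalIdBuffer:
    """Stand-in for the per-datacenter buffer of pre-fetched IDs."""

    def __init__(self, master, low_water=5, block_size=10):
        self._master = master
        self._ids = deque()
        self._low_water = low_water
        self._block_size = block_size

    def refill(self):
        """What a periodic job would do: top up when the buffer runs low."""
        if len(self._ids) < self._low_water:
            self._ids.extend(self._master.reserve(self._block_size))

    def next_id(self):
        """What clients call; this path never contacts the master."""
        if not self._ids:
            raise RuntimeError("ID buffer is empty; the refill job is behind")
        return self._ids.popleft()


master = TicketMaster()
emea = LocalIdBuffer(master)
apac = LocalIdBuffer(master)
emea.refill()  # reserves IDs 1..10
apac.refill()  # reserves IDs 11..20

emea_ids = [emea.next_id() for _ in range(3)]
apac_ids = [apac.next_id() for _ in range(3)]
print(emea_ids, apac_ids)  # [1, 2, 3] [11, 12, 13]
```

In this sketch the IDs are unique across datacenters but not ordered by creation time, and a datacenter can keep creating records from its buffer while the master is briefly unreachable.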

DB syncing: SequenceDB (cont.)
• Based on ideas from Flickr: https://goo.gl/roFVXZ
• Alternative approach at Pinterest: https://goo.gl/w7cgtP
• Use a 64-bit ID and encode information about the datacenter into the ID
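
Encoding the datacenter into a 64-bit ID can be sketched with plain bit shifts. The layout below (7 bits for the datacenter, 56 bits for a per-datacenter sequence, top bit left unused so the value fits a signed 64-bit column) is an assumption chosen for illustration; it is not the layout from the linked article, and `encode_id` and `decode_id` are hypothetical names.

```python
DC_BITS = 7    # assumed: up to 128 datacenters
SEQ_BITS = 56  # assumed: per-datacenter sequence number
SEQ_MASK = (1 << SEQ_BITS) - 1


def encode_id(datacenter, sequence):
    """Pack a datacenter number and a local sequence into one integer."""
    if not 0 <= datacenter < (1 << DC_BITS):
        raise ValueError("datacenter out of range")
    if not 0 <= sequence <= SEQ_MASK:
        raise ValueError("sequence out of range")
    return (datacenter << SEQ_BITS) | sequence


def decode_id(value):
    """Split an ID back into (datacenter, sequence)."""
    return value >> SEQ_BITS, value & SEQ_MASK


emea_id = encode_id(1, 42)
apac_id = encode_id(2, 42)
print(decode_id(emea_id), decode_id(apac_id))  # (1, 42) (2, 42)
```

With such a scheme each datacenter can generate IDs without asking a central server, since two datacenters can never produce the same value even when their local sequences collide.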

Battling latency: CDNs + Caching
• Most important issue for us: low latency!
• Caching layers built into architecture
• Recall architecture diagram
• CDN for all static data is a no-brainer
• static JS, images and video

DNS: Static over Geo
• GeoDNS means that a domain name is resolved to different IPs depending on where the user is physically located
• Seems like logical choice; only one domain name could serve everything
• Has several issues
• No control over which user goes where
• Need to ensure that users of a site always have data regardless of their location
• Provide static DNS entries to publishers
• farm-de.plista.com, farm-au.plista.com, etc.
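
The difference to GeoDNS can be shown with a small lookup. This is a sketch under assumptions: the `PUBLISHER_REGION` table, its entries, and the `farm_host` function are hypothetical; only the `farm-<region>.plista.com` naming pattern comes from the slide.

```python
# Assumed mapping from publisher site to the farm that holds its data.
PUBLISHER_REGION = {
    "spiegel.de": "de",
    "news.example.com.au": "au",  # hypothetical publisher
}


def farm_host(publisher, user_country=None):
    """Return the static hostname configured for a publisher.

    user_country is accepted but deliberately ignored: with static DNS the
    visitor's location has no influence on which farm answers.
    """
    region = PUBLISHER_REGION[publisher]
    return f"farm-{region}.plista.com"


# A reader in Australia and a reader in Germany hit the same farm.
print(farm_host("spiegel.de", user_country="AU"))  # farm-de.plista.com
print(farm_host("spiegel.de", user_country="DE"))  # farm-de.plista.com
```

Because the hostname is fixed per publisher, all visitors of a site reach the datacenter that actually holds that site's data.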

Partition tolerance
• Expect that your systems will break
• Some companies even go as far as to proactively do just that: https://goo.gl/CKzwSN
• Redundancy!
• Set up systems in multiple availability zones
• Stateless applications
• No dependency between applications
• No need for base data in background job engine, since everything is encoded in the task
• Provide fallback
• plista can deliver recommendations even when database goes down
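
Two of the points above, self-contained tasks and fallbacks, can be sketched as follows. All names here (`make_task`, `run_task`, `DatabaseDown`, `recommendations_with_fallback`) and the task format are assumptions for illustration, not plista's actual job engine.

```python
import json


def make_task(kind, **params):
    """Serialize everything the worker needs into the task itself."""
    return json.dumps({"kind": kind, "params": params}, sort_keys=True)


def run_task(raw):
    """Worker side: no lookup of base data, all inputs are in the payload."""
    task = json.loads(raw)
    if task["kind"] == "count_click":
        p = task["params"]
        return f"click on item {p['item_id']} for {p['publisher']}"
    raise ValueError(f"unknown task kind: {task['kind']}")


class DatabaseDown(Exception):
    """Hypothetical error raised when the database is unreachable."""


def recommendations_with_fallback(load_from_db, fallback):
    """Prefer fresh data, but still answer when the database is down."""
    try:
        return load_from_db()
    except DatabaseDown:
        return fallback


def broken_db():
    raise DatabaseDown()


print(run_task(make_task("count_click", publisher="spiegel.de", item_id=7)))
print(recommendations_with_fallback(broken_db, ["cached-1", "cached-2"]))
```

Since a task carries its own inputs, any worker in any datacenter can process it, and a failed dependency degrades the answer instead of preventing one.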

Results
• 2 datacenters
• 17 markets
• 600+ servers
• 15 Mbit/s data rate between EMEA and APAC
• 80+ million impressions per day
