Presentations from Criteo Labs’ Infrastructure team with a guest speaker from Yandex.
• FastTrack: scaling customer integration
• Evolution of data structures in Yandex.Metrica
• Don't take your software for granted
• Evolution of analytics at Criteo
The solution builds on the existing processing of all user events. Each time a user views a product or puts a product in their basket, Criteo receives an event and stores it in order to display a relevant ad later on. For performance reasons, tracker servers call a “lazy refreshing” memcache rather than the real product store, which lives on a Couchbase cluster.
We needed to plug our solution just after Criteo received the event.
Step 1: Audit the tracker events
Check the event for mandatory parameters and correct parameter formats, check whether the event relates to one or several products, and check whether these products actually exist in our system (they can be missing because of an incomplete product feed, or simply because the advertiser passed us a wrong product id)…
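The step-1 audit can be sketched as a validation function that accumulates errors rather than failing fast, so one event can report every problem at once. The field names and the in-memory “catalog” are hypothetical stand-ins for the real tracker schema and product store.

```python
# Illustrative mandatory parameters; the real event schema differs.
MANDATORY_FIELDS = ("advertiser_id", "event_type", "timestamp")

# Stand-in for the real product catalog (Couchbase-backed in production).
KNOWN_PRODUCTS = {"p1", "p2"}

def audit_event(event):
    """Check mandatory parameters and referenced products; collect all errors."""
    errors = []
    for field in MANDATORY_FIELDS:
        if field not in event:
            errors.append(f"missing mandatory parameter: {field}")
    for pid in event.get("product_ids", []):
        if pid not in KNOWN_PRODUCTS:
            errors.append(f"unknown product id: {pid}")
    return {"event": event, "errors": errors, "valid": not errors}
```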
Step 2: Send this audit event to Kafka, the well-known Apache messaging system, where a global-scale mirroring system has been set up, allowing us to aggregate data from all around the world.
Step 3: Consume Kafka from Druid, a column-oriented distributed data store built on a delta architecture, allowing us to run sub-second queries on the huge number of metrics we needed to compute.
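Once the audit events land in Druid, step 3 amounts to issuing queries against the pre-aggregated segments. A hedged sketch of what such a query might look like, using Druid's native JSON query format: a `timeseries` query summing audit errors per minute. The datasource, metric, and interval names are illustrative assumptions.

```python
import json

# Hypothetical Druid native "timeseries" query: errors per minute over one day.
query = {
    "queryType": "timeseries",
    "dataSource": "tracker_audit",          # illustrative datasource name
    "granularity": "minute",
    "intervals": ["2016-11-01T00:00/2016-11-02T00:00"],
    "aggregations": [
        {"type": "longSum", "name": "error_count", "fieldName": "errors"}
    ],
}

# This JSON body would be POSTed to the Druid broker's query endpoint.
payload = json.dumps(query)
```

Because Druid stores data column-oriented and pre-aggregated, a scan like this over a full day of events typically returns in well under a second, which is what makes the dashboards interactive.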
WebScale writes code to ensure the sustainability and maintainability of the Criteo real-time platform.
We are a team of 12, and we spend a good chunk of our time looking at performance problems.
Some of these performance problems come from changes in your traffic pattern,
and there is no bigger stress test than a giant planetary sale; Kevin will talk about the way we prepare for that.
Significant increase of traffic over a few days
Release freeze
Teams rush to release features before the freeze, so the platform actually becomes less stable than usual. It is critical to find and fix the issues before Black Friday.
Monitoring deviant machines across the datacenters
Spotting isolated abnormally behaving servers
Proactively diagnose and fix the issues before they spread to the DC
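One simple way to spot an isolated, abnormally behaving server is to compare each machine's metric against the rest of the fleet and flag statistical outliers. A minimal sketch, assuming a per-server latency metric and a z-score threshold; both the metric and the threshold are illustrative, not Criteo's actual monitoring logic.

```python
import statistics

def deviant_servers(latencies_ms, z_threshold=3.0):
    """Return names of servers whose metric deviates more than
    z_threshold standard deviations from the fleet mean."""
    values = list(latencies_ms.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # perfectly uniform fleet: nothing to flag
    return [
        name for name, value in latencies_ms.items()
        if abs(value - mean) / stdev > z_threshold
    ]
```

Running this periodically per datacenter is enough to surface a single misbehaving machine early, before its behavior (retries, rebalancing, cascading timeouts) spreads to its neighbors.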