2. About me
● Originally from Russia, live in Berlin
● Background: Databases
● Started as Software Engineer
● Master's at TU Berlin
● Data Science since 2013
● Book: Mastering Java for Data Science
● Kaggler in the past
● Now: Data Scientist at OLX (“ML Engineer”)
4. OLX
● Tech hubs:
○ Poznan, Lisbon, Berlin
○ Buenos Aires, Delhi
● 30+ countries
● 200M MAU
● 10M sellers & 30M buyers
● 30M new listings per month
● Live listings:
○ OLX.PL (18M listings)
○ OLX.UA (12.5M listings)
○ OLX.IN (3M listings)
5. Apollo at OLX
Image uploads per day
● eu-west-1: up to 6.5M
○ PL: 3M
○ UA: 1M
● ap-southeast-1: 2.2M
○ IN: 0.7M
● eu-central-1: 1.6M
● us-west-1: 55K
● Total: ~10M per day
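As a rough sanity check, the regional figures can be summed and converted into an average request rate. This is only a back-of-the-envelope sketch: the per-region numbers come from the slide, but the even spread over 24 hours is an assumption made here and will understate real peak load.

```python
# Peak image uploads per day by AWS region, as listed on the slide.
UPLOADS_PER_DAY = {
    "eu-west-1": 6_500_000,
    "ap-southeast-1": 2_200_000,
    "eu-central-1": 1_600_000,
    "us-west-1": 55_000,
}

SECONDS_PER_DAY = 24 * 60 * 60


def total_uploads(per_region):
    """Sum the daily uploads across all regions."""
    return sum(per_region.values())


def average_rate_per_second(per_region):
    """Average uploads per second, assuming load is spread evenly over
    the day (an assumption; real traffic is bursty)."""
    return total_uploads(per_region) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"total per day: {total_uploads(UPLOADS_PER_DAY):,}")
    print(f"average per second: {average_rate_per_second(UPLOADS_PER_DAY):.1f}")
```

The sum is about 10.4M uploads per day, consistent with the "~10M" total, which works out to roughly 120 images per second on average.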
6. Apollo X - the metadata service
● Data about images:
○ Category: what is on the image
○ Image quality: is the image good
● How to extract metadata?
● With machine learning!
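A metadata record for a single image could be sketched roughly as follows. The `ImageMetadata` class, its field names, and the 0.5 quality threshold are hypothetical illustrations, not the actual Apollo X schema; the slide only says that a category and an image-quality signal are extracted.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical cut-off; the slides do not say how "good" is decided.
QUALITY_THRESHOLD = 0.5


@dataclass(frozen=True)
class ImageMetadata:
    image_id: str
    category: str         # what is on the image, e.g. "car"
    quality_score: float  # assumed model output in [0, 1]

    @property
    def is_good(self) -> bool:
        """True when the quality score reaches the assumed threshold."""
        return self.quality_score >= QUALITY_THRESHOLD

    def to_json(self) -> str:
        """Serialize the record, including the derived is_good flag."""
        payload = asdict(self)
        payload["is_good"] = self.is_good
        return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    meta = ImageMetadata(image_id="img-123", category="car", quality_score=0.87)
    print(meta.to_json())
```

In this sketch the two model outputs (category and quality) stay as plain fields, so a downstream service can consume the record without knowing which model produced it.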
9. SageMaker vs Self-hosted
SageMaker
● SageMaker is great for model training and testing in prod
● But it’s expensive for serving millions of requests
● Hard to monitor
● Hard to make Ops happy
Self-hosted (Kubernetes)
● Cheaper
● Fits existing infra
● Metrics, Logs, Alerts
● SREs know how to deal with issues