Bringing code to the data: from MySQL to RocksDB for high volume searches
Ivan Kruglov | Senior Developer
ivan.kruglov@booking.com
Percona Live 2016 | Santa Clara, CA
Agenda
●  Problem domain
●  Evolution of search
●  Architecture
●  Results
●  Conclusion
Problem domain
Search at Booking.com
●  Input
●  Where – city, country, region
●  When – check-in date
●  How long – check-out date
●  What – search options (stars, price range, etc.)
●  Result
●  Available hotels
Inventory vs. Availability
●  Inventory is what hotels give Booking.com
●  hotel/room inventory
●  Availability = search + inventory
●  under which circumstances one can book this room and at what price
●  Availability >>> Inventory
[Booking.com] works with approximately 800,000 partners,
offering an average of 3 room types, 2+ rates, 30 different length
of stays across 365 arrival days, which yields something north of
52 billion price points at any given time.
http://www.forbes.com/sites/jonathansalembaskin/2015/09/24/booking-com-channels-its-inner-geek-toward-engagement/#2dbc6f6326b2
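The "north of 52 billion" figure can be checked by multiplying the quoted factors. The sketch below assumes "2+ rates" means exactly 2, so the result is a lower bound; the class and method names are illustrative only.

```java
// Back-of-the-envelope check of the quoted price-point figure.
// Assumption: "2+ rates" is taken as exactly 2, so this is a lower bound.
final class PricePointEstimate {
    static long lowerBound() {
        long partners = 800_000L;   // approximate number of partners
        long roomTypes = 3;         // average room types per partner
        long rates = 2;             // "2+" rates, rounded down
        long lengthsOfStay = 30;    // different lengths of stay
        long arrivalDays = 365;     // arrival days
        return partners * roomTypes * rates * lengthsOfStay * arrivalDays;
    }

    public static void main(String[] args) {
        System.out.println(lowerBound()); // 52560000000
    }
}
```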
Evolution of search
Normalized availability (pre 2011)
●  classical LAMP stack
●  P – stands for Perl
●  normalized availability
●  write optimized dataset
●  search request handled by a single worker
●  too much computational complexity
●  large cities became unsearchable
Pre-computed availability (2011+)
●  materialized == de-normalized, flattened dataset
●  aim for constant time fetch
●  read-optimized (AV) and write-optimized (inv) datasets
●  single worker
●  as inventory grows, big searches are still a problem
Map-Reduced search (2014+)
●  parallelized search
●  multiple workers
●  multiple MR phases
●  search as service
●  a distributed service, with all its good and bad sides
●  world search ~20s
●  overheads
●  IPC, serialization
Don't Bring the Data to the Code, Bring the Code to the Data
Operation                                  Latency
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns
Compress 1K bytes with Snappy             3,000 ns
Send 1K bytes over 1 Gbps network        10,000 ns     0.01 ms
Read 4K randomly from SSD               150,000 ns     0.15 ms
Read 1 MB sequentially from memory      250,000 ns     0.25 ms
Round trip within same datacenter       500,000 ns     0.5 ms
Read 1 MB sequentially from SSD*      1,000,000 ns     1 ms
Disk seek                            10,000,000 ns     10 ms
Read 1 MB sequentially from disk     20,000,000 ns     20 ms
Send packet CA->Netherlands->CA     150,000,000 ns     150 ms
https://gist.github.com/jboner/2841832
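To make the locality argument concrete, here is a rough estimate built only from the latency figures above. It assumes 1 MB is treated as 1,000 x 1K and that transfer time scales linearly, which is a simplification; the class and constant names are illustrative.

```java
// Rough comparison: fetching 1 MB from another machine in the same
// datacenter versus reading 1 MB from local memory.
final class LocalityEstimate {
    // Figures taken from the latency table, in nanoseconds.
    static final long SEND_1K_OVER_1GBPS_NS = 10_000;
    static final long DATACENTER_ROUND_TRIP_NS = 500_000;
    static final long READ_1MB_FROM_MEMORY_NS = 250_000;

    // Assumption: 1 MB = 1,000 x 1K, and network transfer time scales linearly.
    static long remoteFetch1MbNs() {
        return DATACENTER_ROUND_TRIP_NS + 1_000 * SEND_1K_OVER_1GBPS_NS;
    }

    // How many times slower the remote fetch is than a local memory read.
    static double slowdownVersusLocalMemory() {
        return (double) remoteFetch1MbNs() / READ_1MB_FROM_MEMORY_NS;
    }
}
```

Under these assumptions the remote fetch takes about 10.5 ms, roughly 42 times longer than the 0.25 ms local read, before any serialization cost is counted.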
Map-Reduce + local AV (2015+)
●  SmartAV – smart availability
●  combined MR search with
local database
●  keep data in RAM
●  change stack to Java
●  reduce constant factor
●  distance to point for 100K hotels
●  Perl: 0.4 s
●  Java: 0.04 s
●  use multithreading
●  smaller overheads than IPC
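The "distance to point" benchmark is the kind of tight numeric loop where the constant factor of the stack shows up. The slides do not say which distance formula was used; the sketch below assumes the haversine great-circle formula and flat primitive arrays for the coordinates, and all names are illustrative.

```java
// Sketch of a distance-to-point loop over many hotels.
// Assumption: haversine formula; the talk does not name the formula used.
final class DistanceToPoint {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance in km between two points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding errors near antipodal points.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Distances from one point to every hotel; coordinates are parallel arrays.
    static double[] distancesKm(double lat, double lon, double[] hotelLat, double[] hotelLon) {
        double[] out = new double[hotelLat.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = haversineKm(lat, lon, hotelLat[i], hotelLon[i]);
        }
        return out;
    }
}
```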
Architecture
[Diagram: search and materialization paths; nodes arranged as replicas x partitions]
Coordinator
●  acts as proxy
●  knows cluster state
●  query randomly chosen replica in all partitions
(scatter-gather)
●  retry if necessary
●  merge partial results into final result
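The coordinator behaviour listed above can be sketched as a scatter-gather over partitions. This is a minimal illustration, not the talk's implementation: a replica is modelled as a plain function, the retry simply picks another random replica (possibly the same one), and the merge is a concatenation in partition order. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Scatter-gather sketch: one randomly chosen replica per partition,
// queried in parallel, with a bounded number of retries.
final class ScatterGatherSketch {
    // partitions.get(p) holds the replicas of partition p.
    static <Q, R> List<R> search(List<List<Function<Q, List<R>>>> partitions,
                                 Q query, int maxAttempts) {
        List<CompletableFuture<List<R>>> futures = new ArrayList<>();
        for (List<Function<Q, List<R>>> replicas : partitions) {
            CompletableFuture<List<R>> f = CompletableFuture.supplyAsync(
                    () -> queryWithRetry(replicas, query, maxAttempts));
            futures.add(f); // scatter
        }
        List<R> merged = new ArrayList<>();
        for (CompletableFuture<List<R>> f : futures) {
            merged.addAll(f.join()); // gather, in partition order
        }
        return merged;
    }

    static <Q, R> List<R> queryWithRetry(List<Function<Q, List<R>>> replicas,
                                         Q query, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int pick = ThreadLocalRandom.current().nextInt(replicas.size());
            try {
                return replicas.get(pick).apply(query);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("all attempts failed", last);
    }
}
```

A real merge step would typically re-sort the partial results and keep only the top N rather than concatenate them.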
Inverted indexes
●  dataset
| 0 | hello world |
| 1 | small world |
| 2 | goodbye world |
{
"hello" => [ 0 ],
"goodbye" => [ 2 ],
"small" => [ 1 ],
"world" => [ 0, 1, 2 ] # must be sorted
}
●  query
(hello OR goodbye) AND world
([ 0 ] OR [ 2 ]) AND [ 0, 1, 2]
merge
[ 0, 2 ]
●  indexes for ufi, country, region, district and more
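The reason posting lists must be sorted is that AND and OR then become linear two-pointer merges. The sketch below evaluates the example query on the three-document dataset; the class and method names are illustrative, and it assumes each list is strictly ascending.

```java
import java.util.Arrays;

// Boolean queries over sorted posting lists (ascending document ids).
final class PostingLists {
    // AND: intersection of two sorted lists.
    static int[] and(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out[n++] = a[i]; i++; j++; }
            else if (a[i] < b[j]) i++;
            else j++;
        }
        return Arrays.copyOf(out, n);
    }

    // OR: union of two sorted lists, without duplicates.
    static int[] or(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) out[n++] = a[i++];
            else if (i == a.length || b[j] < a[i]) out[n++] = b[j++];
            else { out[n++] = a[i]; i++; j++; } // same id in both lists
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] hello = {0}, goodbye = {2}, world = {0, 1, 2};
        // (hello OR goodbye) AND world
        System.out.println(Arrays.toString(and(or(hello, goodbye), world))); // [0, 2]
    }
}
```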
Application server / database
●  filter
●  based on search criteria (stars, Wi-Fi, parking, etc.)
●  based on group matching (# of rooms and persons per room)
●  based on availability (check-in and check-out dates)
●  sort
●  price, distance, review score, etc.
●  top N
●  merge
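The filter, sort, top N and merge steps compose well across partitions: if every partition returns its own top N under the same ordering, the merged top N equals the global top N. A minimal sketch follows, with a hypothetical Hotel type and only two illustrative attributes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Per-partition filter/sort/top-N, followed by a merge of the partial results.
final class TopNSketch {
    // Hypothetical, heavily simplified hotel record.
    static final class Hotel {
        final int id;
        final int stars;
        final double price;
        Hotel(int id, int stars, double price) {
            this.id = id;
            this.stars = stars;
            this.price = price;
        }
    }

    // Runs on each partition: filter, sort, keep the first n.
    static List<Hotel> topN(List<Hotel> hotels, Predicate<Hotel> filter,
                            Comparator<Hotel> order, int n) {
        return hotels.stream().filter(filter).sorted(order).limit(n)
                .collect(Collectors.toList());
    }

    // Runs on the merging side: combine partial top-N lists into the final top N.
    static List<Hotel> merge(List<List<Hotel>> partials, Comparator<Hotel> order, int n) {
        return partials.stream().flatMap(List::stream).sorted(order).limit(n)
                .collect(Collectors.toList());
    }
}
```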
Application server / database
●  data statically partitioned (modulo partitioning by hotel id)
●  hotel data
●  kept in RAM
●  not persisted – easy enough to fetch and rebuild
●  updated hourly
●  availability data
●  persisted
●  real-time updates
●  1
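Static modulo partitioning by hotel id amounts to a one-line mapping. The sketch below assumes numeric hotel ids and a fixed partition count; both names are illustrative.

```java
// Static modulo partitioning: a hotel always lives on the same partition
// as long as the number of partitions does not change.
final class HotelPartitioner {
    static int partitionFor(long hotelId, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative ids.
        return (int) Math.floorMod(hotelId, (long) numPartitions);
    }
}
```

The trade-off of this scheme is that changing the partition count remaps most hotels, which is acceptable when a node is easy to rebuild.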
RocksDB
●  embedded key-value storage
●  LSM – log-structured merge-tree database
Why RocksDB?
●  needed embedded key-value storage
●  tried MapDB, Kyoto/Tokyo Cabinet, LevelDB
●  reasons for the choice
●  stable random read performance under random writes and compaction
(80% reads, 20% writes)
●  works on HDDs with ~1.5K updates per second
●  dataset fits in RAM (in-memory workload)
RocksDB use and configuration
●  RocksDB v3.13.1
●  JNI + custom patch
●  config is the result of an iterative trial-and-error approach
●  optimized for read latency
●  mmap reads
●  compress at the application level
●  WriteBatchWithIndex for read-your-own-writes
●  multiple smaller DBs instead of one big DB
●  simplifies purging old availability
config:
.setDisableDataSync(false)
.setWriteBufferSize(15 * SizeUnit.MB)
.setMaxOpenFiles(-1)
.setLevelCompactionDynamicLevelBytes(true)
.setMaxBytesForLevelBase(160 * SizeUnit.MB)
.setMaxBytesForLevelMultiplier(10)
.setTargetFileSizeBase(15 * SizeUnit.MB)
.setAllowMmapReads(true)
.setMemTableConfig(new HashSkipListMemTableConfig())
.setMaxBackgroundCompactions(1)
.useFixedLengthPrefixExtractor(8)
.setTableFormatConfig(new PlainTableConfig()
.setKeySize(8)
.setStoreIndexInFile(true)
.setIndexSparseness(8));
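Read-your-own-writes means a reader sees pending, not yet committed updates layered on top of the stored data. The sketch below illustrates that idea with plain Java maps only; it is a conceptual model and does not use or reproduce the RocksDB API. All names are hypothetical, and the long keys merely echo the 8-byte key size in the config.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Conceptual model of read-your-own-writes: an indexed batch of pending
// updates is consulted before the committed store.
final class OverlayBatchSketch {
    private final Map<Long, String> committed = new HashMap<>(); // stands in for the DB
    // Pending updates in insertion order; an empty Optional marks a delete.
    private final Map<Long, Optional<String>> pending = new LinkedHashMap<>();

    void put(long key, String value) { pending.put(key, Optional.of(value)); }

    void delete(long key) { pending.put(key, Optional.empty()); }

    // Reads see pending updates first, then fall back to committed data.
    String get(long key) {
        Optional<String> p = pending.get(key);
        if (p != null) return p.orElse(null);
        return committed.get(key);
    }

    // What a reader without access to the batch would see.
    String getCommitted(long key) { return committed.get(key); }

    // Applies the whole batch to the committed store and clears it.
    void commit() {
        for (Map.Entry<Long, Optional<String>> e : pending.entrySet()) {
            if (e.getValue().isPresent()) committed.put(e.getKey(), e.getValue().get());
            else committed.remove(e.getKey());
        }
        pending.clear();
    }
}
```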
Materialized availability queue
●  no replication between nodes
●  simplify architecture
●  calculate once
●  simplify app logic
●  no need to re-implement logic
Node consistency
●  eventually consistent
●  naturally fits business
●  rely on monitoring/alerting
●  quality checks
●  observer compares results
●  easy and fast to rebuild a
node
Results
MR search vs. MR search + local AV + new tech stack
●  Adriatic coast (~30K hotels)
●  before: 13 s, after: 30 ms
●  Rome (~6K hotels)
●  before: 5 s, after: 20 ms
●  Sofia (~0.3K hotels)
●  before: 200 ms, after: 10 ms
Conclusion
1.  search on top of normalized dataset in MySQL
2.  search on top of pre-computed (flattened) dataset in MySQL
3.  MR-search on top of pre-computed dataset in MySQL
4.  MR-search on top of local dataset in RocksDB (authoritative dataset in MySQL)
●  full rewrite, but conceptually a small step
●  locality matters
●  technology stack (constant factor) matters
Thank you!
ivan.kruglov@booking.com
