Archmage, Pinterest’s
Real-time Analytics
Platform on Druid
October 2020
Jian Wang, Tech Lead, Pinterest
Jiaqi Gu, Software Engineer, Pinterest
© 2020 Pinterest. All rights reserved.
Agenda
1. Motivation
2. Challenges
3. Use cases
4. Cluster stats
5. Architecture
6. Learnings
Motivation
Why did we replace HBase with Druid for analytics use cases?
● Drawbacks of the HBase-based precomputed key-value lookup system
○ The key-value data model does not fit analytics query patterns
○ Cardinality explodes whenever a new column is added
○ It is impossible to precompute all filter combinations
○ The application side has to do more work to aggregate results
We want a better system as demand for Pinterest’s analytics use cases increases.
Example key-value model:
country=usa,device=iphone,gender=male,click=123
country=china,device=iphone,gender=female,click=456
country=japan,device=android,gender=male,click=789
country=usa,device=iphone,gender=female,click=135
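To make the key-value drawbacks concrete, here is a minimal Python sketch over the four example rows. The function names are illustrative assumptions, not part of the talk. It shows that answering a filter the store did not precompute means scanning and summing on the application side, and that the number of keys needed to precompute every filter combination grows multiplicatively with each added column.

```python
ROWS = [
    "country=usa,device=iphone,gender=male,click=123",
    "country=china,device=iphone,gender=female,click=456",
    "country=japan,device=android,gender=male,click=789",
    "country=usa,device=iphone,gender=female,click=135",
]

def parse(row):
    """Split 'k=v,k=v' into a dict and convert the click count to int."""
    record = dict(pair.split("=", 1) for pair in row.split(","))
    record["click"] = int(record["click"])
    return record

def clicks_matching(records, **filters):
    """Application-side aggregation: scan every record and sum the clicks."""
    return sum(
        r["click"] for r in records
        if all(r[dim] == value for dim, value in filters.items())
    )

def precomputed_key_count(records, dimensions):
    """Keys needed to precompute every filter combination: each dimension
    is either pinned to one observed value or left unfiltered ('any')."""
    total = 1
    for dim in dimensions:
        total *= len({r[dim] for r in records}) + 1
    return total

records = [parse(r) for r in ROWS]
usa_clicks = clicks_matching(records, country="usa")
key_count = precomputed_key_count(records, ["country", "device", "gender"])
```

Even with three tiny dimensions, the count is the product of (cardinality + 1) per dimension, so every new column multiplies it again.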
Challenges
What are the unique challenges of onboarding to Druid at Pinterest?
● Clients expect low latency on par with a key-value store
○ Having migrated from an HBase-based key-value lookup backend, clients expect latency to stay in the low 100 ms range, while vanilla Druid only guarantees sub-second to seconds latency
● Pinterest-scale data volume
○ Largest batch use case: 300 TB with an SLA of seconds
○ Largest real-time use case: 500k write QPS, with an SLA requirement of 500 query QPS and 200 ms P99
● Cost effectiveness
○ We want the best performance possible at the lowest cost
Use cases
Many of the company’s analytics use cases are powered by Druid
● Partner and advertiser reporting
○ Stats on board and Pin impressions, clicks, saves, etc.
● Real-time spam detection
○ Detects spamming events for user login and Pin operations
● Experiment metrics
○ A/B testing experiment metrics
● Ads delivery debugger
○ Debugging tool for ads delivery status
● And many more ...
Cluster stats
We run clusters for both online and offline use
● Largest online cluster
○ 200 r4.8x historical nodes hosting 32 TB, plus 50 i3.2x nodes hosting 100 TB
○ 250 QPS
○ Query P99 ranges from 100 ms to ~1.5 s depending on the use case
● Largest offline cluster
○ 160 i3en.2x historical nodes hosting 280 TB
○ < 1 QPS
○ Query P99 of 2 s
Architecture
[Architecture diagram; labeled components: batch ingestion, real-time ingestion, Archmage]
Architecture
Archmage
● Proxy service
○ A Thrift service that acts as a proxy between clients and Druid to ease integration with other services at Pinterest
○ Handles Druid service discovery by watching the broker znode on Druid’s ZooKeeper
○ Translates requests and responses between Thrift and HTTP
○ Metrics reporting
○ Speculative execution
○ Query optimization and rewriting
○ Dark traffic routing to a shadow cluster
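The slide does not say how Archmage implements speculative execution. The following is only a generic sketch of the technique, assuming a hedged-request design: if the first broker has not answered within a threshold, the same query is sent to a second broker and whichever finishes first wins. `speculative_call`, `primary`, `backup`, and `hedge_after_s` are hypothetical names.

```python
import concurrent.futures as cf
import time

def speculative_call(primary, backup, hedge_after_s):
    """Run `primary`; if it has not finished within `hedge_after_s` seconds,
    also run `backup` and return the result of whichever completes first."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(primary)
        done, _ = cf.wait([first], timeout=hedge_after_s)
        if done:
            return first.result()
        second = pool.submit(backup)
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the slower call; its result is simply discarded.
        pool.shutdown(wait=False)

def slow_broker():
    time.sleep(0.5)  # stands in for a broker stuck on a slow query
    return "slow"

def fast_broker():
    return "fast"

hedged_result = speculative_call(slow_broker, fast_broker, hedge_after_s=0.05)
direct_result = speculative_call(fast_broker, slow_broker, hedge_after_s=0.05)
```

The trade-off is extra load on the cluster in exchange for a shorter tail, so the threshold would normally sit near the latency percentile being protected.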
Architecture
Query
● Thrift API
○ Clients send a Thrift request with a SQL field to Archmage, which forwards it to Druid
● UI
○ Use-case-specific UIs built by individual clients
○ Internal UI with a SQL editor for ad hoc queries
○ Apache Superset for dashboarding
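As a rough illustration of the Thrift-to-HTTP translation step, the sketch below turns a request object carrying a SQL string into an HTTP request for Druid's SQL endpoint (`POST /druid/v2/sql` with a JSON body containing `query`). The `ArchmageQuery` class, the broker URL, and the table name are hypothetical; the real Thrift schema is not shown in the talk.

```python
import json
import urllib.request
from dataclasses import dataclass

@dataclass
class ArchmageQuery:
    """Hypothetical stand-in for the Thrift request; only the existence
    of a SQL field is taken from the slide."""
    sql: str

def to_druid_http_request(broker_url, query):
    """Build (but do not send) the HTTP request for Druid's SQL API."""
    body = json.dumps({"query": query.sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = to_druid_http_request(
    "http://broker.example:8082",  # assumed broker address
    ArchmageQuery(sql="SELECT COUNT(*) FROM clicks"),
)
```

Sending the request with `urllib.request.urlopen(request)` and mapping the JSON response back into a Thrift response would complete the round trip.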
Architecture
Ingestion
● Batch ingestion
○ Hadoop: an extracted library that bypasses Druid locking
○ Reads input from S3 and writes Druid segment files to S3
● Real-time ingestion
○ Kafka: exactly-once delivery
○ Evaluated the push-based Tranquility library, but it is deprecated
Learnings
Tiered setup
● Need disk access? Look for host types with good random-read IOPS at a 4 KB page size
○ Disk is needed when segments are not accessed often, or when the data volume is so large that a fully in-memory setup is too expensive
○ Druid uses mmap and treats a segment as a byte array. At query time only the needed portion of the byte array (e.g., a certain column) is loaded from disk, and loading happens in 4 KB pages. As a result, a host type with 256 GB of RAM (excluding process memory) behaves much the same as one with 1 GB of RAM if 1) the 4 KB random-read IOPS are the same and 2) each query is expected to scan different segments
○ On AWS, host types with on-instance SSDs work best: i3 > i3en >> other instance types with an attached EBS disk
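The mmap behavior described above can be sketched in a few lines of Python. This is an illustration of the general mechanism, not Druid code: a file is mapped, one byte range (standing in for a column) is read, and a helper computes which 4 KB pages that range covers. The file contents, offsets, and function names are assumptions; in practice OS read-ahead may fault in more pages than the minimum.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # 4 KB pages, as on the slide

def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Indices of the pages covering bytes [offset, offset + length)."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)

def read_slice(path, offset, length):
    """Map the file and copy out one byte range; the OS only needs to
    fault in the pages that back this range."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        return view[offset:offset + length]

# Stand-in for a segment file: 16 KB, i.e. four 4 KB pages.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)
    segment_path = tmp.name

column_bytes = read_slice(segment_path, offset=5000, length=100)
column_pages = list(pages_touched(5000, 100))
os.remove(segment_path)
```

When every query lands on pages that are not yet cached, latency is bounded by how fast the disk serves these small random reads, which is why the slide ranks host types by 4 KB random-read IOPS rather than by RAM.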
Learnings
Tiered setup
● Recent data? All in memory
○ Recent data is expected to be queried more often, so we avoid query-time disk I/O by keeping all of it in the page cache
○ Put the most recent segments (e.g., the last 3 months) on memory-intensive instance types with a 1:1 RAM/disk ratio: r5.8x with attached EBS
○ Background threads in historical nodes read segment files (equivalent to `cat 0000.smoosh > /dev/null`) on server bootstrap and whenever a new segment is downloaded, forcing the OS to load them into the page cache and avoiding on-demand loading at query time
○ Determine the exact “recent” period through request analysis. Druid real-time ingestion is a good choice.
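The warm-up step is equivalent to `cat 0000.smoosh > /dev/null`: read every byte and throw it away so the OS keeps the pages cached. A minimal Python sketch of that idea follows. The function names and the temporary directory layout are assumptions, and this version runs sequentially, whereas the slide describes background threads inside the historical process.

```python
import os
import tempfile

def warm_page_cache(path, chunk_size=1 << 20):
    """Read a file end to end and discard the data; returns bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

def warm_segment_dir(root):
    """Warm every file under a segment cache directory."""
    return sum(
        warm_page_cache(os.path.join(directory, name))
        for directory, _, files in os.walk(root)
        for name in files
    )

# Demo on a throwaway directory holding one fake segment file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "0000.smoosh"), "wb") as f:
        f.write(b"\0" * 3_000_005)
    warmed_bytes = warm_segment_dir(root)
```

This only helps when RAM is large enough to hold the warmed files, which is why it is paired with the 1:1 RAM/disk instance types.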
Learnings
Middle managers
● Need as much attention in tuning as historical nodes
○ Monitor Kafka ingestion offset lag and timestamp lag
○ Increase intermediatePersistPeriod if you are sensitive to query latency on middle managers
○ Use a custom partitioner on the Kafka producer side to improve data locality
○ Use lateMessageRejectionPeriod and earlyMessageRejectionPeriod to keep scattered late and early events from creating many small segments
○ Run reindexing (compaction) jobs
○ Be careful not to use Kafka transactions on the producer side prior to Druid 0.15
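For orientation, the three settings named above live in a Kafka supervisor spec: the rejection periods under `ioConfig` and the persist period under `tuningConfig`. The fragment below is a sketch expressed as a Python dict. The topic name and all period values are placeholders chosen for illustration, not Pinterest's settings, and the exact nesting of the spec differs between Druid versions.

```python
import json

# Partial Kafka supervisor spec; everything not shown is omitted on purpose.
supervisor_spec_fragment = {
    "type": "kafka",
    "ioConfig": {
        "topic": "events",  # hypothetical topic name
        # Drop events whose timestamps are too far behind or ahead, so
        # stragglers do not each open their own small segment.
        "lateMessageRejectionPeriod": "PT1H",   # placeholder value
        "earlyMessageRejectionPeriod": "PT1H",  # placeholder value
    },
    "tuningConfig": {
        "type": "kafka",
        # Longer period means fewer intermediate persists per task.
        "intermediatePersistPeriod": "PT30M",   # placeholder value
    },
}

spec_json = json.dumps(supervisor_spec_fragment, indent=2)
```

The periods are ISO 8601 durations; suitable values depend on how late or early events actually arrive in the stream.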
Learnings
Group by queries
● Tail latency
○ Many are convertible to topN if you add a limit clause
○ Add a combined dimension if there are more than 2 group-by dimensions but the set is fixed
○ Enable limit push down to trade some accuracy for performance
○ Enable parallel broker-side merge
○ Limit the number of rows to group from the application side if possible
○ Make sure you have enough merge buffers so that you do not run out of them
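The talk does not name the exact settings it used. As one plausible mapping, Druid's query context has a `forceLimitPushDown` flag for groupBy queries and an `enableParallelMerge` flag for broker-side merging. The sketch below builds a SQL API payload that sets both; the helper name, table, and columns are hypothetical.

```python
import json

def group_by_payload(sql, force_limit_push_down=True, parallel_merge=True):
    """Build a Druid SQL API payload with a query context attached."""
    return {
        "query": sql,
        "context": {
            # Pushes the LIMIT down to data nodes even when ordering by a
            # non-grouping column; results may then be approximate.
            "forceLimitPushDown": force_limit_push_down,
            # Lets the broker merge per-segment results in parallel.
            "enableParallelMerge": parallel_merge,
        },
    }

payload = group_by_payload(
    "SELECT country, device, SUM(click) AS clicks "
    "FROM clicks GROUP BY country, device "
    "ORDER BY clicks DESC LIMIT 10"
)
payload_json = json.dumps(payload)
```

Merge buffers are sized separately through process configuration (`druid.processing.numMergeBuffers`), so a context flag alone does not address the last bullet.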
Learnings
Query-time pruning on secondary dimensions (other than time)
● Cluster computing resources are limited
○ Each segment is processed by one processing thread, and the number of threads is usually equal to the number of cores
○ Cores are expensive and always fewer than the number of segments
○ We should be careful about which segments to scan for a query
● Shard specs with query-time pruning on partition dimensions
○ Batch ingestion
■ Hash-based shard spec
■ Even-size single-dimension shard spec
○ Real-time ingestion
■ Stream hash-based shard spec
Learnings
Query-time pruning on secondary dimensions (other than time)
● Shard specs with query-time pruning on partition dimensions
○ Batch ingestion
■ Hash-based shard spec
● Worked well in most use cases
● We added the missing query-time pruning based on hashing and partition dimensions
● However, skewed data leads to skewed segment sizes, long ingestion tail latency, and query performance issues
■ Even-size single-dimension shard spec
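Hash-based pruning can be sketched as follows: if a query has equality filters on every partition dimension, the broker can recompute the hash and scan only the segment whose partition number matches. This is a simplified illustration with assumed names and data. `zlib.crc32` is used only as a stable stand-in and is not the hash function Druid uses.

```python
import zlib

def partition_for(values, num_partitions):
    """Stable stand-in hash over the partition dimension values."""
    key = "\x1f".join(values).encode("utf-8")
    return zlib.crc32(key) % num_partitions

def prune_segments(segments, partition_dimensions, equality_filters):
    """Return the segments that may contain rows matching the filters."""
    if not all(dim in equality_filters for dim in partition_dimensions):
        # Without a value for every partition dimension the hash cannot
        # be computed, so every segment must be scanned.
        return list(segments)
    values = [equality_filters[dim] for dim in partition_dimensions]
    return [
        s for s in segments
        if s["partition"] == partition_for(values, s["num_partitions"])
    ]

# Eight hash partitions for one time chunk (hypothetical segment metadata).
segments = [
    {"id": f"seg-{i}", "partition": i, "num_partitions": 8} for i in range(8)
]
pruned = prune_segments(segments, ["country"], {"country": "usa"})
unpruned = prune_segments(segments, ["country"], {"device": "iphone"})
```

The skew problem is visible here too: every row with the same partition dimension value hashes to the same segment, so one very popular value makes one segment much larger than the rest.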
Learnings
Query-time pruning on secondary dimensions (other than time)
● Shard specs with query-time pruning on partition dimensions
○ Batch ingestion
■ Hash-based shard spec
■ Even-size single-dimension shard spec
● The default single-dimension shard spec fits all data for the same partition dimension value into a single segment
● We added a custom partitioner that distributes data for a skewed partition dimension value across multiple segments
● We replaced the two very slow Hadoop jobs (rolling up the input, and counting rows per partition dimension value to decide partitions) with reading the output of a SparkSQL job
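The partitioner itself is not shown in the talk. The sketch below is one simple way to get even-size segments from per-value row counts (the kind of output a SparkSQL job could produce): small values are packed together in sorted order, and any value larger than the target is split across several segments. The function name, the target size, and the counts are assumptions.

```python
import math

def plan_segments(rows_per_value, target_rows):
    """Plan segments as lists of (dimension value, row count) pairs."""
    segments, current, current_rows = [], [], 0
    for value in sorted(rows_per_value):
        rows = rows_per_value[value]
        if rows > target_rows:
            # Skewed value: close the open segment, then spread this
            # value evenly over as many segments as it needs.
            if current:
                segments.append(current)
                current, current_rows = [], 0
            pieces = math.ceil(rows / target_rows)
            base, extra = divmod(rows, pieces)
            for i in range(pieces):
                segments.append([(value, base + (1 if i < extra else 0))])
            continue
        if current and current_rows + rows > target_rows:
            segments.append(current)
            current, current_rows = [], 0
        current.append((value, rows))
        current_rows += rows
    if current:
        segments.append(current)
    return segments

counts = {"a": 10, "b": 250, "c": 30, "d": 40}  # "b" is the skewed value
plan = plan_segments(counts, target_rows=100)
```

Pruning still works for a split value, but a query on it has to scan all of its segments rather than one.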
Learnings
Query-time pruning on secondary dimensions (other than time)
● Shard specs with query-time pruning on partition dimensions
○ Real-time ingestion
■ Stream hash-based shard spec
● Real-time ingestion defaults to the numbered shard spec, which carries no metadata about the data it contains. Every query therefore fans out to all segments, making it very hard to support high query QPS
● The stream hash-based shard spec is a real-time version of the batch hash-based shard spec
● The Kafka producer assigns each record to a Kafka partition based on: hash(partition dimensions) % number of Kafka partitions
● Cons: this approach does not allow increasing the number of Kafka partitions, since doing so leads to incorrect results during the transition period
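The producer-side rule and its drawback can be shown with a short sketch. The hash here is `zlib.crc32`, chosen only because it is stable across runs; the actual hash function, key names, and partition counts are assumptions.

```python
import zlib

def kafka_partition(partition_dim_values, num_partitions):
    """hash(partition dimensions) % number of Kafka partitions."""
    key = "\x1f".join(partition_dim_values).encode("utf-8")
    return zlib.crc32(key) % num_partitions

def moved_keys(keys, old_count, new_count):
    """Keys whose partition changes when the partition count changes."""
    return [
        k for k in keys
        if kafka_partition(k, old_count) != kafka_partition(k, new_count)
    ]

keys = [(f"advertiser-{i}",) for i in range(100)]  # hypothetical keys
assignments = [kafka_partition(k, 8) for k in keys]
moved = moved_keys(keys, old_count=8, new_count=12)
```

Any key in `moved` ends up with older rows in segments from one partition and newer rows in segments from another, so a query that prunes with only one of the two partition counts would miss data. That is the incorrect-results window the slide warns about.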
Learnings
Operation tips
● druid.broker.select.tier and druid.server.priority
○ Control routing for dark reads, A/B testing of Druid configs, and zero-downtime deploys
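As a rough model of how these two settings interact: each historical advertises a priority (`druid.server.priority`), and the broker's tier selector strategy (`druid.broker.select.tier`, for example `highestPriority` or `lowestPriority`) decides which replicas are candidates for a query. The sketch below simulates that choice. Server names and priorities are hypothetical, and real brokers then balance load among the candidates.

```python
def candidate_servers(replicas, strategy="highestPriority"):
    """replicas: list of (server name, priority) pairs holding a segment.
    Returns the servers the chosen strategy would consider."""
    priorities = [priority for _, priority in replicas]
    if strategy == "highestPriority":
        chosen = max(priorities)
    elif strategy == "lowestPriority":
        chosen = min(priorities)
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy}")
    return [name for name, priority in replicas if priority == chosen]

# Production historicals at priority 1, a shadow tier at priority 0.
replicas = [("prod-1", 1), ("prod-2", 1), ("shadow-1", 0)]
prod_candidates = candidate_servers(replicas, "highestPriority")
shadow_candidates = candidate_servers(replicas, "lowestPriority")
```

A broker configured with the opposite strategy reads only from the shadow tier, which is one way to route dark reads or compare configs without touching production traffic.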
Learnings
Operation tips
● skipCoordinatorRun
○ Use this runtime config when deploying or restarting historical nodes to keep the coordinator from triggering unnecessary segment movements
● maxSegmentsInNodeLoadingQueue and maxSegmentsToMove
○ Segments are represented as children under a historical host’s znode
○ Load queue znodes are not compressed
○ Be careful not to hit the ZooKeeper buffer limit (defaults to a few MB) when loading a large number of segments onto a historical node
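A back-of-the-envelope check shows why the buffer limit matters. Listing a znode's children returns every child name in one response, so the payload grows with the number of segments queued on a node. The sketch below assumes roughly 4 bytes of length prefix per name plus its UTF-8 bytes, and a 4 MB limit; both figures and the segment ID format are assumptions for illustration.

```python
ASSUMED_BUFFER_LIMIT = 4 * 1024 * 1024  # assumed limit, in bytes

def children_listing_bytes(child_names):
    """Approximate size of a children listing: an assumed 4-byte length
    prefix plus the UTF-8 bytes of each child name."""
    return sum(4 + len(name.encode("utf-8")) for name in child_names)

def fits_in_buffer(child_names, buffer_limit_bytes=ASSUMED_BUFFER_LIMIT):
    return children_listing_bytes(child_names) <= buffer_limit_bytes

# Hypothetical segment identifiers queued on one historical node.
segment_ids = [
    f"clicks_2020-01-01T00:00:00.000Z_2020-01-02T00:00:00.000Z_v1_{i}"
    for i in range(100_000)
]
queue_fits = fits_in_buffer(segment_ids)
```

Capping `maxSegmentsInNodeLoadingQueue` keeps the number of children, and therefore the listing size, bounded.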
Time for questions
@Pinterest
Thank you!
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Register now for the next Druid Virtual Summit
Dates: November 10, 2020
druidsummit.org


Archmage, Pinterest’s Real-time Analytics Platform on Druid

  • 1. Archmage, Pinterest’s Real-time Analytics Platform on Druid October 2020 Jian Wang, Tech Lead, Pinterest Jiaqi Gu, Software Engineer, Pinterest 1
  • 2.
  • 3. 3© 2020 Pinterest. All rights reserved. Agenda Motivation Challenges Use cases Cluster stats Architecture Learnings 1 2 3 4 5 6
  • 4. 4© 2020 Pinterest. All rights reserved. Motivation ● Cons of Hbase based precomputed key value look up system ○ Key value data model doesn’t fit into analytics query pattern ○ Cardinality explosion anytime a new column is added ○ Impossible to precompute all filter combinations ○ More work is needed on the application side to do aggregation We want a better system as demand for Pinterest’s analytics use cases increase... Why do we replace Hbase with Druid for analytics use cases? Example key value model: country=usa,device=iphone,gender=male,click=123 country=china,device=iphone,gender=female,click=456 country=japan,device=android,gender=male,click=789 country=usa,device=iphone,gender=female,click=135
  • 5. 5© 2020 Pinterest. All rights reserved. Challenges What are the unique challenges of onboarding to use Druid in Pinterest? ● Clients expects low latency on par to key value store ○ Migrated from a Hbase based key value lookup backend, clients expects latency to stay at lower 100ms while vanilla Druid only guarantees subseconds/seconds latency ● Pinterest scale data volume ○ Largest batch use case: 300 TB with seconds SLA ○ Largest real time use case: 500k write QPS with SLA requirement of 500 query QPS and 200 ms p99 ● Cost effective ○ We want the lowest cost for the best performance possible
  • 6. 6© 2020 Pinterest. All rights reserved. Use cases Many of company’s analytics use cases are powered by Druid ● Partner and advertiser reporting ○ Provides stats on board/pins impressions, clicks, saves, etc.
  • 7. 7© 2020 Pinterest. All rights reserved. Use cases Many of company’s analytics use cases are powered by Druid ● Realtime spam detection ○ Detects spamming events for user login and pin operations
  • 8. 8© 2020 Pinterest. All rights reserved. Use cases Many of company’s analytics use cases are powered by Druid ● Partner and advertiser reporting ○ Stats on board/pins impressions, clicks, saves, etc. ● Realtime spam detection ○ Detects spamming events for user login and pin operations ● Experiment metrics ○ AB testing experiment metrics ● Ads delivery debugger ○ Debugging tool for Ads delivery status ● And many more ...
  • 9. 9© 2020 Pinterest. All rights reserved. Cluster stats We have both online and offline use clusters ● Biggest online use cluster ○ 200 r4.8x historical nodes hosting 32TB, 50 i3.2x hosting 100TB ○ QPS 250 ○ Query P99 ranges from 100ms to ~1.5s depending on use cases ● Biggest offline use cluster ○ 160 i3en.2x historical nodes hosting 280TB ○ QPS < 1 ○ P99 2s
  • 10. 10© 2020 Pinterest. All rights reserved. Architecture Batch ingestion Real-time ingestion Archmage
  • 11. 11© 2020 Pinterest. All rights reserved. Architecture Archmage ● Proxy service ○ A thrift service that acts as a proxy between clients and druid to ease integration with other services in Pinterest ○ Handles druid service discovery by watching broker znode on Druid zookeeper ○ Thrift to HTTP/HTTP to thrift request/response translation ○ Metrics reporting ○ Speculative execution ○ Query optimization and rewriting ○ Shadow cluster dark traffic routing
  • 12. Architecture: Query ● Thrift API ○ Clients send a Thrift request with a SQL field to Archmage, which forwards it to Druid ● UI ○ Use-case-specific UIs built by individual clients ○ Internal UI with a SQL editor for ad hoc queries ○ Apache Superset for dashboarding
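
To make the forwarding step concrete, here is a minimal sketch of how a proxy might turn the SQL field of an incoming request into a Druid SQL HTTP call. It is not Archmage's code: the function name `build_druid_sql_request` and the broker URL in the usage note are hypothetical, and the only Druid detail relied on is the broker's SQL endpoint (`POST /druid/v2/sql/` with a JSON body containing `query`).

```python
import json
import urllib.request


def build_druid_sql_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Wrap a SQL string in an HTTP request for a Druid broker's SQL endpoint.

    Hypothetical helper for illustration; a real proxy would also handle
    timeouts, retries, metrics and response translation.
    """
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a broker, for example one discovered at a placeholder address such as `http://broker.example:8082`.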
  • 13. Architecture: Ingestion ● Batch ingestion ○ Hadoop: uses an extracted library that bypasses Druid locking ○ Reads input from S3 and writes Druid segment files to S3 ● Real-time ingestion ○ Kafka: exactly-once delivery ○ Evaluated the push-based Tranquility library, but it has been deprecated
  • 14. Learnings: Tiered setup ● Need disk access? Look for host types with good random-read IOPS at 4 KB page size ○ Disk is needed when segments are not accessed often, or when the data volume is so large that a fully in-memory setup is too expensive ○ Druid uses mmap and treats a segment as a byte array. Only the specific portion of the byte array that is needed (e.g., a certain column) is loaded from disk at query time, and loading happens in 4 KB pages. This means a host type with 256 GB of RAM (excluding process memory) behaves much the same as one with 1 GB of RAM if 1) their 4 KB random-read IOPS are the same and 2) you expect each query to scan different segments ○ On AWS, host types with on-instance SSDs work best: i3 > i3en >> other instance types with an attached EBS disk
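
The page-granularity argument above can be illustrated with a little arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: it counts how many 4 KB pages a byte range of a memory-mapped file spans and estimates the cold-read time from a device's random-read IOPS. The function names and the numbers in the usage note are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # bytes; the 4 KB page size discussed above


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages spanned by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def cold_read_seconds(pages: int, random_read_iops: float) -> float:
    """Rough time to fault in `pages` uncached pages at a given IOPS rate."""
    return pages / random_read_iops
```

For example, a column occupying 40 MB of a segment spans about 10,000 pages, so under this model a device sustaining 100,000 random-read IOPS would need roughly 0.1 s to load it cold, regardless of how much RAM the host has.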
  • 15. Learnings: Tiered setup ● Recent data? All in memory ○ Recent data is expected to be queried more often, so we want to avoid query-time disk I/O by keeping all of it in the page cache ○ Put the most recent segments (e.g., the last 3 months) on memory-intensive instance types with a 1:1 RAM/disk ratio: r5.8x with attached EBS ○ Background threads in historical nodes read segment files (equivalent to `cat 0000.smoosh > /dev/null`) on server bootstrap and whenever a new segment is downloaded, forcing the OS to load them into the page cache and avoiding on-demand loading at query time ○ The exact period that counts as “recent” is best determined through request analysis. Druid real-time ingestion is a good choice.
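
The warm-up trick above (the `cat 0000.smoosh > /dev/null` equivalent) can be sketched in a few lines. This is an illustrative stand-alone version, not the code running inside Pinterest's historical nodes; the function names and the 1 MB chunk size are assumptions.

```python
from pathlib import Path


def warm_page_cache(path, chunk_size: int = 1 << 20) -> int:
    """Read a file sequentially and discard the bytes.

    The reads pull the file's pages into the OS page cache, so later
    mmap accesses are less likely to hit disk. Returns bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def warm_segment_dir(segment_dir) -> int:
    """Warm every *.smoosh file found under a segment cache directory."""
    files = sorted(Path(segment_dir).rglob("*.smoosh"))
    return sum(warm_page_cache(p) for p in files)
```

Whether the pages stay cached depends on memory pressure, which is why the slide pairs this with a 1:1 RAM/disk ratio.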
  • 16. Learnings: Middle managers ● Need as much attention in tuning as historical nodes ○ Monitor metrics on Kafka ingestion offset lag and timestamp lag ○ Increase intermediatePersistPeriod if you are sensitive to query latency on middle managers ○ Use a custom partitioner on the Kafka producer side to improve data locality ○ Use lateMessageRejectionPeriod and earlyMessageRejectionPeriod to keep scattered late and early events from creating many small segments ○ Run reindexing (compaction) jobs ○ Be careful not to use Kafka transactions on the producer side prior to Druid 0.15
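
As a rough illustration of where the settings named above live, the fragment below builds the relevant pieces of a Kafka supervisor spec as a Python dict. The three property names come from the slide; the ISO 8601 period values are placeholders rather than recommendations, and how these two sections nest inside the full spec depends on the Druid version.

```python
import json

# Illustrative fragment only: values are placeholders, not tuned settings.
supervisor_fragment = {
    "ioConfig": {
        "type": "kafka",
        # Drop events that arrive too far behind or ahead of the task's
        # time window so stragglers do not spawn many tiny segments.
        "lateMessageRejectionPeriod": "PT1H",
        "earlyMessageRejectionPeriod": "PT1H",
    },
    "tuningConfig": {
        "type": "kafka",
        # A longer period means fewer intermediate persists on the
        # middle manager while it is also serving queries.
        "intermediatePersistPeriod": "PT30M",
    },
}

print(json.dumps(supervisor_fragment, indent=2))
```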
  • 17. Learnings: Group by queries ● Tail latency ○ Many are convertible to top N if you add a limit clause ○ Add a combined dimension if there are more than 2 group by dimensions but they are fixed ○ Enable limit push down to trade some accuracy for performance ○ Enable parallel broker-side merge ○ Limit the number of rows to group by, if possible, from the application side ○ Make sure you have enough merge buffers so you do not run out of them
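
Several of these tips map onto query shape and query context. The sketch below shows one possible request body, assuming a hypothetical `events` table with `country` and `click` columns: a single-dimension GROUP BY with ORDER BY and LIMIT (a shape the SQL planner can turn into a top N query), plus the `forceLimitPushDown` and `enableParallelMerge` context keys. Availability and defaults of these keys vary by Druid version, so treat them as assumptions to verify against your release; merge buffers are sized separately through the `druid.processing.numMergeBuffers` runtime property.

```python
import json


def build_grouped_query(limit: int = 10) -> dict:
    """Build a Druid SQL request body for a limited, ordered group by.

    Table and column names are hypothetical.
    """
    sql = (
        "SELECT country, SUM(click) AS clicks "
        "FROM events "
        "GROUP BY country "
        "ORDER BY clicks DESC "
        f"LIMIT {int(limit)}"
    )
    return {
        "query": sql,
        "context": {
            # Push the limit down to data nodes even when ordering by a
            # non-grouping column; results may become approximate.
            "forceLimitPushDown": True,
            # Merge per-segment results in parallel on the broker.
            "enableParallelMerge": True,
        },
    }


print(json.dumps(build_grouped_query(), indent=2))
```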
  • 18. Learnings: Query-time pruning on secondary dimensions other than time ● Cluster computing resources are limited ○ Each segment is processed by one processing thread, and the number of threads is usually equal to the number of cores ○ Cores are expensive and always fewer than the number of segments ○ We should be careful about which segments a query scans ● Shard specs with query-time pruning on partition dimensions ○ Batch ingestion ■ Hash-based shard spec ■ Even-size single-dimension shard spec ○ Real-time ingestion ■ Stream hash-based shard spec
  • 19. Learnings: Query-time pruning on secondary dimensions other than time ● Shard specs with query-time pruning on partition dimensions ○ Batch ingestion ■ Hash-based shard spec ● Worked well in most use cases ● Added the missing query-time pruning based on hashing and partition dimensions ● However, skewed data leads to skewed segment sizes, long ingestion tail latency and query performance issues ■ Even-size single-dimension shard spec
  • 20. Learnings: Query-time pruning on secondary dimensions other than time ● Shard specs with query-time pruning on partition dimensions ○ Batch ingestion ■ Hash-based shard spec ■ Even-size single-dimension shard spec ● The default single-dimension shard spec puts all data for a given partition dimension value into a single segment ● Added a custom partitioner that distributes data for a skewed partition dimension value across multiple segments ● Replaced the two very slow Hadoop jobs (rolling up the input, and counting rows per partition dimension value to decide partitions) with reading the output of a SparkSQL job
  • 21. Learnings: Query-time pruning on secondary dimensions other than time ● Shard specs with query-time pruning on partition dimensions ○ Real-time ingestion ■ Stream hash-based shard spec ● Real-time ingestion defaults to the numbered shard spec, which carries no metadata about the data it contains, so every query fans out to all segments, making it very hard to support high query QPS ● The stream hash-based shard spec is a real-time version of the batch hash-based shard spec ● Have the Kafka producer assign each record to a Kafka partition ID based on: hash(partition dimensions) % number of Kafka partitions ● Cons: this approach doesn't allow increasing the number of Kafka partitions, since doing so leads to incorrect results during the transition period
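
The formula hash(partition dimensions) % number of Kafka partitions, and the pruning it enables, can be sketched as follows. This is a toy model rather than Pinterest's shard spec: the MD5-based hash, the function names and the segment-to-partition mapping are all assumptions. What matters is that the producer and the query path apply the same deterministic function.

```python
import hashlib


def partition_for(dim_values: tuple, num_partitions: int) -> int:
    """Map partition dimension values to a Kafka partition ID.

    Uses a stable hash (MD5 here, an arbitrary choice) because Python's
    built-in hash() for strings is randomized per process.
    """
    key = "\x1f".join(dim_values).encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def segments_to_scan(segment_partitions: dict, dim_values: tuple,
                     num_partitions: int) -> list:
    """Keep only segments whose source partition can contain dim_values.

    `segment_partitions` maps a segment ID to the Kafka partition it was
    built from (a hypothetical stand-in for shard spec metadata).
    """
    target = partition_for(dim_values, num_partitions)
    return [seg for seg, pid in segment_partitions.items() if pid == target]
```

With 8 partitions, a query filtering on one partition dimension value scans the segments of a single partition instead of all 8. The sketch also makes the drawback visible: changing `num_partitions` changes the mapping, so segments written before and after the change disagree about where a value lives.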
  • 22. Learnings: Operation tips ● druid.broker.select.tier and druid.server.priority ○ Control routing for dark reads, A/B testing of Druid configs and no-downtime deploys
  • 23. Learnings: Operation tips ● skipCoordinatorRun ○ Use this runtime config when deploying or restarting historical nodes to keep the coordinator from triggering unnecessary segment movements ● maxSegmentsInNodeLoadingQueue and maxSegmentsToMove ○ Segments are represented as children under a historical host's znode ○ Load queue znodes are not compressed ○ Be careful not to hit the ZooKeeper buffer limit (defaults to a few MB) when loading a large number of segments onto a historical node
  • 25. Thank you! Time for questions. @Pinterest. Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org. Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
  • 26. Register now for the next Druid Virtual Summit. Date: November 10, 2020. druidsummit.org