aniketmokashi@
June 2018
Procella: A Fast Versatile SQL Query Engine
About me
● Tech Lead on YouTube Data Team
○ Procella
○ YouTube Data Warehouse
○ YouTube Analytics Backend
○ Data Quality: Anomaly Detection
● Prior to Google
○ Data Infra teams at Twitter, Netflix
○ Apache Parquet PMC
○ Apache Pig PMC
○ Contributor: Hadoop, Hive
Agenda
● Products at YouTube and Google
● Use cases and features
● Architecture
● Key features deep dive
● Q&A
YouTube Analytics (youtube.com/analytics)
YouTube Music Insights (youtube.com/artists)
And more...
● YouTube Metrics
● Firebase Performance Console
● Google Play Console
● YouTube Internal Analytics
● ...
About Procella
A widely used SQL query engine at Google
● Fully featured: Most of SQL++: structured data, joins, set ops, analytic functions
● Super fast: Most queries are sub-second.
● Highly scalable: Petabytes of data, thousands of tables, trillions of rows, ...
● High QPS: Millions of point queries @ msec, thousands of reporting queries @ sub-second, hundreds of ad-hoc queries @ seconds
● Designed for real-time: Millions of QPS instant ingestion, highly efficient, …
● Easy to use: SQL, DDL, Virtual tables, …
● Compatible: Capacitor, Dremel, ACLs, Encryption, …
External Reporting/Analytics
● Use cases
○ YouTube Analytics
○ Firebase Perf Monitoring
○ Google Play Console
○ Data Studio
● Properties
○ High QPS, low-latency ingestion and queries
○ Hundreds of metrics across dozens of dimensions
○ SQL
● Technology
○ On-the-fly aggregations
○ Indexing & partitioning
○ Real-time ingestion
○ Batch & real-time stitching
■ Lambda architecture (sketched below)
○ Caching
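To make the lambda stitching concrete, here is a minimal sketch of the idea: batch data is authoritative up to a watermark, real-time data covers everything after it, so each event is counted exactly once. The table layout, column names, and watermark variable are illustrative assumptions, not Procella's actual schema.

```python
from collections import Counter

def stitched_views(batch_rows, realtime_rows, batch_watermark):
    """Stitch batch and real-time data: batch rows are authoritative
    below the watermark, real-time rows at or above it, so each event
    is counted exactly once."""
    totals = Counter()
    for video_id, event_ts, views in batch_rows:
        if event_ts < batch_watermark:        # complete, compacted data
            totals[video_id] += views
    for video_id, event_ts, views in realtime_rows:
        if event_ts >= batch_watermark:       # freshly ingested rows
            totals[video_id] += views
    return totals

# Batch is complete for ts < 100; real-time fills in everything after.
batch = [("v1", 10, 500), ("v1", 99, 20), ("v2", 50, 300)]
realtime = [("v1", 100, 5), ("v2", 120, 7), ("v2", 99, 1)]  # ts 99 is batch's
print(stitched_views(batch, realtime, batch_watermark=100))
# Counter({'v1': 525, 'v2': 307})
```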
Internal Reporting/Dashboards
● Use cases
○ Dashboards
○ Experiments Analysis
○ Custom reporting
● Properties
○ Low QPS
○ Speed & scale
○ SQL
● Technology
○ Stateful caching
○ Columnar-optimized data format (Artus)
○ Vectorized evaluation
○ Virtual tables
○ Tools (Dremel) and format (Capacitor) compatibility
Real Time Insights/Monitoring
● Use cases
○ YTA Real Time (External)
○ YTA Insights (External)
○ YT Site Health (Internal)
● Properties
○ Native time-series support
○ Complex compaction, complex metrics
○ High scan rate
○ Very high real-time ingestion rate
● Technology
○ TTL
○ SQL-based compaction
○ Approximate algorithms (quantiles, top-k, ...)
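As one concrete example of the approximate algorithms just mentioned, here is a sketch of quantile estimation over a bounded uniform sample (reservoir sampling). It illustrates only the accuracy-for-memory trade-off; the slides do not say which sketch Procella actually uses.

```python
import random

class ReservoirQuantiles:
    """Approximate quantiles over a bounded uniform sample
    (reservoir sampling): fixed memory, bounded error."""

    def __init__(self, capacity=1024):
        self.capacity, self.sample, self.seen = capacity, [], 0

    def add(self, value):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            j = random.randrange(self.seen)   # keep value w.p. capacity/seen
            if j < self.capacity:
                self.sample[j] = value

    def quantile(self, q):
        ordered = sorted(self.sample)
        return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

r = ReservoirQuantiles()
for latency_ms in range(10_000):
    r.add(latency_ms)
print(r.quantile(0.99))  # close to 9900, within sampling error
```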
Real Time Stats Serving
● Use cases
○ YouTube metrics (subscriptions, likes, etc.) on various YT pages
● Properties
○ Millions of rows ingested per sec
○ Millions of queries per sec
○ Milliseconds response time
○ Simple point queries
○ Many replicas
● Technology
○ Indexable columnar format (Artus); point lookups sketched below
○ Heavily optimized serving path
○ Pubsub-based ingestion
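A minimal sketch of why an indexable columnar format makes millisecond point queries cheap: if a tablet's key column is kept sorted, a lookup is a binary search over keys plus a positional fetch from the value column. The column contents and key names here are invented for illustration.

```python
import bisect

def point_lookup(sorted_keys, values, key):
    """Point query against a tablet whose key column is sorted:
    O(log n) binary search to find the row, O(1) positional fetch
    from the requested value column."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values[i]
    return None  # key not in this tablet

video_ids = ["v001", "v002", "v005", "v009"]  # sorted key column
view_counts = [525, 307, 12, 9001]            # parallel value column
print(point_lookup(video_ids, view_counts, "v005"))  # 12
```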
Ad-hoc Analytics
● Use cases
○ Data Science
○ Analysts
● Properties
○ Complex SQL queries
○ Multi-stage queries
○ Semi-structured and complex data (e.g., user profiles)
○ Joins
● Technology
○ Efficient RDMA shuffle
○ Efficient joins: broadcast, lookup, pipelined, shuffled, co-partitioned (broadcast variant sketched below)
○ Multi-stage queries
○ Optimizer (adaptive)
○ UDFs
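Of the join strategies listed above, the broadcast variant is the simplest to sketch: the small right side is replicated to every worker, each of which builds a local hash table and probes it while streaming its shard of the left side. This is a generic illustration of the technique, not Procella's implementation; the rows and column names are made up.

```python
def broadcast_hash_join(left_rows, right_rows, key):
    """Broadcast join: the small right side is replicated to every
    worker; each worker builds a local hash table from it and probes
    while streaming its shard of the large left side."""
    build = {}
    for row in right_rows:                    # build phase (small side)
        build.setdefault(row[key], []).append(row)
    for row in left_rows:                     # probe phase (large side)
        for match in build.get(row[key], []):
            yield {**row, **match}

views = [{"video_id": "v1", "views": 525}, {"video_id": "v2", "views": 307}]
channels = [{"video_id": "v1", "channel": "music"},
            {"video_id": "v2", "channel": "gaming"}]
for joined in broadcast_hash_join(views, channels, "video_id"):
    print(joined)
```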
Data Pipelines
● Use cases
○ Pipelines
○ Logs Processing
● Properties
○ Complex business logic
○ Multi-stage queries
○ Large shuffles
○ Joins
○ UDFs
● Technology
○ Efficient RDMA shuffle
○ Efficient joins: broadcast, lookup, pipelined, shuffled, co-partitioned
○ Multi-stage queries
○ Optimizer (adaptive)
○ UDFs
Architecture
Procella Scale
| Property | Value |
| --- | --- |
| Queries executed | 450+ million per day |
| Queries on real-time data | 200+ million per day |
| Leaf plans executed | 90+ billion per day |
| Rows scanned | 20+ quadrillion per day |
| Rows returned | 100+ billion per day |
| Peak scan rate per query | 100+ billion rows per second |
| Schema size | 200+ metrics and dimensions |
| Tables | 100+ |
Procella Latencies
| Property | p50 | p99 | p99.9 |
| --- | --- | --- | --- |
| E2E latency | 25 ms | 450 ms | 3000 ms |
| Rows scanned | 17 M | 270 M | 1000 M |
| MDS latency | 6 ms | 120 ms | 250 ms |
| DS latency | 1 ms | 75 ms | 220 ms |
| Query size | 300 B | 4.7 KB | 10 KB |
Metadata Server Latencies
| Percentile | Latency (ms) | Tablets pruned | Tablets scanned |
| --- | --- | --- | --- |
| p50 | 6 | 110,000 | 60 |
| p90 | 11 | 900,000 | 250 |
| p95 | 15 | 1,000,000 | 310 |
| p99 | 120 | 33,000,000 | 4000 |
| p99.9 | 250 | 38,000,000 | 7200 |
| p99.99 | 5100 | 38,050,000 | 11,500 |
Procella Secret Sauce
● Stateful Caching
○ Cached indexes: zone maps, dictionaries, Bloom filters, etc.
○ File handle and metadata caching
○ Data segment caching
○ Data segments have data server affinity (primary DS, secondary DS, ...)
○ Cached expressions
● Tail Latency Reduction
○ Backup RPCs, issued after waiting roughly the 90th-percentile RPC latency (sketched below)
○ Minibatch backup RPCs: one per n RPCs, serving as a backup for the slowest of those n
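A single-process sketch of the backup-RPC idea: if the primary replica has not answered within roughly its p90 latency, fire a second request to another replica and take whichever answer lands first. The replica behavior and latencies below are simulated assumptions, and the minibatch variant is not shown.

```python
import concurrent.futures as cf
import random
import time

def call_replica(name, payload):
    time.sleep(random.uniform(0.01, 0.5))     # simulated variable latency
    return f"{name} answered: {payload}"

def rpc_with_backup(payload, hedge_after_s=0.1):
    """Hedged request: if the primary replica has not answered within
    roughly its p90 latency, fire a backup RPC to a second replica and
    return whichever response arrives first."""
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(call_replica, "primary", payload)]
        done, _ = cf.wait(futures, timeout=hedge_after_s)
        if not done:                          # primary is slow: hedge
            futures.append(pool.submit(call_replica, "backup", payload))
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()

print(rpc_with_backup("scan tablet 42"))
```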
Procella Secret Sauce
● Indexing
○ Separate metadata server optimized for compact storage of zone maps (pruning sketched below)
○ Dynamic range partitioning
○ Use of dictionaries and Bloom filters
○ Posting lists for repeated data
● Distributed Execution
○ N-tier execution, thousands-way parallel
● Vectorized Evaluation: Superluminal
○ Performs block evaluation
○ Natively understands RLE and dictionary encodings
○ Columnar evaluation
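A sketch of the zone-map pruning the metadata server performs: each tablet records a (min, max) per column, and a range predicate can skip every tablet whose range cannot overlap it. This is how millions of tablets get pruned down to the handfuls shown in the latency table above; the toy layout below is an assumption for illustration.

```python
def prune_tablets(zone_maps, column, lo, hi):
    """Zone-map pruning: each tablet stores (min, max) per column, so a
    range predicate can skip every tablet whose range cannot overlap it.
    Only the survivors are sent to data servers for scanning."""
    survivors = []
    for tablet_id, zones in zone_maps.items():
        zmin, zmax = zones[column]
        if zmax >= lo and zmin <= hi:         # ranges overlap: must scan
            survivors.append(tablet_id)
    return survivors

zone_maps = {
    "tablet-0": {"date": (20180101, 20180131)},
    "tablet-1": {"date": (20180201, 20180228)},
    "tablet-2": {"date": (20180301, 20180331)},
}
# WHERE date BETWEEN 20180210 AND 20180215 touches only one tablet.
print(prune_tablets(zone_maps, "date", 20180210, 20180215))  # ['tablet-1']
```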
Procella Secret Sauce
● Artus File Format
○ Multi-pass adaptive encoding selection to pick the best encoding for each column (sketched below)
○ Favors adaptive encodings over generalized compression
○ O(log n) lookups, O(1) seeks
○ Length-based representation for repeated data types (arrays)
○ Exposes dictionary and RLE information to the execution engine
○ Allows for rich metadata, e.g., inverted indexes
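A toy sketch of multi-pass encoding selection: one pass gathers column statistics, then a cost model picks the cheapest of a few candidate encodings. The bit-cost formulas below are invented for illustration; Artus's real encodings and cost model are more elaborate.

```python
def pick_encoding(column):
    """First pass: gather stats. Second pass (implied): encode with the
    cheapest candidate. The bit-cost formulas are deliberately naive."""
    n = len(column)
    distinct = len(set(column))
    runs = 1 + sum(1 for a, b in zip(column, column[1:]) if a != b)
    plain_bits = n * 64                                   # raw 64-bit values
    dict_bits = distinct * 64 + n * max(1, (distinct - 1).bit_length())
    rle_bits = runs * (64 + 32)                           # (value, run length)
    candidates = [("PLAIN", plain_bits), ("DICT", dict_bits), ("RLE", rle_bits)]
    return min(candidates, key=lambda c: c[1])[0]

print(pick_encoding([7] * 1000))          # one long run        -> RLE
print(pick_encoding([1, 2, 3] * 333))     # few distinct values -> DICT
print(pick_encoding(list(range(1000))))   # high cardinality    -> PLAIN
```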
Procella Secret Sauce
● Real-time Ingestion (lifecycle sketched below)
○ Data ingested into memory buffers
○ Periodically checkpointed to files
○ Small files are periodically aggregated and compacted into large files
● Virtual Tables
○ Index-aware aggregate selection
○ Stitched queries
○ Lambda architecture awareness
○ Normalization friendly (dimensional joins)
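A minimal sketch of the real-time write path described above: rows land in a memory buffer, buffers are checkpointed to small immutable files, and small files are compacted into large ones, with all three tiers queryable at every moment. The thresholds and in-memory structures are toy assumptions.

```python
class IngestionBuffer:
    """Write path sketch: rows land in a memory buffer, buffers are
    checkpointed to small immutable files, and small files are compacted
    into large ones. All three tiers stay queryable throughout."""

    def __init__(self, checkpoint_rows=4, compact_after=3):
        self.buffer, self.small_files, self.large_files = [], [], []
        self.checkpoint_rows, self.compact_after = checkpoint_rows, compact_after

    def ingest(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.checkpoint_rows:      # checkpoint to a file
            self.small_files.append(list(self.buffer))
            self.buffer.clear()
        if len(self.small_files) >= self.compact_after:   # compact small files
            self.large_files.append([r for f in self.small_files for r in f])
            self.small_files.clear()

    def scan(self):
        """Queries see buffers + small files + large compacted files."""
        yield from self.buffer
        for f in self.small_files + self.large_files:
            yield from f

ib = IngestionBuffer()
for i in range(13):
    ib.ingest({"event": i})
print(sum(1 for _ in ib.scan()))  # 13: every ingested row is servable
```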
TPC-H queries*
* Run of TPC-H queries on:
● Static Artus data
● A large shared instance
● Manually optimized queries
Geomean: 9.6 seconds
Questions?

Editor's Notes
1. Hi everyone, I'm Aniket Mokashi, a tech lead at Google. Let me give you a little background about myself. I work on YouTube's Data team, where I primarily contribute to Procella, which is what this talk is about. In addition, I work on the YouTube Analytics backend and on a variety of projects under the YouTube Data Warehouse. Prior to Google, I worked on data infra teams at Twitter and Netflix. I'm an Apache Parquet PMC member; I've contributed a few encodings and the Pig integration to Parquet, and I was responsible for rolling out the first production use of Parquet at Twitter. I'm also an Apache Pig PMC member; my main contributions there have been support for UDFs in scripting languages, native MapReduce, scalar value support, auto local mode, and jar caching. I enjoy working on large-scale data processing systems, and in this talk I will cover Procella, a versatile analytics engine we've built at YouTube to serve most of our analytics needs.
2. We will cover a number of topics in this talk. First, we will look at products at YouTube and Google that are powered by Procella. Next, we will explore the variety of use cases enabled by Procella and discuss the features required to support those use cases. After that, I will give you an overview of Procella's architecture. Lastly, we will spend some time on a deep dive into some of the key features that make Procella work so well. In the interest of time, I will take questions at the end.
  3. Let’s look at some products that are powered by Procella. But, before we get started, let me ask you all a few questions: So, raise of hands, How many of you use YouTube regularly like at least once a week… Perfect - that makes all of you. How many of you are Youtube Creators - that is you have uploaded at least one video to Youtube? Those of you who haven’t done that yet, I will highly encourage you to try it out. Especially to get rich analytics... :) How many of you have used youtube analytics for tracking their videos or at least seen youtube analytics before? Excellent. So, Youtube Analytics is this amazingly rich analytics dashboard that lets Youtube content creators or content owners analyze usage and popularity of their videos and channels on the Youtube platform. It lets them track all of success metrics such as number of views, amount of watch time, revenue from monetized videos across various dimensions such as demographics, geo, devices, playback location etc. We also have a mobile application, that you see on the right, called creator studio which has similar details. Some of this information is also available in real-time. Three years back, we started building Procella primarily to serve as a backend for Youtube analytics which is often referred as YTA. We wanted to make sure that backend of YTA will continue to scale horizontally as Youtube grows and as we add more features to the product. We had a few design goals in mind when we started working on Procella. First, we wanted to provide widely understood SQL interface so that navigating through this data is easy and integrating various frontends with the backend was less cumbersome. Second, we also wanted to provide flexibility to extend the product without a lot of involvement of the backend team, so we wanted to build as generic product as possible. We launched Procella as a backend for Youtube Analytics about 2 and half years ago. Since then, we have been able to make several changes to the the product seamlessly. For example, recently, we started showing impressions and impressions CTR in YTA and it required almost no involvement of the Procella team.
4. In addition to YouTube Analytics, over the last few years Procella has come to power several other internal and external facing analytics dashboards and products. One example is the YouTube Artists product that you can see on the slide. It lets everyone track the popularity of artists and their music videos in a region on the YouTube platform. If you haven't already, I recommend checking it out.
5. A few other fairly large products at Google are powered by Procella. For example, the Firebase Performance Console lets app developers track various growth- and success-related metrics for apps integrated with the Firebase platform, and the Google Play Console lets Android application developers track the performance and popularity of their apps. The various metric numbers, like subscriptions, likes, and dislikes, that you see on the youtube.com website or app are now also powered by Procella. Given the popularity of YouTube, as you can imagine, this was a significant achievement in terms of scale. Later in the talk, I will describe the system properties that allowed us to achieve it.
6. Now that you have looked at these interesting products that are using Procella, let's look at Procella itself. What is Procella? Procella is a widely used SQL query engine at Google. It supports most of the standard SQL functionality, including joins, set operations, and analytic functions. It also supports queries on top of complex or structured data types like arrays, structs, and maps. What makes it unique is that it's built to be super fast at scale: most of the queries running on Procella have sub-second latencies. It's highly scalable, working on petabytes of data across thousands of tables storing trillions of rows. It can handle a high QPS of queries: millions of QPS of point lookup queries at millisecond latencies, thousands of QPS of dashboard queries at sub-second latencies, and hundreds of ad-hoc queries running in seconds. It is also designed to serve data ingested in real time, supporting millions of QPS of row ingestion. It's easy to use, with a SQL interface that supports DDL statements to create tables and partitions. It also supports virtual tables, which are used to hide the complexity of the data model required to serve a use case. Procella was developed to be compatible with existing tools at Google so that it can be adopted easily: it exposes the query interface of Dremel, a popular query system at Google, and it can process files in the Capacitor file format, Dremel's native file format.
7. Procella is a composable and versatile system. It powers a variety of use cases, ranging from external facing metrics reporting applications to data pipelines. Let's go through these use cases one by one to understand the motivations behind the architecture and the various features in Procella. External reporting applications are public facing products like YouTube Analytics, the Firebase Performance Console, and the Google Play Console that I showed before. Being public facing, these applications have high QPS and blink-of-an-eye latency requirements. These products typically surface over a hundred metrics, sliced and diced by a few dozen dimensions, for a given entity or customer. Having a SQL interface instead of an API makes it easy to build the backends and frontends independently. These applications usually work on large datasets. Precomputing all the metric numbers required to power these dashboards is expensive or sometimes even infeasible, so the ability to perform fast aggregations of these metrics on the fly is important. To enable these applications, Procella provides efficient indexing and tablet pruning functionality, as most of these queries need to process a small number of rows out of a very large amount of data. These applications also need the ability to ingest real-time data to enable fast insights, and in most cases the ability to stitch between real-time and batch datasets using a lambda architecture is crucial. These applications can leverage the temporal locality of data, so caching at different layers helps performance.
8. Another use case is internal reporting and dashboards. Some examples of internal reporting are product dashboards, experiment analysis dashboards, and custom reports. Internal reporting has much lower QPS and relatively less aggressive latency requirements compared to external reporting. However, internal dashboards typically work on much larger and more complex datasets, so data scan efficiency and the ability to handle complex data types are important for serving their queries. To enable these use cases, we have developed features such as fast optimized data formats and a vectorized evaluation library.
9. Real-time time-series analysis and monitoring is another use case. YTA provides insights in real time, and we also track YouTube site health using Procella. These applications require native support for the time dimension, especially to filter on different time boundaries like the last 60 minutes or the last 2 days.
10. As I mentioned previously, using Procella to power metrics on various pages of the YouTube platform is one of the most exciting use cases. YouTube gets millions of activity updates per second, so this requires reliable real-time ingestion of millions of rows per second. This is mainly done through pubsub, a persistent message queue. Updates are replicated to multiple ingestion servers for reliability. On the serving side, this use case requires the ability to serve millions of simple point lookup queries per second. These queries take only a few milliseconds to run on the ingested data. To make this possible, we have heavily optimized the query serving path with additional features that can look up and scan the required data efficiently.
11. Ad-hoc analytics essentially covers interactive querying at a low QPS. Data scientists and analysts make use of tables stored in Procella to identify trends, growth factors, and other important custom business metrics. This typically requires querying complex data types like arrays and nested structs, and the ability to join arbitrary datasets. We support a number of join strategies required for these queries to work. In particular, we support broadcast joins, which broadcast the right side of the join to all the leaf nodes so that they can construct local hash maps for lookups during the join. We support remote lookup joins that leverage our lookup-friendly data format. We support pipelined joins that run in stages: they first compute a small amount of join data from the right side so that it can be used as a filter on the left side in subsequent stages. Shuffled joins are essentially reduce-side joins that can scale to arbitrarily large datasets. Lastly, co-partitioned joins leverage the partitioning of the left and right sides to efficiently merge them during the join.
12. Supporting SQL-based data pipelines requires the ability to handle large amounts of data and process it with complex business logic. These are typically multi-stage queries that join various data sources, and these joins and aggregations require large shuffles. For logic that cannot be expressed in SQL, UDF support is necessary. To enable this use case, we have a pluggable shuffle component and we leverage the efficient RDMA shuffle service available across Google. We have also implemented an adaptive cost-based optimizer that can optimize parts of these queries on the fly.
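As a toy illustration of adaptive optimization, not Procella’s actual planner, here is the shape of a runtime decision the optimizer might make: once the right side’s size is observed, pick a broadcast join when it is small and fall back to a shuffled join otherwise. The threshold and flags are invented.

```python
# Toy sketch of an adaptive join-strategy choice (threshold is assumed).
BROADCAST_LIMIT_ROWS = 100_000

def choose_join_strategy(right_side_rows, right_is_lookup_indexed=False):
    if right_side_rows <= BROADCAST_LIMIT_ROWS:
        return "broadcast"   # small build side: ship it to every leaf
    if right_is_lookup_indexed:
        return "lookup"      # probe the lookup-friendly format remotely
    return "shuffled"        # scalable reduce-side join as the fallback

print(choose_join_strategy(10_000))       # -> broadcast
print(choose_join_strategy(10_000_000))   # -> shuffled
```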
13. All of the categories of use cases I mentioned so far are powered by Procella in production at YouTube. Procella’s unique scalable, composable architecture makes that possible. Let’s dive into the architecture now. This diagram roughly shows the architecture of Procella. All rectangular boxes in the diagram are essentially independent services in the Procella query engine.

Towards your left are the metadata server, the metadata store and the registration server. Users register their tables with Procella using DDL commands like CREATE/ALTER TABLE or programmatically using RPCs. This defines the schemas of the tables. Then users set up upstream batch processes to periodically create partitions of datasets. These datasets are stored on Google’s distributed file system, Colossus. After datasets are generated, users register them with Procella using ALTER TABLE commands or programmatically with RPCs. The datasets are typically stored in columnar file formats such as Capacitor or Artus. Each dataset consists of many blocks of rows called tablets; a tablet typically has tens of thousands to millions of rows. Data is generally range or hash partitioned to achieve clustering by the columns that are most frequently filtered on. During data registration, the registration server extracts metadata for all tablets, such as stats, zone maps, dictionaries and bloom filters, and stores it in the metadata store in a highly compact and organized way.

In the query path, shown on your right with yellow arrows, dashboard clients or human users send their SQL queries as strings. These are compiled into distributed multi-level query plans to be executed on the data servers. The root server is responsible for compiling the query and coordinating its execution; it also runs the last stage of query execution before returning results to the users. The metadata server is primarily responsible for pruning tablets based on the partition metadata in the metadata store. To make this efficient, metadata is held in an LRU cache on the metadata servers. We use zone maps, dictionaries and bloom filters to prune down to the tablets required for the query. Once the tablets to be processed are identified, the distributed query plan is executed on the data servers for those tablets. Data servers load the columns required for query execution on demand and cache them in large local caches on every data server. Data shuffles between stages of the query are handled by the remote RDMA shuffle service.

In real-time ingestion, shown with blue arrows, data producers send rows of data directly via RPC or through a persistent pubsub queue. Based on the pre-configured partitioning on selected columns, each row is ingested into two ingestion servers, a primary and a secondary. This data is held in lookup-efficient memory buffers and periodically checkpointed to Colossus. The checkpointed files are eventually compacted or aggregated into larger files for query efficiency. Data ingested into Procella is servable from the moment it enters the system, so queries on real-time tables are served from the buffers, the small files and the large compacted files. The lifecycle of these buffers, small files and large files is coordinated with transactions on the metadata store via the registration server.
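To make the registration step concrete, here is a minimal sketch of the kind of per-tablet metadata extraction described above: computing per-column zone maps (min/max) that the metadata server can later use for pruning. The row/tablet representation is invented for illustration; the real extraction also builds dictionaries and bloom filters.

```python
# Illustrative sketch: compute per-column zone maps for one tablet at
# registration time (data layout is assumed, not Procella's).
def build_zone_maps(tablet_rows):
    """tablet_rows: list of row dicts. Returns {column: (min, max)}."""
    zone_maps = {}
    for row in tablet_rows:
        for col, val in row.items():
            lo, hi = zone_maps.get(col, (val, val))
            zone_maps[col] = (min(lo, val), max(hi, val))
    return zone_maps

tablet = [{"date": 20180601, "views": 3}, {"date": 20180602, "views": 9}]
print(build_zone_maps(tablet))  # {'date': (20180601, 20180602), 'views': (3, 9)}
```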
14. Now that we have looked at the architecture, let’s look at some scale and latency numbers. These are numbers from one of our instances serving analytics on YouTube. On this instance, we execute over 450 million queries per day, of which over 200 million are queries on real-time datasets. This results in over 90 billion leaf plans that run on the data servers. In total, this scans over 20 quadrillion rows per day and returns over 100 billion rows. We can achieve a peak scan rate of over 100 billion rows per second. This instance has over 100 tables with more than 200 columns.
15. Let’s look at the latency numbers for our analytics instance. As you can see, our queries have a median end-to-end latency of just 25 ms. Our slowest queries take about 3 seconds, likely because of query size and cache misses. The median query scans about 17 million rows.
16. On range-partitioned data, the Procella metadata server can prune a large number of tablets from the scan. It does this by using efficient data structures to store tablet boundaries in memory and then binary searching through them to identify the tablets that satisfy a given filter clause.
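Here is a minimal sketch of that binary search, assuming contiguous tablets whose sorted lower bounds are held in memory; a filter like `key BETWEEN lo AND hi` then maps to two binary searches. The layout is illustrative, not Procella’s exact structure.

```python
# Illustrative sketch: prune range-partitioned tablets with binary search.
import bisect

def prune_tablets(boundaries, lo, hi):
    """boundaries[i] is the inclusive lower bound of tablet i; tablets are
    contiguous, so tablet i covers [boundaries[i], boundaries[i+1])."""
    first = bisect.bisect_right(boundaries, lo) - 1  # tablet containing lo
    last = bisect.bisect_right(boundaries, hi) - 1   # tablet containing hi
    return range(max(first, 0), last + 1)

boundaries = [0, 100, 200, 300]                   # four tablets
print(list(prune_tablets(boundaries, 150, 250)))  # -> [1, 2]
```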
17. Now, let’s deep dive into some of the features that make Procella work so well. These few features are essentially the secret sauce that makes Procella so fast and efficient.

Stateful caching: for fast and efficient processing, we cache at various layers in Procella. On the metadata server, constraints, zone maps and other indexes are cached in an LRU cache to avoid going to the metadata store. These are stored in a versioned cache so that invalidation is easy. As I mentioned before, data processed by Procella is stored remotely on Colossus, the distributed file system. To allow efficient fetches, we cache file handles, which essentially hold the location of the remote Colossus server and metadata such as the number, size and location of chunks. We also cache column-level metadata, including the sizes of the columns and their offsets in the file. In addition, in a separate cache, we store the blocks of columns required during query processing, using a smart age- and cost-aware caching policy. Data segments have data-server affinity to achieve a high cache hit rate: any data segment is loaded and processed by only two data servers, a primary and a secondary for that segment. Any request to process a data segment first goes to the primary, and if the primary does not respond fast enough, the secondary handles the request. In addition, we support expression caching, which allows users to cache expensive expressions in the query.

Now, let’s talk about tail-latency reduction. To reduce tail latency, we use speculative execution for data server RPCs. During execution of any query, after 90% of the query execution completes, we send backup RPCs to secondary data servers for the remaining 10% of incomplete RPCs. We also send mini-batch backup RPCs at regular intervals during query execution: that is, one backup RPC is sent for the slowest RPC in every set of n RPCs.
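Here is a toy sketch of the backup-RPC (hedged request) pattern: give the primary a head start, and only fan out to the secondary if it has not answered in time. The timeouts and server stubs are invented; Procella’s triggers are the completion-percentage and mini-batch rules described above.

```python
# Toy sketch of a hedged request: primary first, secondary as backup.
import asyncio

async def call_server(name, delay, payload):
    await asyncio.sleep(delay)  # stand-in for a real data-server RPC
    return f"{name} handled {payload}"

async def hedged_request(payload, hedge_after=0.05):
    primary = asyncio.ensure_future(call_server("primary", 0.2, payload))
    try:
        # shield() keeps the primary running if the head start expires.
        return await asyncio.wait_for(asyncio.shield(primary), hedge_after)
    except asyncio.TimeoutError:
        secondary = asyncio.ensure_future(call_server("secondary", 0.01, payload))
        done, pending = await asyncio.wait(
            {primary, secondary}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # drop the loser
        return done.pop().result()

print(asyncio.run(hedged_request("scan tablet 42")))
```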
18. Indexing is one of the things that makes Procella architecturally different from other query engines. Procella uses a separate metadata server to store indexing information and performs a separate, efficient pass over it before query execution to prune and select tablets. Procella uses dynamic range partitioning, for both batch and real-time tables, to achieve uniform distribution of data across tablets. Dictionary pruning is helpful when there is a relatively small number of distinct values compared to the number of rows, and bloom filters help significantly for needle-in-the-haystack queries, for example getting metrics for less popular videos, like views for the videos on my channel over the last 28 days. Our query execution is fully distributed and massively parallel, so we can leverage the power of all the data servers during the execution of a query. We also have a vectorized evaluation engine called Superluminal that works on blocks of values and leverages modern CPU instructions during evaluation. Superluminal is a columnar evaluation library that natively understands and leverages RLE and dictionary encodings to make group-by and filtering operations much cheaper to execute.
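To illustrate why dictionary awareness makes filtering cheap, here is a minimal sketch, with invented data, of evaluating a predicate once per distinct dictionary entry instead of once per row; selecting rows then reduces to cheap integer comparisons. This shows the general technique, not Superluminal’s actual internals.

```python
# Illustrative sketch: filtering a dictionary-encoded column.
def filter_dictionary_encoded(codes, dictionary, predicate):
    # Step 1: one predicate evaluation per distinct value, not per row.
    matching = {i for i, value in enumerate(dictionary) if predicate(value)}
    # Step 2: a tight loop over small integer codes.
    return [row for row, code in enumerate(codes) if code in matching]

dictionary = ["US", "IN", "BR", "JP"]  # distinct values in the column
codes = [0, 1, 1, 2, 0, 3, 1]          # per-row dictionary indexes
print(filter_dictionary_encoded(codes, dictionary, lambda c: c == "IN"))
# -> rows [1, 2, 6]
```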
19. Procella supports querying data stored in two columnar file formats: Capacitor and Artus. Capacitor is Dremel’s native file format, which is widely used at Google. It supports RLE- and dictionary-encoded columns and can store bloom filters for cheaply checking whether a value exists in a column. We have also developed another file format, Artus, which, in addition to the features in Capacitor, provides efficient lookup and seek APIs on columns to make Procella’s evaluation significantly more efficient. In Artus, we use multi-pass adaptive encoding to select the best encoding for each column. Artus gets its space savings from these adaptive encodings rather than from the generalized compression used by other file formats, which avoids the cost of fully decompressing columns. It allows lookups using binary search and O(1) seek APIs to move between rows. It also simplifies the handling of complex data types by using a length-based representation for arrays instead of repetition and definition levels. Artus exposes dictionary and RLE information to the execution engine so that operations like filtering and group-bys can be made more efficient.
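As a small illustration of operating directly on RLE data, here is a sketch of a count aggregation where each run contributes its length in a single step, with no per-row decoding. The (value, run_length) layout is assumed for the example; Artus’s actual encodings are more sophisticated.

```python
# Illustrative sketch: RLE-aware count aggregation over an encoded column.
from collections import Counter

def count_by_value_rle(runs):
    """runs: list of (value, run_length) pairs for an RLE-encoded column."""
    counts = Counter()
    for value, run_length in runs:
        counts[value] += run_length  # one addition covers the whole run
    return counts

runs = [("US", 1_000_000), ("IN", 500_000), ("US", 250_000)]
print(count_by_value_rle(runs))  # Counter({'US': 1250000, 'IN': 500000})
```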
20. We’ve already talked about real-time ingestion, but one more thing to add is that once the data is persisted to files, queries on those files are served by the data servers; ingestion servers only manage the memory buffers and the queries on them. Lastly, Procella supports virtual tables, which serve several purposes. First, to simplify development and allow flexibility, they enable efficient aggregate selection: users can write their queries to select metrics and dimensions from the virtual table without knowing which physical aggregates contain them, and Procella will rewrite the query against the most efficient aggregate that can correctly serve it. This also allows backend teams to add necessary aggregates and change backend data models independently, without modifying any incoming queries. Second, virtual tables provide automatic stitching between batch and real-time tables: for pipelines using a lambda architecture, as the batch-processed data arrives, queries automatically add the necessary filters on real-time data to show a stitched view of the datasets. And lastly, virtual tables are normalization friendly: they present a denormalized view of the tables to the users and add the necessary joins during the rewrite to perform denormalization on the fly. I haven’t covered all the features here, but these key features are primarily responsible for making Procella a success.
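Here is a toy sketch of the aggregate-selection idea: given the dimensions a query needs, pick the smallest registered physical aggregate whose dimension set covers them. The table names, sizes and selection rule are hypothetical; the real rewrite also has to check metric availability and freshness.

```python
# Toy sketch of aggregate selection for a virtual table (tables assumed).
AGGREGATES = [
    {"name": "daily_by_video",         "dims": {"date", "video_id"},            "rows": 10**9},
    {"name": "daily_by_video_country", "dims": {"date", "video_id", "country"}, "rows": 10**11},
]

def select_aggregate(query_dims):
    # Keep only aggregates whose dimensions cover the query's dimensions.
    candidates = [a for a in AGGREGATES if query_dims <= a["dims"]]
    if not candidates:
        raise ValueError("no physical aggregate can serve this query")
    return min(candidates, key=lambda a: a["rows"])  # cheapest covering table

print(select_aggregate({"date", "video_id"})["name"])  # -> daily_by_video
```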
21. Here are some results of running TPC-H benchmark queries on Procella. As you can see, using the techniques discussed before, we are able to achieve some of the best results for this benchmark compared to other query engines. Note that these are slightly older numbers from running queries on a static Artus TPC-H dataset, using a large instance, with queries manually rewritten to add join hints, etc.
22. With that, let me conclude my talk by summarizing what we learned. In this talk, we learned about Procella, a composable, scalable and versatile query engine we have built at YouTube. We learned about its architecture and what makes it work so well. If this is something that excites you, feel free to come talk to me later to learn about opportunities. I can take questions now. But before that, I would like to mention that I will not be sharing any numbers other than the ones mentioned on the slides in this talk. Also, I will not comment on comparisons of Procella with any other existing Google products.