SlideShare ist ein Scribd-Unternehmen logo
1 von 49
WHOOPS, THE NUMBERS
ARE WRONG!
SCALING DATA QUALITY
@
MICHELLE UFFORD
DATA ENGINEERING & ANALYTICS, NETFLIX
HADOOP SUMMIT 2017
Overview.
The business.
20170612
100+ million
members
$6 billion
on content
125+ million
hours watched
launched
in 1997
every. day.
$
Anytime. Anywhere.*
20170612
* Well, almost anywhere.
Any device.
20170612
300 terabyte
DW writes
5 petabyte
DW reads
The data.
20170612
60+ petabyte
data warehouse
700+ billion
events written
300 terabyte
DW writes
5 petabyte
DW reads
The data.
20170612
60+ petabyte
data warehouse
700+ billion
events written
20170612
data access
AWS
S3
Amazon
Redshift
data processing
fast storage data viz
METACA
T
data services
events data
operational data
elastic storage
Apache Pig
Big Data Platform
Data Quality.
Federated metastore &
extensible data catalog
20170612
Metacat
Federated Metastore
s3://…/dw/fact_table_f/utc_date=20170101/batchid=1483229855
…
s3://…/dw/fact_table_f/utc_date=20170611/batchid=1497226702
s3://…/dw/fact_table_f/utc_date=20170612/batchid=1497312541
dw.fact_table_f
20170612
Metacat
Federated Metastore
s3://…/dw/fact_table_f/utc_date=20170101/batchid=1483229855
…
s3://…/dw/fact_table_f/utc_date=20170611/batchid=1497226702
s3://…/dw/fact_table_f/utc_date=20170612/batchid=1497312541
dw.fact_table_f
utc_date=20170101
utc_date=20170611
utc_date=20170612
…
20170612
Metacat
Federated Metastore
utc_date=20170101
20170612
Metacat
Federated Metastore
utc_date=20170101
20170612
Extended table attributes
● primary key(s)
● column types
● lifecycle
● audience
● “valid-thru” timestamp
● … and much more
Metacat
Federated Metastore
Data Quality Service.
20170612
Quinto
Data Quality Service
20170612
Quinto
Data Quality Service
20170612
Quinto
Data Quality Service
20170612
Quinto
Data Quality Service
Write - Audit - Publish
ETL pattern
for high-quality
big data jobs
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
dw.my_table_f
WAP
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
dw.my_table_f audit.my_table_f_1497312000
WAPStage-0: Prep
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
s3://…/utc_date=20170612/batchid=1497312541
WAPStage-1: Write
audit.my_table_f_1497312000dw.my_table_f
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
WAPStage-1: Write
audit.my_table_f_1497312000
$TABLE
dw.my_table_f
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_f
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_f
Quint
o
metric eval behavior result
--------------------------------------------------
RowCount >= zero fail job
RowCount >= prior value fail job
NullCount normal dist warn job
Quinto configuration
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_f
Quint
o
metric eval behavior result
--------------------------------------------------
RowCount >= zero fail job
RowCount >= prior value fail job
NullCount normal dist warn job
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_f
Quint
o
metric eval behavior result
--------------------------------------------------
RowCount >= zero fail job
RowCount >= prior value fail job
NullCount normal dist warn job
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job
NullCount normal dist warn job
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job
NullCount normal dist warn job
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job
NullCount normal dist warn job
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job pass
NullCount normal dist warn job
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job pass
NullCount normal dist warn job
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job pass
NullCount normal dist warn job
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job pass
NullCount normal dist warn job fail
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
…
WAPStage-2: Audit
audit.my_table_f_1497312000dw.my_table_fmetric eval behavior result
--------------------------------------------------
RowCount >= zero fail job pass
RowCount >= prior value fail job pass
NullCount normal dist warn job fail
Quint
o
utc_date=20170612
com.netflix.dse.mds.metric.RowCount: 17240
com.netflix.dse.mds.metric.NullCount: 17240
...
utc_date=20170611
com.netflix.dse.mds.metric.RowCount: 16135
com.netflix.dse.mds.metric.NullCount: 21
...
Quinto configuration
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
WAPStage-3: Publish
audit.my_table_f_1497312000dw.my_table_f
s3://…/utc_date=20170612/batchid=1497312541
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
WAPStage-3: Publish
audit.my_table_f_1497312000dw.my_table_f
s3://…/utc_date=20170612/batchid=1497312541
ETL Pattern
20170612
s3://…/utc_date=20170101/batchid=1483229855
…
s3://…/utc_date=20170611/batchid=1497226702
WAPStage-3: Publish
s3://…/utc_date=20170612/batchid=1497312541
dw.my_table_f
valid_thru_ts = 20170613 00:00:00
ETL Pattern
20170612
Quinto evaluations
● intelligent recommendations
● multiple tiers of coverage
● configurable rules
Jumpstarter.
Python Library
WAP.
Python Library
Minimal requirements
● parameterized destination table
WAP.
Python Library
Running WAP.
20170612
What’s Next.
● additional Metacat statistics
● robust anomaly detection (RAD)
● complete migration for all prod tables
20170612
Tips & Lessons Learned.
● Query-based solution may be “good enough” for many.
● Not all tables need quality coverage.
● One size rarely fits all tables.
● Build components, not “all-or-nothing” frameworks.
MICHELLE UFFORD
mufford@netflix.com
twitter.com/MichelleUfford
DATA
techblog.netflix.com
medium.com/netflix-techblog
twitter.com/NetflixData
tinyurl.com/NetflixData
Thank you!
WE’RE HIRING! jobs.netflix.com

Weitere ähnliche Inhalte

Was ist angesagt?

Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기NAVER D2
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Shirshanka Das
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)SANG WON PARK
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeDatabricks
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELKGeert Pante
 
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기AWSKRUG - AWS한국사용자모임
 
An Approach to Data Quality for Netflix Personalization Systems
An Approach to Data Quality for Netflix Personalization SystemsAn Approach to Data Quality for Netflix Personalization Systems
An Approach to Data Quality for Netflix Personalization SystemsDatabricks
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashKeeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashAmazon Web Services
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastDatabricks
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarDatabricks
 

Was ist angesagt? (20)

Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
All about InfluxDB.
All about InfluxDB.All about InfluxDB.
All about InfluxDB.
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
 
An Approach to Data Quality for Netflix Personalization Systems
An Approach to Data Quality for Netflix Personalization SystemsAn Approach to Data Quality for Netflix Personalization Systems
An Approach to Data Quality for Netflix Personalization Systems
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashKeeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
 

Ähnlich wie Scaling Data Quality @ Netflix

SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthSRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthAmazon Web Services
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSAmazon Web Services
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateRajit Saha
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Amazon Web Services
 
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub ImplementationOracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub ImplementationVengata Guruswamy
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC OsloDavid Pilato
 
Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analy...
Rakuten Technology Conference 2017 A Distributed SQL Database  For Data Analy...Rakuten Technology Conference 2017 A Distributed SQL Database  For Data Analy...
Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analy...Rakuten Group, Inc.
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase DataWorks Summit
 
Applying AI to Performance Engineering: Shift-Left, Shift-Right, Self-Healing
Applying AI to Performance Engineering: Shift-Left, Shift-Right, Self-HealingApplying AI to Performance Engineering: Shift-Left, Shift-Right, Self-Healing
Applying AI to Performance Engineering: Shift-Left, Shift-Right, Self-HealingAndreas Grabner
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code EuropeDavid Pilato
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAPEDB
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsLuke Han
 
GPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data WarehouseGPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data WarehouseAmazon Web Services
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformEva Tse
 
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward
 
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Labs
 
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014Amazon Web Services
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAmazon Web Services
 

Ähnlich wie Scaling Data Quality @ Netflix (20)

SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthSRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWS
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGate
 
SQREAM DB on IBM Power9
SQREAM DB on IBM Power9SQREAM DB on IBM Power9
SQREAM DB on IBM Power9
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub ImplementationOracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub Implementation
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 
Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analy...
Rakuten Technology Conference 2017 A Distributed SQL Database  For Data Analy...Rakuten Technology Conference 2017 A Distributed SQL Database  For Data Analy...
Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analy...
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
 
Applying AI to Performance Engineering: Shift-Left, Shift-Right, Self-Healing
Applying AI to Performance Engineering: Shift-Left, Shift-Right, Self-HealingApplying AI to Performance Engineering: Shift-Left, Shift-Right, Self-Healing
Applying AI to Performance Engineering: Shift-Left, Shift-Right, Self-Healing
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code Europe
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics Products
 
GPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data WarehouseGPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data Warehouse
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
 
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
 
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
 

Mehr von Michelle Ufford

JupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter NotebooksJupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter NotebooksMichelle Ufford
 
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]Michelle Ufford
 
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksNotebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksMichelle Ufford
 
Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017Michelle Ufford
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopMichelle Ufford
 

Mehr von Michelle Ufford (6)

JupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter NotebooksJupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
 
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
 
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksNotebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
 
Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for Hadoop
 
Data-Driven @ Netflix
Data-Driven @ NetflixData-Driven @ Netflix
Data-Driven @ Netflix
 

Kürzlich hochgeladen

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Kürzlich hochgeladen (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Scaling Data Quality @ Netflix

Hinweis der Redaktion

  1. https://dataworkssummit.com/san-jose-2017/sessions/whoops-the-numbers-are-wrong-scaling-data-quality-netflix/ WHOOPS, THE NUMBERS ARE WRONG! SCALING DATA QUALITY @ NETFLIX Netflix is a famously data-driven company. Data is used to make informed decisions on everything from content acquisition to content delivery, and everything in-between. As with any data-driven company, it’s critical that data used by the business is accurate. Or, at worst, that the business has visibility into potential quality issues as soon as they arise. But even in the most mature data warehouses, data quality can be hard. How can we ensure high quality in a cloud-based, internet-scale, modern big data warehouse employing a variety of data engineering technologies? In this talk, Michelle Ufford will share how the Data Engineering & Analytics team at Netflix is doing exactly that. We’ll kick things off with a quick overview of Netflix’s analytics environment, then dig into details of our data quality solution. We’ll cover what worked, what didn’t work so well, and what we plan to work on next. We’ll conclude with some tips and lessons learned for ensuring data quality on big data. DETAILS This session is a (Intermediate) talk in our Data Processing and Warehousing track. It focuses on Apache Hadoop, Apache Hive, Apache Pig, Apache Spark and is geared towards Architect, Data Analyst, Developer / Engineer, Operations / IT audiences.