Wrestling Large Data Volumes to the Ground. Daniel Austin, Yahoo! Exceptional Performance. March 24, 2011, Large-Scale Production Engineering Meetup
Agenda: A Boy and His Prototype
Project X: Hi Performance, Low Budget
Business Intelligence in Three Acts: Data Collection, Data Storage, Data Analysis (out of scope)
Data Collection: Tools – Gomez Last Mile
Data Collection: Tools – Talend Open Studio
Data Storage: Tools – Infobright MySQL Engine
Intro to Data Products
How to design ETL chains?
Diagram: ETL for Data Validation (1. Syntactic Validation Step, 2. Semantic Validation Step)
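The diagram itself did not survive extraction, but the two named steps can be sketched as follows. This is a minimal illustration only, not the Talend job from the talk: the record layout (`timestamp,url,status,response_ms`), the `Measurement` class, and the range limits are all hypothetical assumptions chosen to show a syntactic check (is the record well-formed?) followed by a semantic check (do the values make sense?).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical raw record layout: "timestamp,url,status,response_ms"
FIELD_COUNT = 4


@dataclass
class Measurement:
    timestamp: datetime
    url: str
    status: int
    response_ms: float


def syntactic_check(line: str) -> Optional[Measurement]:
    """Step 1: is the record well-formed (field count, parseable types)?"""
    parts = line.strip().split(",")
    if len(parts) != FIELD_COUNT:
        return None
    try:
        return Measurement(
            timestamp=datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S"),
            url=parts[1],
            status=int(parts[2]),
            response_ms=float(parts[3]),
        )
    except ValueError:
        return None


def semantic_check(m: Measurement) -> bool:
    """Step 2: do the parsed values make sense for an HTTP measurement?"""
    return (
        m.url.startswith(("http://", "https://"))
        and 100 <= m.status <= 599
        and 0.0 <= m.response_ms < 600_000  # assumed upper bound: 10 minutes
    )


def validate(lines):
    """Run both steps in order; return (accepted, rejected) lists."""
    accepted, rejected = [], []
    for line in lines:
        record = syntactic_check(line)
        if record is not None and semantic_check(record):
            accepted.append(record)
        else:
            rejected.append(line)
    return accepted, rejected
```

Running the cheap syntactic step first means the semantic step only ever sees typed values, and rejected lines are kept verbatim so they can be inspected or reprocessed later.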
Simple 3NF Level 1 Data Model for HTTP
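The slide's actual schema is not recoverable from this transcript. As a purely illustrative sketch of what a small normalized model for HTTP measurements might look like, with dates factored into their own table in the spirit of the "Normalized Dates and Times" editor's note, the example below uses Python's built-in `sqlite3` module rather than Infobright. Every table and column name here (`dates`, `urls`, `measurements`, `seconds_of_day`, and so on) is a hypothetical assumption.

```python
import sqlite3

# Hypothetical schema: each date and each URL is stored exactly once,
# and measurement rows refer to them by key.
SCHEMA = """
CREATE TABLE dates (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL,
    month   INTEGER NOT NULL,
    day     INTEGER NOT NULL,
    UNIQUE (year, month, day)
);
CREATE TABLE urls (
    url_id INTEGER PRIMARY KEY,
    url    TEXT NOT NULL UNIQUE
);
CREATE TABLE measurements (
    measurement_id INTEGER PRIMARY KEY,
    date_id        INTEGER NOT NULL REFERENCES dates(date_id),
    seconds_of_day INTEGER NOT NULL,
    url_id         INTEGER NOT NULL REFERENCES urls(url_id),
    status         INTEGER NOT NULL,
    response_ms    REAL NOT NULL
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def date_id(conn, year, month, day) -> int:
    """Return the key for a date, inserting the row only if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO dates (year, month, day) VALUES (?, ?, ?)",
        (year, month, day),
    )
    return conn.execute(
        "SELECT date_id FROM dates WHERE year = ? AND month = ? AND day = ?",
        (year, month, day),
    ).fetchone()[0]


def url_id(conn, url) -> int:
    """Return the key for a URL, inserting the row only if it is new."""
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    return conn.execute(
        "SELECT url_id FROM urls WHERE url = ?", (url,)
    ).fetchone()[0]


def add_measurement(conn, ymd, seconds_of_day, url, status, response_ms):
    """Insert one measurement, reusing existing date and URL rows."""
    conn.execute(
        "INSERT INTO measurements "
        "(date_id, seconds_of_day, url_id, status, response_ms) "
        "VALUES (?, ?, ?, ?, ?)",
        (date_id(conn, *ymd), seconds_of_day, url_id(conn, url),
         status, response_ms),
    )
```

With this layout, two measurements taken on the same day against the same URL share one `dates` row and one `urls` row, so the measurement table carries only small integer keys plus the measured values.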
Level 1: The Boss Battle!
Some Best Practices
Project X: What We Learned
Endgame! Analysis in Near Real-Time
Thank You! Daniel Austin, Yahoo! Exceptional Performance, @daniel_b_austin. March 24, 2011, Large-Scale Production Engineering Meetup


Editor's Notes

  1. One other thing I forgot to mention: no budget
  2. Some very basic architectural building blocks
  3. Heresy! Normalized Dates and Times!