Adding tree and tree 
@avibryant
Brushfire: 
Distributed, 
Generic, 
Decision Tree Learning 
in Scala 
(using Hadoop) 
@avibryant 
Open source: Real Soon Now
Vun!
Two! 
+
Tree!
Do you like cookies? 
{height: 5, color: blue, wears: fur} ? 
{height: 7, color: yellow, wears: feathers} ? 
{height: 3, color: green, wears: garbage} ? 
{height: 5, color: yellow, wears: stripes} ? 
{height: 4, color: orange, wears: stripes} ?
Do you like cookies? 
color != blue color = blue
Does Cookie Monster like Cookies? 
color != blue color = blue
Is Cookie Monster Blue? 
color != blue color = blue
Cooooookie! 
color != blue color = blue 
cookie!
Do you like cookies? 
color != blue color = blue 
yuck ok 
cookie! 
wears != stripes 
wears = stripes
[Diagram: the same tree with each of its three leaves replaced by a generic target value T; the splits are on color (!= blue / = blue) and on wears (!= stripes / = stripes)]
Do you like cookies? 
How many cookies will you eat? 
What’s your favorite kind of cookie?
Bootstrap or k-fold? 
Chi-square or entropy? 
Wow! 
Classification or regression? 
Binary splits or multiway? 
Out-of-bag 
or out-of-time? 
One tree or 
many? 
Binary or multi-class?
trait Evaluator[V,T] 
trait Tree[V,T] 
trait Splitter[V,T] 
trait Error[T,E] 
Wow! 
Such types! 
case class Instance[V,T]
false true 
false 
true 
Binary classification
0.1 0.4 
0.0 
0.9 
Binary classification
T+T+T+T= 
T T 
T 
T 
T+T+T+T+T= 
T+T+T+T+T= T+T+T=
Binary classification
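The "T+T+T=" slides suggest that target values are combined by addition. Here is a minimal sketch of that idea in plain Scala, assuming for illustration that a binary-classification target T is a count per label. The names `TargetSum` and `plus` and the sample targets are hypothetical, not taken from Brushfire; a library such as Algebird provides this kind of map addition out of the box.

```scala
object TargetSum {
  // Assumption: the target T for binary classification is a count per label.
  type T = Map[Boolean, Long]

  // Add two targets key by key; a missing key counts as zero.
  def plus(a: T, b: T): T =
    (a.keySet ++ b.keySet).map { k =>
      k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))
    }.toMap

  // Hypothetical targets from four training instances.
  val targets: List[T] = List(
    Map(true -> 1L),
    Map(false -> 1L),
    Map(true -> 1L),
    Map(true -> 1L)
  )

  // T + T + T + T = one combined distribution.
  val total: T = targets.reduce(plus)

  // Fraction of positive labels, the kind of number a leaf could report.
  val positiveRate: Double =
    total.getOrElse(true, 0L).toDouble / total.values.sum

  def main(args: Array[String]): Unit = {
    println(total)
    println(positiveRate)
  }
}
```

Because `plus` is associative, the same sum can be computed in any grouping, which is what makes it a natural fit for a reduce step.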
Bigger (data) 
= Better (models) 
Generic != Fast 
“Why do you rob banks?”
Learning a tree in Scalding 
11 passes through the data 
21 MapReduce steps
Step 1/21 
T
{height: 5, color: blue, wears: fur} 
{height: 7, color: yellow, wears: feathers} 
{height: 3, color: green, wears: garbage} 
{height: 5, color: yellow, wears: stripes} 
{height: 4, color: orange, wears: stripes}
[Diagram: each Instance[V,T] is mapped to its target T (Map); the T values are then combined into a single T (Reduce)]
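Step 1 can be sketched on an in-memory collection: map every instance to its target, then reduce the targets with an addition. The `Instance` fields, the `rootTarget` helper, and the labels below are assumptions made for illustration. The slides name `Instance[V,T]` but do not show its definition, and the real job runs as Scalding MapReduce steps rather than on a local `Seq`.

```scala
object StepOne {
  // Hypothetical shape for the Instance[V,T] named on the slides.
  case class Instance[V, T](features: Map[String, V], target: T)

  // Map phase: keep only each instance's target.
  // Reduce phase: combine the targets with a caller-supplied addition.
  def rootTarget[V, T](instances: Seq[Instance[V, T]])(plus: (T, T) => T): T =
    instances.map(_.target).reduce(plus)

  // Hypothetical labels (1 = likes cookies); the slides show only the features.
  val instances: Seq[Instance[String, Long]] = Seq(
    Instance(Map("color" -> "blue", "wears" -> "fur"), 1L),
    Instance(Map("color" -> "yellow", "wears" -> "feathers"), 0L),
    Instance(Map("color" -> "green", "wears" -> "garbage"), 1L)
  )

  // The single T that would sit at the root of the one-node tree.
  val root: Long = rootTarget(instances)(_ + _)

  def main(args: Array[String]): Unit =
    println(root)
}
```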
color 
!= blue = blue 
T T 
color 
!= yellow = yellow 
T T 
height 
< 5 >= 5 
T T 
? 
Step 2/21
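The "?" on this slide asks which candidate split to use. One way to compare candidates is size-weighted entropy, which the talk lists earlier as one option alongside chi-square. The sketch below is an illustration only: the `Row` type, the labels, and the `score` helper are hypothetical, and nothing here is claimed to be how Brushfire scores splits.

```scala
object SplitChoice {
  // Hypothetical row: a color feature plus a made-up yes/no label.
  case class Row(color: String, likesCookies: Boolean)

  // Shannon entropy (in bits) of the labels in a group of rows.
  def entropy(rows: Seq[Row]): Double =
    if (rows.isEmpty) 0.0
    else {
      val n = rows.size.toDouble
      rows.groupBy(_.likesCookies).values.map { group =>
        val p = group.size / n
        -p * math.log(p) / math.log(2)
      }.sum
    }

  // Size-weighted entropy of the two sides of a split; lower is better.
  def score(rows: Seq[Row], predicate: Row => Boolean): Double = {
    val (yes, no) = rows.partition(predicate)
    (yes.size * entropy(yes) + no.size * entropy(no)) / rows.size
  }

  // Colors from the slides; the labels are invented for this example.
  val rows: Seq[Row] = Seq(
    Row("blue", true),
    Row("yellow", false),
    Row("green", false),
    Row("yellow", false),
    Row("orange", false)
  )

  val candidates: Seq[(String, Row => Boolean)] = Seq(
    "color = blue" -> ((r: Row) => r.color == "blue"),
    "color = yellow" -> ((r: Row) => r.color == "yellow")
  )

  // Pick the candidate whose two sides are purest.
  val best: String = candidates.minBy { case (_, p) => score(rows, p) }._1

  def main(args: Array[String]): Unit =
    println(best)
}
```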
[Diagram: the color column of the five instances: blue, yellow, green, yellow, orange]
Step 2/21 
[Diagram: each Instance[V,T] is mapped to an S value (Map), and the S values are then combined (Reduce)]
Other options: 
CountMinSketch 
QTree 
…
V => Boolean V => Boolean 
T T
V => Boolean V => Boolean 
T T 
T 
V => Boolean
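The diagram labels each branch with a predicate of type `V => Boolean` and each leaf with a target T. A minimal sketch of such a tree follows, assuming a hypothetical `Node` / `Branch` / `Leaf` encoding and a feature map keyed by name; none of these names come from Brushfire. The arrangement of the example tree is also an assumption, since the slides do not make clear which leaf hangs off which branch.

```scala
object PredicateTree {
  sealed trait Node[V, T]
  // A leaf carries the target value T.
  case class Leaf[V, T](target: T) extends Node[V, T]
  // A branch tests one feature with a V => Boolean predicate.
  case class Branch[V, T](
      feature: String,
      predicate: V => Boolean,
      ifTrue: Node[V, T],
      ifFalse: Node[V, T]
  ) extends Node[V, T]

  // Walk from the root to a leaf; None if a tested feature is missing.
  def targetFor[V, T](node: Node[V, T], features: Map[String, V]): Option[T] =
    node match {
      case Leaf(target) => Some(target)
      case Branch(feature, predicate, ifTrue, ifFalse) =>
        features.get(feature) match {
          case Some(v) => targetFor(if (predicate(v)) ifTrue else ifFalse, features)
          case None    => None
        }
    }

  // Assumed layout: blue goes straight to "cookie!", everything else
  // is split again on what it wears.
  val tree: Node[String, String] =
    Branch[String, String](
      "color",
      v => v == "blue",
      Leaf("cookie!"),
      Branch[String, String]("wears", v => v == "stripes", Leaf("ok"), Leaf("yuck"))
    )

  def main(args: Array[String]): Unit =
    println(targetFor(tree, Map("color" -> "blue", "wears" -> "fur")))
}
```

Keeping the predicate as a plain function leaves the tree generic in V: the same structure works for strings, numbers, or any other feature type.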
Step 3/21 
[Diagram: Instance[V,T] records are mapped to S values; the S values are combined, and Split[V,T] values are produced from them]
Instance[V,T] 
Instance[V,T] 
Instance[V,T]
Instance[V,T] 
Instance[V,T] 
Instance[V,T] 
… 
Forests!
Instance[V,T] 
Instance[V,T] 
Instance[V,T] 
…
V? 
{height: 5, color: blue, wears: fur} ? 
{height: 7, color: yellow, wears: feathers} ? 
{height: 3, color: green, wears: garbage} ? 
{height: 5, color: yellow, wears: stripes} ? 
{height: 4, color: orange, wears: stripes} ?
PLANET 
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36296.pdf 
Scalding + Algebird 
http://github.com/twitter/scalding 
http://github.com/twitter/algebird 
Coming soon 
http://github.com/stripe/brushfire
