Tech Talk: TPOT
“The Data Science Assistant”
Francis Nguyen
Hoffman Lab
July, 2017
TPOT
1. Introduction
2. Usage
3. Limitations
4. Conclusions
TPOT logo from official documentation @ http://rhiever.github.io/tpot/
Introduction: What is TPOT?
Image adapted from official TPOT documentation @ http://rhiever.github.io/tpot/
Introduction: How TPOT works
[Figure annotations: "Manual Steps", "Automated by scikit-learn", "Manual Step"]
Introduction: Scikit built-ins
Exhaustive Grid Search and Randomized Parameter Optimization
Both methods...
● ...help find optimal hyperparameters for a given model
● ...are very easy to use
● ...are easily parallelizable
[Screenshot of Exhaustive Grid Search usage, annotated: "Parameter search", "Classifier training"]
Exhaustive Grid Search
Kernel    Error Penalty (C)
linear    1
linear    10
linear    100
rbf       1
rbf       10
Can be very slow!
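The table above can be sketched in code. This is a minimal illustration, not the slide's original screenshot: the grid is completed with an rbf/100 row, and the iris dataset, the SVC estimator, and the 5-fold setting are all assumptions made for the example. The first part only enumerates the candidates; run_grid_search() performs the actual search and needs scikit-learn installed.

```python
from itertools import product

# Grid based on the table above; completing it with rbf/100 is an assumption.
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 10, 100]}


def grid_candidates(grid):
    """Enumerate every combination in the grid, as an exhaustive search does."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


candidates = grid_candidates(param_grid)
n_folds = 5                          # assumed number of cross-validation folds
n_fits = len(candidates) * n_folds   # each candidate is trained once per fold


def run_grid_search():
    """Run the same search with scikit-learn (imports are local on purpose)."""
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # illustrative dataset, not from the talk
    search = GridSearchCV(SVC(), param_grid, cv=n_folds, n_jobs=-1)
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

The number of fits grows multiplicatively with every value added to the grid, which is why the approach becomes slow. Calling run_grid_search() returns the best parameter combination and its cross-validated score.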
Randomized Parameter Optimization
● Used when exhaustive grid searches are too computationally intensive
● Settings are sampled at random within a fixed budget, so adding more parameters does not, per se, reduce search efficiency
Screenshots of official scikit-learn documentation @ http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
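The fixed-budget idea behind randomized search can be sketched as follows. This is an illustration under assumptions: the parameter names gamma and degree, the iris dataset, and the budget of 4 or 5 draws are chosen for the example and do not come from the slides. run_randomized_search() shows the scikit-learn call and needs scikit-learn installed.

```python
import random


def sample_candidates(distributions, n_iter, seed=0):
    """Draw n_iter random settings; the budget is fixed by n_iter alone."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in distributions.items()}
            for _ in range(n_iter)]


small = {"kernel": ["linear", "rbf"], "C": [1, 10, 100]}
# Extra parameters are illustrative; they enlarge the space but not the budget.
large = dict(small, gamma=[0.001, 0.01, 0.1], degree=[2, 3, 4])

few = sample_candidates(small, n_iter=4)
many = sample_candidates(large, n_iter=4)


def run_randomized_search(n_iter=5):
    """Run a randomized search with scikit-learn (imports are local on purpose)."""
    from sklearn.datasets import load_iris
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # illustrative dataset, not from the talk
    search = RandomizedSearchCV(SVC(), small, n_iter=n_iter, cv=5, random_state=0)
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

Both few and many contain four settings, even though the second search space is nine times larger. The trade-off is that a larger space is covered more sparsely by the same number of draws.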
Introduction: TPOT
Screenshots of official scikit-learn documentation @ http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
               | Exhaustive Grid Search                        | Randomized Parameter Optimization             | Tree-based Pipeline OpTimization
Speed          | Very slow                                     | Scalable to project constraints               | Scalable to project constraints
Breadth        | Searches all possible solutions               | Randomly selects solutions                    | Approaches best solution via genetic programming
Steps Required | Data cleanup; model and hyperparameter choice | Data cleanup; model and hyperparameter choice | Data cleanup
Figure from Olson et al., EvoApplications (2016) pp123-137
Introduction: How TPOT works
Feature Selection or Construction
Combination
Classification
Image from official TPOT documentation @ http://rhiever.github.io/tpot/
Introduction: Genetic Programming
Step 1: Create population_size (default 100) random classification algorithms
Step 2: Evaluate their performance on the metric specified by scoring (default: “accuracy”; alternatives include “f1”, “recall”, etc.)
Step 3: Create a new population out of:
● 10% copies of the best performing algorithm
● 90% based on “three-way tournaments” among the rest of the existing population
○ Both accuracy and simplicity are optimized at this step
Step 4: Mutate pipelines according to mutation_rate and crossover_rate:
● Similarly to mutations in DNA, pipeline operators may be replaced, inserted, or
deleted according to the mutation_rate parameter
● Crossover mutations, where parts of one pipeline are cut-and-pasted into another
pipeline, can be controlled via the crossover_rate parameter
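The mutation and crossover operations in Step 4 can be sketched with a toy model. This is not TPOT's internal implementation: pipelines are modelled here as plain lists of operator names, the OPERATORS list is hypothetical, and the way the two rates are combined in vary() is one simple scheme chosen for illustration.

```python
import random

# Hypothetical operator names; real pipelines hold scikit-learn objects.
OPERATORS = ["PCA", "SelectKBest", "PolynomialFeatures", "StandardScaler",
             "RandomForest", "LogisticRegression"]


def mutate(pipeline, rng):
    """Replace, insert, or delete one operator; returns a new list."""
    child = list(pipeline)
    kind = rng.choice(["replace", "insert", "delete"])
    if kind == "delete" and len(child) > 1:
        del child[rng.randrange(len(child))]
    elif kind == "insert":
        child.insert(rng.randrange(len(child) + 1), rng.choice(OPERATORS))
    else:
        # Also reached when a one-operator pipeline drew "delete".
        child[rng.randrange(len(child))] = rng.choice(OPERATORS)
    return child


def crossover(parent_a, parent_b, rng):
    """Cut a tail from parent_b and paste it onto a head of parent_a."""
    cut_a = rng.randrange(1, len(parent_a) + 1)  # keep at least one operator
    cut_b = rng.randrange(len(parent_b))         # take at least one operator
    return parent_a[:cut_a] + parent_b[cut_b:]


def vary(population, mutation_rate, crossover_rate, rng):
    """Apply crossover, mutation, or plain copying to each pipeline."""
    offspring = []
    for pipeline in population:
        roll = rng.random()
        if roll < crossover_rate:
            offspring.append(crossover(pipeline, rng.choice(population), rng))
        elif roll < crossover_rate + mutation_rate:
            offspring.append(mutate(pipeline, rng))
        else:
            offspring.append(list(pipeline))
    return offspring
```

With this scheme the two rates must sum to at most 1, since each pipeline undergoes at most one of the two operations per generation.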
Step 5: Repeat steps 2-4 n times (where n is controlled via the generations parameter)
● Subsequent generations will only be offspring_size large
● In total, TPOT evaluates population_size + generations *
offspring_size pipelines
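As a worked example of the formula above, the helper below computes the total number of pipelines evaluated. The function name is hypothetical, and the fallback values (generations of 100, offspring_size equal to population_size when unset) are assumptions about TPOT's defaults that may differ between versions.

```python
def total_pipelines_evaluated(population_size=100, generations=100,
                              offspring_size=None):
    """Return population_size + generations * offspring_size."""
    if offspring_size is None:
        # Assumed default: offspring_size falls back to population_size.
        offspring_size = population_size
    return population_size + generations * offspring_size


default_total = total_pipelines_evaluated()
small_total = total_pipelines_evaluated(population_size=20, generations=5,
                                        offspring_size=20)
```

Under these assumptions the default run evaluates 100 + 100 * 100 = 10,100 pipelines, while a small run with 20 pipelines over 5 generations evaluates 20 + 5 * 20 = 120. Each evaluation is itself cross-validated, which is what makes long runs expensive.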
Usage: installation
Requires:
● numpy, scipy, scikit-learn (via pip or conda)
● deap, update_checker, tqdm (via pip)
● py-xgboost (via pip; optional; warning: ill-behaved, crashes on download.q)
● tpot (via pip)
Installing tpot provides a command-line utility along with the Python library
Usage: two types of problems
Image from official TPOT documentation @ http://rhiever.github.io/tpot/
Usage: Python example
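The code on this slide was an image and did not survive extraction. The following is a minimal sketch of typical usage, assuming the classic TPOTClassifier interface (fit, score, export) of the TPOT releases current at the time of the talk; newer releases may differ. The digits dataset, the parameter values, and the output file name are illustrative choices, not taken from the slide.

```python
def tpot_settings():
    """Keyword arguments for TPOTClassifier; values are illustrative only."""
    return {
        "generations": 5,        # number of iterations of steps 2-4
        "population_size": 20,   # pipelines in the first generation
        "offspring_size": 20,    # pipelines in each later generation
        "scoring": "accuracy",   # metric used to rank pipelines
        "verbosity": 2,          # print progress while running
        "random_state": 42,      # make the run repeatable
    }


def run_tpot(output_path="tpot_exported_pipeline.py"):
    """Fit TPOT on a toy dataset and export the best pipeline as Python code.

    Requires tpot and scikit-learn; imports are local so that the settings
    above can be inspected without either package installed.
    """
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=42)

    tpot = TPOTClassifier(**tpot_settings())
    tpot.fit(X_train, y_train)
    score = tpot.score(X_test, y_test)
    tpot.export(output_path)  # writes a standalone scikit-learn script
    return score
```

Calling run_tpot() returns the held-out score and leaves the exported pipeline in output_path. The small settings keep the run short; they are far below what a serious search would use.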
Image from official TPOT documentation @ http://rhiever.github.io/tpot/
Usage: Command-line
Things to note with the command-line interface:
● -is (the input file's column separator) should be specified
● The input file should have column names; -target should be the name of the classification column
● -njobs is meant to be used within a parallel environment:
○ When using qlogin, qsub, or qrsh, use -pe smp <n> to reserve <n> cores on your target machine
Limitations:
● Only finds solutions with scikit-learn
● Only works on supervised classification/regression problems
● Long run times: the documentation recommends running it for days or longer for best results
● Strangely difficult (but possible) to install on the cluster: it has many dependencies that must be installed in order, one of which runs into memory issues on download.q
● Gives no insight into why a particular model or set of hyperparameters was chosen
Conclusions:
Questions
Image from official TPOT documentation @ http://rhiever.github.io/tpot/
Introduction: How TPOT works
Image from official TPOT documentation @ http://rhiever.github.io/tpot/
Introduction: Genetic Programming
Three-way tournament:
Given three random pipelines from the existing population...
...remove the worst performing one...
...then remove the most complex of the two
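The three-way tournament described above can be sketched as follows. The Pipeline record and its fields are hypothetical, and using the number of operators as the complexity measure is an assumption made for the example.

```python
import random
from collections import namedtuple

# Hypothetical record: score is the scoring metric, n_operators is complexity.
Pipeline = namedtuple("Pipeline", ["name", "score", "n_operators"])


def three_way_tournament(population, rng):
    """Pick three random pipelines, drop the worst, then drop the more complex."""
    contenders = rng.sample(population, 3)
    worst = min(contenders, key=lambda p: p.score)
    finalists = [p for p in contenders if p is not worst]
    # Of the two remaining pipelines, the simpler one survives.
    return min(finalists, key=lambda p: p.n_operators)


population = [
    Pipeline("a", score=0.90, n_operators=5),
    Pipeline("b", score=0.80, n_operators=2),
    Pipeline("c", score=0.50, n_operators=1),
]
winner = three_way_tournament(population, random.Random(0))
```

With exactly three pipelines all of them compete: "c" is removed for the lowest score, and "b" then wins over "a" because it is simpler, even though "a" scores higher. This is how the selection step rewards simplicity as well as accuracy.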

Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

TPOT: The data science assistant

  • 1. Tech Talk: TPOT “The Data Science Assistant” Francis Nguyen, Hoffman Lab, July 2017
  • 2. TPOT 1. Introduction 2. Usage 3. Limitations 4. Conclusions TPOT logo from official documentation @ http://rhiever.github.io/tpot/ Introduction: What is TPOT?
  • 3. Introduction: What is TPOT? TPOT logo from official documentation @ http://rhiever.github.io/tpot/
  • 4. Introduction: How TPOT works. Image adapted from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 5. Introduction: How TPOT works. Figure labels: Automated by scikit-learn; Manual Steps; Manual Step. Image adapted from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 6. Introduction: How TPOT works. Figure labels: Manual Steps; Manual Step; Automated by scikit-learn. Image adapted from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 7. Introduction: Scikit built-ins. Exhaustive Grid Search; Randomized Parameter Optimization
  • 8. Introduction: Scikit built-ins. Exhaustive Grid Search; Randomized Parameter Optimization. Both methods...
      ● ...help find optimal hyperparameters for a given model
      ● ...are very easy to use
      ● ...are easily parallelizable
  • 9. Introduction: Scikit built-ins. Exhaustive Grid Search; Randomized Parameter Optimization. Diagram labels: Parameter search; Classifier training
  • 10. Introduction: Scikit built-ins. Exhaustive Grid Search example grid (Kernel, Error Penalty C): (linear, 1), (linear, 10), (linear, 100), (rbf, 1), (rbf, 10). Can be very slow!
  • 11. Introduction: Scikit built-ins. Randomized Parameter Optimization:
      ● Used when exhaustive grid searches are too computationally intensive
      ● Random sampling draws a fixed number of candidates, so adding more parameters does not by itself increase the cost of the search
    Screenshots of official scikit-learn documentation @ http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
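The difference between the two scikit-learn searches on slides 7-11 comes down to how the candidate list is built. The sketch below uses only the Python standard library, so `grid_candidates` and `random_candidates` are illustrative helpers, not scikit-learn functions; in practice `GridSearchCV` and `RandomizedSearchCV` (with its `n_iter` budget) do this work. The `gamma` values are assumed purely for the example.

```python
import itertools
import random

# Kernel and C values as listed on slide 10.
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 10, 100]}


def grid_candidates(grid):
    """Every combination of every listed value (exhaustive grid search)."""
    keys = list(grid)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(grid[k] for k in keys))]


def random_candidates(grid, n_iter, seed=0):
    """A fixed budget of n_iter random draws, however large the grid is.

    Draws are independent here, so repeats are possible; this is a
    simplification for illustration.
    """
    rng = random.Random(seed)
    return [{k: rng.choice(values) for k, values in grid.items()}
            for _ in range(n_iter)]


# Adding a third parameter multiplies the grid but not the random budget.
bigger_grid = dict(param_grid, gamma=[0.001, 0.01, 0.1, 1.0])

print(len(grid_candidates(param_grid)))        # 2 * 3 = 6
print(len(grid_candidates(bigger_grid)))       # 2 * 3 * 4 = 24
print(len(random_candidates(bigger_grid, 5)))  # always 5
```

Each candidate still has to be trained and scored, so the grid's cost grows multiplicatively with every added parameter while the randomized search stays at whatever budget was chosen.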
  • 12. Introduction: TPOT. Comparison of Exhaustive Grid Search, Randomized Parameter Optimization, and Tree-based Pipeline OpTimization (TPOT):
      ● Speed: Exhaustive Grid Search is very slow; Randomized Parameter Optimization is scalable to project constraints; TPOT is scalable to project constraints
      ● Breadth: Exhaustive Grid Search searches all possible solutions; Randomized Parameter Optimization randomly selects solutions; TPOT approaches the best solution via genetic programming
      ● Steps required: Exhaustive Grid Search and Randomized Parameter Optimization both need data cleanup plus model and hyperparameter choice; TPOT needs data cleanup only
    Screenshots of official scikit-learn documentation @ http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
  • 13. Introduction: TPOT. Screenshots of official scikit-learn documentation @ http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
  • 14. Introduction: How TPOT works. Figure from Olson et al., EvoApplications (2016), pp. 123-137
  • 15. Introduction: How TPOT works. Figure label: Feature Selection or Construction. Figure from Olson et al., EvoApplications (2016), pp. 123-137
  • 16. Introduction: How TPOT works. Figure label: Combination. Figure from Olson et al., EvoApplications (2016), pp. 123-137
  • 17. Introduction: How TPOT works. Figure label: Classification. Figure from Olson et al., EvoApplications (2016), pp. 123-137
  • 18. Introduction: How TPOT works. Figure from Olson et al., EvoApplications (2016), pp. 123-137
  • 19. Introduction: How TPOT works. Figure from Olson et al., EvoApplications (2016), pp. 123-137
  • 20. Introduction: Genetic Programming. Step 1: Create population_size (default 100) random classification algorithms. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 21. Introduction: Genetic Programming. Step 1: Create population_size (default 100) random classification algorithms. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 22. Introduction: Genetic Programming. Step 2: Evaluate their performance on the metric specified by scoring (default: “accuracy”; “f1”, “recall”, etc. are also available)
  • 23. Introduction: Genetic Programming. Step 3: Create a new population out of:
      ● 10% copies of the best-performing algorithm
      ● 90% chosen by “three-way tournaments” among the rest of the population
        ○ Both accuracy and simplicity are optimized for here
    Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 24. Introduction: Genetic Programming. Step 4: Mutate pipelines according to mutation_rate and crossover_rate:
      ● As with mutations in DNA, pipeline operators may be replaced, inserted, or deleted according to the mutation_rate parameter
    Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 25. Introduction: Genetic Programming. Step 4, continued:
      ● Crossover mutations, where parts of one pipeline are cut and pasted into another pipeline, are controlled via the crossover_rate parameter
  • 26. Introduction: Genetic Programming. Step 5: Repeat steps 2-4 n times (where n is controlled via the generations parameter)
      ● Subsequent generations will only be offspring_size large
      ● In total, TPOT evaluates population_size + generations * offspring_size pipelines
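Steps 4 and 5 can be made concrete with a toy model. This is a minimal sketch under loud assumptions: a pipeline is represented as a flat list of operator names (real TPOT pipelines are trees of scikit-learn operators), the `OPERATORS` list and the helpers `mutate`, `crossover`, and `vary` are hypothetical, and the rates passed to `vary` are illustrative values, not settings taken from the talk. Only the final formula comes directly from slide 26.

```python
import random

# Hypothetical operator names standing in for real pipeline steps.
OPERATORS = ["scale", "pca", "select_k_best", "poly_features", "tree", "knn"]


def mutate(pipeline, rng):
    """Replace, insert, or delete one operator (Step 4, first bullet)."""
    child = list(pipeline)
    action = rng.choice(["replace", "insert", "delete"])
    if action == "delete" and len(child) == 1:
        action = "replace"  # never produce an empty pipeline
    if action == "replace":
        child[rng.randrange(len(child))] = rng.choice(OPERATORS)
    elif action == "insert":
        child.insert(rng.randrange(len(child) + 1), rng.choice(OPERATORS))
    else:
        del child[rng.randrange(len(child))]
    return child


def crossover(parent, mate, rng):
    """Cut the head of one pipeline and paste on the tail of another."""
    head = parent[:rng.randrange(1, len(parent) + 1)]  # at least one operator
    tail = mate[rng.randrange(len(mate) + 1):]         # possibly empty
    return head + tail


def vary(parent, mate, rng, mutation_rate=0.9, crossover_rate=0.1):
    """Apply crossover, mutation, or a plain copy (rates are illustrative)."""
    roll = rng.random()
    if roll < crossover_rate:
        return crossover(parent, mate, rng)
    if roll < crossover_rate + mutation_rate:
        return mutate(parent, rng)
    return list(parent)


def total_evaluated(population_size, generations, offspring_size=None):
    """population_size + generations * offspring_size, as on slide 26."""
    if offspring_size is None:
        offspring_size = population_size  # assumption: same size as population
    return population_size + generations * offspring_size


rng = random.Random(0)
print(vary(["scale", "pca", "tree"], ["select_k_best", "knn"], rng))
print(total_evaluated(population_size=100, generations=5))  # 100 + 5 * 100 = 600
```

The budget formula is the practical takeaway: doubling either `generations` or `offspring_size` roughly doubles the number of pipelines that must be trained and scored.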
  • 27. Usage: installation. Requires:
      ● numpy, scipy, scikit-learn (via pip or conda)
      ● deap, update_checker, tqdm (via pip)
      ● py-xgboost (via pip; optional). Warning: crashes on download.q, ill-behaved
      ● tpot (via pip), which installs a command-line utility along with the Python library
  • 28. Usage: two types of problems
  • 29. Usage: Python example. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 30. Usage: Python example. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 31. Usage: Python example. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 32. Usage: Python example. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
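The Python example on these slides survives only as images, so the following is a stand-in rather than a reconstruction of what was shown. It is a minimal sketch assuming the classic (0.x) TPOT API, in which `TPOTClassifier` accepts `generations`, `population_size`, `verbosity`, and `random_state`, and exposes `fit`, `score`, and `export`; newer TPOT releases may differ. The dataset, the settings in `TPOT_SETTINGS`, the function name `run_tpot_on_digits`, and the export path are all choices made for illustration.

```python
# Small values picked so a demonstration finishes quickly; not recommendations.
TPOT_SETTINGS = {
    "generations": 5,
    "population_size": 20,
    "verbosity": 2,
    "random_state": 42,
}


def run_tpot_on_digits(settings=TPOT_SETTINGS,
                       export_path="tpot_digits_pipeline.py"):
    """Fit TPOT on the scikit-learn digits data and export the best pipeline."""
    # Imported here so this module still loads where TPOT is not installed.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, test_size=0.25, random_state=42)

    tpot = TPOTClassifier(**settings)
    tpot.fit(X_train, y_train)                # runs the genetic search
    accuracy = tpot.score(X_test, y_test)     # held-out score of the best pipeline
    tpot.export(export_path)                  # writes the pipeline as Python code
    return accuracy


if __name__ == "__main__":
    print(run_tpot_on_digits())
```

The exported file is plain scikit-learn code, which is the part worth keeping once the search has finished.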
  • 33. Usage: Command-line. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 34. Usage: Command-line. Things to note with the command-line interface:
      ● -is (the input file's column separator) should be specified
      ● The input file should have column names; -target should be the classification column name
      ● -njobs is meant to be used within a parallel environment:
        ○ When using qlogin, qsub, or qrsh, use -pe smp <n> to reserve <n> cores on your target machine
    Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 35. Limitations:
      ● Only finds solutions with scikit-learn
      ● Only works on supervised classification/regression problems
      ● Long run times: the documentation recommends running it for days or longer for best results
      ● Strangely difficult (but possible) to install on the cluster: it has many dependencies which must be installed in order, one of which will run into memory issues on download.q
      ● Gives no insight into why a particular model or set of hyperparameters was chosen
  • 36. Conclusions:
  • 38. Introduction: How TPOT works. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 39. Introduction: Genetic Programming. Three-way tournament: Given three random pipelines from the existing population... Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 40. Introduction: Genetic Programming. Three-way tournament: Given three random pipelines from the existing population... remove the worst-performing one... Image from official TPOT documentation @ http://rhiever.github.io/tpot/
  • 41. Introduction: Genetic Programming. Three-way tournament: Given three random pipelines from the existing population... remove the worst-performing one... then remove the more complex of the remaining two. Image from official TPOT documentation @ http://rhiever.github.io/tpot/
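The three-way tournament on slides 39-41 is short enough to write out. The sketch below follows the two removal steps as stated on the slides; the `Candidate` class, its field names, and the use of operator count as the complexity measure are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float      # higher is better, e.g. cross-validated accuracy
    n_operators: int  # assumed complexity proxy: number of pipeline operators


def three_way_tournament(population, rng):
    """Pick one winner from three randomly drawn candidates."""
    contenders = rng.sample(population, 3)
    # Remove the worst-performing contender...
    worst = min(contenders, key=lambda c: c.score)
    finalists = [c for c in contenders if c is not worst]
    # ...then remove the more complex of the remaining two.
    return min(finalists, key=lambda c: c.n_operators)


population = [
    Candidate("accurate_but_complex", score=0.90, n_operators=5),
    Candidate("good_and_simple", score=0.88, n_operators=2),
    Candidate("weak", score=0.60, n_operators=1),
]
winner = three_way_tournament(population, random.Random(0))
print(winner.name)  # good_and_simple
```

Note that the slightly less accurate pipeline wins here because it is simpler, which is how the selection step optimizes for accuracy and simplicity at the same time.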

Editor's notes

  1. GECCO
  2. Apparently used if you have a computational load budget that you can’t exceed; also, adding more parameters doesn’t reduce performance
  3. EvoApplications 2016: Applications of Evolutionary Computation, pp. 123-137