Tips and tricks for data
science projects with Python
José Manuel Ortega
Python Developer
Jose Manuel Ortega
Software engineer,
Freelance
1. Introducing Python for machine learning projects
2. Stages of a machine learning project
3. Selecting the best Python library for each stage of your project
4. Python tools for deep learning in data science projects
Introducing Python for machine learning projects
● Simple and consistent
● Understandable by humans
● General-purpose programming language
● Extensive selection of libraries and
frameworks
Introducing Python for machine learning projects
● Spam filters
● Recommendation systems
● Search engines
● Personal assistants
● Fraud detection systems
Introducing Python for machine learning projects
● Machine learning: Keras, TensorFlow, and Scikit-learn
● High-performance scientific computing: NumPy, SciPy
● Computer vision: OpenCV
● Data analysis: NumPy, Pandas
● Natural language processing: NLTK, spaCy
Introducing Python for machine learning projects
● Reading/writing many different data formats
● Selecting subsets of data
● Calculating across rows and down columns
● Finding and filling missing data
● Applying operations to independent groups within the data
● Reshaping data into different forms
● Combining multiple datasets together
● Advanced time-series functionality
● Visualization through Matplotlib and Seaborn
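Several of the operations listed above can be sketched on a tiny in-memory table. This is only an illustration: the column names (`city`, `sales`, `region`) and all values are invented here and do not come from the talk.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; columns and values are invented for illustration.
sales = pd.DataFrame({
    "city": ["Madrid", "Madrid", "Paris", "Paris"],
    "sales": [10.0, np.nan, 30.0, 50.0],
})
regions = pd.DataFrame({"city": ["Madrid", "Paris"], "region": ["ES", "FR"]})

# Selecting a subset of rows with a boolean mask
paris = sales[sales["city"] == "Paris"]

# Finding and filling missing data
n_missing = int(sales["sales"].isna().sum())
filled = sales.fillna({"sales": 0.0})

# Applying an operation to independent groups within the data
totals = filled.groupby("city")["sales"].sum()

# Combining multiple datasets together
merged = filled.merge(regions, on="city", how="left")

# Reshaping: number the rows within each city, then spread cities into columns
numbered = filled.assign(obs=filled.groupby("city").cumcount())
wide = numbered.pivot(index="obs", columns="city", values="sales")

print(totals)
print(merged)
print(wide)
```

Each step returns a new object rather than modifying `sales` in place, which keeps the intermediate results easy to inspect.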
Introducing Python for machine learning projects
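import pandas as pd
# note: pandas-profiling has since been renamed ydata-profiling
# (import ydata_profiling); ProfileReport is still the entry point
import pandas_profiling
# read the dataset ('your-data' is a placeholder for the path to a CSV file)
data = pd.read_csv('your-data')
# build the profiling report and write it out as a standalone HTML page
prof = pandas_profiling.ProfileReport(data)
prof.to_file(output_file='output.html')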
Stages of a machine learning project
Python libraries
● Supervised and unsupervised machine learning
● Classification, regression, Support Vector Machine
● Clustering, Kmeans, DBSCAN
● Random Forest
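As a rough sketch of the two families named above, the following fits one supervised model (a random forest) and one unsupervised model (k-means) with scikit-learn. The choice of the bundled iris dataset and of every hyperparameter is an assumption made for this example, not something taken from the slides.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed example dataset (150 flowers, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Supervised: fit on labelled rows, then score on rows held out from training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)

# Unsupervised: group the same rows into three clusters without using y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(f"random forest accuracy: {accuracy:.2f}")
print(f"number of clusters found: {len(set(cluster_ids))}")
```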
Python libraries
● Pipelines
● Grid-search
● Validation curves
● One-hot encoding of categorical data
● Dataset generators
● Principal Component Analysis (PCA)
Python libraries
Pipelines
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.naive_bayes import MultinomialNB
>>> from sklearn.preprocessing import Binarizer
>>> make_pipeline(Binarizer(), MultinomialNB())
Pipeline(steps=[('binarizer', Binarizer()),
('multinomialnb', MultinomialNB())])
http://scikit-learn.org/stable/modules/pipeline.html
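The snippet above only builds the pipeline. A minimal sketch of actually fitting it and predicting with it might look like this; the word-count matrix and labels are invented for the example.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer

# Hypothetical word-count features for four documents (values invented).
X = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 2, 0]])
y = np.array([0, 0, 1, 1])

pipe = make_pipeline(Binarizer(), MultinomialNB())

# fit() runs Binarizer.fit_transform on X, then fits MultinomialNB on the result
pipe.fit(X, y)

# predict() pushes new rows through the same binarization before classifying
pred = pipe.predict(np.array([[5, 0, 0], [0, 1, 0]]))
print(pred)
```

Because the preprocessing step lives inside the pipeline, the same transformation is applied at training and prediction time without any extra bookkeeping.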
Python libraries
Grid-search
estimator.get_params()
A search consists of:
● an estimator (regressor or classifier such as
sklearn.svm.SVC())
● a parameter space
● a method for searching or sampling candidates
● a cross-validation scheme
● a score function
https://scikit-learn.org/stable/modules/grid_search.html#grid-search
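The five ingredients listed above map onto the arguments of `GridSearchCV`. The sketch below is one possible instance; the dataset (iris) and the candidate values for `C` and `kernel` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # assumed example dataset

estimator = SVC()
# get_params() lists the names that can be used as keys in the parameter space
print(sorted(estimator.get_params()))

# Parameter space: 3 values of C x 2 kernels = 6 candidates (values are illustrative)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(
    estimator,
    param_grid,          # GridSearchCV tries every combination exhaustively
    cv=5,                # cross-validation scheme: 5 folds
    scoring="accuracy",  # score function
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

When the grid is too large to enumerate, `RandomizedSearchCV` takes the same estimator and samples candidates instead of trying all of them.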
Python libraries
Validation curves
https://scikit-learn.org/stable/modules/learning_curve.html
Python libraries
Validation curves
>>> import numpy as np
>>> from sklearn.linear_model import Ridge
>>> from sklearn.model_selection import validation_curve
>>> # X, y: feature matrix and target, assumed to be loaded beforehand
>>> train_scores, valid_scores = validation_curve(
...     Ridge(), X, y, param_name="alpha", param_range=np.logspace(-7, 3, 3),
...     cv=5)
>>> train_scores
array([[0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
       [0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
       [0.51..., 0.52..., 0.49..., 0.47..., 0.49...]])
>>> valid_scores
array([[0.90..., 0.84..., 0.94..., 0.96..., 0.93...],
       [0.90..., 0.84..., 0.94..., 0.96..., 0.93...],
       [0.46..., 0.25..., 0.50..., 0.49..., 0.52...]])
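The returned arrays have one row per parameter value and one column per cross-validation fold, so a common next step is to average across folds and compare the curves. The following self-contained sketch does that; the diabetes dataset and the `alpha` range are assumptions for this example and are not the data behind the scores printed above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

X, y = load_diabetes(return_X_y=True)   # assumed dataset, not the one on the slide
param_range = np.logspace(-3, 3, 7)     # illustrative range of alpha values

train_scores, valid_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=param_range, cv=5
)

# Rows are alpha values, columns are folds: average over the folds
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)

# The alpha with the highest mean validation score
best_alpha = param_range[np.argmax(mean_valid)]
print(train_scores.shape)
print(best_alpha)
```

A large gap between `mean_train` and `mean_valid` at a given `alpha` suggests overfitting; low values for both suggest underfitting.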
Python libraries
One-hot encoding
https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
# import the one-hot encoder from scikit-learn
from sklearn.preprocessing import OneHotEncoder
# initialize the encoder
encoding = OneHotEncoder()
# fit on the 'Status' column and encode it (the result is a sparse matrix)
transformed_data = encoding.fit_transform(data[['Status']])
# print the encoded values as a dense array
print(transformed_data.toarray())
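The snippet above assumes a DataFrame `data` with a `Status` column already exists. A self-contained version, using invented values for that column, could look like this:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; the 'Status' values are invented for illustration.
data = pd.DataFrame({"Status": ["single", "married", "single", "divorced"]})

encoding = OneHotEncoder()
transformed_data = encoding.fit_transform(data[["Status"]])  # sparse matrix

# One output column per category, in sorted order
categories = list(encoding.categories_[0])
dense = transformed_data.toarray()

print(categories)
print(dense)
```

Each row of `dense` contains exactly one 1, in the column of that row's category.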
Python libraries
Dataset generators
https://scikit-learn.org/stable/datasets/sample_generators.html
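As a brief illustration of the generators documented at the link above, the sketch below creates one synthetic classification problem and one set of clustered points. All sizes and seeds are arbitrary choices made for this example.

```python
from sklearn.datasets import make_blobs, make_classification

# Synthetic binary classification problem: 200 rows, 5 features, 3 of them informative
X_clf, y_clf = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    n_classes=2, random_state=0,
)

# Three Gaussian blobs in two dimensions, handy for trying out clustering
X_blobs, y_blobs = make_blobs(
    n_samples=150, centers=3, n_features=2, random_state=0
)

print(X_clf.shape, X_blobs.shape)
```

Fixing `random_state` makes the generated data reproducible between runs.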
Python libraries
Principal Component Analysis (PCA)
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
Python libraries
Principal Component Analysis (PCA)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
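The snippet above assumes `X_train` and `X_test` already exist. A self-contained sketch of the same scale-then-project flow follows; the iris dataset and the default train/test split are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)   # assumed example dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn the mean and standard deviation on the training rows only,
# then apply the same scaling to the test rows
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# Learn the two principal directions on the training rows and project both sets
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train_std)
X_test_2d = pca.transform(X_test_std)

# Fraction of the training variance kept by the two components
kept = pca.explained_variance_ratio_.sum()
print(X_train_2d.shape, X_test_2d.shape)
print(round(kept, 3))
```

Calling `fit_transform` only on the training data and `transform` on the test data avoids leaking test-set statistics into the model.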
Python tools for deep learning
Python tools for deep learning
                 TensorFlow              Keras                      PyTorch
API level        High and low            High                       Low
Architecture     Not easy to use         Simple, concise, readable  Complex, less readable
Speed            Fast, high-performance  Slow, low performance      Fast, high-performance
Trained models   Yes                     Yes                        Yes
Python tools for deep learning
● tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
● transparent use of a GPU – Perform data-intensive computations much faster than on a CPU.
● efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
● speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
● dynamic C code generation – Evaluate expressions faster.
● extensive unit-testing and self-verification – Detect and diagnose many types of error.
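The log(1+x) point can be seen without Theano. The sketch below uses plain NumPy, which is a substitute chosen for this illustration: it shows the numerical problem and the stable alternative `log1p`, not Theano's own graph optimization.

```python
import numpy as np

x = 1e-20

# 1.0 + 1e-20 rounds to exactly 1.0 in float64, so the naive form loses x entirely
naive = np.log(1.0 + x)

# log1p evaluates log(1 + x) without ever forming the sum 1 + x
stable = np.log1p(x)

print(naive)    # 0.0
print(stable)   # 1e-20
```

For tiny x, log(1+x) is approximately x, so the second result is the correct one.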
Python tools for deep learning
● Synkhronos: extension to Theano for multi-GPU data parallelism.
● Theano-MPI: a distributed framework for training models built in Theano, based on data parallelism.
● Platoon: multi-GPU mini-framework for Theano, single node.
● Elephas: distributed deep learning with Keras & Spark.
Tips and tricks for data
science projects with Python
@jmortegac
https://www.linkedin.com/in/jmortega1
