WHY PYTHON IS BETTER
FOR DATA SCIENCE
ÍCARO MEDEIROS
São Paulo Big Data Meetup

São Paulo - SP, 25/11/2015
DATA SCIENTISTS SHOULD DO…
http://berkeleysciencereview.com/article/first-rule-data-science/
WHY PYTHON?
▸ General purpose

▸ Smooth learning curve

▸ REPL (IPython!)

▸ Programmer productivity

▸ Popular and mature

▸ Glue language (high level API, low level C/Fortran bindings)

▸ Science ecosystem (growing!)
PYTHON IS POPULAR: IT MEANS WIDESPREAD KNOWLEDGE AND MANY TOOLS
http://githut.info/
PYTHON IS POPULAR: IT MEANS WIDESPREAD KNOWLEDGE AND MANY TOOLS
pypl.github.io/PYPL.html
AVOID THE TWO LANGUAGE PROBLEM
PYTHON CAN BE USED IN THE WHOLE DATA SCIENCE WORKFLOW
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015?slide=22
ONE DAY AT FACEBOOK’S DATA SCIENCE TEAM, A MEMBER COULD…
AUTHOR A MULTISTAGE PROCESSING PIPELINE IN
PYTHON, DESIGN A HYPOTHESIS TEST, PERFORM A
REGRESSION ANALYSIS OVER DATA SAMPLES WITH R,
DESIGN AND IMPLEMENT AN ALGORITHM FOR SOME
DATA-INTENSIVE PRODUCT OR SERVICE IN HADOOP,
OR COMMUNICATE THE RESULTS OF OUR ANALYSES
Jeff Hammerbacher
http://berkeleysciencereview.com/scientific-collaborations-uc-berkeley-data-driven-cover/
OPTIONS FOR PROCESSING PIPELINE
▸ Airflow: https://github.com/airbnb/airflow
▸ Luigi: https://github.com/spotify/luigi
AIRFLOW EXAMPLE
https://github.com/airbnb/airflow
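Airflow and Luigi both treat a pipeline as a directed acyclic graph of tasks, where each task runs only after its upstream tasks have finished. A minimal sketch of that idea using only the standard library is shown here. It is not Airflow's or Luigi's API, and the task names (`extract`, `clean`, `train`, `report`) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each one only records that it ran.
log = []

def extract():
    log.append("extract")

def clean():
    log.append("clean")

def train():
    log.append("train")

def report():
    log.append("report")

tasks = {"extract": extract, "clean": clean, "train": train, "report": report}

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean", "train"},
}

def run_pipeline(tasks, dependencies):
    """Run every task after all of its upstream tasks have finished."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order

order = run_pipeline(tasks, dependencies)
print(order)
```

A real scheduler adds what this sketch leaves out: retries, scheduling by date, parallel workers, and a UI for inspecting runs.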
REGRESSION ANALYSIS IN PYTHON: EASY
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/ols.html
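As a rough illustration of how little code a linear regression needs, the sketch below fits a straight line by ordinary least squares. It uses NumPy's `numpy.linalg.lstsq` rather than the statsmodels OLS interface linked above, and the data is synthetic (an assumption made only for this example).

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2 + 3x plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The fitted intercept and slope should land close to the 2 and 3 used to generate the data. statsmodels adds standard errors, confidence intervals, and a summary table on top of the same fit.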
PYTHON <3 BIG DATA
▸ mrjob: map reduce in python (https://pythonhosted.org/mrjob)
▸ snakebite: pure python HDFS client (https://github.com/spotify/snakebite)
▸ Spark: fast and general engine for large-scale data processing (http://spark.apache.org)
…
OH, BUT SCALA/JAVA IS FASTER. PYTHON IS 2 * FASTER: [WRITING, RUNNING]
DataFrame operations are optimized and compiled into JVM bytecode
https://databricks.com/blog/2015/04/24/recent-performance-improvements-in-apache-spark-sql-python-dataframes-and-more.html
RDD AVERAGE: EXAMPLE FROM ‘LEARNING SPARK'
RDD AVERAGE: EXAMPLE FROM ‘LEARNING SPARK'
SO CONCISE
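One common way to average an RDD is to carry a (sum, count) pair through an aggregate step: fold the values inside each partition, then merge the per-partition pairs. The sketch below emulates that pattern in plain Python, so `aggregate` and `partitions` are local stand-ins, not PySpark objects, and the numbers are made up. In PySpark, the same two functions could be passed to `RDD.aggregate(zeroValue, seqOp, combOp)`.

```python
from functools import reduce

def aggregate(partitions, zero, seq_op, comb_op):
    """Local stand-in for the aggregate pattern: fold each partition, then merge."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)

# Hypothetical data, already split into partitions as a cluster would hold it.
partitions = [[1, 2, 3], [4, 5], [6]]

# The accumulator is a (running_sum, running_count) pair.
def seq_op(acc, value):
    return (acc[0] + value, acc[1] + 1)

def comb_op(a, b):
    return (a[0] + b[0], a[1] + b[1])

total, count = aggregate(partitions, (0, 0), seq_op, comb_op)
average = total / count
print(average)  # 3.5
```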
COMMUNICATE RESULTS WITH IPYTHON / JUPYTER
Language agnostic :)
COMMUNICATE RESULTS WITH IPYTHON / JUPYTER
DEMO TIME
MATPLOTLIB / SEABORN / PLOT.LY / BOKEH: SUCH VISUALIZATION!!
PYTHON FITS ALL!
PYTHON FITS ALL!
PYTHON FOR SCIENCE IS GROWING
SCIENCE IS GETTING MORE AND MORE IMPORTANT FOR THE PYTHON COMMUNITY
 #  module       imports  imports/numpy
 1  sys          2437939  5.85
 2  os           2009086  4.82
 3  re           1303009  3.12
 4  numpy         416981  1.00
 5  warnings      371345  0.89
 6  subprocess    344934  0.83
 7  django        282097  0.68
 8  math          281987  0.68
11  matplotlib    146913  0.35
13  pylab          77817  0.19
14  scipy          69092  0.17
22  pandas         18928  0.05
24  theano          5482  0.01
6/25 MOST POPULAR LIBRARIES ARE FOR DATA SCIENCE
https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement
SCIENCE IS IMPORTANT FOR PYTHON: MATRIX MULTIPLICATION
https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement
import numpy as np
from numpy.linalg import inv, solve

# Using dot function:
S = np.dot((np.dot(H, beta) - r).T,
           np.dot(inv(np.dot(np.dot(H, V), H.T)),
                  np.dot(H, beta) - r))

# With the @ operator
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

$S = (H\beta - r)^T (H V H^T)^{-1} (H\beta - r)$
PEP 0465: PROPOSED FEB/14. SINCE PY 3.5 (SEP/15)
2013: 7 INTERNATIONAL CONFERENCES ON NUMERICAL PYTHON
AT PYCON 2014, ~20% OF THE TUTORIALS INVOLVED THE USE OF MATRICES
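To check that the two spellings of the formula agree, the sketch below evaluates both on small random inputs. The shapes and values of `H`, `beta`, `r`, and `V` are assumptions chosen only so that the products are defined and `H V H^T` is invertible. The `@` operator needs Python 3.5 or later.

```python
import numpy as np
from numpy.linalg import inv

# Hypothetical small inputs, chosen only so the shapes line up.
rng = np.random.default_rng(42)
H = rng.normal(size=(2, 3))
beta = rng.normal(size=(3, 1))
r = rng.normal(size=(2, 1))
A = rng.normal(size=(3, 3))
V = A @ A.T + 3 * np.eye(3)  # symmetric positive definite, so H V H^T is invertible

# Nested np.dot calls, as in the first version of the formula.
S_dot = np.dot((np.dot(H, beta) - r).T,
               np.dot(inv(np.dot(np.dot(H, V), H.T)),
                      np.dot(H, beta) - r))

# The same computation with the @ operator.
S_at = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

print(np.allclose(S_dot, S_at))  # True
```

Both results are 1x1 arrays. The `@` version reads left to right like the mathematical formula, which is the readability argument behind PEP 465.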
SCIENCE STACK IS GETTING BETTER EACH DAY
https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=8
SCIENCE STACK IS ALWAYS EVOLVING…
https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=29
CONDA: AUTOMATING ENVIRONMENTS
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015?slide=60
THE STACK IS STILL GETTING NEW MEMBERS…
http://www.tensorflow.org/
TAKEAWAY MESSAGE
TRY PYTHON. IT WILL BE A ONE-WAY TRIP!
slides
icaromedeiros.com.br
slideshare.net/icaromedeiros
@icaromedeiros
