This document summarizes an introduction to data analysis in Python using Wakari. It discusses why Python is a good language for data analysis, highlighting key Python packages like NumPy, Pandas, Matplotlib and IPython. It also introduces Wakari, a browser-based Python environment for collaborative data analysis and reproducible research. Wakari allows sharing of code, notebooks and data through a web link. The document recommends several talks at the PyData conference on efficient computing, machine learning and interactive plotting.
14. Putting Science back in Comp Sci
• Much of the software stack is for systems programming: C++, Java, .NET, ObjC, web
  - Complex numbers?
  - Vectorized primitives?
• Software stack for scientists is not as helpful as it should be
• Fortran is still where many scientists end up
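The slide does not include code, but here is a quick sketch of the two questions it raises. Python has a built-in complex type, and NumPy provides vectorized operations that act on whole arrays, including complex ones. The specific values are arbitrary.

```python
import cmath

import numpy as np

# Complex numbers are a built-in Python type; 1j is the imaginary unit.
z = 3 + 4j
print(abs(z))            # magnitude: 5.0
print(cmath.phase(1j))   # angle of i in radians (pi/2)

# NumPy arrays support complex dtypes and vectorized (whole-array) operations,
# so no explicit Python loop is needed.
t = np.linspace(0.0, 1.0, 5)
wave = np.exp(2j * np.pi * t)       # e^(2*pi*i*t), evaluated elementwise
print(np.round(np.abs(wave), 12))   # every point lies on the unit circle
```

The same `np.exp` call works on real or complex arrays of any shape, which is the "vectorized primitive" the slide is asking for.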
25. • “Python is good for data cleanup, R for statistical models”
• “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”
• “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
“Which is the better Data Analysis language? R or Python?” Quora.
http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python
26. “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
Ready for DATA, and then some
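The quote assumes MCMC is out of reach for Python. Purely as an illustration (this is not the speaker's code), here is a minimal random-walk Metropolis sampler in NumPy that targets a standard normal. The target density, step size, number of chains, and burn-in length are all arbitrary assumptions chosen for the example. Each chain is sequential in time, so NumPy vectorizes across many independent chains rather than across steps.

```python
import numpy as np


def log_target(x):
    # Unnormalized log-density of a standard normal N(0, 1).
    return -0.5 * x**2


def metropolis(n_chains=1000, n_steps=2000, step=1.0, seed=0):
    """Random-walk Metropolis, run for many independent chains at once.

    Returns an array of shape (n_steps, n_chains).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)                  # every chain starts at 0
    samples = np.empty((n_steps, n_chains))
    for t in range(n_steps):
        # Symmetric Gaussian proposal around the current position.
        proposal = x + step * rng.standard_normal(n_chains)
        # Accept with probability min(1, p(proposal) / p(x)), in log space.
        log_alpha = log_target(proposal) - log_target(x)
        accept = np.log(rng.random(n_chains)) < log_alpha
        x = np.where(accept, proposal, x)
        samples[t] = x
    return samples


samples = metropolis()[500:]   # discard the first 500 steps as burn-in
print(samples.mean(), samples.std())
```

Running many chains in lock-step keeps the per-step work inside NumPy; whether that is fast enough for a particular model still has to be measured.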
62. Talks to see
• Jake VanderPlas (Washington)
  - Efficient computing with NumPy
• 29th Floor combo, 3pm (right now, next door!)
• Julia Evans (N/A)
  - A practical introduction to IPython Notebook & pandas
• Here, 4:45pm.
63. Talks to see
• Sarah Guido (Michigan)
  - A Beginner’s Guide to Machine Learning with scikit-learn
• Imran Haque (Counsyl)
  - Beyond the dict
• Peter Wang (Continuum)
  - Bokeh Workshop
How many of you use Python on a daily basis for data analysis? In the past year, raise your hand if you’ve worked primarily in Python.
Domain-specific libraries:
  - Statsmodels => statistical computing
  - Scikit-image => image manipulation
  - OpenCV => image processing, with an interface that accepts NumPy arrays
  - PyTables => HDF5 integration
  - Numexpr => evaluates expressions on your arrays in a cache-aware way, which is very efficient
There are more packages in the Python scientific stack than just these. But it’s good to know NumPy so you can get down and dirty with your data and manipulate it if need be.
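The note's point about working with data directly in NumPy can be sketched as follows. The sensor readings are made-up values. The numexpr branch runs only if that optional package is installed. `numexpr.evaluate` takes the expression as a string and looks up `a` and `b` in the calling scope.

```python
import numpy as np

try:
    import numexpr as ne   # optional: evaluates array expressions in cache-sized blocks
except ImportError:
    ne = None

# Made-up sample data: 6 readings from each of 2 sensors, one sensor per row.
data = np.array([[1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
                 [2.0, 2.5, 3.0, np.nan, 4.0, 10.0]])

# Replace missing values (NaN) with the mean of that row's valid readings.
row_means = np.nanmean(data, axis=1, keepdims=True)
clean = np.where(np.isnan(data), row_means, data)

# Boolean masking: keep only readings above a threshold (flattened, row by row).
high = clean[clean > 3.0]

# Elementwise arithmetic over whole arrays, with no Python loop.
a, b = clean[0], clean[1]
result = 2 * a + 3 * b**2

if ne is not None:
    # Same expression, evaluated by numexpr without full-size NumPy temporaries.
    fast = ne.evaluate("2 * a + 3 * b**2")
    assert np.allclose(fast, result)

print(clean)
print(high)
print(result)
```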
PACKAGES! Occasional programmers can jump on
PACKAGES!
THIS SHOULD NEVER HAPPEN. At Continuum Analytics, we never want these words to be uttered again.
Python in 60 seconds: NumPy, SciPy, Pandas, Matplotlib, Scikit-learn
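For a small taste of one package from that list, here is a pandas sketch on made-up sales data. The column names and values are hypothetical. The example fills a missing value, then summarizes the data per group.

```python
import numpy as np
import pandas as pd

# Made-up example data: sales records for two stores, one value missing.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [120.0, 135.0, 80.0, np.nan, 95.0],
})

# Fill the missing value with the overall mean of the known sales.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Split-apply-combine: one summary row per store.
summary = df.groupby("store")["sales"].agg(["count", "mean", "max"])
print(summary)
```

The `groupby(...).agg(...)` pattern replaces the explicit loop-and-accumulate code you would otherwise write by hand.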