2. Why?
• Sometimes you just want to sling data
• Text is still king: the lowest common denominator
• Machines are pretty honking big now
3. This Presentation
• List of some good collections of cmd-line tools
• Call out and describe a few in particular
• The PyDataTool of my desire
4. Sources
• From author of “Data Science at the Command Line”: http://jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html (larger list at http://datascienceatthecommandline.com/)
• HN discussion: https://news.ycombinator.com/item?id=6412190
• https://github.com/bitly/data_hacks
7. The PyDataTool of My Desire
• Support for csv, json, sql, xls, hdf5; image formats; network
formats (pcap etc.)
• Capability of:
• csvkit, jq, dt, “cols” tool
• unix tools: sed, sort, shuf, split, tr, tee, uniq, wc, head,
tail, bc
• netpbm, imagemagick for images
• Work in streaming mode (netcat, wget, curl)
• First-class support for dask, spark
• Basic plotting via gnuplot, mpl, bokeh
• Built-in SQLite for in-memory queries
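
To make the SQLite item on this wish list concrete, here is a minimal sketch of one way such a tool might run SQL over streamed CSV text using only the Python standard library. The function name `query_csv`, the default table name `data`, and the sample rows are hypothetical, chosen for illustration; this is not an existing tool or API.

```python
import csv
import io
import sqlite3


def query_csv(stream, sql, table="data"):
    """Load CSV rows from a text stream into in-memory SQLite and run one query.

    Hypothetical helper: the first CSV row is treated as the header, and
    every column is stored as untyped text.
    """
    reader = csv.reader(stream)
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names so headers with spaces or keywords still work.
        cols = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        placeholders = ", ".join("?" for _ in header)
        # executemany consumes the reader lazily, one row at a time.
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data standing in for piped input.
sample = io.StringIO("city,visits\nAustin,80\nBoston,12\nAustin,5\n")
rows = query_csv(
    sample,
    "SELECT city, SUM(CAST(visits AS INTEGER)) FROM data "
    "GROUP BY city ORDER BY city",
)
print(rows)
```

In a real command-line tool the stream would more likely be `sys.stdin`, so the output of curl or netcat could be piped straight in. The explicit `CAST` is there because the sketch declares no column types.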
8. Continuum Is Hiring!
• Creators of Anaconda, conda, bokeh, blaze, dask,
holoviews, numba, phosphorJS
• Maintainers/contributors to Jupyter, JupyterLab,
Spyder, pandas, conda-forge, …
• 150+ people, 80 in Austin
• Venture backed
• Enterprise product, OSS community innovation,
consulting, training
9. Continuum Is Hiring
• Enterprise Product Team:
• Dev Manager (reports to CTO, runs product engineering)
• QA Lead Engineer - creates test plans, coordinates with
product mgmt, dev, and testing team
• Senior Python Developer - enterprise product development;
backend, web tech; full stack preferred
• DevOps and Operations - enterprise product, anaconda.org,
Anaconda build system
• Email careers@continuum.io