Roman Merkulov, InData Labs. Applied data visualization tools in Python.

Anyone who works with data every day sooner or later needs to visualize it. In this talk I give examples of simple yet powerful data visualization tools that can save a data scientist time in every phase of a project. In addition, we look at Apache Superset, a modern open-source business intelligence tool.

Published in: Technology

  1. Data visualization tools in Python
     Roman Merkulov, Data Scientist at InData Labs
     r_merkulov@indatalabs.com, merkylovecom@mail.ru
  2. Content
     - why dataviz is important
     - dataviz libraries in Python
     - Facets tool
     - interactive maps
     - Apache Superset
  3. Data visualization
     - EDA & understanding the data
     - fixing data
     - showing insights
     - model validation
     - analytics & reporting
  4. Plots vs descriptive statistics: Anscombe's quartet
     Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
  5. Plots vs descriptive statistics: Anscombe's quartet
     Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

     Property                      Value             Accuracy
     Mean of x                     9                 exact
     Sample variance of x          11                exact
     Mean of y                     7.5               to 2 decimal places
     Sample variance of y          4.125             ±0.003
     Correlation coefficient       0.816             to 3 decimal places
     Linear regression line        y = 3.00 + 0.5x   to 2 decimal places
     Coefficient of determination  0.67              to 2 decimal places
  6. Source: http://blog.revolutionanalytics.com/2017/05/the-datasaurus-dozen.html
  7. Source: https://matplotlib.org/gallery.html
  8. Pros:
     - very powerful
     - large community, long history
  9. Doesn't look simple enough...
  10. Cons:
     - imperative API
     - poor support for interactivity
     Just to add a popup...
  11. matplotlib-based solutions
     Source: https://speakerdeck.com/jakevdp/pythons-visualization-landscape-pycon-2017
  12. matplotlib-based solutions
     - http://yhat.github.io/ggpy/
     - http://scitools.org.uk/cartopy/docs/latest/gallery.html
     - https://seaborn.pydata.org/examples/index.html
     - https://networkx.github.io/documentation/networkx-1.9.1/examples/drawing/random_geometric_graph.html
  13. JavaScript-based solutions
     - folium
     - bqplot
     Source: https://speakerdeck.com/jakevdp/pythons-visualization-landscape-pycon-2017
  14. Source: https://plot.ly/python/
     Pros:
     - interactivity
     - lots of visualization types
     - both declarative and imperative capabilities
     Cons:
     - paid features
  15. bokeh
     Pros:
     - interactivity
     - lots of visualization types
     - both declarative and imperative capabilities
     Cons:
     - limited vector graphics export
  16. Datashader: for when you have millions or billions of points
     Examples: NYC Taxi, US Census 2010
     Source: https://datashader.readthedocs.io/en/latest/
  17. Altair (based on Vega-Lite)
     Fully declarative paradigm
     Source: https://altair-viz.github.io/#
  18. Facets
     - Overview
     - Dive
     Quick Draw Dataset: https://pair-code.github.io/facets/quickdraw.html
     Source: https://pair-code.github.io/facets/
     https://github.com/PAIR-code/facets
  19. Source: https://pair-code.github.io/facets/quickdraw.html
  20. https://research.googleblog.com/2017/07/facets-open-source-visualization-tool.html
  21. Folium
     Source: https://github.com/python-visualization/folium
  22. https://indatalabs.com/discover-hong-kong-through-the-lense-of-instagram/
     https://indatalabs.com/brands-on-london-instagram
     Visualization of the week according to InsideBigData:
     https://insidebigdata.com/2016/02/03/visualization-of-the-week-hong-kong-social-media-data-map/
  23. Apache Superset
     Source: https://superset.incubator.apache.org/
  24. Apache Superset
     Whatever! (if a SQLAlchemy dialect is available for your DB)
     Source: https://github.com/apache/incubator-superset
  25. Apache Superset
     Who uses it: Airbnb, Amino, Brilliant.org, Clark.de, Digit Game Studios, Douban, Endress+Hauser, FBK - ICT center, Faasos, GfK Data Lab, InData Labs, Maieutical Labs, Qunar, Shopkick, Tails.com, Tobii, Tooploox, Udemy, Yahoo!, Zalando
     Project names: Panoramix, Caravel, Superset
     Source: https://github.com/apache/incubator-superset
     Article on Superset benefits and limitations: https://indatalabs.com/blog/data-strategy/open-source-data-visualization-tool-superset
     Roaring Elephant podcast, Episode 41: https://roaringelephant.org/2017/04/25/episode-41-news-news-and-some-more-news/
  26. Thanks for your attention!
     Some of the examples shown are available here:
     https://github.com/merkylove/data_visualisations_for_datathon_2017
     https://www.slideshare.net/RomanMerkulov/data-visualization-tools-in-python/1
     Contacts: r_merkulov@indatalabs.com, merkylovecom@mail.ru
     https://www.linkedin.com/in/roman-merkulov-a61804a4/
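
The "plots vs descriptive statistics" point from slides 4 and 5 can be checked numerically. The following is a minimal sketch, not code from the talk: it assumes the standard published values of Anscombe's quartet, and the names `QUARTET`, `summarize` and `SUMMARIES` are illustrative. It uses only the Python standard library and computes the summary statistics listed on slide 5 for each of the four datasets.

```python
from math import sqrt

# Anscombe's quartet, standard published values (assumed here, not taken from the slides).
# Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the descriptive statistics from slide 5 for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r2": sxy ** 2 / (sxx * syy),  # coefficient of determination
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets produce nearly the same numbers, yet their scatter plots look completely different, which is why the talk recommends plotting the data before trusting summary statistics.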
