Slides from my presentation at http://thedevelopersconference.com.br, in the #ruby track this year in São Paulo.
A short talk about data science: what the alternatives are for doing it in Ruby, how to integrate Ruby and Python, and what the best available solutions are.
Data science in Ruby: is it possible? Is it fast? Should we use it?
1. Data Science in Ruby? Is it possible? Is it fast? Should we use it?
• Rodrigo Urubatan
• rodrigo@urubatan.dev
• http://urubatan.dev
• http://twitter.com/urubatan
2. Anyone here work with Data Science?
• Data scientists?
• Data engineers?
• Developers of applications that use data?
• Statisticians?
3. What exactly is Data Science?
The process of extracting meaning from data and interpreting it
The use of statistics and machine learning to clean and manipulate data
The use of computer software to collect, clean, manipulate and interpret data
A cool name for the combination of Data Mining and Business Intelligence (other buzzwords that were used for a long time for exactly what we call Data Science today, but with more expensive tool sets)
5. Can Ruby do Data Science? (Long Answer)
• Standing on the shoulders of giants
  • pycall - Bridge into the Python world.
  • rserve-client - Ruby connector for Rserve, R's binary server.
• Data Manipulation
  • kiba - lightweight Ruby ETL (Extract-Transform-Load) framework.
  • jongleur - Workflow manager using DAG definitions to execute ETL tasks.
• Distributed Computing
  • ruby-spark - Ruby interface to Apache Spark 1.x.x.
• Data Structures
  • daru - Data Frame and Vector structures with comprehensive manipulation and visualization methods.
  • numo-narray - n-dimensional Numerical Array for Ruby.
  • nmatrix - dense and sparse linear algebra library for Ruby via SciRuby.
• Datasets
  • rdatasets - Data sets available in R via Rdatasets.
  • red-datasets - Growing collection of publicly available data sets such as CIFAR-10, Iris, MNIST, etc.
• Statistics
  • rb-gsl - Ruby interface to the GNU Scientific Library. [dep: GSL]
  • simple_stats - Enumerable patches for descriptive statistics.
  • enumerable-statistics - fast implementation of descriptive statistics for the Enumerable module.
• Visualization
  • matplotlib - Ruby-based wrapper around matplotlib. [dep: matplotlib]
  • mathematical - PNG and MathML renderings for your equations.
  • daru-view - interactive plotting gem for web applications (any Ruby web application framework, such as Rails, Sinatra, Nanoc or Hanami) and IRuby notebooks. It is a plugin gem for daru.
  • daru-plotly - Plotly-based visualization for daru.
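To make the Data Manipulation entries above more concrete, here is a minimal plain-Ruby sketch of the Extract-Transform-Load idea. It is loosely modeled on the source / transform / destination split that ETL frameworks such as kiba use, but it does not use the kiba gem or its DSL. All class names (LineSource, ParseAmount, ArrayDestination), the run_pipeline helper and the sample data are hypothetical and exist only for illustration.

```ruby
# Source: yields one raw row (a Hash of strings) at a time from CSV-like text.
class LineSource
  def initialize(text)
    @text = text
  end

  def each
    header, *rows = @text.lines.map(&:chomp).reject(&:empty?)
    keys = header.split(',').map(&:to_sym)
    rows.each { |line| yield keys.zip(line.split(',')).to_h }
  end
end

# Transform: converts :amount to a Float; returns nil to drop rows without one.
class ParseAmount
  def process(row)
    return nil if row[:amount].nil? || row[:amount].strip.empty?

    row.merge(amount: Float(row[:amount]))
  end
end

# Destination: collects the cleaned rows in memory.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end
end

# Hypothetical runner: pushes every row through each transform in order and
# skips rows that a transform dropped (returned nil for).
def run_pipeline(source, transforms, destination)
  source.each do |row|
    row = transforms.reduce(row) { |acc, transform| acc && transform.process(acc) }
    destination.write(row) if row
  end
  destination
end

raw = <<~DATA
  customer,amount
  ana,10.5
  bruno,
  carla,4
DATA

result = run_pipeline(LineSource.new(raw), [ParseAmount.new], ArrayDestination.new)
result.rows.each { |row| puts row.inspect }
```

Keeping each step in its own small object makes it easy to test a single transform in isolation, which is the main appeal of this style for data cleaning code.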
8. Other libraries
• ruby-spark - Ruby interface to Apache Spark 1.x.x.
  • The project is almost dead, no commits in ages
• kiba - lightweight Ruby ETL (Extract-Transform-Load) framework.
  • Great framework to load and transform data, great performance
• enumerable-statistics - fast implementation of descriptive statistics for the Enumerable module.
  • Very handy for small statistical calculations in your application
• iruby - Ruby kernel for Jupyter.
  • The easiest way to use Ruby in your Jupyter Notebook
• decisiontree - Decision Tree ID3 algorithm in pure Ruby
  • Easy decision tree implementation, and very fast to train
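As an illustration of the "small statistical calculations in your application" mentioned above, here is a plain-Ruby sketch of a few descriptive statistics. It deliberately uses no gem, so it does not show the API of enumerable-statistics or simple_stats. The SimpleDescriptiveStats module and the sample numbers are hypothetical, and the sketch assumes non-empty input (at least two values for the variance).

```ruby
# Hypothetical helper module; not the API of any of the gems listed above.
module SimpleDescriptiveStats
  # Arithmetic mean of a non-empty collection of numbers.
  def self.mean(values)
    values.sum.to_f / values.size
  end

  # Sample variance (divides by n - 1); needs at least two values.
  def self.variance(values)
    m = mean(values)
    values.sum { |v| (v - m)**2 } / (values.size - 1)
  end

  # Sample standard deviation.
  def self.stdev(values)
    Math.sqrt(variance(values))
  end

  # Median: middle value, or the average of the two middle values.
  def self.median(values)
    sorted = values.sort
    mid = sorted.size / 2
    sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  end
end

response_times_ms = [120, 135, 128, 410, 131]
puts "mean:   #{SimpleDescriptiveStats.mean(response_times_ms)}"
puts "median: #{SimpleDescriptiveStats.median(response_times_ms)}"
puts "stdev:  #{SimpleDescriptiveStats.stdev(response_times_ms)}"
```

The single outlier in the sample pulls the mean well above the median, which is the kind of quick check these small helpers are useful for inside an application.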
10. Ruby and Ruby on Rails are way better for writing business web applications!
11. We can even do really good Machine Learning with Ruby (but that is a subject for another presentation)
12. And my objective is to help Ruby developers use the best tools for each job, so they can solve hard problems with fewer bugs and have more free time.
13. pycall to the rescue
pycall lets you use Python libraries from your Ruby code very naturally, as if you were calling a Ruby library
pycall consists of a Ruby binding library for libpython.so and an object-oriented protocol for communication between Ruby and Python
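A minimal sketch of what calling Python from Ruby can look like, assuming the pycall gem and a Python 3 interpreter are installed. It only touches Python's standard-library math module, so no extra Python packages are needed. The python_sqrt helper is a hypothetical name used for illustration, and the gem is required inside the method so the file still loads on a machine without pycall.

```ruby
# Hypothetical helper: computes a square root by calling Python's math.sqrt.
# Assumes the pycall gem and a Python 3 interpreter are available at call time.
def python_sqrt(number)
  require 'pycall'                     # loaded lazily, only when the method runs
  math = PyCall.import_module('math')  # Python's standard-library math module
  math.sqrt(number)                    # reads like an ordinary Ruby method call
end
```

With pycall available, the helper is used like any other Ruby method, for example `python_sqrt(16.0)`. The same pattern should extend to other Python modules, as long as they are installed in the Python environment that pycall finds.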
16. OK, so what are the best work patterns?
Python is way better than Ruby for Data Science
Ruby is better for business web applications
Best patterns for integration are (IMHO)
• Pointing both applications to the same database
• Exchanging data through JSON or some similar serialization format
• Calling Python directly through pycall
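The JSON exchange pattern above can be sketched with Ruby's standard json library. This shows only the Ruby side. The payload shape (version, records, predictions), the helper names and the canned reply are assumptions made for illustration; a real Python service would define its own format, and the transport (HTTP, a queue, a file) is left out.

```ruby
require 'json'

# Ruby side: serialize the records that the Python side should analyze.
# The payload layout is an assumption, not a standard format.
def build_request(records)
  JSON.generate({ version: 1, records: records })
end

# Ruby side: parse the reply and return its list of predictions.
# Raises KeyError if the reply has no "predictions" key.
def parse_response(body)
  payload = JSON.parse(body, symbolize_names: true)
  payload.fetch(:predictions)
end

request_body = build_request([{ id: 1, amount: 10.5 }, { id: 2, amount: 4.0 }])
puts request_body

# Stand-in for the reply of a hypothetical Python service.
fake_response = '{"predictions": [{"id": 1, "score": 0.82}, {"id": 2, "score": 0.17}]}'
predictions = parse_response(fake_response)
predictions.each { |prediction| puts "#{prediction[:id]} -> #{prediction[:score]}" }
```

Because both sides only agree on the JSON shape, the Ruby application and the Python code can be deployed and scaled separately, at the cost of serializing the data on every call.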
17. References
• Ruby Conf 2017 – Using Ruby in Data Science by Kenta Murata (@mrkn)
• Big Data analysis in Ruby
• Let's do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst)
• Progress of Ruby/Numo: Numerical Computing for Ruby
• SciRuby
• Ruby::Numo
• Ruby Machine Learning resources
• Ruby Data Science Resources
• PyCall
18. Any questions? Talk to me!
• @urubatan
• https://urubatan.dev
• rodrigo@urubatan.dev